vignettes/changing_backend_file_parser.Rmd
`RAthena` depends on `data.table` to read data into R, because of the speed `data.table` offers when reading files. However, a newer package with equally impressive read speeds has come onto the scene: `vroom`. Because `vroom` is designed only to read data into R, similarly to `readr`, `data.table` is still used for all of the heavy lifting. For users who wish to use `vroom` as the file parser, the `RAthena_options` function has been created to enable this:
library(DBI)
library(RAthena)
con = dbConnect(athena())
# file_parser accepts "data.table" (the default) or "vroom"
RAthena_options(file_parser = "vroom")
Setting `file_parser` to `"vroom"` changes the backend so that `vroom`'s file parser is used instead of `data.table`'s.
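Since `vroom` is a separate package, it can be worth checking that it is installed before switching parsers. The following is a minimal sketch under that assumption, not part of `RAthena`: `pick_file_parser()` is a hypothetical helper that returns `"vroom"` when the package is available and falls back to `"data.table"` otherwise.

```r
# Hypothetical helper (not part of RAthena): choose a parser name based on
# whether the vroom package is installed.
pick_file_parser <- function() {
  if (requireNamespace("vroom", quietly = TRUE)) "vroom" else "data.table"
}

parser <- pick_file_parser()
print(parser)

# With RAthena loaded, the result can be passed straight to RAthena_options():
# RAthena_options(file_parser = parser)
```

The final call is left commented out so the snippet runs without `RAthena` attached.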
Change back to data.table
To go back to using `data.table` as the file parser, it is as simple as calling the `RAthena_options` function with no arguments:
# return to using data.table as file parser
RAthena_options()
This makes it easy to swap between file parsers, even between individual query executions:
library(DBI)
library(RAthena)
con = dbConnect(athena())
# upload data
dbWriteTable(con, "iris", iris)
# use default data.table file parser
df1 = dbGetQuery(con, "select * from iris")
# use vroom as file parser
RAthena_options("vroom")
df2 = dbGetQuery(con, "select * from iris")
# return to the data.table file parser
RAthena_options()
df3 = dbGetQuery(con, "select * from iris")
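Because the parser can be changed between queries, the two backends can be compared on your own data. The sketch below is an illustration rather than an `RAthena` feature: `time_file_parsers()` is a hypothetical helper that runs the same query once per parser and records the elapsed time, using only `RAthena_options()` and `dbGetQuery()` as shown above.

```r
# Hypothetical helper (not part of RAthena): run the same query once per
# file parser and record the elapsed wall-clock time of each run.
time_file_parsers <- function(con, query, parsers = c("data.table", "vroom")) {
  # Restore the default parser when the function exits, even on error
  on.exit(RAthena::RAthena_options(), add = TRUE)
  vapply(parsers, function(parser) {
    RAthena::RAthena_options(file_parser = parser)
    system.time(DBI::dbGetQuery(con, query))[["elapsed"]]
  }, numeric(1))
}

# Usage, assuming `con` is an open RAthena connection and vroom is installed:
# time_file_parsers(con, "select * from iris")
```

Each timing covers the whole `dbGetQuery()` call, including the time Athena takes to run the query, so differences between parsers are likely to show up only on larger result sets.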
Why vroom?
If you aren't sure whether to use `vroom` over `data.table`, I draw your attention to `vroom`'s benchmarked throughput of 1.40 GB/sec, roughly ten times the 134.13 MB/sec reported for `data.table` in the same benchmark.
Statistics taken from vroom's GitHub README:
| package | version | time (sec) | speed-up | throughput |
|---|---|---|---|---|
| vroom | 1.1.0 | 1.14 | 58.44 | 1.40 GB/sec |
| data.table | 1.12.8 | 11.88 | 5.62 | 134.13 MB/sec |
| readr | 1.3.1 | 29.02 | 2.30 | 54.92 MB/sec |
| read.delim | 3.6.2 | 66.74 | 1.00 | 23.88 MB/sec |
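The speed-up column appears to be each package's read time relative to `read.delim`, the slowest entry. As a quick check under that assumption, the ratios can be recomputed from the time column:

```r
# Read times in seconds, copied from the table above
read_time <- c(vroom = 1.14, data.table = 11.88, readr = 29.02, read.delim = 66.74)

# Assumed definition: speed-up = read.delim time / package time
speed_up <- read_time[["read.delim"]] / read_time
print(round(speed_up, 2))

# How many times longer data.table took than vroom in this benchmark
print(round(read_time[["data.table"]] / read_time[["vroom"]], 1))
```

Values recomputed this way can differ slightly from the published column, because the times in the table are already rounded.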