Utilises AWS Athena to convert AWS S3 backend file types. It can also create more efficient file types, e.g. "parquet" and "orc", from SQL queries.
dbConvertTable(conn, obj, name, ...)
# S4 method for AthenaConnection
dbConvertTable(
conn,
obj,
name,
partition = NULL,
s3.location = NULL,
file.type = c("NULL", "csv", "tsv", "parquet", "json", "orc"),
compress = TRUE,
data = TRUE,
...
)
conn: An AthenaConnection object, produced by DBI::dbConnect().
obj: Athena table or SQL DML query to be converted. An SQL query needs to be wrapped with DBI::SQL() and follow the AWS Athena DML format.
name: Name of the destination table.
...: Extra parameters, currently not used.
partition: Partition column(s) for the Athena table.
s3.location: Location to store the output file; must be in S3 URI format, for example "s3://mybucket/data/".
file.type: File type for name; currently supports "NULL", "csv", "tsv", "parquet", "json" and "orc". "NULL" lets Athena set the file type for you.
compress: Whether to compress name; currently only "parquet" and "orc" can be compressed (AWS Athena CTAS).
data: Whether name should be created with data or not.
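For orientation, the arguments above map onto an Athena CREATE TABLE AS SELECT (CTAS) statement. The sketch below is not RAthena's internal code: `build_ctas()` is a hypothetical helper that only assembles a CTAS string, assuming Athena's `format`, `external_location` and `partitioned_by` table properties and the `WITH NO DATA` clause. It covers only "parquet", "orc" and "json", and leaves out compression settings.

```r
# Hypothetical helper (not part of RAthena): builds an Athena CTAS
# statement as a character string from arguments similar to those above.
build_ctas <- function(query, name, partition = NULL, s3.location = NULL,
                       file.type = c("parquet", "orc", "json"), data = TRUE) {
  # Restrict to formats whose CTAS name is simply the upper-cased file type
  file.type <- match.arg(file.type)
  props <- sprintf("format = '%s'", toupper(file.type))
  if (!is.null(s3.location)) {
    props <- c(props, sprintf("external_location = '%s'", s3.location))
  }
  if (!is.null(partition)) {
    cols <- paste(sprintf("'%s'", partition), collapse = ", ")
    props <- c(props, sprintf("partitioned_by = ARRAY[%s]", cols))
  }
  # data = FALSE creates the table definition without copying rows
  data_clause <- if (data) "" else "\nWITH NO DATA"
  sprintf(
    "CREATE TABLE %s\nWITH (%s)\nAS %s%s",
    name, paste(props, collapse = ", "), query, data_clause
  )
}

# Print the statement for a simple parquet conversion
cat(build_ctas("SELECT * FROM iris", "iris_parquet", file.type = "parquet"))
```

The helper only returns text, so it can be inspected without an AWS account; the statement that dbConvertTable() actually sends to Athena may differ.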
dbConvertTable() returns TRUE, invisibly.
if (FALSE) {
# Note:
# - An AWS account is required to run the example below.
# - Different connection methods can be used; see the `RAthena::dbConnect` documentation.
library(DBI)
library(RAthena)
# Demo connection to Athena using profile name
con <- dbConnect(athena())
# write iris table to Athena in default delimited format
dbWriteTable(con, "iris", iris)
# convert delimited table to parquet
dbConvertTable(con,
  obj = "iris",
  name = "iris_parquet",
  file.type = "parquet"
)
# Create partitioned table from non-partitioned
# iris table using SQL DML query
dbConvertTable(con,
  obj = SQL("select
               iris.*,
               date_format(current_date, '%Y%m%d') as time_stamp
             from iris"),
  name = "iris_orc_partitioned",
  file.type = "orc",
  partition = "time_stamp"
)
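# Sketch, assuming the same connection and iris table as above:
# create an empty parquet table (definition only, no rows) at an explicit
# S3 location. "s3://mybucket/data/iris_empty/" is a placeholder path and
# "iris_empty" is an illustrative table name.
dbConvertTable(con,
  obj = "iris",
  name = "iris_empty",
  file.type = "parquet",
  s3.location = "s3://mybucket/data/iris_empty/",
  data = FALSE
)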
# disconnect from Athena
dbDisconnect(con)
}