Convenience functions for reading/writing DBMS tables
# S4 method for AthenaConnection,character,data.frame
dbWriteTable(
  conn,
  name,
  value,
  overwrite = FALSE,
  append = FALSE,
  row.names = NA,
  field.types = NULL,
  partition = NULL,
  s3.location = NULL,
  file.type = c("tsv", "csv", "parquet", "json"),
  compress = FALSE,
  max.batch = Inf,
  ...
)
# S4 method for AthenaConnection,Id,data.frame
dbWriteTable(
  conn,
  name,
  value,
  overwrite = FALSE,
  append = FALSE,
  row.names = NA,
  field.types = NULL,
  partition = NULL,
  s3.location = NULL,
  file.type = c("tsv", "csv", "parquet", "json"),
  compress = FALSE,
  max.batch = Inf,
  ...
)
# S4 method for AthenaConnection,SQL,data.frame
dbWriteTable(
  conn,
  name,
  value,
  overwrite = FALSE,
  append = FALSE,
  row.names = NA,
  field.types = NULL,
  partition = NULL,
  s3.location = NULL,
  file.type = c("tsv", "csv", "parquet", "json"),
  compress = FALSE,
  max.batch = Inf,
  ...
)
conn: An AthenaConnection object, produced by [DBI::dbConnect()].
name: A character string specifying a table name. Names will be automatically quoted, so you can use any sequence of characters, not just any valid bare table name.
value: A data.frame to write to the database.
overwrite: Allows overwriting the destination table. Cannot be TRUE if append is also TRUE.
append: Allows appending to the destination table. Cannot be TRUE if overwrite is also TRUE. The existing Athena DDL file type will be retained and used when uploading data to AWS Athena. If the file.type parameter doesn't match the AWS Athena DDL file type, a warning message is raised notifying the user, and RAthena will use the file type from the Athena DDL. When appending to an Athena DDL that has been created outside of RAthena, RAthena can support the following SerDes and Data Formats:
csv/tsv: LazySimpleSerDe
parquet: Parquet SerDe
json: JSON SerDe Libraries
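For example, a minimal sketch of an append (assuming an active connection con and an existing "mtcars" table; both names are illustrative):
# Append rows to an existing table; the table's existing DDL file type
# is retained even if it differs from the file.type argument.
dbWriteTable(con, "mtcars", mtcars, append = TRUE)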
row.names: Either TRUE, FALSE, NA or a string. If TRUE, always translate row names to a column called "row_names". If FALSE, never translate row names. If NA, translate row names only if they're a character vector. A string is equivalent to TRUE, but allows you to override the default name. For backward compatibility, NULL is equivalent to FALSE.
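As an illustration, a minimal sketch of the string form (assuming an active connection con; the column name "car" is illustrative):
# Keep mtcars row names, stored in a column called "car" instead of "row_names"
dbWriteTable(con, "mtcars", mtcars, row.names = "car")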
field.types: Additional field types used to override derived types.
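For instance, a sketch of overriding derived types (assuming an active connection con; the Athena DDL types shown are illustrative):
# Force specific Athena DDL types instead of the derived ones
dbWriteTable(con, "mtcars", mtcars,
             field.types = c(mpg = "DOUBLE", cyl = "INT"))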
partition: Partition the Athena table (needs to be a named list or vector), for example: c(var1 = "2019-10-13").
s3.location: The S3 bucket to store the Athena table in; must be set as an S3 URI, for example "s3://mybucket/data/". By default, s3.location is set to the S3 staging directory from the AthenaConnection object. Note: when creating a table for the first time, s3.location will be formatted from "s3://mybucket/data/" to the following syntax: "s3://{mybucket/data}/{schema}/{table}/{partition}/". This is to support tables with the same name existing in different schemas. If the schema isn't specified in the name parameter, the schema from dbConnect is used instead.
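To illustrate, a sketch of the resulting layout (bucket and schema names are hypothetical):
# With a schema-qualified name, the data lands under
#   s3://mybucket/data/myschema/mtcars/
dbWriteTable(con, "myschema.mtcars", mtcars,
             s3.location = "s3://mybucket/data/")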
file.type: What file type to store the data.frame on S3. RAthena currently supports "tsv", "csv", "parquet" and "json". The default delimited file type is "tsv"; in previous versions of RAthena (<= 1.6.0), file type "csv" was used as the default. The reason for the change is that columns containing Array/JSON format cannot be written to Athena due to the separating value ",", which would cause issues with AWS Athena. Note: the "parquet" format is supported by the arrow package, which will need to be installed to utilise the "parquet" format. The "json" format is supported by the jsonlite package, which will need to be installed to utilise the "json" format.
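A minimal sketch of writing in "parquet" format (assumes the arrow package is installed and an active connection con; the table name is illustrative):
# "parquet" requires the arrow package
if (requireNamespace("arrow", quietly = TRUE)) {
  dbWriteTable(con, "mtcars_parquet", mtcars, file.type = "parquet")
}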
compress: TRUE or FALSE. Determines whether to compress the file.type. If the file type is "csv" or "tsv", "gzip" compression is used; for the "parquet" file type, "snappy" compression is used. Currently RAthena doesn't support compression for the "json" file type. See the combined sketch after max.batch below.
max.batch: Split the data.frame by a maximum number of rows, e.g. 100,000, so that multiple files can be uploaded into AWS S3. By default, when compress is set to TRUE and file.type is "csv" or "tsv", max.batch will split the data.frame into 20 batches. This is to help the performance of AWS Athena when working with files compressed in "gzip" format. max.batch will not split the data.frame when loading the file in "parquet" format. For more information please go to link.
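A sketch combining compress and max.batch (assuming an active connection con; big_df is a placeholder for a large data.frame and the row count is illustrative):
# Gzip-compress the tsv output and cap each uploaded file at 100,000 rows
dbWriteTable(con, "big_table", big_df,
             file.type = "tsv", compress = TRUE, max.batch = 100000)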
...: Other arguments used by individual methods.
dbWriteTable() returns TRUE, invisibly. If the table exists, and both append and overwrite arguments are unset, or if append = TRUE and the data frame with the new data has different column names, an error is raised; the remote table remains unchanged.
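For example, a sketch of the failure mode (assuming an active connection con and that "mtcars" already exists):
# Writing to an existing table with neither overwrite nor append set
# raises an error and leaves the remote table unchanged.
tryCatch(
  dbWriteTable(con, "mtcars", mtcars),
  error = function(e) message("write failed: ", conditionMessage(e))
)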
if (FALSE) {
# Note:
# - Require AWS Account to run below example.
# - Different connection methods can be used, please see `RAthena::dbConnect` documentation
library(DBI)
# Demo connection to Athena using profile name
con <- dbConnect(RAthena::athena())
# List existing tables in Athena
dbListTables(con)
# Write data.frame to Athena table
dbWriteTable(con, "mtcars", mtcars,
partition=c("TIMESTAMP" = format(Sys.Date(), "%Y%m%d")),
s3.location = "s3://mybucket/data/")
# Read entire table from Athena
dbReadTable(con, "mtcars")
# List all tables in Athena after uploading new table to Athena
dbListTables(con)
# Checking if uploaded table exists in Athena
dbExistsTable(con, "mtcars")
# using default s3.location
dbWriteTable(con, "iris", iris)
# Read entire table from Athena
dbReadTable(con, "iris")
# List all tables in Athena after uploading new table to Athena
dbListTables(con)
# Checking if uploaded table exists in Athena
dbExistsTable(con, "iris")
# Disconnect from Athena
dbDisconnect(con)
}