Hard-coding credentials when connecting to Athena is never advised, even though the option exists. Instead, use profile_name (set up with the AWS Command Line Interface), Amazon Resource Name (ARN) roles, or environment variables. The supported environment variables are:
AWS_ACCESS_KEY_ID: equivalent to the dbConnect parameter aws_access_key_id
AWS_SECRET_ACCESS_KEY: equivalent to the dbConnect parameter aws_secret_access_key
AWS_SESSION_TOKEN: equivalent to the dbConnect parameter aws_session_token
AWS_EXPIRATION: equivalent to the dbConnect parameter duration_seconds
AWS_ATHENA_S3_STAGING_DIR: equivalent to the dbConnect parameter s3_staging_dir
AWS_ATHENA_WORK_GROUP: equivalent to the dbConnect parameter work_group
AWS_REGION: equivalent to the dbConnect parameter region_name
NOTE: If you have set any environment variables in .Renviron, restart your R session for the changes to take effect.
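As a minimal sketch of the environment-variable approach, the variables can also be set for the current session with `Sys.setenv()` and checked before connecting. The bucket path, work group, and region values below are placeholders, and `missing_athena_env` is a hypothetical helper, not part of RAthena.

```r
# Placeholder values: substitute your own staging bucket, work group and region.
Sys.setenv(
  AWS_ATHENA_S3_STAGING_DIR = "s3://path/to/query/bucket/",
  AWS_ATHENA_WORK_GROUP = "primary",
  AWS_REGION = "eu-west-1"
)

# Hypothetical helper (not part of RAthena): return the names of the given
# environment variables that are unset or empty.
missing_athena_env <- function(vars) {
  vars[!nzchar(Sys.getenv(vars))]
}

missing_vars <- missing_athena_env(
  c("AWS_ATHENA_S3_STAGING_DIR", "AWS_ATHENA_WORK_GROUP", "AWS_REGION")
)
print(missing_vars)

# With the variables in place, the matching dbConnect arguments can be omitted.
# Not run here because it requires valid AWS credentials:
# con <- DBI::dbConnect(RAthena::athena())
```

Unlike entries in .Renviron, values set with `Sys.setenv()` apply immediately but last only for the current R session.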
# S4 method for AthenaDriver
dbConnect(
  drv,
  aws_access_key_id = NULL,
  aws_secret_access_key = NULL,
  aws_session_token = NULL,
  schema_name = "default",
  work_group = NULL,
  poll_interval = NULL,
  encryption_option = c("NULL", "SSE_S3", "SSE_KMS", "CSE_KMS"),
  kms_key = NULL,
  profile_name = NULL,
  role_arn = NULL,
  role_session_name = sprintf("RAthena-session-%s", as.integer(Sys.time())),
  duration_seconds = 3600L,
  s3_staging_dir = NULL,
  region_name = NULL,
  botocore_session = NULL,
  bigint = c("integer64", "integer", "numeric", "character"),
  binary = c("raw", "character"),
  json = c("auto", "character"),
  timezone = "UTC",
  keyboard_interrupt = TRUE,
  rstudio_conn_tab = TRUE,
  endpoint_override = NULL,
  ...
)
An object that inherits from DBIDriver, or an existing DBIConnection object (in order to clone an existing connection).
AWS access key ID
AWS secret access key
AWS temporary session token
The schema_name to which the connection belongs
The name of the work group in which to run Athena queries. Currently defaults to NULL.
Amount of time to wait between checks of the query execution status. Defaults to a random interval between 0.5 and 1 seconds.
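To illustrate what a polling interval controls, here is a hypothetical sketch of a status-polling loop. It is not RAthena's internal implementation: `poll_until_done`, `get_status`, and `max_polls` are names invented for this example, and the fake status source stands in for a real query.

```r
# Hypothetical sketch of a polling loop; `get_status` stands in for whatever
# call reports the query state and is NOT an RAthena function.
poll_until_done <- function(get_status, poll_interval = NULL, max_polls = 100L) {
  for (i in seq_len(max_polls)) {
    status <- get_status()
    # Stop as soon as the query reaches a terminal state.
    if (status %in% c("SUCCEEDED", "FAILED", "CANCELLED")) {
      return(status)
    }
    # Mirror the documented default: wait a random 0.5 to 1 seconds
    # when no interval is given.
    wait <- if (is.null(poll_interval)) runif(1, min = 0.5, max = 1) else poll_interval
    Sys.sleep(wait)
  }
  stop("query did not finish within max_polls checks")
}

# Fake status source that reports success on the third check.
calls <- 0L
fake_status <- function() {
  calls <<- calls + 1L
  if (calls < 3L) "RUNNING" else "SUCCEEDED"
}

result <- poll_until_done(fake_status, poll_interval = 0.01)
```

A shorter interval returns results sooner after the query finishes but issues more status requests; a longer one does the opposite.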
Athena encryption at rest option. Supported Amazon S3 encryption options: ["NULL", "SSE_S3", "SSE_KMS", "CSE_KMS"]. The connection defaults to NULL; changing this option is usually not required.
AWS Key Management Service (KMS) key. Please refer to the AWS KMS documentation for more information on the concept.
The name of a profile to use. If not given, then the default profile is used. To set profile name, the AWS Command Line Interface (AWS CLI) will need to be configured. To configure AWS CLI please refer to: Configuring the AWS CLI.
The Amazon Resource Name (ARN) of the role to assume (such as arn:aws:sts::123456789012:assumed-role/role_name/role_session_name).
An identifier for the assumed role session. By default, `RAthena` creates the session name with sprintf("RAthena-session-%s", as.integer(Sys.time())).
The duration, in seconds, of the role session. The value can range from 900 seconds (15 minutes) up to the maximum session duration setting for the role. This setting can have a value from 1 hour to 12 hours. By default duration is set to 3600 seconds (1 hour).
The location in Amazon S3 where your query results are stored, such as s3://path/to/query/bucket/
Default region when creating new connections. Please refer to the AWS documentation for region codes (for example, Region = EU (Ireland) corresponds to region_name = "eu-west-1").
Use this Botocore session instead of creating a new default one.
The R type that 64-bit integer types should be mapped to, default is [bit64::integer64], which allows the full range of 64 bit integers.
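The trade-off between the alternative mappings can be seen with base R alone. The following sketch uses no RAthena functions and only demonstrates the limits of the "integer", "numeric", and "character" options; the variable names are illustrative.

```r
# Base R integers are 32-bit: the largest representable value is 2147483647.
max_int <- .Machine$integer.max

# Converting a larger value to integer yields NA (with a warning).
overflowed <- suppressWarnings(as.integer(2^31))

# Doubles ("numeric") represent whole numbers exactly only up to 2^53,
# so adjacent 64-bit values can collapse into the same number.
collapsed <- (2^53 + 1) == 2^53

# Keeping the value as text preserves every digit, at the cost of arithmetic.
as_text <- "9223372036854775807"
```

Here `overflowed` is NA and `collapsed` is TRUE, which is why a dedicated 64-bit integer type is the safer default for Athena bigint columns.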
The R type that [binary/varbinary] types should be mapped to, default is [raw]. If the mapping fails R will resort to [character] type. To ignore data type conversion set to ["character"].
Attempts to convert AWS Athena data types [arrays, json] using jsonlite::parse_json. If the mapping fails, R will resort to [character] type.
Custom JSON parsers can be provided by using a function with a data frame parameter.
To ignore data type conversion set to ["character"].
Sets the timezone for the connection. The default is `UTC`. If `NULL` then no timezone is set, which defaults to the server's time zone. `AWS Athena` accepted time zones: https://docs.aws.amazon.com/athena/latest/ug/athena-supported-time-zones.html.
Stops the AWS Athena process when R receives a keyboard interrupt. Currently defaults to TRUE.
Optionally retrieves the AWS Athena schema from the AWS Glue Catalogue and displays it in RStudio's Connections tab. Defaults to TRUE. For a large `AWS Glue Catalogue`, setting `rstudio_conn_tab=FALSE` is recommended to ensure a fast connection.
(character/list) The complete URL to use for the constructed client. Normally, botocore will automatically construct the appropriate URL to use when communicating with a service. You can specify a complete URL (including the "http/https" scheme) to override this behaviour. If endpoint_override is a character, then the AWS Athena endpoint is overridden. To override the AWS S3 or AWS Glue endpoints, a named list needs to be provided. The list can only have the following names: ['athena', 's3', 'glue'], for example list(glue = "https://glue.eu-west-1.amazonaws.com").
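The two accepted shapes of endpoint_override can be sketched with a small validator. `normalise_endpoint_override` is a hypothetical helper written for this example, not an RAthena function, and the Athena URL is an illustrative value.

```r
# Hypothetical validator (not part of RAthena) mirroring the documented rules:
# a single character string overrides the Athena endpoint; a named list may
# only use the names "athena", "s3" and "glue".
normalise_endpoint_override <- function(endpoint_override) {
  if (is.null(endpoint_override)) {
    return(list())
  }
  if (is.character(endpoint_override) && length(endpoint_override) == 1L) {
    return(list(athena = endpoint_override))
  }
  allowed <- c("athena", "s3", "glue")
  if (!is.list(endpoint_override) || is.null(names(endpoint_override)) ||
      !all(names(endpoint_override) %in% allowed)) {
    stop("endpoint_override must be a character or a list named with: ",
         paste(allowed, collapse = ", "))
  }
  endpoint_override
}

# A character value is treated as the Athena endpoint.
athena_only <- normalise_endpoint_override("https://athena.eu-west-1.amazonaws.com")

# A named list targets specific services.
glue_only <- normalise_endpoint_override(
  list(glue = "https://glue.eu-west-1.amazonaws.com")
)
```

Checking the names up front makes a misspelled service name fail with a clear message rather than being silently ignored.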
Passes parameters to boto3.session.Session and client.
boto3.session.Session
botocore_session (botocore.session.Session): Use this Botocore session instead of creating a new default one.
client
config (botocore.client.Config) -- Advanced client configuration options. If region_name is specified in the client config, its value will take precedence over environment variables and configuration values, but not over a region_name value passed explicitly to the method. See botocore config documentation for more details.
api_version (string) -- The API version to use. By default, botocore will use the latest API version when creating a client. You only need to specify this parameter if you want to use a previous API version of the client.
use_ssl (boolean) -- Whether or not to use SSL. By default, SSL is used. Note that not all services support non-ssl connections.
verify (boolean/string) -- Whether or not to verify SSL certificates. By default SSL certificates are verified. You can provide the following values:
False - do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.
path/to/cert/bundle.pem - A filename of the CA cert bundle to use. You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.
dbConnect() returns an S4 class. This object is used to communicate with AWS Athena.
if (FALSE) {
# Connect to Athena using your AWS access keys
library(DBI)
con <- dbConnect(RAthena::athena(),
                 aws_access_key_id = 'YOUR_ACCESS_KEY_ID',
                 aws_secret_access_key = 'YOUR_SECRET_ACCESS_KEY',
                 s3_staging_dir = 's3://path/to/query/bucket/',
                 region_name = 'us-west-2')
dbDisconnect(con)

# Connect to Athena using your profile name
# Profile name can be created by using AWS CLI
con <- dbConnect(RAthena::athena(),
                 profile_name = "YOUR_PROFILE_NAME",
                 s3_staging_dir = 's3://path/to/query/bucket/')
dbDisconnect(con)

# Connect to Athena using ARN role
con <- dbConnect(RAthena::athena(),
                 profile_name = "YOUR_PROFILE_NAME",
                 role_arn = "arn:aws:sts::123456789012:assumed-role/role_name/role_session_name",
                 s3_staging_dir = 's3://path/to/query/bucket/')
dbDisconnect(con)
}