Select Object Content
s3_select_object_content | R Documentation |
This operation is not supported by directory buckets
Description
This operation is not supported by directory buckets.
The SelectObjectContent operation is no longer available to new customers. Existing customers of Amazon S3 Select can continue to use the operation as usual.
This action filters the contents of an Amazon S3 object based on a simple structured query language (SQL) statement. In the request, along with the SQL expression, you must also specify a data serialization format (JSON, CSV, or Apache Parquet) of the object. Amazon S3 uses this format to parse object data into records, and returns only records that match the specified SQL expression. You must also specify the data serialization format for the response.
This functionality is not supported for Amazon S3 on Outposts.
For more information about Amazon S3 Select, see Selecting Content from Objects and SELECT Command in the Amazon S3 User Guide.
Permissions
You must have the s3:GetObject permission for this operation. Amazon S3 Select does not support anonymous access. For more information about permissions, see Specifying Permissions in a Policy in the Amazon S3 User Guide.
Object Data Formats
You can use Amazon S3 Select to query objects that have the following format properties:
- CSV, JSON, and Parquet - Objects must be in CSV, JSON, or Parquet format.
- UTF-8 - UTF-8 is the only encoding type Amazon S3 Select supports.
- GZIP or BZIP2 - CSV and JSON files can be compressed using GZIP or BZIP2. GZIP and BZIP2 are the only compression formats that Amazon S3 Select supports for CSV and JSON files. Amazon S3 Select supports columnar compression for Parquet using GZIP or Snappy. Amazon S3 Select does not support whole-object compression for Parquet objects.
- Server-side encryption - Amazon S3 Select supports querying objects that are protected with server-side encryption.
  For objects that are encrypted with customer-provided encryption keys (SSE-C), you must use HTTPS, and you must use the headers that are documented in get_object. For more information about SSE-C, see Server-Side Encryption (Using Customer-Provided Encryption Keys) in the Amazon S3 User Guide.
  For objects that are encrypted with Amazon S3 managed keys (SSE-S3) and Amazon Web Services KMS keys (SSE-KMS), server-side encryption is handled transparently, so you don't need to specify anything. For more information about server-side encryption, including SSE-S3 and SSE-KMS, see Protecting Data Using Server-Side Encryption in the Amazon S3 User Guide.
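The format rules above map onto the InputSerialization argument. The following is a minimal sketch of how that list might be assembled for a CSV or JSON object; the helper name `input_serialization` and its arguments are hypothetical and not part of paws. Only the list field names (CompressionType, CSV, FileHeaderInfo, JSON, Type) come from the request syntax on this page.

```r
# Hypothetical helper (not part of paws): build an InputSerialization list
# for a CSV or JSON object, allowing only the compression types listed above.
input_serialization <- function(format = c("CSV", "JSON"),
                                compression = c("NONE", "GZIP", "BZIP2"),
                                header = "USE") {
  format <- match.arg(format)            # errors on anything but CSV or JSON
  compression <- match.arg(compression)  # errors on unsupported compression
  out <- list(CompressionType = compression)
  if (format == "CSV") {
    out$CSV <- list(FileHeaderInfo = header)
  } else {
    out$JSON <- list(Type = "LINES")
  }
  out
}

# A GZIP-compressed CSV object whose first line is a header row.
gz_csv <- input_serialization("CSV", "GZIP")
str(gz_csv)
```

Because `match.arg` rejects any value outside the listed choices, an unsupported compression type fails locally before a request is sent.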
Working with the Response Body
Because the response size is unknown, Amazon S3 Select streams the response as a series of messages and includes a Transfer-Encoding header with chunked as its value in the response. For more information, see Appendix: SelectObjectContent Response.
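As a rough illustration of handling the returned bytes, the sketch below turns CSV record bytes into a data frame. It assumes the record bytes are available as one or more raw vectors (the Value section shows Records$Payload as raw); how paws surfaces the individual streamed messages is not described here, so the list-of-chunks input and the helper name `decode_csv_records` are assumptions made for illustration.

```r
# Hypothetical helper: `chunks` is assumed to be a list of raw vectors,
# each holding the bytes of one Records payload, in arrival order.
decode_csv_records <- function(chunks) {
  bytes <- do.call(c, chunks)  # concatenate before parsing, in case a
                               # record spans more than one chunk
  text <- rawToChar(bytes)
  utils::read.csv(text = text, header = FALSE, stringsAsFactors = FALSE)
}

# Stand-in data; a real call would supply the bytes returned by the service.
chunks <- list(charToRaw("alice,30\nbo"), charToRaw("b,25\n"))
df <- decode_csv_records(chunks)
print(df)
```

Concatenating first and parsing once keeps the parser from seeing a partial line at a chunk boundary.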
GetObject Support
The select_object_content action does not support the following get_object functionality. For more information, see get_object.
- Range: Although you can specify a scan range for an Amazon S3 Select request (see SelectObjectContentRequest - ScanRange in the request parameters), you cannot specify the range of bytes of an object to return.
- The GLACIER, DEEP_ARCHIVE, and REDUCED_REDUNDANCY storage classes, or the ARCHIVE_ACCESS and DEEP_ARCHIVE_ACCESS access tiers of the INTELLIGENT_TIERING storage class: You cannot query objects in the GLACIER, DEEP_ARCHIVE, or REDUCED_REDUNDANCY storage classes, nor objects in the ARCHIVE_ACCESS or DEEP_ARCHIVE_ACCESS access tiers of the INTELLIGENT_TIERING storage class. For more information about storage classes, see Using Amazon S3 storage classes in the Amazon S3 User Guide.
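The storage-class restriction above can be expressed as a small local check. This is only a sketch: `can_select` is a hypothetical helper, and the caller is assumed to already know the object's storage class and, for INTELLIGENT_TIERING, its access tier.

```r
# Hypothetical helper: returns FALSE for the storage classes and access
# tiers that the list above says cannot be queried.
can_select <- function(storage_class, access_tier = NULL) {
  blocked_classes <- c("GLACIER", "DEEP_ARCHIVE", "REDUCED_REDUNDANCY")
  blocked_tiers <- c("ARCHIVE_ACCESS", "DEEP_ARCHIVE_ACCESS")
  if (storage_class %in% blocked_classes) {
    return(FALSE)
  }
  if (identical(storage_class, "INTELLIGENT_TIERING") &&
      !is.null(access_tier) && access_tier %in% blocked_tiers) {
    return(FALSE)
  }
  TRUE
}

can_select("GLACIER")
can_select("INTELLIGENT_TIERING", "ARCHIVE_ACCESS")
can_select("STANDARD")
```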
Special Errors
For a list of special errors for this operation, see List of SELECT Object Content Error Codes.
The following operations are related to select_object_content:
- get_object
- get_bucket_lifecycle_configuration
- put_bucket_lifecycle_configuration
Usage
s3_select_object_content(Bucket, Key, SSECustomerAlgorithm,
SSECustomerKey, SSECustomerKeyMD5, Expression, ExpressionType,
RequestProgress, InputSerialization, OutputSerialization, ScanRange,
ExpectedBucketOwner)
Arguments
Bucket
[required] The S3 bucket.
Key
[required] The object key.
SSECustomerAlgorithm
The server-side encryption (SSE) algorithm used to encrypt the object. This parameter is needed only when the object was encrypted with a customer-provided encryption key (SSE-C). For more information, see Protecting data using SSE-C keys in the Amazon S3 User Guide.
SSECustomerKey
The server-side encryption (SSE) customer-provided key. This parameter is needed only when the object was encrypted with a customer-provided encryption key (SSE-C). For more information, see Protecting data using SSE-C keys in the Amazon S3 User Guide.
SSECustomerKeyMD5
The MD5 digest of the server-side encryption (SSE) customer-provided key. This parameter is needed only when the object was encrypted with a customer-provided encryption key (SSE-C). For more information, see Protecting data using SSE-C keys in the Amazon S3 User Guide.
Expression
[required] The expression that is used to query the object.
ExpressionType
[required] The type of the provided expression (for example, SQL).
RequestProgress
Specifies if periodic request progress information should be enabled.
InputSerialization
[required] Describes the format of the data in the object that is being queried.
OutputSerialization
[required] Describes the format of the data that you want Amazon S3 to return in response.
ScanRange
Specifies the byte range of the object to get the records from. A record is processed when its first byte is contained by the range. This parameter is optional, but when specified, it must not be empty. See RFC 2616, Section 14.35.1 about how to specify the start and end of the range.
ScanRange may be used in the following ways:
- <scanrange><start>50</start><end>100</end></scanrange> - process only the records starting between the bytes 50 and 100 (inclusive, counting from zero)
- <scanrange><start>50</start></scanrange> - process only the records starting after the byte 50
- <scanrange><end>50</end></scanrange> - process only the records within the last 50 bytes of the file.
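Since a record is processed when its first byte falls inside the range, consecutive non-overlapping ranges should each pick up a disjoint set of records. The sketch below splits an object into such ranges; the helper name `scan_ranges` and the sizes used are illustrative assumptions, and the caller is assumed to know the object size in bytes.

```r
# Hypothetical helper: split an object of `object_size` bytes into
# consecutive ScanRange lists covering at most `chunk_size` bytes each.
# Start and End are inclusive byte offsets counted from zero.
scan_ranges <- function(object_size, chunk_size) {
  starts <- seq(0, object_size - 1, by = chunk_size)
  lapply(starts, function(start) {
    list(Start = start, End = min(start + chunk_size - 1, object_size - 1))
  })
}

# Illustrative sizes only: a 250-byte object scanned in 100-byte pieces.
ranges <- scan_ranges(250, 100)
str(ranges)
```

Each element can be passed as the ScanRange argument of a separate request, for example to query a large object in parallel.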
ExpectedBucketOwner
The account ID of the expected bucket owner. If the account ID that you provide does not match the actual owner of the bucket, the request fails with the HTTP status code 403 Forbidden (access denied).
Value
A list with the following syntax:
list(
  Payload = list(
    Records = list(
      Payload = raw
    ),
    Stats = list(
      Details = list(
        BytesScanned = 123,
        BytesProcessed = 123,
        BytesReturned = 123
      )
    ),
    Progress = list(
      Details = list(
        BytesScanned = 123,
        BytesProcessed = 123,
        BytesReturned = 123
      )
    ),
    Cont = list(),
    End = list()
  )
)
Request syntax
svc$select_object_content(
  Bucket = "string",
  Key = "string",
  SSECustomerAlgorithm = "string",
  SSECustomerKey = raw,
  SSECustomerKeyMD5 = "string",
  Expression = "string",
  ExpressionType = "SQL",
  RequestProgress = list(
    Enabled = TRUE|FALSE
  ),
  InputSerialization = list(
    CSV = list(
      FileHeaderInfo = "USE"|"IGNORE"|"NONE",
      Comments = "string",
      QuoteEscapeCharacter = "string",
      RecordDelimiter = "string",
      FieldDelimiter = "string",
      QuoteCharacter = "string",
      AllowQuotedRecordDelimiter = TRUE|FALSE
    ),
    CompressionType = "NONE"|"GZIP"|"BZIP2",
    JSON = list(
      Type = "DOCUMENT"|"LINES"
    ),
    Parquet = list()
  ),
  OutputSerialization = list(
    CSV = list(
      QuoteFields = "ALWAYS"|"ASNEEDED",
      QuoteEscapeCharacter = "string",
      RecordDelimiter = "string",
      FieldDelimiter = "string",
      QuoteCharacter = "string"
    ),
    JSON = list(
      RecordDelimiter = "string"
    )
  ),
  ScanRange = list(
    Start = 123,
    End = 123
  ),
  ExpectedBucketOwner = "string"
)
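To make the request syntax concrete, here is a minimal sketch that fills in only the required arguments for a CSV query. The bucket name, key, column name, and the helper names `build_select_request` and `run_select` are placeholders chosen for illustration. The call itself is left commented out because it needs AWS credentials and an existing object.

```r
# Hypothetical helper: assemble the required arguments for a CSV query.
build_select_request <- function(bucket, key, expression) {
  list(
    Bucket = bucket,
    Key = key,
    Expression = expression,
    ExpressionType = "SQL",
    InputSerialization = list(
      CSV = list(FileHeaderInfo = "USE"),  # first line holds column names
      CompressionType = "NONE"
    ),
    OutputSerialization = list(
      CSV = list(RecordDelimiter = "\n")
    )
  )
}

# Hypothetical helper: pass the argument list to an S3 client object.
run_select <- function(svc, request) {
  do.call(svc$select_object_content, request)
}

# Placeholder bucket, key, and column name.
request <- build_select_request(
  "amzn-s3-demo-bucket",
  "data/people.csv",
  "SELECT s.name FROM S3Object s WHERE s.city = 'Seattle'"
)

# With credentials configured, the request could be sent like this:
# svc <- paws::s3()
# result <- run_select(svc, request)
```

Keeping the argument list separate from the call makes it easy to inspect or reuse the request, for example with a different ScanRange per call.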