Batch Put Document
kendra_batch_put_document | R Documentation
Adds one or more documents to an index
Description
Adds one or more documents to an index.
The batch_put_document API enables you to ingest inline documents or a
set of documents stored in an Amazon S3 bucket. Use this API to ingest
your text and unstructured text into an index, add custom attributes to
the documents, and attach an access control list to the documents
added to the index.
The documents are indexed asynchronously. You can see the progress of
the batch using Amazon Web Services CloudWatch. Any error messages
related to processing the batch are sent to your Amazon Web Services
CloudWatch log. You can also use the batch_get_document_status API to
monitor the progress of indexing your documents.
For an example of ingesting inline documents using Python and Java SDKs, see Adding files directly to an index.
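Because indexing is asynchronous, a caller usually polls for status after submitting a batch. The following is a minimal sketch of that step, not a definitive implementation. It assumes that batch_get_document_status accepts IndexId and a DocumentInfoList of DocumentId entries, and that the response carries a DocumentStatusList whose items have a DocumentStatus field; those names come from the Amazon Kendra API rather than from this page, and the helper names are hypothetical.

```r
# Build the DocumentInfoList argument from a character vector of document IDs.
make_document_info_list <- function(doc_ids) {
  lapply(unname(doc_ids), function(id) list(DocumentId = id))
}

# Count documents per status in a batch_get_document_status-style response.
# Assumption: each entry of DocumentStatusList has a DocumentStatus string.
count_statuses <- function(resp) {
  statuses <- vapply(
    resp$DocumentStatusList,
    function(d) d$DocumentStatus,
    character(1)
  )
  table(statuses)
}

# Hypothetical wrapper. It needs the paws package and AWS credentials,
# so it is defined here but not called.
check_batch_status <- function(svc, index_id, doc_ids) {
  resp <- svc$batch_get_document_status(
    IndexId = index_id,
    DocumentInfoList = make_document_info_list(doc_ids)
  )
  count_statuses(resp)
}
```

With a client created by `paws::kendra()`, `check_batch_status(svc, index_id, c("doc-1", "doc-2"))` would return a count of documents per status.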
Usage
Arguments
IndexId
[required] The identifier of the index to add the documents to. You need to create the index first using the create_index API.
RoleArn
The Amazon Resource Name (ARN) of an IAM role with permission to access your S3 bucket. For more information, see IAM access roles for Amazon Kendra.
Documents
[required] One or more documents to add to the index.
Documents have the following file size limits.
50 MB total size for any file
5 MB extracted text for any file
For more information, see Quotas.
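The total size limit can be checked locally before a file is sent as an inline Blob. The sketch below shows one way to do that. The helper name make_blob_document is hypothetical, and treating 1 MB as 1024^2 bytes is an assumption; the Quotas page is the authority on the exact limit. The 5 MB extracted-text limit cannot be checked this way, because extraction happens on the service side.

```r
# Assumption: 1 MB is taken as 1024^2 bytes for the 50 MB total size limit.
MAX_FILE_BYTES <- 50 * 1024^2

# Read a local file into a Document entry that carries an inline Blob.
make_blob_document <- function(path, id = basename(path), title = id,
                               content_type = "PLAIN_TEXT") {
  size <- file.size(path)
  if (is.na(size)) {
    stop("File not found: ", path)
  }
  if (size > MAX_FILE_BYTES) {
    stop("File exceeds the 50 MB limit: ", path)
  }
  list(
    Id = id,
    Title = title,
    Blob = readBin(path, what = "raw", n = size),
    ContentType = content_type
  )
}
```

The returned list has the shape of one entry of the Documents argument, so several such entries can be collected with `list(...)` and passed as Documents.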
CustomDocumentEnrichmentConfiguration
Configuration information for altering your document metadata and content during the document ingestion process when you use the batch_put_document API.
For more information on how to create, modify and delete document metadata, or make other content alterations when you ingest documents into Amazon Kendra, see Customizing document metadata during the ingestion process.
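As an illustration of the inline part of this configuration, the sketch below builds one rule that sets a target attribute when a condition on another attribute holds. The field names follow the request syntax on this page; the helper make_inline_rule and the attribute keys "Status" and "Visibility" are hypothetical, and only string-valued conditions and targets are covered.

```r
# Operators listed in the request syntax for Condition$Operator.
OPERATORS <- c(
  "GreaterThan", "GreaterThanOrEquals", "LessThan", "LessThanOrEquals",
  "Equals", "NotEquals", "Contains", "NotContains", "Exists", "NotExists",
  "BeginsWith"
)

# Build one InlineConfigurations entry using string values only.
make_inline_rule <- function(condition_key, operator, condition_value,
                             target_key, target_value) {
  if (!operator %in% OPERATORS) {
    stop("Unknown operator: ", operator)
  }
  list(
    Condition = list(
      ConditionDocumentAttributeKey = condition_key,
      Operator = operator,
      ConditionOnValue = list(StringValue = condition_value)
    ),
    Target = list(
      TargetDocumentAttributeKey = target_key,
      TargetDocumentAttributeValueDeletion = FALSE,
      TargetDocumentAttributeValue = list(StringValue = target_value)
    ),
    DocumentContentDeletion = FALSE
  )
}

# Hypothetical rule: when Status equals "draft", set Visibility to "internal".
enrichment <- list(
  InlineConfigurations = list(
    make_inline_rule("Status", "Equals", "draft", "Visibility", "internal")
  )
)
```

The resulting `enrichment` list could then be passed as the CustomDocumentEnrichmentConfiguration argument.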
Value
A list with the following syntax:
list(
  FailedDocuments = list(
    list(
      Id = "string",
      ErrorCode = "InternalError"|"InvalidRequest",
      ErrorMessage = "string"
    )
  )
)
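The returned list can be flattened into a data frame for logging or retries. The following sketch assumes only the response shape shown above; the helper name failed_documents_table is hypothetical. Since documents are indexed asynchronously, an empty FailedDocuments list probably should not be read as proof that indexing succeeded.

```r
# Turn the FailedDocuments part of a response into a data frame.
failed_documents_table <- function(resp) {
  failed <- resp$FailedDocuments
  if (length(failed) == 0) {
    return(data.frame(
      Id = character(0),
      ErrorCode = character(0),
      ErrorMessage = character(0),
      stringsAsFactors = FALSE
    ))
  }
  # Extract one field from every entry, using NA when the field is missing.
  pick <- function(field) {
    vapply(
      failed,
      function(d) if (is.null(d[[field]])) NA_character_ else d[[field]],
      character(1)
    )
  }
  data.frame(
    Id = pick("Id"),
    ErrorCode = pick("ErrorCode"),
    ErrorMessage = pick("ErrorMessage"),
    stringsAsFactors = FALSE
  )
}
```

Calling `failed_documents_table(resp)` on the value returned by batch_put_document gives one row per rejected document.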
Request syntax
svc$batch_put_document(
  IndexId = "string",
  RoleArn = "string",
  Documents = list(
    list(
      Id = "string",
      Title = "string",
      Blob = raw,
      S3Path = list(
        Bucket = "string",
        Key = "string"
      ),
      Attributes = list(
        list(
          Key = "string",
          Value = list(
            StringValue = "string",
            StringListValue = list(
              "string"
            ),
            LongValue = 123,
            DateValue = as.POSIXct(
              "2015-01-01"
            )
          )
        )
      ),
      AccessControlList = list(
        list(
          Name = "string",
          Type = "USER"|"GROUP",
          Access = "ALLOW"|"DENY",
          DataSourceId = "string"
        )
      ),
      HierarchicalAccessControlList = list(
        list(
          PrincipalList = list(
            list(
              Name = "string",
              Type = "USER"|"GROUP",
              Access = "ALLOW"|"DENY",
              DataSourceId = "string"
            )
          )
        )
      ),
      ContentType = "PDF"|"HTML"|"MS_WORD"|"PLAIN_TEXT"|"PPT"|"RTF"|"XML"|"XSLT"|"MS_EXCEL"|"CSV"|"JSON"|"MD",
      AccessControlConfigurationId = "string"
    )
  ),
  CustomDocumentEnrichmentConfiguration = list(
    InlineConfigurations = list(
      list(
        Condition = list(
          ConditionDocumentAttributeKey = "string",
          Operator = "GreaterThan"|"GreaterThanOrEquals"|"LessThan"|"LessThanOrEquals"|"Equals"|"NotEquals"|"Contains"|"NotContains"|"Exists"|"NotExists"|"BeginsWith",
          ConditionOnValue = list(
            StringValue = "string",
            StringListValue = list(
              "string"
            ),
            LongValue = 123,
            DateValue = as.POSIXct(
              "2015-01-01"
            )
          )
        ),
        Target = list(
          TargetDocumentAttributeKey = "string",
          TargetDocumentAttributeValueDeletion = TRUE|FALSE,
          TargetDocumentAttributeValue = list(
            StringValue = "string",
            StringListValue = list(
              "string"
            ),
            LongValue = 123,
            DateValue = as.POSIXct(
              "2015-01-01"
            )
          )
        ),
        DocumentContentDeletion = TRUE|FALSE
      )
    ),
    PreExtractionHookConfiguration = list(
      InvocationCondition = list(
        ConditionDocumentAttributeKey = "string",
        Operator = "GreaterThan"|"GreaterThanOrEquals"|"LessThan"|"LessThanOrEquals"|"Equals"|"NotEquals"|"Contains"|"NotContains"|"Exists"|"NotExists"|"BeginsWith",
        ConditionOnValue = list(
          StringValue = "string",
          StringListValue = list(
            "string"
          ),
          LongValue = 123,
          DateValue = as.POSIXct(
            "2015-01-01"
          )
        )
      ),
      LambdaArn = "string",
      S3Bucket = "string"
    ),
    PostExtractionHookConfiguration = list(
      InvocationCondition = list(
        ConditionDocumentAttributeKey = "string",
        Operator = "GreaterThan"|"GreaterThanOrEquals"|"LessThan"|"LessThanOrEquals"|"Equals"|"NotEquals"|"Contains"|"NotContains"|"Exists"|"NotExists"|"BeginsWith",
        ConditionOnValue = list(
          StringValue = "string",
          StringListValue = list(
            "string"
          ),
          LongValue = 123,
          DateValue = as.POSIXct(
            "2015-01-01"
          )
        )
      ),
      LambdaArn = "string",
      S3Bucket = "string"
    ),
    RoleArn = "string"
  )
)
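Most requests use only a small part of the syntax above. The sketch below assembles a request with one inline document and one S3 document and leaves the actual call commented out, since it needs the paws package and AWS credentials. The helper build_request, the index ID, the bucket, the key, the role ARN, the user name, and the "Department" attribute are all placeholders assumed for illustration.

```r
# Assemble the top-level arguments, omitting RoleArn when it is not supplied.
build_request <- function(index_id, documents, role_arn = NULL) {
  req <- list(IndexId = index_id, Documents = documents)
  if (!is.null(role_arn)) {
    req$RoleArn <- role_arn
  }
  req
}

# Inline document: the content travels in the request as raw bytes.
inline_doc <- list(
  Id = "doc-1",
  Title = "Example document",
  Blob = charToRaw("Example text to index."),
  ContentType = "PLAIN_TEXT",
  Attributes = list(
    # "Department" is a hypothetical custom attribute.
    list(Key = "Department", Value = list(StringValue = "Engineering"))
  ),
  AccessControlList = list(
    list(Name = "user@example.com", Type = "USER", Access = "ALLOW")
  )
)

# S3 document: the service reads the file from the bucket using RoleArn.
s3_doc <- list(
  Id = "doc-2",
  S3Path = list(Bucket = "amzn-s3-demo-bucket", Key = "docs/report.pdf"),
  ContentType = "PDF"
)

request <- build_request(
  index_id = "index-id-placeholder",
  documents = list(inline_doc, s3_doc),
  role_arn = "arn:aws:iam::123456789012:role/KendraS3Access"
)

# Not run: requires the paws package and valid AWS credentials.
# svc <- paws::kendra()
# resp <- do.call(svc$batch_put_document, request)
```

Building the argument list first and calling through `do.call` keeps the request easy to inspect before anything is sent.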