
Batch Put Document

kendra_batch_put_document R Documentation

Adds one or more documents to an index

Description

Adds one or more documents to an index.

The batch_put_document API lets you ingest inline documents or a set of documents stored in an Amazon S3 bucket. Use this API to ingest text and unstructured documents into an index, add custom attributes to the documents, and attach an access control list to the documents added to the index.
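As a sketch of the two ingestion paths, the R code below builds a Documents list holding one inline document (text passed as raw bytes in Blob) and one document referenced by S3Path, with a custom attribute and an access control list on the inline one. The document IDs, bucket, key, group name, index ID and role ARN are hypothetical placeholders, and `_category` is assumed to be one of Kendra's reserved attribute names. The actual call needs the paws package and AWS credentials, so it is shown commented out.

```r
# Inline document: the content travels in the request as raw bytes.
inline_doc <- list(
  Id = "doc-inline-001",                      # hypothetical document ID
  Title = "Release notes",
  Blob = charToRaw("Version 2.0 adds batch ingestion."),
  ContentType = "PLAIN_TEXT",
  Attributes = list(
    # Assumes `_category` is a reserved string attribute in the index.
    list(Key = "_category", Value = list(StringValue = "release-notes"))
  ),
  AccessControlList = list(
    list(Name = "engineering", Type = "GROUP", Access = "ALLOW")  # hypothetical group
  )
)

# S3 document: Kendra reads the content itself, using the role in RoleArn.
s3_doc <- list(
  Id = "doc-s3-001",                          # hypothetical document ID
  S3Path = list(Bucket = "my-example-bucket", Key = "manuals/guide.pdf"),
  ContentType = "PDF"
)

documents <- list(inline_doc, s3_doc)

# Requires paws and AWS credentials, so it is not run here:
# svc <- paws::kendra()
# resp <- svc$batch_put_document(
#   IndexId = "your-index-id",
#   RoleArn = "arn:aws:iam::123456789012:role/KendraS3Access",
#   Documents = documents
# )
```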

The documents are indexed asynchronously. You can see the progress of the batch using Amazon Web Services CloudWatch. Any error messages related to processing the batch are sent to your Amazon Web Services CloudWatch log. You can also use the batch_get_document_status API to monitor the progress of indexing your documents.
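Because indexing is asynchronous, a caller typically polls batch_get_document_status until no document is still being processed. The sketch below assumes the response contains a DocumentStatusList whose entries carry DocumentStatus values such as "PROCESSING" and "INDEXED"; the helper names wait_for_indexing and all_settled are hypothetical. The status lookup is passed in as a function so the loop can be exercised without AWS access.

```r
# TRUE when no document in a batch_get_document_status-shaped response
# is still in the PROCESSING state.
all_settled <- function(resp) {
  statuses <- vapply(resp$DocumentStatusList,
                     function(d) d$DocumentStatus, character(1))
  !any(statuses == "PROCESSING")
}

# Poll until every document settles, or give up after max_tries attempts.
# `get_status` takes a DocumentInfoList and returns a response list. With paws
# it could wrap the real call (not run here):
# get_status <- function(info) {
#   svc$batch_get_document_status(IndexId = "your-index-id",
#                                 DocumentInfoList = info)
# }
wait_for_indexing <- function(doc_ids, get_status, max_tries = 10, delay = 5) {
  info <- lapply(doc_ids, function(id) list(DocumentId = id))
  for (i in seq_len(max_tries)) {
    resp <- get_status(info)
    if (all_settled(resp)) return(resp)
    Sys.sleep(delay)  # wait before asking again
  }
  stop("documents still processing after ", max_tries, " attempts")
}
```

A settled document is not necessarily indexed successfully, so inspect each DocumentStatus (for example "FAILED") in the returned response.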

For an example of ingesting inline documents using the Python and Java SDKs, see Adding files directly to an index.

Usage

kendra_batch_put_document(IndexId, RoleArn, Documents,
  CustomDocumentEnrichmentConfiguration)

Arguments

IndexId

[required] The identifier of the index to add the documents to. You need to create the index first using the create_index API.

RoleArn

The Amazon Resource Name (ARN) of an IAM role with permission to access your S3 bucket. For more information, see IAM access roles for Amazon Kendra.

Documents

[required] One or more documents to add to the index.

Documents have the following file size limits.

  • 50 MB total size for any file

  • 5 MB extracted text for any file

For more information, see Quotas.
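The 50 MB file limit can be checked locally for inline documents before sending a request. The hypothetical helper below flags documents whose Blob exceeds 50 MB, interpreting 50 MB as 50 × 1024² bytes (an assumption; check Quotas for the exact figure). It cannot check S3 documents without downloading them, and it does not check the 5 MB extracted-text limit, which depends on how Kendra extracts text from each format.

```r
# Assumed byte interpretation of the 50 MB per-file limit.
max_file_bytes <- 50 * 1024^2

# Return the Ids of inline documents whose Blob exceeds the limit.
# Documents without a Blob (for example S3Path documents) count as size 0.
oversized_documents <- function(documents) {
  ids <- vapply(documents, function(d) d$Id, character(1))
  sizes <- vapply(documents,
                  function(d) if (is.null(d$Blob)) 0 else length(d$Blob),
                  numeric(1))
  ids[sizes > max_file_bytes]
}
```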

CustomDocumentEnrichmentConfiguration

Configuration information for altering your document metadata and content during the document ingestion process when you use the batch_put_document API.

For more information on how to create, modify and delete document metadata, or make other content alterations when you ingest documents into Amazon Kendra, see Customizing document metadata during the ingestion process.
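As an illustration of an inline configuration, the sketch below follows the request syntax: when a document's Department attribute equals "HR", the same attribute is rewritten to "Human Resources" during ingestion. The Department attribute is hypothetical and would need to exist as a custom field in your index. The list would be passed as the CustomDocumentEnrichmentConfiguration argument.

```r
enrichment <- list(
  InlineConfigurations = list(
    list(
      # Apply this rule only when Department == "HR".
      Condition = list(
        ConditionDocumentAttributeKey = "Department",   # hypothetical custom field
        Operator = "Equals",
        ConditionOnValue = list(StringValue = "HR")
      ),
      # Overwrite the attribute value instead of deleting it.
      Target = list(
        TargetDocumentAttributeKey = "Department",
        TargetDocumentAttributeValueDeletion = FALSE,
        TargetDocumentAttributeValue = list(StringValue = "Human Resources")
      ),
      # Keep the document body; only the metadata changes.
      DocumentContentDeletion = FALSE
    )
  )
)

# Not run here (needs paws and AWS credentials):
# svc$batch_put_document(IndexId = "your-index-id", Documents = documents,
#                        CustomDocumentEnrichmentConfiguration = enrichment)
```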

Value

A list with the following syntax:

list(
  FailedDocuments = list(
    list(
      Id = "string",
      ErrorCode = "InternalError"|"InvalidRequest",
      ErrorMessage = "string"
    )
  )
)
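Only documents rejected at submission time appear in FailedDocuments; accepted documents are still indexed asynchronously. A hypothetical helper like the one below flattens the returned list into a data frame, which makes it easy to, for example, resubmit only documents that failed with InternalError.

```r
# Flatten resp$FailedDocuments into a data frame with one row per failure.
failed_summary <- function(resp) {
  failed <- resp$FailedDocuments
  if (length(failed) == 0) {
    return(data.frame(Id = character(0), ErrorCode = character(0),
                      ErrorMessage = character(0), stringsAsFactors = FALSE))
  }
  data.frame(
    Id = vapply(failed, function(f) f$Id, character(1)),
    ErrorCode = vapply(failed, function(f) f$ErrorCode, character(1)),
    ErrorMessage = vapply(failed, function(f) f$ErrorMessage, character(1)),
    stringsAsFactors = FALSE
  )
}
```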

Request syntax

svc$batch_put_document(
  IndexId = "string",
  RoleArn = "string",
  Documents = list(
    list(
      Id = "string",
      Title = "string",
      Blob = raw,
      S3Path = list(
        Bucket = "string",
        Key = "string"
      ),
      Attributes = list(
        list(
          Key = "string",
          Value = list(
            StringValue = "string",
            StringListValue = list(
              "string"
            ),
            LongValue = 123,
            DateValue = as.POSIXct(
              "2015-01-01"
            )
          )
        )
      ),
      AccessControlList = list(
        list(
          Name = "string",
          Type = "USER"|"GROUP",
          Access = "ALLOW"|"DENY",
          DataSourceId = "string"
        )
      ),
      HierarchicalAccessControlList = list(
        list(
          PrincipalList = list(
            list(
              Name = "string",
              Type = "USER"|"GROUP",
              Access = "ALLOW"|"DENY",
              DataSourceId = "string"
            )
          )
        )
      ),
      ContentType = "PDF"|"HTML"|"MS_WORD"|"PLAIN_TEXT"|"PPT"|"RTF"|"XML"|"XSLT"|"MS_EXCEL"|"CSV"|"JSON"|"MD",
      AccessControlConfigurationId = "string"
    )
  ),
  CustomDocumentEnrichmentConfiguration = list(
    InlineConfigurations = list(
      list(
        Condition = list(
          ConditionDocumentAttributeKey = "string",
          Operator = "GreaterThan"|"GreaterThanOrEquals"|"LessThan"|"LessThanOrEquals"|"Equals"|"NotEquals"|"Contains"|"NotContains"|"Exists"|"NotExists"|"BeginsWith",
          ConditionOnValue = list(
            StringValue = "string",
            StringListValue = list(
              "string"
            ),
            LongValue = 123,
            DateValue = as.POSIXct(
              "2015-01-01"
            )
          )
        ),
        Target = list(
          TargetDocumentAttributeKey = "string",
          TargetDocumentAttributeValueDeletion = TRUE|FALSE,
          TargetDocumentAttributeValue = list(
            StringValue = "string",
            StringListValue = list(
              "string"
            ),
            LongValue = 123,
            DateValue = as.POSIXct(
              "2015-01-01"
            )
          )
        ),
        DocumentContentDeletion = TRUE|FALSE
      )
    ),
    PreExtractionHookConfiguration = list(
      InvocationCondition = list(
        ConditionDocumentAttributeKey = "string",
        Operator = "GreaterThan"|"GreaterThanOrEquals"|"LessThan"|"LessThanOrEquals"|"Equals"|"NotEquals"|"Contains"|"NotContains"|"Exists"|"NotExists"|"BeginsWith",
        ConditionOnValue = list(
          StringValue = "string",
          StringListValue = list(
            "string"
          ),
          LongValue = 123,
          DateValue = as.POSIXct(
            "2015-01-01"
          )
        )
      ),
      LambdaArn = "string",
      S3Bucket = "string"
    ),
    PostExtractionHookConfiguration = list(
      InvocationCondition = list(
        ConditionDocumentAttributeKey = "string",
        Operator = "GreaterThan"|"GreaterThanOrEquals"|"LessThan"|"LessThanOrEquals"|"Equals"|"NotEquals"|"Contains"|"NotContains"|"Exists"|"NotExists"|"BeginsWith",
        ConditionOnValue = list(
          StringValue = "string",
          StringListValue = list(
            "string"
          ),
          LongValue = 123,
          DateValue = as.POSIXct(
            "2015-01-01"
          )
        )
      ),
      LambdaArn = "string",
      S3Bucket = "string"
    ),
    RoleArn = "string"
  )
)
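A single request accepts only a limited number of documents (assumed here to be 10; confirm the current maximum in the Kendra API reference). When ingesting a larger collection, a hypothetical helper can split the Documents list into batches and submit each one separately.

```r
# Split a list of documents into consecutive batches of at most `size` items.
chunk_documents <- function(documents, size = 10) {
  # Batch number for each position: 1..size -> 1, size+1..2*size -> 2, ...
  unname(split(documents, ceiling(seq_along(documents) / size)))
}

# Not run here (needs paws and AWS credentials):
# for (batch in chunk_documents(documents)) {
#   resp <- svc$batch_put_document(IndexId = "your-index-id", Documents = batch)
# }
```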