Classify Document
comprehend_classify_document | R Documentation |
Creates a classification request to analyze a single document in real-time
Description
Creates a classification request to analyze a single document in real-time. classify_document supports the following model types:
- Custom classifier - a custom model that you have created and trained. For input, you can provide plain text, a single-page document (PDF, Word, or image), or Amazon Textract API output. For more information, see Custom classification in the Amazon Comprehend Developer Guide.
- Prompt safety classifier - Amazon Comprehend provides a pre-trained model for classifying input prompts for generative AI applications. For input, you provide plain text in English. For prompt safety classification, the response includes only the Classes field. For more information about prompt safety classifiers, see Prompt safety classification in the Amazon Comprehend Developer Guide.
If the system detects errors while processing a page in the input document, the API response includes an Errors field that describes the errors.
If the system detects a document-level error in your input document, the API returns an InvalidRequestException error response. For details about this exception, see Errors in semi-structured documents in the Comprehend Developer Guide.
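The two error paths described above can be handled separately in client code. The following is a minimal sketch, not part of the package: page_errors and classify_safely are hypothetical helper names, the response is assumed to follow the list layout shown in the Value section, and the mock response is invented for illustration only.

```r
# Hypothetical helper: collect page-level errors from a response into a
# data frame. Returns zero rows when the Errors field is absent or empty.
page_errors <- function(resp) {
  errs <- resp$Errors
  if (is.null(errs) || length(errs) == 0) {
    return(data.frame(
      Page = integer(0),
      ErrorCode = character(0),
      ErrorMessage = character(0),
      stringsAsFactors = FALSE
    ))
  }
  data.frame(
    Page = vapply(errs, function(e) as.integer(e$Page), integer(1)),
    ErrorCode = vapply(errs, function(e) e$ErrorCode, character(1)),
    ErrorMessage = vapply(errs, function(e) e$ErrorMessage, character(1)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: run a call and turn any R error (for example one
# raised for a document-level failure) into a plain list instead of aborting.
# `call_fn` is a zero-argument function that performs the request.
classify_safely <- function(call_fn) {
  tryCatch(
    list(ok = TRUE, response = call_fn()),
    error = function(e) list(ok = FALSE, message = conditionMessage(e))
  )
}

# Mock response for illustration only; not real service output.
mock_resp <- list(
  Classes = list(),
  Errors = list(
    list(Page = 2, ErrorCode = "PAGE_SIZE_EXCEEDED", ErrorMessage = "Page too large")
  )
)
errs <- page_errors(mock_resp)
```

How the SDK surfaces InvalidRequestException as an R condition is not stated here, so the sketch catches every error generically rather than matching on a condition class.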
Usage
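As a rough illustration of how the operation is typically invoked, the sketch below wraps the call in a small function. It assumes svc is a Comprehend client object such as the one returned by paws::comprehend(), and that the operation is exposed on it as classify_document with the argument names listed under Arguments. classify_text is a hypothetical name.

```r
# Sketch only: classify plain text through a Comprehend client object.
# `svc` is assumed to expose classify_document(); the endpoint ARN and the
# text are supplied by the caller.
classify_text <- function(svc, text, endpoint_arn) {
  svc$classify_document(
    Text = text,
    EndpointArn = endpoint_arn
  )
}

# Typical use (needs AWS credentials and a real endpoint, so it is not run here):
# svc <- paws::comprehend()
# resp <- classify_text(svc, "Text to classify", "<your endpoint ARN>")
```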
Arguments
Text
The document text to be analyzed. If you enter text using this parameter, do not use the Bytes parameter.
EndpointArn
[required] The Amazon Resource Name (ARN) of the endpoint.
For prompt safety classification, Amazon Comprehend provides the endpoint ARN. For more information about prompt safety classifiers, see Prompt safety classification in the Amazon Comprehend Developer Guide.
For custom classification, you create an endpoint for your custom model. For more information, see Using Amazon Comprehend endpoints.
Bytes
Use the Bytes parameter to input a text, PDF, Word or image file.
When you classify a document using a custom model, you can also use the Bytes parameter to input an Amazon Textract DetectDocumentText or AnalyzeDocument output file.
To classify a document using the prompt safety classifier, use the Text parameter for input.
Provide the input document as a sequence of base64-encoded bytes. If your code uses an Amazon Web Services SDK to classify documents, the SDK may encode the document file bytes for you.
The maximum length of this field depends on the input document type. For details, see Inputs for real-time custom analysis in the Comprehend Developer Guide.
If you use the Bytes parameter, do not use the Text parameter.
DocumentReaderConfig
Provides configuration parameters to override the default actions for extracting text from PDF documents and image files.
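Because Text and Bytes are mutually exclusive, it can help to assemble the arguments in one place and fail early when both or neither are given. The sketch below is one way to do that. build_classify_args is a hypothetical helper, and it assumes the SDK accepts a raw vector for Bytes and handles the base64 encoding, which the Bytes description says may be the case.

```r
# Hypothetical helper: build the argument list for classify_document,
# enforcing that exactly one of `text` or `path` is supplied.
build_classify_args <- function(endpoint_arn, text = NULL, path = NULL) {
  if (is.null(text) == is.null(path)) {
    stop("Supply exactly one of `text` or `path`.")
  }
  args <- list(EndpointArn = endpoint_arn)
  if (!is.null(text)) {
    args$Text <- text
  } else {
    # Read the whole file as a raw vector. Assumption: the SDK encodes
    # these bytes as base64 before sending the request.
    args$Bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  }
  args
}
```

The resulting list can then be passed on with do.call, for example do.call(svc$classify_document, build_classify_args(arn, path = "document.pdf")), where svc, arn, and the file name are placeholders.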
Value
A list with the following syntax:
list(
  Classes = list(
    list(
      Name = "string",
      Score = 123.0,
      Page = 123
    )
  ),
  Labels = list(
    list(
      Name = "string",
      Score = 123.0,
      Page = 123
    )
  ),
  DocumentMetadata = list(
    Pages = 123,
    ExtractedCharacters = list(
      list(
        Page = 123,
        Count = 123
      )
    )
  ),
  DocumentType = list(
    list(
      Page = 123,
      Type = "NATIVE_PDF"|"SCANNED_PDF"|"MS_WORD"|"IMAGE"|"PLAIN_TEXT"|"TEXTRACT_DETECT_DOCUMENT_TEXT_JSON"|"TEXTRACT_ANALYZE_DOCUMENT_JSON"
    )
  ),
  Errors = list(
    list(
      Page = 123,
      ErrorCode = "TEXTRACT_BAD_PAGE"|"TEXTRACT_PROVISIONED_THROUGHPUT_EXCEEDED"|"PAGE_CHARACTERS_EXCEEDED"|"PAGE_SIZE_EXCEEDED"|"INTERNAL_SERVER_ERROR",
      ErrorMessage = "string"
    )
  ),
  Warnings = list(
    list(
      Page = 123,
      WarnCode = "INFERENCING_PLAINTEXT_WITH_NATIVE_TRAINED_MODEL"|"INFERENCING_NATIVE_DOCUMENT_WITH_PLAINTEXT_TRAINED_MODEL",
      WarnMessage = "string"
    )
  )
)
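Since the result is a nested list, picking out the most likely class takes a little unpacking. The following sketch shows one way to do it. top_class is a hypothetical helper, and the mock response is invented for illustration and only mirrors the Classes layout above.

```r
# Hypothetical helper: return the Classes entry with the highest Score,
# or NULL when the response contains no classes.
top_class <- function(resp) {
  classes <- resp$Classes
  if (length(classes) == 0) {
    return(NULL)
  }
  scores <- vapply(classes, function(cl) as.numeric(cl$Score), numeric(1))
  classes[[which.max(scores)]]
}

# Mock response for illustration only; not real service output.
mock_resp <- list(
  Classes = list(
    list(Name = "INVOICE", Score = 0.91, Page = 1),
    list(Name = "RECEIPT", Score = 0.07, Page = 1)
  )
)
best <- top_class(mock_resp)
```

The same pattern applies to the Labels field, which has the same shape.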