Create Dataset

comprehend_create_dataset R Documentation

Creates a dataset to upload training or test data for a model associated with a flywheel

Description

Creates a dataset to upload training or test data for a model associated with a flywheel. For more information about datasets, see Flywheel overview in the Amazon Comprehend Developer Guide.

Usage

comprehend_create_dataset(FlywheelArn, DatasetName, DatasetType,
  Description, InputDataConfig, ClientRequestToken, Tags)

Arguments

FlywheelArn

[required] The Amazon Resource Name (ARN) of the flywheel to receive the data.

DatasetName

[required] Name of the dataset.

DatasetType

The dataset type. You can specify that the data in a dataset is for training the model or for testing the model.

Description

Description of the dataset.

InputDataConfig

[required] Information about the input data configuration. The type of input data varies based on the format of the input and whether the data is for a classifier model or an entity recognition model.

ClientRequestToken

A unique identifier for the request. If you don't set the client request token, Amazon Comprehend generates one.

Tags

Tags for the dataset.
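
As an illustration of how `InputDataConfig` differs between the two model types, the following sketch builds one list for a classifier model and one for an entity recognition model. It uses only field names from the request syntax; the bucket name, object keys, and the `"|"` delimiter are hypothetical placeholders, and which nested fields your data actually needs is an assumption to check against the Amazon Comprehend Developer Guide.

```r
# Sketch only: all S3 locations below are hypothetical placeholders.

# Input configuration for a document classifier model.
classifier_input <- list(
  DataFormat = "COMPREHEND_CSV",
  DocumentClassifierInputDataConfig = list(
    S3Uri = "s3://example-bucket/classifier/train.csv",
    # Assumed delimiter for documents that carry several labels.
    LabelDelimiter = "|"
  )
)

# Input configuration for an entity recognition model.
recognizer_input <- list(
  DataFormat = "COMPREHEND_CSV",
  EntityRecognizerInputDataConfig = list(
    Documents = list(
      S3Uri = "s3://example-bucket/recognizer/documents.txt",
      InputFormat = "ONE_DOC_PER_LINE"
    ),
    Annotations = list(
      S3Uri = "s3://example-bucket/recognizer/annotations.csv"
    )
  )
)
```

Either list can then be passed as the `InputDataConfig` argument.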

Value

A list with the following syntax:

list(
  DatasetArn = "string"
)

Request syntax

svc$create_dataset(
  FlywheelArn = "string",
  DatasetName = "string",
  DatasetType = "TRAIN"|"TEST",
  Description = "string",
  InputDataConfig = list(
    AugmentedManifests = list(
      list(
        AttributeNames = list(
          "string"
        ),
        S3Uri = "string",
        AnnotationDataS3Uri = "string",
        SourceDocumentsS3Uri = "string",
        DocumentType = "PLAIN_TEXT_DOCUMENT"|"SEMI_STRUCTURED_DOCUMENT"
      )
    ),
    DataFormat = "COMPREHEND_CSV"|"AUGMENTED_MANIFEST",
    DocumentClassifierInputDataConfig = list(
      S3Uri = "string",
      LabelDelimiter = "string"
    ),
    EntityRecognizerInputDataConfig = list(
      Annotations = list(
        S3Uri = "string"
      ),
      Documents = list(
        S3Uri = "string",
        InputFormat = "ONE_DOC_PER_FILE"|"ONE_DOC_PER_LINE"
      ),
      EntityList = list(
        S3Uri = "string"
      )
    )
  ),
  ClientRequestToken = "string",
  Tags = list(
    list(
      Key = "string",
      Value = "string"
    )
  )
)
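
Examples

The request syntax above can be exercised with a small wrapper like the one below. This is a minimal sketch, not part of the package: `create_training_dataset` is a hypothetical helper, the tag key and value, dataset name, and S3 URI are placeholders, and the client is passed in as `svc` so the function can be tried with a stand-in object before calling AWS.

```r
# Hypothetical helper: creates a training dataset for a classifier model
# and returns the ARN of the new dataset.
create_training_dataset <- function(svc, flywheel_arn, name, s3_uri) {
  resp <- svc$create_dataset(
    FlywheelArn = flywheel_arn,
    DatasetName = name,
    DatasetType = "TRAIN",
    Description = "Training data for the flywheel's classifier",
    InputDataConfig = list(
      DataFormat = "COMPREHEND_CSV",
      DocumentClassifierInputDataConfig = list(S3Uri = s3_uri)
    ),
    # Placeholder tag; ClientRequestToken is omitted so the service
    # generates one, as described under Arguments.
    Tags = list(
      list(Key = "project", Value = "demo")
    )
  )
  # The response is a list with a single DatasetArn element.
  resp$DatasetArn
}

# Not run: requires the paws package and valid AWS credentials.
# svc <- paws::comprehend()
# dataset_arn <- create_training_dataset(
#   svc,
#   flywheel_arn = "<your flywheel ARN>",
#   name = "my-train-set",
#   s3_uri = "s3://example-bucket/classifier/train.csv"
# )
```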