Create Labeling Job
sagemaker_create_labeling_job | R Documentation |
Creates a job that uses workers to label the data objects in your input dataset
Description
Creates a job that uses workers to label the data objects in your input dataset. You can use the labeled data to train machine learning models.
You can select your workforce from one of three providers:
- A private workforce that you create. It can include employees, contractors, and outside experts. Use a private workforce when you want the data to stay within your organization or when a specific set of skills is required.
- One or more vendors that you select from the Amazon Web Services Marketplace. Vendors provide expertise in specific areas.
- The Amazon Mechanical Turk workforce. This is the largest workforce, but it should only be used for public data or data that has been stripped of any personally identifiable information.
You can also use automated data labeling to reduce the number of data objects that need to be labeled by a human. Automated data labeling uses active learning to determine if a data object can be labeled by machine or if it needs to be sent to a human worker. For more information, see Using Automated Data Labeling.
The data objects to be labeled are contained in an Amazon S3 bucket. You create a manifest file that describes the location of each object. For more information, see Using Input and Output Data.
The output can be used as the manifest file for another labeling job or as training data for your machine learning models.
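As a minimal sketch of the input manifest described above: each line of the file is one JSON object that points at a data object in S3. The `source-ref` key follows the Ground Truth input manifest convention, which this page does not spell out, so treat it as an assumption. The bucket and object names are hypothetical.

```r
# Hypothetical bucket and object keys, used for illustration only.
object_uris <- c(
  "s3://example-bucket/images/img-001.jpg",
  "s3://example-bucket/images/img-002.jpg"
)

# One JSON object per line; "source-ref" (assumed key) holds the S3 URI
# of a single data object.
manifest_lines <- sprintf('{"source-ref": "%s"}', object_uris)

# Write the manifest locally. It would then be uploaded to S3 and its
# URI passed as ManifestS3Uri.
manifest_path <- file.path(tempdir(), "input.manifest")
writeLines(manifest_lines, manifest_path)
```

Uploading the file is left out here because it depends on how you access S3.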
You can use this operation to create a static labeling job or a streaming labeling job. A static labeling job stops when all data objects in the input manifest file identified in ManifestS3Uri have been labeled. A streaming labeling job runs perpetually until it is manually stopped or remains idle for 10 days. You can send new data objects to an active (InProgress) streaming labeling job in real time. To learn how to create a static labeling job, see Create a Labeling Job (API) in the Amazon SageMaker Developer Guide. To learn how to create a streaming labeling job, see Create a Streaming Labeling Job.
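The difference between the two job types shows up mainly in InputConfig. The sketch below builds that list for either case, assuming a static job reads from S3DataSource and a streaming job is fed through SnsDataSource, as the request syntax suggests. The helper name `make_input_config` and all URIs and ARNs are hypothetical.

```r
# Build the InputConfig list for either job type.
# make_input_config() is a hypothetical helper, not part of any package.
make_input_config <- function(manifest_uri = NULL, sns_topic_arn = NULL) {
  if (is.null(manifest_uri) && is.null(sns_topic_arn)) {
    stop("Provide a manifest URI, an SNS topic ARN, or both.")
  }
  data_source <- list()
  if (!is.null(manifest_uri)) {
    data_source$S3DataSource <- list(ManifestS3Uri = manifest_uri)
  }
  if (!is.null(sns_topic_arn)) {
    data_source$SnsDataSource <- list(SnsTopicArn = sns_topic_arn)
  }
  list(DataSource = data_source)
}

# Static job: all objects are listed up front in a manifest file.
static_input <- make_input_config(
  manifest_uri = "s3://example-bucket/input.manifest"
)

# Streaming job: objects arrive over time through an SNS topic.
streaming_input <- make_input_config(
  sns_topic_arn = "arn:aws:sns:us-east-1:111122223333:example-input-topic"
)
```

Either result can be passed as the InputConfig argument.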
Usage
sagemaker_create_labeling_job(LabelingJobName, LabelAttributeName,
  InputConfig, OutputConfig, RoleArn, LabelCategoryConfigS3Uri,
  StoppingConditions, LabelingJobAlgorithmsConfig, HumanTaskConfig, Tags)
Arguments
LabelingJobName |
[required] The name of the labeling job. This name is used to identify the job in a list of labeling jobs. Labeling job names must be unique within an Amazon Web Services account and Region.
|
LabelAttributeName |
[required] The attribute name to use for the label in the output manifest file. This is the key for the key/value pair formed with the label that a worker assigns to the object.
If you are creating an adjustment or verification labeling job, you must use a different LabelAttributeName than the one used in the original labeling job. |
InputConfig |
[required] Input data for the labeling job, such as the Amazon S3 location of the data objects and the location of the manifest file that describes the data objects. You must specify at least one of the following: S3DataSource or SnsDataSource.
If you use the Amazon Mechanical Turk workforce, your input data should not include confidential information, personal information, or protected health information. Use ContentClassifiers to specify that your data is free of personally identifiable information and adult content. |
OutputConfig |
[required] The location of the output data and the Amazon Web Services Key Management Service key ID for the key used to encrypt the output data, if any. |
RoleArn |
[required] The Amazon Resource Name (ARN) of the role that Amazon SageMaker assumes to perform tasks on your behalf during data labeling. You must grant this role the necessary permissions so that Amazon SageMaker can successfully complete data labeling. |
LabelCategoryConfigS3Uri |
The S3 URI of the file, referred to as a label category configuration file, that defines the categories used to label the data objects. For 3D point cloud and video frame task types, you can add label category attributes and frame attributes to your label category configuration file. To learn how, see Create a Labeling Category Configuration File for 3D Point Cloud Labeling Jobs. For named entity recognition jobs, in addition to
For all other built-in task types and custom tasks, your label category configuration file must be a JSON file in the following format. Identify the labels you want to use by replacing
Note the following about the label category configuration file:
|
StoppingConditions |
A set of conditions for stopping the labeling job. If any condition is met, the job is automatically stopped. You can use these conditions to control the cost of data labeling. |
LabelingJobAlgorithmsConfig |
Configures the information required to perform automated data labeling. |
HumanTaskConfig |
[required] Configures the labeling task and how it is presented to workers, including, but not limited to, price, keywords, and batch size (task count). |
Tags |
An array of key/value pairs. For more information, see Using Cost Allocation Tags in the Amazon Web Services Billing and Cost Management User Guide. |
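Two of the arguments above bear directly on cost: StoppingConditions and the price inside HumanTaskConfig. The sketch below shows one way to fill both, using the field names from the request syntax. The helper `usd_amount` is hypothetical, and the price and limits are made-up values. The service may accept only certain price points, which this example does not check.

```r
# Convert a price given in tenths of a cent (a whole number) into the
# Dollars / Cents / TenthFractionsOfACent fields of AmountInUsd.
# usd_amount() is a hypothetical helper, not part of any package.
usd_amount <- function(tenths_of_cent) {
  stopifnot(tenths_of_cent >= 0, tenths_of_cent == round(tenths_of_cent))
  remainder <- tenths_of_cent %% 1000
  list(
    Dollars = tenths_of_cent %/% 1000,
    Cents = remainder %/% 10,
    TenthFractionsOfACent = remainder %% 10
  )
}

# 36 tenths of a cent = 3.6 cents = $0.036 per task (hypothetical price).
price <- list(AmountInUsd = usd_amount(36))

# Hypothetical limits: at most 500 human-labeled objects, and at most
# 50 percent of the input dataset labeled.
stopping <- list(
  MaxHumanLabeledObjectCount = 500,
  MaxPercentageOfInputDatasetLabeled = 50
)
```

Here `price` would go under `HumanTaskConfig$PublicWorkforceTaskPrice` and `stopping` would be passed as StoppingConditions. Working in whole tenths of a cent avoids floating-point rounding when splitting the amount.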
Value
A list with the following syntax:
list(
  LabelingJobArn = "string"
)
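A short sketch of reading the returned list. The `resp` object below is a stand-in built by hand rather than a real response, and the ARN is hypothetical. Taking the job name from the segment after the last "/" is an assumption about the ARN layout, not something this page states.

```r
# Stand-in for the list returned by the operation; the ARN is hypothetical.
resp <- list(
  LabelingJobArn = "arn:aws:sagemaker:us-east-1:111122223333:labeling-job/example-job"
)

job_arn <- resp$LabelingJobArn

# Assumption: the job name is the ARN segment after the final "/".
job_name <- sub("^.*/", "", job_arn)
```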
Request syntax
svc$create_labeling_job(
  LabelingJobName = "string",
  LabelAttributeName = "string",
  InputConfig = list(
    DataSource = list(
      S3DataSource = list(
        ManifestS3Uri = "string"
      ),
      SnsDataSource = list(
        SnsTopicArn = "string"
      )
    ),
    DataAttributes = list(
      ContentClassifiers = list(
        "FreeOfPersonallyIdentifiableInformation"|"FreeOfAdultContent"
      )
    )
  ),
  OutputConfig = list(
    S3OutputPath = "string",
    KmsKeyId = "string",
    SnsTopicArn = "string"
  ),
  RoleArn = "string",
  LabelCategoryConfigS3Uri = "string",
  StoppingConditions = list(
    MaxHumanLabeledObjectCount = 123,
    MaxPercentageOfInputDatasetLabeled = 123
  ),
  LabelingJobAlgorithmsConfig = list(
    LabelingJobAlgorithmSpecificationArn = "string",
    InitialActiveLearningModelArn = "string",
    LabelingJobResourceConfig = list(
      VolumeKmsKeyId = "string",
      VpcConfig = list(
        SecurityGroupIds = list(
          "string"
        ),
        Subnets = list(
          "string"
        )
      )
    )
  ),
  HumanTaskConfig = list(
    WorkteamArn = "string",
    UiConfig = list(
      UiTemplateS3Uri = "string",
      HumanTaskUiArn = "string"
    ),
    PreHumanTaskLambdaArn = "string",
    TaskKeywords = list(
      "string"
    ),
    TaskTitle = "string",
    TaskDescription = "string",
    NumberOfHumanWorkersPerDataObject = 123,
    TaskTimeLimitInSeconds = 123,
    TaskAvailabilityLifetimeInSeconds = 123,
    MaxConcurrentTaskCount = 123,
    AnnotationConsolidationConfig = list(
      AnnotationConsolidationLambdaArn = "string"
    ),
    PublicWorkforceTaskPrice = list(
      AmountInUsd = list(
        Dollars = 123,
        Cents = 123,
        TenthFractionsOfACent = 123
      )
    )
  ),
  Tags = list(
    list(
      Key = "string",
      Value = "string"
    )
  )
)
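The syntax above lists every field. A smaller request might look like the sketch below, which fills only the top-level arguments marked [required]. Which nested HumanTaskConfig fields the service requires is not stated on this page, so the selection here is an assumption. Every name, ARN, and S3 URI is a hypothetical placeholder, and `submit_labeling_job` is a hypothetical wrapper.

```r
# Assemble the arguments marked [required]. All values are placeholders.
job_args <- list(
  LabelingJobName = "example-job",
  LabelAttributeName = "example-label",
  InputConfig = list(
    DataSource = list(
      S3DataSource = list(ManifestS3Uri = "s3://example-bucket/input.manifest")
    )
  ),
  OutputConfig = list(S3OutputPath = "s3://example-bucket/output/"),
  RoleArn = "arn:aws:iam::111122223333:role/example-labeling-role",
  HumanTaskConfig = list(
    WorkteamArn = "arn:aws:sagemaker:us-east-1:111122223333:workteam/private-crowd/example-team",
    UiConfig = list(UiTemplateS3Uri = "s3://example-bucket/template.liquid"),
    PreHumanTaskLambdaArn = "arn:aws:lambda:us-east-1:111122223333:function:example-pre",
    TaskTitle = "Example task",
    TaskDescription = "Label each object.",
    NumberOfHumanWorkersPerDataObject = 1,
    TaskTimeLimitInSeconds = 300,
    AnnotationConsolidationConfig = list(
      AnnotationConsolidationLambdaArn = "arn:aws:lambda:us-east-1:111122223333:function:example-acs"
    )
  )
)

# Submit through any client object that exposes create_labeling_job(),
# such as the one returned by paws::sagemaker().
submit_labeling_job <- function(svc, args) {
  do.call(svc$create_labeling_job, args)
}

# Usage (not run here; needs AWS credentials and real resources):
# svc <- paws::sagemaker()
# resp <- submit_labeling_job(svc, job_args)
# resp$LabelingJobArn
```

Keeping the arguments in a plain list makes it possible to inspect or adjust the request before anything is sent.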