Create Auto Ml Job V2
sagemaker_create_auto_ml_job_v2 (R Documentation)
Creates an Autopilot job, also referred to as an Autopilot experiment or AutoML job V2
Description
Creates an Autopilot job, also referred to as an Autopilot experiment or AutoML job V2.
An AutoML job in SageMaker AI is a fully automated process that allows you to build machine learning models with minimal effort and machine learning expertise. When initiating an AutoML job, you provide your data and optionally specify parameters tailored to your use case. SageMaker AI then automates the entire model development lifecycle, including data preprocessing, model training, tuning, and evaluation. AutoML jobs are designed to simplify and accelerate the model building process by automating various tasks and exploring different combinations of machine learning algorithms, data preprocessing techniques, and hyperparameter values. The output of an AutoML job comprises one or more trained models ready for deployment and inference. Additionally, SageMaker AI AutoML jobs generate a candidate model leaderboard, allowing you to select the best-performing model for deployment.
For more information about AutoML jobs, see https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-automate-model-development.html in the SageMaker AI developer guide.
AutoML jobs V2 support various problem types such as regression, binary, and multiclass classification with tabular data, text and image classification, time-series forecasting, and fine-tuning of large language models (LLMs) for text generation.
create_auto_ml_job_v2 and describe_auto_ml_job_v2 are new versions of create_auto_ml_job and describe_auto_ml_job that offer backward compatibility.
create_auto_ml_job_v2 can manage the same tabular problem types as its previous version, create_auto_ml_job, as well as time-series forecasting, non-tabular problem types such as image or text classification, and text generation (LLM fine-tuning).
For guidelines on migrating from create_auto_ml_job to create_auto_ml_job_v2, see Migrate a CreateAutoMLJob to CreateAutoMLJobV2.
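A minimal migration sketch for the tabular case, assuming the V1 request layout in which the target and sample-weight columns live on each input channel and the job settings sit under AutoMLJobConfig. The helper name `migrate_tabular_v1_to_v2` is hypothetical, and it maps only the fields shown (FeatureSpecificationS3Uri and AlgorithmsConfig, for example, are not handled); treat the migration guide referenced above as authoritative.

```r
# Hypothetical helper: convert a subset of create_auto_ml_job (V1) arguments
# into create_auto_ml_job_v2 arguments for a tabular problem.
# Fields not listed below are ignored.
migrate_tabular_v1_to_v2 <- function(v1) {
  channels <- v1$InputDataConfig
  # V1 stores the target/weight columns per channel; take them from the first one
  target <- channels[[1]]$TargetAttributeName
  weight <- channels[[1]]$SampleWeightAttributeName
  # V2 channels no longer carry these column names
  v2_channels <- lapply(channels, function(ch) {
    ch$TargetAttributeName <- NULL
    ch$SampleWeightAttributeName <- NULL
    ch
  })
  job_config <- v1$AutoMLJobConfig  # may be NULL; NULL$x is simply NULL
  tabular <- list(
    ProblemType = v1$ProblemType,
    TargetAttributeName = target,
    SampleWeightAttributeName = weight,
    CompletionCriteria = job_config$CompletionCriteria,
    Mode = job_config$Mode,
    GenerateCandidateDefinitionsOnly = v1$GenerateCandidateDefinitionsOnly
  )
  tabular <- Filter(Negate(is.null), tabular)  # drop fields that were not supplied
  v2 <- list(
    AutoMLJobName = v1$AutoMLJobName,
    AutoMLJobInputDataConfig = v2_channels,
    OutputDataConfig = v1$OutputDataConfig,
    AutoMLProblemTypeConfig = list(TabularJobConfig = tabular),
    RoleArn = v1$RoleArn,
    AutoMLJobObjective = v1$AutoMLJobObjective,
    SecurityConfig = job_config$SecurityConfig,
    DataSplitConfig = job_config$DataSplitConfig,
    ModelDeployConfig = v1$ModelDeployConfig,
    Tags = v1$Tags
  )
  Filter(Negate(is.null), v2)
}
```

The result is a plain named list, so it can be inspected before being passed to `svc$create_auto_ml_job_v2` with `do.call`.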
For the list of available problem types supported by create_auto_ml_job_v2, see AutoMLProblemTypeConfig.
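AutoMLProblemTypeConfig holds the settings for one problem type. The sketch below assumes it behaves as a union, so exactly one of the five member names from the request syntax may be set; the helper `make_problem_type_config` is hypothetical and only checks the shape of the list locally.

```r
# Member names taken from the request syntax of AutoMLProblemTypeConfig
problem_type_members <- c(
  "ImageClassificationJobConfig", "TextClassificationJobConfig",
  "TimeSeriesForecastingJobConfig", "TabularJobConfig",
  "TextGenerationJobConfig"
)

# Hypothetical helper: build the config and fail early unless exactly one
# known problem-type member is supplied.
make_problem_type_config <- function(...) {
  cfg <- list(...)
  if (length(cfg) != 1L || is.null(names(cfg)) ||
      !(names(cfg) %in% problem_type_members)) {
    stop("AutoMLProblemTypeConfig needs exactly one of: ",
         paste(problem_type_members, collapse = ", "))
  }
  cfg
}

# Example: a text classification job (column names are placeholders)
text_cfg <- make_problem_type_config(
  TextClassificationJobConfig = list(
    ContentColumn = "text",
    TargetLabelColumn = "label"
  )
)
```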
After an AutoML job V2 has run, you can find the best-performing model by calling describe_auto_ml_job_v2.
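A sketch of polling describe_auto_ml_job_v2 until the job finishes and then reading the best candidate. It assumes `svc` is a paws SageMaker client (for example `svc <- paws::sagemaker()`) and that the response exposes AutoMLJobStatus and BestCandidate (with CandidateName and FinalAutoMLJobObjectiveMetric) as nested lists; the helper names and the 60-second poll interval are illustrative choices.

```r
# Hypothetical helper: summarise the best candidate from a
# describe_auto_ml_job_v2 response (a nested R list).
best_candidate_summary <- function(resp) {
  best <- resp$BestCandidate
  if (length(best) == 0) return(NULL)  # no best candidate reported yet
  metric <- best$FinalAutoMLJobObjectiveMetric
  list(
    candidate = best$CandidateName,
    metric = metric$MetricName,
    value = metric$Value
  )
}

# Hypothetical helper: poll until a terminal status, then summarise.
wait_for_best_candidate <- function(svc, job_name, poll_seconds = 60) {
  repeat {
    resp <- svc$describe_auto_ml_job_v2(AutoMLJobName = job_name)
    status <- resp$AutoMLJobStatus
    if (status %in% c("Completed", "Failed", "Stopped")) break
    Sys.sleep(poll_seconds)
  }
  if (status != "Completed") stop("AutoML job ended with status ", status)
  best_candidate_summary(resp)
}
```

Because the client is passed in as an argument, the polling logic can be exercised with a stub list that has a `describe_auto_ml_job_v2` function, without calling AWS.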
Usage
sagemaker_create_auto_ml_job_v2(AutoMLJobName, AutoMLJobInputDataConfig,
OutputDataConfig, AutoMLProblemTypeConfig, RoleArn, Tags,
SecurityConfig, AutoMLJobObjective, ModelDeployConfig, DataSplitConfig,
AutoMLComputeConfig)
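A minimal sketch of a tabular job that supplies only the five required arguments. The bucket, role ARN, job name, and the `label` target column are placeholder assumptions; actually starting the job needs the paws package and AWS credentials, so the call is wrapped in a hypothetical `start_job` function rather than run at load time.

```r
# Only the required arguments; all values are placeholders.
required_args <- list(
  AutoMLJobName = "my-tabular-job",
  AutoMLJobInputDataConfig = list(
    list(
      ChannelType = "training",
      DataSource = list(S3DataSource = list(
        S3DataType = "S3Prefix",
        S3Uri = "s3://amzn-s3-demo-bucket/input/"
      ))
    )
  ),
  OutputDataConfig = list(S3OutputPath = "s3://amzn-s3-demo-bucket/output/"),
  AutoMLProblemTypeConfig = list(
    TabularJobConfig = list(TargetAttributeName = "label")
  ),
  RoleArn = "arn:aws:iam::111122223333:role/ExampleSageMakerRole"
)

# Create a paws SageMaker client and submit the request.
start_job <- function(args) {
  svc <- paws::sagemaker()
  do.call(svc$create_auto_ml_job_v2, args)
}
```

Calling `start_job(required_args)` would return a list whose `AutoMLJobArn` element identifies the new job.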
Arguments
AutoMLJobName |
[required] Identifies an Autopilot job. The name must be unique to your account and is case insensitive. |
AutoMLJobInputDataConfig |
[required] An array of channel objects describing the input data and their location. Each channel is a named input source. Similar to the InputDataConfig attribute in the create_auto_ml_job input parameters. |
OutputDataConfig |
[required] Provides information about encryption and the Amazon S3 output path needed to store artifacts from an AutoML job. |
AutoMLProblemTypeConfig |
[required] Defines the configuration settings of one of the supported problem types. |
RoleArn |
[required] The ARN of the role that is used to access the data. |
Tags |
An array of key-value pairs. You can use tags to categorize your Amazon Web Services resources in different ways, such as by purpose, owner, or environment. For more information, see Tagging Amazon Web Services Resources. Tag keys must be unique per resource. |
SecurityConfig |
The security configuration for traffic encryption or Amazon VPC settings. |
AutoMLJobObjective |
Specifies a metric to minimize or maximize as the objective of a job. If not specified, the default objective metric depends on the problem type. For the list of default values per problem type, see AutoMLJobObjective. |
ModelDeployConfig |
Specifies how to generate the endpoint name for an automatic one-click Autopilot model deployment. |
DataSplitConfig |
This structure specifies how to split the data into training and validation datasets. The validation and training datasets must contain the same headers.
For jobs created by calling …
This attribute must not be set for the time-series forecasting problem type, as Autopilot automatically splits the input dataset into training and validation sets. |
AutoMLComputeConfig |
Specifies the compute configuration for the AutoML job V2. |
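The optional arguments carry two constraints stated above: DataSplitConfig must not be set for time-series forecasting, and tag keys must be unique per resource. The sketch below enforces both before a request is sent; `add_optional_args` and its `tags`/`validation_fraction` parameters are hypothetical conveniences, not part of the paws API.

```r
# Hypothetical helper: attach Tags and DataSplitConfig to an argument list.
add_optional_args <- function(args, tags = NULL, validation_fraction = NULL) {
  is_forecast <- !is.null(args$AutoMLProblemTypeConfig$TimeSeriesForecastingJobConfig)
  if (!is.null(validation_fraction)) {
    # Autopilot splits time-series data itself, so refuse an explicit split
    if (is_forecast) stop("DataSplitConfig must not be set for time-series forecasting")
    args$DataSplitConfig <- list(ValidationFraction = validation_fraction)
  }
  if (length(tags) > 0) {
    if (anyDuplicated(names(tags))) stop("Tag keys must be unique per resource")
    # Turn a named character vector into the list(list(Key, Value)) shape
    args$Tags <- unname(Map(function(k, v) list(Key = k, Value = v), names(tags), tags))
  }
  args
}

base_args <- list(
  AutoMLProblemTypeConfig = list(TabularJobConfig = list(TargetAttributeName = "y"))
)
with_options <- add_optional_args(base_args, tags = c(team = "ml", env = "dev"),
                                  validation_fraction = 0.2)
```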
Value
A list with the following syntax:
list(
AutoMLJobArn = "string"
)
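The only field returned is AutoMLJobArn. Later calls such as describe_auto_ml_job_v2 take the job name rather than the ARN, so a small helper can recover it. This sketch assumes the ARN ends in `automl-job/<name>`; `job_name_from_arn` is a hypothetical name.

```r
# Hypothetical helper: strip everything up to "automl-job/" from the ARN.
job_name_from_arn <- function(arn) {
  sub("^.*:automl-job/", "", arn)
}

# e.g. resp <- svc$create_auto_ml_job_v2(...); job_name_from_arn(resp$AutoMLJobArn)
example_arn <- "arn:aws:sagemaker:us-east-1:111122223333:automl-job/my-tabular-job"
example_name <- job_name_from_arn(example_arn)
```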
Request syntax
svc$create_auto_ml_job_v2(
AutoMLJobName = "string",
AutoMLJobInputDataConfig = list(
list(
ChannelType = "training"|"validation",
ContentType = "string",
CompressionType = "None"|"Gzip",
DataSource = list(
S3DataSource = list(
S3DataType = "ManifestFile"|"S3Prefix"|"AugmentedManifestFile",
S3Uri = "string"
)
)
)
),
OutputDataConfig = list(
KmsKeyId = "string",
S3OutputPath = "string"
),
AutoMLProblemTypeConfig = list(
ImageClassificationJobConfig = list(
CompletionCriteria = list(
MaxCandidates = 123,
MaxRuntimePerTrainingJobInSeconds = 123,
MaxAutoMLJobRuntimeInSeconds = 123
)
),
TextClassificationJobConfig = list(
CompletionCriteria = list(
MaxCandidates = 123,
MaxRuntimePerTrainingJobInSeconds = 123,
MaxAutoMLJobRuntimeInSeconds = 123
),
ContentColumn = "string",
TargetLabelColumn = "string"
),
TimeSeriesForecastingJobConfig = list(
FeatureSpecificationS3Uri = "string",
CompletionCriteria = list(
MaxCandidates = 123,
MaxRuntimePerTrainingJobInSeconds = 123,
MaxAutoMLJobRuntimeInSeconds = 123
),
ForecastFrequency = "string",
ForecastHorizon = 123,
ForecastQuantiles = list(
"string"
),
Transformations = list(
Filling = list(
list(
"string"
)
),
Aggregation = list(
"sum"|"avg"|"first"|"min"|"max"
)
),
TimeSeriesConfig = list(
TargetAttributeName = "string",
TimestampAttributeName = "string",
ItemIdentifierAttributeName = "string",
GroupingAttributeNames = list(
"string"
)
),
HolidayConfig = list(
list(
CountryCode = "string"
)
),
CandidateGenerationConfig = list(
AlgorithmsConfig = list(
list(
AutoMLAlgorithms = list(
"xgboost"|"linear-learner"|"mlp"|"lightgbm"|"catboost"|"randomforest"|"extra-trees"|"nn-torch"|"fastai"|"cnn-qr"|"deepar"|"prophet"|"npts"|"arima"|"ets"
)
)
)
)
),
TabularJobConfig = list(
CandidateGenerationConfig = list(
AlgorithmsConfig = list(
list(
AutoMLAlgorithms = list(
"xgboost"|"linear-learner"|"mlp"|"lightgbm"|"catboost"|"randomforest"|"extra-trees"|"nn-torch"|"fastai"|"cnn-qr"|"deepar"|"prophet"|"npts"|"arima"|"ets"
)
)
)
),
CompletionCriteria = list(
MaxCandidates = 123,
MaxRuntimePerTrainingJobInSeconds = 123,
MaxAutoMLJobRuntimeInSeconds = 123
),
FeatureSpecificationS3Uri = "string",
Mode = "AUTO"|"ENSEMBLING"|"HYPERPARAMETER_TUNING",
GenerateCandidateDefinitionsOnly = TRUE|FALSE,
ProblemType = "BinaryClassification"|"MulticlassClassification"|"Regression",
TargetAttributeName = "string",
SampleWeightAttributeName = "string"
),
TextGenerationJobConfig = list(
CompletionCriteria = list(
MaxCandidates = 123,
MaxRuntimePerTrainingJobInSeconds = 123,
MaxAutoMLJobRuntimeInSeconds = 123
),
BaseModelName = "string",
TextGenerationHyperParameters = list(
"string"
),
ModelAccessConfig = list(
AcceptEula = TRUE|FALSE
)
)
),
RoleArn = "string",
Tags = list(
list(
Key = "string",
Value = "string"
)
),
SecurityConfig = list(
VolumeKmsKeyId = "string",
EnableInterContainerTrafficEncryption = TRUE|FALSE,
VpcConfig = list(
SecurityGroupIds = list(
"string"
),
Subnets = list(
"string"
)
)
),
AutoMLJobObjective = list(
MetricName = "Accuracy"|"MSE"|"F1"|"F1macro"|"AUC"|"RMSE"|"BalancedAccuracy"|"R2"|"Recall"|"RecallMacro"|"Precision"|"PrecisionMacro"|"MAE"|"MAPE"|"MASE"|"WAPE"|"AverageWeightedQuantileLoss"
),
ModelDeployConfig = list(
AutoGenerateEndpointName = TRUE|FALSE,
EndpointName = "string"
),
DataSplitConfig = list(
ValidationFraction = 123.0
),
AutoMLComputeConfig = list(
EmrServerlessComputeConfig = list(
ExecutionRoleARN = "string"
)
)
)
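As a concrete instance of the syntax above, here is a sketch of a time-series forecasting request. The bucket, role, column names, the "1D" frequency, the 14-step horizon, the quantiles, and the "US" holiday calendar are illustrative assumptions, not defaults documented on this page. DataSplitConfig is left out because it must not be set for this problem type.

```r
# Placeholder values throughout; adjust to your data.
forecast_args <- list(
  AutoMLJobName = "daily-demand-forecast",
  AutoMLJobInputDataConfig = list(list(
    ChannelType = "training",
    DataSource = list(S3DataSource = list(
      S3DataType = "S3Prefix",
      S3Uri = "s3://amzn-s3-demo-bucket/demand/"
    ))
  )),
  OutputDataConfig = list(S3OutputPath = "s3://amzn-s3-demo-bucket/forecast-output/"),
  AutoMLProblemTypeConfig = list(
    TimeSeriesForecastingJobConfig = list(
      ForecastFrequency = "1D",
      ForecastHorizon = 14,
      ForecastQuantiles = list("p10", "p50", "p90"),
      TimeSeriesConfig = list(
        TargetAttributeName = "demand",
        TimestampAttributeName = "date",
        ItemIdentifierAttributeName = "store_id"
      ),
      HolidayConfig = list(list(CountryCode = "US"))
    )
  ),
  RoleArn = "arn:aws:iam::111122223333:role/ExampleSageMakerRole",
  AutoMLJobObjective = list(MetricName = "AverageWeightedQuantileLoss")
  # DataSplitConfig is deliberately omitted for forecasting jobs
)

# Submit with a paws client and return the new job's ARN.
submit_forecast_job <- function(args = forecast_args) {
  svc <- paws::sagemaker()
  resp <- do.call(svc$create_auto_ml_job_v2, args)
  resp$AutoMLJobArn
}
```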