
Create Auto Ml Job

sagemaker_create_auto_ml_job R Documentation

Creates an Autopilot job, also referred to as an Autopilot experiment or AutoML job

Description

Creates an Autopilot job, also referred to as an Autopilot experiment or AutoML job.

An AutoML job in SageMaker is a fully automated process that allows you to build machine learning models with minimal effort and machine learning expertise. When initiating an AutoML job, you provide your data and optionally specify parameters tailored to your use case. SageMaker then automates the entire model development lifecycle, including data preprocessing, model training, tuning, and evaluation. AutoML jobs are designed to simplify and accelerate the model building process by automating various tasks and exploring different combinations of machine learning algorithms, data preprocessing techniques, and hyperparameter values. The output of an AutoML job comprises one or more trained models ready for deployment and inference. Additionally, SageMaker AutoML jobs generate a candidate model leaderboard, allowing you to select the best-performing model for deployment.

For more information about AutoML jobs, see https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-automate-model-development.html in the SageMaker developer guide.

We recommend using the new versions create_auto_ml_job_v2 and describe_auto_ml_job_v2, which offer backward compatibility.

create_auto_ml_job_v2 can manage tabular problem types identical to those of its previous version create_auto_ml_job, as well as time-series forecasting, non-tabular problem types such as image or text classification, and text generation (LLM fine-tuning).

For guidelines on migrating from create_auto_ml_job to create_auto_ml_job_v2, see Migrate a CreateAutoMLJob to CreateAutoMLJobV2.

You can find the best-performing model after you run an AutoML job by calling describe_auto_ml_job_v2 (recommended) or describe_auto_ml_job.
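
As a minimal sketch of that lookup, the helper below pulls the best candidate out of a describe response. The helper name best_candidate_summary and the mock response are hypothetical. The field names (BestCandidate, CandidateName, FinalAutoMLJobObjectiveMetric) are assumed to match the DescribeAutoMLJob response shape, so check them against the describe_auto_ml_job page before relying on them.

```r
# Extract the best candidate's name and objective metric from a
# describe_auto_ml_job response, which paws returns as a nested list.
best_candidate_summary <- function(resp) {
  best <- resp$BestCandidate
  if (is.null(best) || length(best) == 0) {
    return(NULL)  # no best candidate available yet
  }
  metric <- best$FinalAutoMLJobObjectiveMetric
  list(
    name = best$CandidateName,
    metric = metric$MetricName,
    value = metric$Value
  )
}

# Hypothetical response fragment, for illustration only.
mock_resp <- list(
  AutoMLJobStatus = "Completed",
  BestCandidate = list(
    CandidateName = "example-candidate",
    FinalAutoMLJobObjectiveMetric = list(MetricName = "F1", Value = 0.91)
  )
)
best_info <- best_candidate_summary(mock_resp)

# With credentials configured, the real response would come from:
# svc <- paws::sagemaker()
# resp <- svc$describe_auto_ml_job(AutoMLJobName = "my-job")
```

The helper returns NULL rather than raising an error when no best candidate is present, so it can be called while a job is still running.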

Usage

sagemaker_create_auto_ml_job(AutoMLJobName, InputDataConfig,
  OutputDataConfig, ProblemType, AutoMLJobObjective, AutoMLJobConfig,
  RoleArn, GenerateCandidateDefinitionsOnly, Tags, ModelDeployConfig)

Arguments

AutoMLJobName

[required] Identifies an Autopilot job. The name must be unique within your account and is case-insensitive.
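
Because names are compared without regard to case, a local pre-check should normalize case before comparing. The sketch below assumes you already hold a character vector of existing job names. The function name_is_available and the sample names are hypothetical and are not part of the package.

```r
# Return TRUE when `candidate` does not collide with any existing job name,
# comparing names case-insensitively.
name_is_available <- function(candidate, existing) {
  !(tolower(candidate) %in% tolower(existing))
}

existing_jobs <- c("Churn-Model-01", "fraud-detect")  # hypothetical names

collides <- name_is_available("churn-model-01", existing_jobs)  # FALSE
is_free  <- name_is_available("churn-model-02", existing_jobs)  # TRUE
```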

InputDataConfig

[required] An array of channel objects that describes the input data and its location. Each channel is a named input source, similar to the InputDataConfig supported by HyperParameterTrainingJobDefinition. Format(s) supported: CSV, Parquet. The training dataset requires a minimum of 500 rows. No minimum number of rows is required for the validation dataset.
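
A minimal sketch of an InputDataConfig value with one training and one validation channel follows. The helper make_channel, the bucket paths, and the target column are hypothetical. The ContentType string is an assumption based on the SageMaker API reference, so verify it for your data format.

```r
# Build one channel object in the shape shown under "Request syntax".
make_channel <- function(s3_uri, target, channel_type = "training") {
  list(
    DataSource = list(
      S3DataSource = list(S3DataType = "S3Prefix", S3Uri = s3_uri)
    ),
    TargetAttributeName = target,
    ContentType = "text/csv;header=present",  # assumed value for CSV input
    ChannelType = channel_type
  )
}

# Guard for the documented 500-row minimum on the training dataset.
has_enough_training_rows <- function(n_rows) n_rows >= 500

# Hypothetical S3 locations and target column.
input_data_config <- list(
  make_channel("s3://example-bucket/train/", "label", "training"),
  make_channel("s3://example-bucket/validation/", "label", "validation")
)
```

InputDataConfig is a list of lists, one inner list per channel, so a single training channel still needs the outer list().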

OutputDataConfig

[required] Provides information about encryption and the Amazon S3 output path needed to store artifacts from an AutoML job. Format(s) supported: CSV.

ProblemType

Defines the type of supervised learning problem available for the candidates. For more information, see SageMaker Autopilot problem types.

AutoMLJobObjective

Specifies a metric to minimize or maximize as the objective of a job. If not specified, the default objective metric depends on the problem type. See AutoMLJobObjective for the default values.

AutoMLJobConfig

A collection of settings used to configure an AutoML job.
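
As an illustration, the sketch below fills in a few of the AutoMLJobConfig fields listed under "Request syntax". All values are arbitrary examples. The assumption that ValidationFraction must lie between 0 and 1 comes from the SageMaker API reference rather than from this page.

```r
# Example AutoMLJobConfig: cap the search and hold out 20% for validation.
auto_ml_job_config <- list(
  CompletionCriteria = list(
    MaxCandidates = 10,                        # stop after 10 candidates
    MaxRuntimePerTrainingJobInSeconds = 3600,  # 1 hour per training job
    MaxAutoMLJobRuntimeInSeconds = 14400       # 4 hours for the whole job
  ),
  DataSplitConfig = list(
    ValidationFraction = 0.2                   # assumed valid range: 0 to 1
  ),
  Mode = "ENSEMBLING"
)

# Simple local sanity check before submitting the request.
fraction <- auto_ml_job_config$DataSplitConfig$ValidationFraction
fraction_ok <- fraction > 0 && fraction < 1
```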

RoleArn

[required] The ARN of the role that is used to access the data.

GenerateCandidateDefinitionsOnly

Generates possible candidates without training the models. A candidate is a combination of data preprocessors, algorithms, and algorithm parameter settings.

Tags

An array of key-value pairs. You can use tags to categorize your Amazon Web Services resources in different ways, for example, by purpose, owner, or environment. For more information, see Tagging Amazon Web Services Resources. Tag keys must be unique per resource.
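
One way to build the Tags argument from a named character vector, while enforcing the unique-key rule locally, is sketched below. The helper make_tags and the tag values are hypothetical.

```r
# Convert a named character vector into the list-of-lists shape that
# Tags expects, refusing duplicate keys.
make_tags <- function(kv) {
  keys <- names(kv)
  stopifnot(!is.null(keys), anyDuplicated(keys) == 0)
  # unname() keeps the result an unnamed list, matching the request syntax.
  unname(Map(function(k, v) list(Key = k, Value = v), keys, unname(kv)))
}

tags <- make_tags(c(purpose = "demo", owner = "data-team"))
```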

ModelDeployConfig

Specifies how to generate the endpoint name for an automatic one-click Autopilot model deployment.
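
The sketch below shows the two forms this argument can take. It assumes, based on the SageMaker API reference rather than this page, that EndpointName should be supplied only when AutoGenerateEndpointName is FALSE. The helper make_model_deploy_config and the endpoint name are hypothetical.

```r
# Build ModelDeployConfig: auto-generate the endpoint name unless the
# caller supplies one explicitly.
make_model_deploy_config <- function(endpoint_name = NULL) {
  if (is.null(endpoint_name)) {
    list(AutoGenerateEndpointName = TRUE)
  } else {
    list(AutoGenerateEndpointName = FALSE, EndpointName = endpoint_name)
  }
}

auto_named  <- make_model_deploy_config()
user_named  <- make_model_deploy_config("example-endpoint")
```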

Value

A list with the following syntax:

list(
  AutoMLJobArn = "string"
)

Request syntax

svc$create_auto_ml_job(
  AutoMLJobName = "string",
  InputDataConfig = list(
    list(
      DataSource = list(
        S3DataSource = list(
          S3DataType = "ManifestFile"|"S3Prefix"|"AugmentedManifestFile",
          S3Uri = "string"
        )
      ),
      CompressionType = "None"|"Gzip",
      TargetAttributeName = "string",
      ContentType = "string",
      ChannelType = "training"|"validation",
      SampleWeightAttributeName = "string"
    )
  ),
  OutputDataConfig = list(
    KmsKeyId = "string",
    S3OutputPath = "string"
  ),
  ProblemType = "BinaryClassification"|"MulticlassClassification"|"Regression",
  AutoMLJobObjective = list(
    MetricName = "Accuracy"|"MSE"|"F1"|"F1macro"|"AUC"|"RMSE"|"BalancedAccuracy"|"R2"|"Recall"|"RecallMacro"|"Precision"|"PrecisionMacro"|"MAE"|"MAPE"|"MASE"|"WAPE"|"AverageWeightedQuantileLoss"
  ),
  AutoMLJobConfig = list(
    CompletionCriteria = list(
      MaxCandidates = 123,
      MaxRuntimePerTrainingJobInSeconds = 123,
      MaxAutoMLJobRuntimeInSeconds = 123
    ),
    SecurityConfig = list(
      VolumeKmsKeyId = "string",
      EnableInterContainerTrafficEncryption = TRUE|FALSE,
      VpcConfig = list(
        SecurityGroupIds = list(
          "string"
        ),
        Subnets = list(
          "string"
        )
      )
    ),
    CandidateGenerationConfig = list(
      FeatureSpecificationS3Uri = "string",
      AlgorithmsConfig = list(
        list(
          AutoMLAlgorithms = list(
            "xgboost"|"linear-learner"|"mlp"|"lightgbm"|"catboost"|"randomforest"|"extra-trees"|"nn-torch"|"fastai"|"cnn-qr"|"deepar"|"prophet"|"npts"|"arima"|"ets"
          )
        )
      )
    ),
    DataSplitConfig = list(
      ValidationFraction = 123.0
    ),
    Mode = "AUTO"|"ENSEMBLING"|"HYPERPARAMETER_TUNING"
  ),
  RoleArn = "string",
  GenerateCandidateDefinitionsOnly = TRUE|FALSE,
  Tags = list(
    list(
      Key = "string",
      Value = "string"
    )
  ),
  ModelDeployConfig = list(
    AutoGenerateEndpointName = TRUE|FALSE,
    EndpointName = "string"
  )
)
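
Putting the pieces together, a minimal request might look like the sketch below. It supplies the four required arguments plus a problem type and objective. The helper names, bucket paths, target column, and role ARN are all hypothetical placeholders. submit_auto_ml_job is defined but not called here, since it needs the paws package and valid AWS credentials.

```r
# Assemble the arguments for create_auto_ml_job as a named list.
build_auto_ml_request <- function(job_name, train_uri, output_uri,
                                  target, role_arn) {
  list(
    AutoMLJobName = job_name,
    InputDataConfig = list(
      list(
        DataSource = list(
          S3DataSource = list(S3DataType = "S3Prefix", S3Uri = train_uri)
        ),
        TargetAttributeName = target
      )
    ),
    OutputDataConfig = list(S3OutputPath = output_uri),
    ProblemType = "BinaryClassification",
    AutoMLJobObjective = list(MetricName = "F1"),
    RoleArn = role_arn
  )
}

# Submit the request and return the ARN of the new job.
submit_auto_ml_job <- function(request) {
  svc <- paws::sagemaker()
  resp <- do.call(svc$create_auto_ml_job, request)
  resp$AutoMLJobArn
}

# Placeholder values, for illustration only.
request <- build_auto_ml_request(
  job_name   = "example-automl-job",
  train_uri  = "s3://example-bucket/train/",
  output_uri = "s3://example-bucket/output/",
  target     = "label",
  role_arn   = "arn:aws:iam::123456789012:role/ExampleRole"
)
```

Building the request as a plain list first makes it easy to inspect or log before calling submit_auto_ml_job(request).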