Create Profile Job

gluedatabrew_create_profile_job

R Documentation

Creates a new job to analyze a dataset and create its data profile

Description

Creates a new job to analyze a dataset and create its data profile.

Usage

gluedatabrew_create_profile_job(DatasetName, EncryptionKeyArn,
  EncryptionMode, Name, LogSubscription, MaxCapacity, MaxRetries,
  OutputLocation, Configuration, ValidationConfigurations, RoleArn, Tags,
  Timeout, JobSample)

Arguments

DatasetName

[required] The name of the dataset that this job is to act upon.

EncryptionKeyArn

The Amazon Resource Name (ARN) of an encryption key that is used to protect the job.

EncryptionMode

The encryption mode for the job, which can be one of the following:

SSE-KMS - Server-side encryption with KMS-managed keys.
SSE-S3 - Server-side encryption with keys managed by Amazon S3.

Name

[required] The name of the job to be created. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.
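The character rule above can be checked locally before a request is sent. The following is a minimal sketch: `is_valid_job_name` is a hypothetical helper, not part of the package, and it checks only the character set described here, not any length limit the service may enforce.

```r
# Hypothetical helper (not part of paws): checks a job name against the
# documented character set: A-Z, a-z, 0-9, hyphen, period, and space.
is_valid_job_name <- function(name) {
  is.character(name) && length(name) == 1 && !is.na(name) &&
    grepl("^[A-Za-z0-9. -]+$", name)
}

is_valid_job_name("nightly-profile.v2")  # TRUE
is_valid_job_name("bad_name!")           # FALSE: underscore and "!" are not allowed
```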

LogSubscription

Enables or disables Amazon CloudWatch logging for the job. If logging is enabled, CloudWatch writes one log stream for each job run.

MaxCapacity

The maximum number of nodes that DataBrew can use when the job processes data.

MaxRetries

The maximum number of times to retry the job after a job run fails.

OutputLocation

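[required] The Amazon S3 location (bucket name, object key, and bucket owner) where DataBrew writes the output of the job.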

Configuration

Configuration for profile jobs. Used to select columns, do evaluations, and override default parameters of evaluations. When configuration is null, the profile job will run with default settings.

ValidationConfigurations

List of validation configurations that are applied to the profile job.

RoleArn

[required] The Amazon Resource Name (ARN) of the Identity and Access Management (IAM) role to be assumed when DataBrew runs the job.

Tags

Metadata tags to apply to this job.

Timeout

The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.

JobSample

Sample configuration for profile jobs only. Determines the number of rows on which the profile job runs. If no JobSample value is provided, the defaults are used: CUSTOM_ROWS for Mode and 20000 for Size.
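As an illustration of the JobSample structure, the sketch below builds the list with the documented defaults. `make_job_sample` is a hypothetical helper, and dropping Size for FULL_DATASET is an assumption about how the service treats that mode rather than something stated on this page.

```r
# Hypothetical helper (not part of paws): builds a JobSample list.
# Defaults mirror the documented ones: CUSTOM_ROWS and 20000.
make_job_sample <- function(mode = c("CUSTOM_ROWS", "FULL_DATASET"),
                            size = 20000) {
  mode <- match.arg(mode)
  if (mode == "FULL_DATASET") {
    # Assumption: Size only matters for CUSTOM_ROWS, so it is omitted here.
    return(list(Mode = mode))
  }
  stopifnot(is.numeric(size), length(size) == 1, size > 0)
  list(Mode = mode, Size = size)
}

default_sample <- make_job_sample()
full_sample <- make_job_sample("FULL_DATASET")
```

The resulting list can be passed as the JobSample argument.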

Value

A list with the following syntax:

list(
  Name = "string"
)

Request syntax

svc$create_profile_job(
  DatasetName = "string",
  EncryptionKeyArn = "string",
  EncryptionMode = "SSE-KMS"|"SSE-S3",
  Name = "string",
  LogSubscription = "ENABLE"|"DISABLE",
  MaxCapacity = 123,
  MaxRetries = 123,
  OutputLocation = list(
    Bucket = "string",
    Key = "string",
    BucketOwner = "string"
  ),
  Configuration = list(
    DatasetStatisticsConfiguration = list(
      IncludedStatistics = list(
        "string"
      ),
      Overrides = list(
        list(
          Statistic = "string",
          Parameters = list(
            "string"
          )
        )
      )
    ),
    ProfileColumns = list(
      list(
        Regex = "string",
        Name = "string"
      )
    ),
    ColumnStatisticsConfigurations = list(
      list(
        Selectors = list(
          list(
            Regex = "string",
            Name = "string"
          )
        ),
        Statistics = list(
          IncludedStatistics = list(
            "string"
          ),
          Overrides = list(
            list(
              Statistic = "string",
              Parameters = list(
                "string"
              )
            )
          )
        )
      )
    ),
    EntityDetectorConfiguration = list(
      EntityTypes = list(
        "string"
      ),
      AllowedStatistics = list(
        list(
          Statistics = list(
            "string"
          )
        )
      )
    )
  ),
  ValidationConfigurations = list(
    list(
      RulesetArn = "string",
      ValidationMode = "CHECK_ALL"
    )
  ),
  RoleArn = "string",
  Tags = list(
    "string"
  ),
  Timeout = 123,
  JobSample = list(
    Mode = "FULL_DATASET"|"CUSTOM_ROWS",
    Size = 123
  )
)
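A minimal call needs only the arguments marked [required]. The sketch below assembles them as a named list and checks that none is missing. Every value (dataset, bucket, key, role ARN) is a placeholder, and the live call is guarded by a hypothetical `run_live` flag so the snippet can run without AWS credentials.

```r
# All values are placeholders, not real resources.
args <- list(
  DatasetName = "example-dataset",
  Name = "example-profile-job",
  OutputLocation = list(
    Bucket = "example-bucket",
    Key = "profiles/"
  ),
  RoleArn = "arn:aws:iam::123456789012:role/ExampleDataBrewRole",
  JobSample = list(Mode = "CUSTOM_ROWS", Size = 20000)
)

# Confirm every argument marked [required] is present before calling.
required <- c("DatasetName", "Name", "OutputLocation", "RoleArn")
missing_args <- setdiff(required, names(args))
stopifnot(length(missing_args) == 0)

# Hypothetical opt-in flag: set to TRUE only with valid AWS credentials.
run_live <- FALSE
if (run_live && requireNamespace("paws", quietly = TRUE)) {
  svc <- paws::gluedatabrew()
  resp <- do.call(svc$create_profile_job, args)
  print(resp$Name)  # the Value section lists Name as the returned field
}
```

Building the arguments as a list and passing them with `do.call` keeps optional arguments easy to add or drop without editing the call itself.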