Start Data Quality Ruleset Evaluation Run

glue_start_data_quality_ruleset_evaluation_run R Documentation

Description

Once you have a ruleset definition (either recommended or your own), call this operation to evaluate the ruleset against a data source (a Glue table). The evaluation computes results, which you can retrieve with the get_data_quality_result operation.

Usage

glue_start_data_quality_ruleset_evaluation_run(DataSource, Role,
  NumberOfWorkers, Timeout, ClientToken, AdditionalRunOptions,
  RulesetNames, AdditionalDataSources)

Arguments

DataSource

[required] The data source (Glue table) associated with this run.

Role

[required] An IAM role supplied to encrypt the results of the run.

NumberOfWorkers

The number of G.1X workers to be used in the run. The default is 5.

Timeout

The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

ClientToken

Used for idempotency. Set it to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.

AdditionalRunOptions

Additional run options you can specify for an evaluation run.

RulesetNames

[required] A list of ruleset names.

AdditionalDataSources

A map of reference strings to additional data sources you can specify for an evaluation run.
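
As an illustration of the argument shapes described above, the following sketch builds values for DataSource, ClientToken, and AdditionalDataSources in plain R without calling AWS. The database, table, and reference names (`sales_db`, `orders`, `customers`, `customers_ref`) are hypothetical placeholders. It assumes that a map argument is passed as a named list, where each name is the reference string.

```r
# Primary data source: a single Glue table (names are placeholders).
data_source <- list(
  GlueTable = list(
    DatabaseName = "sales_db",
    TableName = "orders"
  )
)

# Random 32-character hex ID to use as ClientToken. This is not an
# RFC 4122 UUID; uuid::UUIDgenerate() is an alternative if the uuid
# package is installed.
client_token <- paste(
  sample(c(0:9, letters[1:6]), 32, replace = TRUE),
  collapse = ""
)

# Additional data sources: each name is the reference string, and each
# value has the same shape as DataSource (assumed named-list encoding).
additional_data_sources <- list(
  customers_ref = list(
    GlueTable = list(
      DatabaseName = "sales_db",
      TableName = "customers"
    )
  )
)
```

Reusing the same `client_token` when retrying a request is what makes the call idempotent, so generate it once per logical run rather than once per attempt.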

Value

A list with the following syntax:

list(
  RunId = "string"
)
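
The returned RunId identifies the run but carries no results. A minimal sketch of waiting for the run to finish follows. `wait_for_run` is a hypothetical helper, not part of the package. It assumes the client exposes `get_data_quality_ruleset_evaluation_run(RunId = ...)` and that the response contains a `Status` field whose terminal values are `SUCCEEDED`, `FAILED`, `STOPPED`, and `TIMEOUT`. Check both assumptions against the Glue API reference before relying on them.

```r
# Hypothetical helper: poll a run until it reaches a terminal status.
# `svc` is a Glue client (for example, one created with paws::glue()).
wait_for_run <- function(svc, run_id, poll_seconds = 30, max_polls = 120) {
  # Assumed set of terminal statuses.
  terminal <- c("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT")
  for (i in seq_len(max_polls)) {
    run <- svc$get_data_quality_ruleset_evaluation_run(RunId = run_id)
    if (run$Status %in% terminal) {
      return(run)
    }
    Sys.sleep(poll_seconds)
  }
  stop("Run ", run_id, " did not finish after ", max_polls, " polls")
}
```

With a real client, this could be used as `run <- wait_for_run(svc, resp$RunId)`, where `resp` is the list returned by this operation. The result IDs in the run description (assumed to be in `run$ResultIds`) can then be passed to get_data_quality_result.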

Request syntax

svc$start_data_quality_ruleset_evaluation_run(
  DataSource = list(
    GlueTable = list(
      DatabaseName = "string",
      TableName = "string",
      CatalogId = "string",
      ConnectionName = "string",
      AdditionalOptions = list(
        "string"
      )
    )
  ),
  Role = "string",
  NumberOfWorkers = 123,
  Timeout = 123,
  ClientToken = "string",
  AdditionalRunOptions = list(
    CloudWatchMetricsEnabled = TRUE|FALSE,
    ResultsS3Prefix = "string",
    CompositeRuleEvaluationMethod = "COLUMN"|"ROW"
  ),
  RulesetNames = list(
    "string"
  ),
  AdditionalDataSources = list(
    list(
      GlueTable = list(
        DatabaseName = "string",
        TableName = "string",
        CatalogId = "string",
        ConnectionName = "string",
        AdditionalOptions = list(
          "string"
        )
      )
    )
  )
)
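
Examples

The request syntax above can be filled in as follows. This is a sketch under stated assumptions, not a definitive invocation. The function name `start_orders_evaluation`, the role ARN, the bucket, and the database, table, and ruleset names are all hypothetical placeholders. Only the three required arguments and a few optional ones are set. The call is wrapped in a function so that nothing is sent to AWS until it is invoked with a real client.

```r
# Hypothetical wrapper: start an evaluation run for one Glue table.
# `svc` is a Glue client (for example, one created with paws::glue()).
start_orders_evaluation <- function(svc) {
  svc$start_data_quality_ruleset_evaluation_run(
    DataSource = list(
      GlueTable = list(
        DatabaseName = "sales_db",   # placeholder
        TableName = "orders"         # placeholder
      )
    ),
    # Placeholder role identifier.
    Role = "arn:aws:iam::123456789012:role/GlueDataQualityRole",
    NumberOfWorkers = 5,             # the documented default
    Timeout = 60,                    # minutes
    AdditionalRunOptions = list(
      CloudWatchMetricsEnabled = TRUE,
      ResultsS3Prefix = "s3://example-bucket/dq-results/"  # placeholder
    ),
    RulesetNames = list("orders_ruleset")  # placeholder
  )
}

# With AWS credentials configured:
# svc <- paws::glue()
# resp <- start_orders_evaluation(svc)
# resp$RunId
```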