Start Data Quality Ruleset Evaluation Run
glue_start_data_quality_ruleset_evaluation_run | R Documentation |
Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (Glue table)
Description
Once you have a ruleset definition (either recommended or your own), you
call this operation to evaluate the ruleset against a data source (Glue
table). The evaluation computes results which you can retrieve with the
get_data_quality_result API.
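As a rough sketch of that flow, the function below starts a run, polls it until it finishes, and then fetches each result. It assumes `svc` is a Glue client (for example one created with `paws::glue()`), that `get_data_quality_ruleset_evaluation_run` reports a `Status` and a list of `ResultIds` for the run, and that the terminal statuses are the ones listed in the code. The helper name `run_and_fetch_results` and its `poll_seconds` argument are hypothetical and exist only for illustration.

```r
# Hypothetical helper: start an evaluation run, wait for it, fetch results.
# `svc` is assumed to be a Glue client, e.g. svc <- paws::glue().
run_and_fetch_results <- function(svc, database, table, role, rulesets,
                                  poll_seconds = 30) {
  run <- svc$start_data_quality_ruleset_evaluation_run(
    DataSource = list(
      GlueTable = list(DatabaseName = database, TableName = table)
    ),
    Role = role,
    RulesetNames = as.list(rulesets)
  )

  # Assumed terminal statuses for an evaluation run.
  terminal <- c("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT")
  repeat {
    status <- svc$get_data_quality_ruleset_evaluation_run(RunId = run$RunId)
    if (status$Status %in% terminal) break
    Sys.sleep(poll_seconds)
  }
  if (status$Status != "SUCCEEDED") {
    stop("Evaluation run ended with status ", status$Status)
  }

  # Fetch every result the run produced.
  lapply(status$ResultIds, function(id) {
    svc$get_data_quality_result(ResultId = id)
  })
}
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object before pointing it at a real account.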
Usage
glue_start_data_quality_ruleset_evaluation_run(DataSource, Role,
  NumberOfWorkers, Timeout, ClientToken, AdditionalRunOptions,
  RulesetNames, AdditionalDataSources)
Arguments
DataSource
[required] The data source (Glue table) associated with this run.
Role
[required] An IAM role supplied to encrypt the results of the run.
NumberOfWorkers
The number of G.1X workers to be used in the run. The default is 5.
Timeout
The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
ClientToken
Used for idempotency. Setting it to a random ID (such as a UUID) is recommended to avoid creating or starting multiple instances of the same resource.
AdditionalRunOptions
Additional run options you can specify for an evaluation run.
RulesetNames
[required] A list of ruleset names.
AdditionalDataSources
A map of reference strings to additional data sources you can specify for an evaluation run.
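One way to put these arguments together is to assemble them as a plain list and pass that list to the operation with `do.call`. The sketch below is illustrative only: `build_evaluation_args`, `reference_tables`, and the database, table, role, and ruleset names are hypothetical, the random hex string is a stand-in for a real UUID, and it assumes the map for `AdditionalDataSources` is passed as a named list.

```r
# Hypothetical helper that assembles the arguments described above.
build_evaluation_args <- function(database, table, role, rulesets,
                                  reference_tables = list(),
                                  workers = 5, timeout_minutes = 2880) {
  glue_table <- function(db, tbl) {
    list(GlueTable = list(DatabaseName = db, TableName = tbl))
  }
  # Random 32-character hex string used as an idempotency token
  # (a stand-in for a real UUID).
  token <- paste(sample(c(0:9, letters[1:6]), 32, replace = TRUE),
                 collapse = "")
  args <- list(
    DataSource = glue_table(database, table),
    Role = role,
    NumberOfWorkers = workers,
    Timeout = timeout_minutes,
    ClientToken = token,
    RulesetNames = as.list(rulesets)
  )
  # Assumption: the map of reference strings is passed as a named list.
  if (length(reference_tables) > 0) {
    args$AdditionalDataSources <- lapply(
      reference_tables,
      function(x) glue_table(x[["database"]], x[["table"]])
    )
  }
  args
}

args <- build_evaluation_args(
  database = "sales_db", table = "orders",
  role = "arn:aws:iam::123456789012:role/ExampleGlueRole",
  rulesets = "orders_ruleset",
  reference_tables = list(
    customers = list(database = "sales_db", table = "customers")
  )
)
# With a Glue client `svc` (e.g. from paws::glue()), the call would be:
# run <- do.call(svc$start_data_quality_ruleset_evaluation_run, args)
```

Reusing the same `ClientToken` when retrying a failed request is what makes the retry idempotent, so in practice the token would be generated once per intended run rather than once per attempt.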
Value
A list with the following syntax:
list(
  RunId = "string"
)
Request syntax
svc$start_data_quality_ruleset_evaluation_run(
  DataSource = list(
    GlueTable = list(
      DatabaseName = "string",
      TableName = "string",
      CatalogId = "string",
      ConnectionName = "string",
      AdditionalOptions = list(
        "string"
      )
    )
  ),
  Role = "string",
  NumberOfWorkers = 123,
  Timeout = 123,
  ClientToken = "string",
  AdditionalRunOptions = list(
    CloudWatchMetricsEnabled = TRUE|FALSE,
    ResultsS3Prefix = "string",
    CompositeRuleEvaluationMethod = "COLUMN"|"ROW"
  ),
  RulesetNames = list(
    "string"
  ),
  AdditionalDataSources = list(
    list(
      GlueTable = list(
        DatabaseName = "string",
        TableName = "string",
        CatalogId = "string",
        ConnectionName = "string",
        AdditionalOptions = list(
          "string"
        )
      )
    )
  )
)