Start Data Quality Ruleset Evaluation Run
| glue_start_data_quality_ruleset_evaluation_run | R Documentation |
Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (Glue table)¶
Description¶
Once you have a ruleset definition (either recommended or your own), you
call this operation to evaluate the ruleset against a data source (Glue
table). The evaluation computes results which you can retrieve with the
get_data_quality_result API.
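The start, wait, and retrieve flow described above can be sketched as follows. This is a minimal sketch, not the package's own example: `run_ruleset_evaluation()` is a hypothetical helper, `svc` is assumed to be a Glue client such as the one returned by `paws::glue()`, and the polling interval is an arbitrary choice. It assumes the start call returns a `RunId` and that `get_data_quality_ruleset_evaluation_run` reports a `Status` and a list of `ResultIds`.

```r
# Hypothetical helper: start a run, poll until it finishes, fetch the results.
# `svc` is a Glue client (for example paws::glue()); all names are placeholders.
run_ruleset_evaluation <- function(svc, database, table, role, rulesets,
                                   poll_seconds = 30) {
  started <- svc$start_data_quality_ruleset_evaluation_run(
    DataSource = list(
      GlueTable = list(DatabaseName = database, TableName = table)
    ),
    Role = role,
    RulesetNames = as.list(rulesets)
  )

  # Poll until the run leaves the in-progress states.
  repeat {
    run <- svc$get_data_quality_ruleset_evaluation_run(RunId = started$RunId)
    if (!run$Status %in% c("STARTING", "RUNNING", "STOPPING")) break
    Sys.sleep(poll_seconds)
  }
  if (run$Status != "SUCCEEDED") {
    stop("Evaluation run ended with status ", run$Status)
  }

  # Fetch every result referenced by the finished run.
  lapply(run$ResultIds, function(id) svc$get_data_quality_result(ResultId = id))
}
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object before pointing it at a real account.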
Usage¶
glue_start_data_quality_ruleset_evaluation_run(DataSource, Role,
  NumberOfWorkers, Timeout, ClientToken, AdditionalRunOptions,
  RulesetNames, AdditionalDataSources)
Arguments¶
| DataSource | [required] The data source (Glue table) associated with this run. |
| Role | [required] An IAM role supplied to encrypt the results of the run. |
| NumberOfWorkers | The number of G.1X workers to be used in the run. The default is 5. |
| Timeout | The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours). |
| ClientToken | Used for idempotency. Setting it to a random ID (such as a UUID) is recommended to avoid creating or starting multiple instances of the same resource. |
| AdditionalRunOptions | Additional run options you can specify for an evaluation run. |
| RulesetNames | [required] A list of ruleset names. |
| AdditionalDataSources | A map of reference strings to additional data sources you can specify for an evaluation run. |
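As an illustration of how the required and optional arguments fit together, the sketch below assembles an argument list using the documented defaults (5 workers, 2,880 minutes). `build_run_args()` is a hypothetical helper, not part of paws, and the database, table, role, ruleset, and token values are placeholders. In practice the token could come from a UUID generator such as `uuid::UUIDgenerate()`.

```r
# Hypothetical helper: build the argument list for the start call.
build_run_args <- function(database, table, role, rulesets,
                           workers = 5, timeout = 2880, client_token = NULL) {
  # Basic checks on the required arguments and numeric settings.
  stopifnot(
    is.character(role), length(role) == 1, nzchar(role),
    length(rulesets) >= 1,
    workers >= 1, timeout >= 1
  )
  args <- list(
    DataSource = list(
      GlueTable = list(DatabaseName = database, TableName = table)
    ),
    Role = role,
    NumberOfWorkers = workers,
    Timeout = timeout,
    RulesetNames = as.list(rulesets)
  )
  # Only send ClientToken when the caller supplies one.
  if (!is.null(client_token)) args$ClientToken <- client_token
  args
}

args <- build_run_args(
  database = "sales_db",                                     # placeholder
  table = "orders",                                          # placeholder
  role = "arn:aws:iam::123456789012:role/ExampleGlueRole",   # placeholder
  rulesets = "orders_ruleset",                               # placeholder
  client_token = "2f1c7d0e-5b6a-4c3d-9e8f-0a1b2c3d4e5f"      # placeholder UUID
)
```

With a Glue client in hand, the list can be passed on with `do.call(svc$start_data_quality_ruleset_evaluation_run, args)`. Reusing the same `client_token` on a retry is what makes the request idempotent.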
Value¶
A list with the following syntax:
list(
  RunId = "string"
)
Request syntax¶
svc$start_data_quality_ruleset_evaluation_run(
  DataSource = list(
    GlueTable = list(
      DatabaseName = "string",
      TableName = "string",
      CatalogId = "string",
      ConnectionName = "string",
      AdditionalOptions = list(
        "string"
      )
    )
  ),
  Role = "string",
  NumberOfWorkers = 123,
  Timeout = 123,
  ClientToken = "string",
  AdditionalRunOptions = list(
    CloudWatchMetricsEnabled = TRUE|FALSE,
    ResultsS3Prefix = "string",
    CompositeRuleEvaluationMethod = "COLUMN"|"ROW"
  ),
  RulesetNames = list(
    "string"
  ),
  AdditionalDataSources = list(
    list(
      GlueTable = list(
        DatabaseName = "string",
        TableName = "string",
        CatalogId = "string",
        ConnectionName = "string",
        AdditionalOptions = list(
          "string"
        )
      )
    )
  )
)
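The nested optional arguments in the request syntax can be easier to read with concrete values. The sketch below fills in `AdditionalRunOptions` and `AdditionalDataSources`. Every name, path, and ARN is a placeholder, and it assumes that a map argument is written in R as a named list whose names are the reference strings. The call itself is left commented out because it needs AWS credentials.

```r
# Optional run settings; the S3 prefix is a placeholder.
run_options <- list(
  CloudWatchMetricsEnabled = TRUE,
  ResultsS3Prefix = "s3://example-bucket/dq-results/",
  CompositeRuleEvaluationMethod = "ROW"  # the other allowed value is "COLUMN"
)

# Map of reference string -> data source, written as a named list.
# "reference" is an arbitrary key chosen for this example.
extra_sources <- list(
  reference = list(
    GlueTable = list(
      DatabaseName = "sales_db",
      TableName = "orders_reference"
    )
  )
)

# Not run: requires AWS credentials and existing Glue resources.
# svc <- paws::glue()
# svc$start_data_quality_ruleset_evaluation_run(
#   DataSource = list(
#     GlueTable = list(DatabaseName = "sales_db", TableName = "orders")
#   ),
#   Role = "arn:aws:iam::123456789012:role/ExampleGlueRole",
#   RulesetNames = list("orders_ruleset"),
#   AdditionalRunOptions = run_options,
#   AdditionalDataSources = extra_sources
# )
```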