
Update Crawler

glue_update_crawler R Documentation

Updates a crawler

Description

Updates a crawler. If a crawler is running, you must stop it using stop_crawler before updating it.
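
For example, with the paws Glue client you can check the crawler's state, stop it if it is running, and apply the update once it is ready. This is a minimal sketch: the crawler name, polling interval, and new description are illustrative placeholders.

library(paws)

svc <- glue()

crawler_name <- "my-crawler"  # illustrative name

# Stop the crawler if it is currently running
state <- svc$get_crawler(Name = crawler_name)$Crawler$State
if (identical(state, "RUNNING")) {
  svc$stop_crawler(Name = crawler_name)
}

# Wait until the crawler is ready before updating it
while (svc$get_crawler(Name = crawler_name)$Crawler$State != "READY") {
  Sys.sleep(10)
}

svc$update_crawler(
  Name = crawler_name,
  Description = "Nightly crawl of the sales data lake"  # illustrative
)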

Usage

glue_update_crawler(Name, Role, DatabaseName, Description, Targets,
  Schedule, Classifiers, TablePrefix, SchemaChangePolicy, RecrawlPolicy,
  LineageConfiguration, LakeFormationConfiguration, Configuration,
  CrawlerSecurityConfiguration)

Arguments

Name

[required] Name of the crawler to update.

Role

The IAM role name or Amazon Resource Name (ARN) of the IAM role used by the crawler to access customer resources.

DatabaseName

The Glue database where results are stored, such as: arn:aws:daylight:us-east-1::database/sometable/*.

Description

A description of the crawler.

Targets

A list of targets to crawl.

Schedule

A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *). A sketch passing this value to the call appears after this argument list.

Classifiers

A list of custom classifiers that the user has registered. By default, all built-in classifiers are included in a crawl, but these custom classifiers always override the default classifiers for a given classification.

TablePrefix

The table prefix used for catalog tables that are created.

SchemaChangePolicy

The policy for the crawler's update and deletion behavior.

RecrawlPolicy

A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.

LineageConfiguration

Specifies data lineage configuration settings for the crawler.

LakeFormationConfiguration

Specifies Lake Formation configuration settings for the crawler.

Configuration

Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see Setting crawler configuration options; a sketch setting this argument appears after this argument list.

CrawlerSecurityConfiguration

The name of the SecurityConfiguration structure to be used by this crawler.
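
As a sketch of how Schedule and Configuration fit together, the call below sets a daily schedule and a versioned configuration string. The crawler name is a placeholder, and the JSON shown is adapted from the documented crawler configuration options (partition settings inherit from the table); adjust both to your own setup.

# Sketch: the crawler name and configuration JSON are illustrative
svc$update_crawler(
  Name = "my-crawler",
  Schedule = "cron(15 12 * * ? *)",
  Configuration = paste0(
    '{"Version":1.0,',
    '"CrawlerOutput":{"Partitions":{"AddOrUpdateBehavior":"InheritFromTable"}}}'
  )
)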

Value

An empty list.

Request syntax

svc$update_crawler(
  Name = "string",
  Role = "string",
  DatabaseName = "string",
  Description = "string",
  Targets = list(
    S3Targets = list(
      list(
        Path = "string",
        Exclusions = list(
          "string"
        ),
        ConnectionName = "string",
        SampleSize = 123,
        EventQueueArn = "string",
        DlqEventQueueArn = "string"
      )
    ),
    JdbcTargets = list(
      list(
        ConnectionName = "string",
        Path = "string",
        Exclusions = list(
          "string"
        ),
        EnableAdditionalMetadata = list(
          "COMMENTS"|"RAWTYPES"
        )
      )
    ),
    MongoDBTargets = list(
      list(
        ConnectionName = "string",
        Path = "string",
        ScanAll = TRUE|FALSE
      )
    ),
    DynamoDBTargets = list(
      list(
        Path = "string",
        scanAll = TRUE|FALSE,
        scanRate = 123.0
      )
    ),
    CatalogTargets = list(
      list(
        DatabaseName = "string",
        Tables = list(
          "string"
        ),
        ConnectionName = "string",
        EventQueueArn = "string",
        DlqEventQueueArn = "string"
      )
    ),
    DeltaTargets = list(
      list(
        DeltaTables = list(
          "string"
        ),
        ConnectionName = "string",
        WriteManifest = TRUE|FALSE,
        CreateNativeDeltaTable = TRUE|FALSE
      )
    ),
    IcebergTargets = list(
      list(
        Paths = list(
          "string"
        ),
        ConnectionName = "string",
        Exclusions = list(
          "string"
        ),
        MaximumTraversalDepth = 123
      )
    ),
    HudiTargets = list(
      list(
        Paths = list(
          "string"
        ),
        ConnectionName = "string",
        Exclusions = list(
          "string"
        ),
        MaximumTraversalDepth = 123
      )
    )
  ),
  Schedule = "string",
  Classifiers = list(
    "string"
  ),
  TablePrefix = "string",
  SchemaChangePolicy = list(
    UpdateBehavior = "LOG"|"UPDATE_IN_DATABASE",
    DeleteBehavior = "LOG"|"DELETE_FROM_DATABASE"|"DEPRECATE_IN_DATABASE"
  ),
  RecrawlPolicy = list(
    RecrawlBehavior = "CRAWL_EVERYTHING"|"CRAWL_NEW_FOLDERS_ONLY"|"CRAWL_EVENT_MODE"
  ),
  LineageConfiguration = list(
    CrawlerLineageSettings = "ENABLE"|"DISABLE"
  ),
  LakeFormationConfiguration = list(
    UseLakeFormationCredentials = TRUE|FALSE,
    AccountId = "string"
  ),
  Configuration = "string",
  CrawlerSecurityConfiguration = "string"
)
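
A fuller call might update the crawl targets, schema change policy, and recrawl policy together. The sketch below is illustrative, not service output: the crawler name, role ARN, database name, and S3 path are placeholders.

svc <- paws::glue()

# Sketch: every name, ARN, and path below is an illustrative placeholder
svc$update_crawler(
  Name = "my-crawler",
  Role = "arn:aws:iam::123456789012:role/GlueCrawlerRole",
  DatabaseName = "sales_db",
  Targets = list(
    S3Targets = list(
      list(
        Path = "s3://example-bucket/sales/",
        Exclusions = list("**/_temporary/**")
      )
    )
  ),
  SchemaChangePolicy = list(
    UpdateBehavior = "UPDATE_IN_DATABASE",
    DeleteBehavior = "LOG"
  ),
  RecrawlPolicy = list(
    RecrawlBehavior = "CRAWL_EVERYTHING"
  )
)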