Create Data Source From S3
machinelearning_create_data_source_from_s3 | R Documentation |
Creates a DataSource object
Description
Creates a DataSource object. A DataSource references data that can be used to perform create_ml_model, create_evaluation, or create_batch_prediction operations.
create_data_source_from_s3 is an asynchronous operation. In response to create_data_source_from_s3, Amazon Machine Learning (Amazon ML) immediately returns and sets the DataSource status to PENDING. After the DataSource has been created and is ready for use, Amazon ML sets the Status parameter to COMPLETED. A DataSource in the COMPLETED or PENDING state can be used only to perform create_ml_model, create_evaluation, or create_batch_prediction operations.
If Amazon ML can't accept the input source, it sets the Status parameter to FAILED and includes an error message in the Message attribute of the get_data_source operation response.
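Because the operation is asynchronous, a caller usually has to poll get_data_source until the status settles. The following is a minimal sketch of such a loop, not part of the package: the helper name wait_for_data_source, the polling interval, and the attempt limit are assumptions chosen for illustration, and svc is assumed to be a client object that exposes get_data_source (for example, one created with paws::machinelearning()).

```r
# Poll get_data_source until the DataSource is COMPLETED or FAILED.
# `svc` is any object exposing get_data_source(), e.g. paws::machinelearning().
# The helper name and the default timings are illustrative assumptions.
wait_for_data_source <- function(svc, data_source_id,
                                 interval_secs = 30, max_attempts = 40) {
  for (attempt in seq_len(max_attempts)) {
    resp <- svc$get_data_source(DataSourceId = data_source_id)
    status <- resp$Status
    if (identical(status, "COMPLETED")) {
      return(invisible(resp))
    }
    if (identical(status, "FAILED")) {
      # The Message attribute carries the error reported by Amazon ML.
      stop("DataSource ", data_source_id, " failed: ", resp$Message,
           call. = FALSE)
    }
    # Any other status (such as PENDING) is treated as "not finished yet".
    Sys.sleep(interval_secs)
  }
  stop("Timed out waiting for DataSource ", data_source_id, call. = FALSE)
}

# Usage (requires AWS credentials; the identifier is a placeholder):
# svc <- paws::machinelearning()
# ds <- wait_for_data_source(svc, "ds-example-001")
```

Raising an error on FAILED, rather than returning the response, keeps a failed input source from being passed silently to later training or evaluation steps.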
The observation data used in a DataSource should be ready to use; that is, it should have a consistent structure, and missing data values should be kept to a minimum. The observation data must reside in one or more .csv files in an Amazon Simple Storage Service (Amazon S3) location, along with a schema that describes the data items by name and type. The same schema must be used for all of the data files referenced by the DataSource.
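One way to check the "consistent structure" requirement before uploading is to confirm that every row in every file has the same number of fields. The sketch below uses only base R and works on local copies of the .csv files; the helper name check_csv_structure is an assumption, and the check covers column counts only, not the names or types that the schema describes.

```r
# Check that all rows across a set of local CSV files have the same
# number of comma-separated fields. Illustrative helper, not a paws function.
check_csv_structure <- function(paths) {
  counts <- lapply(paths, function(p) {
    # count.fields returns one field count per line of the file.
    unique(count.fields(p, sep = ",", quote = "\"", comment.char = ""))
  })
  widths <- unique(unlist(counts))
  list(consistent = length(widths) == 1, widths = widths)
}

# Usage (file names are placeholders):
# check_csv_structure(c("part-0001.csv", "part-0002.csv"))$consistent
```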
After the DataSource has been created, it's ready to use in evaluations and batch predictions. If you plan to use the DataSource to train an MLModel, the DataSource also needs a recipe. A recipe describes how each input variable will be used in training an MLModel. Will the variable be included or excluded from training? Will the variable be manipulated, for example by being combined with another variable or split apart into word combinations? The recipe provides answers to these questions.
Usage
machinelearning_create_data_source_from_s3(DataSourceId, DataSourceName,
  DataSpec, ComputeStatistics)
Arguments
DataSourceId | [required] A user-supplied identifier that uniquely identifies the DataSource. |
DataSourceName | A user-supplied name or description of the DataSource. |
DataSpec | [required] The data specification of a DataSource. |
ComputeStatistics | The compute statistics for a DataSource. |
Value
A list with the following syntax:
list(
  DataSourceId = "string"
)
Request syntax
svc$create_data_source_from_s3(
  DataSourceId = "string",
  DataSourceName = "string",
  DataSpec = list(
    DataLocationS3 = "string",
    DataRearrangement = "string",
    DataSchema = "string",
    DataSchemaLocationS3 = "string"
  ),
  ComputeStatistics = TRUE|FALSE
)
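As an illustration of the request syntax, the sketch below wraps the call in a small helper that fills in DataSpec from two S3 locations. The helper name create_s3_data_source, the identifier, and the bucket paths are hypothetical; svc is assumed to be a client created with paws::machinelearning(), and the schema is assumed to be stored in S3 rather than passed inline through DataSchema.

```r
# Thin wrapper around create_data_source_from_s3 that builds the DataSpec
# list from a data location and a schema location. Illustrative helper only.
create_s3_data_source <- function(svc, id, name, data_location,
                                  schema_location,
                                  compute_statistics = TRUE) {
  svc$create_data_source_from_s3(
    DataSourceId = id,
    DataSourceName = name,
    DataSpec = list(
      DataLocationS3 = data_location,
      DataSchemaLocationS3 = schema_location
    ),
    ComputeStatistics = compute_statistics
  )
}

# Usage (requires the paws package and AWS credentials; all values below
# are placeholders):
# svc <- paws::machinelearning()
# resp <- create_s3_data_source(
#   svc,
#   id = "ds-example-001",
#   name = "Example training data",
#   data_location = "s3://example-bucket/data/",
#   schema_location = "s3://example-bucket/data.csv.schema"
# )
# resp$DataSourceId
```

The call returns as soon as the request is accepted, so the returned DataSourceId identifies a DataSource that may still be in the PENDING state.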