Create Data Source from S3
| machinelearning_create_data_source_from_s3 | R Documentation |
Creates a DataSource object¶
Description¶
Creates a DataSource object. A DataSource references data that can
be used to perform create_ml_model, create_evaluation, or
create_batch_prediction operations.
create_data_source_from_s3 is an asynchronous operation. In response
to create_data_source_from_s3, Amazon Machine Learning (Amazon ML)
immediately returns and sets the DataSource status to PENDING. After
the DataSource has been created and is ready for use, Amazon ML sets
the Status parameter to COMPLETED. DataSource in the COMPLETED
or PENDING state can be used to perform only create_ml_model,
create_evaluation or create_batch_prediction operations.
If Amazon ML can't accept the input source, it sets the Status
parameter to FAILED and includes an error message in the Message
attribute of the get_data_source operation response.
The observation data used in a DataSource should be ready to use; that
is, it should have a consistent structure, and missing data values
should be kept to a minimum. The observation data must reside in one or
more .csv files in an Amazon Simple Storage Service (Amazon S3)
location, along with a schema that describes the data items by name and
type. The same schema must be used for all of the data files referenced
by the DataSource.
After the DataSource has been created, it's ready to use in
evaluations and batch predictions. If you plan to use the DataSource
to train an MLModel, the DataSource also needs a recipe. A recipe
describes how each input variable will be used in training an MLModel.
Will the variable be included or excluded from training? Will the
variable be manipulated; for example, will it be combined with another
variable or will it be split apart into word combinations? The recipe
provides answers to these questions.
Usage¶
machinelearning_create_data_source_from_s3(DataSourceId, DataSourceName,
DataSpec, ComputeStatistics)
Arguments¶
DataSourceId[required] A user-supplied identifier that uniquely identifies the
DataSource.DataSourceNameA user-supplied name or description of the
DataSource.DataSpec[required] The data specification of a
DataSource:DataLocationS3 - The Amazon S3 location of the observation data.
DataSchemaLocationS3 - The Amazon S3 location of the
DataSchema.DataSchema - A JSON string representing the schema. This is not required if
DataSchemaUriis specified.DataRearrangement - A JSON string that represents the splitting and rearrangement requirements for the
Datasource.Sample -
"{\"splitting\":{\"percentBegin\":10,\"percentEnd\":60}}"
ComputeStatisticsThe compute statistics for a
DataSource. The statistics are generated from the observation data referenced by aDataSource. Amazon ML uses the statistics internally duringMLModeltraining. This parameter must be set totrueif theDataSourceneeds to be used forMLModeltraining.
Value¶
A list with the following syntax: