@sengstacken
Last active November 24, 2020 14:23
How to upload and prepare training data for SageMaker
import sagemaker
from sagemaker.inputs import TrainingInput
s3_bucket = 'ENTER BUCKET NAME'
sagemaker_session = sagemaker.Session()
# Upload the local data folders to S3
sagemaker_session.upload_data(path='val', bucket=s3_bucket, key_prefix='data/val_annotation')
sagemaker_session.upload_data(path='test', bucket=s3_bucket, key_prefix='data/test_annotation')
sagemaker_session.upload_data(path='train', bucket=s3_bucket, key_prefix='data/train_annotation')
sagemaker_session.upload_data(path='trainaug', bucket=s3_bucket, key_prefix='data/trainaug_annotation')
# Define Training Input
s3_train_data = f's3://{s3_bucket}/data/val_rec'
train_data = TrainingInput(
    s3_train_data, distribution='FullyReplicated', content_type='application/x-recordio',
    s3_data_type='S3Prefix', input_mode='Pipe',
    shuffle_config=sagemaker.inputs.ShuffleConfig(1234))
s3_val_data = f's3://{s3_bucket}/data/test_rec'
val_data = TrainingInput(
    s3_val_data, distribution='FullyReplicated', content_type='application/x-recordio',
    s3_data_type='S3Prefix', input_mode='Pipe')
# model data for incremental training
# S3 path of the saved model artifact (model.tar.gz); the .model_data attribute
# of a recently trained estimator can be used here if needed
s3_model_data = 's3 path'
model_data = TrainingInput(
    s3_model_data, distribution='FullyReplicated', content_type='application/x-sagemaker-model',
    s3_data_type='S3Prefix', input_mode='File')
data_channels = {'train': train_data, 'validation': val_data, 'model': model_data}
# output path
s3_output_location = f's3://{s3_bucket}/output'
# checkpoint path
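
The snippet above stops once the channels and output path are defined. As a rough sketch of the next step, the channel dictionary could be handed to an estimator's `fit()` call. The helper name `start_training` and the `REQUIRED_CHANNELS` tuple are hypothetical and not part of the original gist; which channels are actually mandatory depends on the algorithm being trained, so treat that check as an assumption.

```python
from typing import Any, Mapping

# Assumption: the training job needs at least these two channels.
# The 'model' channel is only relevant for incremental training.
REQUIRED_CHANNELS = ("train", "validation")


def start_training(estimator: Any, channels: Mapping[str, Any]) -> None:
    """Check that the expected channels are present, then call estimator.fit().

    `estimator` is expected to behave like a SageMaker estimator, i.e. to
    expose fit(inputs=...). `channels` maps channel names to inputs such as
    the TrainingInput objects built above.
    """
    missing = [name for name in REQUIRED_CHANNELS if name not in channels]
    if missing:
        raise ValueError(f"missing data channels: {missing}")
    estimator.fit(inputs=channels)
```

With a configured estimator (for example a `sagemaker.estimator.Estimator` created with `output_path=s3_output_location`), the call would look like `start_training(estimator, data_channels)`.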