@conzy
Created September 1, 2021 19:28
Example of using a Step Function to orchestrate a containerised SageMaker processing job
{
  "Processing": {
    "Resource": "arn:aws:states:::sagemaker:createProcessingJob.sync",
    "Parameters": {
      "ProcessingResources": {
        "ClusterConfig": {
          "InstanceCount": 1,
          "InstanceType": "ml.m5.large",
          "VolumeSizeInGB": 10
        }
      },
      "ProcessingInputs": [
        {
          "InputName": "input-1",
          "S3Input": {
            "S3Uri": "s3://some-bucket/input/stuff",
            "LocalPath": "/opt/ml/processing/input",
            "S3DataType": "S3Prefix",
            "S3InputMode": "File",
            "S3DataDistributionType": "FullyReplicated",
            "S3CompressionType": "None"
          }
        }
      ],
      "ProcessingOutputConfig": {
        "Outputs": [
          {
            "OutputName": "output",
            "S3Output": {
              "S3Uri": "s3://some-bucket/output/",
              "LocalPath": "/opt/ml/processing/output",
              "S3UploadMode": "EndOfJob"
            }
          }
        ]
      },
      "AppSpecification": {
        "ImageUri": "some-image-in-ecr"
      },
      "StoppingCondition": {
        "MaxRuntimeInSeconds": 300
      },
      "RoleArn": "${role}",
      "ProcessingJobName.$": "$$.Execution.Name"
    },
    "Type": "Task",
    "End": true
  }
}
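
The "${role}" placeholder is presumably filled in by whatever templating renders this definition (for example Terraform's templatefile), and "ProcessingJobName.$": "$$.Execution.Name" takes the processing job name from the Step Functions execution name in the context object, so each execution needs a unique name that also satisfies SageMaker's job-name constraints. A minimal sketch of starting an execution with boto3, assuming a hypothetical state machine ARN:

import json
import uuid

import boto3

sfn = boto3.client("stepfunctions")

# Hypothetical state machine ARN; substitute your own.
STATE_MACHINE_ARN = "arn:aws:states:eu-west-1:123456789012:stateMachine:processing-example"

# The execution name becomes the SageMaker processing job name via
# "ProcessingJobName.$": "$$.Execution.Name", so keep it unique and
# limited to alphanumerics and hyphens.
execution_name = f"processing-{uuid.uuid4().hex[:12]}"

response = sfn.start_execution(
    stateMachineArn=STATE_MACHINE_ARN,
    name=execution_name,
    input=json.dumps({}),  # nothing needed here; everything is in Parameters
)
print(response["executionArn"])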
conzy commented Sep 1, 2021

This will make the data in the S3 key s3://some-bucket/input/stuff available on the container filesystem at /opt/ml/processing/input.

There is no need to use boto to load the data or mess with mounts. You can write your output to /opt/ml/processing/output and it will be persisted to the S3 key s3://some-bucket/output/.

This is a nice approach as the runtime code does not need to "know about" S3 at all.
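
Inside the container, the script only ever touches local paths; SageMaker Processing stages the S3 input before it starts and uploads the output directory when it finishes. A minimal sketch of what such an entrypoint might look like, where the CSV layout and the transformation are purely hypothetical:

import pathlib

# SageMaker Processing places the S3Input here before the container starts.
INPUT_DIR = pathlib.Path("/opt/ml/processing/input")
# Anything written here is uploaded to the configured S3Output at end of job.
OUTPUT_DIR = pathlib.Path("/opt/ml/processing/output")


def main():
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    for path in INPUT_DIR.glob("*.csv"):
        # Hypothetical transformation: upper-case each input file.
        transformed = path.read_text().upper()
        (OUTPUT_DIR / path.name).write_text(transformed)


if __name__ == "__main__":
    main()

The script needs no AWS SDK, no credentials handling, and no knowledge of bucket names, which is exactly what makes this pattern tidy.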
