Created
September 1, 2021 19:28
Example of using a Step Function to orchestrate a containerised SageMaker processing job
{
  "Processing": {
    "Resource": "arn:aws:states:::sagemaker:createProcessingJob.sync",
    "Parameters": {
      "ProcessingResources": {
        "ClusterConfig": {
          "InstanceCount": 1,
          "InstanceType": "ml.m5.large",
          "VolumeSizeInGB": 10
        }
      },
      "ProcessingInputs": [
        {
          "InputName": "input-1",
          "S3Input": {
            "S3Uri": "s3://some-bucket/input/stuff",
            "LocalPath": "/opt/ml/processing/input",
            "S3DataType": "S3Prefix",
            "S3InputMode": "File",
            "S3DataDistributionType": "FullyReplicated",
            "S3CompressionType": "None"
          }
        }
      ],
      "ProcessingOutputConfig": {
        "Outputs": [
          {
            "OutputName": "output",
            "S3Output": {
              "S3Uri": "s3://some-bucket/output/",
              "LocalPath": "/opt/ml/processing/output",
              "S3UploadMode": "EndOfJob"
            }
          }
        ]
      },
      "AppSpecification": {
        "ImageUri": "some-image-in-ecr"
      },
      "StoppingCondition": {
        "MaxRuntimeInSeconds": 300
      },
      "RoleArn": "${role}",
      "ProcessingJobName.$": "$$.Execution.Name"
    },
    "Type": "Task",
    "End": true
  }
}
This will make the data in the S3 key s3://some-bucket/input/stuff available on the container filesystem at /opt/ml/processing/input. There is no need to use boto to load the data or mess with mounts. You can write your output to /opt/ml/processing/output and it will be persisted to the S3 key s3://some-bucket/output/.
This is a nice approach as the runtime code does not need to "know about" S3 at all.