Skip to content

Instantly share code, notes, and snippets.

@ranjitiyer
Created February 26, 2019 20:02
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save ranjitiyer/9568906c49bc0e58ff5d0428583baaaa to your computer and use it in GitHub Desktop.
Save ranjitiyer/9568906c49bc0e58ff5d0428583baaaa to your computer and use it in GitHub Desktop.
EMR Steps
Hadoop Streaming step
{
"jobFlowId": "j-1JKGNMYXT59DP",
"steps": [
{
"name": "EMRHadoopLogPushingTest",
"actionOnFailure": "CANCEL_AND_WAIT",
"hadoopJarStep": {
"jar": "command-runner.jar",
"args": [
"/usr/bin/hadoop-streaming",
"-Dmapred.child.java.opts=-Xmx1024m",
"-files",
"<s3 location to py file containing mapper and reduer>",
"-input",
"<s3 location to input file>",
"-output",
"<s3 location for output>",
"-mapper",
"<py file containing the mapper>",
"-reducer",
"aggregate"
]
}
}
],
"progressListener": {},
"requestClientOptions": {
"markers": {},
"readLimit": 131073
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment