@lenards
Created February 3, 2014 16:18
S3DistCp failure research
This page shows how to configure the memory for a Hadoop daemon, but does not cover the JobTracker:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/MemoryTuning.html
An example of how to configure the JobTracker:
https://github.com/commoncrawl/commoncrawl-crawler/blob/master/bin/launch_emr_parse_job.py#L80
Default memory settings by EC2 instance type:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/HadoopMemoryDefault_H1.0.3.html
## Problems
S3DistCp - "Child Error / Task process exit with nonzero status of 137"
It seems to be a memory issue, likely caused by the task trackers having more memory
allocated to them than is available on the node as a whole:
http://stackoverflow.com/questions/17090951/error-in-simple-hadoop-map-reduce
https://github.com/wbsg/ldif/wiki/Hadoop-troubleshooting
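To make the oversubscription idea concrete, here is a rough sketch of the arithmetic. The function name, slot counts, and heap sizes are illustrative assumptions, not values taken from the failing cluster or from the EMR defaults; substitute the numbers from your own configuration.

```python
def memory_budget_mb(map_slots, reduce_slots, child_heap_mb, daemon_heaps_mb, node_ram_mb):
    """Return (committed_mb, headroom_mb) for a single node.

    committed_mb is the heap the node could be asked for if every task
    slot is busy at once, plus the heaps of the daemons on that node.
    """
    committed = (map_slots + reduce_slots) * child_heap_mb + sum(daemon_heaps_mb)
    return committed, node_ram_mb - committed


# Hypothetical node: 7680 MB of RAM, 4 map and 2 reduce slots,
# 1536 MB child heap, two daemons with 512 MB each.
committed, headroom = memory_budget_mb(4, 2, 1536, [512, 512], 7680)
print(committed, headroom)  # 10240 -2560
```

A negative headroom means the node can be asked for more heap than it physically has. Heap size is only a lower bound on what a JVM actually uses, so the real margin is smaller still.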
On the AWS forums for EMR, one of the engineers says this:
> Exit code 137 means the process received SIGKILL (kill -9),
> which usually means it was selected as a victim by the Linux OOM killer.
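The 137 follows the common shell convention of reporting 128 plus the signal number when a process is killed by a signal, and SIGKILL is signal 9. A small sketch for decoding such a status is below; `describe_exit_status` is a hypothetical helper for illustration, not part of Hadoop, and it assumes a Linux host.

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status (128 + N means killed by signal N)."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {status}"


print(describe_exit_status(137))  # killed by SIGKILL
```

The status alone cannot say who sent the SIGKILL. Checking the node's kernel log (for example with `dmesg`) for OOM killer messages is one way to confirm the cause.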
Oversubscribed memory seems to be a common suspect for this problem; it is mentioned here too:
http://cloudcelebrity.wordpress.com/2013/08/21/hadoop-mapreduce-job-failure-with-java-io-ioexception-task-process-exit-with-nonzero-status-of-137/
The specific Java source code throwing the error can be found here:
https://github.com/apache/hadoop-common/blob/release-1.0.3/src/mapred/org/apache/hadoop/mapred/TaskRunner.java#L251-L260