S3DistCp failure research

Created February 3, 2014 16:18 (gist lenards/8786944)

This page shows that you can configure the memory for a daemon, but it doesn't show how to do it for the JobTracker:

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/MemoryTuning.html

How to configure the JobTracker:

https://github.com/commoncrawl/commoncrawl-crawler/blob/master/bin/launch_emr_parse_job.py#L80

Default memory settings by EC2 instance type:

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/HadoopMemoryDefault_H1.0.3.html

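
The pages above cover daemon heap sizes. In Hadoop 1.x, the per-task share of a worker's memory is governed by the slot counts `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum` and by the child JVM options in `mapred.child.java.opts`. A minimal Python sketch that renders such properties in the `<configuration>`/`<property>` XML layout used by Hadoop's `*-site.xml` files; the helper `hadoop_config_xml` and the values in `task_memory_settings` are hypothetical placeholders, not EMR defaults.

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(properties):
    """Render a dict of Hadoop property names/values as a Hadoop-style
    <configuration> document (the layout used by mapred-site.xml)."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical values for illustration only; real values should come from
# the instance-type defaults and the node's actual RAM.
task_memory_settings = {
    "mapred.tasktracker.map.tasks.maximum": 4,
    "mapred.tasktracker.reduce.tasks.maximum": 2,
    "mapred.child.java.opts": "-Xmx512m",
}

if __name__ == "__main__":
    print(hadoop_config_xml(task_memory_settings))
```

Fewer slots or a smaller `-Xmx` per child both shrink the worst-case memory a TaskTracker can commit on one node.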
## Problems

S3DistCp - "Child Error / Task process exit with nonzero status of 137"

It seems to be a memory issue, most likely because the TaskTrackers have more memory allocated to them than is available on the node as a whole:

http://stackoverflow.com/questions/17090951/error-in-simple-hadoop-map-reduce

https://github.com/wbsg/ldif/wiki/Hadoop-troubleshooting

On the AWS forums for EMR, one of the engineers says this:

> Exit code 137 means the process received SIGKILL (kill -9), which usually means it was selected as a victim by the Linux OOM killer.

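
The arithmetic behind that answer: 137 = 128 + 9, and signal 9 is SIGKILL. A minimal Python sketch that decodes such statuses, assuming the common Unix convention of reporting a process killed by signal N as exit status 128 + N; `describe_exit_status` is a hypothetical helper, not part of Hadoop or S3DistCp.

```python
import signal


def describe_exit_status(status):
    """Interpret an exit status using the Unix 128 + N convention.

    A status above 128 is read as "killed by signal (status - 128)";
    anything else is treated as an ordinary exit code.
    """
    if status > 128:
        signum = status - 128
        try:
            return signal.Signals(signum).name
        except ValueError:
            return f"unknown signal {signum}"
    return f"exit code {status}"


print(describe_exit_status(137))  # SIGKILL
print(describe_exit_status(1))    # exit code 1
```

SIGKILL cannot be caught or handled, which is why the task JVM dies without logging anything itself and only the TaskTracker's nonzero-status message is left.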
Oversubscribing memory seems to be a common suspect for this problem; it's also mentioned here:

http://cloudcelebrity.wordpress.com/2013/08/21/hadoop-mapreduce-job-failure-with-java-io-ioexception-task-process-exit-with-nonzero-status-of-137/

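
To check a node for oversubscription, take the worst case: every map and reduce slot running a child JVM at its full heap alongside the resident daemons. A minimal Python sketch of that arithmetic; `committed_memory_mb`, `is_oversubscribed`, the OS reserve and all the numbers are hypothetical illustrations, not EMR defaults.

```python
def committed_memory_mb(daemon_heaps_mb, map_slots, reduce_slots,
                        child_heap_mb, jvm_overhead=1.0):
    """Worst-case memory (MB) committed on a Hadoop 1.x worker node.

    Sums the resident daemon heaps (e.g. TaskTracker, DataNode) and one
    child JVM per map and reduce slot. jvm_overhead (hypothetical knob)
    scales each child's -Xmx to approximate non-heap usage; 1.0 counts
    the heap only.
    """
    child_jvms_mb = (map_slots + reduce_slots) * child_heap_mb * jvm_overhead
    return sum(daemon_heaps_mb) + child_jvms_mb


def is_oversubscribed(node_ram_mb, os_reserve_mb, daemon_heaps_mb,
                      map_slots, reduce_slots, child_heap_mb,
                      jvm_overhead=1.0):
    """True if the worst case plus an OS reserve exceeds physical RAM."""
    committed = committed_memory_mb(daemon_heaps_mb, map_slots, reduce_slots,
                                    child_heap_mb, jvm_overhead)
    return committed + os_reserve_mb > node_ram_mb


# Hypothetical node: 7680 MB RAM, 512 MB kept for the OS, TaskTracker and
# DataNode at 1024 MB each, 4 map + 2 reduce slots.
print(committed_memory_mb([1024, 1024], 4, 2, 1024))          # 8192.0
print(is_oversubscribed(7680, 512, [1024, 1024], 4, 2, 1024))  # True
print(is_oversubscribed(7680, 512, [1024, 1024], 4, 2, 768))   # False
```

Counting only `-Xmx` underestimates real usage, since each JVM also needs non-heap memory (thread stacks, permanent generation, code cache); passing a `jvm_overhead` above 1.0 gives a more conservative estimate.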
The specific Java source code that throws the error can be found here:

https://github.com/apache/hadoop-common/blob/release-1.0.3/src/mapred/org/apache/hadoop/mapred/TaskRunner.java#L251-L260