@ceteri
Created September 24, 2012 19:46
Debugging the Cascading / Pig comparison - part 4
bash-3.2$ java -version
java version "1.6.0_35"
Java(TM) SE Runtime Environment (build 1.6.0_35-b10-428-11M3811)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01-428, mixed mode)
bash-3.2$ pig -version
Warning: $HADOOP_HOME is deprecated.
Apache Pig version 0.10.0 (r1328203)
compiled Apr 19 2012, 22:54:12
bash-3.2$ cat src/scripts/wc.pig
SET pig.exec.mapPartAgg true
docPipe = LOAD '$docPath' USING PigStorage('\t', 'tagsource') AS (doc_id, text);
docPipe = FILTER docPipe BY doc_id != 'doc_id';
stopPipe = LOAD '$stopPath' USING PigStorage('\t', 'tagsource') AS (stop:chararray);
stopPipe = FILTER stopPipe BY stop != 'stop';
-- split the "document" text lines into a token stream; the regex filter below keeps only word-like tokens
tokenPipe = FOREACH docPipe GENERATE doc_id, FLATTEN(TOKENIZE(LOWER(text), ' [](),.')) AS token;
tokenPipe = FILTER tokenPipe BY token MATCHES '\\w.*';
-- perform a left join to remove stop words, discarding the rows
-- which joined with stop words, i.e., were non-null after left join
tokenPipe = JOIN tokenPipe BY token LEFT, stopPipe BY stop USING 'replicated';
tokenPipe = FILTER tokenPipe BY stopPipe::stop IS NULL;
-- determine the word counts
tokenGroups = GROUP tokenPipe BY token;
wcPipe = FOREACH tokenGroups GENERATE group AS token, COUNT(tokenPipe) AS count;
-- output
STORE wcPipe INTO '$wcPath' using PigStorage('\t', 'tagsource');
-- explain wcPipe
-- EXPLAIN -out dot/wc_pig.dot -dot wcPipe;
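
Editor's note: the script loads the documents and a stop-word list, tokenizes the text, removes stop words with a replicated left join, and counts the surviving tokens. A minimal sketch for sanity-checking the tokenization step interactively in grunt, assuming the same tab-separated rain.txt layout used above (the LIMIT size is arbitrary):

docs   = LOAD './data/rain.txt' USING PigStorage('\t', 'tagsource') AS (doc_id, text);
docs   = FILTER docs BY doc_id != 'doc_id';                      -- drop the header row
tokens = FOREACH docs GENERATE doc_id,
         FLATTEN(TOKENIZE(LOWER(text), ' [](),.')) AS token;     -- same tokenization as wc.pig
peek   = LIMIT tokens 10;
DUMP peek;                                                       -- print a small sample to the console
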
bash-3.2$
bash-3.2$ pig -p docPath=./data/rain.txt -p wcPath=./output/wc -p stopPath=./data/en.stop ./src/scripts/wc.pig
Warning: $HADOOP_HOME is deprecated.
2012-09-24 12:45:07,700 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
2012-09-24 12:45:07,701 [main] INFO org.apache.pig.Main - Logging error messages to: /Users/ceteri/src/concur/Impatient/part4/pig_1348515907698.log
2012-09-24 12:45:07.798 java[27450:1903] Unable to load realm info from SCDynamicStore
2012-09-24 12:45:08,011 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2012-09-24 12:45:08,604 [main] WARN org.apache.pig.PigServer - Encountered Warning USING_OVERLOADED_FUNCTION 1 time(s).
2012-09-24 12:45:08,604 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).
2012-09-24 12:45:08,610 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: REPLICATED_JOIN,GROUP_BY,FILTER
2012-09-24 12:45:08,744 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-09-24 12:45:08,748 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - number of input files: 0
2012-09-24 12:45:08,753 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2012-09-24 12:45:08,771 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 2
2012-09-24 12:45:08,771 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
2012-09-24 12:45:08,791 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-09-24 12:45:08,805 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-09-24 12:45:08,809 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job4803978330787456834.jar
2012-09-24 12:45:12,809 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job4803978330787456834.jar created
2012-09-24 12:45:12,819 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-09-24 12:45:12,842 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-09-24 12:45:12,851 [Thread-5] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2012-09-24 12:45:12,962 [Thread-5] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-09-24 12:45:12,962 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-09-24 12:45:12,968 [Thread-5] WARN org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2012-09-24 12:45:12,970 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-09-24 12:45:13,173 [Thread-6] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : null
2012-09-24 12:45:13,187 [Thread-6] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/Users/ceteri/src/concur/Impatient/part4/data/en.stop:0+544
2012-09-24 12:45:13,226 [Thread-6] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2012-09-24 12:45:13,229 [Thread-6] INFO org.apache.hadoop.mapred.LocalJobRunner -
2012-09-24 12:45:13,230 [Thread-6] INFO org.apache.hadoop.mapred.Task - Task attempt_local_0001_m_000000_0 is allowed to commit now
2012-09-24 12:45:13,232 [Thread-6] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0001_m_000000_0' to file:/tmp/temp-784464709/tmp-1562075934
2012-09-24 12:45:13,343 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2012-09-24 12:45:13,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-09-24 12:45:16,152 [Thread-6] INFO org.apache.hadoop.mapred.LocalJobRunner -
2012-09-24 12:45:16,153 [Thread-6] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000000_0' done.
2012-09-24 12:45:16,153 [Thread-6] WARN org.apache.hadoop.mapred.FileOutputCommitter - Output path is null in cleanup
2012-09-24 12:45:18,351 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-09-24 12:45:18,354 [main] WARN org.apache.pig.tools.pigstats.PigStatsUtil - Failed to get RunningJob for job job_local_0001
2012-09-24 12:45:18,355 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-09-24 12:45:18,357 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-09-24 12:45:18,357 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job7986604753498455503.jar
2012-09-24 12:45:22,159 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job7986604753498455503.jar created
2012-09-24 12:45:22,165 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-09-24 12:45:22,175 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=510
2012-09-24 12:45:22,175 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-09-24 12:45:22,207 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-09-24 12:45:22,321 [Thread-8] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-09-24 12:45:22,321 [Thread-8] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-09-24 12:45:22,322 [Thread-8] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-09-24 12:45:22,361 [Thread-8] INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager - Creating tmp-1562075934 in /tmp/hadoop-ceteri/mapred/local/archive/960849357550769533_1137499495_2043666331/file/tmp/temp-784464709-work--4096855929590518796 with rwxr-xr-x
2012-09-24 12:45:22,368 [Thread-8] INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager - Cached file:/tmp/temp-784464709/tmp-1562075934#pigrepl_scope-30_1784681719_1348515922166_1 as /tmp/hadoop-ceteri/mapred/local/archive/960849357550769533_1137499495_2043666331/file/tmp/temp-784464709/tmp-1562075934
2012-09-24 12:45:22,379 [Thread-8] INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager - Cached file:/tmp/temp-784464709/tmp-1562075934#pigrepl_scope-30_1784681719_1348515922166_1 as /tmp/hadoop-ceteri/mapred/local/archive/960849357550769533_1137499495_2043666331/file/tmp/temp-784464709/tmp-1562075934
2012-09-24 12:45:22,379 [Thread-8] WARN org.apache.hadoop.mapred.LocalJobRunner - LocalJobRunner does not support symlinking into current working dir.
2012-09-24 12:45:22,379 [Thread-8] INFO org.apache.hadoop.mapred.TaskRunner - Creating symlink: /tmp/hadoop-ceteri/mapred/local/archive/960849357550769533_1137499495_2043666331/file/tmp/temp-784464709/tmp-1562075934 <- /tmp/hadoop-ceteri/mapred/local/localRunner/pigrepl_scope-30_1784681719_1348515922166_1
2012-09-24 12:45:22,457 [Thread-13] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : null
2012-09-24 12:45:22,463 [Thread-13] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/Users/ceteri/src/concur/Impatient/part4/data/rain.txt:0+510
2012-09-24 12:45:22,469 [Thread-13] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2012-09-24 12:45:22,586 [Thread-13] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2012-09-24 12:45:22,586 [Thread-13] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2012-09-24 12:45:22,656 [Thread-13] WARN org.apache.hadoop.mapred.FileOutputCommitter - Output path is null in cleanup
2012-09-24 12:45:22,657 [Thread-13] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:125)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.setUpHashMap(POFRJoin.java:337)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.getNext(POFRJoin.java:212)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPreCombinerLocalRearrange.getNext(POPreCombinerLocalRearrange.java:126)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPartialAgg.getNext(POPartialAgg.java:159)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/ceteri/src/concur/Impatient/part4/pigrepl_scope-30_1784681719_1348515922166_1
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:154)
at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:116)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:93)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:121)
... 21 more
2012-09-24 12:45:22,708 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0002
2012-09-24 12:45:27,719 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0002 has failed! Stop running all dependent jobs
2012-09-24 12:45:27,720 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-09-24 12:45:27,720 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2012-09-24 12:45:27,722 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.0.3 0.10.0 ceteri 2012-09-24 12:45:08 2012-09-24 12:45:27 REPLICATED_JOIN,GROUP_BY,FILTER
Some jobs have failed! Stop running all dependent jobs
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
job_local_0001 1 0 n/a n/a n/a 0 0 0 stopPipe MAP_ONLY
Failed Jobs:
JobId Alias Feature Message Outputs
job_local_0002 docPipe,tokenGroups,tokenPipe,wcPipe REPLICATED_JOIN,GROUP_BY,COMBINER,MAP_PARTIALAGG Message: Job failed! Error - NA file:///Users/ceteri/src/concur/Impatient/part4/output/wc,
Input(s):
Successfully read 0 records from: "file:///Users/ceteri/src/concur/Impatient/part4/data/en.stop"
Failed to read data from "file:///Users/ceteri/src/concur/Impatient/part4/data/rain.txt"
Output(s):
Failed to produce result in "file:///Users/ceteri/src/concur/Impatient/part4/output/wc"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local_0001 -> job_local_0002,
job_local_0002
2012-09-24 12:45:27,722 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2012-09-24 12:45:27,725 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /Users/ceteri/src/concur/Impatient/part4/pig_1348515907698.log
bash-3.2$
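
Editor's note: the failure appears to come from how the replicated join is handled in local mode rather than from the script itself. Pig materializes the stopPipe fragment, ships it through the distributed cache, and expects a symlink named pigrepl_scope-30_... in the task's working directory, but LocalJobRunner warns above that it "does not support symlinking into current working dir.", so POFRJoin cannot set up its load function and fails with ERROR 2081 / InvalidInputException. A minimal workaround sketch, assuming the rest of wc.pig stays unchanged, is to drop the 'replicated' hint so the stop-word join runs as a regular reduce-side join and never touches the distributed cache:

-- same left join and null filter as before, without the replicated hint
tokenPipe = JOIN tokenPipe BY token LEFT, stopPipe BY stop;
tokenPipe = FILTER tokenPipe BY stopPipe::stop IS NULL;

Running the unmodified script in mapreduce mode (pig -x mapreduce) against a Hadoop cluster, where distributed-cache symlinking works as intended, should also avoid the problem; the pig_1348515907698.log shown below only repeats ERROR 2244 and adds no further detail.
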
bash-3.2$ cat pig_1348515907698.log
Pig Stack Trace
---------------
ERROR 2244: Job failed, hadoop does not return any error message
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:555)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
================================================================================
bash-3.2$