Created September 24, 2012 19:46
Debugging the Cascading / Pig comparison - part 4
bash-3.2$ java -version
java version "1.6.0_35"
Java(TM) SE Runtime Environment (build 1.6.0_35-b10-428-11M3811)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01-428, mixed mode)
bash-3.2$ pig -version
Warning: $HADOOP_HOME is deprecated.
Apache Pig version 0.10.0 (r1328203)
compiled Apr 19 2012, 22:54:12
bash-3.2$ cat src/scripts/wc.pig
SET pig.exec.mapPartAgg true
docPipe = LOAD '$docPath' USING PigStorage('\t', 'tagsource') AS (doc_id, text);
docPipe = FILTER docPipe BY doc_id != 'doc_id';
stopPipe = LOAD '$stopPath' USING PigStorage('\t', 'tagsource') AS (stop:chararray);
stopPipe = FILTER stopPipe BY stop != 'stop';
-- specify a regex operation to split the "document" text lines into a token stream
tokenPipe = FOREACH docPipe GENERATE doc_id, FLATTEN(TOKENIZE(LOWER(text), ' [](),.')) AS token;
tokenPipe = FILTER tokenPipe BY token MATCHES '\\w.*';
-- perform a left join to remove stop words, discarding the rows
-- which joined with stop words, i.e., were non-null after left join
tokenPipe = JOIN tokenPipe BY token LEFT, stopPipe BY stop USING 'replicated';
tokenPipe = FILTER tokenPipe BY stopPipe::stop IS NULL;
-- determine the word counts
tokenGroups = GROUP tokenPipe BY token;
wcPipe = FOREACH tokenGroups GENERATE group AS token, COUNT(tokenPipe) AS count;
-- output
STORE wcPipe INTO '$wcPath' using PigStorage('\t', 'tagsource');
-- explain wcPipe
-- EXPLAIN -out dot/wc_pig.dot -dot wcPipe;
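As a sanity check on what wc.pig should produce, the same stopword-filtered word count can be sketched as a plain shell pipeline. This is a rough approximation, not the Pig semantics exactly: the sample data files and their contents below are made up for illustration, `grep -E '^[[:alnum:]_]'` stands in for `MATCHES '\\w.*'`, and `grep -v -x -F -f` plays the role of the left join plus null filter.

```shell
# Hypothetical sanity check mirroring wc.pig on tiny inline sample data.
# Column 2 is the document text; tokens are lowercased, split on ' [](),.',
# kept only if they start with a word character, then stop words are dropped.
printf 'doc_id\ttext\ndoc01\tA rain shadow is a dry area.\n' > /tmp/rain_sample.tsv
printf 'a\nis\n' > /tmp/en_stop_sample.txt

tail -n +2 /tmp/rain_sample.tsv \
  | cut -f2 \
  | tr '[:upper:]' '[:lower:]' \
  | tr -s '[](),. ' '\n' \
  | grep -E '^[[:alnum:]_]' \
  | grep -v -x -F -f /tmp/en_stop_sample.txt \
  | sort | uniq -c | sort -rn
# prints each surviving token (rain, shadow, dry, area) with its count
```

On the real tutorial data this should roughly match the contents of output/wc/part-r-00000 once the Pig job succeeds.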
bash-3.2$
bash-3.2$ pig -p docPath=./data/rain.txt -p wcPath=./output/wc -p stopPath=./data/en.stop ./src/scripts/wc.pig
Warning: $HADOOP_HOME is deprecated.
2012-09-24 12:45:07,700 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
2012-09-24 12:45:07,701 [main] INFO org.apache.pig.Main - Logging error messages to: /Users/ceteri/src/concur/Impatient/part4/pig_1348515907698.log
2012-09-24 12:45:07.798 java[27450:1903] Unable to load realm info from SCDynamicStore
2012-09-24 12:45:08,011 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2012-09-24 12:45:08,604 [main] WARN org.apache.pig.PigServer - Encountered Warning USING_OVERLOADED_FUNCTION 1 time(s).
2012-09-24 12:45:08,604 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).
2012-09-24 12:45:08,610 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: REPLICATED_JOIN,GROUP_BY,FILTER
2012-09-24 12:45:08,744 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-09-24 12:45:08,748 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - number of input files: 0
2012-09-24 12:45:08,753 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2012-09-24 12:45:08,771 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 2
2012-09-24 12:45:08,771 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
2012-09-24 12:45:08,791 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-09-24 12:45:08,805 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-09-24 12:45:08,809 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job4803978330787456834.jar
2012-09-24 12:45:12,809 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job4803978330787456834.jar created
2012-09-24 12:45:12,819 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-09-24 12:45:12,842 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-09-24 12:45:12,851 [Thread-5] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2012-09-24 12:45:12,962 [Thread-5] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-09-24 12:45:12,962 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-09-24 12:45:12,968 [Thread-5] WARN org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2012-09-24 12:45:12,970 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-09-24 12:45:13,173 [Thread-6] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : null
2012-09-24 12:45:13,187 [Thread-6] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/Users/ceteri/src/concur/Impatient/part4/data/en.stop:0+544
2012-09-24 12:45:13,226 [Thread-6] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2012-09-24 12:45:13,229 [Thread-6] INFO org.apache.hadoop.mapred.LocalJobRunner -
2012-09-24 12:45:13,230 [Thread-6] INFO org.apache.hadoop.mapred.Task - Task attempt_local_0001_m_000000_0 is allowed to commit now
2012-09-24 12:45:13,232 [Thread-6] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0001_m_000000_0' to file:/tmp/temp-784464709/tmp-1562075934
2012-09-24 12:45:13,343 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2012-09-24 12:45:13,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-09-24 12:45:16,152 [Thread-6] INFO org.apache.hadoop.mapred.LocalJobRunner -
2012-09-24 12:45:16,153 [Thread-6] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000000_0' done.
2012-09-24 12:45:16,153 [Thread-6] WARN org.apache.hadoop.mapred.FileOutputCommitter - Output path is null in cleanup
2012-09-24 12:45:18,351 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-09-24 12:45:18,354 [main] WARN org.apache.pig.tools.pigstats.PigStatsUtil - Failed to get RunningJob for job job_local_0001
2012-09-24 12:45:18,355 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-09-24 12:45:18,357 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-09-24 12:45:18,357 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job7986604753498455503.jar
2012-09-24 12:45:22,159 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job7986604753498455503.jar created
2012-09-24 12:45:22,165 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-09-24 12:45:22,175 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=510
2012-09-24 12:45:22,175 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
2012-09-24 12:45:22,207 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-09-24 12:45:22,321 [Thread-8] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-09-24 12:45:22,321 [Thread-8] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-09-24 12:45:22,322 [Thread-8] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-09-24 12:45:22,361 [Thread-8] INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager - Creating tmp-1562075934 in /tmp/hadoop-ceteri/mapred/local/archive/960849357550769533_1137499495_2043666331/file/tmp/temp-784464709-work--4096855929590518796 with rwxr-xr-x
2012-09-24 12:45:22,368 [Thread-8] INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager - Cached file:/tmp/temp-784464709/tmp-1562075934#pigrepl_scope-30_1784681719_1348515922166_1 as /tmp/hadoop-ceteri/mapred/local/archive/960849357550769533_1137499495_2043666331/file/tmp/temp-784464709/tmp-1562075934
2012-09-24 12:45:22,379 [Thread-8] INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager - Cached file:/tmp/temp-784464709/tmp-1562075934#pigrepl_scope-30_1784681719_1348515922166_1 as /tmp/hadoop-ceteri/mapred/local/archive/960849357550769533_1137499495_2043666331/file/tmp/temp-784464709/tmp-1562075934
2012-09-24 12:45:22,379 [Thread-8] WARN org.apache.hadoop.mapred.LocalJobRunner - LocalJobRunner does not support symlinking into current working dir.
2012-09-24 12:45:22,379 [Thread-8] INFO org.apache.hadoop.mapred.TaskRunner - Creating symlink: /tmp/hadoop-ceteri/mapred/local/archive/960849357550769533_1137499495_2043666331/file/tmp/temp-784464709/tmp-1562075934 <- /tmp/hadoop-ceteri/mapred/local/localRunner/pigrepl_scope-30_1784681719_1348515922166_1
2012-09-24 12:45:22,457 [Thread-13] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : null
2012-09-24 12:45:22,463 [Thread-13] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/Users/ceteri/src/concur/Impatient/part4/data/rain.txt:0+510
2012-09-24 12:45:22,469 [Thread-13] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2012-09-24 12:45:22,586 [Thread-13] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2012-09-24 12:45:22,586 [Thread-13] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2012-09-24 12:45:22,656 [Thread-13] WARN org.apache.hadoop.mapred.FileOutputCommitter - Output path is null in cleanup
2012-09-24 12:45:22,657 [Thread-13] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:125)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.setUpHashMap(POFRJoin.java:337)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.getNext(POFRJoin.java:212)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPreCombinerLocalRearrange.getNext(POPreCombinerLocalRearrange.java:126)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPartialAgg.getNext(POPartialAgg.java:159)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/Users/ceteri/src/concur/Impatient/part4/pigrepl_scope-30_1784681719_1348515922166_1
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
	at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:154)
	at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:116)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:93)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:121)
	... 21 more
2012-09-24 12:45:22,708 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0002
2012-09-24 12:45:27,719 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0002 has failed! Stop running all dependent jobs
2012-09-24 12:45:27,720 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-09-24 12:45:27,720 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2012-09-24 12:45:27,722 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
1.0.3	0.10.0	ceteri	2012-09-24 12:45:08	2012-09-24 12:45:27	REPLICATED_JOIN,GROUP_BY,FILTER
Some jobs have failed! Stop running all dependent jobs
Job Stats (time in seconds):
JobId	Maps	Reduces	MaxMapTime	MinMapTIme	AvgMapTime	MaxReduceTime	MinReduceTime	AvgReduceTime	Alias	Feature	Outputs
job_local_0001	1	0	n/a	n/a	n/a	0	0	0	stopPipe	MAP_ONLY
Failed Jobs:
JobId	Alias	Feature	Message	Outputs
job_local_0002	docPipe,tokenGroups,tokenPipe,wcPipe	REPLICATED_JOIN,GROUP_BY,COMBINER,MAP_PARTIALAGG	Message: Job failed! Error - NA	file:///Users/ceteri/src/concur/Impatient/part4/output/wc,
Input(s):
Successfully read 0 records from: "file:///Users/ceteri/src/concur/Impatient/part4/data/en.stop"
Failed to read data from "file:///Users/ceteri/src/concur/Impatient/part4/data/rain.txt"
Output(s):
Failed to produce result in "file:///Users/ceteri/src/concur/Impatient/part4/output/wc"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local_0001	->	job_local_0002,
job_local_0002
2012-09-24 12:45:27,722 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2012-09-24 12:45:27,725 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /Users/ceteri/src/concur/Impatient/part4/pig_1348515907698.log
bash-3.2$
bash-3.2$ cat pig_1348515907698.log
Pig Stack Trace
---------------
ERROR 2244: Job failed, hadoop does not return any error message
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
	at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
	at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
	at org.apache.pig.Main.run(Main.java:555)
	at org.apache.pig.Main.main(Main.java:111)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
================================================================================
bash-3.2$
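Reading the log, the failure points at the replicated join rather than at the data itself: job_local_0001 materializes stopPipe into the distributed cache as the pigrepl_scope-30_1784681719_1348515922166_1 file, but the earlier warning "LocalJobRunner does not support symlinking into current working dir" (12:45:22,379) means the symlink into the job's working directory is never created, so POFRJoin.setUpHashMap fails to load the build side and the job dies with InvalidInputException on that path. One possible workaround for local-mode debugging, offered only as a sketch and not verified against this Pig 0.10.0 / Hadoop 1.0.3 combination, is to drop the 'replicated' hint so the join avoids the distributed-cache path entirely:

```
-- Sketch of a local-mode workaround: use the default join instead of the
-- fragment-replicate join, which avoids the distributed-cache symlink.
-- On a real cluster the 'replicated' hint is still preferable for a small
-- build side like the stop-word list.
tokenPipe = JOIN tokenPipe BY token LEFT, stopPipe BY stop;
tokenPipe = FILTER tokenPipe BY stopPipe::stop IS NULL;
```

Another avenue would be running against a pseudo-distributed cluster instead of `-x local`, since the symlink limitation is specific to LocalJobRunner.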