@kimutansk
Created April 24, 2016 13:46
Adding GPL libraries to Pig on Tez on CDH ref: http://qiita.com/kimutansk/items/35371169e3d7c5022959
/opt/pig/bin/pig -x tez -f test-lzo-count.pig
2016-04-24 20:13:44,815 [main] INFO org.apache.pig.tools.pigstats.tez.TezPigScriptStats - Script Statistics:
HadoopVersion: 2.6.0-cdh5.7.0
PigVersion: 0.15.0
TezVersion: 0.7.0
UserId: build
FileName: test-lzo-count.pig
StartedAt: 2016-04-24 20:13:26
FinishedAt: 2016-04-24 20:13:44
Features: GROUP_BY
Failed!
DAG PigLatin:test-lzo-count.pig-0_scope-0:
ApplicationId: job_1461490474710_0015
TotalLaunchedTasks: 4
FileBytesRead: 0
FileBytesWritten: 0
HdfsBytesRead: 0
HdfsBytesWritten: 0
Input(s):
Failed to read data from "/user/pub/example-lzo"
Output(s):
Failed to produce result in "hdfs://hadoophost01:8020/tmp/temp-1843801261/tmp-1479312519"
2016-04-24 20:13:44,912 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias logs_count. Backend error : Vertex failed, vertexName=scope-10, vertexId=vertex_1461490474710_0015_1_00, diagnostics=[Task failed, taskId=task_1461490474710_0015_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: exceptionThrown=java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
/opt/pig/bin/pig -x tez -f test-lzo-count.pig
2016-04-24 22:16:17,439 [main] INFO org.apache.pig.tools.pigstats.tez.TezPigScriptStats - Script Statistics:
HadoopVersion: 2.6.0-cdh5.7.0
PigVersion: 0.15.0
TezVersion: 0.7.0
UserId: build
FileName: test-lzo-count.pig
StartedAt: 2016-04-24 22:16:06
FinishedAt: 2016-04-24 22:16:17
Features: GROUP_BY
Success!
DAG PigLatin:test-lzo-count.pig-0_scope-0:
ApplicationId: job_1461490474710_0025
TotalLaunchedTasks: 2
FileBytesRead: 99
FileBytesWritten: 67
HdfsBytesRead: 61480
HdfsBytesWritten: 7
Input(s):
Successfully read 1744 records (61480 bytes) from: "/user/pub/example-lzo"
Output(s):
Successfully stored 1 records (7 bytes) in: "hdfs://hadoophost01:8020/tmp/temp-1901693688/tmp-778527532"
(1744)
(1871-01-01,4.44,0.26,0.4,12.46,5.32,84.52,4.95,7.61,,,,,,,)
(1871-02-01,4.5,0.26,0.4,12.84,5.32,83.12,4.8,7.39,,,,,,,)
(1871-03-01,4.61,0.26,0.4,13.03,5.33,83.91,4.73,7.28,,,,,,,)
(1871-04-01,4.74,0.26,0.4,12.56,5.33,89.54,4.91,7.56,,,,,,,)
(1871-05-01,4.86,0.26,0.4,12.27,5.33,93.95,5.03,7.73,,,,,,,)
(1871-06-01,4.82,0.26,0.4,12.08,5.34,94.64,5.11,7.85,,,,,,,)
(1871-07-01,4.73,0.26,0.4,12.08,5.34,92.87,5.11,7.85,,,,,,,)
(1871-08-01,4.79,0.26,0.4,11.89,5.34,95.56,5.19,7.98,,,,,,,)
(1871-09-01,4.84,0.26,0.4,12.18,5.35,94.29,5.07,7.79,,,,,,,)
(1871-10-01,4.59,0.26,0.4,12.37,5.35,88.04,4.99,7.67,,,,,,,)
$ sudo yum install -y lzop
$ lzop data_noheader.csv
$ echo "tez.cluster.additional.classpath.prefix=/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar" >> /opt/pig/conf/pig.properties
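Because `echo ... >>` appends unconditionally, running the line above more than once leaves duplicate entries in pig.properties. A minimal sketch of an idempotent variant follows; the helper name `append_once` is hypothetical and not part of Pig or Tez.

```shell
# Hypothetical helper: append a property line to a file only if that exact
# line is not already present.
append_once() {
  file="$1"
  line="$2"
  # -x matches whole lines only, -F treats the line as a fixed string
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage with the property from the command above:
# append_once /opt/pig/conf/pig.properties \
#   "tez.cluster.additional.classpath.prefix=/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar"
```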
/opt/pig/bin/pig -x tez -f test-lzo-count.pig
/opt/pig/bin/pig -x tez -f test-lzo-count.pig
2016-04-24 18:44:47,043 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=FAILED, progress=TotalTasks: 2 Succeeded: 0 Running: 0 Failed: 1 Killed: 1 FailedTaskAttempts: 4, diagnostics=Vertex failed, vertexName=scope-10, vertexId=vertex_1461490474710_0001_1_00, diagnostics=[Task failed, taskId=task_1461490474710_0001_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: exceptionThrown=java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:87)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:181)
at org.apache.tez.mapreduce.lib.MRReaderMapReduce.setupNewRecordReader(MRReaderMapReduce.java:152)
at org.apache.tez.mapreduce.lib.MRReaderMapReduce.setSplit(MRReaderMapReduce.java:85)
at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:614)
at org.apache.tez.mapreduce.input.MRInput.processSplitEvent(MRInput.java:566)
at org.apache.tez.mapreduce.input.MRInput.handleEvents(MRInput.java:530)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:631)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$600(LogicalIOProcessorRuntimeTask.java:98)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.runInternal(LogicalIOProcessorRuntimeTask.java:694)
at org.apache.tez.common.RunnableWithNdc.run(RunnableWithNdc.java:35)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
... 13 more
(omitted)
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1461490474710_0001_1_00 [scope-10] killed/failed due to:null]
Vertex killed, vertexName=scope-11, vertexId=vertex_1461490474710_0001_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1461490474710_0001_1_01 [scope-11] killed/failed due to:null]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1, counters=Counters: 4
org.apache.tez.common.counters.DAGCounter
NUM_FAILED_TASKS=4
TOTAL_LAUNCHED_TASKS=4
AM_CPU_MILLISECONDS=1730
AM_GC_TIME_MILLIS=56
$ ln -s /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar /opt/tez-lib/hadoop-lzo.jar
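`ln -s` succeeds even when the target does not exist, so it may be worth confirming that the link resolves before rerunning the script. The check below is only a sketch; the function name `link_resolves` is hypothetical, and it assumes the link path used in the command above.

```shell
# Hypothetical helper: succeed only if the path is a symlink whose target exists.
link_resolves() {
  # -L tests that the path itself is a symlink; -e follows it to the target
  [ -L "$1" ] && [ -e "$1" ]
}

# Usage with the link created above:
# link_resolves /opt/tez-lib/hadoop-lzo.jar && echo "link OK"
```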
/opt/pig/bin/pig -x tez -f test-lzo-count.pig
(omitted)
2016-04-24 20:03:40,460 [main] INFO org.apache.pig.tools.pigstats.tez.TezPigScriptStats - Script Statistics:
HadoopVersion: 2.6.0-cdh5.7.0
PigVersion: 0.15.0
TezVersion: 0.7.0
UserId: hdfs
FileName: /tmp/test-lzo-count.pig
StartedAt: 2016-04-24 20:03:28
FinishedAt: 2016-04-24 20:03:40
Features: GROUP_BY
Success!
DAG PigLatin:test-lzo-count.pig-0_scope-0:
ApplicationId: job_1461490474710_0014
TotalLaunchedTasks: 2
FileBytesRead: 99
FileBytesWritten: 67
HdfsBytesRead: 61480
HdfsBytesWritten: 7
Input(s):
Successfully read 1744 records (61480 bytes) from: "/user/pub/example-lzo"
Output(s):
Successfully stored 1 records (7 bytes) in: "hdfs://hadoophost01:8020/tmp/temp-1823417929/tmp552056591"
(1744)
(1871-01-01,4.44,0.26,0.4,12.46,5.32,84.52,4.95,7.61,,,,,,,)
(1871-02-01,4.5,0.26,0.4,12.84,5.32,83.12,4.8,7.39,,,,,,,)
(1871-03-01,4.61,0.26,0.4,13.03,5.33,83.91,4.73,7.28,,,,,,,)
(1871-04-01,4.74,0.26,0.4,12.56,5.33,89.54,4.91,7.56,,,,,,,)
(1871-05-01,4.86,0.26,0.4,12.27,5.33,93.95,5.03,7.73,,,,,,,)
(1871-06-01,4.82,0.26,0.4,12.08,5.34,94.64,5.11,7.85,,,,,,,)
(1871-07-01,4.73,0.26,0.4,12.08,5.34,92.87,5.11,7.85,,,,,,,)
(1871-08-01,4.79,0.26,0.4,11.89,5.34,95.56,5.19,7.98,,,,,,,)
(1871-09-01,4.84,0.26,0.4,12.18,5.35,94.29,5.07,7.79,,,,,,,)
(1871-10-01,4.59,0.26,0.4,12.37,5.35,88.04,4.99,7.67,,,,,,,)
loadresult = LOAD '/user/pub/example-lzo';
limitedresult = LIMIT loadresult 10;
logs_count = FOREACH (GROUP loadresult ALL) GENERATE COUNT(loadresult);
DUMP logs_count;
DUMP limitedresult;
<property>
<name>tez.cluster.additional.classpath.prefix</name>
<value>/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar</value>
</property>
Successfully read 1744 records (61480 bytes) from: "/user/pub/example-lzo"
Successfully read 1744 records (116662 bytes) from: "/user/pub/example-data"
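Both reads return the same 1744 records, so the two byte counts can be compared directly, assuming `/user/pub/example-data` is the uncompressed counterpart of `/user/pub/example-lzo`. A small sketch of the arithmetic follows; the function name `size_ratio` is hypothetical.

```shell
# Hypothetical helper: print the compressed size as a percentage of the
# original size, rounded to one decimal place.
size_ratio() {
  awk -v c="$1" -v o="$2" 'BEGIN { printf "%.1f%%\n", c / o * 100 }'
}

# Byte counts taken from the two log lines above.
size_ratio 61480 116662
```

Under that assumption, the LZO file is about 52.7% of the size of the plain data.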