@kimutansk
Last active March 23, 2017 05:10
Running Pig on Tez on a CDH cluster ref: http://qiita.com/kimutansk/items/8a91baf476fff4232634
$ /opt/pig/bin/pig -x tez -f test-count.pig
16/04/16 22:21:28 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
16/04/16 22:21:28 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
16/04/16 22:21:28 INFO pig.ExecTypeProvider: Trying ExecType : TEZ_LOCAL
16/04/16 22:21:28 INFO pig.ExecTypeProvider: Trying ExecType : TEZ
16/04/16 22:21:28 INFO pig.ExecTypeProvider: Picked TEZ as the ExecType
2016-04-16 22:21:28,858 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0 (r1682971) compiled Jun 01 2015, 11:44:35
(omitted)
2016-04-16 22:21:32,336 [PigTezLauncher-0] INFO org.apache.pig.tools.pigstats.tez.TezScriptState - Pig script settings are added to the job
2016-04-16 22:21:32,615 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Tez Client Version: [ component=tez-api, version=0.7.0, revision=${buildNumber}, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=20150527-0953 ]
2016-04-16 22:21:32,679 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoophost01/192.168.100.231:8032
2016-04-16 22:21:32,767 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Session mode. Starting session.
2016-04-16 22:21:32,767 [PigTezLauncher-0] INFO org.apache.tez.client.TezClientUtils - Using tez.lib.uris value from configuration: hdfs://hadoophost01:8020/user/app/tez-0.7.0.tar.gz
(omitted)
2016-04-16 22:21:43,318 [main] INFO org.apache.pig.tools.pigstats.tez.TezPigScriptStats - Script Statistics:
HadoopVersion: 2.6.0-cdh5.7.0
PigVersion: 0.15.0
TezVersion: 0.7.0
UserId: build
FileName: test-count.pig
StartedAt: 2016-04-16 22:21:31
FinishedAt: 2016-04-16 22:21:43
Features: GROUP_BY
Success!
DAG PigLatin:test-count.pig-0_scope-0:
ApplicationId: job_1460385153614_0004
TotalLaunchedTasks: 2
FileBytesRead: 99
FileBytesWritten: 67
HdfsBytesRead: 116662
HdfsBytesWritten: 7
Input(s):
Successfully read 1744 records (116662 bytes) from: "/user/pub/example-data"
(omitted)
(1744) # number of records
(omitted)
(1871-01-01,4.44,0.26,0.4,12.46,5.32,84.52,4.95,7.61,,,,,,,)
(1871-02-01,4.5,0.26,0.4,12.84,5.32,83.12,4.8,7.39,,,,,,,)
(1871-03-01,4.61,0.26,0.4,13.03,5.33,83.91,4.73,7.28,,,,,,,)
(1871-04-01,4.74,0.26,0.4,12.56,5.33,89.54,4.91,7.56,,,,,,,)
(1871-05-01,4.86,0.26,0.4,12.27,5.33,93.95,5.03,7.73,,,,,,,)
(1871-06-01,4.82,0.26,0.4,12.08,5.34,94.64,5.11,7.85,,,,,,,)
(1871-07-01,4.73,0.26,0.4,12.08,5.34,92.87,5.11,7.85,,,,,,,)
(1871-08-01,4.79,0.26,0.4,11.89,5.34,95.56,5.19,7.98,,,,,,,)
(1871-09-01,4.84,0.26,0.4,12.18,5.35,94.29,5.07,7.79,,,,,,,)
(1871-10-01,4.59,0.26,0.4,12.37,5.35,88.04,4.99,7.67,,,,,,,)
$ wget https://raw.githubusercontent.com/datasets/s-and-p-500/master/data/data.csv
$ sed 1d data.csv > data_noheader.csv
/opt/pig/bin/pig -x tez -f test-count.pig
$ wget http://ftp.tsukuba.wide.ad.jp/software/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
$ tar xvzf apache-maven-3.3.9-bin.tar.gz
$ mv apache-maven-3.3.9 /opt/
$ ln -s /opt/apache-maven-3.3.9 /opt/apache-maven
$ echo "export MAVEN_HOME=/opt/apache-maven" >> ~/.bashrc
$ echo -e 'export PATH=${PATH}:${MAVEN_HOME}/bin' >> ~/.bashrc
$ source ~/.bashrc
JAVA_HOME=/usr/java/jdk1.8.0_60
export TEZ_CONF_DIR=/opt/tez-conf
export TEZ_JARS=/opt/tez-lib/*:/opt/tez-lib/lib/*
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_CONF_DIR}:${TEZ_JARS}
$ wget http://ftp.tsukuba.wide.ad.jp/software/apache/pig/pig-0.15.0/pig-0.15.0.tar.gz
$ tar xvpf pig-0.15.0.tar.gz
$ mv pig-0.15.0 /opt/
$ ln -s /opt/pig-0.15.0 /opt/pig
$ wget ftp://ftp.muug.mb.ca/mirror/centos/7.2.1511/os/x86_64/Packages/protobuf-2.5.0-8.el7.x86_64.rpm
$ wget ftp://ftp.muug.mb.ca/mirror/centos/7.2.1511/os/x86_64/Packages/protobuf-compiler-2.5.0-8.el7.x86_64.rpm
$ sudo rpm -ivh protobuf-2.5.0-8.el7.x86_64.rpm
$ sudo rpm -ivh protobuf-compiler-2.5.0-8.el7.x86_64.rpm
loadresult = LOAD '/user/pub/example-data';
limitedresult = LIMIT loadresult 10;
logs_count = FOREACH (GROUP loadresult ALL) GENERATE COUNT(loadresult);
DUMP logs_count;
DUMP limitedresult;
<configuration>
<property>
<name>tez.lib.uris</name>
<value>${fs.defaultFS}/user/app/tez-0.7.0.tar.gz</value>
</property>
</configuration>
$ wget http://ftp.jaist.ac.jp/pub/apache/tez/0.7.0/apache-tez-0.7.0-src.tar.gz
$ tar xvpf apache-tez-0.7.0-src.tar.gz
$ cd apache-tez-0.7.0-src
$ sed -i -e "s|<pig.version>0.13.0</pig.version>|<pig.version>0.15.0</pig.version>|g" pom.xml
$ mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true