Fellow instructors,
A student in class today asked me to provide a demo of Pig over HBase.
I am teaching the DataAnalyst class, so I thought I would write up a demo and share it. Where possible I tried to make it fit into the labs and demo collections that the DA class already has. I hope you find it useful.
It took me a few tries to get things to work, so my script history here got a little muddy. I am including it anyhow; I think the important steps are recorded in this email.
Here are the steps.
The VM does not start HBase by default; you will have to start it manually. I believe that once you start it, it will continue to start when the VM is rebooted.
### Pig History
(raw shell and grunt history; typos and failed attempts left in)
cd /apps/hive/warehouse/emp
ls
cat 'emp.txt'
cat emp.txt
emp = load 'emp.tx' as (name:chararray, state:chararray);
emp = load 'emp.txt' as (name:chararray, state:chararray);
ca_emp = filter emp by (state='ca');
ca_emp = filter emp by state='ca';
ca_emp = filter emp by (state=='ca');
store ca_emp into 'ca_emp'
;
quit
wd
pwd
cd /apps/hive/warehouse/emp
quit
quit;
ls
quit
cd ca_emp
ls
pwd
ls
cd /apps/hive/warehouse/emp
ls
cd ca_emp
ls
cat part-m-00000
quit
A = LOAD '/user/root/whitehouse/' USING TextLoader();
describe A;
A_limit = limit A 10;
dump A_limit;
B = LOAD '/user/root/whitehouse/visits.txt' USING PigStorage(',') AS (
    lname:chararray,
    fname:chararray,
    mname:chararray,
    id:chararray,
    status:chararray,
    state:chararray,
    arrival:chararray
);
store B into 'whouse_tab' using PigStorage('\t');
ls whouse_tab
cat whouse_tab/part-m-00000
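What that load/store pair does is just a delimiter swap: read comma-separated records, write them back tab-separated. A minimal Python sketch of the transformation (the sample record values are made up; this is not the Pig runtime, only the reshaping it performs):

```python
# Sketch of LOAD ... USING PigStorage(',') followed by
# STORE ... USING PigStorage('\t'): split each record on commas,
# re-join the fields with tabs.
def csv_to_tsv(line):
    fields = line.rstrip("\n").split(",")
    return "\t".join(fields)

# Hypothetical visitor record in the lname,fname,mname,id,... layout.
visit = "SMITH,JANE,Q,12345,APPROVED,VA,2010-01-01"
print(csv_to_tsv(visit))
```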
###
Create Hive table
set hbase.zookeeper.quorum=localhost;
set hive.zookeeper.client.port=2181;
set zookeeper.znode.parent=/hbase-unsecure;
create external table pigdemo(id int, name string, state string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,a:name,a:state')
TBLPROPERTIES ('hbase.table.name' = 'pigdemo');
###
Pig script to read from HBase
REGISTER /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-76-hadoop2.jar
REGISTER /usr/lib/hbase/lib/hbase-client-0.96.0.2.0.6.0-76-hadoop2.jar
REGISTER /usr/lib/hbase/lib/hbase-server-0.96.0.2.0.6.0-76-hadoop2.jar
REGISTER /usr/lib/hbase/lib/hbase-protocol-0.96.0.2.0.6.0-76-hadoop2.jar
REGISTER /usr/lib/hbase/lib/htrace-core-2.01.jar
REGISTER /usr/lib/hbase/lib/zookeeper.jar
REGISTER /usr/lib/hbase/lib/guava-12.0.1.jar
REGISTER /usr/lib/hbase/lib/hbase-*.jar
REGISTER /usr/lib/hadoop/hadoop*.jar
REGISTER /usr/lib/zookeeper/zookeeper-3.4.5.2.0.6.0-76.jar
emp = load 'hbase://pigdemo' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('a:state,a:name','-loadKey true')
AS (id:bytearray, state:chararray, name:chararray);
emp_ca = filter emp by (id==1);
dump emp_ca;
###
Hive table..
create external table pigdemo(id int, name string, state string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,a:name,a:state')
show tables;
select * from pigdemo;
I dislike running things as root, but that is the only user available in the DA VM, so I will run with it.
As root, run:
/root/start_hbase.sh
You will need to create a table in HBase.
Launch the HBase shell. The HBase shell is a JRuby shell, so a single quit with no quotes or semicolon is how you leave it. A newline is the command delimiter.
# hbase shell
create 'pigdemo', 'a'
List tables to confirm.
hbase(main):001:0> list
TABLE
ambarismoketest
emp
pigdemo
3 row(s) in 1.4410 seconds
=> ["ambarismoketest", "emp", "pigdemo"]
That is the first HBase part done.
Summary: an HBase table needs nothing more than a table name and a single column family. In this case, the table name is pigdemo and the column family is "a".
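The shape that `create 'pigdemo', 'a'` sets up can be pictured as nested maps: rowkey, then "family:qualifier", then value. A toy Python sketch of that data model (not the HBase API, just the structure):

```python
# Toy model of an HBase table: rows keyed by rowkey, each row
# holding cells addressed by "family:qualifier". Creating the
# table only fixes the name and the column families ("a");
# qualifiers like a:name and a:state appear at write time.
table = {}  # rowkey -> {"family:qualifier": value}

def put(row, column, value):
    table.setdefault(row, {})[column] = value

put("1", "a:name", "Rich")
put("1", "a:state", "SD")
print(table["1"])  # {'a:name': 'Rich', 'a:state': 'SD'}
```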
Now the Pig part. Modify pigdemo.txt so each record has a good rowkey:
1 SD Rich
2 NV Barry
3 CO George
4 CA Ulf
5 IL Danielle
6 OH Tom
7 CA manish
8 CA Brian
8 CO Mark
(Note that rowkey 8 appears twice; rowkeys are unique in HBase, so the second row will overwrite the first, as the scan output below shows.)
Put the id,state,name dataset into HDFS:
hadoop fs -put pigdemo.txt pigdemo.txt
*note: I changed the original pigdemo.txt to have the id field
Write a pig script that looks like this.
REGISTER /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-76-hadoop2.jar
REGISTER /usr/lib/hbase/lib/hbase-client-0.96.0.2.0.6.0-76-hadoop2.jar
REGISTER /usr/lib/hbase/lib/hbase-server-0.96.0.2.0.6.0-76-hadoop2.jar
REGISTER /usr/lib/hbase/lib/hbase-protocol-0.96.0.2.0.6.0-76-hadoop2.jar
REGISTER /usr/lib/hbase/lib/htrace-core-2.01.jar
REGISTER /usr/lib/hbase/lib/zookeeper.jar
REGISTER /usr/lib/hbase/lib/guava-12.0.1.jar
REGISTER /usr/lib/hbase/lib/hbase-*.jar
REGISTER /usr/lib/hadoop/hadoop*.jar
REGISTER /usr/lib/zookeeper/zookeeper-3.4.5.2.0.6.0-76.jar
emp = load 'pigdemo.txt' as (id:chararray, state:chararray, name:chararray);
store emp into 'hbase://pigdemo' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('a:state,a:name');
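With HBaseStorage('a:state,a:name'), the first field of each tuple becomes the rowkey and the remaining fields map positionally to the listed columns. Since rowkey 8 appears twice in the data, the second put wins, which is why Brian is missing from the scan further down. A toy Python sketch of that store (not the real storage handler, just its mapping and overwrite behavior):

```python
# Toy model of STORE emp INTO 'hbase://pigdemo'
# USING HBaseStorage('a:state,a:name'): field 0 is the rowkey,
# the rest line up with the column list in order. A later put
# to the same rowkey overwrites the earlier one.
columns = ["a:state", "a:name"]
tuples = [
    ("7", "CA", "manish"),
    ("8", "CA", "Brian"),
    ("8", "CO", "Mark"),   # same rowkey: overwrites Brian
]

table = {}
for rowkey, *rest in tuples:
    for col, val in zip(columns, rest):
        table.setdefault(rowkey, {})[col] = val

print(table["8"])  # {'a:state': 'CO', 'a:name': 'Mark'}
```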
Run the pig script:
pig pigscript.txt
Verify the data in HBase:
hbase(main):002:0> scan 'pigdemo'
ROW COLUMN+CELL
1 column=a:name, timestamp=1403065549589, value=Rich
1 column=a:state, timestamp=1403065549589, value=SD
2 column=a:name, timestamp=1403065549590, value=Barry
2 column=a:state, timestamp=1403065549590, value=NV
3 column=a:name, timestamp=1403065549591, value=George
3 column=a:state, timestamp=1403065549591, value=CO
4 column=a:name, timestamp=1403065549591, value=Ulf
4 column=a:state, timestamp=1403065549591, value=CA
5 column=a:name, timestamp=1403065549591, value=Danielle
5 column=a:state, timestamp=1403065549591, value=IL
6 column=a:name, timestamp=1403065549591, value=Tom
6 column=a:state, timestamp=1403065549591, value=OH
7 column=a:name, timestamp=1403065549591, value=manish
7 column=a:state, timestamp=1403065549591, value=CA
8 column=a:name, timestamp=1403065549591, value=Mark
8 column=a:state, timestamp=1403065549591, value=CO
To read from HBase into Pig, then dump to verify:
REGISTER /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-76-hadoop2.jar
REGISTER /usr/lib/hbase/lib/hbase-client-0.96.0.2.0.6.0-76-hadoop2.jar
REGISTER /usr/lib/hbase/lib/hbase-server-0.96.0.2.0.6.0-76-hadoop2.jar
REGISTER /usr/lib/hbase/lib/hbase-protocol-0.96.0.2.0.6.0-76-hadoop2.jar
REGISTER /usr/lib/hbase/lib/htrace-core-2.01.jar
REGISTER /usr/lib/hbase/lib/zookeeper.jar
REGISTER /usr/lib/hbase/lib/guava-12.0.1.jar
REGISTER /usr/lib/hbase/lib/hbase-*.jar
REGISTER /usr/lib/hadoop/hadoop*.jar
REGISTER /usr/lib/zookeeper/zookeeper-3.4.5.2.0.6.0-76.jar
emp = load 'hbase://pigdemo' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('a:state,a:name','-loadKey true')
AS (id:bytearray, state:chararray, name:chararray);
dump emp;
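The '-loadKey true' option is what puts the rowkey back into the relation as the first field, matching the AS clause (id, state, name). A toy Python sketch of that read (sample rows are illustrative, not the HBase client):

```python
# Toy model of LOAD 'hbase://pigdemo' USING
# HBaseStorage('a:state,a:name', '-loadKey true'): each row
# becomes a tuple whose first field is the rowkey, followed by
# the requested columns in the order they were listed.
table = {
    "1": {"a:state": "SD", "a:name": "Rich"},
    "2": {"a:state": "NV", "a:name": "Barry"},
}
columns = ["a:state", "a:name"]

emp = [(rowkey,) + tuple(row[c] for c in columns)
       for rowkey, row in sorted(table.items())]
print(emp[0])  # ('1', 'SD', 'Rich')
```

Without '-loadKey true' the rowkey is omitted and only the listed columns come back, so the AS clause would have one fewer field.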
Some other settings that might be needed. Only make these changes if the above example fails.
In /etc/hadoop/conf/hadoop-env.sh I added these lines:
HBASE_JARS=
for f in $HBASE_HOME/lib/*.jar; do
  HBASE_JARS=${HBASE_JARS}:$f
done
export HADOOP_CLASSPATH=$HBASE_JARS:$HADOOP_CLASSPATH
In /usr/lib/pig/conf/pig-env.sh I added:
export HBASE_HOME=/usr/lib/hbase
To get Hive working I did the following:
mkdir /usr/lib/hive/auxlib
cp /usr/lib/hive/lib/hive-hbase-handler-0.12.0.2.0.6.0-76.jar /usr/lib/hive/auxlib
cp /usr/lib/zookeeper/zookeeper-3.4.5.2.0.6.0-76.jar /usr/lib/hive/auxlib
cp /usr/lib/hbase/lib/hbase*.jar /usr/lib/hive/auxlib
cp /usr/lib/hbase/lib/guava-12.0.1.jar /usr/lib/hive/auxlib
Create the Hive table:
set hbase.zookeeper.quorum=localhost;
set hive.zookeeper.client.port=2181;
set zookeeper.znode.parent=/hbase-unsecure;
create external table pigdemo(id int, name string, state string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,a:name,a:state')
TBLPROPERTIES ('hbase.table.name' = 'pigdemo');
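The hbase.columns.mapping string is positional: the Nth entry binds to the Nth Hive column, with ':key' marking the rowkey. A quick Python sketch of that pairing:

```python
# The mapping entries pair up positionally with the Hive columns:
# id <- :key (the rowkey), name <- a:name, state <- a:state.
mapping = ":key,a:name,a:state"
hive_columns = ["id", "name", "state"]

binding = dict(zip(hive_columns, mapping.split(",")))
print(binding)  # {'id': ':key', 'name': 'a:name', 'state': 'a:state'}
```

That is also why the column order in the create statement matters: swapping name and state in the Hive DDL without swapping the mapping entries would silently transpose the two columns.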