
@Condla
Last active March 30, 2018 11:11
An Apache Pig script that shows how to read data from Apache HBase, sort it by a column value, and store it as CSV.

Pig Examples

You can run the Pig examples below with the following commands. Note: you need Pig, Tez, HDFS, and YARN set up, and the HBase and Hive tables referenced in the scripts must already exist.
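A minimal setup sketch for the referenced tables, assuming the namespace condla and the column families that appear in the scripts below (f1 for hbase_to_csv.pig, cf1 through cf3 for hive_to_hbase.pig); adapt the names to your cluster:

```shell
# One-off setup from a node with the HBase client installed.
# Namespace/table names mirror the scripts; the column families
# are assumptions inferred from the scripts' column lists.
hbase shell <<'EOF'
create_namespace 'condla'
create 'condla:test', 'f1', 'cf1', 'cf2', 'cf3'
EOF
```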

hive_to_hbase.pig

Run:

pig -Dtez.queue.name=myQueue -x tez -useHCatalog -param "my_datetime=2018-03-30_13:05:21" -f hive_to_hbase.pig 

hbase_to_csv.pig

Run:

pig -x tez -f hbase_to_csv.pig

The script loads the HBase table, sorts it by col2, and writes the result as delimited text:

-- load rows from HBase, including the row key as the first field
test = LOAD 'hbase://condla:test'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('f1:col1 f1:col2 f1:col3', '-loadKey true')
    AS (id:bytearray, col1, col2, col3);

-- sort the relation by the value of col2
sorted_test = ORDER test BY col2;

-- store as comma-separated values in HDFS (the gist is titled CSV,
-- so a comma delimiter is used here)
STORE sorted_test INTO '/user/condla/test' USING PigStorage(',');
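The same load-sort-store step can be sketched in plain Python on a small sample (the data is made up; the Pig script performs this at cluster scale, and a comma delimiter is used here to match the CSV naming):

```python
import csv
import io

# Sample rows shaped like the relation in hbase_to_csv.pig:
# (row key, col1, col2, col3) -- the values themselves are made up.
rows = [
    ("row-1", "a", 30, "x"),
    ("row-2", "b", 10, "y"),
    ("row-3", "c", 20, "z"),
]

# Equivalent of: sorted_test = ORDER test BY col2;
sorted_rows = sorted(rows, key=lambda r: r[2])

# Equivalent of: STORE ... USING PigStorage -- delimited text output
buf = io.StringIO()
csv.writer(buf).writerows(sorted_rows)
print(buf.getvalue())
```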
hive_to_hbase.pig builds a composite row key from the Hive table's columns and copies the filtered rows into HBase:

-- load the Hive table condla.test through HCatalog
test = LOAD 'condla.test' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- prepend a composite row key: $4 _ unixtime($0) _ unixtime($1) _ $5
hbase_dump = FOREACH test
    GENERATE
    CONCAT($4, '_', (chararray)ToUnixTime($0), '_', (chararray)ToUnixTime($1), '_', $5) AS row_id,
    *;

-- keep only rows at or after the datetime passed with -param my_datetime
hbase_dump_filtered = FILTER hbase_dump BY call_datetime >= ToDate('$my_datetime', 'yyyy-MM-dd_HH:mm:ss');

-- STORE is a statement and cannot be assigned to an alias;
-- the first field (row_id) becomes the HBase row key
STORE hbase_dump_filtered INTO 'hbase://condla:test'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:col1 cf1:col2 cf2:col1 cf3:col1 cf3:col2');
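The composite row key and the datetime filter can be illustrated in Python; the field names below (region and msisdn standing in for $4 and $5) are assumptions, since the Hive schema is not shown in the gist:

```python
from datetime import datetime, timezone

def row_id(start, end, region, msisdn):
    """Mirror CONCAT($4, '_', ToUnixTime($0), '_', ToUnixTime($1), '_', $5)."""
    to_unix = lambda dt: str(int(dt.replace(tzinfo=timezone.utc).timestamp()))
    return "_".join([region, to_unix(start), to_unix(end), msisdn])

# Equivalent of ToDate('$my_datetime', 'yyyy-MM-dd_HH:mm:ss') with the
# sample parameter from the run command above
cutoff = datetime.strptime("2018-03-30_13:05:21", "%Y-%m-%d_%H:%M:%S")

call_datetime = datetime(2018, 3, 30, 14, 0, 0)
key = row_id(call_datetime, call_datetime, "AT", "12345")
keep = call_datetime >= cutoff  # FILTER ... BY call_datetime >= ...
print(key, keep)
```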