@kzhangkzhang
Last active October 9, 2019 13:36
Cheat Sheet for Big Data Pig Programming & Pig Latin

Apache Pig Cheatsheet

Example Script

Example 1: Load a CSV file from HDFS, sort it, and store the result in HDFS

Example source: Take Control of Your Big Data with Hue in Cloudera CDH (Xavier Morera, Pluralsight)

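-- Load the comma-delimited file; only CountTags is typed, the other fields default to bytearray
mytags = LOAD 'stackexchange/tags-no-header.csv' USING PigStorage(',') AS (Id,TagName,CountTags:int,ExcerptPostId,WikiPostId);

-- Keep only the three columns of interest
thetags = FOREACH mytags GENERATE Id,TagName,CountTags;

-- Sort by tag count, highest first
orderedtags = ORDER thetags BY CountTags DESC;

-- Show sample rows for mytags
ILLUSTRATE mytags;

-- Write the sorted relation to HDFS (default PigStorage output is tab-delimited)
STORE orderedtags INTO 'stackexchange/tagstsv';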
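
Before running the STORE above, it can help to check the schema and a few rows. A minimal sketch that reuses the aliases from Example 1; `toptags` is a hypothetical alias introduced here only for illustration:

```pig
-- Print the schema of the projected relation
DESCRIBE thetags;

-- Keep only the first 10 rows of the sorted relation
toptags = LIMIT orderedtags 10;

-- Run the pipeline and print those rows to the console
DUMP toptags;
```

DUMP triggers execution of the statements it depends on, so on a large file it is usually worth applying LIMIT first, as done here.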

Example 2: Load a tab-delimited data file (in HDFS) into an HBase table

Example source: Take Control of Your Big Data with Hue in Cloudera CDH (Xavier Morera, Pluralsight)

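-- No USING clause, so the default PigStorage loader splits fields on tabs
mytags = LOAD 'stackexchange/votes-no-header.tsv'
AS (Id,PostId,VoteTypeId:chararray, CreationDate:chararray);

-- Pair each PostId with a map of column qualifier -> value ('Date', 'Vote')
mymappedtags = FOREACH mytags GENERATE TOTUPLE(PostId, TOMAP('Date', CreationDate, 'Vote', VoteTypeId));

-- Write to the HBase table 'votesimport'; 'voted:*' sends the map's keys
-- to column qualifiers in the 'voted' column family
STORE mymappedtags INTO 'hbase://votesimport'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('voted:*');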
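
To check the import, the table can be read back with the same storage class. This is a sketch, assuming the `votesimport` table with a `voted` column family already exists in HBase; the aliases `votesfromhbase` and `firstrows` and the field names `rowkey` and `cols` are hypothetical names chosen for illustration:

```pig
-- Read the row key plus every column in the 'voted' family as a map
votesfromhbase = LOAD 'hbase://votesimport'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('voted:*', '-loadKey true')
AS (rowkey:chararray, cols:map[]);

-- Print a handful of rows rather than the whole table
firstrows = LIMIT votesfromhbase 5;
DUMP firstrows;
```

The `-loadKey true` option asks HBaseStorage to return the row key as the first field; without it only the column data is loaded.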