@meyarivan
Created February 8, 2014 03:30
-- Usage: pig -param orig=/user/bcolloran/data/fhrFullDump_2014-01-31/ -param fetchids=/tmp/sample_list.txt -param jointype=merge -param output=DEST_PATH fetch_reports.pig
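-- SequenceFileLoader is provided by piggybank.
REGISTER '/opt/cloudera/parcels/CDH/lib/pig/piggybank.jar';
-- Full dump as (key, value) pairs; with jointype=merge it must already be sorted by key.
fulldump = LOAD '$orig' USING org.apache.pig.piggybank.storage.SequenceFileLoader AS (key:chararray, value:chararray);
-- IDs to fetch, tab-delimited (the PigStorage default); the second column is ignored.
ids_to_fetch_raw = LOAD '$fetchids' USING PigStorage() AS (key:chararray, ign:chararray);
-- Sort the ID list, since a merge join needs both inputs sorted on the join key.
ids_to_fetch = ORDER ids_to_fetch_raw BY key;
common = JOIN fulldump BY key, ids_to_fetch BY key USING '$jointype';
-- Keep only the key and value columns that came from the full dump.
todump = FOREACH common GENERATE fulldump::key, fulldump::value;
STORE todump INTO '$output' USING PigStorage();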