@milindjagre
Created April 20, 2017 01:10
Pig script for performing a GROUP operation
--GROUP OPERATION IN APACHE PIG
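--loading the comma-separated weather data into the weather relation
--no schema is declared, so fields are referenced by position ($0, $1, ...)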
weather = LOAD '/hdpcd/input/post15/post15.csv' USING PigStorage(',');
--performing group operation based on station name
--station name is the first column in weather relation, therefore $0
grouped_data = GROUP weather BY $0;
--generating output data with the FOREACH...GENERATE command
--each output row contains the station name (group) and a bag of all weather tuples for that station
output_data = FOREACH grouped_data GENERATE group,weather;
--storing the final output in HDFS
STORE output_data INTO '/hdpcd/output/post15/';
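--OPTIONAL SKETCH, not part of the original script: counting records per station
--shows how the bag produced by GROUP can be aggregated instead of stored as-is
--assumes only that $0 is the station name, as above
--station_counts, station and num_records are illustrative names, not from the original
--COUNT skips tuples whose first field is null; COUNT_STAR would include them
station_counts = FOREACH grouped_data GENERATE group AS station, COUNT(weather) AS num_records;
--printing the result to the console rather than writing it to HDFS
DUMP station_counts;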