@milindjagre
Created May 9, 2017 14:49
This Pig script removes duplicate tuples from a Pig relation.
-- this file is used for removing the duplicate tuples from a pig relation
-- LOAD command is used for loading the data from the input file into the input_data pig relation
-- we are not passing any custom schema in this case, so fields are referenced by position ($0, $1, ...)
input_data = LOAD '/hdpcd/input/post20/post20.csv' USING PigStorage(',');
-- DISTINCT command is used for removing the duplicate tuples from the pig relation
-- output is stored in unique_data pig relation
unique_data = DISTINCT input_data;
-- final output is stored in the /hdpcd/output/post20 directory
STORE unique_data INTO '/hdpcd/output/post20';
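-- optional sanity check: a minimal sketch, not part of the original script
-- it counts the tuples before and after DISTINCT to show how many duplicates were removed
-- the aliases input_grouped, input_count, unique_grouped and unique_count are hypothetical names chosen for this example
-- GROUP ... ALL puts every tuple of the relation into a single group
input_grouped = GROUP input_data ALL;
-- COUNT_STAR counts all tuples in the bag, including those with null fields
input_count = FOREACH input_grouped GENERATE COUNT_STAR(input_data);
unique_grouped = GROUP unique_data ALL;
unique_count = FOREACH unique_grouped GENERATE COUNT_STAR(unique_data);
-- DUMP prints both counts to the console; their difference is the number of removed duplicates
DUMP input_count;
DUMP unique_count;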