Created May 9, 2017 14:49
milindjagre/8352853429f8c8751a49db92cb34070c
This Pig script removes duplicate tuples from a Pig relation.
-- this script removes duplicate tuples from a Pig relation
-- the LOAD command loads the comma-separated input file into the input_data relation
-- no schema is declared, so every field is loaded as an untyped bytearray
input_data = LOAD '/hdpcd/input/post20/post20.csv' USING PigStorage(',');
-- the DISTINCT operator removes duplicate tuples; two tuples are duplicates only if every field matches
-- the de-duplicated tuples are held in the unique_data relation
unique_data = DISTINCT input_data;
-- the final output is stored in the /hdpcd/output/post20 directory
-- with no USING clause, STORE writes tab-separated fields (the PigStorage default)
STORE unique_data INTO '/hdpcd/output/post20';
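One possible way to sanity-check the result is to count the tuples before and after DISTINCT and then read the stored output back. This is a sketch, not part of the original gist. The relation names all_input, input_cnt, all_unique, unique_cnt and stored_data are hypothetical; the paths are the ones used above. The STORE statement has no USING clause, so its output is tab-separated, and the sketch therefore reloads it with PigStorage('\t').

```pig
-- hypothetical sanity check, not part of the original script
input_data = LOAD '/hdpcd/input/post20/post20.csv' USING PigStorage(',');
unique_data = DISTINCT input_data;

-- GROUP ... ALL puts every tuple into a single bag so it can be counted;
-- COUNT_STAR is used because COUNT skips tuples whose first field is null
all_input = GROUP input_data ALL;
input_cnt = FOREACH all_input GENERATE COUNT_STAR(input_data);
all_unique = GROUP unique_data ALL;
unique_cnt = FOREACH all_unique GENERATE COUNT_STAR(unique_data);
DUMP input_cnt;
DUMP unique_cnt;

-- read back what the STORE statement wrote (tab-separated by default)
stored_data = LOAD '/hdpcd/output/post20' USING PigStorage('\t');
DUMP stored_data;
```

If the first count is larger than the second, the difference is the number of duplicate tuples that DISTINCT removed.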