Created May 11, 2017 18:55 by milindjagre (gist 86142fbf7240937a4326947cb3e0d034)
This Pig script launches multiple reduce tasks by using the SET command.
-- This Pig script runs its reduce work as several parallel reduce tasks.
-- The SET command controls how many reduce tasks are used.
-- The line below makes 4 the default number of reduce tasks for the reduce-side operations in this script.
SET default_parallel 4
-- Load the comma-separated data in post21.csv into the input_data relation with the LOAD command.
input_data = LOAD '/hdpcd/input/post21/post21.csv' USING PigStorage(',');
-- Sort the records with the ORDER command, in descending order of the seventh field ($6; field positions start at $0).
-- The result is held in the sorted_data relation.
sorted_data = ORDER input_data BY $6 DESC;
-- Write sorted_data to HDFS with the STORE command, using ':' as the field delimiter.
-- Because the sort runs with 4 reduce tasks, /hdpcd/output/post21_1 should contain 4 part files, one per reducer.
STORE sorted_data INTO '/hdpcd/output/post21_1' USING PigStorage(':');
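To see why a sort with 4 reduce tasks leaves 4 part files that are still in descending order when read one after another, here is a small Python simulation of range partitioning. It is only a sketch under stated assumptions. The records live in memory. The function names (`load`, `split_points`, `partition`, `run_job`) are hypothetical. The reducer boundaries come from the full sorted key list rather than from a sample of the input, as a real ORDER job would use. Field $6 is compared as a plain string. It is not Pig's actual implementation.

```python
"""Simulate ORDER input_data BY $6 DESC running on 4 reducers.

Assumptions (not taken from the original script): data is held in memory,
boundaries are chosen from the full sorted key list, and field $6 is
compared as a plain string. All function names here are hypothetical.
"""

NUM_REDUCERS = 4  # mirrors SET default_parallel 4


def load(text):
    """Split each non-empty line on commas, roughly like PigStorage(',')."""
    return [line.split(",") for line in text.splitlines() if line]


def split_points(rows, field, n):
    """Pick n - 1 boundary keys, in descending order, from the sorted keys."""
    keys = sorted((row[field] for row in rows), reverse=True)
    if not keys:
        return []
    return [keys[len(keys) * i // n] for i in range(1, n)]


def partition(rows, field, bounds, n):
    """Assign each row to a reducer; larger keys go to lower-numbered reducers.

    Equal keys always land on the same reducer, because the index depends
    only on the key value.
    """
    parts = [[] for _ in range(n)]
    for row in rows:
        index = sum(1 for b in bounds if b > row[field])
        parts[index].append(row)
    return parts


def run_job(text, field=6, n=NUM_REDUCERS):
    """Return the contents of each simulated part file, fields joined with ':'."""
    rows = load(text)
    bounds = split_points(rows, field, n)
    part_files = []
    for part in partition(rows, field, bounds, n):
        # Each reducer sorts only its own key range, in descending order.
        part.sort(key=lambda row: row[field], reverse=True)
        part_files.append("\n".join(":".join(row) for row in part))
    return part_files


if __name__ == "__main__":
    values = [42, 7, 93, 15, 88, 61, 30, 5]
    sample = "\n".join(f"id{i},x,x,x,x,x,{v:02d}" for i, v in enumerate(values))
    for number, contents in enumerate(run_job(sample)):
        # MapReduce-style part file names, used here only as labels.
        print(f"part-r-{number:05d}:")
        print(contents)
```

Reading the four simulated part files in order gives the keys 93, 88, 61, 42, 30, 15, 07, 05. Each reducer owns one contiguous key range, so the output is globally sorted without a final merge step. The zero-padded sample values keep string order and numeric order the same.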