@gfelot
Last active May 2, 2016 16:00
Udemy Hadoop Error with Pig exercise
grunt> STORE top10 INTO 'output/pig/avg-volume' USING PigStorage(',');
2016-05-02 14:38:00,588 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,ORDER_BY,FILTER,LIMIT
2016-05-02 14:38:00,623 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2016-05-02 14:38:00,649 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator
2016-05-02 14:38:00,653 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 6000:
<line 7, column 0> Output Location Validation Failed for: 'hdfs://ip-172-31-45-216.ec2.internal:8020/user/hirwcourseuser0416/output/pig/avg-volume More info to follow:
Output directory hdfs://ip-172-31-45-216.ec2.internal:8020/user/hirwcourseuser0416/output/pig/avg-volume already exists
Details at logfile: /home/hirwcourseuser0416/pig_1462199798774.log
stocks = LOAD '/user/hirw/input/stocks' USING PigStorage(',') as (exchange:chararray, symbol:chararray, date:datetime, open:float, high:float, low:float, close:float, volume:int, adj_close:float);
filter_by_yr = FILTER stocks by GetYear(date) == 2003;
grp_by_sym = GROUP filter_by_yr BY symbol;
avg_volume = FOREACH grp_by_sym GENERATE group, ROUND(AVG(filter_by_yr.volume)) as avgvolume;
avg_vol_ordered = ORDER avg_volume BY avgvolume DESC;
top10 = LIMIT avg_vol_ordered 10;
STORE top10 INTO 'output/pig/avg-volume' USING PigStorage(',');
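
The ERROR 6000 in the log is raised because the relative path output/pig/avg-volume already exists under the user's HDFS home directory, and STORE will not write into an existing output directory. One possible way to rerun the script is sketched below, on the assumption that the results of the earlier run are disposable: Grunt's rmf command removes the directory first, and does not fail if the path is absent.

```pig
-- Assumption: the previous contents of output/pig/avg-volume are no longer needed.
-- rmf force-removes the path and is silent when it does not exist.
rmf output/pig/avg-volume;

-- With the directory gone, the original STORE passes output location validation.
STORE top10 INTO 'output/pig/avg-volume' USING PigStorage(',');
```

If the earlier results need to be kept, the alternative is to leave the directory alone and point STORE at a path that does not exist yet.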