@mehikmat
Created February 26, 2014 13:36
Hadoop command to run a shell script on a Hadoop cluster. The script counts the number of lines in the input file and writes the count to the output file.
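#!/bin/sh
# The input file uses the literal string ^*~ as its line delimiter.
# sed replaces each delimiter with a newline so that wc -l can count the lines.
# Replace the pattern \^\*~ with your own delimiter.
sed 's_\^\*~_\n_g' | wc -l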
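Before submitting the job, it may help to check the pipeline locally on a small sample. The following is a minimal sketch, not part of the original gist: the function name `count_records` and the sample input are made up for illustration, and it assumes GNU sed, which treats `\n` in the replacement as a newline (other sed implementations may not).

```shell
#!/bin/sh
# Hypothetical helper (not in the original gist): the same pipeline as
# count.sh, wrapped in a function so it can be tried on sample input.
# Assumes GNU sed, which expands \n in the replacement to a newline.
count_records() {
    sed 's_\^\*~_\n_g' | wc -l
}

# Made-up sample: three records separated by ^*~, ending in a newline.
printf 'a^*~b^*~c\n' | count_records
```

If the printed count does not match the number of records in the sample, the delimiter pattern or the sed implementation is the first thing to check.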
sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u4.jar \
-numReduceTasks 0 \
-file /home/hdhamee/count.sh \
-input /user/hdhamee/input/Medical.csv \
-output /user/hdhamee/output/MedCount.csv \
-mapper count.sh \
-verbose
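Because the job runs with `-numReduceTasks 0`, each map task writes its own count to a separate part file under the output directory, so a large input can produce several partial counts. A rough sketch of adding them up follows. The helper name `sum_counts` is hypothetical, and the `part-*` glob assumes the default part-file naming.

```shell
#!/bin/sh
# Hypothetical helper (not in the original gist): sums the first field of
# every input line, i.e. the per-mapper counts written by count.sh.
sum_counts() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Assumed usage against the output directory from the command above:
#   sudo -u hdfs hadoop fs -cat '/user/hdhamee/output/MedCount.csv/part-*' | sum_counts
```

With a single map task there is only one part file, and the sum is simply that file's count.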