@jeongho
Created February 4, 2016 18:04
Hadoop benchmark 2. run terasort
#!/bin/bash
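# TeraSort benchmark: generate data (teragen), sort it (terasort), verify the result (teravalidate)
# teragen usage: hadoop jar hadoop-*examples*.jar teragen <number of 100-byte rows> <output dir>
#
# Run in the background with nohup so the job survives logout:
# nohup bash ./run_terasort.sh > terasort.out 2>&1 &
# Or as the hdfs user:
# sudo -u hdfs nohup bash /tmp/run_terasort.sh > /tmp/terasort.out 2>&1 &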
hadoop_jar=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar
# TeraGen: 1 TB = 1,000,000,000,000 bytes (1e12) = 1e10 rows * 100 bytes per row
number_of_100byte_rows=10000000000
input_dir=terasort-input
output_dir=terasort-output
validate_dir=terasort-validate
date
echo "---------------------TeraGen---------------------------------------"
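# Remove any previous input; -f suppresses the error when the directory does not exist yet
hadoop fs -rm -r -f -skipTrash "$input_dir"
# mapred.map.tasks=112 sets the number of map tasks; tune it to the cluster
time hadoop jar "$hadoop_jar" teragen -Dmapred.compress.map.output=false -Dmapred.map.tasks=112 "$number_of_100byte_rows" "$input_dir"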
date
echo "---------------------TeraSort---------------------------------------"
# Remove any previous sort output before re-running
hadoop fs -rm -r -f -skipTrash "$output_dir"
time hadoop jar "$hadoop_jar" terasort "$input_dir" "$output_dir"
date
echo "---------------------TeraValidate-----------------------------------"
# Remove any previous validation report before re-running
hadoop fs -rm -r -f -skipTrash "$validate_dir"
time hadoop jar "$hadoop_jar" teravalidate "$output_dir" "$validate_dir"
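
The row count in the script is hard-coded for 1 TB. As a minimal sketch, the same arithmetic (target bytes divided by 100 bytes per row) could be wrapped in a small helper so other data sizes are easy to try. The function name `rows_for_size` is hypothetical and not part of the original script.

```bash
#!/bin/bash
# Hypothetical helper (an assumption, not in the original script):
# convert a target data size in bytes into the number of 100-byte rows
# that teragen takes as its first argument.
rows_for_size() {
  local bytes=$1
  local row_bytes=100   # teragen rows are a fixed 100 bytes each
  echo $(( bytes / row_bytes ))
}

# 1 TB = 1e12 bytes, the same size the script above generates
number_of_100byte_rows=$(rows_for_size 1000000000000)
echo "$number_of_100byte_rows"
```

For a quick smoke test on a small cluster, `rows_for_size 1000000000` would give the row count for roughly 1 GB of input.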