@snoremac
Created January 5, 2014 22:13
Run a Java-based word count over data already stored in HDFS on a running cluster.
# Run the word count from local HDFS.
elastic-mapreduce -j j-2XP9O9IRLHHBU \
--jar s3n://emr-examples.dius.com.au/java/emr-examples.jar \
--main-class au.com.dius.emr.CommonCrawlTool \
--arg -D --arg target.words=hello,world \
--arg -D --arg base.uri=hdfs:///common-crawl \
--arg -D --arg max.segments=1
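
Each job property above is passed as two step arguments: `--arg -D` followed by `--arg key=value`. This looks like Hadoop's generic `-D property=value` option, on the assumption that `CommonCrawlTool` is launched through Hadoop's standard option parsing (the gist does not show the class itself). A minimal sketch of a helper that expands a list of properties into that repeated form follows; the function name `hadoop_props` is hypothetical, and the command is only printed as a dry run, not submitted.

```shell
#!/bin/sh
# Hypothetical helper (not part of the elastic-mapreduce CLI):
# expand each key=value argument into "--arg -D --arg key=value".
hadoop_props() {
  out=""
  for prop in "$@"; do
    # Append a separating space only when $out is already non-empty.
    out="${out:+$out }--arg -D --arg $prop"
  done
  printf '%s\n' "$out"
}

# Properties taken from the command above.
PROPS=$(hadoop_props \
  target.words=hello,world \
  base.uri=hdfs:///common-crawl \
  max.segments=1)

# Dry run: print the full command for inspection instead of running it.
echo "elastic-mapreduce -j j-2XP9O9IRLHHBU" \
  "--jar s3n://emr-examples.dius.com.au/java/emr-examples.jar" \
  "--main-class au.com.dius.emr.CommonCrawlTool" \
  "$PROPS"
```

Printing the command first makes it easy to check the property list before adding the step to the job flow. The sketch assumes property values contain no spaces, which holds for the three used here.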