#!/bin/bash -e
INSTANCE_TYPES="m1.small m1.medium m1.large m1.xlarge m2.xlarge m2.2xlarge m2.4xlarge c1.xlarge cc1.4xlarge cc2.8xlarge"
if [ "$#" -gt "0" ]; then
  region=$1
else
  region=$AWS_REGION
fi
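The region selection above can also be written as a small function that fails early when no region is available. This is only a sketch: `resolve_region` is a hypothetical name that does not appear in the original script, and the error message is an assumption.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): take the region from
# the first argument, fall back to AWS_REGION, and fail if neither is set.
resolve_region() {
  local region="${1:-${AWS_REGION:-}}"
  if [ -z "$region" ]; then
    echo "error: pass a region as the first argument or set AWS_REGION" >&2
    return 1
  fi
  printf '%s\n' "$region"
}
```

In the script this would be called as `region=$(resolve_region "$@")`, which under `-e` stops the script when no region can be determined.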

# Launch a cluster and run the word count against a small portion of
# 1 crawl segment.
#
# The input data is read directly from S3.
#
# Here I've omitted the --alive argument, so the cluster will self-terminate
# once the job is complete.
elastic-mapreduce \
  --create \

elastic-mapreduce --create --alive --instance-type m1.large
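
# Poll every 3 seconds until the job flow reports the WAITING state;
# `time` reports how long the cluster took to become ready.
time while [ "$waiting" != "0" ]; do
  sleep 3
  elastic-mapreduce --describe j-30Y4E7T52UPJT | grep -q "State.*WAITING"
  waiting=$?
done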
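The polling loop above runs forever if the cluster never reaches the expected state. A minimal sketch of a bounded variant follows. `wait_for_state` and its parameters are hypothetical names introduced here for illustration. The describe command is passed in as arguments, so the function itself assumes nothing about the `elastic-mapreduce` output beyond the pattern you give it.

```shell
#!/bin/bash
# Hypothetical helper: wait_for_state PATTERN MAX_TRIES INTERVAL CMD [ARGS...]
# Runs CMD repeatedly until its output matches PATTERN, sleeping INTERVAL
# seconds between attempts, and gives up after MAX_TRIES failed attempts.
wait_for_state() {
  local pattern="$1" max_tries="$2" interval="$3"
  shift 3
  local tries=0
  until "$@" | grep -q "$pattern"; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max_tries" ]; then
      echo "gave up after $tries attempts" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Example call, using the job flow ID from the loop above:
# time wait_for_state "State.*WAITING" 200 3 elastic-mapreduce --describe j-30Y4E7T52UPJT
```

Passing the command as arguments keeps the retry logic separate from the CLI call, so the same function can wait for other states by changing only the pattern.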

# Terminate the cluster we used in the previous example.
elastic-mapreduce --terminate -j j-30Y4E7T52UPJT

# Launch a cluster and run the word count against a single crawl segment.
elastic-mapreduce \
  --create \
  --name "Common Crawl word count" \
  --enable-debugging \
  --stream \
  --ami-version latest \
  --instance-group master --instance-count 1 --instance-type m2.2xlarge \
  --instance-group core --instance-count 10 --instance-type c1.xlarge \

# Word count, Java-fied.
#
# In this implementation, the job knows how to discover its input URIs based
# on the max.segments argument, which specifies how many crawl segments
# to process.
#
# See the code for details.
elastic-mapreduce \
  --create \

# Launch a cluster from the spot market.
#
# This time we specify --alive to keep the cluster running until we
# manually terminate it.
elastic-mapreduce \
  --create \
  --name "Common Crawl word count" \
  --alive \
  --enable-debugging \

# Run the word count against a single crawl segment.
elastic-mapreduce -j j-2XP9O9IRLHHBU \
  --jar s3n://emr-examples.dius.com.au/java/emr-examples.jar \
  --main-class au.com.dius.emr.CommonCrawlTool \
  --arg -D --arg target.words=hello,world \
  --arg -D --arg max.segments=1

# Run the word count against 15 crawl segments.
elastic-mapreduce -j j-2XP9O9IRLHHBU \
  --jar s3n://emr-examples.dius.com.au/java/emr-examples.jar \
  --main-class au.com.dius.emr.CommonCrawlTool \
  --arg -D --arg target.words=hello,world \
  --arg -D --arg max.segments=15
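The two job submissions differ only in their `-D` properties, and each property is passed as the pair `--arg -D --arg key=value`. A minimal sketch of building those pairs from a list of `key=value` strings follows. `build_prop_args` and `PROP_ARGS` are hypothetical names introduced here, and the sketch relies on bash arrays.

```shell
#!/bin/bash
# Hypothetical helper: expand each key=value argument into the
# "--arg -D --arg key=value" form used in the commands above.
# The result is stored in the global array PROP_ARGS.
build_prop_args() {
  PROP_ARGS=()
  local kv
  for kv in "$@"; do
    PROP_ARGS+=(--arg -D --arg "$kv")
  done
}

build_prop_args target.words=hello,world max.segments=15

# The array can then be appended to the step submission shown above:
# elastic-mapreduce -j j-2XP9O9IRLHHBU \
#   --jar s3n://emr-examples.dius.com.au/java/emr-examples.jar \
#   --main-class au.com.dius.emr.CommonCrawlTool \
#   "${PROP_ARGS[@]}"
```

Using an array rather than a string keeps each argument intact even if a property value were to contain spaces.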