@snoremac
snoremac / ec2-spot-price.sh
Last active December 26, 2015 04:29
Brief summary of current EC2 spot prices for select instance types. Requires Amazon EC2 API Tools: http://aws.amazon.com/developertools/351
#!/bin/bash -e
INSTANCE_TYPES="m1.small m1.medium m1.large m1.xlarge m2.xlarge m2.2xlarge m2.4xlarge c1.xlarge cc1.4xlarge cc2.8xlarge"
if [ "$#" -gt "0" ]; then
  region=$1
else
  region=$AWS_REGION
fi
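The region fallback above can also be expressed with shell parameter expansion. This is a minimal sketch, not the gist author's code: `resolve_region` is a hypothetical helper name, and failing when neither the argument nor `AWS_REGION` is set is an added assumption (the preview above does not show how an empty region is handled).

```shell
# Hypothetical helper (not part of the original gist): prefer the first
# argument, fall back to $AWS_REGION, and fail if neither is set.
resolve_region() {
  local region="${1:-${AWS_REGION:-}}"
  if [ -z "$region" ]; then
    echo "usage: pass a region or set AWS_REGION" >&2
    return 1
  fi
  printf '%s\n' "$region"
}
```

It could be called as `region=$(resolve_region "$@")` near the top of the script.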
snoremac / emr-examples-single-sequence-single-node-streaming.sh
Last active January 2, 2016 06:19
Launch an EMR cluster and run a word count against a small portion of one Common Crawl segment.
# Launch a cluster and run the word count against a small portion of
# 1 crawl segment.
#
# The input data is read directly from S3.
#
# Here I've omitted the --alive argument, so the cluster will self-terminate
# once the job is complete.
elastic-mapreduce \
  --create \
snoremac / emr-examples-single-node-defaults.sh
Last active January 2, 2016 06:29
Launch a single node EMR cluster with keep-alive.
elastic-mapreduce --create --alive --instance-type m1.large
snoremac / emr-examples-wait-loop.sh
Created January 5, 2014 02:35
Wait for an EMR cluster to transition to WAITING state.
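# Poll every 3 seconds until the cluster reports the WAITING state;
# `time` reports how long the wait took.
time while [ "$waiting" != "0" ]; do
  sleep 3
  elastic-mapreduce --describe j-30Y4E7T52UPJT | grep -q "State.*WAITING"
  waiting=$?
done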
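The loop above keeps polling indefinitely if the cluster never reaches WAITING (for example, if it fails during startup). A minimal sketch of the same polling idea with an upper bound follows. `wait_until` and `cluster_is_waiting` are hypothetical helper names, not part of the original gist, and unlike the loop above this version checks once before the first sleep.

```shell
# Hypothetical helper (not part of the original gist): run a check command
# every INTERVAL seconds until it succeeds or TIMEOUT seconds have elapsed.
# Usage: wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
wait_until() {
  local timeout="$1"
  local interval="$2"
  local elapsed=0
  shift 2
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${elapsed}s" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Same check as the loop above: succeeds once the cluster reports WAITING.
cluster_is_waiting() {
  elastic-mapreduce --describe "$1" | grep -q "State.*WAITING"
}
```

For example, `wait_until 1800 3 cluster_is_waiting j-30Y4E7T52UPJT` would give up after roughly 30 minutes; the 1800-second limit is an arbitrary illustration.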
snoremac / emr-examples-terminate.sh
Last active January 2, 2016 06:29
Terminate an EMR cluster.
# Terminate the cluster we used in the previous example.
elastic-mapreduce --terminate -j j-30Y4E7T52UPJT
snoremac / emr-examples-single-segment-10-node-streaming.sh
Created January 5, 2014 03:10
Launch an EMR cluster and run a word count against a single Common Crawl segment.
# Launch a cluster and run the word count against a single crawl segment.
elastic-mapreduce \
  --create \
  --name "Common Crawl word count" \
  --enable-debugging \
  --stream \
  --ami-version latest \
  --instance-group master --instance-count 1 --instance-type m2.2xlarge \
  --instance-group core --instance-count 10 --instance-type c1.xlarge \
snoremac / emr-examples-single-segment-10-node-java.sh
Last active January 2, 2016 06:29
Launch a 10-node EMR cluster and run a Java word count against a single Common Crawl segment.
# Word count, Java-fied.
#
# In this implementation, the job knows how to discover its input URIs based
# on the max.segments argument, which specifies how many crawl segments
# to process.
#
# See the code for details.
elastic-mapreduce \
  --create \
snoremac / emr-exampes-10-node-spot.sh
Last active January 2, 2016 08:09
Launch a 10 node EMR cluster with keep-alive from the spot market.
# Launch a cluster from the spot market.
#
# This time we specify --alive to keep the cluster running until we
# manually terminate it.
elastic-mapreduce \
  --create \
  --name "Common Crawl word count" \
  --alive \
  --enable-debugging \
snoremac / emr-examples-single-segment-java.sh
Last active January 2, 2016 08:09
Run a Java-based word count against an already running cluster.
# Run the word count against a single crawl segment.
elastic-mapreduce -j j-2XP9O9IRLHHBU \
  --jar s3n://emr-examples.dius.com.au/java/emr-examples.jar \
  --main-class au.com.dius.emr.CommonCrawlTool \
  --arg -D --arg target.words=hello,world \
  --arg -D --arg max.segments=1
snoremac / emr-examples-15-segment-java.sh
Last active January 2, 2016 08:09
Run a Java-based word count against 15 Common Crawl segments on an already running EMR cluster.
# Run the word count against 15 crawl segments.
elastic-mapreduce -j j-2XP9O9IRLHHBU \
  --jar s3n://emr-examples.dius.com.au/java/emr-examples.jar \
  --main-class au.com.dius.emr.CommonCrawlTool \
  --arg -D --arg target.words=hello,world \
  --arg -D --arg max.segments=15
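The two step submissions above differ only in the value of max.segments, so they could be folded into one small wrapper. This is a sketch under stated assumptions rather than anything from the original gists: `submit_word_count` is a hypothetical function name, and `EMR_CLI` is an assumed override variable that lets the command be swapped for a stub during a dry run. The flags themselves are copied from the commands above.

```shell
# Hypothetical wrapper (not part of the original gists): submit the Java word
# count step to a running cluster, varying only the number of crawl segments.
# EMR_CLI is an assumed override for dry runs; it defaults to the real CLI.
# Usage: submit_word_count JOB_FLOW_ID MAX_SEGMENTS [TARGET_WORDS]
submit_word_count() {
  local job_flow_id="$1"
  local max_segments="$2"
  local target_words="${3:-hello,world}"
  "${EMR_CLI:-elastic-mapreduce}" -j "$job_flow_id" \
    --jar s3n://emr-examples.dius.com.au/java/emr-examples.jar \
    --main-class au.com.dius.emr.CommonCrawlTool \
    --arg -D --arg "target.words=$target_words" \
    --arg -D --arg "max.segments=$max_segments"
}
```

With this in place, `submit_word_count j-2XP9O9IRLHHBU 1` and `submit_word_count j-2XP9O9IRLHHBU 15` would correspond to the two commands above.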