Skip to content

Instantly share code, notes, and snippets.

@datawrangling
Created March 14, 2010 01:02
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save datawrangling/97d22e707ab7ca5fd140 to your computer and use it in GitHub Desktop.
Save datawrangling/97d22e707ab7ca5fd140 to your computer and use it in GitHub Desktop.
Launching customized cloudera Hadoop EC2 cluster w_ latest testing repo release
1) Follow setup steps described here:
http://archive.cloudera.com/docs/_getting_started.html
(download http://cloudera-packages.s3.amazonaws.com/cloudera-for-hadoop-on-ec2-py-0.3.0-beta.tar.gz)
2) Modify "install_hadoop()" function in hadoop-ec2-init-remote.sh:
replace:
apt-get -y install pig${PIG_VERSION:+-${PIG_VERSION}}
with:
apt-get -y install hadoop-pig${PIG_VERSION:+-${PIG_VERSION}}
3) Start a custom cluster w/ testing release:
./hadoop-ec2 launch-cluster --user-packages 'r-base r-base-core r-base-dev r-base-html r-base-latex r-cran-date ant subversion python-rpy python-setuptools python-docutils python-support python-distutils-extra python-simplejson git-core s3cmd policykit' --env REPO=testing --env HADOOP_VERSION=0.20 --auto-shutdown 57 my-hadoop-cluster 20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment