@amimimor
Created May 15, 2012 08:49
Set up a Scalding job (for the first time) to run on CDH3u2, using Maven
1. Go to your Scalding source directory.
2. Edit build.sbt (https://gist.github.com/238d74b081d9f2c6e5f1).
3. sbt -29 update
4. sbt -29 assembly
5. Run mvn install:install-file ..... (http://maven.apache.org/plugins/maven-install-plugin/usage.html) to install the resulting scalding-assembly.0.x.y.jar into your local Maven repository.
6. Download Cloudera's hadoop-0.20.2-cdh3u2.tar.gz (or just hadoop-core-cdh3u2.jar).
6a. As in step 5, install the cdh3u2 hadoop-core jar into your local Maven repository (get the jar first, of course, or embed Cloudera's parent pom instead).
7. In your IDE, create a new project using this pom: https://gist.github.com/40f1838bbdd15cc25b21
8. Create the file src/assembly/job.xml with this content: https://gist.github.com/9c5e6f04da287667983a
9. Create your Scala class extending Scalding's Job, i.e. "class SomethingCool(args: Args) extends Job(args)".
10. mvn package
11. The resulting jar is placed in your project's target folder, with a name like YOURPROJECT-0.0.1-SNAPSHOT-job.jar.
12. Set up your Hadoop conf files (most importantly core-site.xml) and point fs.default.name at your namenode:
<property>
<name>fs.default.name</name>
<value>hdfs://namenode.somethingcool.com:8020/</value>
</property>
13. cd to your hadoop-0.20.2-cdh3u2 folder
14. bin/hadoop jar YOURPROJECT-0.0.1-SNAPSHOT-job.jar com.twitter.scalding.Tool your.package.your.class --hdfs --input hdfs://namenode.somethingcool.com/user/hdfs/tmp/hello.txt --output hdfs://namenode.somethingcool.com/user/hdfs/tmp/hello_out.txt -libjars YOURPROJECT-0.0.1-SNAPSHOT-job.jar
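For step 9, here is a minimal sketch of what the job class could look like. It is a word count written against Scalding's fields-based API. The class name WordCountJob is made up for illustration, and it assumes the scalding assembly jar from step 5 is on the project's classpath. The "input" and "output" keys correspond to the --input and --output arguments passed in step 14.

```scala
import com.twitter.scalding._

// Hypothetical example job: counts how often each word occurs in the input.
// No package is declared here; if you add one, pass the fully qualified
// class name to com.twitter.scalding.Tool in step 14.
class WordCountJob(args: Args) extends Job(args) {
  // TextLine yields one tuple per line, with the text in the 'line field.
  TextLine(args("input"))
    .read
    // Split each line on whitespace, emitting one 'word tuple per token.
    .flatMap('line -> 'word) { line: String => line.split("\\s+") }
    // Group identical words and count the size of each group.
    .groupBy('word) { _.size }
    // Write tab-separated (word, count) rows to the output path.
    .write(Tsv(args("output")))
}
```

With this class, the "your.package.your.class" argument in step 14 would be WordCountJob.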