Created May 8, 2013 14:56
Sample Lemur jobdef, showing Hadoop Streaming and pipelined jobs (i.e., the output of one job is the input of another). Each defstep defines a single step in the process; you can include as many defsteps as you want in the jobdef. The ones that are actually run are controlled by the fire! call, as shown in the example. Alternatively, the steps can be in a…
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Sample of a jobdef for a Streaming job
;;;
;;; Example of common usage:
;;;   lemur run strm-jobdef.clj --bucket my-bucket-name
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(catch-args
  [:bucket "An s3 bucket, e.g. 'com.myco.bucket1'"])

(defcluster sample-cluster
  :master-instance-type "m1.large"
  :slave-instance-type "m1.large"
  :num-instances 2
  :keypair "tcc-integration"
  :enable-debugging? false
  :runtime-jar "/home/hadoop/contrib/streaming/hadoop-streaming.jar")

(defstep sample-strm-step
  :args.positional
  ["-input" "s3://elasticmapreduce/samples/wordcount/input"
   "-output" "${data-uri}/out1"
   "-mapper" "s3://elasticmapreduce/samples/wordcount/wordSplitter.py"
   "-reducer" "aggregate"])

(defstep second-strm-step
  :args.positional
  ["-input" "${data-uri}/out1"
   "-output" "${data-uri}/out2"
   "-mapper" "s3://elasticmapreduce/samples/wordcount/wordSplitter.py"
   "-reducer" "aggregate"])

(fire! sample-cluster sample-strm-step second-strm-step)
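
Because fire! controls which of the defined steps actually run, a subset of steps can be launched without editing the defstep forms. A minimal sketch, reusing the cluster and step names defined above (the exact submission behavior depends on your Lemur version):

```clojure
;; Sketch: run only the first step on the same cluster.
;; second-strm-step reads ${data-uri}/out1, so it should only be
;; fired after sample-strm-step has written its output.
(fire! sample-cluster sample-strm-step)
```

This is useful when re-running a single failed stage of the pipeline, since the downstream step's input URI already points at the upstream step's output.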