
@mlimotte
Created May 8, 2013 14:56
Sample Lemur jobdef showing Hadoop Streaming and pipelined jobs (i.e. the output of one job is the input of another). A defstep defines a single step in the process; you can include as many defsteps as you want in the jobdef. The ones that are actually run are controlled by the fire! call, as shown in the example. Alternatively, the steps can be in a…
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Sample of a Jobdef for a Streaming job
;;;
;;; Example of common usage:
;;; lemur run strm-jobdef.clj --bucket my-bucket-name
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(catch-args
  [:bucket "An s3 bucket, e.g. 'com.myco.bucket1'"])
(defcluster sample-cluster
  :master-instance-type "m1.large"
  :slave-instance-type "m1.large"
  :num-instances 2
  :keypair "tcc-integration"
  :enable-debugging? false
  ;; Use the Hadoop Streaming jar as the runtime jar, so each step's
  ;; positional args below are interpreted as streaming options.
  :runtime-jar "/home/hadoop/contrib/streaming/hadoop-streaming.jar")
(defstep sample-strm-step
  ;; First step: word count over the sample input, writing to ${data-uri}/out1
  :args.positional
  ["-input"   "s3://elasticmapreduce/samples/wordcount/input"
   "-output"  "${data-uri}/out1"
   "-mapper"  "s3://elasticmapreduce/samples/wordcount/wordSplitter.py"
   "-reducer" "aggregate"])
(defstep second-strm-step
  ;; Second step: pipelined -- its -input is the first step's -output
  :args.positional
  ["-input"   "${data-uri}/out1"
   "-output"  "${data-uri}/out2"
   "-mapper"  "s3://elasticmapreduce/samples/wordcount/wordSplitter.py"
   "-reducer" "aggregate"])
(fire! sample-cluster sample-strm-step second-strm-step)
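
As the description notes, the fire! call controls which of the defsteps actually run. A minimal sketch (not part of the original gist): to launch the same cluster but run only the first step, list just that step in fire!.

;; Run only the first streaming step on sample-cluster
;; (fire! sample-cluster sample-strm-step)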