@ceteri
Created October 14, 2012 19:46
ACM DM - Multitool exercise
# get the multitool source with git, or (simplest) download it as a ZIP from:
# https://github.com/Cascading/cascading.multitool
# to save time, we'll skip compiling/building the JAR...
# download the prebuilt JAR file from:
# https://s3.amazonaws.com/ceteri-mapred/multitool.jar
# then cd into your cascading.multitool download directory (the command below expects ./multitool.jar)
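The two commands that follow clear any previous output and then launch the job with three key=value parameters: source= (the input path), select= (the pattern to keep) and sink= (the output path). The wrapper below is a hypothetical helper, not part of multitool. It is a sketch that assumes multitool.jar sits in the current directory and that the sink must be removed before each run, presumably because Hadoop refuses to write into an output path that already exists. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the multitool run shown in this exercise.
# Assumes ./multitool.jar exists and that the sink path must not exist
# before the job starts (hence the rm -rf, as in the transcript).

# multitool_select SOURCE PATTERN SINK
# With DRY_RUN=1 the function only prints what it would run.
multitool_select() {
  local source="$1" pattern="$2" sink="$3"
  # Build the hadoop command as an array so paths with spaces stay intact.
  local -a cmd=(hadoop jar ./multitool.jar "source=$source" "select=$pattern" "sink=$sink")

  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "rm -rf $sink"
    echo "${cmd[*]}"
    return 0
  fi

  # Remove stale output from a previous run, then launch the job.
  rm -rf -- "$sink"
  "${cmd[@]}"
}
```

For example, `DRY_RUN=1 multitool_select data/days.txt Tuesday output/tuesday.txt` prints the `rm -rf` and `hadoop jar` lines for the run below. Note that the transcript removes the whole output directory, whereas this sketch removes only the sink path.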
bash-3.2$ rm -rf output
bash-3.2$ hadoop jar ./multitool.jar source=data/days.txt select=Tuesday sink=output/tuesday.txt
Warning: $HADOOP_HOME is deprecated.
12/10/14 12:43:11 INFO multitool.Main: key: source
12/10/14 12:43:11 INFO multitool.Main: key: select
12/10/14 12:43:11 INFO multitool.Main: key: sink
12/10/14 12:43:11 INFO util.HadoopUtil: resolving application jar from found main method on: multitool.Main
12/10/14 12:43:11 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/Downloads/multitool.jar
12/10/14 12:43:11 INFO property.AppProps: using app.id: 179169EFDA360324B08F975B10EABE23
2012-10-14 12:43:11.671 java[49161:1903] Unable to load realm info from SCDynamicStore
12/10/14 12:43:11 INFO flow.Flow: [multitool] starting
12/10/14 12:43:11 INFO flow.Flow: [multitool] source: Hfs["TextLine[[0:1]->[ALL]]"]["data/days.txt"]"]
12/10/14 12:43:11 INFO flow.Flow: [multitool] sink: Hfs["TextDelimited[[UNKNOWN]->[0]]"]["output/tuesday.txt"]"]
12/10/14 12:43:11 INFO flow.Flow: [multitool] parallel execution is enabled: false
12/10/14 12:43:11 INFO flow.Flow: [multitool] starting jobs: 1
12/10/14 12:43:11 INFO flow.Flow: [multitool] allocating threads: 1
12/10/14 12:43:11 INFO flow.FlowStep: [multitool] starting step: (1/1) output/tuesday.txt
12/10/14 12:43:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/10/14 12:43:11 WARN snappy.LoadSnappy: Snappy native library not loaded
12/10/14 12:43:11 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/14 12:43:12 INFO flow.FlowStep: [multitool] submitted hadoop job: job_local_0001
12/10/14 12:43:12 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/10/14 12:43:12 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/src/concur/cascading.multitool/data/days.txt
12/10/14 12:43:12 INFO mapred.MapTask: numReduceTasks: 0
12/10/14 12:43:12 INFO hadoop.FlowMapper: cascading version: Concurrent, Inc - Cascading 2.0.6-wip-362
12/10/14 12:43:12 INFO hadoop.FlowMapper: child jvm opts: -server -Xmx512m
12/10/14 12:43:12 INFO hadoop.FlowMapper: sourcing from: Hfs["TextLine[[0:1]->[ALL]]"]["data/days.txt"]"]
12/10/14 12:43:12 INFO hadoop.FlowMapper: sinking to: Hfs["TextDelimited[[UNKNOWN]->[0]]"]["output/tuesday.txt"]"]
12/10/14 12:43:12 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/10/14 12:43:12 INFO mapred.LocalJobRunner:
12/10/14 12:43:12 INFO mapred.Task: Task attempt_local_0001_m_000000_0 is allowed to commit now
12/10/14 12:43:12 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000000_0' to file:/Users/ceteri/src/concur/cascading.multitool/output/tuesday.txt
12/10/14 12:43:15 INFO mapred.LocalJobRunner: file:/Users/ceteri/src/concur/cascading.multitool/data/days.txt:0+726
12/10/14 12:43:15 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/10/14 12:43:17 INFO util.Hadoop18TapUtil: deleting temp path output/tuesday.txt/_temporary
bash-3.2$ ls output/tuesday.txt/
_SUCCESS part-00000
bash-3.2$ cat output/tuesday.txt/part-00000
Monday's child is fair in face, Tuesday's child is full of grace, Wednesday's child is full of woe.
We're still investigating. I heard that Monday or Tuesday we will probably be having a press conference announcing more.
We take time to go to a restaurant two times a week. A little candlelight, dinner, soft music and dancing. She goes Tuesdays, I go Fridays.
bash-3.2$
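Because the job ran map-only (numReduceTasks: 0 in the log), output/tuesday.txt is a directory holding one part file per map task plus a _SUCCESS marker, not a single text file. One way to sanity-check the select= filter is to compare those part files with a plain grep over the input. The sketch below uses two hypothetical helpers, read_parts and select_like_multitool, and assumes that select= keeps every line containing a regex match anywhere in the line. That assumption is consistent with the three lines above (including "Tuesdays"), but it is not taken from multitool's documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking multitool's select= output against grep.

# read_parts DIR: print every part-* file in a Hadoop-style output directory,
# but only if the job left its _SUCCESS marker behind.
read_parts() {
  local dir="$1"
  if [[ ! -f "$dir/_SUCCESS" ]]; then
    echo "no _SUCCESS marker in $dir" >&2
    return 1
  fi
  # The glob expands in sorted order: part-00000, part-00001, ...
  cat "$dir"/part-*
}

# select_like_multitool PATTERN FILE: assumed equivalent of select=PATTERN,
# i.e. keep each line that contains a match for the extended regex.
select_like_multitool() {
  grep -E -- "$1" "$2"
}
```

From the cascading.multitool directory, `diff <(read_parts output/tuesday.txt) <(select_like_multitool Tuesday data/days.txt) && echo match` should report no differences. With a single input split, as here, the line order should also agree; with several part files, sort both sides before comparing.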