@ceteri
Last active December 10, 2015 09:58
  • Save ceteri/4417586 to your computer and use it in GitHub Desktop.
Cascading for the Impatient, Part 9
bash-3.2$ lein repl
Listening for transport dt_socket at address: 51539
nREPL server started on port 51542
REPL-y 0.1.0-beta10
Clojure 1.4.0
Exit: Control+D or (exit) or (quit)
Commands: (user/help)
Docs: (doc function-name-here)
(find-doc "part-of-name-here")
Source: (source function-name-here)
(user/sourcery function-name-here)
Javadoc: (javadoc java-object-or-class-here)
Examples from clojuredocs.org: [clojuredocs or cdoc]
(user/clojuredocs name-here)
(user/clojuredocs "ns-here" "name-here")
user=> (use 'cascalog.playground) (bootstrap)
nil
nil
user=> (?<- (stdout) [?person] (age ?person 25))
12/12/30 21:42:10 INFO util.HadoopUtil: using default application jar, may cause class not found exceptions on the cluster
12/12/30 21:42:10 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/.m2/repository/cascading/cascading-hadoop/2.0.3/cascading-hadoop-2.0.3.jar
12/12/30 21:42:10 INFO property.AppProps: using app.id: 75CD13874F775C9F50B85D4020DB8597
12/12/30 21:42:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:11 INFO flow.Flow: [] starting
12/12/30 21:42:11 INFO flow.Flow: [] source: MemorySourceTap["MemorySourceScheme[[UNKNOWN]->[ALL]]"]["/8aae2eaf-cee3-4569-8773-3340098b860b"]"]
12/12/30 21:42:11 INFO flow.Flow: [] sink: StdoutTap["SequenceFile[[UNKNOWN]->['?person']]"]["/var/folders/bl/zrtbg3cd57lfsgzzllhnxxjc0000gn/T/temp84928667373582098931356932530374073000"]"]
12/12/30 21:42:11 INFO flow.Flow: [] parallel execution is enabled: false
12/12/30 21:42:11 INFO flow.Flow: [] starting jobs: 1
12/12/30 21:42:11 INFO flow.Flow: [] allocating threads: 1
12/12/30 21:42:11 INFO flow.FlowStep: [] starting step: (1/1) ...2098931356932530374073000
12/12/30 21:42:11 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/12/30 21:42:11 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:11 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:11 INFO flow.FlowStep: [] submitted hadoop job: job_local_0001
12/12/30 21:42:11 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:11 INFO mapred.MapTask: numReduceTasks: 0
12/12/30 21:42:11 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:11 INFO hadoop.FlowMapper: cascading version: Concurrent, Inc - Cascading 2.0.3
12/12/30 21:42:11 INFO hadoop.FlowMapper: child jvm opts: -Xmx200m
12/12/30 21:42:11 INFO hadoop.FlowMapper: sourcing from: MemorySourceTap["MemorySourceScheme[[UNKNOWN]->[ALL]]"]["/8aae2eaf-cee3-4569-8773-3340098b860b"]"]
12/12/30 21:42:11 INFO hadoop.FlowMapper: sinking to: StdoutTap["SequenceFile[[UNKNOWN]->['?person']]"]["/var/folders/bl/zrtbg3cd57lfsgzzllhnxxjc0000gn/T/temp84928667373582098931356932530374073000"]"]
12/12/30 21:42:11 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/12/30 21:42:11 INFO mapred.LocalJobRunner:
12/12/30 21:42:11 INFO mapred.TaskRunner: Task attempt_local_0001_m_000000_0 is allowed to commit now
12/12/30 21:42:11 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000000_0' to file:/var/folders/bl/zrtbg3cd57lfsgzzllhnxxjc0000gn/T/temp84928667373582098931356932530374073000
12/12/30 21:42:11 INFO mapred.LocalJobRunner:
12/12/30 21:42:11 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
12/12/30 21:42:11 INFO mapred.FileInputFormat: Total input paths to process : 1
RESULTS
-----------------------
nil
user=> 12/12/30 21:42:11 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:11 INFO util.Hadoop18TapUtil: deleting temp path /var/folders/bl/zrtbg3cd57lfsgzzllhnxxjc0000gn/T/temp84928667373582098931356932530374073000/_temporary
david
emily
-----------------------
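The query above can be read as follows, sketched here with explanatory comments (it assumes the `age` generator that `(bootstrap)` loads from `cascalog.playground`): a constant in a generator position acts as a filter, so only tuples whose age field equals 25 bind `?person`, which is why the output is `david` and `emily`.

```clojure
;; Sketch of the first query, assuming the playground's `age` generator.
(use 'cascalog.playground)
(bootstrap)

(?<- (stdout)          ; sink: print result tuples to stdout
     [?person]         ; output fields of the query
     (age ?person 25)) ; generator with a constant: keeps only age = 25
```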
user=> (?<- (stdout) [?person] (age ?person ?age) (< ?age 30))
12/12/30 21:42:23 INFO util.HadoopUtil: using default application jar, may cause class not found exceptions on the cluster
12/12/30 21:42:23 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/.m2/repository/cascading/cascading-hadoop/2.0.3/cascading-hadoop-2.0.3.jar
12/12/30 21:42:23 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:23 INFO flow.Flow: [] starting
12/12/30 21:42:23 INFO flow.Flow: [] source: MemorySourceTap["MemorySourceScheme[[UNKNOWN]->[ALL]]"]["/9542bdd3-e677-416e-81c1-a5c36e4c53ca"]"]
12/12/30 21:42:23 INFO flow.Flow: [] sink: StdoutTap["SequenceFile[[UNKNOWN]->['?person']]"]["/var/folders/bl/zrtbg3cd57lfsgzzllhnxxjc0000gn/T/temp80760086378556118051356932542912465000"]"]
12/12/30 21:42:23 INFO flow.Flow: [] parallel execution is enabled: false
12/12/30 21:42:23 INFO flow.Flow: [] starting jobs: 1
12/12/30 21:42:23 INFO flow.Flow: [] allocating threads: 1
12/12/30 21:42:23 INFO flow.FlowStep: [] starting step: (1/1) ...6118051356932542912465000
12/12/30 21:42:23 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
12/12/30 21:42:23 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:23 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:23 INFO flow.FlowStep: [] submitted hadoop job: job_local_0002
12/12/30 21:42:23 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:23 INFO mapred.MapTask: numReduceTasks: 0
12/12/30 21:42:23 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:23 INFO hadoop.FlowMapper: cascading version: Concurrent, Inc - Cascading 2.0.3
12/12/30 21:42:23 INFO hadoop.FlowMapper: child jvm opts: -Xmx200m
12/12/30 21:42:23 INFO hadoop.FlowMapper: sourcing from: MemorySourceTap["MemorySourceScheme[[UNKNOWN]->[ALL]]"]["/9542bdd3-e677-416e-81c1-a5c36e4c53ca"]"]
12/12/30 21:42:23 INFO hadoop.FlowMapper: sinking to: StdoutTap["SequenceFile[[UNKNOWN]->['?person']]"]["/var/folders/bl/zrtbg3cd57lfsgzzllhnxxjc0000gn/T/temp80760086378556118051356932542912465000"]"]
12/12/30 21:42:23 INFO mapred.TaskRunner: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
12/12/30 21:42:23 INFO mapred.LocalJobRunner:
12/12/30 21:42:23 INFO mapred.TaskRunner: Task attempt_local_0002_m_000000_0 is allowed to commit now
12/12/30 21:42:23 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0002_m_000000_0' to file:/var/folders/bl/zrtbg3cd57lfsgzzllhnxxjc0000gn/T/temp80760086378556118051356932542912465000
12/12/30 21:42:23 INFO mapred.LocalJobRunner:
12/12/30 21:42:23 INFO mapred.TaskRunner: Task 'attempt_local_0002_m_000000_0' done.
RESULTS
-----------------------
nil
user=> 12/12/30 21:42:23 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/30 21:42:23 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:23 INFO util.Hadoop18TapUtil: deleting temp path /var/folders/bl/zrtbg3cd57lfsgzzllhnxxjc0000gn/T/temp80760086378556118051356932542912465000/_temporary
alice
david
emily
gary
kumar
-----------------------
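The second query generalizes the first: instead of matching a constant, it binds the age field to the variable `?age` and then filters it with an ordinary Clojure predicate. A sketch with comments:

```clojure
;; Sketch of the second query: bind ?age, then filter it with a plain
;; Clojure predicate. Any function that returns a boolean can be used
;; this way as a filter on bound variables.
(?<- (stdout)
     [?person]
     (age ?person ?age)  ; bind both fields from the generator
     (< ?age 30))        ; keep only tuples where age < 30
```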
user=> Bye for now!
bash-3.2$
bash-3.2$ pwd
/Users/ceteri/opt/Impatient/part1
bash-3.2$ lein uberjar
Created /Users/ceteri/opt/Impatient/part1/target/impatient-0.1.0-SNAPSHOT.jar
Including impatient-0.1.0-SNAPSHOT.jar
Including reflectasm-1.06-shaded.jar
Including slf4j-log4j12-1.6.1.jar
Including cascading-core-2.0.0.jar
Including cascading-hadoop-2.0.0.jar
Including objenesis-1.2.jar
Including meat-locker-0.3.0.jar
Including kryo-2.16.jar
Including tools.macro-0.1.1.jar
Including tools.logging-0.2.3.jar
Including minlog-1.2.jar
Including jgrapht-jdk1.6-0.8.1.jar
Including clojure-1.4.0.jar
Including log4j-1.2.16.jar
Including jackknife-0.1.2.jar
Including cascading.kryo-0.4.0.jar
Including hadoop-util-0.2.8.jar
Including maple-0.2.0.jar
Including riffle-0.1-dev.jar
Including cascalog-more-taps-0.3.0.jar
Including asm-4.0.jar
Including carbonite-1.3.0.jar
Including cascalog-1.10.0.jar
Including slf4j-api-1.6.1.jar
Created /Users/ceteri/opt/Impatient/part1/target/impatient.jar
bash-3.2$
bash-3.2$ rm -rf output
bash-3.2$ hadoop jar ./target/impatient.jar data/rain.txt output/rain
Warning: $HADOOP_HOME is deprecated.
12/12/31 19:16:01 INFO util.HadoopUtil: resolving application jar from found main method on: impatient.core
12/12/31 19:16:01 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/opt/Impatient/part1/./target/impatient.jar
12/12/31 19:16:01 INFO property.AppProps: using app.id: 0FA333331029D47F7CDF748E0D802569
2012-12-31 19:16:01.153 java[17399:1903] Unable to load realm info from SCDynamicStore
12/12/31 19:16:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/12/31 19:16:01 WARN snappy.LoadSnappy: Snappy native library not loaded
12/12/31 19:16:01 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/31 19:16:01 INFO util.Version: Concurrent, Inc - Cascading 2.0.0
12/12/31 19:16:01 INFO flow.Flow: [] starting
12/12/31 19:16:01 INFO flow.Flow: [] source: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/12/31 19:16:01 INFO flow.Flow: [] sink: Hfs["TextDelimited[[UNKNOWN]->['?doc', '?line']]"]["output/rain"]"]
12/12/31 19:16:01 INFO flow.Flow: [] parallel execution is enabled: false
12/12/31 19:16:01 INFO flow.Flow: [] starting jobs: 1
12/12/31 19:16:01 INFO flow.Flow: [] allocating threads: 1
12/12/31 19:16:01 INFO flow.FlowStep: [] starting step: (1/1) output/rain
12/12/31 19:16:01 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/31 19:16:01 INFO flow.FlowStep: [] submitted hadoop job: job_local_0001
12/12/31 19:16:01 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/12/31 19:16:01 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:16:01 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/opt/Impatient/part1/data/rain.txt
12/12/31 19:16:01 INFO mapred.MapTask: numReduceTasks: 0
12/12/31 19:16:01 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/12/31 19:16:01 INFO hadoop.FlowMapper: sinking to: Hfs["TextDelimited[[UNKNOWN]->['?doc', '?line']]"]["output/rain"]"]
12/12/31 19:16:01 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/12/31 19:16:01 INFO mapred.LocalJobRunner:
12/12/31 19:16:01 INFO mapred.Task: Task attempt_local_0001_m_000000_0 is allowed to commit now
12/12/31 19:16:01 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000000_0' to file:/Users/ceteri/opt/Impatient/part1/output/rain
12/12/31 19:16:04 INFO mapred.LocalJobRunner: file:/Users/ceteri/opt/Impatient/part1/data/rain.txt:0+510
12/12/31 19:16:04 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/12/31 19:16:04 INFO util.Hadoop18TapUtil: deleting temp path output/rain/_temporary
bash-3.2$ cat output/rain/part-00000
doc01 A rain shadow is a dry area on the lee back side of a mountainous area.
doc02 This sinking, dry air produces a rain shadow, or area in the lee of a mountain with less rain and cloudcover.
doc03 A rain shadow is an area of dry land that lies on the leeward (or downwind) side of a mountain.
doc04 This is known as the rain shadow effect and is the primary cause of leeward deserts of mountain ranges, such as California's Death Valley.
doc05 Two Women. Secrets. A Broken Land. [DVD Australia]
bash-3.2$
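The `impatient.jar` run above is Part 1's "distributed copy". Judging from the taps in the log (a TextDelimited source with `doc_id`/`text` fields, a `?doc`/`?line` sink, and `cascalog-more-taps` on the classpath), the driver is roughly the following sketch; the exact code in the repo may differ:

```clojure
(ns impatient.core
  (:use [cascalog.api]
        [cascalog.more-taps :only (hfs-delimited)])
  (:gen-class))

;; Sketch of a Part 1-style "distributed copy": read a TSV of
;; doc_id/text pairs and write the same tuples back out unchanged.
(defn -main [in out & args]
  (?<- (hfs-delimited out)                            ; sink tap
       [?doc ?line]                                   ; output fields
       ((hfs-delimited in :skip-header? true)         ; source tap
        ?doc ?line)))
```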
bash-3.2$ pwd
/Users/ceteri/opt/Impatient/part4
bash-3.2$ lein uberjar
Created /Users/ceteri/opt/Impatient/part4/target/impatient-0.1.0-SNAPSHOT.jar
Including impatient-0.1.0-SNAPSHOT.jar
Including reflectasm-1.06-shaded.jar
Including slf4j-log4j12-1.6.1.jar
Including cascading-core-2.0.0.jar
Including cascading-hadoop-2.0.0.jar
Including objenesis-1.2.jar
Including meat-locker-0.3.0.jar
Including kryo-2.16.jar
Including tools.macro-0.1.1.jar
Including tools.logging-0.2.3.jar
Including minlog-1.2.jar
Including jgrapht-jdk1.6-0.8.1.jar
Including clojure-1.4.0.jar
Including log4j-1.2.16.jar
Including jackknife-0.1.2.jar
Including cascading.kryo-0.4.0.jar
Including hadoop-util-0.2.8.jar
Including maple-0.2.0.jar
Including riffle-0.1-dev.jar
Including cascalog-more-taps-0.3.0.jar
Including asm-4.0.jar
Including carbonite-1.3.0.jar
Including cascalog-1.10.0.jar
Including slf4j-api-1.6.1.jar
Created /Users/ceteri/opt/Impatient/part4/target/impatient.jar
bash-3.2$ rm -rf output
bash-3.2$ hadoop jar ./target/impatient.jar data/rain.txt output/wc data/en.stop
Warning: $HADOOP_HOME is deprecated.
12/12/31 19:57:20 INFO util.HadoopUtil: resolving application jar from found main method on: impatient.core
12/12/31 19:57:20 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/opt/Impatient/part4/./target/impatient.jar
12/12/31 19:57:20 INFO property.AppProps: using app.id: 018CCCB36A27EFF4C33268E758DCEE9C
2012-12-31 19:57:20.696 java[17478:1903] Unable to load realm info from SCDynamicStore
12/12/31 19:57:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/12/31 19:57:20 WARN snappy.LoadSnappy: Snappy native library not loaded
12/12/31 19:57:20 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/31 19:57:20 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/31 19:57:20 INFO util.Version: Concurrent, Inc - Cascading 2.0.0
12/12/31 19:57:20 INFO flow.Flow: [] starting
12/12/31 19:57:20 INFO flow.Flow: [] source: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/12/31 19:57:20 INFO flow.Flow: [] source: Hfs["TextDelimited[['stop']->[ALL]]"]["data/en.stop"]"]
12/12/31 19:57:20 INFO flow.Flow: [] sink: Hfs["TextDelimited[[UNKNOWN]->['?word', '?count']]"]["output/wc"]"]
12/12/31 19:57:20 INFO flow.Flow: [] parallel execution is enabled: false
12/12/31 19:57:20 INFO flow.Flow: [] starting jobs: 2
12/12/31 19:57:20 INFO flow.Flow: [] allocating threads: 1
12/12/31 19:57:20 INFO flow.FlowStep: [] starting step: (1/2)
12/12/31 19:57:21 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/31 19:57:21 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/31 19:57:21 INFO flow.FlowStep: [] submitted hadoop job: job_local_0001
12/12/31 19:57:21 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/12/31 19:57:21 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:21 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/opt/Impatient/part4/data/en.stop
12/12/31 19:57:21 INFO mapred.MapTask: numReduceTasks: 1
12/12/31 19:57:21 INFO mapred.MapTask: io.sort.mb = 100
12/12/31 19:57:21 INFO mapred.MapTask: data buffer = 79691776/99614720
12/12/31 19:57:21 INFO mapred.MapTask: record buffer = 262144/327680
12/12/31 19:57:21 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:21 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:21 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['stop']->[ALL]]"]["data/en.stop"]"]
12/12/31 19:57:21 INFO hadoop.FlowMapper: sinking to: CoGroup(05eaf3b0-0553-4c2a-bee7-66a9d6cd8cc3*226bb6d6-5a62-49a3-afe6-debd65484c23)[by:05eaf3b0-0553-4c2a-bee7-66a9d6cd8cc3:[{1}:'?word']226bb6d6-5a62-49a3-afe6-debd65484c23:[{1}:'?word']]
12/12/31 19:57:21 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:21 INFO mapred.MapTask: Starting flush of map output
12/12/31 19:57:21 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:21 INFO mapred.MapTask: Finished spill 0
12/12/31 19:57:21 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/12/31 19:57:24 INFO mapred.LocalJobRunner: file:/Users/ceteri/opt/Impatient/part4/data/en.stop:0+544
12/12/31 19:57:24 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/12/31 19:57:24 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/12/31 19:57:24 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:24 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/opt/Impatient/part4/data/rain.txt
12/12/31 19:57:24 INFO mapred.MapTask: numReduceTasks: 1
12/12/31 19:57:24 INFO mapred.MapTask: io.sort.mb = 100
12/12/31 19:57:24 INFO mapred.MapTask: data buffer = 79691776/99614720
12/12/31 19:57:24 INFO mapred.MapTask: record buffer = 262144/327680
12/12/31 19:57:24 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:24 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:24 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/12/31 19:57:24 INFO hadoop.FlowMapper: sinking to: CoGroup(05eaf3b0-0553-4c2a-bee7-66a9d6cd8cc3*226bb6d6-5a62-49a3-afe6-debd65484c23)[by:05eaf3b0-0553-4c2a-bee7-66a9d6cd8cc3:[{1}:'?word']226bb6d6-5a62-49a3-afe6-debd65484c23:[{1}:'?word']]
12/12/31 19:57:24 INFO mapred.MapTask: Starting flush of map output
12/12/31 19:57:24 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:24 INFO mapred.MapTask: Finished spill 0
12/12/31 19:57:24 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/12/31 19:57:27 INFO mapred.LocalJobRunner: file:/Users/ceteri/opt/Impatient/part4/data/rain.txt:0+510
12/12/31 19:57:27 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
12/12/31 19:57:27 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/12/31 19:57:27 INFO mapred.LocalJobRunner:
12/12/31 19:57:27 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:27 INFO mapred.Merger: Merging 2 sorted segments
12/12/31 19:57:27 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 3285 bytes
12/12/31 19:57:27 INFO mapred.LocalJobRunner:
12/12/31 19:57:27 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:27 INFO hadoop.FlowReducer: sourcing from: CoGroup(05eaf3b0-0553-4c2a-bee7-66a9d6cd8cc3*226bb6d6-5a62-49a3-afe6-debd65484c23)[by:05eaf3b0-0553-4c2a-bee7-66a9d6cd8cc3:[{1}:'?word']226bb6d6-5a62-49a3-afe6-debd65484c23:[{1}:'?word']]
12/12/31 19:57:27 INFO hadoop.FlowReducer: sinking to: TempHfs["SequenceFile[['?word', '!__gen16']]"][f9c244ec-acb0-49a1-af11-d/44884/]
12/12/31 19:57:27 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:27 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:27 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:29 INFO collect.SpillableTupleList: attempting to load codec: org.apache.hadoop.io.compress.GzipCodec
12/12/31 19:57:29 INFO collect.SpillableTupleList: found codec: org.apache.hadoop.io.compress.GzipCodec
12/12/31 19:57:29 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:29 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:29 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/12/31 19:57:29 INFO mapred.LocalJobRunner:
12/12/31 19:57:29 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/12/31 19:57:29 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/tmp/hadoop-ceteri/f9c244ec_acb0_49a1_af11_d_44884_13A5B5731DB1CB1DC6B7BB49BF3F5A46
12/12/31 19:57:30 INFO mapred.LocalJobRunner: reduce > reduce
12/12/31 19:57:30 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
12/12/31 19:57:30 INFO flow.FlowStep: [] starting step: (2/2) output/wc
12/12/31 19:57:30 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/31 19:57:30 INFO flow.FlowStep: [] submitted hadoop job: job_local_0002
12/12/31 19:57:30 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/12/31 19:57:30 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:30 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-ceteri/f9c244ec_acb0_49a1_af11_d_44884_13A5B5731DB1CB1DC6B7BB49BF3F5A46/part-00000
12/12/31 19:57:30 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:30 INFO mapred.MapTask: numReduceTasks: 1
12/12/31 19:57:30 INFO mapred.MapTask: io.sort.mb = 100
12/12/31 19:57:30 INFO mapred.MapTask: data buffer = 79691776/99614720
12/12/31 19:57:30 INFO mapred.MapTask: record buffer = 262144/327680
12/12/31 19:57:30 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:30 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:30 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?word', '!__gen16']]"][f9c244ec-acb0-49a1-af11-d/44884/]
12/12/31 19:57:30 INFO hadoop.FlowMapper: sinking to: GroupBy(f9c244ec-acb0-49a1-af11-d27e79ada8d9)[by:[{1}:'?word']]
12/12/31 19:57:30 INFO mapred.MapTask: Starting flush of map output
12/12/31 19:57:30 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:30 INFO mapred.MapTask: Finished spill 0
12/12/31 19:57:30 INFO mapred.Task: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
12/12/31 19:57:33 INFO mapred.LocalJobRunner: file:/tmp/hadoop-ceteri/f9c244ec_acb0_49a1_af11_d_44884_13A5B5731DB1CB1DC6B7BB49BF3F5A46/part-00000:0+784
12/12/31 19:57:33 INFO mapred.Task: Task 'attempt_local_0002_m_000000_0' done.
12/12/31 19:57:33 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/12/31 19:57:33 INFO mapred.LocalJobRunner:
12/12/31 19:57:33 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:33 INFO mapred.Merger: Merging 1 sorted segments
12/12/31 19:57:33 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 561 bytes
12/12/31 19:57:33 INFO mapred.LocalJobRunner:
12/12/31 19:57:33 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:33 INFO hadoop.FlowReducer: sourcing from: GroupBy(f9c244ec-acb0-49a1-af11-d27e79ada8d9)[by:[{1}:'?word']]
12/12/31 19:57:33 INFO hadoop.FlowReducer: sinking to: Hfs["TextDelimited[[UNKNOWN]->['?word', '?count']]"]["output/wc"]"]
12/12/31 19:57:33 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:33 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:33 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:33 INFO mapred.Task: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
12/12/31 19:57:33 INFO mapred.LocalJobRunner:
12/12/31 19:57:33 INFO mapred.Task: Task attempt_local_0002_r_000000_0 is allowed to commit now
12/12/31 19:57:33 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0002_r_000000_0' to file:/Users/ceteri/opt/Impatient/part4/output/wc
12/12/31 19:57:36 INFO mapred.LocalJobRunner: reduce > reduce
12/12/31 19:57:36 INFO mapred.Task: Task 'attempt_local_0002_r_000000_0' done.
12/12/31 19:57:36 INFO util.Hadoop18TapUtil: deleting temp path output/wc/_temporary
bash-3.2$ more output/wc/part-00000
air 1
area 4
australia 1
broken 1
california's 1
cause 1
cloudcover 1
death 1
deserts 1
downwind 1
dry 3
dvd 1
effect 1
known 1
land 2
lee 2
leeward 2
less 1
lies 1
mountain 3
mountainous 1
primary 1
produces 1
rain 5
ranges 1
secrets 1
shadow 4
sinking 1
such 1
valley 1
women 1
bash-3.2$
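The Part 4 run above takes two flow steps because of the stop-word join: the `CoGroup` in the log joins the tokenized text against `data/en.stop`, and the second step does the `GroupBy` count. A hedged sketch of a query in that shape (the tokenizer regex and helper names here are illustrative, not necessarily the repo's):

```clojure
(ns impatient.core
  (:use [cascalog.api]
        [cascalog.more-taps :only (hfs-delimited)])
  (:require [clojure.string :as s]
            [cascalog.ops :as c])
  (:gen-class))

;; Split a line of text into lowercase tokens (illustrative regex).
(defmapcatop split [line]
  (s/split (s/lower-case line) #"[^\w']+"))

;; Sketch of a Part 4-style word count: tokenize, drop stop words
;; (the CoGroup in the log is this join), then count per word.
(defn -main [in out stop & args]
  (let [rain (hfs-delimited in :skip-header? true)
        stop (hfs-delimited stop :skip-header? true)]
    (?<- (hfs-delimited out)
         [?word ?count]
         (rain _ ?line)
         (split ?line :> ?word)
         (stop ?word :> false)   ; anti-join: keep words NOT in the stop list
         (c/count ?count))))
```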
bash-3.2$ pwd
/Users/ceteri/opt/Impatient/part6
bash-3.2$ lein uberjar
Created /Users/ceteri/opt/Impatient/part6/target/impatient-0.1.0-SNAPSHOT.jar
Including impatient-0.1.0-SNAPSHOT.jar
Including cascalog-checkpoint-0.2.0.jar
Including reflectasm-1.06-shaded.jar
Including slf4j-log4j12-1.6.1.jar
Including cascading-core-2.0.0.jar
Including cascading-hadoop-2.0.0.jar
Including objenesis-1.2.jar
Including meat-locker-0.3.0.jar
Including kryo-2.16.jar
Including tools.macro-0.1.1.jar
Including tools.logging-0.2.3.jar
Including minlog-1.2.jar
Including jgrapht-jdk1.6-0.8.1.jar
Including clojure-1.4.0.jar
Including log4j-1.2.16.jar
Including jackknife-0.1.2.jar
Including cascading.kryo-0.4.0.jar
Including hadoop-util-0.2.8.jar
Including maple-0.2.0.jar
Including riffle-0.1-dev.jar
Including cascalog-more-taps-0.3.0.jar
Including asm-4.0.jar
Including carbonite-1.3.0.jar
Including cascalog-1.10.0.jar
Including slf4j-api-1.6.1.jar
Created /Users/ceteri/opt/Impatient/part6/target/impatient.jar
bash-3.2$ rm -rf output
bash-3.2$ hadoop jar target/impatient.jar data/rain.txt output/wc data/en.stop output/tfidf
Warning: $HADOOP_HOME is deprecated.
2013-01-01 01:19:35.234 java[17801:1903] Unable to load realm info from SCDynamicStore
13/01/01 01:19:35 INFO util.HadoopUtil: using default application jar, may cause class not found exceptions on the cluster
13/01/01 01:19:35 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/opt/Impatient/part6/target/impatient.jar
13/01/01 01:19:35 INFO property.AppProps: using app.id: D246032EBAD75DCC3C0BE86023BAFCA6
13/01/01 01:19:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/01/01 01:19:35 WARN snappy.LoadSnappy: Snappy native library not loaded
13/01/01 01:19:35 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/01 01:19:35 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/01 01:19:36 INFO util.Version: Concurrent, Inc - Cascading 2.0.0
13/01/01 01:19:36 INFO flow.Flow: [] starting
13/01/01 01:19:36 INFO flow.Flow: [] source: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
13/01/01 01:19:36 INFO flow.Flow: [] source: Hfs["TextDelimited[['stop']->[ALL]]"]["data/en.stop"]"]
13/01/01 01:19:36 INFO flow.Flow: [] sink: Hfs["TextDelimited[[UNKNOWN]->['?doc-id', '?word']]"]["tmp/checkpoint/data/etl-stage"]"]
13/01/01 01:19:36 INFO flow.Flow: [] parallel execution is enabled: false
13/01/01 01:19:36 INFO flow.Flow: [] starting jobs: 1
13/01/01 01:19:36 INFO flow.Flow: [] allocating threads: 1
13/01/01 01:19:36 INFO flow.FlowStep: [] starting step: (1/1) ...checkpoint/data/etl-stage
13/01/01 01:19:36 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/01 01:19:36 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/01 01:19:36 INFO flow.FlowStep: [] submitted hadoop job: job_local_0001
13/01/01 01:19:36 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/01 01:19:36 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:36 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/opt/Impatient/part6/data/en.stop
13/01/01 01:19:36 INFO mapred.MapTask: numReduceTasks: 1
13/01/01 01:19:36 INFO mapred.MapTask: io.sort.mb = 100
13/01/01 01:19:36 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/01 01:19:36 INFO mapred.MapTask: record buffer = 262144/327680
13/01/01 01:19:36 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:36 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:36 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['stop']->[ALL]]"]["data/en.stop"]"]
13/01/01 01:19:36 INFO hadoop.FlowMapper: sinking to: CoGroup(abf66f9b-db1c-4098-9a60-cd581ab4fea5*7e70de4e-2493-4c49-ac62-555817afc959)[by:abf66f9b-db1c-4098-9a60-cd581ab4fea5:[{1}:'?word']7e70de4e-2493-4c49-ac62-555817afc959:[{1}:'?word']]
13/01/01 01:19:36 INFO hadoop.FlowMapper: trapping to: Hfs["TextLine[['line']->[ALL]]"]["output/trap"]"]
13/01/01 01:19:36 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:36 INFO mapred.MapTask: Starting flush of map output
13/01/01 01:19:36 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:36 INFO mapred.MapTask: Finished spill 0
13/01/01 01:19:36 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/01/01 01:19:39 INFO mapred.LocalJobRunner: file:/Users/ceteri/opt/Impatient/part6/data/en.stop:0+544
13/01/01 01:19:39 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/01/01 01:19:39 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/01 01:19:39 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:39 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/opt/Impatient/part6/data/rain.txt
13/01/01 01:19:39 INFO mapred.MapTask: numReduceTasks: 1
13/01/01 01:19:39 INFO mapred.MapTask: io.sort.mb = 100
13/01/01 01:19:39 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/01 01:19:39 INFO mapred.MapTask: record buffer = 262144/327680
13/01/01 01:19:39 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:39 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:39 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
13/01/01 01:19:39 INFO hadoop.FlowMapper: sinking to: CoGroup(abf66f9b-db1c-4098-9a60-cd581ab4fea5*7e70de4e-2493-4c49-ac62-555817afc959)[by:abf66f9b-db1c-4098-9a60-cd581ab4fea5:[{1}:'?word']7e70de4e-2493-4c49-ac62-555817afc959:[{1}:'?word']]
13/01/01 01:19:39 INFO hadoop.FlowMapper: trapping to: Hfs["TextLine[['line']->[ALL]]"]["output/trap"]"]
13/01/01 01:19:39 INFO util.Hadoop18TapUtil: setting up task: 'attempt_local_0001_m_000001_0' - file:/Users/ceteri/opt/Impatient/part6/output/trap/_temporary/_attempt_local_0001_m_000001_0
13/01/01 01:19:39 WARN stream.TrapHandler: exception trap on branch: '7e4e6052-627c-4283-8bca-694a6744a232', for fields: [{1}:'?doc-id'] tuple: ['zoink']
cascading.pipe.OperatorException: [7e4e6052-627c-4283-8bc...][sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)] operator Each failed executing operation
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:68)
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:33)
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86)
at cascading.operation.Identity.operate(Identity.java:110)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38)
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:60)
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:33)
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86)
at cascalog.ClojureMap.operate(Unknown Source)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38)
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86)
at cascading.operation.Identity.operate(Identity.java:110)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38)
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:60)
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:33)
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86)
at cascalog.ClojureMapcat.operate(Unknown Source)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38)
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:60)
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:33)
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86)
at cascading.operation.Identity.operate(Identity.java:110)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38)
at cascading.flow.stream.SourceStage.map(SourceStage.java:102)
at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:124)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: java.lang.AssertionError: Assert failed: unexpected doc-id
(pred x)
at impatient.core$assert_tuple.invoke(core.clj:19)
at clojure.lang.AFn.applyToHelper(AFn.java:167)
at clojure.lang.AFn.applyTo(AFn.java:151)
at clojure.core$apply.invoke(core.clj:605)
at clojure.core$partial$fn__446.doInvoke(core.clj:2345)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at clojure.lang.Var.invoke(Var.java:415)
at clojure.lang.AFn.applyToHelper(AFn.java:161)
at clojure.lang.Var.applyTo(Var.java:532)
at cascalog.ClojureCascadingBase.applyFunction(Unknown Source)
at cascalog.ClojureFilter.isRemove(Unknown Source)
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:57)
... 43 more
13/01/01 01:19:39 INFO io.TapOutputCollector: closing tap collector for: output/trap/part-m-00001-00001
13/01/01 01:19:39 INFO util.Hadoop18TapUtil: committing task: 'attempt_local_0001_m_000001_0' - file:/Users/ceteri/opt/Impatient/part6/output/trap/_temporary/_attempt_local_0001_m_000001_0
13/01/01 01:19:39 INFO util.Hadoop18TapUtil: saved output of task 'attempt_local_0001_m_000001_0' to file:/Users/ceteri/opt/Impatient/part6/output/trap
13/01/01 01:19:39 INFO mapred.MapTask: Starting flush of map output
13/01/01 01:19:39 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:39 INFO mapred.MapTask: Finished spill 0
13/01/01 01:19:39 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
13/01/01 01:19:42 INFO mapred.LocalJobRunner: file:/Users/ceteri/opt/Impatient/part6/data/rain.txt:0+521
13/01/01 01:19:42 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
13/01/01 01:19:42 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/01 01:19:42 INFO mapred.LocalJobRunner:
13/01/01 01:19:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:42 INFO mapred.Merger: Merging 2 sorted segments
13/01/01 01:19:42 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 4175 bytes
13/01/01 01:19:42 INFO mapred.LocalJobRunner:
13/01/01 01:19:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:42 INFO hadoop.FlowReducer: sourcing from: CoGroup(abf66f9b-db1c-4098-9a60-cd581ab4fea5*7e70de4e-2493-4c49-ac62-555817afc959)[by:abf66f9b-db1c-4098-9a60-cd581ab4fea5:[{1}:'?word']7e70de4e-2493-4c49-ac62-555817afc959:[{1}:'?word']]
13/01/01 01:19:42 INFO hadoop.FlowReducer: sinking to: Hfs["TextDelimited[[UNKNOWN]->['?doc-id', '?word']]"]["tmp/checkpoint/data/etl-stage"]"]
13/01/01 01:19:42 INFO hadoop.FlowReducer: trapping to: Hfs["TextLine[['line']->[ALL]]"]["output/trap"]"]
13/01/01 01:19:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:44 INFO collect.SpillableTupleList: attempting to load codec: org.apache.hadoop.io.compress.GzipCodec
13/01/01 01:19:44 INFO collect.SpillableTupleList: found codec: org.apache.hadoop.io.compress.GzipCodec
13/01/01 01:19:44 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:44 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:44 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
13/01/01 01:19:44 INFO mapred.LocalJobRunner:
13/01/01 01:19:44 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
13/01/01 01:19:44 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/Users/ceteri/opt/Impatient/part6/tmp/checkpoint/data/etl-stage
13/01/01 01:19:45 INFO mapred.LocalJobRunner: reduce > reduce
13/01/01 01:19:45 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
13/01/01 01:19:45 INFO util.Hadoop18TapUtil: deleting temp path tmp/checkpoint/data/etl-stage/_temporary
13/01/01 01:19:45 INFO util.Hadoop18TapUtil: deleting temp path output/trap/_temporary
13/01/01 01:19:45 INFO util.Hadoop18TapUtil: deleting temp path output/trap/_temporary
13/01/01 01:19:45 INFO util.HadoopUtil: using default application jar, may cause class not found exceptions on the cluster
13/01/01 01:19:45 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/opt/Impatient/part6/target/impatient.jar
13/01/01 01:19:45 INFO flow.Flow: [] starting
13/01/01 01:19:45 INFO flow.Flow: [] source: Hfs["TextDelimited[[UNKNOWN]->[ALL]]"]["tmp/checkpoint/data/etl-stage"]"]
13/01/01 01:19:45 INFO flow.Flow: [] sink: Hfs["TextDelimited[[UNKNOWN]->['?word', '?count']]"]["output/wc"]"]
13/01/01 01:19:45 INFO flow.Flow: [] parallel execution is enabled: false
13/01/01 01:19:45 INFO flow.Flow: [] starting jobs: 1
13/01/01 01:19:45 INFO flow.Flow: [] allocating threads: 1
13/01/01 01:19:45 INFO util.HadoopUtil: using default application jar, may cause class not found exceptions on the cluster
13/01/01 01:19:45 INFO flow.FlowStep: [] starting step: (1/1) output/wc
13/01/01 01:19:45 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/opt/Impatient/part6/target/impatient.jar
13/01/01 01:19:45 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/01 01:19:45 INFO flow.Flow: [] starting
13/01/01 01:19:45 INFO flow.Flow: [] source: Hfs["TextDelimited[['doc02', 'air']->[ALL]]"]["tmp/checkpoint/data/etl-stage"]"]
13/01/01 01:19:45 INFO flow.Flow: [] sink: Hfs["SequenceFile[[UNKNOWN]->['?n-docs']]"]["/tmp/cascalog_reserved/8554be9c-a22d-4c54-b8b1-fc158dfbf932/5db49fa3-279c-4985-a0c5-6a93024f4b4f"]"]
13/01/01 01:19:45 INFO flow.Flow: [] parallel execution is enabled: false
13/01/01 01:19:45 INFO flow.Flow: [] starting jobs: 1
13/01/01 01:19:45 INFO flow.Flow: [] allocating threads: 1
13/01/01 01:19:45 INFO flow.FlowStep: [] starting step: (1/1) ...9c-4985-a0c5-6a93024f4b4f
13/01/01 01:19:45 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/01 01:19:45 INFO flow.FlowStep: [] submitted hadoop job: job_local_0002
13/01/01 01:19:45 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/01 01:19:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:45 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/opt/Impatient/part6/tmp/checkpoint/data/etl-stage/part-00000
13/01/01 01:19:45 INFO mapred.MapTask: numReduceTasks: 1
13/01/01 01:19:45 INFO mapred.MapTask: io.sort.mb = 100
13/01/01 01:19:45 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/01 01:19:45 INFO mapred.MapTask: record buffer = 262144/327680
13/01/01 01:19:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:45 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/01 01:19:45 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[[UNKNOWN]->[ALL]]"]["tmp/checkpoint/data/etl-stage"]"]
13/01/01 01:19:45 INFO hadoop.FlowMapper: sinking to: GroupBy(81c22799-47bd-4214-bfa9-869ab649c4cf)[by:[{1}:'?word']]
13/01/01 01:19:45 INFO mapred.MapTask: Starting flush of map output
13/01/01 01:19:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:45 INFO mapred.MapTask: Finished spill 0
13/01/01 01:19:45 INFO flow.FlowStep: [] submitted hadoop job: job_local_0003
13/01/01 01:19:45 INFO mapred.Task: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
13/01/01 01:19:45 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/01 01:19:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:45 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/opt/Impatient/part6/tmp/checkpoint/data/etl-stage/part-00000
13/01/01 01:19:45 INFO mapred.MapTask: numReduceTasks: 1
13/01/01 01:19:45 INFO mapred.MapTask: io.sort.mb = 100
13/01/01 01:19:45 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/01 01:19:45 INFO mapred.MapTask: record buffer = 262144/327680
13/01/01 01:19:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:45 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['doc02', 'air']->[ALL]]"]["tmp/checkpoint/data/etl-stage"]"]
13/01/01 01:19:45 INFO hadoop.FlowMapper: sinking to: GroupBy(b218075c-5611-4f7d-9665-be5325bedeb7)[by:[{1}:'!__gen19']]
13/01/01 01:19:45 INFO mapred.MapTask: Starting flush of map output
13/01/01 01:19:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:45 INFO mapred.MapTask: Finished spill 0
13/01/01 01:19:45 INFO mapred.Task: Task:attempt_local_0003_m_000000_0 is done. And is in the process of commiting
13/01/01 01:19:48 INFO mapred.LocalJobRunner: file:/Users/ceteri/opt/Impatient/part6/tmp/checkpoint/data/etl-stage/part-00000:0+605
13/01/01 01:19:48 INFO mapred.Task: Task 'attempt_local_0002_m_000000_0' done.
13/01/01 01:19:48 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/01 01:19:48 INFO mapred.LocalJobRunner:
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:48 INFO mapred.Merger: Merging 1 sorted segments
13/01/01 01:19:48 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 561 bytes
13/01/01 01:19:48 INFO mapred.LocalJobRunner:
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:48 INFO hadoop.FlowReducer: sourcing from: GroupBy(81c22799-47bd-4214-bfa9-869ab649c4cf)[by:[{1}:'?word']]
13/01/01 01:19:48 INFO hadoop.FlowReducer: sinking to: Hfs["TextDelimited[[UNKNOWN]->['?word', '?count']]"]["output/wc"]"]
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:48 INFO mapred.Task: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
13/01/01 01:19:48 INFO mapred.LocalJobRunner:
13/01/01 01:19:48 INFO mapred.Task: Task attempt_local_0002_r_000000_0 is allowed to commit now
13/01/01 01:19:48 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0002_r_000000_0' to file:/Users/ceteri/opt/Impatient/part6/output/wc
13/01/01 01:19:48 INFO mapred.LocalJobRunner: file:/Users/ceteri/opt/Impatient/part6/tmp/checkpoint/data/etl-stage/part-00000:0+605
13/01/01 01:19:48 INFO mapred.Task: Task 'attempt_local_0003_m_000000_0' done.
13/01/01 01:19:48 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/01 01:19:48 INFO mapred.LocalJobRunner:
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:48 INFO mapred.Merger: Merging 1 sorted segments
13/01/01 01:19:48 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1318 bytes
13/01/01 01:19:48 INFO mapred.LocalJobRunner:
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:48 INFO hadoop.FlowReducer: sourcing from: GroupBy(b218075c-5611-4f7d-9665-be5325bedeb7)[by:[{1}:'!__gen19']]
13/01/01 01:19:48 INFO hadoop.FlowReducer: sinking to: Hfs["SequenceFile[[UNKNOWN]->['?n-docs']]"]["/tmp/cascalog_reserved/8554be9c-a22d-4c54-b8b1-fc158dfbf932/5db49fa3-279c-4985-a0c5-6a93024f4b4f"]"]
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:48 INFO mapred.Task: Task:attempt_local_0003_r_000000_0 is done. And is in the process of commiting
13/01/01 01:19:48 INFO mapred.LocalJobRunner:
13/01/01 01:19:48 INFO mapred.Task: Task attempt_local_0003_r_000000_0 is allowed to commit now
13/01/01 01:19:48 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0003_r_000000_0' to file:/tmp/cascalog_reserved/8554be9c-a22d-4c54-b8b1-fc158dfbf932/5db49fa3-279c-4985-a0c5-6a93024f4b4f
13/01/01 01:19:51 INFO mapred.LocalJobRunner: reduce > reduce
13/01/01 01:19:51 INFO mapred.Task: Task 'attempt_local_0002_r_000000_0' done.
13/01/01 01:19:51 INFO util.Hadoop18TapUtil: deleting temp path output/wc/_temporary
13/01/01 01:19:51 INFO mapred.LocalJobRunner: reduce > reduce
13/01/01 01:19:51 INFO mapred.Task: Task 'attempt_local_0003_r_000000_0' done.
13/01/01 01:19:51 INFO util.Hadoop18TapUtil: deleting temp path /tmp/cascalog_reserved/8554be9c-a22d-4c54-b8b1-fc158dfbf932/5db49fa3-279c-4985-a0c5-6a93024f4b4f/_temporary
13/01/01 01:19:51 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/01 01:19:51 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:51 INFO util.HadoopUtil: using default application jar, may cause class not found exceptions on the cluster
13/01/01 01:19:51 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/opt/Impatient/part6/target/impatient.jar
13/01/01 01:19:52 INFO flow.Flow: [] starting
13/01/01 01:19:52 INFO flow.Flow: [] source: Hfs["TextDelimited[['doc02', 'air']->[ALL]]"]["tmp/checkpoint/data/etl-stage"]"]
13/01/01 01:19:52 INFO flow.Flow: [] sink: Hfs["TextDelimited[[UNKNOWN]->['?doc-id', '?tf-idf', '?tf-word']]"]["output/tfidf"]"]
13/01/01 01:19:52 INFO flow.Flow: [] parallel execution is enabled: false
13/01/01 01:19:52 INFO flow.Flow: [] starting jobs: 4
13/01/01 01:19:52 INFO flow.Flow: [] allocating threads: 1
13/01/01 01:19:52 INFO flow.FlowStep: [] starting step: (1/4)
13/01/01 01:19:52 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/01 01:19:52 INFO flow.FlowStep: [] submitted hadoop job: job_local_0004
13/01/01 01:19:52 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/01 01:19:52 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:52 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/opt/Impatient/part6/tmp/checkpoint/data/etl-stage/part-00000
13/01/01 01:19:52 INFO mapred.MapTask: numReduceTasks: 0
13/01/01 01:19:52 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:52 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['doc02', 'air']->[ALL]]"]["tmp/checkpoint/data/etl-stage"]"]
13/01/01 01:19:52 INFO hadoop.FlowMapper: sinking to: TempHfs["SequenceFile[['?doc-id', '?word']]"][cd7e380f-49d0-4b67-a91f-5/72115/]
13/01/01 01:19:52 INFO mapred.Task: Task:attempt_local_0004_m_000000_0 is done. And is in the process of commiting
13/01/01 01:19:52 INFO mapred.LocalJobRunner:
13/01/01 01:19:52 INFO mapred.Task: Task attempt_local_0004_m_000000_0 is allowed to commit now
13/01/01 01:19:52 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0004_m_000000_0' to file:/tmp/hadoop-ceteri/cd7e380f_49d0_4b67_a91f_5_72115_8F3024B95FDD8EDA23306899FD292537
13/01/01 01:19:55 INFO mapred.LocalJobRunner: file:/Users/ceteri/opt/Impatient/part6/tmp/checkpoint/data/etl-stage/part-00000:0+605
13/01/01 01:19:55 INFO mapred.Task: Task 'attempt_local_0004_m_000000_0' done.
13/01/01 01:19:55 INFO flow.FlowStep: [] starting step: (2/4)
13/01/01 01:19:55 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/01 01:19:55 INFO flow.FlowStep: [] submitted hadoop job: job_local_0005
13/01/01 01:19:55 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/01 01:19:55 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:55 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-ceteri/cd7e380f_49d0_4b67_a91f_5_72115_8F3024B95FDD8EDA23306899FD292537/part-00000
13/01/01 01:19:55 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:55 INFO mapred.MapTask: numReduceTasks: 1
13/01/01 01:19:55 INFO mapred.MapTask: io.sort.mb = 100
13/01/01 01:19:55 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/01 01:19:55 INFO mapred.MapTask: record buffer = 262144/327680
13/01/01 01:19:55 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:55 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:55 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?doc-id', '?word']]"][cd7e380f-49d0-4b67-a91f-5/72115/]
13/01/01 01:19:55 INFO hadoop.FlowMapper: sinking to: GroupBy(62b0e591-1804-462d-b3d8-5acfacdf6f9b)[by:[{2}:'?tf-word', '?doc-id']]
13/01/01 01:19:55 INFO mapred.MapTask: Starting flush of map output
13/01/01 01:19:55 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:55 INFO mapred.MapTask: Finished spill 0
13/01/01 01:19:55 INFO mapred.Task: Task:attempt_local_0005_m_000000_0 is done. And is in the process of commiting
13/01/01 01:19:58 INFO mapred.LocalJobRunner: file:/tmp/hadoop-ceteri/cd7e380f_49d0_4b67_a91f_5_72115_8F3024B95FDD8EDA23306899FD292537/part-00000:0+1511
13/01/01 01:19:58 INFO mapred.Task: Task 'attempt_local_0005_m_000000_0' done.
13/01/01 01:19:58 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/01 01:19:58 INFO mapred.LocalJobRunner:
13/01/01 01:19:58 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:58 INFO mapred.Merger: Merging 1 sorted segments
13/01/01 01:19:58 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1295 bytes
13/01/01 01:19:58 INFO mapred.LocalJobRunner:
13/01/01 01:19:58 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:58 INFO hadoop.FlowReducer: sourcing from: GroupBy(62b0e591-1804-462d-b3d8-5acfacdf6f9b)[by:[{2}:'?tf-word', '?doc-id']]
13/01/01 01:19:58 INFO hadoop.FlowReducer: sinking to: TempHfs["SequenceFile[['?tf-word', '?doc-id', '?tf-count']]"][aaef0093-4403-415b-8367-e/21694/]
13/01/01 01:19:58 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:58 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:58 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:58 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:58 INFO mapred.Task: Task:attempt_local_0005_r_000000_0 is done. And is in the process of commiting
13/01/01 01:19:58 INFO mapred.LocalJobRunner:
13/01/01 01:19:58 INFO mapred.Task: Task attempt_local_0005_r_000000_0 is allowed to commit now
13/01/01 01:19:58 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0005_r_000000_0' to file:/tmp/hadoop-ceteri/aaef0093_4403_415b_8367_e_21694_AB930F031BA44B520A0D4C55B8829388
13/01/01 01:20:01 INFO mapred.LocalJobRunner: reduce > reduce
13/01/01 01:20:01 INFO mapred.Task: Task 'attempt_local_0005_r_000000_0' done.
13/01/01 01:20:01 INFO flow.FlowStep: [] starting step: (3/4)
13/01/01 01:20:01 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/01 01:20:01 INFO flow.FlowStep: [] submitted hadoop job: job_local_0006
13/01/01 01:20:01 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/01 01:20:01 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:01 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-ceteri/cd7e380f_49d0_4b67_a91f_5_72115_8F3024B95FDD8EDA23306899FD292537/part-00000
13/01/01 01:20:01 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:01 INFO mapred.MapTask: numReduceTasks: 1
13/01/01 01:20:01 INFO mapred.MapTask: io.sort.mb = 100
13/01/01 01:20:01 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/01 01:20:01 INFO mapred.MapTask: record buffer = 262144/327680
13/01/01 01:20:01 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:01 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:01 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?doc-id', '?word']]"][cd7e380f-49d0-4b67-a91f-5/72115/]
13/01/01 01:20:01 INFO hadoop.FlowMapper: sinking to: GroupBy(c001f807-a69f-47f0-8d8a-b6a9c417e18d)[by:[{1}:'?df-word']]
13/01/01 01:20:01 INFO mapred.MapTask: Starting flush of map output
13/01/01 01:20:01 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:01 INFO mapred.MapTask: Finished spill 0
13/01/01 01:20:01 INFO mapred.Task: Task:attempt_local_0006_m_000000_0 is done. And is in the process of commiting
13/01/01 01:20:04 INFO mapred.LocalJobRunner: file:/tmp/hadoop-ceteri/cd7e380f_49d0_4b67_a91f_5_72115_8F3024B95FDD8EDA23306899FD292537/part-00000:0+1511
13/01/01 01:20:04 INFO mapred.Task: Task 'attempt_local_0006_m_000000_0' done.
13/01/01 01:20:04 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/01 01:20:04 INFO mapred.LocalJobRunner:
13/01/01 01:20:04 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:04 INFO mapred.Merger: Merging 1 sorted segments
13/01/01 01:20:04 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 2226 bytes
13/01/01 01:20:04 INFO mapred.LocalJobRunner:
13/01/01 01:20:04 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:04 INFO hadoop.FlowReducer: sourcing from: GroupBy(c001f807-a69f-47f0-8d8a-b6a9c417e18d)[by:[{1}:'?df-word']]
13/01/01 01:20:04 INFO hadoop.FlowReducer: sinking to: TempHfs["SequenceFile[['?df-count', '?tf-word']]"][d7841a4f-5383-49c8-8020-d/17408/]
13/01/01 01:20:04 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:04 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:04 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:04 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:04 INFO mapred.Task: Task:attempt_local_0006_r_000000_0 is done. And is in the process of commiting
13/01/01 01:20:04 INFO mapred.LocalJobRunner:
13/01/01 01:20:04 INFO mapred.Task: Task attempt_local_0006_r_000000_0 is allowed to commit now
13/01/01 01:20:04 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0006_r_000000_0' to file:/tmp/hadoop-ceteri/d7841a4f_5383_49c8_8020_d_17408_0E81F006944699B723B8F7297ED7752F
13/01/01 01:20:07 INFO mapred.LocalJobRunner: reduce > reduce
13/01/01 01:20:07 INFO mapred.Task: Task 'attempt_local_0006_r_000000_0' done.
13/01/01 01:20:07 INFO flow.FlowStep: [] starting step: (4/4) output/tfidf
13/01/01 01:20:07 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/01 01:20:07 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/01 01:20:07 INFO flow.FlowStep: [] submitted hadoop job: job_local_0007
13/01/01 01:20:07 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/01 01:20:07 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:07 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-ceteri/aaef0093_4403_415b_8367_e_21694_AB930F031BA44B520A0D4C55B8829388/part-00000
13/01/01 01:20:07 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:07 INFO mapred.MapTask: numReduceTasks: 1
13/01/01 01:20:07 INFO mapred.MapTask: io.sort.mb = 100
13/01/01 01:20:07 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/01 01:20:07 INFO mapred.MapTask: record buffer = 262144/327680
13/01/01 01:20:07 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:07 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:07 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?tf-word', '?doc-id', '?tf-count']]"][aaef0093-4403-415b-8367-e/21694/]
13/01/01 01:20:07 INFO hadoop.FlowMapper: sinking to: CoGroup(aaef0093-4403-415b-8367-ef9f9858c4a1*d7841a4f-5383-49c8-8020-d12ad48809be)[by:aaef0093-4403-415b-8367-ef9f9858c4a1:[{1}:'?tf-word']d7841a4f-5383-49c8-8020-d12ad48809be:[{1}:'?tf-word']]
13/01/01 01:20:07 INFO mapred.MapTask: Starting flush of map output
13/01/01 01:20:07 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:07 INFO mapred.MapTask: Finished spill 0
13/01/01 01:20:07 INFO mapred.Task: Task:attempt_local_0007_m_000000_0 is done. And is in the process of commiting
13/01/01 01:20:10 INFO mapred.LocalJobRunner: file:/tmp/hadoop-ceteri/aaef0093_4403_415b_8367_e_21694_AB930F031BA44B520A0D4C55B8829388/part-00000:0+1543
13/01/01 01:20:10 INFO mapred.Task: Task 'attempt_local_0007_m_000000_0' done.
13/01/01 01:20:10 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/01 01:20:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:10 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-ceteri/d7841a4f_5383_49c8_8020_d_17408_0E81F006944699B723B8F7297ED7752F/part-00000
13/01/01 01:20:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:10 INFO mapred.MapTask: numReduceTasks: 1
13/01/01 01:20:10 INFO mapred.MapTask: io.sort.mb = 100
13/01/01 01:20:10 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/01 01:20:10 INFO mapred.MapTask: record buffer = 262144/327680
13/01/01 01:20:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:10 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?df-count', '?tf-word']]"][d7841a4f-5383-49c8-8020-d/17408/]
13/01/01 01:20:10 INFO hadoop.FlowMapper: sinking to: CoGroup(aaef0093-4403-415b-8367-ef9f9858c4a1*d7841a4f-5383-49c8-8020-d12ad48809be)[by:aaef0093-4403-415b-8367-ef9f9858c4a1:[{1}:'?tf-word']d7841a4f-5383-49c8-8020-d12ad48809be:[{1}:'?tf-word']]
13/01/01 01:20:10 INFO mapred.MapTask: Starting flush of map output
13/01/01 01:20:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:10 INFO mapred.MapTask: Finished spill 0
13/01/01 01:20:10 INFO mapred.Task: Task:attempt_local_0007_m_000001_0 is done. And is in the process of commiting
13/01/01 01:20:13 INFO mapred.LocalJobRunner: file:/tmp/hadoop-ceteri/d7841a4f_5383_49c8_8020_d_17408_0E81F006944699B723B8F7297ED7752F/part-00000:0+764
13/01/01 01:20:13 INFO mapred.Task: Task 'attempt_local_0007_m_000001_0' done.
13/01/01 01:20:13 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/01 01:20:13 INFO mapred.LocalJobRunner:
13/01/01 01:20:13 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:13 INFO mapred.Merger: Merging 2 sorted segments
13/01/01 01:20:13 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 1946 bytes
13/01/01 01:20:13 INFO mapred.LocalJobRunner:
13/01/01 01:20:13 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:13 INFO hadoop.FlowReducer: sourcing from: CoGroup(aaef0093-4403-415b-8367-ef9f9858c4a1*d7841a4f-5383-49c8-8020-d12ad48809be)[by:aaef0093-4403-415b-8367-ef9f9858c4a1:[{1}:'?tf-word']d7841a4f-5383-49c8-8020-d12ad48809be:[{1}:'?tf-word']]
13/01/01 01:20:13 INFO hadoop.FlowReducer: sinking to: Hfs["TextDelimited[[UNKNOWN]->['?doc-id', '?tf-idf', '?tf-word']]"]["output/tfidf"]"]
13/01/01 01:20:13 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:13 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:13 INFO collect.SpillableTupleList: attempting to load codec: org.apache.hadoop.io.compress.GzipCodec
13/01/01 01:20:13 INFO collect.SpillableTupleList: found codec: org.apache.hadoop.io.compress.GzipCodec
13/01/01 01:20:13 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:13 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:20:13 INFO mapred.Task: Task:attempt_local_0007_r_000000_0 is done. And is in the process of commiting
13/01/01 01:20:13 INFO mapred.LocalJobRunner:
13/01/01 01:20:13 INFO mapred.Task: Task attempt_local_0007_r_000000_0 is allowed to commit now
13/01/01 01:20:13 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0007_r_000000_0' to file:/Users/ceteri/opt/Impatient/part6/output/tfidf
13/01/01 01:20:16 INFO mapred.LocalJobRunner: reduce > reduce
13/01/01 01:20:16 INFO mapred.Task: Task 'attempt_local_0007_r_000000_0' done.
13/01/01 01:20:16 INFO util.Hadoop18TapUtil: deleting temp path output/tfidf/_temporary
13/01/01 01:20:16 INFO checkpointed-workflow: Workflow completed successfully
bash-3.2$ more output/trap/part-m-00001-00001
zoink
bash-3.2$ more output/tfidf/part-00000
doc02 0.22314355131420976 area
doc01 0.44628710262841953 area
doc03 0.22314355131420976 area
doc05 0.9162907318741551 australia
doc05 0.9162907318741551 broken
doc04 0.9162907318741551 california's
doc04 0.9162907318741551 cause
doc02 0.9162907318741551 cloudcover
doc04 0.9162907318741551 death
doc04 0.9162907318741551 deserts
doc03 0.9162907318741551 downwind
doc01 0.22314355131420976 dry
doc02 0.22314355131420976 dry
doc03 0.22314355131420976 dry
doc05 0.9162907318741551 dvd
doc04 0.9162907318741551 effect
doc04 0.9162907318741551 known
doc03 0.5108256237659907 land
doc05 0.5108256237659907 land
doc01 0.5108256237659907 lee
doc02 0.5108256237659907 lee
doc04 0.5108256237659907 leeward
doc03 0.5108256237659907 leeward
doc02 0.9162907318741551 less
doc03 0.9162907318741551 lies
doc02 0.22314355131420976 mountain
doc03 0.22314355131420976 mountain
doc04 0.22314355131420976 mountain
doc01 0.9162907318741551 mountainous
doc04 0.9162907318741551 primary
doc02 0.9162907318741551 produces
doc04 0.0 rain
doc01 0.0 rain
doc02 0.0 rain
doc03 0.0 rain
doc04 0.9162907318741551 ranges
doc05 0.9162907318741551 secrets
doc01 0.0 shadow
doc02 0.0 shadow
doc03 0.0 shadow
doc04 0.0 shadow
doc02 0.9162907318741551 sinking
doc04 0.9162907318741551 such
doc04 0.9162907318741551 valley
doc05 0.9162907318741551 women
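As a quick sanity check (this sketch is not part of the gist): the scores in `part-00000` above are consistent with tf × ln(n_docs / (1 + df)), where n_docs = 5 for the sample corpus. The df counts below are inferred from the output itself, e.g. "dry" appears in doc01–doc03 and "rain" in four of the five documents.

```python
import math

def tf_idf(tf, df, n_docs=5):
    """Term frequency times smoothed inverse document frequency,
    inferred from the scores in the output above (an assumption,
    not taken from the Impatient source)."""
    return tf * math.log(n_docs / (1 + df))

# "australia": one occurrence, in 1 of 5 docs
print(tf_idf(1, 1))   # 0.9162907318741551
# "dry": one occurrence each in 3 docs (doc01-doc03)
print(tf_idf(1, 3))   # 0.22314355131420976
# "area": two occurrences in doc01, df = 3
print(tf_idf(2, 3))   # ~0.44628710262841953
# "rain": in 4 of 5 docs, so idf = ln(5/5) = 0
print(tf_idf(1, 4))   # 0.0
```

This also explains why common terms like "rain" and "shadow" score 0.0: they occur in four of the five documents, so the smoothed idf term vanishes.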
bash-3.2$ lein test
Retrieving org/clojure/clojure/maven-metadata.xml (2k)
from http://repo1.maven.org/maven2/
Retrieving org/clojure/clojure/maven-metadata.xml (1k)
from https://clojars.org/repo/
Retrieving org/clojure/clojure/maven-metadata.xml (2k)
from http://repo1.maven.org/maven2/
Retrieving org/clojure/clojure/maven-metadata.xml
from http://oss.sonatype.org/content/repositories/snapshots/
Retrieving org/clojure/clojure/maven-metadata.xml
from http://oss.sonatype.org/content/repositories/releases/
lein test impatient.core-test
Ran 2 tests containing 2 assertions.
0 failures, 0 errors.