Last active December 10, 2015 09:58
Cascading for the Impatient, Part 9
bash-3.2$ lein repl
Listening for transport dt_socket at address: 51539
nREPL server started on port 51542
REPL-y 0.1.0-beta10
Clojure 1.4.0
    Exit: Control+D or (exit) or (quit)
Commands: (user/help)
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
          (user/sourcery function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
Examples from clojuredocs.org: [clojuredocs or cdoc]
          (user/clojuredocs name-here)
          (user/clojuredocs "ns-here" "name-here")
user=> (use 'cascalog.playground) (bootstrap)
nil
nil
user=> (?<- (stdout) [?person] (age ?person 25))
12/12/30 21:42:10 INFO util.HadoopUtil: using default application jar, may cause class not found exceptions on the cluster
12/12/30 21:42:10 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/.m2/repository/cascading/cascading-hadoop/2.0.3/cascading-hadoop-2.0.3.jar
12/12/30 21:42:10 INFO property.AppProps: using app.id: 75CD13874F775C9F50B85D4020DB8597
12/12/30 21:42:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:11 INFO flow.Flow: [] starting
12/12/30 21:42:11 INFO flow.Flow: [] source: MemorySourceTap["MemorySourceScheme[[UNKNOWN]->[ALL]]"]["/8aae2eaf-cee3-4569-8773-3340098b860b"]"]
12/12/30 21:42:11 INFO flow.Flow: [] sink: StdoutTap["SequenceFile[[UNKNOWN]->['?person']]"]["/var/folders/bl/zrtbg3cd57lfsgzzllhnxxjc0000gn/T/temp84928667373582098931356932530374073000"]"]
12/12/30 21:42:11 INFO flow.Flow: [] parallel execution is enabled: false
12/12/30 21:42:11 INFO flow.Flow: [] starting jobs: 1
12/12/30 21:42:11 INFO flow.Flow: [] allocating threads: 1
12/12/30 21:42:11 INFO flow.FlowStep: [] starting step: (1/1) ...2098931356932530374073000
12/12/30 21:42:11 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/12/30 21:42:11 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:11 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:11 INFO flow.FlowStep: [] submitted hadoop job: job_local_0001
12/12/30 21:42:11 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:11 INFO mapred.MapTask: numReduceTasks: 0
12/12/30 21:42:11 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:11 INFO hadoop.FlowMapper: cascading version: Concurrent, Inc - Cascading 2.0.3
12/12/30 21:42:11 INFO hadoop.FlowMapper: child jvm opts: -Xmx200m
12/12/30 21:42:11 INFO hadoop.FlowMapper: sourcing from: MemorySourceTap["MemorySourceScheme[[UNKNOWN]->[ALL]]"]["/8aae2eaf-cee3-4569-8773-3340098b860b"]"]
12/12/30 21:42:11 INFO hadoop.FlowMapper: sinking to: StdoutTap["SequenceFile[[UNKNOWN]->['?person']]"]["/var/folders/bl/zrtbg3cd57lfsgzzllhnxxjc0000gn/T/temp84928667373582098931356932530374073000"]"]
12/12/30 21:42:11 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/12/30 21:42:11 INFO mapred.LocalJobRunner:
12/12/30 21:42:11 INFO mapred.TaskRunner: Task attempt_local_0001_m_000000_0 is allowed to commit now
12/12/30 21:42:11 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000000_0' to file:/var/folders/bl/zrtbg3cd57lfsgzzllhnxxjc0000gn/T/temp84928667373582098931356932530374073000
12/12/30 21:42:11 INFO mapred.LocalJobRunner:
12/12/30 21:42:11 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
12/12/30 21:42:11 INFO mapred.FileInputFormat: Total input paths to process : 1
RESULTS
-----------------------
nil
user=> 12/12/30 21:42:11 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:11 INFO util.Hadoop18TapUtil: deleting temp path /var/folders/bl/zrtbg3cd57lfsgzzllhnxxjc0000gn/T/temp84928667373582098931356932530374073000/_temporary
david
emily
-----------------------
user=> (?<- (stdout) [?person] (age ?person ?age) (< ?age 30))
12/12/30 21:42:23 INFO util.HadoopUtil: using default application jar, may cause class not found exceptions on the cluster
12/12/30 21:42:23 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/.m2/repository/cascading/cascading-hadoop/2.0.3/cascading-hadoop-2.0.3.jar
12/12/30 21:42:23 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:23 INFO flow.Flow: [] starting
12/12/30 21:42:23 INFO flow.Flow: [] source: MemorySourceTap["MemorySourceScheme[[UNKNOWN]->[ALL]]"]["/9542bdd3-e677-416e-81c1-a5c36e4c53ca"]"]
12/12/30 21:42:23 INFO flow.Flow: [] sink: StdoutTap["SequenceFile[[UNKNOWN]->['?person']]"]["/var/folders/bl/zrtbg3cd57lfsgzzllhnxxjc0000gn/T/temp80760086378556118051356932542912465000"]"]
12/12/30 21:42:23 INFO flow.Flow: [] parallel execution is enabled: false
12/12/30 21:42:23 INFO flow.Flow: [] starting jobs: 1
12/12/30 21:42:23 INFO flow.Flow: [] allocating threads: 1
12/12/30 21:42:23 INFO flow.FlowStep: [] starting step: (1/1) ...6118051356932542912465000
12/12/30 21:42:23 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
12/12/30 21:42:23 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:23 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:23 INFO flow.FlowStep: [] submitted hadoop job: job_local_0002
12/12/30 21:42:23 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:23 INFO mapred.MapTask: numReduceTasks: 0
12/12/30 21:42:23 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:23 INFO hadoop.FlowMapper: cascading version: Concurrent, Inc - Cascading 2.0.3
12/12/30 21:42:23 INFO hadoop.FlowMapper: child jvm opts: -Xmx200m
12/12/30 21:42:23 INFO hadoop.FlowMapper: sourcing from: MemorySourceTap["MemorySourceScheme[[UNKNOWN]->[ALL]]"]["/9542bdd3-e677-416e-81c1-a5c36e4c53ca"]"]
12/12/30 21:42:23 INFO hadoop.FlowMapper: sinking to: StdoutTap["SequenceFile[[UNKNOWN]->['?person']]"]["/var/folders/bl/zrtbg3cd57lfsgzzllhnxxjc0000gn/T/temp80760086378556118051356932542912465000"]"]
12/12/30 21:42:23 INFO mapred.TaskRunner: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
12/12/30 21:42:23 INFO mapred.LocalJobRunner:
12/12/30 21:42:23 INFO mapred.TaskRunner: Task attempt_local_0002_m_000000_0 is allowed to commit now
12/12/30 21:42:23 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0002_m_000000_0' to file:/var/folders/bl/zrtbg3cd57lfsgzzllhnxxjc0000gn/T/temp80760086378556118051356932542912465000
12/12/30 21:42:23 INFO mapred.LocalJobRunner:
12/12/30 21:42:23 INFO mapred.TaskRunner: Task 'attempt_local_0002_m_000000_0' done.
RESULTS
-----------------------
nil
user=> 12/12/30 21:42:23 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/30 21:42:23 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/30 21:42:23 INFO util.Hadoop18TapUtil: deleting temp path /var/folders/bl/zrtbg3cd57lfsgzzllhnxxjc0000gn/T/temp80760086378556118051356932542912465000/_temporary
alice
david
emily
gary
kumar
-----------------------
user=> Bye for now!
bash-3.2$
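The two REPL queries above filter the playground's in-memory `age` generator: the first binds `?age` to the literal 25, the second adds the predicate `(< ?age 30)`. A rough Python re-sketch of that logic, NOT the actual cascalog.playground code — the `people` tuples here are invented ages chosen to be consistent with the results printed above:

```python
# Hypothetical stand-in for cascalog.playground's `age` generator.
people = [
    ("alice", 28), ("bob", 33), ("chris", 40), ("david", 25),
    ("emily", 25), ("gary", 28), ("george", 31), ("kumar", 27),
]

# (?<- (stdout) [?person] (age ?person 25))
# binding ?age to the constant 25 acts as an equality filter
exactly_25 = [name for name, age in people if age == 25]

# (?<- (stdout) [?person] (age ?person ?age) (< ?age 30))
# a free ?age variable plus a comparison predicate
under_30 = [name for name, age in people if age < 30]

print(exactly_25)  # ['david', 'emily']
print(under_30)    # ['alice', 'david', 'emily', 'gary', 'kumar']
```

The point the transcript makes is that each query compiles into its own Hadoop flow (job_local_0001 and job_local_0002), even for a simple filter.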
bash-3.2$ pwd
/Users/ceteri/opt/Impatient/part1
bash-3.2$ lein uberjar
Created /Users/ceteri/opt/Impatient/part1/target/impatient-0.1.0-SNAPSHOT.jar
Including impatient-0.1.0-SNAPSHOT.jar
Including reflectasm-1.06-shaded.jar
Including slf4j-log4j12-1.6.1.jar
Including cascading-core-2.0.0.jar
Including cascading-hadoop-2.0.0.jar
Including objenesis-1.2.jar
Including meat-locker-0.3.0.jar
Including kryo-2.16.jar
Including tools.macro-0.1.1.jar
Including tools.logging-0.2.3.jar
Including minlog-1.2.jar
Including jgrapht-jdk1.6-0.8.1.jar
Including clojure-1.4.0.jar
Including log4j-1.2.16.jar
Including jackknife-0.1.2.jar
Including cascading.kryo-0.4.0.jar
Including hadoop-util-0.2.8.jar
Including maple-0.2.0.jar
Including riffle-0.1-dev.jar
Including cascalog-more-taps-0.3.0.jar
Including asm-4.0.jar
Including carbonite-1.3.0.jar
Including cascalog-1.10.0.jar
Including slf4j-api-1.6.1.jar
Created /Users/ceteri/opt/Impatient/part1/target/impatient.jar
bash-3.2$
bash-3.2$ rm -rf output
bash-3.2$ hadoop jar ./target/impatient.jar data/rain.txt output/rain
Warning: $HADOOP_HOME is deprecated.
12/12/31 19:16:01 INFO util.HadoopUtil: resolving application jar from found main method on: impatient.core
12/12/31 19:16:01 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/opt/Impatient/part1/./target/impatient.jar
12/12/31 19:16:01 INFO property.AppProps: using app.id: 0FA333331029D47F7CDF748E0D802569
2012-12-31 19:16:01.153 java[17399:1903] Unable to load realm info from SCDynamicStore
12/12/31 19:16:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/12/31 19:16:01 WARN snappy.LoadSnappy: Snappy native library not loaded
12/12/31 19:16:01 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/31 19:16:01 INFO util.Version: Concurrent, Inc - Cascading 2.0.0
12/12/31 19:16:01 INFO flow.Flow: [] starting
12/12/31 19:16:01 INFO flow.Flow: [] source: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/12/31 19:16:01 INFO flow.Flow: [] sink: Hfs["TextDelimited[[UNKNOWN]->['?doc', '?line']]"]["output/rain"]"]
12/12/31 19:16:01 INFO flow.Flow: [] parallel execution is enabled: false
12/12/31 19:16:01 INFO flow.Flow: [] starting jobs: 1
12/12/31 19:16:01 INFO flow.Flow: [] allocating threads: 1
12/12/31 19:16:01 INFO flow.FlowStep: [] starting step: (1/1) output/rain
12/12/31 19:16:01 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/31 19:16:01 INFO flow.FlowStep: [] submitted hadoop job: job_local_0001
12/12/31 19:16:01 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/12/31 19:16:01 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:16:01 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/opt/Impatient/part1/data/rain.txt
12/12/31 19:16:01 INFO mapred.MapTask: numReduceTasks: 0
12/12/31 19:16:01 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/12/31 19:16:01 INFO hadoop.FlowMapper: sinking to: Hfs["TextDelimited[[UNKNOWN]->['?doc', '?line']]"]["output/rain"]"]
12/12/31 19:16:01 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/12/31 19:16:01 INFO mapred.LocalJobRunner:
12/12/31 19:16:01 INFO mapred.Task: Task attempt_local_0001_m_000000_0 is allowed to commit now
12/12/31 19:16:01 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000000_0' to file:/Users/ceteri/opt/Impatient/part1/output/rain
12/12/31 19:16:04 INFO mapred.LocalJobRunner: file:/Users/ceteri/opt/Impatient/part1/data/rain.txt:0+510
12/12/31 19:16:04 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/12/31 19:16:04 INFO util.Hadoop18TapUtil: deleting temp path output/rain/_temporary
bash-3.2$ cat output/rain/part-00000
doc01	A rain shadow is a dry area on the lee back side of a mountainous area.
doc02	This sinking, dry air produces a rain shadow, or area in the lee of a mountain with less rain and cloudcover.
doc03	A rain shadow is an area of dry land that lies on the leeward (or downwind) side of a mountain.
doc04	This is known as the rain shadow effect and is the primary cause of leeward deserts of mountain ranges, such as California's Death Valley.
doc05	Two Women. Secrets. A Broken Land. [DVD Australia]
bash-3.2$
bash-3.2$ pwd
/Users/ceteri/opt/Impatient/part4
bash-3.2$ lein uberjar
Created /Users/ceteri/opt/Impatient/part4/target/impatient-0.1.0-SNAPSHOT.jar
Including impatient-0.1.0-SNAPSHOT.jar
Including reflectasm-1.06-shaded.jar
Including slf4j-log4j12-1.6.1.jar
Including cascading-core-2.0.0.jar
Including cascading-hadoop-2.0.0.jar
Including objenesis-1.2.jar
Including meat-locker-0.3.0.jar
Including kryo-2.16.jar
Including tools.macro-0.1.1.jar
Including tools.logging-0.2.3.jar
Including minlog-1.2.jar
Including jgrapht-jdk1.6-0.8.1.jar
Including clojure-1.4.0.jar
Including log4j-1.2.16.jar
Including jackknife-0.1.2.jar
Including cascading.kryo-0.4.0.jar
Including hadoop-util-0.2.8.jar
Including maple-0.2.0.jar
Including riffle-0.1-dev.jar
Including cascalog-more-taps-0.3.0.jar
Including asm-4.0.jar
Including carbonite-1.3.0.jar
Including cascalog-1.10.0.jar
Including slf4j-api-1.6.1.jar
Created /Users/ceteri/opt/Impatient/part4/target/impatient.jar
bash-3.2$ rm -rf output
bash-3.2$ hadoop jar ./target/impatient.jar data/rain.txt output/wc data/en.stop
Warning: $HADOOP_HOME is deprecated.
12/12/31 19:57:20 INFO util.HadoopUtil: resolving application jar from found main method on: impatient.core
12/12/31 19:57:20 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/opt/Impatient/part4/./target/impatient.jar
12/12/31 19:57:20 INFO property.AppProps: using app.id: 018CCCB36A27EFF4C33268E758DCEE9C
2012-12-31 19:57:20.696 java[17478:1903] Unable to load realm info from SCDynamicStore
12/12/31 19:57:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/12/31 19:57:20 WARN snappy.LoadSnappy: Snappy native library not loaded
12/12/31 19:57:20 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/31 19:57:20 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/31 19:57:20 INFO util.Version: Concurrent, Inc - Cascading 2.0.0
12/12/31 19:57:20 INFO flow.Flow: [] starting
12/12/31 19:57:20 INFO flow.Flow: [] source: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/12/31 19:57:20 INFO flow.Flow: [] source: Hfs["TextDelimited[['stop']->[ALL]]"]["data/en.stop"]"]
12/12/31 19:57:20 INFO flow.Flow: [] sink: Hfs["TextDelimited[[UNKNOWN]->['?word', '?count']]"]["output/wc"]"]
12/12/31 19:57:20 INFO flow.Flow: [] parallel execution is enabled: false
12/12/31 19:57:20 INFO flow.Flow: [] starting jobs: 2
12/12/31 19:57:20 INFO flow.Flow: [] allocating threads: 1
12/12/31 19:57:20 INFO flow.FlowStep: [] starting step: (1/2)
12/12/31 19:57:21 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/31 19:57:21 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/31 19:57:21 INFO flow.FlowStep: [] submitted hadoop job: job_local_0001
12/12/31 19:57:21 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/12/31 19:57:21 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:21 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/opt/Impatient/part4/data/en.stop
12/12/31 19:57:21 INFO mapred.MapTask: numReduceTasks: 1
12/12/31 19:57:21 INFO mapred.MapTask: io.sort.mb = 100
12/12/31 19:57:21 INFO mapred.MapTask: data buffer = 79691776/99614720
12/12/31 19:57:21 INFO mapred.MapTask: record buffer = 262144/327680
12/12/31 19:57:21 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:21 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:21 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['stop']->[ALL]]"]["data/en.stop"]"]
12/12/31 19:57:21 INFO hadoop.FlowMapper: sinking to: CoGroup(05eaf3b0-0553-4c2a-bee7-66a9d6cd8cc3*226bb6d6-5a62-49a3-afe6-debd65484c23)[by:05eaf3b0-0553-4c2a-bee7-66a9d6cd8cc3:[{1}:'?word']226bb6d6-5a62-49a3-afe6-debd65484c23:[{1}:'?word']]
12/12/31 19:57:21 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:21 INFO mapred.MapTask: Starting flush of map output
12/12/31 19:57:21 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:21 INFO mapred.MapTask: Finished spill 0
12/12/31 19:57:21 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/12/31 19:57:24 INFO mapred.LocalJobRunner: file:/Users/ceteri/opt/Impatient/part4/data/en.stop:0+544
12/12/31 19:57:24 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/12/31 19:57:24 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/12/31 19:57:24 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:24 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/opt/Impatient/part4/data/rain.txt
12/12/31 19:57:24 INFO mapred.MapTask: numReduceTasks: 1
12/12/31 19:57:24 INFO mapred.MapTask: io.sort.mb = 100
12/12/31 19:57:24 INFO mapred.MapTask: data buffer = 79691776/99614720
12/12/31 19:57:24 INFO mapred.MapTask: record buffer = 262144/327680
12/12/31 19:57:24 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:24 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:24 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/12/31 19:57:24 INFO hadoop.FlowMapper: sinking to: CoGroup(05eaf3b0-0553-4c2a-bee7-66a9d6cd8cc3*226bb6d6-5a62-49a3-afe6-debd65484c23)[by:05eaf3b0-0553-4c2a-bee7-66a9d6cd8cc3:[{1}:'?word']226bb6d6-5a62-49a3-afe6-debd65484c23:[{1}:'?word']]
12/12/31 19:57:24 INFO mapred.MapTask: Starting flush of map output
12/12/31 19:57:24 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:24 INFO mapred.MapTask: Finished spill 0
12/12/31 19:57:24 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/12/31 19:57:27 INFO mapred.LocalJobRunner: file:/Users/ceteri/opt/Impatient/part4/data/rain.txt:0+510
12/12/31 19:57:27 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
12/12/31 19:57:27 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/12/31 19:57:27 INFO mapred.LocalJobRunner:
12/12/31 19:57:27 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:27 INFO mapred.Merger: Merging 2 sorted segments
12/12/31 19:57:27 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 3285 bytes
12/12/31 19:57:27 INFO mapred.LocalJobRunner:
12/12/31 19:57:27 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:27 INFO hadoop.FlowReducer: sourcing from: CoGroup(05eaf3b0-0553-4c2a-bee7-66a9d6cd8cc3*226bb6d6-5a62-49a3-afe6-debd65484c23)[by:05eaf3b0-0553-4c2a-bee7-66a9d6cd8cc3:[{1}:'?word']226bb6d6-5a62-49a3-afe6-debd65484c23:[{1}:'?word']]
12/12/31 19:57:27 INFO hadoop.FlowReducer: sinking to: TempHfs["SequenceFile[['?word', '!__gen16']]"][f9c244ec-acb0-49a1-af11-d/44884/]
12/12/31 19:57:27 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:27 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:27 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:29 INFO collect.SpillableTupleList: attempting to load codec: org.apache.hadoop.io.compress.GzipCodec
12/12/31 19:57:29 INFO collect.SpillableTupleList: found codec: org.apache.hadoop.io.compress.GzipCodec
12/12/31 19:57:29 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:29 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:29 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/12/31 19:57:29 INFO mapred.LocalJobRunner:
12/12/31 19:57:29 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/12/31 19:57:29 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/tmp/hadoop-ceteri/f9c244ec_acb0_49a1_af11_d_44884_13A5B5731DB1CB1DC6B7BB49BF3F5A46
12/12/31 19:57:30 INFO mapred.LocalJobRunner: reduce > reduce
12/12/31 19:57:30 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
12/12/31 19:57:30 INFO flow.FlowStep: [] starting step: (2/2) output/wc
12/12/31 19:57:30 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/31 19:57:30 INFO flow.FlowStep: [] submitted hadoop job: job_local_0002
12/12/31 19:57:30 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/12/31 19:57:30 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:30 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-ceteri/f9c244ec_acb0_49a1_af11_d_44884_13A5B5731DB1CB1DC6B7BB49BF3F5A46/part-00000
12/12/31 19:57:30 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:30 INFO mapred.MapTask: numReduceTasks: 1
12/12/31 19:57:30 INFO mapred.MapTask: io.sort.mb = 100
12/12/31 19:57:30 INFO mapred.MapTask: data buffer = 79691776/99614720
12/12/31 19:57:30 INFO mapred.MapTask: record buffer = 262144/327680
12/12/31 19:57:30 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:30 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:30 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?word', '!__gen16']]"][f9c244ec-acb0-49a1-af11-d/44884/]
12/12/31 19:57:30 INFO hadoop.FlowMapper: sinking to: GroupBy(f9c244ec-acb0-49a1-af11-d27e79ada8d9)[by:[{1}:'?word']]
12/12/31 19:57:30 INFO mapred.MapTask: Starting flush of map output
12/12/31 19:57:30 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:30 INFO mapred.MapTask: Finished spill 0
12/12/31 19:57:30 INFO mapred.Task: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
12/12/31 19:57:33 INFO mapred.LocalJobRunner: file:/tmp/hadoop-ceteri/f9c244ec_acb0_49a1_af11_d_44884_13A5B5731DB1CB1DC6B7BB49BF3F5A46/part-00000:0+784
12/12/31 19:57:33 INFO mapred.Task: Task 'attempt_local_0002_m_000000_0' done.
12/12/31 19:57:33 INFO mapred.Task: Using ResourceCalculatorPlugin : null
12/12/31 19:57:33 INFO mapred.LocalJobRunner:
12/12/31 19:57:33 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:33 INFO mapred.Merger: Merging 1 sorted segments
12/12/31 19:57:33 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 561 bytes
12/12/31 19:57:33 INFO mapred.LocalJobRunner:
12/12/31 19:57:33 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:33 INFO hadoop.FlowReducer: sourcing from: GroupBy(f9c244ec-acb0-49a1-af11-d27e79ada8d9)[by:[{1}:'?word']]
12/12/31 19:57:33 INFO hadoop.FlowReducer: sinking to: Hfs["TextDelimited[[UNKNOWN]->['?word', '?count']]"]["output/wc"]"]
12/12/31 19:57:33 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:33 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:33 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/12/31 19:57:33 INFO mapred.Task: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
12/12/31 19:57:33 INFO mapred.LocalJobRunner:
12/12/31 19:57:33 INFO mapred.Task: Task attempt_local_0002_r_000000_0 is allowed to commit now
12/12/31 19:57:33 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0002_r_000000_0' to file:/Users/ceteri/opt/Impatient/part4/output/wc
12/12/31 19:57:36 INFO mapred.LocalJobRunner: reduce > reduce
12/12/31 19:57:36 INFO mapred.Task: Task 'attempt_local_0002_r_000000_0' done.
12/12/31 19:57:36 INFO util.Hadoop18TapUtil: deleting temp path output/wc/_temporary
bash-3.2$ more output/wc/part-00000
air	1
area	4
australia	1
broken	1
california's	1
cause	1
cloudcover	1
death	1
deserts	1
downwind	1
dry	3
dvd	1
effect	1
known	1
land	2
lee	2
leeward	2
less	1
lies	1
mountain	3
mountainous	1
primary	1
produces	1
rain	5
ranges	1
secrets	1
shadow	4
sinking	1
such	1
valley	1
women	1
bash-3.2$
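The part4 run above executes two chained jobs: job_local_0001 CoGroups the tokenized text against the stop-word list (filtering out stop words), and job_local_0002 does a GroupBy on `?word` to produce the counts. A rough Python sketch of that pipeline's logic, using a tiny made-up corpus and stop list rather than the tutorial's actual data/rain.txt and data/en.stop files:

```python
import re
from collections import Counter

# Hypothetical stand-ins for data/rain.txt and data/en.stop.
docs = {"doc01": "A rain shadow is a dry area"}
stop = {"a", "is", "the", "of", "on"}

# tokenize -> drop stop words (the CoGroup step) -> count (the GroupBy step)
counts = Counter(
    tok
    for text in docs.values()
    for tok in re.findall(r"[a-z']+", text.lower())  # rough scrub-text analogue
    if tok not in stop
)

for word, n in sorted(counts.items()):
    print(word, n)  # rain 1, shadow 1, dry 1, area 1 (order: alphabetical)
```

In the real flow the stop-word filter runs as a distributed join because the word list may not fit in one task's memory; this in-memory set lookup is only the logical equivalent.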
bash-3.2$ pwd
/Users/ceteri/opt/Impatient/part6
bash-3.2$ lein uberjar
Created /Users/ceteri/opt/Impatient/part6/target/impatient-0.1.0-SNAPSHOT.jar
Including impatient-0.1.0-SNAPSHOT.jar
Including cascalog-checkpoint-0.2.0.jar
Including reflectasm-1.06-shaded.jar
Including slf4j-log4j12-1.6.1.jar
Including cascading-core-2.0.0.jar
Including cascading-hadoop-2.0.0.jar
Including objenesis-1.2.jar
Including meat-locker-0.3.0.jar
Including kryo-2.16.jar
Including tools.macro-0.1.1.jar
Including tools.logging-0.2.3.jar
Including minlog-1.2.jar
Including jgrapht-jdk1.6-0.8.1.jar
Including clojure-1.4.0.jar
Including log4j-1.2.16.jar
Including jackknife-0.1.2.jar
Including cascading.kryo-0.4.0.jar
Including hadoop-util-0.2.8.jar
Including maple-0.2.0.jar
Including riffle-0.1-dev.jar
Including cascalog-more-taps-0.3.0.jar
Including asm-4.0.jar
Including carbonite-1.3.0.jar
Including cascalog-1.10.0.jar
Including slf4j-api-1.6.1.jar
Created /Users/ceteri/opt/Impatient/part6/target/impatient.jar
bash-3.2$ rm -rf output
bash-3.2$ hadoop jar target/impatient.jar data/rain.txt output/wc data/en.stop output/tfidf
Warning: $HADOOP_HOME is deprecated.
2013-01-01 01:19:35.234 java[17801:1903] Unable to load realm info from SCDynamicStore
13/01/01 01:19:35 INFO util.HadoopUtil: using default application jar, may cause class not found exceptions on the cluster
13/01/01 01:19:35 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/opt/Impatient/part6/target/impatient.jar
13/01/01 01:19:35 INFO property.AppProps: using app.id: D246032EBAD75DCC3C0BE86023BAFCA6
13/01/01 01:19:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/01/01 01:19:35 WARN snappy.LoadSnappy: Snappy native library not loaded
13/01/01 01:19:35 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/01 01:19:35 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/01 01:19:36 INFO util.Version: Concurrent, Inc - Cascading 2.0.0
13/01/01 01:19:36 INFO flow.Flow: [] starting
13/01/01 01:19:36 INFO flow.Flow: [] source: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
13/01/01 01:19:36 INFO flow.Flow: [] source: Hfs["TextDelimited[['stop']->[ALL]]"]["data/en.stop"]"]
13/01/01 01:19:36 INFO flow.Flow: [] sink: Hfs["TextDelimited[[UNKNOWN]->['?doc-id', '?word']]"]["tmp/checkpoint/data/etl-stage"]"]
13/01/01 01:19:36 INFO flow.Flow: [] parallel execution is enabled: false
13/01/01 01:19:36 INFO flow.Flow: [] starting jobs: 1
13/01/01 01:19:36 INFO flow.Flow: [] allocating threads: 1
13/01/01 01:19:36 INFO flow.FlowStep: [] starting step: (1/1) ...checkpoint/data/etl-stage
13/01/01 01:19:36 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/01 01:19:36 INFO mapred.FileInputFormat: Total input paths to process : 1
13/01/01 01:19:36 INFO flow.FlowStep: [] submitted hadoop job: job_local_0001
13/01/01 01:19:36 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/01 01:19:36 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:36 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/opt/Impatient/part6/data/en.stop
13/01/01 01:19:36 INFO mapred.MapTask: numReduceTasks: 1
13/01/01 01:19:36 INFO mapred.MapTask: io.sort.mb = 100
13/01/01 01:19:36 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/01 01:19:36 INFO mapred.MapTask: record buffer = 262144/327680
13/01/01 01:19:36 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:36 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:36 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['stop']->[ALL]]"]["data/en.stop"]"]
13/01/01 01:19:36 INFO hadoop.FlowMapper: sinking to: CoGroup(abf66f9b-db1c-4098-9a60-cd581ab4fea5*7e70de4e-2493-4c49-ac62-555817afc959)[by:abf66f9b-db1c-4098-9a60-cd581ab4fea5:[{1}:'?word']7e70de4e-2493-4c49-ac62-555817afc959:[{1}:'?word']]
13/01/01 01:19:36 INFO hadoop.FlowMapper: trapping to: Hfs["TextLine[['line']->[ALL]]"]["output/trap"]"]
13/01/01 01:19:36 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:36 INFO mapred.MapTask: Starting flush of map output
13/01/01 01:19:36 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:36 INFO mapred.MapTask: Finished spill 0
13/01/01 01:19:36 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/01/01 01:19:39 INFO mapred.LocalJobRunner: file:/Users/ceteri/opt/Impatient/part6/data/en.stop:0+544
13/01/01 01:19:39 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/01/01 01:19:39 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/01 01:19:39 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:39 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/opt/Impatient/part6/data/rain.txt
13/01/01 01:19:39 INFO mapred.MapTask: numReduceTasks: 1
13/01/01 01:19:39 INFO mapred.MapTask: io.sort.mb = 100
13/01/01 01:19:39 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/01 01:19:39 INFO mapred.MapTask: record buffer = 262144/327680
13/01/01 01:19:39 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:39 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
13/01/01 01:19:39 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
13/01/01 01:19:39 INFO hadoop.FlowMapper: sinking to: CoGroup(abf66f9b-db1c-4098-9a60-cd581ab4fea5*7e70de4e-2493-4c49-ac62-555817afc959)[by:abf66f9b-db1c-4098-9a60-cd581ab4fea5:[{1}:'?word']7e70de4e-2493-4c49-ac62-555817afc959:[{1}:'?word']]
13/01/01 01:19:39 INFO hadoop.FlowMapper: trapping to: Hfs["TextLine[['line']->[ALL]]"]["output/trap"]"]
13/01/01 01:19:39 INFO util.Hadoop18TapUtil: setting up task: 'attempt_local_0001_m_000001_0' - file:/Users/ceteri/opt/Impatient/part6/output/trap/_temporary/_attempt_local_0001_m_000001_0
13/01/01 01:19:39 WARN stream.TrapHandler: exception trap on branch: '7e4e6052-627c-4283-8bca-694a6744a232', for fields: [{1}:'?doc-id'] tuple: ['zoink']
cascading.pipe.OperatorException: [7e4e6052-627c-4283-8bc...][sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)] operator Each failed executing operation
	at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:68)
	at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:33)
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67) | |
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93) | |
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86) | |
at cascading.operation.Identity.operate(Identity.java:110) | |
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86) | |
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38) | |
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:60) | |
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:33) | |
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67) | |
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93) | |
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86) | |
at cascalog.ClojureMap.operate(Unknown Source) | |
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86) | |
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38) | |
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67) | |
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93) | |
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86) | |
at cascading.operation.Identity.operate(Identity.java:110) | |
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86) | |
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38) | |
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:60) | |
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:33) | |
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67) | |
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93) | |
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86) | |
at cascalog.ClojureMapcat.operate(Unknown Source) | |
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86) | |
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38) | |
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:60) | |
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:33) | |
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:67) | |
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:93) | |
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:86) | |
at cascading.operation.Identity.operate(Identity.java:110) | |
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:86) | |
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:38) | |
at cascading.flow.stream.SourceStage.map(SourceStage.java:102) | |
at cascading.flow.stream.SourceStage.run(SourceStage.java:58) | |
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:124) | |
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436) | |
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) | |
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) | |
Caused by: java.lang.AssertionError: Assert failed: unexpected doc-id | |
(pred x) | |
at impatient.core$assert_tuple.invoke(core.clj:19) | |
at clojure.lang.AFn.applyToHelper(AFn.java:167) | |
at clojure.lang.AFn.applyTo(AFn.java:151) | |
at clojure.core$apply.invoke(core.clj:605) | |
at clojure.core$partial$fn__446.doInvoke(core.clj:2345) | |
at clojure.lang.RestFn.invoke(RestFn.java:408) | |
at clojure.lang.Var.invoke(Var.java:415) | |
at clojure.lang.AFn.applyToHelper(AFn.java:161) | |
at clojure.lang.Var.applyTo(Var.java:532) | |
at cascalog.ClojureCascadingBase.applyFunction(Unknown Source) | |
at cascalog.ClojureFilter.isRemove(Unknown Source) | |
at cascading.flow.stream.FilterEachStage.receive(FilterEachStage.java:57) | |
... 43 more | |
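The WARN/exception block above is the tutorial's failure-trap feature working as intended: the bogus tuple `['zoink']` fails a doc-id assertion, and instead of killing the flow, Cascading diverts the offending tuple to the `output/trap` tap. Judging from the `Assert failed: unexpected doc-id (pred x)` message, the filter is built on Clojure's `assert`; a minimal sketch (hypothetical names; the tutorial's actual version lives in `impatient.core`):

```clojure
;; Sketch of an assertion filter that would produce the
;; "Assert failed: unexpected doc-id" AssertionError seen above.
;; Clojure's `assert` throws java.lang.AssertionError when the
;; predicate form evaluates to nil/false.
(defn assert-tuple [pred msg x]
  (assert (pred x) msg))

;; Partially applied: accept doc-ids like "doc01", reject "zoink".
(def assert-doc-id
  (partial assert-tuple #(re-matches #"doc\d+" %) "unexpected doc-id"))
```

Attaching a trap tap to the query, e.g. `(:trap (hfs-textline "output/trap"))`, is what makes the failed tuple land in the trap rather than aborting the job, hence the `trapping to: ... ["output/trap"]` lines and the trap part file committed in the lines that follow.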
13/01/01 01:19:39 INFO io.TapOutputCollector: closing tap collector for: output/trap/part-m-00001-00001 | |
13/01/01 01:19:39 INFO util.Hadoop18TapUtil: committing task: 'attempt_local_0001_m_000001_0' - file:/Users/ceteri/opt/Impatient/part6/output/trap/_temporary/_attempt_local_0001_m_000001_0 | |
13/01/01 01:19:39 INFO util.Hadoop18TapUtil: saved output of task 'attempt_local_0001_m_000001_0' to file:/Users/ceteri/opt/Impatient/part6/output/trap | |
13/01/01 01:19:39 INFO mapred.MapTask: Starting flush of map output | |
13/01/01 01:19:39 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:39 INFO mapred.MapTask: Finished spill 0 | |
13/01/01 01:19:39 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting | |
13/01/01 01:19:42 INFO mapred.LocalJobRunner: file:/Users/ceteri/opt/Impatient/part6/data/rain.txt:0+521 | |
13/01/01 01:19:42 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done. | |
13/01/01 01:19:42 INFO mapred.Task: Using ResourceCalculatorPlugin : null | |
13/01/01 01:19:42 INFO mapred.LocalJobRunner: | |
13/01/01 01:19:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:42 INFO mapred.Merger: Merging 2 sorted segments | |
13/01/01 01:19:42 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 4175 bytes | |
13/01/01 01:19:42 INFO mapred.LocalJobRunner: | |
13/01/01 01:19:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:42 INFO hadoop.FlowReducer: sourcing from: CoGroup(abf66f9b-db1c-4098-9a60-cd581ab4fea5*7e70de4e-2493-4c49-ac62-555817afc959)[by:abf66f9b-db1c-4098-9a60-cd581ab4fea5:[{1}:'?word']7e70de4e-2493-4c49-ac62-555817afc959:[{1}:'?word']] | |
13/01/01 01:19:42 INFO hadoop.FlowReducer: sinking to: Hfs["TextDelimited[[UNKNOWN]->['?doc-id', '?word']]"]["tmp/checkpoint/data/etl-stage"]"] | |
13/01/01 01:19:42 INFO hadoop.FlowReducer: trapping to: Hfs["TextLine[['line']->[ALL]]"]["output/trap"]"] | |
13/01/01 01:19:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:42 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:44 INFO collect.SpillableTupleList: attempting to load codec: org.apache.hadoop.io.compress.GzipCodec | |
13/01/01 01:19:44 INFO collect.SpillableTupleList: found codec: org.apache.hadoop.io.compress.GzipCodec | |
13/01/01 01:19:44 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:44 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:44 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting | |
13/01/01 01:19:44 INFO mapred.LocalJobRunner: | |
13/01/01 01:19:44 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now | |
13/01/01 01:19:44 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/Users/ceteri/opt/Impatient/part6/tmp/checkpoint/data/etl-stage | |
13/01/01 01:19:45 INFO mapred.LocalJobRunner: reduce > reduce | |
13/01/01 01:19:45 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done. | |
13/01/01 01:19:45 INFO util.Hadoop18TapUtil: deleting temp path tmp/checkpoint/data/etl-stage/_temporary | |
13/01/01 01:19:45 INFO util.Hadoop18TapUtil: deleting temp path output/trap/_temporary | |
13/01/01 01:19:45 INFO util.Hadoop18TapUtil: deleting temp path output/trap/_temporary | |
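At this point the first (ETL) flow is complete: it joined the tokenized `rain.txt` stream against the `en.stop` list (the CoGroup on `?word` logged above), trapped the bad tuple, and checkpointed clean `?doc-id ?word` pairs to `tmp/checkpoint/data/etl-stage` for the downstream flows to re-read. A rough sketch of what such a query can look like in Cascalog; the helper names here are illustrative, not necessarily the tutorial's exact code:

```clojure
(ns impatient.etl-sketch
  (:require [clojure.string :as s])
  (:use [cascalog.api]))

;; Illustrative tokenizer: emit one 1-tuple per whitespace-separated token.
(defmapcatop split [line]
  (s/split line #"\s+"))

;; Illustrative scrubber: normalize to trimmed lower case.
(defn scrub-text [word]
  (s/trim (s/lower-case word)))

;; Illustrative assertion: reject malformed doc-ids (throws AssertionError).
(defn assert-doc-id [doc-id]
  (assert (re-matches #"doc\d+" doc-id) "unexpected doc-id"))

;; Sketch of the ETL query: tokenize, scrub, anti-join against the stop
;; word list (the CoGroup on ?word in the log), assert doc-ids, and
;; send any failing tuples to the trap tap.
(defn etl-docs-gen [rain stop]
  (<- [?doc-id ?word]
      (rain ?doc-id ?line)
      (split ?line :> ?word-raw)
      (scrub-text ?word-raw :> ?word)
      (stop ?word :> false)
      (assert-doc-id ?doc-id)
      (:trap (hfs-textline "output/trap"))))
```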
13/01/01 01:19:45 INFO util.HadoopUtil: using default application jar, may cause class not found exceptions on the cluster | |
13/01/01 01:19:45 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/opt/Impatient/part6/target/impatient.jar | |
13/01/01 01:19:45 INFO flow.Flow: [] starting | |
13/01/01 01:19:45 INFO flow.Flow: [] source: Hfs["TextDelimited[[UNKNOWN]->[ALL]]"]["tmp/checkpoint/data/etl-stage"]"] | |
13/01/01 01:19:45 INFO flow.Flow: [] sink: Hfs["TextDelimited[[UNKNOWN]->['?word', '?count']]"]["output/wc"]"] | |
13/01/01 01:19:45 INFO flow.Flow: [] parallel execution is enabled: false | |
13/01/01 01:19:45 INFO flow.Flow: [] starting jobs: 1 | |
13/01/01 01:19:45 INFO flow.Flow: [] allocating threads: 1 | |
13/01/01 01:19:45 INFO util.HadoopUtil: using default application jar, may cause class not found exceptions on the cluster | |
13/01/01 01:19:45 INFO flow.FlowStep: [] starting step: (1/1) output/wc | |
13/01/01 01:19:45 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/opt/Impatient/part6/target/impatient.jar | |
13/01/01 01:19:45 INFO mapred.FileInputFormat: Total input paths to process : 1 | |
13/01/01 01:19:45 INFO flow.Flow: [] starting | |
13/01/01 01:19:45 INFO flow.Flow: [] source: Hfs["TextDelimited[['doc02', 'air']->[ALL]]"]["tmp/checkpoint/data/etl-stage"]"] | |
13/01/01 01:19:45 INFO flow.Flow: [] sink: Hfs["SequenceFile[[UNKNOWN]->['?n-docs']]"]["/tmp/cascalog_reserved/8554be9c-a22d-4c54-b8b1-fc158dfbf932/5db49fa3-279c-4985-a0c5-6a93024f4b4f"]"] | |
13/01/01 01:19:45 INFO flow.Flow: [] parallel execution is enabled: false | |
13/01/01 01:19:45 INFO flow.Flow: [] starting jobs: 1 | |
13/01/01 01:19:45 INFO flow.Flow: [] allocating threads: 1 | |
13/01/01 01:19:45 INFO flow.FlowStep: [] starting step: (1/1) ...9c-4985-a0c5-6a93024f4b4f | |
13/01/01 01:19:45 INFO mapred.FileInputFormat: Total input paths to process : 1 | |
13/01/01 01:19:45 INFO flow.FlowStep: [] submitted hadoop job: job_local_0002 | |
13/01/01 01:19:45 INFO mapred.Task: Using ResourceCalculatorPlugin : null | |
13/01/01 01:19:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:45 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/opt/Impatient/part6/tmp/checkpoint/data/etl-stage/part-00000 | |
13/01/01 01:19:45 INFO mapred.MapTask: numReduceTasks: 1 | |
13/01/01 01:19:45 INFO mapred.MapTask: io.sort.mb = 100 | |
13/01/01 01:19:45 INFO mapred.MapTask: data buffer = 79691776/99614720 | |
13/01/01 01:19:45 INFO mapred.MapTask: record buffer = 262144/327680 | |
13/01/01 01:19:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:45 INFO mapred.FileInputFormat: Total input paths to process : 1 | |
13/01/01 01:19:45 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[[UNKNOWN]->[ALL]]"]["tmp/checkpoint/data/etl-stage"]"] | |
13/01/01 01:19:45 INFO hadoop.FlowMapper: sinking to: GroupBy(81c22799-47bd-4214-bfa9-869ab649c4cf)[by:[{1}:'?word']] | |
13/01/01 01:19:45 INFO mapred.MapTask: Starting flush of map output | |
13/01/01 01:19:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:45 INFO mapred.MapTask: Finished spill 0 | |
13/01/01 01:19:45 INFO flow.FlowStep: [] submitted hadoop job: job_local_0003 | |
13/01/01 01:19:45 INFO mapred.Task: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting | |
13/01/01 01:19:45 INFO mapred.Task: Using ResourceCalculatorPlugin : null | |
13/01/01 01:19:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:45 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/opt/Impatient/part6/tmp/checkpoint/data/etl-stage/part-00000 | |
13/01/01 01:19:45 INFO mapred.MapTask: numReduceTasks: 1 | |
13/01/01 01:19:45 INFO mapred.MapTask: io.sort.mb = 100 | |
13/01/01 01:19:45 INFO mapred.MapTask: data buffer = 79691776/99614720 | |
13/01/01 01:19:45 INFO mapred.MapTask: record buffer = 262144/327680 | |
13/01/01 01:19:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:45 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['doc02', 'air']->[ALL]]"]["tmp/checkpoint/data/etl-stage"]"] | |
13/01/01 01:19:45 INFO hadoop.FlowMapper: sinking to: GroupBy(b218075c-5611-4f7d-9665-be5325bedeb7)[by:[{1}:'!__gen19']] | |
13/01/01 01:19:45 INFO mapred.MapTask: Starting flush of map output | |
13/01/01 01:19:45 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:45 INFO mapred.MapTask: Finished spill 0 | |
13/01/01 01:19:45 INFO mapred.Task: Task:attempt_local_0003_m_000000_0 is done. And is in the process of commiting | |
13/01/01 01:19:48 INFO mapred.LocalJobRunner: file:/Users/ceteri/opt/Impatient/part6/tmp/checkpoint/data/etl-stage/part-00000:0+605 | |
13/01/01 01:19:48 INFO mapred.Task: Task 'attempt_local_0002_m_000000_0' done. | |
13/01/01 01:19:48 INFO mapred.Task: Using ResourceCalculatorPlugin : null | |
13/01/01 01:19:48 INFO mapred.LocalJobRunner: | |
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:48 INFO mapred.Merger: Merging 1 sorted segments | |
13/01/01 01:19:48 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 561 bytes | |
13/01/01 01:19:48 INFO mapred.LocalJobRunner: | |
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:48 INFO hadoop.FlowReducer: sourcing from: GroupBy(81c22799-47bd-4214-bfa9-869ab649c4cf)[by:[{1}:'?word']] | |
13/01/01 01:19:48 INFO hadoop.FlowReducer: sinking to: Hfs["TextDelimited[[UNKNOWN]->['?word', '?count']]"]["output/wc"]"] | |
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:48 INFO mapred.Task: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting | |
13/01/01 01:19:48 INFO mapred.LocalJobRunner: | |
13/01/01 01:19:48 INFO mapred.Task: Task attempt_local_0002_r_000000_0 is allowed to commit now | |
13/01/01 01:19:48 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0002_r_000000_0' to file:/Users/ceteri/opt/Impatient/part6/output/wc | |
13/01/01 01:19:48 INFO mapred.LocalJobRunner: file:/Users/ceteri/opt/Impatient/part6/tmp/checkpoint/data/etl-stage/part-00000:0+605 | |
13/01/01 01:19:48 INFO mapred.Task: Task 'attempt_local_0003_m_000000_0' done. | |
13/01/01 01:19:48 INFO mapred.Task: Using ResourceCalculatorPlugin : null | |
13/01/01 01:19:48 INFO mapred.LocalJobRunner: | |
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:48 INFO mapred.Merger: Merging 1 sorted segments | |
13/01/01 01:19:48 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1318 bytes | |
13/01/01 01:19:48 INFO mapred.LocalJobRunner: | |
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:48 INFO hadoop.FlowReducer: sourcing from: GroupBy(b218075c-5611-4f7d-9665-be5325bedeb7)[by:[{1}:'!__gen19']] | |
13/01/01 01:19:48 INFO hadoop.FlowReducer: sinking to: Hfs["SequenceFile[[UNKNOWN]->['?n-docs']]"]["/tmp/cascalog_reserved/8554be9c-a22d-4c54-b8b1-fc158dfbf932/5db49fa3-279c-4985-a0c5-6a93024f4b4f"]"] | |
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:48 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:48 INFO mapred.Task: Task:attempt_local_0003_r_000000_0 is done. And is in the process of commiting | |
13/01/01 01:19:48 INFO mapred.LocalJobRunner: | |
13/01/01 01:19:48 INFO mapred.Task: Task attempt_local_0003_r_000000_0 is allowed to commit now | |
13/01/01 01:19:48 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0003_r_000000_0' to file:/tmp/cascalog_reserved/8554be9c-a22d-4c54-b8b1-fc158dfbf932/5db49fa3-279c-4985-a0c5-6a93024f4b4f | |
13/01/01 01:19:51 INFO mapred.LocalJobRunner: reduce > reduce | |
13/01/01 01:19:51 INFO mapred.Task: Task 'attempt_local_0002_r_000000_0' done. | |
13/01/01 01:19:51 INFO util.Hadoop18TapUtil: deleting temp path output/wc/_temporary | |
13/01/01 01:19:51 INFO mapred.LocalJobRunner: reduce > reduce | |
13/01/01 01:19:51 INFO mapred.Task: Task 'attempt_local_0003_r_000000_0' done. | |
13/01/01 01:19:51 INFO util.Hadoop18TapUtil: deleting temp path /tmp/cascalog_reserved/8554be9c-a22d-4c54-b8b1-fc158dfbf932/5db49fa3-279c-4985-a0c5-6a93024f4b4f/_temporary | |
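Jobs `job_local_0002` and `job_local_0003` above are two independent queries reading the same checkpoint: one groups by `?word` to produce the word counts sunk to `output/wc`, the other counts distinct documents to produce the single `?n-docs` tuple that Cascalog parks in a reserved SequenceFile for the TF-IDF flow. Roughly, in Cascalog (illustrative sketches, not necessarily the tutorial's exact definitions):

```clojure
(ns impatient.wc-sketch
  (:use [cascalog.api])
  (:require [cascalog.ops :as c]))

;; job_local_0002: classic word count over the checkpointed
;; (?doc-id, ?word) tuples; the GroupBy on ?word in the log.
(defn word-count [src]
  (<- [?word ?count]
      (src _ ?word)
      (c/count ?count)))

;; job_local_0003: total number of distinct documents, the D term of
;; TF-IDF; its single ?n-docs tuple is sunk to the reserved
;; /tmp/cascalog_reserved/... SequenceFile tap seen above.
(defn D [src]
  (<- [?n-docs]
      (src ?doc-id _)
      (c/distinct-count ?doc-id :> ?n-docs)))
```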
13/01/01 01:19:51 INFO mapred.FileInputFormat: Total input paths to process : 1 | |
13/01/01 01:19:51 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:51 INFO util.HadoopUtil: using default application jar, may cause class not found exceptions on the cluster | |
13/01/01 01:19:51 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/opt/Impatient/part6/target/impatient.jar | |
13/01/01 01:19:52 INFO flow.Flow: [] starting | |
13/01/01 01:19:52 INFO flow.Flow: [] source: Hfs["TextDelimited[['doc02', 'air']->[ALL]]"]["tmp/checkpoint/data/etl-stage"]"] | |
13/01/01 01:19:52 INFO flow.Flow: [] sink: Hfs["TextDelimited[[UNKNOWN]->['?doc-id', '?tf-idf', '?tf-word']]"]["output/tfidf"]"] | |
13/01/01 01:19:52 INFO flow.Flow: [] parallel execution is enabled: false | |
13/01/01 01:19:52 INFO flow.Flow: [] starting jobs: 4 | |
13/01/01 01:19:52 INFO flow.Flow: [] allocating threads: 1 | |
13/01/01 01:19:52 INFO flow.FlowStep: [] starting step: (1/4) | |
13/01/01 01:19:52 INFO mapred.FileInputFormat: Total input paths to process : 1 | |
13/01/01 01:19:52 INFO flow.FlowStep: [] submitted hadoop job: job_local_0004 | |
13/01/01 01:19:52 INFO mapred.Task: Using ResourceCalculatorPlugin : null | |
13/01/01 01:19:52 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:52 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/opt/Impatient/part6/tmp/checkpoint/data/etl-stage/part-00000 | |
13/01/01 01:19:52 INFO mapred.MapTask: numReduceTasks: 0 | |
13/01/01 01:19:52 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:52 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['doc02', 'air']->[ALL]]"]["tmp/checkpoint/data/etl-stage"]"] | |
13/01/01 01:19:52 INFO hadoop.FlowMapper: sinking to: TempHfs["SequenceFile[['?doc-id', '?word']]"][cd7e380f-49d0-4b67-a91f-5/72115/] | |
13/01/01 01:19:52 INFO mapred.Task: Task:attempt_local_0004_m_000000_0 is done. And is in the process of commiting | |
13/01/01 01:19:52 INFO mapred.LocalJobRunner: | |
13/01/01 01:19:52 INFO mapred.Task: Task attempt_local_0004_m_000000_0 is allowed to commit now | |
13/01/01 01:19:52 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0004_m_000000_0' to file:/tmp/hadoop-ceteri/cd7e380f_49d0_4b67_a91f_5_72115_8F3024B95FDD8EDA23306899FD292537 | |
13/01/01 01:19:55 INFO mapred.LocalJobRunner: file:/Users/ceteri/opt/Impatient/part6/tmp/checkpoint/data/etl-stage/part-00000:0+605 | |
13/01/01 01:19:55 INFO mapred.Task: Task 'attempt_local_0004_m_000000_0' done. | |
13/01/01 01:19:55 INFO flow.FlowStep: [] starting step: (2/4) | |
13/01/01 01:19:55 INFO mapred.FileInputFormat: Total input paths to process : 1 | |
13/01/01 01:19:55 INFO flow.FlowStep: [] submitted hadoop job: job_local_0005 | |
13/01/01 01:19:55 INFO mapred.Task: Using ResourceCalculatorPlugin : null | |
13/01/01 01:19:55 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:55 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-ceteri/cd7e380f_49d0_4b67_a91f_5_72115_8F3024B95FDD8EDA23306899FD292537/part-00000 | |
13/01/01 01:19:55 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:55 INFO mapred.MapTask: numReduceTasks: 1 | |
13/01/01 01:19:55 INFO mapred.MapTask: io.sort.mb = 100 | |
13/01/01 01:19:55 INFO mapred.MapTask: data buffer = 79691776/99614720 | |
13/01/01 01:19:55 INFO mapred.MapTask: record buffer = 262144/327680 | |
13/01/01 01:19:55 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:55 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:55 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?doc-id', '?word']]"][cd7e380f-49d0-4b67-a91f-5/72115/] | |
13/01/01 01:19:55 INFO hadoop.FlowMapper: sinking to: GroupBy(62b0e591-1804-462d-b3d8-5acfacdf6f9b)[by:[{2}:'?tf-word', '?doc-id']] | |
13/01/01 01:19:55 INFO mapred.MapTask: Starting flush of map output | |
13/01/01 01:19:55 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:55 INFO mapred.MapTask: Finished spill 0 | |
13/01/01 01:19:55 INFO mapred.Task: Task:attempt_local_0005_m_000000_0 is done. And is in the process of commiting | |
13/01/01 01:19:58 INFO mapred.LocalJobRunner: file:/tmp/hadoop-ceteri/cd7e380f_49d0_4b67_a91f_5_72115_8F3024B95FDD8EDA23306899FD292537/part-00000:0+1511 | |
13/01/01 01:19:58 INFO mapred.Task: Task 'attempt_local_0005_m_000000_0' done. | |
13/01/01 01:19:58 INFO mapred.Task: Using ResourceCalculatorPlugin : null | |
13/01/01 01:19:58 INFO mapred.LocalJobRunner: | |
13/01/01 01:19:58 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:58 INFO mapred.Merger: Merging 1 sorted segments | |
13/01/01 01:19:58 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1295 bytes | |
13/01/01 01:19:58 INFO mapred.LocalJobRunner: | |
13/01/01 01:19:58 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:58 INFO hadoop.FlowReducer: sourcing from: GroupBy(62b0e591-1804-462d-b3d8-5acfacdf6f9b)[by:[{2}:'?tf-word', '?doc-id']] | |
13/01/01 01:19:58 INFO hadoop.FlowReducer: sinking to: TempHfs["SequenceFile[['?tf-word', '?doc-id', '?tf-count']]"][aaef0093-4403-415b-8367-e/21694/] | |
13/01/01 01:19:58 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:58 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:58 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:58 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:19:58 INFO mapred.Task: Task:attempt_local_0005_r_000000_0 is done. And is in the process of commiting | |
13/01/01 01:19:58 INFO mapred.LocalJobRunner: | |
13/01/01 01:19:58 INFO mapred.Task: Task attempt_local_0005_r_000000_0 is allowed to commit now | |
13/01/01 01:19:58 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0005_r_000000_0' to file:/tmp/hadoop-ceteri/aaef0093_4403_415b_8367_e_21694_AB930F031BA44B520A0D4C55B8829388 | |
13/01/01 01:20:01 INFO mapred.LocalJobRunner: reduce > reduce | |
13/01/01 01:20:01 INFO mapred.Task: Task 'attempt_local_0005_r_000000_0' done. | |
13/01/01 01:20:01 INFO flow.FlowStep: [] starting step: (3/4) | |
13/01/01 01:20:01 INFO mapred.FileInputFormat: Total input paths to process : 1 | |
13/01/01 01:20:01 INFO flow.FlowStep: [] submitted hadoop job: job_local_0006 | |
13/01/01 01:20:01 INFO mapred.Task: Using ResourceCalculatorPlugin : null | |
13/01/01 01:20:01 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:01 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-ceteri/cd7e380f_49d0_4b67_a91f_5_72115_8F3024B95FDD8EDA23306899FD292537/part-00000 | |
13/01/01 01:20:01 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:01 INFO mapred.MapTask: numReduceTasks: 1 | |
13/01/01 01:20:01 INFO mapred.MapTask: io.sort.mb = 100 | |
13/01/01 01:20:01 INFO mapred.MapTask: data buffer = 79691776/99614720 | |
13/01/01 01:20:01 INFO mapred.MapTask: record buffer = 262144/327680 | |
13/01/01 01:20:01 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:01 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:01 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?doc-id', '?word']]"][cd7e380f-49d0-4b67-a91f-5/72115/] | |
13/01/01 01:20:01 INFO hadoop.FlowMapper: sinking to: GroupBy(c001f807-a69f-47f0-8d8a-b6a9c417e18d)[by:[{1}:'?df-word']] | |
13/01/01 01:20:01 INFO mapred.MapTask: Starting flush of map output | |
13/01/01 01:20:01 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:01 INFO mapred.MapTask: Finished spill 0 | |
13/01/01 01:20:01 INFO mapred.Task: Task:attempt_local_0006_m_000000_0 is done. And is in the process of commiting | |
13/01/01 01:20:04 INFO mapred.LocalJobRunner: file:/tmp/hadoop-ceteri/cd7e380f_49d0_4b67_a91f_5_72115_8F3024B95FDD8EDA23306899FD292537/part-00000:0+1511 | |
13/01/01 01:20:04 INFO mapred.Task: Task 'attempt_local_0006_m_000000_0' done. | |
13/01/01 01:20:04 INFO mapred.Task: Using ResourceCalculatorPlugin : null | |
13/01/01 01:20:04 INFO mapred.LocalJobRunner: | |
13/01/01 01:20:04 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:04 INFO mapred.Merger: Merging 1 sorted segments | |
13/01/01 01:20:04 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 2226 bytes | |
13/01/01 01:20:04 INFO mapred.LocalJobRunner: | |
13/01/01 01:20:04 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:04 INFO hadoop.FlowReducer: sourcing from: GroupBy(c001f807-a69f-47f0-8d8a-b6a9c417e18d)[by:[{1}:'?df-word']] | |
13/01/01 01:20:04 INFO hadoop.FlowReducer: sinking to: TempHfs["SequenceFile[['?df-count', '?tf-word']]"][d7841a4f-5383-49c8-8020-d/17408/] | |
13/01/01 01:20:04 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:04 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:04 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:04 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:04 INFO mapred.Task: Task:attempt_local_0006_r_000000_0 is done. And is in the process of commiting | |
13/01/01 01:20:04 INFO mapred.LocalJobRunner: | |
13/01/01 01:20:04 INFO mapred.Task: Task attempt_local_0006_r_000000_0 is allowed to commit now | |
13/01/01 01:20:04 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0006_r_000000_0' to file:/tmp/hadoop-ceteri/d7841a4f_5383_49c8_8020_d_17408_0E81F006944699B723B8F7297ED7752F | |
13/01/01 01:20:07 INFO mapred.LocalJobRunner: reduce > reduce | |
13/01/01 01:20:07 INFO mapred.Task: Task 'attempt_local_0006_r_000000_0' done. | |
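Steps (2/4) and (3/4) of the TF-IDF flow have now finished: one GroupBy on `['?tf-word', '?doc-id']` counts term frequency per document, the other on `'?df-word'` counts, for each word, how many distinct documents contain it. Sketched in Cascalog (again illustrative; the field names are taken from the GroupBy lines above):

```clojure
(ns impatient.tfidf-sketch
  (:use [cascalog.api])
  (:require [cascalog.ops :as c]))

;; Step 2/4: term frequency, counting occurrences of each
;; (word, doc-id) pair in the checkpointed tuples.
(defn TF [src]
  (<- [?doc-id ?tf-word ?tf-count]
      (src ?doc-id ?tf-word)
      (c/count ?tf-count)))

;; Step 3/4: document frequency, counting for each word the number
;; of distinct documents it appears in.
(defn DF [src]
  (<- [?df-word ?df-count]
      (src ?doc-id ?df-word)
      (c/distinct-count ?doc-id ?df-word :> ?df-count)))
```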
13/01/01 01:20:07 INFO flow.FlowStep: [] starting step: (4/4) output/tfidf | |
13/01/01 01:20:07 INFO mapred.FileInputFormat: Total input paths to process : 1 | |
13/01/01 01:20:07 INFO mapred.FileInputFormat: Total input paths to process : 1 | |
13/01/01 01:20:07 INFO flow.FlowStep: [] submitted hadoop job: job_local_0007 | |
13/01/01 01:20:07 INFO mapred.Task: Using ResourceCalculatorPlugin : null | |
13/01/01 01:20:07 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:07 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-ceteri/aaef0093_4403_415b_8367_e_21694_AB930F031BA44B520A0D4C55B8829388/part-00000 | |
13/01/01 01:20:07 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:07 INFO mapred.MapTask: numReduceTasks: 1 | |
13/01/01 01:20:07 INFO mapred.MapTask: io.sort.mb = 100 | |
13/01/01 01:20:07 INFO mapred.MapTask: data buffer = 79691776/99614720 | |
13/01/01 01:20:07 INFO mapred.MapTask: record buffer = 262144/327680 | |
13/01/01 01:20:07 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:07 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:07 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?tf-word', '?doc-id', '?tf-count']]"][aaef0093-4403-415b-8367-e/21694/] | |
13/01/01 01:20:07 INFO hadoop.FlowMapper: sinking to: CoGroup(aaef0093-4403-415b-8367-ef9f9858c4a1*d7841a4f-5383-49c8-8020-d12ad48809be)[by:aaef0093-4403-415b-8367-ef9f9858c4a1:[{1}:'?tf-word']d7841a4f-5383-49c8-8020-d12ad48809be:[{1}:'?tf-word']] | |
13/01/01 01:20:07 INFO mapred.MapTask: Starting flush of map output | |
13/01/01 01:20:07 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:07 INFO mapred.MapTask: Finished spill 0 | |
13/01/01 01:20:07 INFO mapred.Task: Task:attempt_local_0007_m_000000_0 is done. And is in the process of commiting | |
13/01/01 01:20:10 INFO mapred.LocalJobRunner: file:/tmp/hadoop-ceteri/aaef0093_4403_415b_8367_e_21694_AB930F031BA44B520A0D4C55B8829388/part-00000:0+1543 | |
13/01/01 01:20:10 INFO mapred.Task: Task 'attempt_local_0007_m_000000_0' done. | |
13/01/01 01:20:10 INFO mapred.Task: Using ResourceCalculatorPlugin : null | |
13/01/01 01:20:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:10 INFO io.MultiInputSplit: current split input path: file:/tmp/hadoop-ceteri/d7841a4f_5383_49c8_8020_d_17408_0E81F006944699B723B8F7297ED7752F/part-00000 | |
13/01/01 01:20:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:10 INFO mapred.MapTask: numReduceTasks: 1 | |
13/01/01 01:20:10 INFO mapred.MapTask: io.sort.mb = 100 | |
13/01/01 01:20:10 INFO mapred.MapTask: data buffer = 79691776/99614720 | |
13/01/01 01:20:10 INFO mapred.MapTask: record buffer = 262144/327680 | |
13/01/01 01:20:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:10 INFO hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['?df-count', '?tf-word']]"][d7841a4f-5383-49c8-8020-d/17408/] | |
13/01/01 01:20:10 INFO hadoop.FlowMapper: sinking to: CoGroup(aaef0093-4403-415b-8367-ef9f9858c4a1*d7841a4f-5383-49c8-8020-d12ad48809be)[by:aaef0093-4403-415b-8367-ef9f9858c4a1:[{1}:'?tf-word']d7841a4f-5383-49c8-8020-d12ad48809be:[{1}:'?tf-word']] | |
13/01/01 01:20:10 INFO mapred.MapTask: Starting flush of map output | |
13/01/01 01:20:10 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:10 INFO mapred.MapTask: Finished spill 0 | |
13/01/01 01:20:10 INFO mapred.Task: Task:attempt_local_0007_m_000001_0 is done. And is in the process of commiting | |
13/01/01 01:20:13 INFO mapred.LocalJobRunner: file:/tmp/hadoop-ceteri/d7841a4f_5383_49c8_8020_d_17408_0E81F006944699B723B8F7297ED7752F/part-00000:0+764 | |
13/01/01 01:20:13 INFO mapred.Task: Task 'attempt_local_0007_m_000001_0' done. | |
13/01/01 01:20:13 INFO mapred.Task: Using ResourceCalculatorPlugin : null | |
13/01/01 01:20:13 INFO mapred.LocalJobRunner: | |
13/01/01 01:20:13 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:13 INFO mapred.Merger: Merging 2 sorted segments | |
13/01/01 01:20:13 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 1946 bytes | |
13/01/01 01:20:13 INFO mapred.LocalJobRunner: | |
13/01/01 01:20:13 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:13 INFO hadoop.FlowReducer: sourcing from: CoGroup(aaef0093-4403-415b-8367-ef9f9858c4a1*d7841a4f-5383-49c8-8020-d12ad48809be)[by:aaef0093-4403-415b-8367-ef9f9858c4a1:[{1}:'?tf-word']d7841a4f-5383-49c8-8020-d12ad48809be:[{1}:'?tf-word']] | |
13/01/01 01:20:13 INFO hadoop.FlowReducer: sinking to: Hfs["TextDelimited[[UNKNOWN]->['?doc-id', '?tf-idf', '?tf-word']]"]["output/tfidf"]"] | |
13/01/01 01:20:13 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:13 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:13 INFO collect.SpillableTupleList: attempting to load codec: org.apache.hadoop.io.compress.GzipCodec | |
13/01/01 01:20:13 INFO collect.SpillableTupleList: found codec: org.apache.hadoop.io.compress.GzipCodec | |
13/01/01 01:20:13 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:13 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator | |
13/01/01 01:20:13 INFO mapred.Task: Task:attempt_local_0007_r_000000_0 is done. And is in the process of commiting | |
13/01/01 01:20:13 INFO mapred.LocalJobRunner: | |
13/01/01 01:20:13 INFO mapred.Task: Task attempt_local_0007_r_000000_0 is allowed to commit now | |
13/01/01 01:20:13 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0007_r_000000_0' to file:/Users/ceteri/opt/Impatient/part6/output/tfidf | |
13/01/01 01:20:16 INFO mapred.LocalJobRunner: reduce > reduce | |
13/01/01 01:20:16 INFO mapred.Task: Task 'attempt_local_0007_r_000000_0' done. | |
13/01/01 01:20:16 INFO util.Hadoop18TapUtil: deleting temp path output/tfidf/_temporary | |
13/01/01 01:20:16 INFO checkpointed-workflow: Workflow completed successfully | |
bash-3.2$ more output/trap/part-m-00001-00001 | |
zoink | |
bash-3.2$ more output/tfidf/part-00000 | |
doc02 0.22314355131420976 area | |
doc01 0.44628710262841953 area | |
doc03 0.22314355131420976 area | |
doc05 0.9162907318741551 australia | |
doc05 0.9162907318741551 broken | |
doc04 0.9162907318741551 california's | |
doc04 0.9162907318741551 cause | |
doc02 0.9162907318741551 cloudcover | |
doc04 0.9162907318741551 death | |
doc04 0.9162907318741551 deserts | |
doc03 0.9162907318741551 downwind | |
doc01 0.22314355131420976 dry | |
doc02 0.22314355131420976 dry | |
doc03 0.22314355131420976 dry | |
doc05 0.9162907318741551 dvd | |
doc04 0.9162907318741551 effect | |
doc04 0.9162907318741551 known | |
doc03 0.5108256237659907 land | |
doc05 0.5108256237659907 land | |
doc01 0.5108256237659907 lee | |
doc02 0.5108256237659907 lee | |
doc04 0.5108256237659907 leeward | |
doc03 0.5108256237659907 leeward | |
doc02 0.9162907318741551 less | |
doc03 0.9162907318741551 lies | |
doc02 0.22314355131420976 mountain | |
doc03 0.22314355131420976 mountain | |
doc04 0.22314355131420976 mountain | |
doc01 0.9162907318741551 mountainous | |
doc04 0.9162907318741551 primary | |
doc02 0.9162907318741551 produces | |
doc04 0.0 rain | |
doc01 0.0 rain | |
doc02 0.0 rain | |
doc03 0.0 rain | |
doc04 0.9162907318741551 ranges | |
doc05 0.9162907318741551 secrets | |
doc01 0.0 shadow | |
doc02 0.0 shadow | |
doc03 0.0 shadow | |
doc04 0.0 shadow | |
doc02 0.9162907318741551 sinking | |
doc04 0.9162907318741551 such | |
doc04 0.9162907318741551 valley | |
doc05 0.9162907318741551 women | |
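As a sanity check on the listing above: the scores are consistent with the TF-IDF variant used earlier in this tutorial series, tf-idf = tf * ln(n_docs / (1 + df)), over the five sample documents. This is an illustrative sketch in Python, not the project's Cascalog code; the pairings with specific words below are inferred from the output, not taken from the query itself.

```python
import math

# Sketch (not the project code): reproduce a few scores from the
# output/tfidf listing, assuming tf-idf = tf * ln(n_docs / (1 + df))
# with n_docs = 5 sample documents.
def tf_idf(tf, df, n_docs=5):
    return tf * math.log(n_docs / (1.0 + df))

print(tf_idf(1, 1))  # a word unique to one doc -> 0.9162907318741551
print(tf_idf(2, 3))  # tf = 2, df = 3           -> 0.44628710262841953
print(tf_idf(1, 4))  # a word in four docs      -> 0.0
```

Note how a word present in four of the five documents scores exactly 0.0, since ln(5 / (1 + 4)) = ln(1) = 0: the `1 + df` smoothing makes near-ubiquitous words contribute nothing.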
bash-3.2$ lein test | |
Retrieving org/clojure/clojure/maven-metadata.xml (2k) | |
from http://repo1.maven.org/maven2/ | |
Retrieving org/clojure/clojure/maven-metadata.xml (1k) | |
from https://clojars.org/repo/ | |
Retrieving org/clojure/clojure/maven-metadata.xml (2k) | |
from http://repo1.maven.org/maven2/ | |
Retrieving org/clojure/clojure/maven-metadata.xml | |
from http://oss.sonatype.org/content/repositories/snapshots/ | |
Retrieving org/clojure/clojure/maven-metadata.xml | |
from http://oss.sonatype.org/content/repositories/releases/ | |
lein test impatient.core-test | |
Ran 2 tests containing 2 assertions. | |
0 failures, 0 errors.