@Quantisan
Created August 24, 2012 20:55
Impatient part 2
$ cat output/rain/part-00000
A 3
Australia 1
Broken 1
California's 1
DVD 1
Death 1
Land 1
Secrets 1
This 2
Two 1
Valley 1
Women 1
a 5
air 1
an 1
and 2
area 4
as 2
back 1
cause 1
cloudcover 1
deserts 1
downwind 1
dry 3
effect 1
in 1
is 4
known 1
land 1
lee 2
leeward 2
less 1
lies 1
mountain 3
mountainous 1
of 6
on 2
or 2
primary 1
produces 1
rain 5
ranges 1
shadow 4
side 2
sinking 1
such 1
that 1
the 5
with 1
$ hadoop jar ./target/impatient.jar data/rain.txt output/rain
12/08/24 21:53:54 INFO util.HadoopUtil: resolving application jar from found main method on: impatient.core
12/08/24 21:53:54 INFO planner.HadoopPlanner: using application jar: /Users/paullam/Dropbox/Projects/Impatient/part2/./target/impatient.jar
12/08/24 21:53:54 INFO property.AppProps: using app.id: 70F7BFF5DCA3B7EE5F4DB0CD8C35BA62
2012-08-24 21:53:54.604 java[14150:1903] Unable to load realm info from SCDynamicStore
12/08/24 21:53:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/08/24 21:53:54 WARN snappy.LoadSnappy: Snappy native library not loaded
12/08/24 21:53:54 INFO mapred.FileInputFormat: Total input paths to process : 1
12/08/24 21:53:54 INFO util.Version: Concurrent, Inc - Cascading 2.0.0
12/08/24 21:53:54 INFO flow.Flow: [] starting
12/08/24 21:53:54 INFO flow.Flow: [] source: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/08/24 21:53:54 INFO flow.Flow: [] sink: Hfs["TextDelimited[[UNKNOWN]->['?word', '?count']]"]["output/rain"]"]
12/08/24 21:53:54 INFO flow.Flow: [] parallel execution is enabled: false
12/08/24 21:53:54 INFO flow.Flow: [] starting jobs: 1
12/08/24 21:53:54 INFO flow.Flow: [] allocating threads: 1
12/08/24 21:53:54 INFO flow.FlowStep: [] starting step: (1/1) output/rain
12/08/24 21:53:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/08/24 21:53:55 INFO mapred.FileInputFormat: Total input paths to process : 1
12/08/24 21:53:55 INFO flow.FlowStep: [] submitted hadoop job: job_local_0001
12/08/24 21:53:55 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/24 21:53:55 INFO io.MultiInputSplit: current split input path: file:/Users/paullam/Dropbox/Projects/Impatient/part2/data/rain.txt
12/08/24 21:53:55 INFO mapred.MapTask: numReduceTasks: 1
12/08/24 21:53:55 INFO mapred.MapTask: io.sort.mb = 100
12/08/24 21:53:55 INFO mapred.MapTask: data buffer = 79691776/99614720
12/08/24 21:53:55 INFO mapred.MapTask: record buffer = 262144/327680
12/08/24 21:53:55 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/24 21:53:55 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/24 21:53:55 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['doc_id', 'text']->[ALL]]"]["data/rain.txt"]"]
12/08/24 21:53:55 INFO hadoop.FlowMapper: sinking to: GroupBy(59cefaaa-cf51-42fe-903b-09b06ed2fe3c)[by:[{1}:'?word']]
12/08/24 21:53:59 INFO mapred.MapTask: Starting flush of map output
12/08/24 21:53:59 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/24 21:53:59 INFO mapred.MapTask: Finished spill 0
12/08/24 21:53:59 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/08/24 21:53:59 INFO mapred.LocalJobRunner: file:/Users/paullam/Dropbox/Projects/Impatient/part2/data/rain.txt:0+510
12/08/24 21:53:59 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/08/24 21:53:59 INFO mapred.LocalJobRunner:
12/08/24 21:53:59 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/24 21:53:59 INFO mapred.Merger: Merging 1 sorted segments
12/08/24 21:53:59 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 826 bytes
12/08/24 21:53:59 INFO mapred.LocalJobRunner:
12/08/24 21:53:59 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/24 21:53:59 INFO hadoop.FlowReducer: sourcing from: GroupBy(59cefaaa-cf51-42fe-903b-09b06ed2fe3c)[by:[{1}:'?word']]
12/08/24 21:53:59 INFO hadoop.FlowReducer: sinking to: Hfs["TextDelimited[[UNKNOWN]->['?word', '?count']]"]["output/rain"]"]
12/08/24 21:53:59 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/24 21:53:59 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/24 21:53:59 INFO hadoop.TupleSerialization: using default comparator: cascalog.hadoop.DefaultComparator
12/08/24 21:53:59 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/08/24 21:53:59 INFO mapred.LocalJobRunner:
12/08/24 21:53:59 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/08/24 21:53:59 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/Users/paullam/Dropbox/Projects/Impatient/part2/output/rain
12/08/24 21:53:59 INFO mapred.LocalJobRunner: reduce > reduce
12/08/24 21:53:59 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
12/08/24 21:53:59 INFO util.Hadoop18TapUtil: deleting temp path output/rain/_temporary
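
The log above shows a single-step flow: a tab-delimited source with fields 'doc_id' and 'text', a GroupBy on '?word', and a sink of '?word' and '?count' pairs. As a rough illustration only (this is not the gist's Cascalog code, which is not shown here), the same shape can be sketched in plain Python. The function name `word_count`, the tokenizer regex, and the sample records are all assumptions made for this example; the real job's tokenizer and its handling of any header row may differ.

```python
import re
from collections import Counter


def word_count(lines):
    """Count tokens in tab-separated (doc_id, text) records.

    Hypothetical stand-in for the flow in the log: source -> group by word -> count.
    """
    counts = Counter()
    for line in lines:
        # Each record is assumed to hold two tab-separated fields: doc_id and text.
        _doc_id, _sep, text = line.rstrip("\n").partition("\t")
        # Assumed tokenizer: split on spaces, brackets, parentheses, commas, periods.
        for token in re.split(r"[ \[\](),.]+", text):
            if token:
                counts[token] += 1
    # Sorting by code point puts capitalized words before lowercase ones,
    # which matches the ordering seen in output/rain/part-00000.
    return sorted(counts.items())


if __name__ == "__main__":
    # Made-up sample records, not the contents of data/rain.txt.
    sample = [
        "doc01\tA rain shadow is a dry area",
        "doc02\tThis is a dry area",
    ]
    for word, count in word_count(sample):
        print(f"{word}\t{count}")
```

The sketch is case-sensitive on purpose, since the output above lists "A" and "a" (and "Land" and "land") as separate words.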