@makoto
Created May 6, 2010 20:10

The CPU spike result

The first spike (which looks almost like a bar) is from when I ran meguro, and the second spike (which looks almost like a hill) is from when I ran our.rb.


#!/usr/bin/ruby
# Ruby clone of the meguro map-reduce sample file
# http://www.sevenforge.com/meguro
#
require 'rubygems'
require 'json'
require 'pp'
# File.readlines reads every line and closes the file handle afterwards
timelines = File.readlines("stream.log")
# mapper = [{"#hello" => 1}, {"#hello" => 1}, {"#world" => 1}]
mapper = []
# start_time = Time.now
timelines.each{|value|
  json = JSON.parse(value)
  text = json["text"]
  next unless text
  words = text.split(/[\s\.:?!]+/)
  words.each{|word|
    mapper << { word => 1 } if word[0] && word[0].chr == '#'
  }
}
# end_time = Time.now
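# Reduce: sum the per-word counts into one hash.
# reduce returns a new value and leaves mapper untouched, so keep the result.
counts = mapper.reduce({}){|total, current|
  key = current.keys.first
  value = current.values.first
  total[key] = total[key] ? total[key] + value : value
  total
}
# Write one line per hashtag; the block form of File.open closes the file.
File.open("out.rb.out", "w"){|output|
  counts.each{|k, v|
    output.write "#{k} \t\t #{v} \n"
  }
}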
# p "#{start_time} - #{end_time} (#{end_time - start_time} sec)"
# Result
#
# [tmp]$ time meguro -j our.js -o stream.log
# ---------------Meguro------------------
# Javascript: our.js
# Mapper output: map.out
# Reducer output: reduce.out
# Number of threads: 2
# Javascript runtime memory size: 95.37M
# Mapper buckets: 1.00M
# Mapper memory size: 64.00M
# Mapping stream.log: 100%
# Mapper Complete: 5.88K Emits
# Estimated Map File Size: 58.87K
# Reducing: 100%
#
# real 0m2.791s
# user 0m3.580s
# sys 0m0.536s
#
# [tmp]$ time ruby our.rb
#
# real 0m5.755s
# user 0m5.361s
# sys 0m0.298s
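
To make the expected output concrete, here is a minimal sketch of the same hashtag count run on a few in-memory lines instead of stream.log. The names `SAMPLE_LINES` and `count_hashtags` are hypothetical and not part of our.rb, and the sample data is made up for illustration. It also swaps in a different technique: rather than building an array of `{ word => 1 }` hashes and reducing it afterwards, it counts in a single pass with `Hash.new(0)`.

```ruby
require 'json'

# Hypothetical sample input: one JSON object per line, which is how
# stream.log is assumed to be laid out.
SAMPLE_LINES = [
  '{"text": "#hello world"}',
  '{"text": "say #hello to the #world!"}',
  '{"delete": {"status": {"id": 1}}}'
]

# Count hashtags across JSON lines. Entries without a "text" field are
# skipped, mirroring the `next unless text` guard in the script above.
def count_hashtags(lines)
  counts = Hash.new(0) # missing keys start at 0
  lines.each do |line|
    text = JSON.parse(line)["text"]
    next unless text
    text.split(/[\s\.:?!]+/).each do |word|
      counts[word] += 1 if word.start_with?('#')
    end
  end
  counts
end

counts = count_hashtags(SAMPLE_LINES)
# Same line format as the script's output file.
counts.each { |tag, n| puts "#{tag} \t\t #{n} \n" }
```

For the sample lines above, `counts` ends up with `#hello` mapped to 2 and `#world` mapped to 1; the third line has no text and contributes nothing.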