@jubos
Forked from makoto/cpu_spike.md
Created May 7, 2010 16:13
function map(key,value) {
var json = JSON.parse(value);
var text = json.text;
if (text) {
var words = text.split(/[\s\.:?!]+/);
for(var i=0; i < words.length; i++) {
var word = words[i];
if (word.indexOf('#') == 0)
Meguro.emit(word,'1');
}
}
}
function reduce(key,values) {
Meguro.save(key,values.length);
}
require 'json'
counts = {}
File.open(ARGV.shift).each do |line|
json = JSON.parse(line)
if text = json['text']
text.split(/[\s\.:?!]+/).each do |word|
if word.start_with?('#') # hashtags begin with '#'; word[0] == 35 only works on Ruby 1.8
counts[word] ||= 0
counts[word] += 1
end
end
end
end
counts.each do |k,v|
puts "#{k}\t#{v}"
end

The CPU spike result

The first spike (which looks almost like a bar) is from running meguro; the second spike (which looks almost like a hill) is from running our.rb.


#!/usr/bin/ruby
# Give this a file and an enlargement factor and pipe the resulting output to a bigger file
lines = File.open(ARGV.shift).readlines
factor = ARGV.shift.to_i
factor.times do
lines.each do |line|
puts line
end
end
#!/usr/bin/ruby
# ruby clone of meguro map reduce sample file
# http://www.sevenforge.com/meguro
#
require 'rubygems'
require 'json'
require 'pp'
timelines = File.open("stream.log").readlines
# mapper = {"#hello" => 1, "#hello" => 1, "#world" => 1}
mapper = []
# start_time = Time.now
timelines.each{|value|
json = JSON.parse(value)
text = json["text"]
next unless text
words = text.split(/[\s\.:?!]+/)
words.each{|word|
mapper << { word => 1 } if word[0] && word[0].chr == '#'
}
}
# end_time = Time.now
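# Fold the emitted { word => 1 } pairs into a single word => count hash.
# The result must be kept: reduce does not modify mapper in place.
counts = mapper.reduce({}){|total, current|
key = current.keys.first
value = current.values.first
total[key] = total[key] ? total[key] + value : value
total
}
# Write one line per hashtag; the block form closes the file when done.
File.open("out.rb.out", "w"){|output|
counts.each{|k, v|
output.write "#{k} \t\t #{v} \n"
}
}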
# p "#{start_time} - #{end_time} (#{end_time - start_time} sec)"
# Result
#
# [tmp]$ time meguro -j our.js -o stream.log
# ---------------Meguro------------------
# Javascript: our.js
# Mapper output: map.out
# Reducer output: reduce.out
# Number of threads: 2
# Javascript runtime memory size: 95.37M
# Mapper buckets: 1.00M
# Mapper memory size: 64.00M
# Mapping stream.log: 100%
# Mapper Complete: 5.88K Emits
# Estimated Map File Size: 58.87K
# Reducing: 100%
#
# real 0m2.791s
# user 0m3.580s
# sys 0m0.536s
#
# [tmp]$ time ruby our.rb
#
# real 0m5.755s
# user 0m5.361s
# sys 0m0.298s
$ time ruby count-hashtags.rb large-stream.log > output
real 4m4.605s
user 3m16.452s
sys 0m7.309s
$ time meguro -j count-hashtags.js large-stream.log
---------------Meguro------------------
Javascript: count-hashtags.js
Mapper output: map.out
Reducer output: reduce.out
Number of threads: 2
Javascript runtime memory size: 95.37M
Mapper buckets: 1.00M
Mapper memory size: 64.00M
Mapping large-stream.log: 100%
Mapper Complete: 176.55K Emits
Reducing: 100%
real 2m19.187s
user 3m9.071s
sys 0m17.464s
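
To make the two large-stream.log timings above easier to compare, here is a minimal sketch that converts the `time` output into seconds and derives the wall-clock speedup and the average number of busy cores. The helper name `to_seconds` and the `runs` hash are illustrative only; they are not part of Meguro or of the scripts in this gist.

```ruby
# Convert a `time`-style duration such as "4m4.605s" into seconds.
def to_seconds(stamp)
  minutes, seconds = stamp.match(/\A(\d+)m([\d.]+)s\z/).captures
  minutes.to_i * 60 + seconds.to_f
end

# Timings copied from the large-stream.log runs above.
runs = {
  'ruby'   => { real: '4m4.605s',  user: '3m16.452s', sys: '0m7.309s' },
  'meguro' => { real: '2m19.187s', user: '3m9.071s',  sys: '0m17.464s' }
}

# Wall-clock seconds, CPU seconds (user + sys), and their ratio for each run.
stats = runs.map do |name, t|
  real = to_seconds(t[:real])
  cpu  = to_seconds(t[:user]) + to_seconds(t[:sys])
  [name, { real: real, cpu: cpu, cores_busy: cpu / real }]
end.to_h

speedup = stats['ruby'][:real] / stats['meguro'][:real]

stats.each do |name, s|
  puts format('%-7s real %.1fs  cpu %.1fs  avg cores busy %.2f',
              name, s[:real], s[:cpu], s[:cores_busy])
end
puts format('meguro wall-clock speedup over ruby: %.2fx', speedup)
```

On these numbers the speedup comes out at about 1.76x, while total CPU time is nearly the same for both runs (roughly 204s for ruby and 207s for meguro). That is consistent with Meguro finishing sooner mainly because it spreads the work over its 2 threads, not because it does less work overall.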