@miguelff
Created May 8, 2012 11:18
A script to grab some quick and dirty stats from the AOL query log
#!/usr/bin/env ruby
files = %w{aol015 aol026}.map{|f| f+".sessionized"}
sessions = {}
#load sessions hash
files.each do |f|
  #File.foreach closes the file once every line has been read
  File.foreach(f) do |line|
    #skip blank or truncated lines
    next unless line.length > 10
    sid, _uid, q = line.split("\t")
    (sessions[sid] ||= []) << q
  end
end
#remove query duplicates
session_queries = sessions.values.map(&:uniq)
#number of sessions
number_of_sessions = session_queries.size.to_f
#average session length (distinct queries per session)
avg_length = session_queries.reduce(0) {|sum, value| sum + value.size} / number_of_sessions
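#frequencies: freqs[n] is the number of sessions with n distinct queries
freqs = []
session_queries.each do |queries|
  freqs[queries.size] = (freqs[queries.size] || 0) + 1
end
#sizes that never occur are left as nil gaps in the array; count them as 0
freqs.map! { |freq| freq || 0 }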
#relative frequencies
freqs_rel = freqs.map { |x| x / number_of_sessions}
#sanity check: the relative frequencies should add up to (roughly) 1
one = freqs_rel.reduce(0) {|sum, value| sum + value}
output=<<END
TOTAL sessions:
#{number_of_sessions}
AVG length:
#{avg_length}
Freqs:
#{freqs.inspect}
Rel freqs:
#{freqs_rel.inspect}
one: #{one}
END
puts output
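
The same session-length histogram can be tried without the AOL files. The sketch below is only an illustration, not part of the original script: the method name `session_length_histogram` and the sample rows are made up, and the column layout (session id, user id, query, separated by tabs) is an assumption inferred from the `split("\t")` call above. It tallies into a Hash instead of an Array, so sizes that never occur leave no nil gaps to fill in afterwards.

```ruby
# Hypothetical helper: returns [distinct_queries, sessions, relative_frequency] rows.
def session_length_histogram(lines)
  # Group queries by session id (assumed to be the first tab-separated column).
  sessions = Hash.new { |hash, sid| hash[sid] = [] }
  lines.each do |line|
    sid, _uid, query = line.chomp.split("\t")
    sessions[sid] << query if sid && query
  end

  # Tally sessions by their number of distinct queries.
  counts = Hash.new(0)
  sessions.each_value { |queries| counts[queries.uniq.size] += 1 }

  total = sessions.size.to_f
  counts.sort.map { |size, n| [size, n, n / total] }
end

# Made-up sample rows standing in for a *.sessionized file.
sample = [
  "s1\tu1\tcheap flights\n",
  "s1\tu1\tcheap flights\n", # repeated query, counted once
  "s1\tu1\tflights to rome\n",
  "s2\tu2\tweather\n",
  "s3\tu3\truby hash\n",
  "s3\tu3\truby array\n"
]

session_length_histogram(sample).each do |size, n, rel|
  puts format("%d\t%d\t%.3f", size, n, rel)
end
```

With the sample rows, one session has a single distinct query and two sessions have two, so the relative frequencies are 1/3 and 2/3.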