@gromnitsky
Created December 14, 2010 13:34
Counts the top n email domain names in the data leaked from Gawker's database
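#!/usr/bin/env ruby
# -*-ruby-*-
# Arguments: [NTOP] [DATAFILE]

# Fall back to 'gawker.data' when the second argument is not an existing file.
data = 'gawker.data' if !File.file?(data = ARGV[1].to_s)
# Fall back to 50 when the first argument is missing, non-numeric, or <= 5.
ntop = 50 if (ntop = ARGV[0].to_i) <= 5

raw = 0            # number of lines that contain an email address
mail = Hash.new 0  # domain => number of occurrences

File.open(data) {|fp|
  while line = fp.gets
    # Capture the part after '@' (must contain a dot) up to the next whitespace.
    if line.match(/[^[:space:]]+@([^[:space:]]+\.[^[:space:]]+)/)
      raw += 1
      mail[$1.downcase] += 1
    end
  end
}

printf "Mail total: %10d\n", raw
printf "Mail with unique domain: %10d\n", mail.size
puts "\nTop #{ntop} domains:\n"

# Sort ascending by count, then print the last ntop entries in descending order.
mail_sa = mail.sort_by {|k, v| v}
mail_sa.last(ntop).reverse.each {|k, v|
  printf "%30s%10d\n", k, v
}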
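
The counting step can be tried without the data file. The following is a minimal sketch that applies the same regular expression to a few in-memory lines. The sample lines, the constant `EMAIL_RE`, and the helpers `count_domains` and `top_domains` are hypothetical and not part of the original script, and the real layout of `gawker.data` is an assumption here: the pattern only works as intended when the address is followed by whitespace or the end of the line.

```ruby
# Same pattern as the script: the text after '@' must contain a dot
# and runs up to the next whitespace character.
EMAIL_RE = /[^[:space:]]+@([^[:space:]]+\.[^[:space:]]+)/

# Hypothetical helper: count lowercase domains over an enumerable of lines.
def count_domains(lines)
  counts = Hash.new(0)
  lines.each do |line|
    m = line.match(EMAIL_RE)
    counts[m[1].downcase] += 1 if m
  end
  counts
end

# Hypothetical helper: the n most frequent [domain, count] pairs, highest first.
def top_domains(counts, n)
  counts.sort_by { |_domain, count| -count }.first(n)
end

# Made-up sample lines (assumed whitespace-delimited).
sample = [
  "alice@Example.com secret1",
  "bob@example.com secret2",
  "carol@mail.example.org secret3",
  "no address on this line",
]

counts = count_domains(sample)
top_domains(counts, 2).each { |domain, count| printf "%30s%10d\n", domain, count }
```

Because the domain is lowercased before counting, `Example.com` and `example.com` fall into the same bucket. Subdomains are not folded together, so `mail.example.org` is counted separately from `example.org`.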