@DRMacIver
Created January 26, 2009 23:51
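# Merges counts for tags whose words share the same stems (e.g. "cat" and "cats").
# Reads "<count> <tag>" lines from STDIN; prints merged counts, highest first.
require "rubygems"
require "lingua/stemmer"
stemmer = Lingua::Stemmer.new
tag_counts = {}
# each_line replaces the deprecated IO#lines; blank lines are skipped.
STDIN.each_line{|l| c, t = l.split; tag_counts[t] = c.to_i if t }
# Group tags by the stems of their underscore-separated words.
duplicates = Hash.new{|h, k| h[k] = []}
tag_counts.keys.each{|k| duplicates[k.split("_").map{|x| stemmer.stem(x)}] << k }
# Within each group, put the most frequent spelling first.
duplicates.values.each{|vs| vs.sort!{|x, y| tag_counts[y] <=> tag_counts[x]} }
# The most frequent spelling represents the group and receives the summed count.
new_tag_counts = {}
duplicates.values.each{|vs| new_tag_counts[vs[0]] = vs.map{|v| tag_counts[v]}.inject(0, &:+)}
puts new_tag_counts.to_a.sort{|x, y| y[1] <=> x[1]}.map{|t, c| " #{c} #{t}" }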
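
The script above needs the stemmer gem and input on STDIN, so here is a minimal, dependency-free sketch of the same grouping idea for trying it out in isolation. The names `naive_stem` and `merge_tag_counts` and the sample data are hypothetical and not part of the original gist; `naive_stem` is a crude stand-in that only strips a trailing "ing" or "s", not a real stemming algorithm.

```ruby
# Hypothetical stand-in for a real stemmer (assumption, not the gist's
# Lingua::Stemmer): lowercases and strips a trailing "ing" or "s".
def naive_stem(word)
  word.downcase.sub(/(ing|s)\z/, "")
end

# Merge counts for tags whose underscore-separated words stem identically.
# Returns [[tag, count], ...] sorted by descending count; each group is
# represented by its most frequent spelling.
def merge_tag_counts(tag_counts)
  groups = tag_counts.keys.group_by { |tag| tag.split("_").map { |w| naive_stem(w) } }
  merged = groups.values.map do |tags|
    canonical = tags.max_by { |tag| tag_counts[tag] }
    [canonical, tags.sum { |tag| tag_counts[tag] }]
  end
  merged.sort_by { |_, count| -count }
end

# Made-up sample data, formatted like the script's output.
sample = { "cat" => 5, "cats" => 2, "running_shoe" => 1, "running_shoes" => 4 }
merge_tag_counts(sample).each { |tag, count| puts " #{count} #{tag}" }
```

Swapping `naive_stem` for a call to a real stemmer should give groupings closer to the original script; the sketch uses `group_by`, `max_by` and `sum` in place of the explicit hash bookkeeping, which is a stylistic choice rather than a change in the approach.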