@sferik
Created September 27, 2009 06:26
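require 'stemmable' # assumption: stemmable.rb (a Porter stemmer defining the Stemmable module) is on the load path

class String
  # mix the stemmer into String only, so every string responds to #stem
  include Stemmable

  # Convert free text into an array of unique, stemmed tags:
  # lower case
  # replace newlines, numbers, and punctuation with spaces
  # break words on whitespace
  # get the word stem
  # remove duplicates
  # remove stems shorter than 3 letters
  # remove common words (after they've been stemmed, so they match the stemmed input)
  def to_tags
    common_words = %w(and are but for from had have her his like not our she some than that the their them then there these they this via was were with you your)
    common_stems = common_words.map { |w| w.stem }
    downcase.gsub(/[^a-z]/, ' ').split.map { |w| w.stem }.uniq.select { |s| s.length > 2 } - common_stems
  end
end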
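The same pipeline can be tried without stemmable.rb by passing the stemmer in as a block. The sketch below assumes a few names that are not part of the gist: `tags_for` and `COMMON_WORDS` are hypothetical. When no block is given, it falls back to an identity "stemmer", which is only a stand-in for Porter stemming. The crude strip-a-trailing-"s" lambda in the usage lines is also only an illustration. The block also shows why the common words are stemmed before they are subtracted: with a stemmer, "this" becomes "thi", and only the stemmed stop list can match it.

```ruby
# Hypothetical stop list, copied from the gist's common_words.
COMMON_WORDS = %w(and are but for from had have her his like not our she some
                  than that the their them then there these they this via was
                  were with you your).freeze

# Hypothetical helper: the to_tags pipeline with the stemmer injected as a block.
def tags_for(text, &stemmer)
  stemmer ||= ->(word) { word }                   # identity stand-in when no stemmer is given
  common = COMMON_WORDS.map(&stemmer)             # stem the stop words the same way as the input
  words = text.downcase.gsub(/[^a-z]/, ' ').split # non-letters become spaces, then split on whitespace
  stems = words.map(&stemmer).uniq                # stem each word, then drop duplicate stems
  stems.select { |s| s.length > 2 } - common      # keep 3+ letter stems that are not common words
end

text  = "The cats and the dogs, 42 dogs!\nRunning fast."
crude = ->(word) { word.sub(/s\z/, '') }          # illustrative only, not a real stemmer

p tags_for(text)
# => ["cats", "dogs", "running", "fast"]
p tags_for(text, &crude)
# => ["cat", "dog", "running", "fast"]
p tags_for("This was his", &crude)
# => []
```

In the last call, "was" and "his" shrink to "wa" and "hi" and are dropped by the length filter. "thi" survives that filter but matches the stemmed form of "this" in the stop list. An unstemmed stop list would have missed it.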