@sferik sferik/to_tags.rb
Created Sep 27, 2009

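# Assumes a Stemmable module that provides #stem (for example, a Porter
# stemmer implementation) has already been loaded.
class String
  include Stemmable

  # Convert free text into an array of unique, stemmed tags.
  def to_tags
    # common words to exclude (they are stemmed before comparison)
    common_words = %w(and are but for from had have her his like not our she some than that the their them then there these they this via was were with you your)
    common_stems = common_words.map { |word| word.stem }

    stems = downcase                       # lower case
      .gsub(/[^a-z]+/, ' ')                # replace new lines, numbers, and punctuation with spaces
      .split                               # break words on spaces
      .map { |word| word.stem }            # get the word stem
      .uniq                                # remove duplicates
      .reject { |stem| stem.length < 3 }   # remove stems shorter than 3 letters

    stems - common_stems                   # remove common words (after they've been stemmed)
  end
end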
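
The steps listed in the comments above can be tried without a stemming library. The following is a minimal sketch, not the gist's own code: `TOY_STEM` is a hypothetical stand-in that only strips a trailing "ing", "ed", or "s", so its stems will differ from those of a real Porter stemmer, and `to_tags` is written here as a plain method rather than a `String` extension so that nothing is monkey-patched.

```ruby
# Toy stemmer (an assumption for this sketch, NOT the Porter algorithm):
# strips a trailing "ing", "ed", or "s".
TOY_STEM = ->(word) { word.sub(/(ing|ed|s)\z/, '') }

COMMON_WORDS = %w(and are but for from had have her his like not our she
                  some than that the their them then there these they this
                  via was were with you your).freeze

# Same pipeline as the gist, with the stemmer passed in as a callable.
def to_tags(text, stemmer = TOY_STEM)
  common_stems = COMMON_WORDS.map { |word| stemmer.call(word) }

  stems = text.downcase
              .gsub(/[^a-z]+/, ' ')               # digits, punctuation, newlines -> spaces
              .split                              # break on whitespace
              .map { |word| stemmer.call(word) }  # stem each word
              .uniq                               # drop duplicates
              .reject { |stem| stem.length < 3 }  # drop very short stems

  stems - common_stems                            # drop stemmed common words
end

p to_tags("The 3 cats were chasing mice, and the cat jumped!")
# prints ["cat", "chas", "mice", "jump"]
```

The common words are stemmed with the same stemmer before the subtraction, so a word such as "this" is still filtered out even when the stemmer shortens it.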