sferik / to_tags.rb
Created Sep 27, 2009
to_tags.rb
include Stemmable

class String
  # Turns free text into a list of unique, stemmed tags.
  def to_tags
    common_words = %w(
      and are but for from had have her his like not our she some than that
      the their them then there these they this via was were with you your
    )

    stems = downcase                  # lower case
      .gsub(/[^a-z\n]/, ' ')          # replace numbers and punctuation with spaces (newlines are left for split)
      .split                          # break words on whitespace
      .map { |s| s.stem }             # get the word stem
      .uniq                           # remove duplicates
      .select { |s| s.length > 2 }    # remove stems shorter than 3 letters

    # remove common words (after they've been stemmed)
    stems - common_words.map { |s| s.stem }
  end
end
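
A minimal usage sketch of the pipeline above. The gist relies on a `Stemmable` module that it does not include, so this sketch swaps in a hypothetical, deliberately naive stand-in that only strips a trailing "s" from words longer than three letters. A real Porter stemmer would produce different stems, so treat the output as illustrative only. The `COMMON_WORDS` constant is likewise just a convenience for this example.

```ruby
# Hypothetical stand-in for a real stemmer (assumption, not the gist's
# Stemmable): strips one trailing "s" from words longer than three letters.
module Stemmable
  def stem
    length > 3 && end_with?("s") ? chomp("s") : dup
  end
end

class String
  include Stemmable

  # Same word list as the gist, held in a constant for this sketch.
  COMMON_WORDS = %w(
    and are but for from had have her his like not our she some than that
    the their them then there these they this via was were with you your
  ).freeze

  def to_tags
    # Lower-case, blank out everything except a-z and newlines, split on whitespace.
    words = downcase.gsub(/[^a-z\n]/, ' ').split
    # Stem, de-duplicate, and keep only stems longer than two letters.
    stems = words.map(&:stem).uniq.select { |s| s.length > 2 }
    # Drop common words, compared in their stemmed form.
    stems - COMMON_WORDS.map(&:stem)
  end
end

tags = "The cats and the dogs like 2 cats!".to_tags
p tags  # => ["cat", "dog"]
```

With the stand-in stemmer, "cats" and "dogs" reduce to "cat" and "dog", the digit and punctuation are discarded, and "the", "and", and "like" are removed as common words.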