i
me
my
myself
we
our
ours
ourselves
you
your
yours
yourself
yourselves
he
him
his
himself
she
her
hers
herself
it
its
itself
they
them
their
theirs
themselves
what
which
who
whom
this
that
these
those
am
is
are
was
were
be
been
being
have
has
had
having
do
does
did
doing
a
an
the
and
but
if
or
because
as
until
while
of
at
by
for
with
about
against
between
into
through
during
before
after
above
below
to
from
up
down
in
out
on
off
over
under
again
further
then
once
here
there
when
where
why
how
all
any
both
each
few
more
most
other
some
such
no
nor
not
only
own
same
so
than
too
very
s
t
can
will
just
don
should
now
@codysoyland

codysoyland commented Aug 28, 2010

It's like a poem, really.

@binilg

binilg commented Mar 25, 2017

nice list

@kujjwal02

kujjwal02 commented May 4, 2018

This is outdated.
NLTK now has a bigger list.

@bruceredmon

bruceredmon commented May 10, 2018

"s" and "t" ?

@yauhen-info

yauhen-info commented May 21, 2018

@bruceredmon, I guess that, depending on the tokenizer and the input, we can get
cat's -> ['cat', 's']
don't -> ['don/do', 't'],
so "s" and "t" are in the list to filter out those fragments afterwards.
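
To make the point above concrete, here is a minimal sketch of how such fragments can arise. It assumes a deliberately naive regex tokenizer (not NLTK's own) and uses a hypothetical `STOPWORDS` set holding only a small excerpt of the list:

```python
import re

# Small excerpt of the list above, just enough for the demo (illustrative only).
STOPWORDS = {"the", "is", "s", "t", "don", "not", "on"}

def tokenize(text):
    """Naive tokenizer: lowercase, then keep runs of letters.

    Apostrophes are not letters, so contractions get split apart.
    """
    return re.findall(r"[a-z]+", text.lower())

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop every token that appears in the stopword set."""
    return [tok for tok in tokens if tok not in stopwords]

tokens = tokenize("The cat's toy don't move")
print(tokens)                    # ['the', 'cat', 's', 'toy', 'don', 't', 'move']
print(remove_stopwords(tokens))  # ['cat', 'toy', 'move']
```

A tokenizer that keeps apostrophes would never produce the bare "s" and "t", so whether those entries matter depends on the tokenizer in use.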

@dorukcan

dorukcan commented May 21, 2018

Does anyone want the above list as an array? Here it is:

["i", "me", "my", "myself", "we", "our", "ours", "ourselves", "you", "your", "yours", "yourself", "yourselves", "he", "him", "his", "himself", "she", "her", "hers", "herself", "it", "its", "itself", "they", "them", "their", "theirs", "themselves", "what", "which", "who", "whom", "this", "that", "these", "those", "am", "is", "are", "was", "were", "be", "been", "being", "have", "has", "had", "having", "do", "does", "did", "doing", "a", "an", "the", "and", "but", "if", "or", "because", "as", "until", "while", "of", "at", "by", "for", "with", "about", "against", "between", "into", "through", "during", "before", "after", "above", "below", "to", "from", "up", "down", "in", "out", "on", "off", "over", "under", "again", "further", "then", "once", "here", "there", "when", "where", "why", "how", "all", "any", "both", "each", "few", "more", "most", "other", "some", "such", "no", "nor", "not", "only", "own", "same", "so", "than", "too", "very", "s", "t", "can", "will", "just", "don", "should", "now"]

@aviolante

aviolante commented May 31, 2018

Does anyone have the updated list with additional stopwords?

@paragkhursange

paragkhursange commented Jun 1, 2018

How can I find Hindi stop words?

@emigre459

emigre459 commented Jul 4, 2018

You can generate the most recent stopword list by doing the following:

from nltk.corpus import stopwords
sw = stopwords.words("english")

Note that you will also need to run

import nltk
nltk.download("stopwords")

once to download the stopwords corpus before this works; downloading all of the corpora is not necessary.

This generates the most up-to-date list of 179 English words you can use. Additionally, if you run stopwords.fileids(), you'll see which languages have stopword lists available. Sorry @paragkhursange, but Hindi doesn't seem to be an option at this time.
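
The steps in this comment can be wrapped into one helper, sketched below. It assumes NLTK is installed and that the machine can reach the download server on first use; `load_stopwords` and `available_languages` are hypothetical helper names, not part of NLTK.

```python
def _ensure_corpus():
    """Download the NLTK stopwords corpus only if it is not already present."""
    import nltk  # imported lazily so the helpers can be defined without NLTK loaded
    try:
        nltk.data.find("corpora/stopwords")
    except LookupError:
        nltk.download("stopwords", quiet=True)

def load_stopwords(language="english"):
    """Return NLTK's stopword list for `language` as a set for fast lookups."""
    _ensure_corpus()
    from nltk.corpus import stopwords
    return set(stopwords.words(language))

def available_languages():
    """Return the language names NLTK ships stopword lists for."""
    _ensure_corpus()
    from nltk.corpus import stopwords
    return sorted(stopwords.fileids())
```

Calling `load_stopwords("english")` returns the English set, and `"hindi" in available_languages()` checks for a given language. The number of words returned depends on the installed version of the corpus.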

@vibrantabhi19

vibrantabhi19 commented Jul 6, 2018

If you import the NLTK stop words using
from nltk.corpus import stopwords

and print the words using
stopwords.words('english')

you get the latest stop words in the NLTK corpus. I tried that, and the following array is what I got. Hope this helps.

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]
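
One practical difference between the two versions is that the longer list also contains whole contractions such as "don't" and "it's". The sketch below illustrates this with a plain whitespace tokenizer, which keeps contractions intact. `OLD` and `NEW` are hypothetical names holding small excerpts of the two lists, not the full lists.

```python
# Excerpts only: OLD mimics the original list, NEW the longer one quoted above.
OLD = {"i", "don", "t", "s", "it"}
NEW = OLD | {"don't", "it's"}

def filter_tokens(text, stopwords):
    """Lowercase, split on whitespace, and drop tokens found in `stopwords`."""
    # Whitespace tokenization keeps contractions such as "don't" in one piece.
    return [word for word in text.lower().split() if word not in stopwords]

sentence = "I don't think it's broken"
print(filter_tokens(sentence, OLD))  # ["don't", 'think', "it's", 'broken']
print(filter_tokens(sentence, NEW))  # ['think', 'broken']
```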
