@ayushoriginal
Created June 24, 2019 05:53
remove noise and stopwords
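def remove_stopwords(self):
    # Assumes self.data is a list of tweets, each already split into tokens.
    import re
    from nltk.corpus import stopwords
    from tqdm import tqdm

    stop = set(stopwords.words("english"))
    noise = {'user'}  # dataset-specific tokens to drop
    for i, tweet in enumerate(tqdm(self.data, desc='Stopwords Removal')):
        # Keep a token unless it is a stopword, a noise token, or starts with
        # a character that is not a letter, digit, or whitespace.
        self.data[i] = [w for w in tweet
                        if w not in stop and w not in noise
                        and not re.match(r"[^a-zA-Z\d\s]+", w)]
    return self.data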
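
The filter above can be tried on a couple of toy tweets. The following is a minimal sketch, not the author's surrounding code: the `TweetCleaner` class and the sample tokens are hypothetical, and a small hand-picked `STOP` set stands in for `set(stopwords.words("english"))` so the example runs without the NLTK corpus.

```python
import re

# Stand-in for set(stopwords.words("english")); a tiny hand-picked subset
# (assumption) so the sketch runs without downloading the NLTK corpus.
STOP = {"the", "is", "a", "to", "and"}
NOISE = {"user"}
# Same pattern as in the method: characters that are not letters, digits, or whitespace.
NON_ALNUM = re.compile(r"[^a-zA-Z\d\s]+")


class TweetCleaner:
    """Hypothetical host class; the original snippet only shows the method."""

    def __init__(self, data):
        self.data = data  # list of tweets, each a list of tokens

    def remove_stopwords(self):
        for i, tweet in enumerate(self.data):
            self.data[i] = [
                w for w in tweet
                if w not in STOP and w not in NOISE and not NON_ALNUM.match(w)
            ]
        return self.data


cleaner = TweetCleaner([
    ["user", "the", "movie", "is", "great", "!!!"],
    ["#monday", "ok!", "to", "work", ":)"],
])
result = cleaner.remove_stopwords()
print(result)
```

Because `re.match` only tests the start of the token, `"#monday"` and `":)"` are dropped while `"ok!"` survives. If tokens with trailing punctuation should also go, `re.search` would be the call to reach for. The stopword check is case-sensitive, so tokens are assumed to be lowercased beforehand.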