@FranciscusRenatus
Last active July 20, 2017 17:51
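# Cleaning the texts
# Assumes `dataset` is a pandas DataFrame with a 'Review' column of at least
# 1000 rows, loaded before this snippet runs.
import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

# Build the stemmer and the stop-word set once instead of on every iteration.
ps = PorterStemmer()
stop_words = set(stopwords.words('english'))

corpus = []
for i in range(0, 1000):
    # Keep letters only, lowercase, and split into words.
    review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i])
    review = review.lower()
    review = review.split()
    # Drop stop words and stem what remains.
    review = [ps.stem(word) for word in review if word not in stop_words]
    review = ' '.join(review)
    corpus.append(review)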
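
The cleaning steps in the loop above can be tried on a single string without a dataset or an NLTK download. The following is a minimal sketch, not the gist author's code: `clean_review`, `SAMPLE_STOP_WORDS`, and the identity `stem` default are hypothetical names introduced here for illustration, and the tiny stop-word set stands in for NLTK's full English list.

```python
import re

# Hypothetical mini stop-word list for illustration only; the gist itself
# uses NLTK's full English list via stopwords.words('english').
SAMPLE_STOP_WORDS = {"the", "was", "and", "a", "is", "this", "i"}


def clean_review(text, stop_words=SAMPLE_STOP_WORDS, stem=lambda w: w):
    """Keep letters only, lowercase, drop stop words, stem, and rejoin."""
    words = re.sub('[^a-zA-Z]', ' ', text).lower().split()
    return ' '.join(stem(w) for w in words if w not in stop_words)


print(clean_review("Wow... The food was GREAT, and cheap!"))
# prints: wow food great cheap
```

Passing `stop_words=set(stopwords.words('english'))` and `stem=PorterStemmer().stem` should give the same per-review result as the loop in the gist.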