Created April 7, 2020 12:18 by pragatibaheti (gist 217c62814364b346567e42bcd0db22b2)
# Import LabelEncoder from scikit-learn
from sklearn.preprocessing import LabelEncoder

# Encode the two class labels in `classes` as the integers 0 and 1
# (codes follow the sorted order of the class names)
encoder = LabelEncoder()
Y = encoder.fit_transform(classes)
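# Remove unwanted characters with regular expressions:
# replace punctuation with spaces, collapse repeated whitespace, and trim both ends.
# regex=True is needed on pandas >= 2.0, where str.replace treats the pattern literally by default.
processed = processed.str.replace(r'[^\w\d\s]', ' ', regex=True)
processed = processed.str.replace(r'\s+', ' ', regex=True)
processed = processed.str.replace(r'^\s+|\s+?$', '', regex=True)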
# Stem each word with NLTK's Porter stemmer
import nltk

PS = nltk.PorterStemmer()
processed = processed.apply(lambda x: ' '.join(PS.stem(term) for term in x.split()))
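The label-encoding step can be checked in isolation. This is a minimal sketch that assumes a two-class label column with the values "ham" and "spam"; those labels are hypothetical, since the gist does not show what `classes` contains. It shows that `LabelEncoder` assigns codes in sorted order of the class names and that `inverse_transform` maps the codes back.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical label column; the gist's real `classes` is not shown.
classes = ["spam", "ham", "ham", "spam"]

encoder = LabelEncoder()
Y = encoder.fit_transform(classes)

# classes_ holds the sorted unique labels; a label's index in it is its code.
print(encoder.classes_.tolist())  # ['ham', 'spam']
print(Y.tolist())                 # [1, 0, 0, 1]

# inverse_transform maps integer codes back to the original label strings.
print(encoder.inverse_transform([0, 1]).tolist())  # ['ham', 'spam']
```

Because the mapping depends only on the sorted label names, the same encoder should be reused (via `transform`) on any later data so the 0/1 codes stay consistent.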
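The three regular-expression passes can be tried on a small, hypothetical pandas Series standing in for `processed` (its real contents are not shown in the gist). The sketch passes `regex=True` explicitly so it behaves the same on pandas versions before and after 2.0. It uses `[^\w\s]` because `\w` already matches digits, so the extra `\d` in the original character class does not change the result.

```python
import pandas as pd

# Hypothetical messages standing in for the gist's `processed` Series.
processed = pd.Series(["  Hello,   World!! ", "Free entry: win $100 now"])

# Replace every character that is neither a word character nor whitespace with a space.
processed = processed.str.replace(r"[^\w\s]", " ", regex=True)
# Collapse each run of whitespace into a single space.
processed = processed.str.replace(r"\s+", " ", regex=True)
# Trim leading and trailing whitespace.
processed = processed.str.replace(r"^\s+|\s+$", "", regex=True)

print(processed.tolist())  # ['Hello World', 'Free entry win 100 now']
```

Replacing punctuation with a space rather than deleting it keeps words such as "entry:win" from being glued together; the whitespace pass then cleans up the extra spaces this creates.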
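The stemming step can likewise be run on a couple of hypothetical, already-cleaned messages. This sketch uses the same `nltk.PorterStemmer` call as the gist; the Porter stemmer is rule-based and needs no corpus download. Note that stems are not always dictionary words ("messages" becomes "messag"), which is expected and harmless for bag-of-words features.

```python
import nltk
import pandas as pd

# Hypothetical cleaned messages standing in for the gist's `processed` Series.
processed = pd.Series(["running winning calls", "free messages"])

PS = nltk.PorterStemmer()

# Split each message on whitespace, stem every token, and re-join with single spaces.
processed = processed.apply(lambda x: " ".join(PS.stem(term) for term in x.split()))

print(processed.tolist())  # ['run win call', 'free messag']
```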