@pragatibaheti
Created April 7, 2020 12:18
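# Import LabelEncoder from scikit-learn
from sklearn.preprocessing import LabelEncoder

# Encode the class labels as integers (two classes become 0 and 1).
# `classes` is assumed to hold the label column, defined earlier.
encoder = LabelEncoder()
Y = encoder.fit_transform(classes)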
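# Clean the text with regular expressions. regex=True is needed on pandas >= 2.0,
# where str.replace treats the pattern as a literal string by default.
processed = processed.str.replace(r'[^\w\d\s]', ' ', regex=True)  # punctuation -> space
processed = processed.str.replace(r'\s+', ' ', regex=True)        # collapse whitespace runs
processed = processed.str.replace(r'^\s+|\s+?$', '', regex=True)  # trim leading/trailing whitespace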
# Stem every word with the Porter stemmer
import nltk
PS = nltk.PorterStemmer()
processed = processed.apply(lambda x: ' '.join(PS.stem(term) for term in x.split()))
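
To see what the encoding and cleaning steps do without loading a dataset, here is a minimal sketch using only the standard library. The helper names `encode_labels` and `clean_text`, and the sample labels and message, are hypothetical and not part of the gist. `encode_labels` only imitates the sorted-order integer mapping that `LabelEncoder` produces, and `clean_text` applies the same three patterns to a single string rather than a pandas Series.

```python
import re

def encode_labels(labels):
    """Map each distinct label to an integer, in sorted order.

    Hypothetical stand-in that imitates LabelEncoder's mapping.
    """
    index = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [index[label] for label in labels]

def clean_text(text):
    """Apply the same three substitutions as the gist to one string."""
    text = re.sub(r'[^\w\d\s]', ' ', text)   # punctuation -> space
    text = re.sub(r'\s+', ' ', text)         # collapse whitespace runs
    text = re.sub(r'^\s+|\s+?$', '', text)   # trim leading/trailing whitespace
    return text

# Hypothetical sample data, for illustration only
labels = ["spam", "ham", "ham", "spam"]
sample = "  Free entry!!  Call   now... "

encoded = encode_labels(labels)
cleaned = clean_text(sample)

print(encoded)        # [1, 0, 0, 1]
print(repr(cleaned))  # 'Free entry Call now'
```

Because the labels are numbered in sorted order, "ham" maps to 0 and "spam" to 1 here; the actual values in `classes` may differ.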