@prafulgondane
Last active June 20, 2022 17:14
def text_preprocessing(df):
    """Add cleaned, tokenized, and lemmatized columns derived from df['comment_text'].

    Assumes remove_punctuation, tokenization, remove_stopwords, and
    lemmatizer are defined elsewhere; each intermediate step is kept as its own column.
    """
    # Remove punctuation
    df['clean_text'] = df['comment_text'].apply(remove_punctuation)
    # Convert to lower case
    df['clean_text_lower'] = df['clean_text'].str.lower()
    # Tokenize the string
    df['text_tokenized'] = df['clean_text_lower'].apply(tokenization)
    # Remove stop words
    df['text_tokenized_no_stopwords'] = df['text_tokenized'].apply(remove_stopwords)
    # Lemmatize the remaining tokens
    df['text_lemmatized'] = df['text_tokenized_no_stopwords'].apply(lemmatizer)
    return df
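
The helpers that text_preprocessing calls (remove_punctuation, tokenization, remove_stopwords, lemmatizer) are not defined in this gist. The block below is a minimal standard-library sketch of what they might look like, not the author's implementation. The STOPWORDS set is a hypothetical, deliberately tiny list, and lemmatizer here is only a naive plural-stripping stand-in; a real pipeline would more likely use a proper stop-word list and lemmatizer from a library such as NLTK or spaCy.

```python
import string

# Hypothetical, deliberately small stop-word list (assumption, not from the gist).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were",
             "and", "or", "to", "of", "in"}


def remove_punctuation(text):
    """Return text with every character in string.punctuation removed."""
    return text.translate(str.maketrans("", "", string.punctuation))


def tokenization(text):
    """Split text on runs of whitespace into a list of tokens."""
    return text.split()


def remove_stopwords(tokens):
    """Drop tokens that appear in STOPWORDS (expects lower-cased tokens)."""
    return [t for t in tokens if t not in STOPWORDS]


def lemmatizer(tokens):
    """Naive stand-in for lemmatization: strip one trailing plural 's'.

    This is NOT real lemmatization; it only illustrates the expected
    list-in, list-out shape of the helper.
    """
    return [
        t[:-1] if len(t) > 3 and t.endswith("s") and not t.endswith("ss") else t
        for t in tokens
    ]


# Same order of steps as text_preprocessing, applied to a single string.
sample = "The cats are sleeping, and the dog is barking!"
tokens = tokenization(remove_punctuation(sample).lower())
print(lemmatizer(remove_stopwords(tokens)))  # ['cat', 'sleeping', 'dog', 'barking']
```

With helpers of this shape in scope, text_preprocessing(df) can be called on any DataFrame that has a string column named comment_text.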