@abhishek-shrm
Last active September 23, 2020 07:32
# For working with regular expressions
import re

# Function for cleaning text
def cleaner(text):
    # Lowercase the text
    text = text.lower()
    # Keep only the letters a-z; each run of other characters becomes one space
    text = re.sub(r"[^a-z]+", " ", text)
    # Collapse any repeated spaces
    text = re.sub(r"[ ]+", " ", text)
    return text

# Clean comments in the train set
# (df_train and df_test are assumed to be pandas DataFrames loaded earlier)
df_train['cleaned'] = df_train['comment_text'].apply(cleaner)
# Clean comments in the test set
df_test['cleaned'] = df_test['comment_text'].apply(cleaner)
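
To check what the cleaning step does to a comment, here is a minimal sketch that runs the same function on two made-up strings instead of a DataFrame. The sample strings are hypothetical and are not taken from the dataset used above.

```python
import re

def cleaner(text):
    # Same steps as the function above: lowercase, keep a-z, collapse spaces
    text = text.lower()
    text = re.sub(r"[^a-z]+", " ", text)
    text = re.sub(r"[ ]+", " ", text)
    return text

# Hypothetical sample comments, for illustration only
samples = ["Hello, World!!", "It's 2020 already?"]

for s in samples:
    # repr() makes leading and trailing spaces visible
    print(repr(cleaner(s)))
# prints 'hello world '
# prints 'it s already '
```

Two things stand out in the output: digits and apostrophes are replaced along with punctuation, so "It's" becomes "it s", and trailing punctuation leaves a trailing space. If that space is unwanted, calling `.strip()` on the result is one option, though the original function does not do this.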