@amankharwal
Created December 7, 2020 13:03
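import re

def preprocessor(text):
    # Remove HTML tags such as <br />
    text = re.sub(r'<[^>]*>', '', text)
    # Capture emoticons (e.g. :) ;-( =D :P) before punctuation is stripped
    emojis = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)', text)
    # Lowercase and collapse every run of non-word characters into one space
    cleaned = re.sub(r'[\W]+', ' ', text.lower()).strip()
    # Re-append the emoticons without their "nose" (-), separated by a space
    # so the first emoticon is not glued to the last word
    emoticons = ' '.join(emojis).replace('-', '')
    return (cleaned + ' ' + emoticons).strip()

# `data` is assumed to be a pandas DataFrame with a 'text' column, loaded earlier
data['text'] = data['text'].apply(preprocessor)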
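The emoticon regex is the least obvious line in `preprocessor`. The small stand-alone check below runs the same pattern on a made-up sample sentence (the sentence and the name `EMOTICON` are illustrative only). It shows what `findall` returns and how the emoticons look after the `-` is stripped.

```python
import re

# Same pattern as in preprocessor(): an "eye" (: ; =), an optional
# "nose" (-), then a "mouth" ) ( D or P. All groups are non-capturing,
# so findall returns the full matched emoticons.
EMOTICON = re.compile(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)')

sample = "Loved it :-) but the sequel was dull :( ;P =D"
found = EMOTICON.findall(sample)
print(found)  # [':-)', ':(', ';P', '=D']

# preprocessor() appends them space-separated with the nose removed
appended = ' '.join(found).replace('-', '')
print(appended)  # :) :( ;P =D
```

Because `findall` runs before `text.lower()`, only the uppercase mouths `D` and `P` are recognized. Lowercase variants such as `:p` or `:d` are not captured as emoticons.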