@abhishek-shrm
Created August 4, 2020 03:02
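```python
import re

# Function for cleaning text.
# Note: [^a-z] also removes uppercase letters, so the text should be lowercased beforehand.
def clean_text(text):
    text = re.sub(r'\w*\d\w*', '', text)  # remove words that contain digits
    text = re.sub('\n', ' ', text)        # replace newlines with spaces
    text = re.sub(r'http\S+', '', text)   # remove URLs
    text = re.sub(r'[^a-z]', ' ', text)   # replace every non-lowercase-letter character with a space
    return text

# Cleaning corpus using RegEx
# (training_corpus and testing_corpus are assumed to be pandas DataFrames built in an earlier step)
training_corpus['cleaned'] = training_corpus['cleaned'].apply(clean_text)
testing_corpus['cleaned'] = testing_corpus['cleaned'].apply(clean_text)
```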
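To see what the four regex steps actually do, here is a minimal, self-contained sketch that re-declares `clean_text` with the same steps in the same order and runs it on a made-up sample string instead of the `training_corpus`/`testing_corpus` DataFrames. The sample text and the `normalize_spaces` helper are hypothetical additions for illustration, not part of the original gist.

```python
import re


def clean_text(text):
    """Same four regex steps as the gist: digit-words, newlines, URLs, non-letters."""
    text = re.sub(r'\w*\d\w*', '', text)  # drop any word that contains a digit
    text = re.sub('\n', ' ', text)        # turn newlines into spaces
    text = re.sub(r'http\S+', '', text)   # drop URLs (from "http" up to the next whitespace)
    text = re.sub(r'[^a-z]', ' ', text)   # replace every non-lowercase-letter character with a space
    return text


def normalize_spaces(text):
    """Hypothetical helper: collapse the runs of spaces that clean_text leaves behind."""
    return ' '.join(text.split())


if __name__ == '__main__':
    sample = 'Visit https://example.com for covid19 updates!\nStay safe in 2020.'
    raw = clean_text(sample.lower())        # lowercase first, since [^a-z] drops capitals
    print(repr(raw))
    print(normalize_spaces(raw))            # prints: visit for updates stay safe in
    print(repr(clean_text('Hello World')))  # prints: ' ello  orld' (capitals stripped)
```

The output shows two side effects of the design: removed tokens and punctuation leave runs of spaces behind, and uppercase letters disappear entirely rather than being lowercased, which is why the input should already be lowercase when it reaches `clean_text`.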