@mohdsanadzakirizvi
Last active August 7, 2019 02:27
import re

def text_cleaner(text):
    # lowercase the text
    newString = text.lower()
    # drop the possessive 's (e.g. "cat's" -> "cat")
    newString = re.sub(r"'s\b", "", newString)
    # replace every non-letter character (punctuation, digits) with a space
    newString = re.sub("[^a-zA-Z]", " ", newString)
    long_words = []
    # keep only words of at least 3 characters
    for i in newString.split():
        if len(i) >= 3:
            long_words.append(i)
    return " ".join(long_words)

# preprocess the text (data_text must already hold the raw input string)
data_new = text_cleaner(data_text)
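
As a quick check of what the cleaning steps do, here is a minimal sketch run on a made-up sentence. The sample string and the names `sample_text` and `cleaned` are assumptions for illustration only; the gist does not show what `data_text` contains. The function is restated in compact form so the block runs on its own.

```python
import re

def text_cleaner(text):
    # Same steps as the gist's function, restated so this block is self-contained
    new_string = text.lower()
    new_string = re.sub(r"'s\b", "", new_string)        # drop possessive 's
    new_string = re.sub("[^a-zA-Z]", " ", new_string)   # non-letters -> spaces
    # keep only words of at least 3 characters
    return " ".join(w for w in new_string.split() if len(w) >= 3)

# Hypothetical sample input, not taken from the gist
sample_text = "The cat's toy is on the mat, isn't it? 42 times!"
cleaned = text_cleaner(sample_text)
print(cleaned)  # the cat toy the mat isn times
```

Note that contractions such as "isn't" are split at the apostrophe, so the fragment "isn" survives while the one-letter "t" is dropped by the length filter. Digits disappear as well, since anything that is not a letter becomes a space.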