@alinazhanguwo
Created December 20, 2020 04:12
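import re

from nltk.corpus import stopwords  # requires a one-time nltk.download('stopwords')

STOPWORDS = set(stopwords.words('english'))

# Characters that are replaced by a single space before tokenizing.
REPLACE_BY_SPACE_RE = re.compile(r"[\n\"'/(){}\[\]|@,;#]")


def text_prepare(text, STOPWORDS):
    """
    text: a string
    STOPWORDS: a collection of lowercase words to drop
    return: a clean string
    """
    text = REPLACE_BY_SPACE_RE.sub(' ', text)
    # collapse runs of spaces left behind by the substitution
    text = re.sub(' +', ' ', text)
    text = text.lower()
    # delete stopwords from text
    text = ' '.join([word for word in text.split() if word not in STOPWORDS])
    text = text.strip()
    return text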
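
A minimal usage sketch of the cleaning steps above, so the effect on a real string is visible. To keep it runnable without downloading NLTK data, it assumes a small hand-written stopword set (`SAMPLE_STOPWORDS`, a hypothetical stand-in for `stopwords.words('english')`) and repeats a condensed copy of the function. The sample sentences are made up for illustration.

```python
import re

# Hypothetical stand-in for stopwords.words('english'); not the real NLTK list.
SAMPLE_STOPWORDS = {"a", "the", "is", "in", "how", "do", "i", "to"}

# Same character class as in the gist: each match becomes a space.
REPLACE_BY_SPACE_RE = re.compile(r"[\n\"'/(){}\[\]|@,;#]")


def text_prepare(text, stopwords):
    """Condensed copy of the gist's function, for demonstration only."""
    text = REPLACE_BY_SPACE_RE.sub(" ", text)
    text = re.sub(" +", " ", text)
    text = text.lower()
    return " ".join(w for w in text.split() if w not in stopwords).strip()


cleaned = text_prepare("How do I parse a JSON file (in Python)?", SAMPLE_STOPWORDS)
print(cleaned)  # parse json file python ?
```

Note that the trailing `?` survives: the character class only covers the symbols listed in it, so `?`, `.`, `!` and similar punctuation stay in the text unless they are added to the pattern.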