@mhashizume
Created February 10, 2020 02:53
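import fileinput
import re


def sanitize_text():
    """Return a list of case-folded words read from the files in sys.argv or from stdin."""
    total_text = []
    for line in fileinput.input():
        # str.casefold() returns a new string, so the result must be reassigned
        line = line.casefold()
        # Split on non-word characters (anything but letters, digits, underscore) and drop empty strings
        words = [x for x in re.split(r"\W", line) if x]
        total_text += words
    return total_text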
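
The function above reads from `fileinput`, which makes it awkward to try out interactively. As a minimal sketch, the same case-folding and splitting steps can be applied to an in-memory list of lines. The name `sanitize_lines` and its `lines` parameter are hypothetical and not part of the original gist.

```python
import re


def sanitize_lines(lines):
    """Hypothetical helper: case-fold each line and split it on non-word characters."""
    total_text = []
    for line in lines:
        # re.split leaves empty strings where separators are adjacent; filter them out
        total_text += [x for x in re.split(r"\W", line.casefold()) if x]
    return total_text


print(sanitize_lines(["Hello, World!", "snake_case stays; it's split"]))
# → ['hello', 'world', 'snake_case', 'stays', 'it', 's', 'split']
```

Note that `\W` treats the underscore as a word character, so `snake_case` survives intact, while the apostrophe splits `it's` into `it` and `s`.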