@soobrosa
Created September 25, 2013 10:08
clean text (needs a homogenize() function, which is not included in this gist)
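def clean(sentence):
    """Lowercase a sentence, strip punctuation and return homogenized words.

    Depends on a homogenize() function defined elsewhere.
    """
    stopchars = ['.', ',', '?', '!', '"', '-']
    gain = []
    # Decode byte input once, up front, so lower() also handles non-ASCII letters.
    if isinstance(sentence, bytes):
        sentence = sentence.decode('utf-8')
    sentence = sentence.lower()
    # Turn each punctuation mark into a space so it acts as a word separator.
    for char in stopchars:
        sentence = sentence.replace(char, ' ')
    # Splitting on ' ' leaves empty strings between consecutive separators; skip them.
    for word in sentence.split(' '):
        if word != '':
            gain.append(homogenize(word))
    return gain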
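The gist does not show homogenize(), so the function above cannot run on its own. Below is a minimal self-contained sketch for trying it out. The homogenize() here is a hypothetical stand-in (it folds accented letters to their base form via Unicode NFKD decomposition) and is not the original author's function. The tokenizing step is also swapped for a single re.split() call on the same stop characters; unlike the original, it treats tabs and newlines as separators too.

```python
import re
import unicodedata

# Same punctuation set as the gist's stopchars list.
STOPCHARS = '.,?!"-'


def homogenize(word):
    """HYPOTHETICAL stand-in for the missing homogenize(): strip diacritics."""
    decomposed = unicodedata.normalize('NFKD', word)
    # Drop combining marks (e.g. the acute accent split off from 'é').
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean(sentence):
    """Lowercase, split on stop characters or whitespace, homogenize each word."""
    if isinstance(sentence, bytes):
        sentence = sentence.decode('utf-8')
    # One character class: any stop character or whitespace, repeated.
    pattern = '[' + re.escape(STOPCHARS) + r'\s]+'
    # re.split can yield empty strings at the edges; filter them out.
    return [homogenize(w) for w in re.split(pattern, sentence.lower()) if w]


print(clean('Hello, world! Café-au-lait?'))
# ['hello', 'world', 'cafe', 'au', 'lait']
```

Swap in the real homogenize() if you have it; the clean() logic does not depend on what homogenization does.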