@mukulrawat1986
Created May 8, 2014 17:41
Stripping punctuation from text and splitting it into words
import re
import string

text = raw_input()  # Python 2; on Python 3 use input()
# split text into words on whitespace and strip punctuation from both ends of each word
texts = [word.strip(string.punctuation) for word in text.split()]
# remove empty strings left behind by tokens that were only punctuation
texts = [word for word in texts if word]
# replace punctuation inside words that are separated only by punctuation,
# with no whitespace, e.g. "abc,def.ghi" -> "abc def ghi"
texts = [re.sub(r"[^a-zA-Z0-9 ]", " ", word) for word in texts]
# split on the inserted spaces and flatten into a single list of words
texts = [x for word in texts for x in word.split()]
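
The same steps can be wrapped in a reusable function. What follows is a minimal sketch for Python 3, not part of the original gist; the function name `split_words` is hypothetical and chosen only for illustration.

```python
import re
import string


def split_words(text):
    """Return the ASCII alphanumeric words in text, with punctuation removed.

    split_words is a hypothetical name, not something defined by the gist.
    """
    # Split on whitespace and trim punctuation from both ends of each token.
    tokens = [token.strip(string.punctuation) for token in text.split()]
    # Drop tokens that consisted only of punctuation.
    tokens = [token for token in tokens if token]
    # Replace every remaining character outside a-z, A-Z, 0-9 with a space,
    # so "abc,def.ghi" becomes "abc def ghi".
    tokens = [re.sub(r"[^a-zA-Z0-9 ]", " ", token) for token in tokens]
    # Split on the inserted spaces and flatten into one list.
    return [word for token in tokens for word in token.split()]


print(split_words("Hello, world! abc,def.ghi"))
# ['Hello', 'world', 'abc', 'def', 'ghi']
```

Because the pattern keeps only ASCII letters and digits, apostrophes and accented characters also act as separators: "don't" comes back as "don" and "t".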