@widiger-anna
Last active May 4, 2018 18:38
Regex examples for removing numbers and special characters inside a word
from __future__ import unicode_literals, print_function

import re

'''
For a token (word):
clean_words removes every non-word character (punctuation, whitespace, symbols)
    and every digit; letters and underscores are kept
remove_numbers replaces each digit with the placeholder ' NUMBER '
'''


def clean_words(word):
    try:
        word = re.sub(r'\W|\d', '', word, flags=re.UNICODE)
    except TypeError:
        # Not a string: return the input unchanged.
        pass
    return word


def remove_numbers(word):
    try:
        word = re.sub(r'\d', ' NUMBER ', word, flags=re.UNICODE)
    except TypeError:
        # Not a string: return the input unchanged.
        pass
    return word


word = "l8ter"
print(clean_words(word))     # lter
print(remove_numbers(word))  # l NUMBER ter
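
Two details of the patterns above may matter in practice: `\W` does not match the underscore, so clean_words keeps it, and `\d` matches one digit at a time, so remove_numbers emits one placeholder per digit rather than one per number. The sketch below shows possible variants for both cases. The names clean_words_strict and replace_number_runs are hypothetical and not part of the original gist, and whether underscores should be dropped or digit runs collapsed depends on the downstream task.

```python
import re

# Hypothetical variants, not part of the original gist.
_NON_LETTER = re.compile(r'[\W\d_]', flags=re.UNICODE)
_DIGIT_RUN = re.compile(r'\d+', flags=re.UNICODE)


def clean_words_strict(word):
    # Also drops underscores, which \W alone leaves in place.
    return _NON_LETTER.sub('', word)


def replace_number_runs(word, placeholder=' NUMBER '):
    # Replaces each run of consecutive digits with a single placeholder.
    return _DIGIT_RUN.sub(placeholder, word)


print(clean_words_strict("l8_ter!"))   # lter
print(replace_number_runs("l88ter"))   # l NUMBER ter
```

Compiling the patterns once at module level avoids repeating the pattern string and flags in every call, which can help when the functions run over many tokens.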