@ddbs
Last active November 16, 2022 23:03
Sanitize text
import re


def sanitize_text(text):
    # collapse runs of white space; split() also trims the start/end of text
    text = ' '.join(text.split())
    # remove invalid punctuation (and any space before it) at end of text
    text = text.rstrip(';, ')
    # nothing left to capitalize or punctuate
    if not text:
        return text
    # capitalize only first letter without changing the others
    text = text[0].upper() + text[1:]
    # add full stop if the text does not end with valid punctuation
    if re.search(r'[?.!]$', text) is None:
        text = text + '.'
    return text
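
A note on the final check: it only needs to look at the last character. A pattern such as `[\w\s]+[?.!]$` used with `re.match` is stricter than it looks, because `re.match` anchors at the start of the string, so everything before the closing mark must be word characters or white space. A sentence with an internal comma or apostrophe then fails the check even though it already ends in a full stop, and would get a second one appended. The sketch below compares the two approaches; `STRICT` and `ends_with_valid_punctuation` are hypothetical names used for illustration only.

```python
import re

# Anchored pattern: with re.match, the whole text before the closing
# mark must consist of word characters or white space.
STRICT = re.compile(r'[\w\s]+[?.!]$')


def ends_with_valid_punctuation(text):
    # Hypothetical helper: inspect the last character only.
    return text.endswith(('?', '.', '!'))


samples = ["Hello world.", "Hello, world.", "It's fine!", "No full stop"]
for s in samples:
    strict = STRICT.match(s) is not None
    print(f"{s!r}: match={strict}, endswith={ends_with_valid_punctuation(s)}")
```

Only the first sample passes the anchored pattern, while the last-character check accepts the first three and rejects only the one that really lacks closing punctuation.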