@cathalgarvey
Created January 20, 2013 18:39
Word soup fixer, for emails written in one long line full of ellipses. I get a surprising number of these.
#!/usr/bin/env python3
import sys
fixfile = sys.argv[1]
with open(fixfile) as InputFile:
    word_soup = InputFile.read()
# Strip off excess whitespace and any trailing ellipsis.
word_soup = word_soup.strip().strip(".!?")
# People use variable numbers of periods in their wild ellipses-rants,
# so let's reduce those all down to only two-period pairs; easily found
# and worked upon.
while "..." in word_soup:
    word_soup = word_soup.replace("...", "..")
# Now that dot^x is replaced with dot^2, we can split at dot^2:
word_soup = word_soup.split("..")
new_lines = []
# Bear in mind, "lines" might actually be paragraphs with proper grammar
# or newlines characters. We're just treating the ellipses, here.
for line in word_soup:
    # Remove extra whitespace and punctuation characters at either end:
    line = line.strip().strip(".!?").strip()
    # Skip fragments that are empty once stripped (e.g. from ".. .." or an
    # empty file); indexing line[0] on them would raise IndexError.
    if not line:
        continue
    # TODO: Add simple logic to append "?" to lines beginning with common
    # query words like "How", "What" etc.
    # Convert first character to upper-case, and add a full-stop character
    # to the end to replace any stripped punctuation characters.
    line = line[0].upper() + line[1:] + "."
    # Add fixed line to new lines:
    new_lines.append(line)
# Output the fixed lines delimited by a newline character.
print('\n'.join(new_lines))
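
The TODO in the script (append "?" to lines that open with a query word) could be sketched roughly as below. This is only one possible approach, not part of the original gist: the names `QUERY_WORDS`, `end_mark`, `fix_line` and `fix_word_soup` are hypothetical, the word list is a guess beyond the "How" and "What" named in the TODO, and `re.split` on runs of two or more periods is swapped in for the script's `while`/`replace` loop.

```python
import re

# Hypothetical word list; the TODO itself only names "How" and "What".
QUERY_WORDS = ("how", "what", "why", "when", "where", "who", "which")


def end_mark(line):
    """Return "?" if the line opens with a query word, otherwise "."."""
    words = line.split()
    if not words:
        return "."
    first_word = words[0].strip(",;:'\"").lower()
    return "?" if first_word in QUERY_WORDS else "."


def fix_line(line):
    """Tidy one ellipsis-delimited fragment; return "" for empty fragments."""
    line = line.strip().strip(".!?").strip()
    if not line:
        return ""
    return line[0].upper() + line[1:] + end_mark(line)


def fix_word_soup(text):
    """Split text at runs of two or more periods and tidy each fragment."""
    fragments = re.split(r"\.{2,}", text)
    fixed = (fix_line(fragment) for fragment in fragments)
    return "\n".join(line for line in fixed if line)


if __name__ == "__main__":
    print(fix_word_soup("hello there.... how are you.. fine thanks..."))
```

The heuristic is crude: an exclamation such as "What a day" would also get a question mark, so the word list probably needs tuning against real emails.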