@htv2012
Created January 8, 2015 14:39
A small utility that reads HTML-escaped text and prints it with the entities decoded. I used this to un-garble test failure messages.
"""
Unescape HTML junk into something readable
Usage: python html_unescape.py < input_file
"""
import fileinput
import HTMLParser
replacements = [
('\\:', ':')
]
def main():
parser = HTMLParser.HTMLParser()
for html_text in fileinput.input():
plain_text = parser.unescape(html_text)
for old_text, new_text in replacements:
plain_text = plain_text.replace(old_text, new_text)
print plain_text
if __name__ == '__main__':
main()
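
To make the effect concrete, here is a minimal sketch of the same two steps (decode entities, then apply the literal replacements) applied to a single string. It assumes Python 3's `html.unescape`; the helper name `unescape_line` and the sample failure message are hypothetical and only serve as illustration.

```python
import html

# Same single rule as the script: a backslash-escaped colon becomes a plain colon.
REPLACEMENTS = [('\\:', ':')]


def unescape_line(text, replacements=REPLACEMENTS):
    """Hypothetical helper: decode HTML entities, then apply literal replacements."""
    plain = html.unescape(text)
    for old, new in replacements:
        plain = plain.replace(old, new)
    return plain


if __name__ == '__main__':
    # Made-up input resembling an escaped test failure message
    sample = 'AssertionError\\: expected &lt;1&gt; but was &lt;2&gt; &amp; more'
    print(unescape_line(sample))
    # prints: AssertionError: expected <1> but was <2> & more
```

Entities are decoded in a single pass, so a doubly escaped input such as `&amp;lt;` comes out as `&lt;` and would need a second run to become `<`.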