Created
January 8, 2015 14:39
A small utility that takes an HTML file as input and prints its text with HTML entities unescaped. I used this to un-garble test failure messages.
""" | |
Unescape HTML junk into something readable | |
Usage: python html_unescape.py < input_file | |
""" | |
import fileinput | |
import HTMLParser | |
replacements = [ | |
('\\:', ':') | |
] | |
def main(): | |
parser = HTMLParser.HTMLParser() | |
for html_text in fileinput.input(): | |
plain_text = parser.unescape(html_text) | |
for old_text, new_text in replacements: | |
plain_text = plain_text.replace(old_text, new_text) | |
print plain_text | |
if __name__ == '__main__': | |
main() |
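To show what the script does to a single line, here is a minimal sketch assuming Python 3, where `html.unescape` is the standard-library function for decoding entities. The `clean` helper and the sample message are hypothetical and only illustrate the same unescape-then-replace steps; they are not part of the original gist.

```python
import html

# Same substitution list as the script: escaped colon back to a plain colon.
REPLACEMENTS = [("\\:", ":")]

# Hypothetical garbled failure message, made up for illustration.
sample = "AssertionError\\: expected &lt;b&gt; but got &quot;x&quot; &amp; more"


def clean(text):
    """Decode HTML entities, then apply the extra replacements."""
    plain = html.unescape(text)
    for old, new in REPLACEMENTS:
        plain = plain.replace(old, new)
    return plain


print(clean(sample))
# prints: AssertionError: expected <b> but got "x" & more
```

Unescaping happens in a single pass, so doubly escaped input such as `&amp;lt;` comes out as `&lt;` and would need a second run to decode fully.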