@thecapacity
Created April 2, 2016 23:07
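#!/usr/bin/env python
# -*- coding: ascii -*-
### Sample Usage: for f in `ls *.md`; do html_unescape.py $f > ok && mv ok $f; done
# Python 2 only: uses the HTMLParser module and the print statement.
import sys
import codecs
import HTMLParser

h = HTMLParser.HTMLParser()

def convert_html(filename):
    # Read the file as UTF-8 and replace HTML entities (e.g. &amp;) with the
    # characters they stand for; the file is closed when the block exits.
    with codecs.open(filename, "r", "utf-8") as f:
        text = h.unescape(f.read())
    # Encode back to UTF-8 bytes so the result can be printed or redirected.
    return text.encode('utf-8')

if __name__ == "__main__":
    print convert_html(sys.argv[1])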
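
The script above targets Python 2. A minimal sketch of the same function for Python 3, assuming the standard-library `html.unescape` is an acceptable replacement for `HTMLParser.unescape` (this port is not part of the original gist):

```python
import html


def convert_html(filename):
    """Return the contents of `filename` with HTML entities unescaped."""
    # Read as UTF-8, as the original script does via codecs.open().
    with open(filename, "r", encoding="utf-8") as f:
        # html.unescape turns named and numeric entities into characters.
        return html.unescape(f.read())
```

Unlike the Python 2 version, this returns a `str` rather than UTF-8 bytes, so a caller would write it out with something like `sys.stdout.write(convert_html(sys.argv[1]))`.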
@thecapacity

Used this to convert a WordPress export to Jekyll / Markdown.

@thecapacity

thecapacity commented Apr 28, 2016

Use via bash with something to the effect of:
for f in *.md; do html_unescape.py "$f" > ok && mv ok "$f"; done
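
The same batch step could also be done from Python instead of a shell loop. A rough sketch, assuming Python 3 and that rewriting each file in place is acceptable; the function name `unescape_markdown_files` is made up for illustration and is not part of the gist:

```python
import html
from pathlib import Path


def unescape_markdown_files(directory):
    """Unescape HTML entities in every *.md file directly inside `directory`.

    Returns the names of the files that were actually rewritten.
    """
    changed = []
    for path in sorted(Path(directory).glob("*.md")):
        original = path.read_text(encoding="utf-8")
        converted = html.unescape(original)
        # Only touch files whose content changes.
        if converted != original:
            path.write_text(converted, encoding="utf-8")
            changed.append(path.name)
    return changed
```

Calling `unescape_markdown_files(".")` from the export directory covers the same files as the `*.md` glob in the loop above. It overwrites files directly rather than going through the temporary `ok` file, so keeping a backup of the export first is a sensible precaution.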
