Convert HTML to Markdown for Jekyll
#
# The Jekyll import tool (http://import.jekyllrb.com/docs/blogger/)
# creates HTML files. I'd like to use html2text
# (https://github.com/aaronsw/html2text) to convert those to Markdown.
# The challenge is that Jekyll files have a YAML header at the top that gets
# mangled by the conversion. This strips the header, passes the remainder
# of the body into html2text, then adds the header back to the result.
#
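import os
import sys

import html2text

HEADER_END = '\n---\n'  # closing delimiter of the YAML header

for filename in sys.argv[1:]:
    # Python 3: read and write as UTF-8 text, so no decode/encode calls are needed.
    with open(filename, 'r', encoding='utf-8') as infile:
        contents = infile.read()

    # Find the end of the YAML header; skip files that have none.
    endheader = contents.find(HEADER_END)
    if endheader == -1:
        sys.stderr.write('No YAML header found in %s, skipping\n' % filename)
        continue
    body_start = endheader + len(HEADER_END)
    header = contents[:body_start]

    # Convert only the HTML body, leaving the header untouched.
    new_body = html2text.html2text(contents[body_start:])

    # Write the result next to the input file with a .markdown extension.
    outname = os.path.splitext(filename)[0] + '.markdown'
    with open(outname, 'w', encoding='utf-8') as f:
        f.write(header)
        f.write('\n')
        f.write(new_body)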
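
The header handling described in the comment at the top can also be pulled into small functions, which makes it easier to check on sample text before touching real files. The following is only a sketch under stated assumptions: `split_front_matter` and `convert_document` are hypothetical names that are not part of the script or of html2text, and `convert` stands in for any HTML-to-Markdown callable (in practice, presumably `html2text.html2text`).

```python
def split_front_matter(text, delimiter='---'):
    """Split Jekyll-style text into (header, body).

    The header includes both delimiter lines. It is '' when the text
    does not start with a YAML header.
    """
    opening = delimiter + '\n'
    closing = '\n' + delimiter + '\n'
    if not text.startswith(opening):
        return '', text
    # Start the search at the newline that ends the opening delimiter,
    # so that an empty header ('---\n---\n') is recognised too.
    end = text.find(closing, len(delimiter))
    if end == -1:
        return '', text
    cut = end + len(closing)
    return text[:cut], text[cut:]


def convert_document(text, convert):
    """Apply `convert` to the body only, then re-attach the header."""
    header, body = split_front_matter(text)
    converted = convert(body)
    # Mirror the script: header, a blank line, then the converted body.
    return header + '\n' + converted if header else converted


# Usage with a stand-in converter (hypothetical; swap in a real one).
sample = '---\ntitle: Hello\n---\n<p>Hi</p>\n'
print(convert_document(sample, str.upper))
```

Unlike a bare `find` on the closing delimiter, this version returns the text unchanged as the body when no header is present, so a file without a header is converted whole instead of being cut at an arbitrary offset.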