@rshipp
Created July 7, 2014 23:04
Playing with reconstructing parsed html5.
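#!/usr/bin/env python3
import html5lib

# Parse the file; with html5lib's default etree tree builder the result
# is the root <html> element.
with open("test.html", "r") as f:
    data = html5lib.parse(f.read())

def tag_name(node):
    # Tags are namespaced, e.g. '{http://www.w3.org/1999/xhtml}div'.
    return node.tag.split('}')[-1]

def start_tag(node):
    return '<' + tag_name(node) + \
        ''.join(' {}="{}"'.format(name, value)
                for name, value in node.items()) + '>'

def close_tag(node):
    return '</' + tag_name(node) + '>'

def indent(data, level=0, space=" ", times=2):
    pad = space * times * level
    for node in data:
        if not isinstance(node.tag, str):
            continue  # comment nodes have a non-string tag; skip them
        print(pad + start_tag(node))
        if node.text and node.text.strip():
            print(pad + node.text.strip())
        if len(node):  # has child elements
            indent(node, level + 1, space, times)
        print(pad + close_tag(node))

# Prints the children of <html> (head, body); the root tag itself is not printed.
indent(data)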
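The script above prints as it walks the tree and ignores each node's tail text, so in `<p>Hello <b>world</b>!</p>` the trailing `!` is lost. A minimal sketch of a variant that returns the lines instead of printing them and also emits tail text follows. It is not part of the original gist. The names `local_name`, `render`, and `sample` are hypothetical, and it uses the standard library's `xml.etree.ElementTree` on a small XHTML string as a stand-in for html5lib output, assuming both produce namespaced tags of the same shape. Unlike the script above, text is indented one level deeper than its enclosing tag, and attribute values are escaped.

```python
import xml.etree.ElementTree as ET
from html import escape

def local_name(tag):
    # '{namespace}div' -> 'div'; un-namespaced names pass through unchanged.
    return tag.rsplit('}', 1)[-1]

def render(node, level=0, pad="  "):
    """Return the indented output lines for node and its descendants."""
    if not isinstance(node.tag, str):
        return []  # comments and processing instructions are skipped
    prefix = pad * level
    attrs = ''.join(' {}="{}"'.format(local_name(k), escape(v, quote=True))
                    for k, v in node.items())
    lines = [prefix + '<' + local_name(node.tag) + attrs + '>']
    if node.text and node.text.strip():
        lines.append(prefix + pad + node.text.strip())
    for child in node:
        lines.extend(render(child, level + 1, pad))
        # tail is the text that follows the child's closing tag
        if child.tail and child.tail.strip():
            lines.append(prefix + pad + child.tail.strip())
    lines.append(prefix + '</' + local_name(node.tag) + '>')
    return lines

# Hypothetical sample input standing in for test.html.
sample = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
          '<p class="a">Hello <b>world</b>!</p></body></html>')
root = ET.fromstring(sample)
output = "\n".join(render(root))
print(output)
```

Stripping whitespace around text keeps the output tidy but changes the spacing between inline elements, so this is a pretty-printer for inspection rather than a faithful serializer.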