@raylu
Created May 1, 2015 23:15
(x)html parser
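#!/usr/bin/env python3
"""Parse an XHTML page with ElementTree, resolving HTML named entities."""
import html.entities
import http.client
import re
from xml.etree import ElementTree

# XML itself only predefines five entities, so register the HTML ones by hand.
parser = ElementTree.XMLParser()
for entity, char in html.entities.html5.items():
    # html5 also lists some legacy names without the trailing ';'. Skip those
    # so the slice below never cuts off a real character.
    if entity.endswith(';'):
        parser.entity[entity[:-1]] = char

conn = http.client.HTTPSConnection('docs.python.org')
conn.request('GET', '/3/library/xml.etree.elementtree.html')
response = conn.getresponse()
xmlstring = response.read().decode('utf-8')
conn.close()

# Drop the default namespace so tags match by bare name ('a', not '{ns}a').
xmlstring = re.sub(r' xmlns="[^"]+"', '', xmlstring, count=1)
tree = ElementTree.fromstring(xmlstring, parser=parser)
# Print the text of the first <a> element in document order.
print(next(tree.iter('a')).text)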
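
The entity-registration step above can be tried without a network connection. The following is a minimal sketch, not part of the original script: `make_xhtml_parser` and `SAMPLE` are hypothetical names, and the inline document is made up for illustration. It assumes the input carries a DOCTYPE with an external DTD, since that appears to be what lets the parser fall back to its `entity` table instead of rejecting an undefined entity outright.

```python
import html.entities
from xml.etree import ElementTree


def make_xhtml_parser():
    """Return an XMLParser that knows the HTML named entities (hypothetical helper)."""
    parser = ElementTree.XMLParser()
    for name, char in html.entities.html5.items():
        # Only use names that end in ';' so the slice strips just the semicolon.
        if name.endswith(';'):
            parser.entity[name[:-1]] = char
    return parser


# Made-up sample document; the DOCTYPE is never fetched, only declared.
SAMPLE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html><body><a href="#">caf&eacute;&nbsp;menu</a></body></html>'
)

root = ElementTree.fromstring(SAMPLE, parser=make_xhtml_parser())
link_text = next(root.iter('a')).text
print(repr(link_text))
```

A fresh parser is built per document because an `XMLParser` instance cannot be reused once it has been closed.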