@davidpgero
Created April 25, 2011 20:21
NLTKclean_html
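# Python 2 with NLTK 2.x: urllib2 is Python 2 only; in NLTK 3 clean_html raises NotImplementedError.
from nltk import clean_html
from urllib2 import urlopen

# Download the raw HTML of the page.
html = urlopen('http://mek.niif.hu/00700/00707/html/vs192601.htm').read()
# Strip the markup, keeping only the text ("tisztitott" is Hungarian for "cleaned").
tisztitott_html = clean_html(html)
# Show the start of the cleaned text (first 50 bytes).
print tisztitott_html[:50]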
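
The snippet above relies on `nltk.clean_html`, which in NLTK 3 only raises `NotImplementedError`. As a rough idea of what such a cleaning step does, here is a minimal Python 3 sketch built on the standard library's `html.parser`. The names `TextExtractor` and `strip_tags` are hypothetical and not part of NLTK, and the whitespace handling is an assumption rather than a reproduction of NLTK's original behavior.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style> bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Assumption: collapse every run of whitespace into a single space.
        return " ".join("".join(self._chunks).split())


def strip_tags(html):
    """Return the visible text of an HTML string (a sketch, not NLTK's algorithm)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


sample = (
    "<html><head><style>p{}</style></head>"
    "<body><p>Hello <b>world</b>!</p><script>var x=1;</script></body></html>"
)
print(strip_tags(sample)[:50])
```

To run this on the page fetched above under Python 3, the bytes returned by `urllib.request.urlopen(url).read()` would have to be decoded to `str` first; the gist does not state the page's encoding, so that step is left out here.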