@ivan1911
Created April 11, 2013 14:46
korean parse
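#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# Print the link URL and preview image URL for each thumbnail box in in.htm.
import lxml.html

# Read raw bytes so lxml can detect the page encoding on its own.
with open('in.htm', 'rb') as f:
    html = f.read()

doc = lxml.html.document_fromstring(html)

# Thumbnail boxes are matched by their exact inline style string.
for tr in doc.xpath('//*[@style="border: 1px solid rgb(178, 178, 178); width: 70px; height: 40px; overflow: hidden; margin-left: 5px;"]'):
    link = tr[0]                     # first child carries the href
    url = link.attrib['href']
    preview = link[0].attrib['src']  # its first child carries the src
    print(url, preview)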