@baobao
Last active December 7, 2018 04:03
A Python script that parses an HTML page and writes every image it uses to output.html.
# -*- coding: utf-8 -*-
# Python 2 script (urllib2 and HTMLParser are Python 2 module names).
import urllib2
from HTMLParser import HTMLParser

URL = "http://google.com"
OUTPUT = "output.html"
urlList = []


class TestParser(HTMLParser):
    def handle_starttag(self, tagname, attribute):
        # Collect the src attribute of every <img> tag.
        if tagname.lower() == "img":
            for name, value in attribute:
                if name.lower() == "src" and value:
                    urlList.append(value)


def show():
    # Build one <img> tag per collected URL and write them all out.
    html = ""
    for imgUrl in urlList:
        html += '<img src="' + imgUrl + '" />'
    createFile(html)


def createFile(html):
    f = open(OUTPUT, "w")
    f.write(html)
    f.close()


if __name__ == "__main__":
    htmldata = urllib2.urlopen(URL)
    parser = TestParser()
    parser.feed(htmldata.read())
    parser.close()
    htmldata.close()
    # Write the file once, after the whole page has been parsed.
    show()
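
The script above relies on urllib2 and HTMLParser, which exist only under those names in Python 2. For readers on Python 3, the following is a rough sketch of the same idea, not the author's code. The names ImageCollector, extract_image_urls, build_gallery, and save_images_page are hypothetical, and resolving relative src values against the page URL with urljoin is an added assumption that the original script does not make.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url  # assumption: used to resolve relative URLs
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value:
                # urljoin keeps absolute URLs as-is and resolves relative ones.
                self.urls.append(urljoin(self.base_url, value))


def extract_image_urls(html, base_url=""):
    """Return the image URLs found in an HTML string, in document order."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


def build_gallery(urls):
    """Return one <img> tag per URL, concatenated as in the original script."""
    return "".join('<img src="{}" />'.format(escape(u, quote=True)) for u in urls)


def save_images_page(url, output_path="output.html"):
    """Fetch url and write its images to output_path (needs network access)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(build_gallery(extract_image_urls(html, base_url=url)))
```

Calling save_images_page("http://google.com") would correspond to running the original script with its default URL and OUTPUT values. Keeping the collected URLs on the parser instance instead of in a module-level list is a design choice of this sketch, so that the parsing step can be exercised on a plain string without touching the network.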

baobao commented Feb 17, 2013

From here on, I'll write the parsing part.
