@cloudaice
Created February 19, 2012 06:57
htmlparser
from HTMLParser import HTMLParser
import urllib


class parselinks(HTMLParser):
    """Collect the text of every <a href=...> link on a page (Python 2)."""

    def __init__(self):
        self.data = []      # link texts collected so far
        self.href = 0       # 1 while inside an <a> tag that has an href
        self.linkname = ''  # text accumulated for the current link
        HTMLParser.__init__(self)

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.href = 1

    def handle_data(self, data):
        if self.href:
            self.linkname += data

    def handle_endtag(self, tag):
        if tag == 'a':
            # Removes all whitespace, including the spaces between words.
            self.linkname = ''.join(self.linkname.split())
            if self.linkname:
                self.data.append(self.linkname)
            self.linkname = ''
            self.href = 0

    def getresult(self):
        for value in self.data:
            print value


if __name__ == "__main__":
    IParser = parselinks()
    IParser.feed(urllib.urlopen("http://www.python.org/index.html").read())
    IParser.close()  # flush any buffered input before printing
    IParser.getresult()
AdvancedSearch
About
News
Documentation
Download
下载
Community
Foundation
CoreDevelopment
Help
PackageIndex
QuickLinks(2.7.2)
Documentation
WindowsInstaller
SourceDistribution
QuickLinks(3.2.2)
Documentation
WindowsInstaller
SourceDistribution
PythonJobs
PythonMerchandise
PythonWiki
PythonInsiderBlog
Python2or3?
HelpMaintainWebsite
HelpFundPython
Non-EnglishResources
PythonReleaseScheduleiCalCalendar
Python3
PyPIpackagename
Results
Rackspace
IndustrialLightandMagic
AstraZeneca
Honeywell
andmanyothers
eWeek
more...
WebProgramming
CGI
Zope
Django
TurboGears
XML
Databases
ODBC
MySQL
GUIDevelopment
wxPython
tkInter
PyGtk
PyQt
ScientificandNumeric
Physics
Education
pyBiblio
SoftwareCarpentryCourse
Networking
Sockets
Twisted
SoftwareDevelopment
Buildbot
Trac
Roundup
IDEs
GameDevelopment
PyGame
PyKyra
3DRendering
more...
opensourcelicense
Python2orPython3
PythonSoftwareFoundation
PyConconference
Readmore
downloadPythonnow
O'ReillyOpenSourceConvention
CallforProposals
BestProgrammingLanguage
PyConinChina
IronPython2.7.1
PyArkansas
PyGotham
Python3.2.2
RSS
WebsitemaintainedbythePythoncommunity
hostingbyxs4all
designbyTimParkin
PythonSoftwareFoundation
LegalStatements
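
The link names in the output have lost their internal spaces ("AdvancedSearch", "PythonSoftwareFoundation") because handle_endtag joins the whitespace-split pieces with an empty string. If readable names are wanted, one possible variant is sketched below. It is not part of the original gist: it assumes Python 3 (html.parser instead of the Python 2 HTMLParser module), the class name LinkTextParser and the SAMPLE markup are made up for illustration, and it parses an inline string rather than fetching python.org.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Hypothetical Python 3 variant of parselinks that keeps word spacing."""

    def __init__(self):
        super().__init__()
        self.data = []        # finished link texts
        self.in_link = False  # True while inside an <a> tag that has an href
        self.parts = []       # text fragments of the current link

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.in_link = True

    def handle_data(self, data):
        if self.in_link:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a":
            # Collapse each run of whitespace to one space instead of deleting it.
            text = " ".join("".join(self.parts).split())
            if text:
                self.data.append(text)
            self.parts = []
            self.in_link = False


# Made-up sample markup, so the sketch runs without network access.
SAMPLE = (
    '<p><a href="/search">Advanced\n Search</a> '
    '<a name="top">skip</a> '
    '<a href="/psf">Python  Software Foundation</a></p>'
)

parser = LinkTextParser()
parser.feed(SAMPLE)
parser.close()
print(parser.data)
```

The anchor without an href is skipped, as in the original, and the only behavioral change is the " ".join in handle_endtag.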