
@nickfox-taterli
Created June 20, 2020 09:11
XPath test on an HTML file
# Import the etree module
from lxml import etree

# Read the HTML text used as the example (closed automatically by "with")
with open("test.html", encoding='utf-8') as f:
    html = f.read()
dom = etree.HTML(html)

# One iteration per row whose second cell has text
for b in range(len(dom.xpath('/html/body/table/tbody/tr/td[2]/text()'))):
    i = b + 1  # XPath positions are 1-based
    row = '/html/body/table/tbody/tr[' + str(i) + ']'
    # Alternative name that combines the second and first columns:
    # name = dom.xpath(row + '/td[2]/text()')[0] + ' ' + dom.xpath(row + '/td[1]/text()')[0]
    name = dom.xpath(row + '/td[1]/text()')[0]
    # Strip the scheme and slashes from the "endpoint" attribute to get the bare host
    url = dom.xpath(row + '/td[3]')[0].attrib['endpoint'].replace('http://', '').replace('/', '')
    # Use the first label of the host name as the target name
    en_name = url.split('.')[0]
    print('''
++ %s
menu = %s
title = DigitalOcean %s
host = %s''' % (en_name, en_name, name, url))
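The loop above addresses rows by index with absolute paths, so a row without text in its second cell shifts every later index, and the hard-coded tbody step only matches if the saved file literally contains <tbody> (browsers insert it when rendering, so paths copied from developer tools often include it; lxml's HTML parser generally does not add it). A minimal sketch of the same extraction that iterates over the rows directly, assuming the column layout the script relies on (name in the first cell, an "endpoint" attribute on the third cell). The function name smokeping_targets, the SAMPLE_HTML markup and the example.com hosts are hypothetical stand-ins, since test.html is not part of this gist.

```python
from typing import List

from lxml import etree

# Hypothetical sample markup: the real test.html is not included, so this
# layout (name | code | cell carrying an "endpoint" attribute) is assumed.
SAMPLE_HTML = """
<html><body><table>
  <tr><th>Name</th><th>Code</th><th>Endpoint</th></tr>
  <tr><td>Region A</td><td>A1</td><td endpoint="http://host-a.example.com/"></td></tr>
  <tr><td>Region B</td><td>B1</td><td endpoint="http://host-b.example.com/"></td></tr>
</table></body></html>
"""


def smokeping_targets(html: str) -> List[str]:
    """Build one "++ name / menu / title / host" stanza per usable table row."""
    dom = etree.HTML(html)
    stanzas = []
    # '//table//tr' matches rows whether or not the markup contains <tbody>
    for row in dom.xpath('//table//tr'):
        # Relative paths ('./...') look only inside the current row
        names = row.xpath('./td[1]/text()')
        cells = row.xpath('./td[3][@endpoint]')
        if not names or not cells:
            continue  # skip header or malformed rows instead of raising IndexError
        name = names[0].strip()
        # Same cleanup as the script: drop "http://" and every "/"
        host = cells[0].get('endpoint').replace('http://', '').replace('/', '')
        en_name = host.split('.')[0]
        stanzas.append('++ %s\nmenu = %s\ntitle = DigitalOcean %s\nhost = %s'
                       % (en_name, en_name, name, host))
    return stanzas


if __name__ == '__main__':
    for stanza in smokeping_targets(SAMPLE_HTML):
        print(stanza + '\n')
```

To use it on the real file, pass the text read from test.html instead of SAMPLE_HTML. The header row is skipped because it has no td cells, which is the case where the index-based loop would misalign.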