@ronaldhan
Created December 15, 2015 01:55
Parsing XML in Python

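When scraping data from the web, requests can handle JSON responses directly, but it offers no direct support for XML responses. Python ships with a built-in XML parser, and the examples below show how to parse XML in two situations: from a local file and from a network response.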

Parsing an XML file

import sys
from xml.etree import ElementTree

xml_file = r'D:\BJMapSearch.xml'
try:
    tree = ElementTree.parse(xml_file)
    root = tree.getroot()  # get the root node
except Exception as e:
    print('Failed to parse the XML file: %s' % e)
    sys.exit(-1)  # `return` is only valid inside a function

# Query the nodes
data_node = root.find("layer").find("hts").findall("ht")
for node in data_node:
    if u'关联关系' in node.attrib:
        print(node.attrib.get(u'关联关系'))

# The XML is structured as follows
<Response>
<count>10</count>
<total>4706</total>
<actualtotal>4706</actualtotal>
<layer id="L10319" type="">...</layer>
</Response>

# The layer node is structured as follows
<hts Sum="10">
<ht></ht>
<ht></ht>
...
</hts>
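
The node query above can be tried without the original file. The following is a minimal sketch, assuming a made-up sample document that only mirrors the structure shown above; the attribute values and the helper name `linked_values` are hypothetical. It uses a path expression, which visits the same nodes as the chained `find` calls but returns an empty list when a level is missing instead of failing on `None`.

```python
from xml.etree import ElementTree

# Made-up sample that mirrors the structure described above;
# the attribute values are placeholders, not real data.
SAMPLE = u"""<Response>
<count>2</count>
<layer id="L10319" type="">
<hts Sum="2">
<ht 关联关系="A"></ht>
<ht></ht>
</hts>
</layer>
</Response>"""


def linked_values(root, attr=u'关联关系'):
    """Return the value of `attr` for every <ht> node that carries it."""
    # "layer/hts/ht" walks the same nodes as
    # root.find("layer").find("hts").findall("ht"), but yields an empty
    # list rather than raising AttributeError when a level is absent.
    return [ht.get(attr) for ht in root.findall("layer/hts/ht")
            if attr in ht.attrib]


# Encode to bytes so the parser sees UTF-8 input, the XML default.
root = ElementTree.fromstring(SAMPLE.encode("utf-8"))
values = linked_values(root)
print(values)
```

Only the first `<ht>` carries the attribute, so the second one is skipped.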

Fetching XML over the network and parsing it

import requests
from xml.etree import ElementTree

xml_file = requests.get('http://www.beijingmap.gov.cn/bjgtj/BJMapSearch?p=0%2C10&s=%2A&l=L10319&t=xml')

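# fromstring returns the root node directly; passing the raw bytes
# (.content) lets the parser apply the encoding declared in the XML
root = ElementTree.fromstring(xml_file.content)
data_node = root.find("layer").find("hts").findall("ht")
for node in data_node:
    if u'关联关系' in node.attrib:
        print(node.attrib.get(u'关联关系'))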
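
A server does not always answer with well-formed XML; an error page, for example, may come back as HTML. The following is a minimal sketch of guarding against that, assuming a hypothetical helper `parse_body` and made-up response bodies. It relies only on `ElementTree.ParseError`, which `fromstring` raises for malformed input.

```python
from xml.etree import ElementTree


def parse_body(body):
    """Parse raw response bytes; return the root, or None if malformed."""
    try:
        # Bytes let the parser honour the encoding named in the XML prolog.
        return ElementTree.fromstring(body)
    except ElementTree.ParseError:
        return None


# Made-up bodies standing in for what a server might send back.
good = (u'<?xml version="1.0" encoding="UTF-8"?>'
        u'<Response><count>10</count></Response>').encode("utf-8")
bad = b"<html><body>Service unavailable"

root = parse_body(good)
print(root.find("count").text)
print(parse_body(bad))
```

With requests, the value to pass in would be the response's `content` attribute.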

If the response returned by the request is large, a different approach is needed: stream it and parse it incrementally.

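import requests
from xml.etree import ElementTree

# `url` is the address of the XML resource to fetch
response = requests.get(url, stream=True)
# if the server sent a Gzip or Deflate compressed response, decompress
# as we read the raw stream:
response.raw.decode_content = True

events = ElementTree.iterparse(response.raw)
# iterparse yields (event, elem) pairs, in that order
for event, elem in events:
    # do something with `elem`
    pass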
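
The incremental approach can be exercised without a network connection by feeding `iterparse` an in-memory stream. The following is a minimal sketch, assuming a made-up document and a hypothetical generator `iter_attr`; `io.BytesIO` stands in for `response.raw`. Each `<ht>` element is cleared once handled so that its contents can be released while parsing continues.

```python
import io
from xml.etree import ElementTree


def iter_attr(stream, tag="ht", attr=u'关联关系'):
    """Yield `attr` from each `tag` element, parsing `stream` incrementally."""
    # An "end" event fires once an element has been read completely.
    for event, elem in ElementTree.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            if attr in elem.attrib:
                yield elem.get(attr)
            # Drop the element's attributes and children once handled;
            # the emptied element itself stays attached to its parent.
            elem.clear()


# Made-up document; io.BytesIO plays the role of response.raw here.
data = (u'<Response><layer><hts>'
        u'<ht 关联关系="A"/><ht/><ht 关联关系="B"/>'
        u'</hts></layer></Response>').encode("utf-8")
values = list(iter_attr(io.BytesIO(data)))
print(values)
```

Filtering on the tag before clearing matters: "end" events for child elements arrive before the one for their parent, so clearing every element unconditionally would empty the children before the parent is processed.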