@christopherkullenberg
Created December 30, 2015 13:07
Question concerning XML
"""
Data structure: http://libris.kb.se/xsearch?d=swepub&hitlist&q=l%C3%A4ros%C3%A4te%3agu&f=ext&spell=true&hist=true&n=200&p=1
I am trying to extract only the text of the <subfield code="u"> element in:
<datafield tag="700" ind1="1" ind2=" ">
<subfield code="a">Alvestad, Torgeir,</subfield>
<subfield code="d">1960-,</subfield>
<subfield code="u">Göteborgs universitet, Institutionen för pedagogik och didaktik, University of Gothenburg, Department of Education</subfield>
<subfield code="4">edt</subfield>
<subfield code="0">(SwePub:chalmers.se)xalvto</subfield>
</datafield>
How can I do this?
"""
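from os import listdir
from lxml import etree as ET

# Loads a large amount of XML files with the structure shown in the link above.
for filename in listdir("GU20151228/"):
    # Binary mode lets the parser honor the encoding declared in each file.
    with open("GU20151228/" + filename, "rb") as currentFile:
        tree = ET.parse(currentFile)
        root = tree.getroot()
        # Walk three levels down and print the text of every element found there.
        for child in root[0]:
            for c in child:
                for value in c:
                    print(value.text)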
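
One possible way to get only the `code="u"` values is to filter on the attributes instead of printing every nested element. The sketch below is not a confirmed answer from the gist. The helper name `subfield_values`, the `SAMPLE` string, and its `<record>` wrapper element are made up for illustration. It compares local tag names only, on the assumption that the real files may or may not declare an XML namespace.

```python
try:
    from lxml import etree as ET  # same parser as the gist
except ImportError:
    import xml.etree.ElementTree as ET  # stdlib fallback offering the same calls


def subfield_values(root, tag="700", code="u"):
    """Return the text of each <subfield code=...> inside <datafield tag=...>.

    Hypothetical helper, not part of the original gist.
    """
    values = []
    for datafield in root.iter():
        # Skip comments and processing instructions, whose .tag is not a string.
        if not isinstance(datafield.tag, str):
            continue
        # Compare the local name only, so a "{namespace}" prefix does not matter.
        if datafield.tag.rsplit("}", 1)[-1] != "datafield":
            continue
        if datafield.get("tag") != tag:
            continue
        for subfield in datafield:
            if not isinstance(subfield.tag, str):
                continue
            local_name = subfield.tag.rsplit("}", 1)[-1]
            if local_name == "subfield" and subfield.get("code") == code:
                values.append(subfield.text)
    return values


# The datafield from the question, wrapped in an assumed <record> root element.
SAMPLE = """<record>
<datafield tag="700" ind1="1" ind2=" ">
<subfield code="a">Alvestad, Torgeir,</subfield>
<subfield code="d">1960-,</subfield>
<subfield code="u">Göteborgs universitet, Institutionen för pedagogik och didaktik, University of Gothenburg, Department of Education</subfield>
<subfield code="4">edt</subfield>
<subfield code="0">(SwePub:chalmers.se)xalvto</subfield>
</datafield>
</record>"""

affiliations = subfield_values(ET.fromstring(SAMPLE))
for affiliation in affiliations:
    print(affiliation)
```

In the loop over the files, calling `subfield_values(root)` on each parsed `root` in place of the three nested `for` loops should collect the affiliation strings per file, provided the downloaded files follow the structure quoted in the question.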