Make sure you've got these:
from lxml import etree
import csv
All of the snippets below relate to evaluating XPaths with etree and working with CSV data.
A useful function to test how many nodes match a single XPath. For example, if you have 3 dc:creator fields at positions 0, 1, and 2, it will return 3. This is super helpful if you need to set up a condition for processing one element vs. multiples.
def lenTest(path, tree):
    return len(tree.xpath(path))
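As a rough sketch of that "one vs. multiples" condition, the example below counts the dc:creator elements in a made-up record and branches on the result. The sample XML, the DC_NS mapping, and the optional namespaces argument on lenTest are assumptions added for illustration; lxml needs a prefix-to-URI mapping before it can evaluate a prefixed path like dc:creator.

```python
from lxml import etree

def lenTest(path, tree, namespaces=None):
    # Same idea as above; the optional namespaces argument is an addition
    # so that prefixed paths such as dc:creator can be evaluated.
    return len(tree.xpath(path, namespaces=namespaces))

# Hypothetical sample record with three dc:creator elements.
SAMPLE = b"""<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Ada</dc:creator>
  <dc:creator>Grace</dc:creator>
  <dc:creator>Edith</dc:creator>
</record>"""

DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}
creatorPath = "/record/dc:creator"
tree = etree.fromstring(SAMPLE)

count = lenTest(creatorPath, tree, DC_NS)
if count == 1:
    # Single value: take the only match.
    creators = [tree.xpath(creatorPath, namespaces=DC_NS)[0].text]
else:
    # Zero or many: collect whatever matched (empty list if nothing did).
    creators = [el.text for el in tree.xpath(creatorPath, namespaces=DC_NS)]
```

With the sample above, count should come out as 3 and creators holds one string per dc:creator element.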
How to get the value of an element using etree. If path is the XPath of the item, elemZero is the actual element in the tree at that XPath; the [0] picks the first match (as opposed to [1], [2], [3], etc.). However, elemZero itself is only a reference to the element, not its text. To get the text you need etree.tostring, which serializes that element. I include the encoding and method values here because they're useful as a reference point. Note that with encoding='UTF-8' the result is a bytes object in Python 3; pass encoding='unicode' if you want a str.
elemZero = tree.xpath(path)[0]
elemZeroValue = etree.tostring(elemZero, encoding='UTF-8', method='text')
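If there's just one instance of the element at that XPath, you still need the [0], because tree.xpath() returns a list of matches even when there is only one. Passing the list itself to etree.tostring raises a TypeError.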
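To make the difference between the element and its value concrete, here is a small sketch on a made-up document (the root/title/emph markup is hypothetical). It shows that the XPath result is a list, that .text only covers the text before the first child element, and that method='text' flattens everything inside the element.

```python
from lxml import etree

# Hypothetical document: a title with mixed content (text plus a child element).
tree = etree.fromstring(b"<root><title> Papers of <emph>Jane Doe</emph> </title></root>")

matches = tree.xpath("/root/title")  # a list, even with a single match
elemZero = matches[0]                # the element itself, not its text

# .text stops at the first child element.
leading_text = elemZero.text

# method='text' concatenates all text inside the element.
as_bytes = etree.tostring(elemZero, encoding="UTF-8", method="text")    # bytes
as_text = etree.tostring(elemZero, encoding="unicode", method="text")   # str

# Passing the whole list instead of one element is rejected.
try:
    etree.tostring(matches, encoding="unicode", method="text")
    list_rejected = False
except TypeError:
    list_rejected = True
```

If you only ever need a str, encoding="unicode" saves a decode step later, for example before writing the value to a CSV file.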
A simple example in EAD3:
# titleXPath is declared as a variable so you can switch to EAD 2002 if you want
# throws on a .strip() just in case your data is imperfect.
titleXPath = "/ead/control/filedesc/titlestmt/titleproper"
titleString = etree.tostring(tree.xpath(titleXPath)[0], encoding='UTF-8', method='text').strip()
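For a version you can run end to end, the sketch below parses a tiny made-up EAD3 fragment, pulls out the title, and writes it to CSV. Everything beyond titleXPath is an assumption for illustration: the sample XML, the firstText helper, and the in-memory buffer standing in for a real file. It also assumes the document declares no default namespace; if yours does, the un-prefixed path will match nothing and you will need a prefix plus a namespaces mapping.

```python
import csv
import io
from lxml import etree

# Hypothetical EAD3 fragment with no namespace declared.
EAD_SAMPLE = b"""<ead>
  <control>
    <filedesc>
      <titlestmt>
        <titleproper>  Jane Doe papers  </titleproper>
      </titlestmt>
    </filedesc>
  </control>
</ead>"""

def firstText(tree, path):
    # Hypothetical helper: text of the first match, or "" when nothing
    # matches (avoids an IndexError from [0] on an empty list).
    matches = tree.xpath(path)
    if not matches:
        return ""
    return etree.tostring(matches[0], encoding="unicode", method="text").strip()

tree = etree.fromstring(EAD_SAMPLE)
titleXPath = "/ead/control/filedesc/titlestmt/titleproper"
titleString = firstText(tree, titleXPath)

# Write a header row and the value; swap the buffer for an open file
# (opened with newline="") when writing to disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title"])
writer.writerow([titleString])
```

Returning an empty string for a missing element keeps the CSV columns aligned across records that lack a title.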