Convert an lxml.etree node tree into a dict.
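def elem2dict(node):
    """
    Convert an lxml.etree node tree into a dict.
    """
    d = {}
    for e in node.iterchildren():
        # Strip the '{uri}' namespace prefix that lxml puts on namespaced tags
        key = e.tag.split('}')[1] if '}' in e.tag else e.tag
        # Use the text when it is non-whitespace; otherwise recurse into the children
        value = e.text if e.text and e.text.strip() else elem2dict(e)
        d[key] = value
    return d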
@pjknowles

Here's a version that also adds element attributes to the dictionary and collects repeated child tags into a list.

def elem2dict(node, attributes=True):
    """
    Convert an lxml.etree node tree into a dict.
    """
    result = {}
    if attributes:
        # Copy the element's attributes in as plain key/value pairs
        result.update(node.attrib.items())

    for element in node.iterchildren():
        # Strip the '{uri}' namespace prefix from the tag
        key = element.tag.split('}')[1] if '}' in element.tag else element.tag

        # Use the text if it is non-whitespace; otherwise recurse into the children
        if element.text and element.text.strip():
            value = element.text
        else:
            # Pass the flag down so attributes=False applies to the whole tree
            value = elem2dict(element, attributes)
        if key in result:
            # Repeated keys are collected into a list
            if isinstance(result[key], list):
                result[key].append(value)
            else:
                result[key] = [result[key], value]
        else:
            result[key] = value
    return result

tyler-8 commented Dec 28, 2022

Version with key parsing done using lxml.etree's QName class

from lxml import etree


def elem2dict(node, attributes=True):
    """
    Convert an lxml.etree node tree into a dict.
    """
    result = {}
    if attributes:
        # Copy the element's attributes in as plain key/value pairs
        result.update(node.attrib.items())

    for element in node.iterchildren():
        # Take the tag name without its namespace
        key = etree.QName(element).localname

        # Use the text if it is non-whitespace; otherwise recurse into the children
        if element.text and element.text.strip():
            value = element.text
        else:
            # Pass the flag down so attributes=False applies to the whole tree
            value = elem2dict(element, attributes)
        if key in result:
            # Repeated keys are collected into a list
            if isinstance(result[key], list):
                result[key].append(value)
            else:
                result[key] = [result[key], value]
        else:
            result[key] = value
    return result
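
For anyone wondering what this produces, here is a small usage sketch. The sample XML, its namespace URI, and the element names are made up for illustration, and the function is a condensed restatement of the version above so that the snippet runs on its own.

```python
from lxml import etree


def elem2dict(node, attributes=True):
    """Condensed restatement of the version above, so this snippet is standalone."""
    result = dict(node.attrib.items()) if attributes else {}
    for element in node.iterchildren():
        key = etree.QName(element).localname
        if element.text and element.text.strip():
            value = element.text
        else:
            value = elem2dict(element, attributes)
        if key in result:
            # Repeated tags end up in a list
            if isinstance(result[key], list):
                result[key].append(value)
            else:
                result[key] = [result[key], value]
        else:
            result[key] = value
    return result


# Hypothetical sample document: default namespace, attributes, repeated <item> tags
xml = (
    b'<catalog xmlns="http://example.com/ns" version="2">'
    b'<title>Books</title>'
    b'<item id="1"><name>First</name></item>'
    b'<item id="2"><name>Second</name></item>'
    b'</catalog>'
)
root = etree.fromstring(xml)
print(elem2dict(root))
# {'version': '2', 'title': 'Books',
#  'item': [{'id': '1', 'name': 'First'}, {'id': '2', 'name': 'Second'}]}
```

Note that attributes and child tags share one dictionary, so an attribute and a child with the same name would be merged into a list as well.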

@fedorovmn

The function doesn't include the root tag in the dict, so if you need it, you have to handle the root tag yourself outside the recursion.
For example, <root><a>test1</a><b>test2</b></root> will be converted to {'a': 'test1', 'b': 'test2'}, not {'root': {'a': 'test1', 'b': 'test2'}}.
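
One possible way to get the root tag in as well is a thin wrapper around the recursive function. This is only a sketch: `root2dict` is a hypothetical helper name, and the `elem2dict` here is a minimal text-only variant included so that the snippet runs on its own.

```python
from lxml import etree


def elem2dict(node):
    """Minimal variant of the gist's function (no attributes, no repeated tags)."""
    d = {}
    for e in node.iterchildren():
        key = etree.QName(e).localname
        d[key] = e.text if e.text and e.text.strip() else elem2dict(e)
    return d


def root2dict(root):
    """Hypothetical helper: nest the converted children under the root's tag name."""
    return {etree.QName(root).localname: elem2dict(root)}


root = etree.fromstring("<root><a>test1</a><b>test2</b></root>")
print(elem2dict(root))  # {'a': 'test1', 'b': 'test2'}
print(root2dict(root))  # {'root': {'a': 'test1', 'b': 'test2'}}
```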
