Create a gist now

Instantly share code, notes, and snippets.

@nvie /generic.py Secret
Created Oct 3, 2014

What would you like to do?
Complete solution to the problem posed in this blog post: http://nvie.com/posts/modifying-deeply-nested-structures/
def traverse(obj, path=None, callback=None):
"""
Traverse an arbitrary Python object structure (limited to JSON data
types), calling a callback function for every element in the structure,
and inserting the return value of the callback as the new value.
"""
if path is None:
path = []
if isinstance(obj, dict):
value = {k: traverse(v, path + [k], callback)
for k, v in obj.items()}
elif isinstance(obj, list):
value = [traverse(elem, path + [[]], callback)
for elem in obj]
else:
value = obj
if callback is None:
return value
else:
return callback(path, value)
def traverse_modify(obj, target_path, action):
"""
Traverses an arbitrary object structure and where the path matches,
performs the given action on the value, replacing the node with the
action's return value.
"""
target_path = to_path(target_path)
def transformer(path, value):
if path == target_path:
return action(value)
else:
return value
return traverse(obj, callback=transformer)
def to_path(path):
"""
Helper function, converting path strings into path lists.
>>> to_path('foo')
['foo']
>>> to_path('foo.bar')
['foo', 'bar']
>>> to_path('foo.bar[]')
['foo', 'bar', []]
"""
if isinstance(path, list):
return path # already in list format
def _iter_path(path):
for parts in path.split('[]'):
for part in parts.strip('.').split('.'):
yield part
yield []
return list(_iter_path(path))[:-1]
from operator import itemgetter
from generic import traverse_modify
d = {
"timestamp": 1412282459,
"res": [
{
"group": "1",
"catlist": [
{
"cat": "1",
"start": "none",
"stop": "none",
"points": [
{"point": "1", "start": "13.00", "stop": "13.35"},
{"point": "2", "start": "11.00", "stop": "14.35"}
]
}
]
}
]
}
def sort_points(points):
"""Will sort a list of points."""
return sorted(points, reverse=True, key=itemgetter('stop'))
print(traverse_modify(d, 'res[].catlist[].points', sort_points))

Very elegant solution Vincent! There's one question that still lingers in my mind and would want your opinion on: how would you refactor the traverse function to stop the traversal as soon as you found a specified obj?

Owner
nvie commented Jun 18, 2015

Sorry for not seeing your question earlier, @TheFifthFreedom — I don't seem to get notifications of this :)

Perhaps the best way to modify this would be to wrap the outer traverse() call into a try-catch block, and catch a specific type of exception the callback function is allowed to throw. Think StopTraversal. It would be a bit akin to Python's native StopIteration exception, to terminate generators.

Thanks for the response Vincent! :)

As an alternative, a generator based approach yielding (path, value) tuples can be used:

def walk(obj, parent_first=True):

    # Top down?
    if parent_first:
        yield (), obj

    # For nested objects, the key is the path component.
    if isinstance(obj, dict):
        children = obj.items()

    # For nested lists, the position is the path component.
    elif isinstance(obj, (list, tuple)):
        children = enumerate(obj)

    # Scalar values have no children.
    else:
        children = []

    # Recurse into children
    for key, value in children:
        for child_path, child in walk(value, parent_first):
            yield (key,) + child_path, child

    # Bottom up?
    if not parent_first:
        yield (), obj

This allows both modification of the "current object" (if parent_first=True an object can be modified before it is traversed), and allows arbitrary "stop conditions" by just breaking from the loop.

Example that operates on any nested object that contains a key called "foo", removes a key "bar" from it (if present), and will stop after the first match:

for path, value in walk(obj):
    if isinstance(value, dict) and "foo" in value:
         value.pop("bar', None)
         break

Since path contains the full path to the current object, this can also be used, e.g. to limit any operations to specific "subtrees" of the object. To find all string values occurring anywhere inside a ['toplevel']['sublevel'] nested dictionary:

obj = {
    "toplevel": {
        "sublevel": {
            "key1": "match 1",
            "key2": ["match 2", "match 3"],
        },
    },
    "too bad": "no match",
}
for path, value in walk(obj):
    if path[0:2] == ('toplevel', 'sublevel') and isinstance(value, str):
        print(value)

This prints:

match 1
match 2
match 3
Owner
nvie commented Oct 27, 2015

Nice, I like it. As a little extra: after having written this blog post, I have come across "remap" (part of the excellent boltons library) which does exactly what I tried to describe in the blog post. It even has a few extra clever constructs in there. You should read this blog post for a good intro to it.

I recently stumbled upon your post and just wanted to thank you! I deal with some very interesting json structures regularly nowadays (http://www.hl7.org/fhir/bundle-transaction.json.html) and was initially at a loss as to how to tackle this with a generalized pattern (formerly lived in flat data land). I was just curious as to your recommendation for the best way to extend this to search for values intelligently, potentially apply different functions to the extracted values individually, and then return them in a new structure like this:

return {
            "field_1" : value_1,
            "field_2" : value_2,
             etc.      
            }

HI @Erstwild

I have a similar json structure to yours and I am looking for way to be able to do the CRUD task (Mainly Create, Update, Delete,)
What solution did you go with? I am going to try both remap and as well as nvie's solution and see if I can figure it out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment