Skip to content

Instantly share code, notes, and snippets.

@nvie

nvie/generic.py Secret

Created October 3, 2014 08:14
Show Gist options
  • Save nvie/f304caf3b4f1ca4c3884 to your computer and use it in GitHub Desktop.
Save nvie/f304caf3b4f1ca4c3884 to your computer and use it in GitHub Desktop.
Complete solution to the problem posed in this blog post: http://nvie.com/posts/modifying-deeply-nested-structures/
def traverse(obj, path=None, callback=None):
"""
Traverse an arbitrary Python object structure (limited to JSON data
types), calling a callback function for every element in the structure,
and inserting the return value of the callback as the new value.
"""
if path is None:
path = []
if isinstance(obj, dict):
value = {k: traverse(v, path + [k], callback)
for k, v in obj.items()}
elif isinstance(obj, list):
value = [traverse(elem, path + [[]], callback)
for elem in obj]
else:
value = obj
if callback is None:
return value
else:
return callback(path, value)
def traverse_modify(obj, target_path, action):
"""
Traverses an arbitrary object structure and where the path matches,
performs the given action on the value, replacing the node with the
action's return value.
"""
target_path = to_path(target_path)
def transformer(path, value):
if path == target_path:
return action(value)
else:
return value
return traverse(obj, callback=transformer)
def to_path(path):
"""
Helper function, converting path strings into path lists.
>>> to_path('foo')
['foo']
>>> to_path('foo.bar')
['foo', 'bar']
>>> to_path('foo.bar[]')
['foo', 'bar', []]
"""
if isinstance(path, list):
return path # already in list format
def _iter_path(path):
for parts in path.split('[]'):
for part in parts.strip('.').split('.'):
yield part
yield []
return list(_iter_path(path))[:-1]
from operator import itemgetter
from generic import traverse_modify
d = {
"timestamp": 1412282459,
"res": [
{
"group": "1",
"catlist": [
{
"cat": "1",
"start": "none",
"stop": "none",
"points": [
{"point": "1", "start": "13.00", "stop": "13.35"},
{"point": "2", "start": "11.00", "stop": "14.35"}
]
}
]
}
]
}
def sort_points(points):
"""Will sort a list of points."""
return sorted(points, reverse=True, key=itemgetter('stop'))
print(traverse_modify(d, 'res[].catlist[].points', sort_points))
@TheFifthFreedom
Copy link

Very elegant solution Vincent! There's one question that still lingers in my mind and would want your opinion on: how would you refactor the traverse function to stop the traversal as soon as you found a specified obj?

@nvie
Copy link
Author

nvie commented Jun 18, 2015

Sorry for not seeing your question earlier, @TheFifthFreedom — I don't seem to get notifications of this :)

Perhaps the best way to modify this would be to wrap the outer traverse() call into a try-catch block, and catch a specific type of exception the callback function is allowed to throw. Think StopTraversal. It would be a bit akin to Python's native StopIteration exception, to terminate generators.

@TheFifthFreedom
Copy link

Thanks for the response Vincent! :)

@wbolster
Copy link

As an alternative, a generator based approach yielding (path, value) tuples can be used:

def walk(obj, parent_first=True):

    # Top down?
    if parent_first:
        yield (), obj

    # For nested objects, the key is the path component.
    if isinstance(obj, dict):
        children = obj.items()

    # For nested lists, the position is the path component.
    elif isinstance(obj, (list, tuple)):
        children = enumerate(obj)

    # Scalar values have no children.
    else:
        children = []

    # Recurse into children
    for key, value in children:
        for child_path, child in walk(value, parent_first):
            yield (key,) + child_path, child

    # Bottom up?
    if not parent_first:
        yield (), obj

This allows both modification of the "current object" (if parent_first=True an object can be modified before it is traversed), and allows arbitrary "stop conditions" by just breaking from the loop.

Example that operates on any nested object that contains a key called "foo", removes a key "bar" from it (if present), and will stop after the first match:

for path, value in walk(obj):
    if isinstance(value, dict) and "foo" in value:
         value.pop("bar', None)
         break

Since path contains the full path to the current object, this can also be used, e.g. to limit any operations to specific "subtrees" of the object. To find all string values occurring anywhere inside a ['toplevel']['sublevel'] nested dictionary:

obj = {
    "toplevel": {
        "sublevel": {
            "key1": "match 1",
            "key2": ["match 2", "match 3"],
        },
    },
    "too bad": "no match",
}
for path, value in walk(obj):
    if path[0:2] == ('toplevel', 'sublevel') and isinstance(value, str):
        print(value)

This prints:

match 1
match 2
match 3

@nvie
Copy link
Author

nvie commented Oct 27, 2015

Nice, I like it. As a little extra: after having written this blog post, I have come across "remap" (part of the excellent boltons library) which does exactly what I tried to describe in the blog post. It even has a few extra clever constructs in there. You should read this blog post for a good intro to it.

@smrtslckr
Copy link

I recently stumbled upon your post and just wanted to thank you! I deal with some very interesting json structures regularly nowadays (http://www.hl7.org/fhir/bundle-transaction.json.html) and was initially at a loss as to how to tackle this with a generalized pattern (formerly lived in flat data land). I was just curious as to your recommendation for the best way to extend this to search for values intelligently, potentially apply different functions to the extracted values individually, and then return them in a new structure like this:

return {
            "field_1" : value_1,
            "field_2" : value_2,
             etc.      
            }

@dariushazimi
Copy link

HI @Erstwild

I have a similar json structure to yours and I am looking for way to be able to do the CRUD task (Mainly Create, Update, Delete,)
What solution did you go with? I am going to try both remap and as well as nvie's solution and see if I can figure it out.

@bpandola
Copy link

bpandola commented Feb 2, 2018

Just wanted to say Thanks for the very interesting and useful blog post and gist!

@rdavey
Copy link

rdavey commented Jun 20, 2018

Nice code!

How would one go about using this code to modify each value in the dict based on a lookup of the key and return the new dict?

@peterwwillis
Copy link

@nvie I have a small bug fix here: https://gist.github.com/peterwwillis/3b14de8d2c7e9f6ce8899266d8aeea6d when the data structure is a list and its first element is a dict, it can insert an empty string at the beginning of the target_path which I couldn't match against, so I added a continue on empty parts of the path

@metinsenturk
Copy link

This gist is super awesome !

I was looking into ways of nested search and was even started a few tests on how I could make it into a package! I think, getting a value from a dict with dot syntax, like {}.get('animals.cats') can make our lives super easier.

I will work on my version of this idea and practice this gist a lot. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment