def traverse(obj, path=None, callback=None):
    """
    Traverse an arbitrary Python object structure (limited to JSON data
    types), calling a callback function for every element in the structure,
    and inserting the return value of the callback as the new value.
    """
    if path is None:
        path = []

    if isinstance(obj, dict):
        value = {k: traverse(v, path + [k], callback)
                 for k, v in obj.items()}
    elif isinstance(obj, list):
        # List items are addressed by a [] placeholder, not by their index
        value = [traverse(elem, path + [[]], callback)
                 for elem in obj]
    else:
        value = obj

    if callback is None:
        return value
    else:
        return callback(path, value)


def traverse_modify(obj, target_path, action):
    """
    Traverses an arbitrary object structure and where the path matches,
    performs the given action on the value, replacing the node with the
    action's return value.
    """
    target_path = to_path(target_path)

    def transformer(path, value):
        if path == target_path:
            return action(value)
        else:
            return value

    return traverse(obj, callback=transformer)


def to_path(path):
    """
    Helper function, converting path strings into path lists.
        >>> to_path('foo')
        ['foo']
        >>> to_path('foo.bar')
        ['foo', 'bar']
        >>> to_path('foo.bar[]')
        ['foo', 'bar', []]
    """
    if isinstance(path, list):
        return path  # already in list format

    def _iter_path(path):
        for parts in path.split('[]'):
            for part in parts.strip('.').split('.'):
                # Skip the empty strings that split() produces when the
                # path starts or ends with '[]'
                if part:
                    yield part
            yield []

    return list(_iter_path(path))[:-1]

from operator import itemgetter
from generic import traverse_modify

d = {
    "timestamp": 1412282459,
    "res": [
        {
            "group": "1",
            "catlist": [
                {
                    "cat": "1",
                    "start": "none",
                    "stop": "none",
                    "points": [
                        {"point": "1", "start": "13.00", "stop": "13.35"},
                        {"point": "2", "start": "11.00", "stop": "14.35"}
                    ]
                }
            ]
        }
    ]
}


def sort_points(points):
    """Will sort a list of points."""
    return sorted(points, reverse=True, key=itemgetter('stop'))


print(traverse_modify(d, 'res[].catlist[].points', sort_points))

Nice, I like it. As a little extra: after having written this blog post, I have come across "remap" (part of the excellent boltons library) which does exactly what I tried to describe in the blog post. It even has a few extra clever constructs in there. You should read this blog post for a good intro to it.
I recently stumbled upon your post and just wanted to thank you! I deal with some very interesting JSON structures regularly nowadays (http://www.hl7.org/fhir/bundle-transaction.json.html) and was initially at a loss as to how to tackle this with a generalized pattern (I formerly lived in flat data land). I was curious what you would recommend as the best way to extend this to search for values intelligently, potentially apply different functions to the extracted values individually, and then return them in a new structure like this:
return {
    "field_1": value_1,
    "field_2": value_2,
    # etc.
}
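One possible way to approach this question, as a minimal sketch rather than a recommendation from the gist's author: reuse traverse() with a callback that collects matching values instead of replacing them. The extract() helper, the rules format, and the sample data below are all hypothetical and only serve as illustration. traverse() is repeated so the snippet runs on its own.

```python
def traverse(obj, path=None, callback=None):
    """Copy of the gist's traverse(), repeated so this sketch is standalone."""
    if path is None:
        path = []
    if isinstance(obj, dict):
        value = {k: traverse(v, path + [k], callback) for k, v in obj.items()}
    elif isinstance(obj, list):
        value = [traverse(elem, path + [[]], callback) for elem in obj]
    else:
        value = obj
    if callback is None:
        return value
    return callback(path, value)


def extract(obj, rules):
    """
    Hypothetical helper. rules maps an output field name to a
    (path, function) pair; every value found at that path is passed
    through the function and collected in a list under the field name.
    """
    result = {field: [] for field in rules}

    def collector(path, value):
        for field, (target_path, func) in rules.items():
            if path == target_path:
                result[field].append(func(value))
        return value  # leave the traversed structure unchanged

    traverse(obj, callback=collector)
    return result


# Made-up sample data, not a real FHIR bundle
bundle = {
    "entry": [
        {"resource": {"id": "a1", "status": "final"}},
        {"resource": {"id": "b2", "status": "draft"}},
    ]
}

rules = {
    "ids": (["entry", [], "resource", "id"], str.upper),
    "statuses": (["entry", [], "resource", "status"], len),
}

print(extract(bundle, rules))  # {'ids': ['A1', 'B2'], 'statuses': [5, 5]}
```

Each field collects a list because a path containing [] can match more than one value.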
Hi @Erstwild,
I have a similar JSON structure to yours and I am looking for a way to do CRUD tasks (mainly create, update, and delete).
What solution did you go with? I am going to try both remap and nvie's solution and see if I can figure it out.
Just wanted to say Thanks for the very interesting and useful blog post and gist!
Nice code!
How would one go about using this code to modify each value in the dict based on a lookup of the key and return the new dict?
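A possible answer, sketched under assumptions: since the callback receives the full path, its last element is the dict key of the current value, and that key can be looked up in a table of functions. The converters table, convert_by_key(), and the sample data are hypothetical names for illustration. traverse() is repeated so the snippet runs on its own.

```python
def traverse(obj, path=None, callback=None):
    """Copy of the gist's traverse(), repeated so this sketch is standalone."""
    if path is None:
        path = []
    if isinstance(obj, dict):
        value = {k: traverse(v, path + [k], callback) for k, v in obj.items()}
    elif isinstance(obj, list):
        value = [traverse(elem, path + [[]], callback) for elem in obj]
    else:
        value = obj
    if callback is None:
        return value
    return callback(path, value)


# Hypothetical lookup table: key name -> function applied to that key's value
converters = {
    "point": int,
    "start": float,
    "stop": float,
}


def convert_by_key(path, value):
    # The last path element is the dict key, or [] for a list item
    key = path[-1] if path else None
    is_leaf = not isinstance(value, (dict, list))
    if isinstance(key, str) and key in converters and is_leaf:
        return converters[key](value)
    return value


data = {"points": [{"point": "1", "start": "13.00", "stop": "13.35"}]}
new_data = traverse(data, callback=convert_by_key)

print(new_data)  # {'points': [{'point': 1, 'start': 13.0, 'stop': 13.35}]}
```

traverse() builds new dicts and lists, so the input is left untouched and the converted structure is returned as a new dict.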
@nvie I have a small bug fix here: https://gist.github.com/peterwwillis/3b14de8d2c7e9f6ce8899266d8aeea6d. When the data structure is a list and its first element is a dict, it can insert an empty string at the beginning of the target_path, which I couldn't match against, so I added a continue on empty parts of the path.
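The empty string comes from how str.split behaves when a path starts (or ends) with []. The sketch below reproduces the effect and shows the kind of guard described in this comment. It is not a copy of the linked gist; split_naive() and split_skipping_empty() are hypothetical names for illustration.

```python
def split_naive(path):
    """Path splitting without any check for empty parts."""
    def _iter_path(path):
        for parts in path.split('[]'):
            for part in parts.strip('.').split('.'):
                yield part
            yield []
    return list(_iter_path(path))[:-1]


def split_skipping_empty(path):
    """Same logic, but empty parts are skipped with a continue."""
    def _iter_path(path):
        for parts in path.split('[]'):
            for part in parts.strip('.').split('.'):
                if not part:
                    continue  # '' appears when the path starts or ends with '[]'
                yield part
            yield []
    return list(_iter_path(path))[:-1]


print(split_naive('[].name'))           # ['', [], 'name']
print(split_skipping_empty('[].name'))  # [[], 'name']
```

For a top-level list of dicts, traverse() reports the path of each name value as [[], 'name'], so only the second variant produces a target path that can match.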
This gist is super awesome!
I was looking into ways of doing nested search and had even started a few tests on how I could make it into a package! I think getting a value from a dict with dot syntax, like {}.get('animals.cats'), can make our lives much easier.
I will work on my version of this idea and practice with this gist a lot. Thanks!
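A minimal sketch of what such a dot-syntax lookup could look like, assuming plain nested dicts and keys that do not themselves contain dots. get_path() and the sample data are hypothetical, not part of the gist.

```python
def get_path(data, dotted_path, default=None):
    """Hypothetical helper: look up a key path like 'a.b.c' in nested dicts."""
    current = data
    for key in dotted_path.split('.'):
        # Stop early if we hit a non-dict or a missing key
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


zoo = {"animals": {"cats": ["lion", "tiger"], "dogs": []}}

print(get_path(zoo, "animals.cats"))       # ['lion', 'tiger']
print(get_path(zoo, "animals.birds", []))  # []
```

Returning a default on a missing key mirrors dict.get(), so callers do not need to wrap nested lookups in try/except.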
As an alternative, a generator-based approach yielding (path, value) tuples can be used. This allows both modification of the "current object" (if parent_first=True, an object can be modified before it is traversed) and arbitrary "stop conditions" by just breaking from the loop.

One example is a loop that operates on any nested object that contains a key called "foo", removes a key "bar" from it (if present), and stops after the first match.
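The commenter's own code is not preserved here, so the following is only a sketch of how such a generator might look. The function name walk(), the use of tuples for paths, list indices as path elements, and the sample data are all assumptions. Only the parent_first parameter name is taken from the comment.

```python
def walk(obj, path=(), parent_first=True):
    """
    Yield (path, value) tuples for obj and everything nested inside it.
    With parent_first=True a container is yielded before its children,
    so the caller can modify it before it is traversed.
    """
    if parent_first:
        yield path, obj
    if isinstance(obj, dict):
        # list(...) snapshots the items after the caller's modifications
        for key, value in list(obj.items()):
            yield from walk(value, path + (key,), parent_first)
    elif isinstance(obj, list):
        for index, value in enumerate(list(obj)):
            yield from walk(value, path + (index,), parent_first)
    if not parent_first:
        yield path, obj


data = {
    "a": {"foo": 1, "bar": {"foo": 2, "bar": 3}},
    "b": {"foo": 4, "bar": 5},
}

# Remove "bar" from the first dict that has a "foo" key, then stop
for path, value in walk(data):
    if isinstance(value, dict) and "foo" in value:
        value.pop("bar", None)
        break

print(data)  # {'a': {'foo': 1}, 'b': {'foo': 4, 'bar': 5}}
```

Because the generator is lazy, breaking out of the loop means the rest of the structure is never visited.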
Since path contains the full path to the current object, this can also be used, e.g., to limit any operations to specific "subtrees" of the object, such as finding all string values occurring anywhere inside a ['toplevel']['sublevel'] nested dictionary.
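A sketch of the subtree search just described, under the same assumptions as before: walk() is a hypothetical parent-first generator that yields tuple paths, and the sample data is made up. The original code and its output are not preserved in this comment.

```python
def walk(obj, path=()):
    """Yield (path, value) for obj and all nested values, parents first."""
    yield path, obj
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from walk(value, path + (key,))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from walk(value, path + (index,))


data = {
    "toplevel": {
        "sublevel": {"name": "x", "items": ["y", 1, {"deep": "z"}]},
        "other": "ignored",
    },
    "elsewhere": "also ignored",
}

prefix = ("toplevel", "sublevel")

# Keep only string values whose path starts with the chosen prefix
found = [
    (path, value)
    for path, value in walk(data)
    if path[:len(prefix)] == prefix and isinstance(value, str)
]

for path, value in found:
    print(path, value)
```

Comparing a slice of the path against the prefix keeps the filter independent of how deep the matching values are nested.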