@mahmoud /remerge.py
Last active Apr 11, 2017

Recursively merging dictionaries with boltons.iterutils.remap. Useful for @hynek's configs. https://twitter.com/hynek/status/696720593002041345
"""
This is an extension of the technique first detailed here:
http://sedimental.org/remap.html#add_common_keys
In short, it calls remap on each container, back to front, using the accumulating
previous values as the default for the current iteration.
"""
from boltons.iterutils import remap, get_path, default_enter, default_visit


defaults = {'host': '127.0.0.1',
            'port': 8000,
            'endpoints': {'persistence': {'host': '127.0.0.1',
                                          'port': 8888},
                          'cache': {'host': '127.0.0.1',
                                    'port': 8889}},
            'owners': {'secondary': ['alice']},
            'zones': [{'a': 1}],
            'notes': ['this is the default']}

overlay = {'host': '127.0.0.1',
           'port': 8080,
           'endpoints': {'persistence': {'host': '10.2.2.2',
                                         'port': 5433}},
           'overlay_version': '5.0',
           'owners': {'primary': ['bob'], 'secondary': ['charles']},
           'zones': [{'a': 2}],
           'notes': ['this is the overlay']}

cache_host_override = {'endpoints': {'cache': {'host': '127.0.0.2'}}}


def remerge(target_list, sourced=False):
    """Takes a list of containers (e.g., dicts) and merges them using
    boltons.iterutils.remap. Containers later in the list take
    precedence (last-wins).

    By default, returns a new, merged top-level container. With the
    *sourced* option, `remerge` expects a list of (*name*, *container*)
    pairs, and will return a source map: a dictionary mapping between
    path and the name of the container it came from.
    """

    if not sourced:
        target_list = [(id(t), t) for t in target_list]

    ret = None
    source_map = {}

    def remerge_enter(path, key, value):
        new_parent, new_items = default_enter(path, key, value)
        if ret and not path and key is None:
            new_parent = ret
        try:
            cur_val = get_path(ret, path + (key,))
        except KeyError:
            pass
        else:
            # TODO: type check?
            new_parent = cur_val

        if isinstance(value, list):
            # lists are purely additive. See https://github.com/mahmoud/boltons/issues/81
            new_parent.extend(value)
            new_items = []

        return new_parent, new_items

    for t_name, target in target_list:
        if sourced:
            def remerge_visit(path, key, value):
                source_map[path + (key,)] = t_name
                return True
        else:
            remerge_visit = default_visit

        ret = remap(target, enter=remerge_enter, visit=remerge_visit)

    if not sourced:
        return ret
    return ret, source_map


def main():
    from pprint import pprint

    merged, source_map = remerge([('defaults', defaults),
                                  ('overlay', overlay),
                                  ('cache_host_override', cache_host_override)],
                                 sourced=True)

    assert merged['host'] == '127.0.0.1'
    assert merged['port'] == 8080
    assert merged['endpoints']['persistence']['host'] == '10.2.2.2'
    assert merged['endpoints']['persistence']['port'] == 5433
    assert merged['endpoints']['cache']['host'] == '127.0.0.2'
    assert merged['endpoints']['cache']['port'] == 8889
    assert merged['overlay_version'] == '5.0'

    pprint(merged)
    print('')  # blank separator line; a bare `print` is a no-op on Python 3
    pprint(source_map)
    print(len(source_map), 'paths')


if __name__ == '__main__':
    main()

{'endpoints': {'cache': {'host': '127.0.0.2', 'port': 8889},
               'persistence': {'host': '10.2.2.2', 'port': 5433}},
 'host': '127.0.0.1',
 'notes': ['this is the default', 'this is the overlay'],
 'overlay_version': '5.0',
 'owners': {'primary': ['bob'], 'secondary': ['alice', 'charles']},
 'port': 8080,
 'zones': [{'a': 1}, {'a': 2}]}

{('endpoints',): 'cache_host_override',
 ('endpoints', 'cache'): 'cache_host_override',
 ('endpoints', 'cache', 'host'): 'cache_host_override',
 ('endpoints', 'cache', 'port'): 'defaults',
 ('endpoints', 'persistence'): 'overlay',
 ('endpoints', 'persistence', 'host'): 'overlay',
 ('endpoints', 'persistence', 'port'): 'overlay',
 ('host',): 'overlay',
 ('notes',): 'overlay',
 ('overlay_version',): 'overlay',
 ('owners',): 'overlay',
 ('owners', 'primary'): 'overlay',
 ('owners', 'secondary'): 'overlay',
 ('port',): 'overlay',
 ('zones',): 'overlay'}
(15, 'paths')
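
For readers who want to see the merge rules without remap, here is a minimal plain-Python sketch of the same semantics as read off the output above: nested dicts merge key by key, lists are additive, and everything else is last-wins. The names `merge_layers` and `_merge_into` are hypothetical, not part of boltons or this gist, and the sketch does not build a source map. Its handling of mismatched types (say, a dict overlaid on a scalar) is an assumption and may differ from `remerge`.

```python
import copy


def merge_layers(layers):
    """Merge a list of dicts; later layers take precedence (last-wins)."""
    merged = {}
    for layer in layers:
        _merge_into(merged, layer)
    return merged


def _merge_into(base, overlay):
    """Recursively fold *overlay* into *base*, mutating *base* in place."""
    for key, value in overlay.items():
        current = base.get(key)
        if isinstance(value, dict) and isinstance(current, dict):
            # Nested dicts are merged key by key.
            _merge_into(current, value)
        elif isinstance(value, list) and isinstance(current, list):
            # Lists are purely additive: overlay items are appended.
            current.extend(copy.deepcopy(value))
        else:
            # Scalars, new keys, and mismatched types: the later layer wins.
            # Deep-copying keeps the input layers untouched.
            base[key] = copy.deepcopy(value)


# Trimmed-down versions of the gist's example data.
defaults = {'port': 8000,
            'endpoints': {'cache': {'host': '127.0.0.1', 'port': 8889}},
            'owners': {'secondary': ['alice']},
            'zones': [{'a': 1}]}
overlay = {'port': 8080,
           'endpoints': {'cache': {'host': '127.0.0.2'}},
           'owners': {'primary': ['bob'], 'secondary': ['charles']},
           'zones': [{'a': 2}]}

merged = merge_layers([defaults, overlay])
```

Note that a list of dicts such as `zones` is appended to, not merged element by element, which matches the `'zones': [{'a': 1}, {'a': 2}]` line in the output above.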

pleasantone commented Jun 15, 2016 edited

There is a bug (limitation) in this implementation: it will not work for lists inside your configuration (Python 3.5, boltons 16.4.1).

If you have a list of values, remap will create a circular reference:

for example, add

    'zones': [{'a': 1}],

to overlay

and you will get the following output:

{0: {'a': 1},  <-----
 'endpoints': {'cache': {'host': '127.0.0.2', 'port': 8889},
               'persistence': {'host': '10.2.2.2', 'port': 5433}},
 'host': '127.0.0.1',
 'overlay_version': '5.0',
 'port': 8080,
 'zones': <Recursion on dict with id=140400054909704>} <-----
Owner

mahmoud commented Jun 15, 2016

@pleasantone, that is very strange. I updated the gist with the case I tried and the behavior I see. Really not sure how this could have happened.

Owner

mahmoud commented Jun 15, 2016

Oh wait, I see one difference. On Twitter, we'd said a dict with a list in it, but here we have a list with a dict in it. OK, let me try that.

Owner

mahmoud commented Jun 15, 2016

(I got the reproduction, will debug this evening.)

Owner

mahmoud commented Jun 15, 2016

Aaaand fixed. The description, process, etc. is all here in Boltons issue #81.
