@mahmoud /remerge.py
Last active Apr 11, 2017

Recursively merging dictionaries with boltons.iterutils.remap. Useful for @hynek's configs. https://twitter.com/hynek/status/696720593002041345
"""
This is an extension of the technique first detailed here:

http://sedimental.org/remap.html#add_common_keys

In short, it calls remap on each container, front to back, using the
accumulated values from the previous containers as the defaults for the
current iteration.
"""
from __future__ import print_function

from boltons.iterutils import remap, get_path, default_enter, default_visit


defaults = {'host': '127.0.0.1',
            'port': 8000,
            'endpoints': {'persistence': {'host': '127.0.0.1',
                                          'port': 8888},
                          'cache': {'host': '127.0.0.1',
                                    'port': 8889}},
            'owners': {'secondary': ['alice']},
            'zones': [{'a': 1}],
            'notes': ['this is the default']}

overlay = {'host': '127.0.0.1',
           'port': 8080,
           'endpoints': {'persistence': {'host': '10.2.2.2',
                                         'port': 5433}},
           'overlay_version': '5.0',
           'owners': {'primary': ['bob'], 'secondary': ['charles']},
           'zones': [{'a': 2}],
           'notes': ['this is the overlay']}

cache_host_override = {'endpoints': {'cache': {'host': '127.0.0.2'}}}


def remerge(target_list, sourced=False):
    """Takes a list of containers (e.g., dicts) and merges them using
    boltons.iterutils.remap. Containers later in the list take
    precedence (last-wins).

    By default, returns a new, merged top-level container. With the
    *sourced* option, `remerge` expects a list of (*name*, *container*)
    pairs, and will return a source map: a dictionary mapping between
    path and the name of the container it came from.
    """
    if not sourced:
        target_list = [(id(t), t) for t in target_list]

    ret = None
    source_map = {}

    def remerge_enter(path, key, value):
        new_parent, new_items = default_enter(path, key, value)
        if ret and not path and key is None:
            # top level: merge into the result accumulated so far
            new_parent = ret
        try:
            cur_val = get_path(ret, path + (key,))
        except KeyError:
            pass
        else:
            # TODO: type check?
            new_parent = cur_val

        if isinstance(value, list):
            # lists are purely additive. See https://github.com/mahmoud/boltons/issues/81
            new_parent.extend(value)
            new_items = []

        return new_parent, new_items

    for t_name, target in target_list:
        if sourced:
            def remerge_visit(path, key, value):
                # record which container last set this path
                source_map[path + (key,)] = t_name
                return True
        else:
            remerge_visit = default_visit

        ret = remap(target, enter=remerge_enter, visit=remerge_visit)

    if not sourced:
        return ret
    return ret, source_map


def main():
    from pprint import pprint

    merged, source_map = remerge([('defaults', defaults),
                                  ('overlay', overlay),
                                  ('cache_host_override', cache_host_override)],
                                 sourced=True)

    assert merged['host'] == '127.0.0.1'
    assert merged['port'] == 8080
    assert merged['endpoints']['persistence']['host'] == '10.2.2.2'
    assert merged['endpoints']['persistence']['port'] == 5433
    assert merged['endpoints']['cache']['host'] == '127.0.0.2'
    assert merged['endpoints']['cache']['port'] == 8889
    assert merged['overlay_version'] == '5.0'

    pprint(merged)
    print()
    pprint(source_map)
    print(len(source_map), 'paths')


if __name__ == '__main__':
    main()

{'endpoints': {'cache': {'host': '127.0.0.2', 'port': 8889},
               'persistence': {'host': '10.2.2.2', 'port': 5433}},
 'host': '127.0.0.1',
 'notes': ['this is the default', 'this is the overlay'],
 'overlay_version': '5.0',
 'owners': {'primary': ['bob'], 'secondary': ['alice', 'charles']},
 'port': 8080,
 'zones': [{'a': 1}, {'a': 2}]}

{('endpoints',): 'cache_host_override',
 ('endpoints', 'cache'): 'cache_host_override',
 ('endpoints', 'cache', 'host'): 'cache_host_override',
 ('endpoints', 'cache', 'port'): 'defaults',
 ('endpoints', 'persistence'): 'overlay',
 ('endpoints', 'persistence', 'host'): 'overlay',
 ('endpoints', 'persistence', 'port'): 'overlay',
 ('host',): 'overlay',
 ('notes',): 'overlay',
 ('overlay_version',): 'overlay',
 ('owners',): 'overlay',
 ('owners', 'primary'): 'overlay',
 ('owners', 'secondary'): 'overlay',
 ('port',): 'overlay',
 ('zones',): 'overlay'}
15 paths
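For comparison, the merge rules that `remerge` implements (dicts merge key by key, lists are purely additive, everything else is last-wins) can be sketched without boltons as a plain recursive function. `merge_with_sources` is a hypothetical helper written for illustration, not part of boltons or this gist. It also builds the same kind of path-to-name source map as `sourced=True`. One assumption differs from `remerge`: where a dict or list meets a value of a different type at the same key (the `TODO: type check?` case), the later value simply replaces the earlier one.

```python
import copy

# Same sample data as the gist above.
defaults = {'host': '127.0.0.1',
            'port': 8000,
            'endpoints': {'persistence': {'host': '127.0.0.1', 'port': 8888},
                          'cache': {'host': '127.0.0.1', 'port': 8889}},
            'owners': {'secondary': ['alice']},
            'zones': [{'a': 1}],
            'notes': ['this is the default']}
overlay = {'host': '127.0.0.1',
           'port': 8080,
           'endpoints': {'persistence': {'host': '10.2.2.2', 'port': 5433}},
           'overlay_version': '5.0',
           'owners': {'primary': ['bob'], 'secondary': ['charles']},
           'zones': [{'a': 2}],
           'notes': ['this is the overlay']}
cache_host_override = {'endpoints': {'cache': {'host': '127.0.0.2'}}}


def merge_with_sources(named_configs):
    """Hypothetical helper: merge (name, dict) pairs left to right (last wins).

    Returns (merged, source_map); source_map maps each path tuple to the
    name of the last config that touched it. Inputs are not modified.
    """
    merged, source_map = {}, {}

    def merge_into(dst, src, path, name):
        for key, value in src.items():
            key_path = path + (key,)
            if isinstance(value, dict):
                # Recurse into dicts; assumption: a non-dict already there is replaced.
                if not isinstance(dst.get(key), dict):
                    dst[key] = {}
                merge_into(dst[key], value, key_path, name)
            elif isinstance(value, list):
                # Lists are purely additive; items are copied so inputs stay untouched.
                if not isinstance(dst.get(key), list):
                    dst[key] = []
                dst[key].extend(copy.deepcopy(value))
            else:
                # Scalars: the later config wins.
                dst[key] = value
            source_map[key_path] = name

    for name, config in named_configs:
        merge_into(merged, config, (), name)
    return merged, source_map


merged, source_map = merge_with_sources([('defaults', defaults),
                                         ('overlay', overlay),
                                         ('cache_host_override', cache_host_override)])
print(merged['zones'])                           # [{'a': 1}, {'a': 2}]
print(source_map[('endpoints', 'cache', 'port')])  # defaults
print(len(source_map), 'paths')                  # 15 paths
```

The recursive version trades remap's hook-based traversal for brevity; very deeply nested configs could run into Python's recursion limit.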
@pleasantone commented Jun 15, 2016

There is a bug (limitation) in this implementation: it does not work for lists inside your configuration (Python 3.5, boltons 16.4.1).

If you have a list of values, remap will create a circular reference. For example, add

    'zones': [{'a': 1}],

to overlay, and you will get the following output:

{0: {'a': 1},  <-----
 'endpoints': {'cache': {'host': '127.0.0.2', 'port': 8889},
               'persistence': {'host': '10.2.2.2', 'port': 5433}},
 'host': '127.0.0.1',
 'overlay_version': '5.0',
 'port': 8080,
 'zones': <Recursion on dict with id=140400054909704>} <-----
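pprint prints `<Recursion on dict with id=...>` when a container appears inside itself, which is what happened to `'zones'` above. A small stdlib-only check can catch this kind of regression in a test before anything is printed. `find_cycle` is a hypothetical helper, not part of boltons, and it only walks dicts and lists:

```python
def find_cycle(obj, path=(), ancestors=None):
    """Hypothetical helper: return the path at which a dict/list contains
    one of its own ancestors (a circular reference), or None."""
    if ancestors is None:
        ancestors = set()
    if isinstance(obj, dict):
        children = list(obj.items())
    elif isinstance(obj, list):
        children = list(enumerate(obj))
    else:
        return None  # scalars cannot create a cycle
    if id(obj) in ancestors:
        return path  # obj is already open further up: circular reference
    ancestors.add(id(obj))
    for key, child in children:
        found = find_cycle(child, path + (key,), ancestors)
        if found is not None:
            return found
    ancestors.discard(id(obj))  # shared (non-circular) references are fine
    return None


# Rebuild the shape from the report: the top-level dict is its own 'zones' value.
broken = {0: {'a': 1}, 'host': '127.0.0.1'}
broken['zones'] = broken

print(find_cycle({'zones': [{'a': 1}, {'a': 2}]}))  # None
print(find_cycle(broken))                           # ('zones',)
```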
@mahmoud (owner) commented Jun 15, 2016

@pleasantone, that is very strange. I updated the gist with the case I tried and the behavior I see. Really not sure how this could have happened.

@mahmoud (owner) commented Jun 15, 2016

Oh wait, I see one difference. On Twitter, we'd said a dict with a list in it, but here we have a list with a dict in it. OK, let me try that.

@mahmoud (owner) commented Jun 15, 2016

(I got the reproduction, will debug this evening.)

@mahmoud (owner) commented Jun 15, 2016

Aaaand fixed. The description, process, etc. are all in boltons issue #81 (https://github.com/mahmoud/boltons/issues/81).
