Recursively merging dictionaries with boltons.iterutils.remap. Useful for @hynek's configs. https://twitter.com/hynek/status/696720593002041345
"""
This is an extension of the technique first detailed here:
http://sedimental.org/remap.html#add_common_keys
In short, it calls remap on each container, back to front, using the accumulating
previous values as the default for the current iteration.
"""
from boltons.iterutils import remap, get_path, default_enter, default_visit

defaults = {'host': '127.0.0.1',
            'port': 8000,
            'endpoints': {'persistence': {'host': '127.0.0.1',
                                          'port': 8888},
                          'cache': {'host': '127.0.0.1',
                                    'port': 8889}},
            'owners': {'secondary': ['alice']},
            'zones': [{'a': 1}],
            'notes': ['this is the default']}

overlay = {'host': '127.0.0.1',
           'port': 8080,
           'endpoints': {'persistence': {'host': '10.2.2.2',
                                         'port': 5433}},
           'overlay_version': '5.0',
           'owners': {'primary': ['bob'], 'secondary': ['charles']},
           'zones': [{'a': 2}],
           'notes': ['this is the overlay']}

cache_host_override = {'endpoints': {'cache': {'host': '127.0.0.2'}}}


def remerge(target_list, sourced=False):
    """Takes a list of containers (e.g., dicts) and merges them using
    boltons.iterutils.remap. Containers later in the list take
    precedence (last-wins).

    By default, returns a new, merged top-level container. With the
    *sourced* option, `remerge` expects a list of (*name*, *container*)
    pairs, and will return a source map: a dictionary mapping between
    path and the name of the container it came from.
    """
    if not sourced:
        # unnamed containers get their id() as a placeholder name
        target_list = [(id(t), t) for t in target_list]

    ret = None
    source_map = {}

    def remerge_enter(path, key, value):
        new_parent, new_items = default_enter(path, key, value)
        if ret and not path and key is None:
            # at the root: continue building on the result so far
            new_parent = ret
        try:
            cur_val = get_path(ret, path + (key,))
        except KeyError:
            pass
        else:
            # TODO: type check?
            new_parent = cur_val

        if isinstance(value, list):
            # lists are purely additive. See https://github.com/mahmoud/boltons/issues/81
            new_parent.extend(value)
            new_items = []

        return new_parent, new_items

    for t_name, target in target_list:
        if sourced:
            def remerge_visit(path, key, value):
                source_map[path + (key,)] = t_name
                return True
        else:
            remerge_visit = default_visit

        ret = remap(target, enter=remerge_enter, visit=remerge_visit)

    if not sourced:
        return ret
    return ret, source_map


def main():
    from pprint import pprint

    merged, source_map = remerge([('defaults', defaults),
                                  ('overlay', overlay),
                                  ('cache_host_override', cache_host_override)],
                                 sourced=True)

    assert merged['host'] == '127.0.0.1'
    assert merged['port'] == 8080
    assert merged['endpoints']['persistence']['host'] == '10.2.2.2'
    assert merged['endpoints']['persistence']['port'] == 5433
    assert merged['endpoints']['cache']['host'] == '127.0.0.2'
    assert merged['endpoints']['cache']['port'] == 8889
    assert merged['overlay_version'] == '5.0'

    pprint(merged)
    print
    pprint(source_map)
    print(len(source_map), 'paths')


if __name__ == '__main__':
    main()
{'endpoints': {'cache': {'host': '127.0.0.2', 'port': 8889},
               'persistence': {'host': '10.2.2.2', 'port': 5433}},
 'host': '127.0.0.1',
 'notes': ['this is the default', 'this is the overlay'],
 'overlay_version': '5.0',
 'owners': {'primary': ['bob'], 'secondary': ['alice', 'charles']},
 'port': 8080,
 'zones': [{'a': 1}, {'a': 2}]}
{('endpoints',): 'cache_host_override',
 ('endpoints', 'cache'): 'cache_host_override',
 ('endpoints', 'cache', 'host'): 'cache_host_override',
 ('endpoints', 'cache', 'port'): 'defaults',
 ('endpoints', 'persistence'): 'overlay',
 ('endpoints', 'persistence', 'host'): 'overlay',
 ('endpoints', 'persistence', 'port'): 'overlay',
 ('host',): 'overlay',
 ('notes',): 'overlay',
 ('overlay_version',): 'overlay',
 ('owners',): 'overlay',
 ('owners', 'primary'): 'overlay',
 ('owners', 'secondary'): 'overlay',
 ('port',): 'overlay',
 ('zones',): 'overlay'}
(15, 'paths')
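
For readers who want to see the merge rules without boltons, here is a rough plain-recursion sketch of the same semantics: dicts merge key by key, lists are additive, everything else is last-wins, and the source map records the last container to touch each path. The name `merge_with_sources` and the sample data are made up for illustration; this is not the gist's implementation, it only handles dicts at the top level, and it has not been checked against remap's behavior in every corner case.

```python
import copy


def merge_with_sources(named_containers):
    """Approximate remerge(..., sourced=True) using plain recursion.

    Hypothetical helper for illustration only; assumes every container
    is a dict and that matching paths hold matching types.
    """
    merged = {}
    source_map = {}

    def merge_into(dest, src, path, name):
        for key, value in src.items():
            key_path = path + (key,)
            # the last container to touch a path is recorded as its source
            source_map[key_path] = name
            if isinstance(value, dict):
                child = dest.get(key)
                if not isinstance(child, dict):
                    child = dest[key] = {}
                merge_into(child, value, key_path, name)
            elif isinstance(value, list) and isinstance(dest.get(key), list):
                # lists are additive: later items are appended
                dest[key].extend(copy.deepcopy(value))
            else:
                # scalars (and first-seen lists) are last-wins
                dest[key] = copy.deepcopy(value)

    for name, container in named_containers:
        merge_into(merged, container, (), name)
    return merged, source_map


base = {'port': 8000,
        'endpoints': {'cache': {'host': '127.0.0.1', 'port': 8889}},
        'notes': ['default']}
extra = {'port': 8080,
         'endpoints': {'cache': {'host': '127.0.0.2'}},
         'notes': ['overlay']}

merged, source_map = merge_with_sources([('base', base), ('extra', extra)])
```

Here `merged['endpoints']['cache']` keeps the port from `base` while taking the host from `extra`, and `source_map[('endpoints', 'cache', 'port')]` is `'base'` because `extra` never touched that path. Values are deep-copied so the inputs are left unmodified.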
@pleasantone

commented Jun 15, 2016

There is a bug (or limitation) in this implementation: it does not work for lists inside your configuration (Python 3.5, boltons 16.4.1).

If you have a list of values, remap will create a circular reference:

for example, add

    'zones': [{'a': 1}],

to overlay

and you will get the following output:

{0: {'a': 1},  <-----
 'endpoints': {'cache': {'host': '127.0.0.2', 'port': 8889},
               'persistence': {'host': '10.2.2.2', 'port': 5433}},
 'host': '127.0.0.1',
 'overlay_version': '5.0',
 'port': 8080,
 'zones': <Recursion on dict with id=140400054909704>} <-----
@mahmoud

Owner Author

commented Jun 15, 2016

@pleasantone, that is very strange. I updated the gist with the case I tried and the behavior I see. Really not sure how this could have happened.

@mahmoud

Owner Author

commented Jun 15, 2016

Oh wait, I see one difference. On Twitter, we'd said a dict with a list in it, but here we have a list with a dict in it. OK, let me try that.

@mahmoud

Owner Author

commented Jun 15, 2016

(I got the reproduction, will debug this evening.)

@mahmoud

Owner Author

commented Jun 15, 2016

Aaaand fixed. The description, process, etc. are all in Boltons issue #81 (https://github.com/mahmoud/boltons/issues/81).

@rca

commented Dec 16, 2017

This is great, thank you!

@gsemet

commented Apr 9, 2019

Hi. This gist seems to work fine. Is it possible to get it into boltons.dictutils?

Here is my humble contribution (merge list flag + unit tests):

# Third Party Libraries
from boltons.iterutils import default_enter
from boltons.iterutils import default_visit
from boltons.iterutils import get_path
from boltons.iterutils import remap
from structlog import get_logger

log = get_logger()

__all__ = ["remerge"]


def remerge(target_list, sourced=False, replace_lists=False):  # noqa: C901
    """Merge a list of dicts.

    Takes a list of containers (e.g., dicts) and merges them using
    boltons.iterutils.remap. Containers later in the list take
    precedence (last-wins).

    By default (``replace_lists=False``), lists are not replaced; the
    overriding list's items are appended. Setting ``replace_lists=True``
    means a list's contents are replaced when overridden.

    By default, returns a new, merged top-level container.

    With the *sourced* option, `remerge` expects a list of (*name*, *container*)
    pairs, and will return a source map: a dictionary mapping between
    path and the name of the container it came from.

    Example:

    .. code-block:: python

        merged, source_map = remerge([('defaults', defaults),
                                      ('overlay', overlay),
                                      ('cache_host_override', cache_host_override),
                                     ],
                                     sourced=True)
    """
    # Discussion in:
    # https://gist.github.com/pleasantone/c99671172d95c3c18ed90dc5435ddd57
    # Final gist in:
    # https://gist.github.com/mahmoud/db02d16ac89fa401b968

    if not sourced:
        target_list = [(id(t), t) for t in target_list]

    ret = None
    source_map = {}

    def remerge_enter(path, key, value):
        new_parent, new_items = default_enter(path, key, value)
        if ret and not path and key is None:
            new_parent = ret
        try:
            cur_val = get_path(ret, path + (key, ))
        except KeyError:
            pass
        else:
            # TODO: type check?
            new_parent = cur_val

        if isinstance(value, list):
            if replace_lists:
                new_parent = value
            else:
                # lists are purely additive. See https://github.com/mahmoud/boltons/issues/81
                new_parent.extend(value)
            new_items = []

        return new_parent, new_items

    for t_name, target in target_list:
        if sourced:

            def remerge_visit(path, key, _value):
                source_map[path + (key, )] = t_name  # pylint: disable=cell-var-from-loop
                return True
        else:
            remerge_visit = default_visit

        ret = remap(target, enter=remerge_enter, visit=remerge_visit)

    if not sourced:
        return ret
    return ret, source_map

Unit test:

# coding: utf-8

# Standard Library
from pprint import pprint

# Gitlab Project Configurator Modules
from gpc.helpers.remerge import remerge


def test_override_string():
    defaults = {'key_to_override': 'value_from_defaults'}

    first_override = {'key_to_override': 'value_from_first_override'}

    merged, source_map = remerge([('defaults', defaults),
                                  ('first_override', first_override),
                                  ],
                                 sourced=True)

    expected_merged = {'key_to_override': 'value_from_first_override'}
    assert merged == expected_merged
    assert source_map == {('key_to_override', ): 'first_override'}

    merged = remerge([defaults, first_override], sourced=False)
    assert merged == expected_merged


def test_override_subdict():
    defaults = {
        'subdict': {
            'other_subdict': {
                'key_to_override': 'value_from_defaults',
                'integer_to_override': 2222
            }
        }
    }

    first_override = {
        'subdict': {
            'other_subdict': {
                'key_to_override': 'value_from_first_override',
                'integer_to_override': 5555
            }
        }
    }

    expected_merge = {
        'subdict': {
            'other_subdict': {
                'integer_to_override': 5555,
                'key_to_override': 'value_from_first_override'
            }
        }
    }

    merged, source_map = remerge([('defaults', defaults),
                                  ('first_override', first_override),
                                  ],
                                 sourced=True)
    assert merged == expected_merge
    assert source_map == {
        ('subdict',
         ): 'first_override',
        ('subdict',
         'other_subdict'): 'first_override',
        ('subdict',
         'other_subdict',
         'integer_to_override'): 'first_override',
        ('subdict',
         'other_subdict',
         'key_to_override'): 'first_override'
    }

    merged = remerge([defaults, first_override], sourced=False)
    assert merged == expected_merge


def test_override_list_append():
    defaults = {'list_to_append': [{'a': 1}]}
    first_override = {'list_to_append': [{'b': 1}]}

    merged, source_map = remerge([('defaults', defaults),
                                  ('first_override', first_override),
                                  ],
                                 sourced=True)
    expected_merged = {'list_to_append': [{'a': 1}, {'b': 1}]}

    assert merged == expected_merged
    assert source_map == {('list_to_append', ): 'first_override'}

    merged = remerge([defaults, first_override], sourced=False)
    assert merged == expected_merged


def test_override_list_replace():
    defaults = {'list_to_replace': [{'a': 1}]}
    first_override = {'list_to_replace': [{'b': 1}]}

    merged, source_map = remerge([('defaults', defaults),
                                  ('first_override', first_override),
                                  ],
                                 sourced=True, replace_lists=True)
    expected_merged = {'list_to_replace': [{'b': 1}]}

    assert merged == expected_merged
    assert source_map == {('list_to_replace', ): 'first_override'}

    merged = remerge([defaults, first_override], sourced=False, replace_lists=True)
    assert merged == expected_merged


def test_complex_dict():
    defaults = {
        'key_to_override': 'value_from_defaults',
        'integer_to_override': 1111,
        'list_to_append': [{
            'a': 1
        }],
        'subdict': {
            'other_subdict': {
                'key_to_override': 'value_from_defaults',
                'integer_to_override': 2222
            },
            'second_subdict': {
                'key_to_override': 'value_from_defaults',
                'integer_to_override': 3333
            }
        }
    }

    first_override = {
        'key_to_override': 'value_from_first_override',
        'integer_to_override': 4444,
        'list_to_append': [{
            'b': 2
        }],
        'subdict': {
            'other_subdict': {
                'key_to_override': 'value_from_first_override',
                'integer_to_override': 5555
            }
        },
        'added_in_first_override': 'some_string'
    }

    second_override = {
        'subdict': {
            'second_subdict': {
                'key_to_override': 'value_from_second_override'
            }
        }
    }

    merged, source_map = remerge([('defaults', defaults),
                                  ('first_override', first_override),
                                  ('second_override', second_override),
                                  ],
                                 sourced=True)
    print("")
    print("'merged' dictionary:")
    pprint(merged)
    print("")
    pprint(source_map)
    print(len(source_map), 'paths')

    assert merged['key_to_override'] == 'value_from_first_override'
    assert merged['integer_to_override'] == 4444
    assert merged['subdict']['other_subdict']['key_to_override'] == 'value_from_first_override'
    assert merged['subdict']['other_subdict']['integer_to_override'] == 5555
    assert merged['subdict']['second_subdict']['key_to_override'] == 'value_from_second_override'
    assert merged['subdict']['second_subdict']['integer_to_override'] == 3333
    assert merged['added_in_first_override'] == 'some_string'
    assert merged["list_to_append"] == [{'a': 1}, {'b': 2}]
@JoanEliot

commented Aug 19, 2019

Thank you very much for this very helpful gift. I'm out of my depth with respect to your code above, but I did bolt remerge onto the cranky machine I'm making with Python. My results were great until I passed it, as the first argument, a dictionary whose keys had None values. That caused a breakdown. To get the thing running again I only had to create empty dictionaries for those keys first, but I thought you'd like to know that keys with a value of None may need attention.

My problem dictionaries looked something like this:

  1. {'info': None, 'settings': None}
  2. {'info': {'measures': 3, 'clef': 'Treble'}, 'settings': {'format':{....}, 'processing': {....}}}
    .
    ..but I'm not positive I had more than one level of nesting in the second dictionary, short on sleep.
@mahmoud

Owner Author

commented Sep 3, 2019

@gsemet, I wouldn't mind having it in boltons (though probably in iterutils, just for ease of dependence), we could continue the review process there if you want to prepare a PR.

@JoanEliot, that's true, you need the structures to roughly match. Maybe it makes sense for Nones to be overridden, but by that same token, it might make sense to preprocess one side to remove Nones? I could go either way. I'm glad you got it to work!
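
One way the "preprocess one side to remove Nones" idea could look, as a minimal stdlib-only sketch. The helper name `drop_nones` is hypothetical (it is not part of boltons or this gist), and it assumes that dropping None-valued dict entries is acceptable for your data; None items inside lists are left alone so positions don't shift.

```python
def drop_nones(value):
    """Return a copy of *value* with None-valued dict entries removed.

    Hypothetical helper for illustration; recurses into dicts and lists,
    and leaves every other value untouched.
    """
    if isinstance(value, dict):
        return {k: drop_nones(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        # keep list items (including None) so indexes are preserved
        return [drop_nones(v) for v in value]
    return value


first = {'info': None, 'settings': None}
cleaned = drop_nones(first)  # nothing left to clash with the nested dicts
```

The cleaned dictionary could then be passed as the first container in the list, so that later containers supply the nested `info` and `settings` dicts without hitting a None where a dict is expected.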
