Skip to content

Instantly share code, notes, and snippets.

@mahmoud
Last active November 29, 2023 09:08
Show Gist options
  • Star 22 You must be signed in to star a gist
  • Fork 6 You must be signed in to fork a gist
  • Save mahmoud/db02d16ac89fa401b968 to your computer and use it in GitHub Desktop.
Save mahmoud/db02d16ac89fa401b968 to your computer and use it in GitHub Desktop.
Recursively merging dictionaries with boltons.iterutils.remap. Useful for @hynek's configs. https://twitter.com/hynek/status/696720593002041345
"""
This is an extension of the technique first detailed here:
http://sedimental.org/remap.html#add_common_keys
In short, it calls remap on each container, back to front, using the accumulating
previous values as the default for the current iteration.
"""
from boltons.iterutils import remap, get_path, default_enter, default_visit
defaults = {'host': '127.0.0.1',
'port': 8000,
'endpoints': {'persistence': {'host': '127.0.0.1',
'port': 8888},
'cache': {'host': '127.0.0.1',
'port': 8889}},
'owners': {'secondary': ['alice']},
'zones': [{'a': 1}],
'notes': ['this is the default']}
overlay = {'host': '127.0.0.1',
'port': 8080,
'endpoints': {'persistence': {'host': '10.2.2.2',
'port': 5433}},
'overlay_version': '5.0',
'owners': {'primary': ['bob'], 'secondary': ['charles']},
'zones': [{'a': 2}],
'notes': ['this is the overlay']}
cache_host_override = {'endpoints': {'cache': {'host': '127.0.0.2'}}}
def remerge(target_list, sourced=False):
"""Takes a list of containers (e.g., dicts) and merges them using
boltons.iterutils.remap. Containers later in the list take
precedence (last-wins).
By default, returns a new, merged top-level container. With the
*sourced* option, `remerge` expects a list of (*name*, container*)
pairs, and will return a source map: a dictionary mapping between
path and the name of the container it came from.
"""
if not sourced:
target_list = [(id(t), t) for t in target_list]
ret = None
source_map = {}
def remerge_enter(path, key, value):
new_parent, new_items = default_enter(path, key, value)
if ret and not path and key is None:
new_parent = ret
try:
cur_val = get_path(ret, path + (key,))
except KeyError:
pass
else:
# TODO: type check?
new_parent = cur_val
if isinstance(value, list):
# lists are purely additive. See https://github.com/mahmoud/boltons/issues/81
new_parent.extend(value)
new_items = []
return new_parent, new_items
for t_name, target in target_list:
if sourced:
def remerge_visit(path, key, value):
source_map[path + (key,)] = t_name
return True
else:
remerge_visit = default_visit
ret = remap(target, enter=remerge_enter, visit=remerge_visit)
if not sourced:
return ret
return ret, source_map
def main():
from pprint import pprint
merged, source_map = remerge([('defaults', defaults),
('overlay', overlay),
('cache_host_override', cache_host_override)],
sourced=True)
assert merged['host'] == '127.0.0.1'
assert merged['port'] == 8080
assert merged['endpoints']['persistence']['host'] == '10.2.2.2'
assert merged['endpoints']['persistence']['port'] == 5433
assert merged['endpoints']['cache']['host'] == '127.0.0.2'
assert merged['endpoints']['cache']['port'] == 8889
assert merged['overlay_version'] == '5.0'
pprint(merged)
print
pprint(source_map)
print(len(source_map), 'paths')
if __name__ == '__main__':
main()
{'endpoints': {'cache': {'host': '127.0.0.2', 'port': 8889},
'persistence': {'host': '10.2.2.2', 'port': 5433}},
'host': '127.0.0.1',
'notes': ['this is the default', 'this is the overlay'],
'overlay_version': '5.0',
'owners': {'primary': ['bob'], 'secondary': ['alice', 'charles']},
'port': 8080,
'zones': [{'a': 1}, {'a': 2}]}
{('endpoints',): 'cache_host_override',
('endpoints', 'cache'): 'cache_host_override',
('endpoints', 'cache', 'host'): 'cache_host_override',
('endpoints', 'cache', 'port'): 'defaults',
('endpoints', 'persistence'): 'overlay',
('endpoints', 'persistence', 'host'): 'overlay',
('endpoints', 'persistence', 'port'): 'overlay',
('host',): 'overlay',
('notes',): 'overlay',
('overlay_version',): 'overlay',
('owners',): 'overlay',
('owners', 'primary'): 'overlay',
('owners', 'secondary'): 'overlay',
('port',): 'overlay',
('zones',): 'overlay'}
(15, 'paths')
@ankostis
Copy link

ankostis commented Apr 20, 2020

Thank you for this utility.

I believe that functions that change their return-type based on their inputs,
are hard to work with, even if it's only for debugging aid.
Also needed provenance information for merged lists,
so i did the following changes:

  • refact: feed the source_map when calling the function to be populated only if not None;
    now the return type is always the same.
  • enh: when merging lists, source_map lists all mergers (important when debugging).
  • optimize: do not to create a new remerge_visit() closure on each(!) container to merge, but decide it up-front.
  • optimize: don't recreate input container with dummy id() just to fit the remerge_visit() for when source_map is asked;
    actually don't even override default enter-function when nosource_map is asked, simply run a simpler loop.
  • refact: renamed target_list --> *containers, so as not to have to create the list-of-containers when calling it.
  • refact: combine trivial else code with try-except.
  • Doc: terse sphinx docstring with a doctested example.
  • Doc: explain that the input-dicts order is NOT preserved in the results when source_map is asked.
  • [edit:] dropped replace_lists argument (my apologies, just wasn't necessary to me, and copy pasted from my project),
def remerge(*containers, source_map: list = None):
    """
    Merge recursively dicts or lists with :func:`boltons.iterutils.remap()`.

    :param containers:
        a list of dicts or lists to merge; later ones take precedence
        (last-wins).
        If `source_map` is given, these must be 2-tuples of ``(name: container)``.
    :param source_map:
        If given, it must be a dictionary, and `containers` arg must be 2-tuples
        like ``(name: container)``.
        The `source_map` will be populated with mappings between path and the name
        of the container it came from.

        .. Warning::
            if source_map given, the order of input dictionaries is NOT preserved
            is the results  (important if your code rely on PY3.7 stable dictionaries).

    :return:
        returns a new, merged top-level container.

    - Adapted from https://gist.github.com/mahmoud/db02d16ac89fa401b968
    - Discusson in: https://gist.github.com/pleasantone/c99671172d95c3c18ed90dc5435ddd57


    **Example**

    >>> defaults = {
    ...     'subdict': {
    ...         'as_is': 'hi',
    ...         'overridden_key1': 'value_from_defaults',
    ...         'overridden_key1': 2222,
    ...         'merged_list': ['hi', {'untouched_subdict': 'v1'}],
    ...     }
    ... }

    >>> overrides = {
    ...     'subdict': {
    ...         'overridden_key1': 'overridden value',
    ...         'overridden_key2': 5555,
    ...         'merged_list': ['there'],
    ...     }
    ... }


    >>> source_map = {}
    >>> remerge(
    ...     ("defaults", defaults),
    ...     ("overrides", overrides),
    ...     source_map=source_map)
     {'subdict': {'as_is': 'hi',
                  'overridden_key1': 'overridden value',
                  'merged_list': ['hi', {'untouched_subdict': 'v1'}, 'there'],
                  'overridden_key2': 5555}}
    >>> source_map
    {('subdict', 'as_is'): 'defaults',
     ('subdict', 'overridden_key1'): 'overrides',
     ('subdict', 'merged_list'):  ['defaults', 'overrides'],
     ('subdict',): 'overrides',
     ('subdict', 'overridden_key2'): 'overrides'}
    """

    ret = None

    def remerge_enter(path, key, value):
        new_parent, new_items = default_enter(path, key, value)
        if ret and not path and key is None:
            new_parent = ret
        try:
            # TODO: type check?
            new_parent = get_path(ret, path + (key,))
        except KeyError:
            pass

        if isinstance(value, list):
            # lists are purely additive. See https://github.com/mahmoud/boltons/issues/81
            new_parent.extend(value)
            new_items = []

        return new_parent, new_items

    if source_map is not None:

        def remerge_visit(path, key, value):
            full_path = path + (key,)
            if isinstance(value, list):
                old = source_map.get(full_path)
                if old:
                    old.append(t_name)
                else:
                    source_map[full_path] = [t_name]
            else:
                source_map[full_path] = t_name
            return True

        for t_name, cont in containers:
            ret = remap(cont, enter=remerge_enter, visit=remerge_visit)
    else:
        for cont in containers:
            ret = remap(cont, enter=remerge_enter)


        ret = remap(cont, enter=remerge_enter, visit=remerge_visit)

    return ret

Unit-tests (without replace-list :-()

# coding: utf-8

# Standard Library
from pprint import pprint

# Gitlab Project Configurator Modules
# from gpc.helpers.remerge import remerge


def test_override_string():
    defaults = {"key_to_override": "value_from_defaults"}

    first_override = {"key_to_override": "value_from_first_override"}

    source_map = {}
    merged = remerge(
        ("defaults", defaults),
        ("first_override", first_override),
        source_map=source_map,
    )

    expected_merged = {"key_to_override": "value_from_first_override"}
    assert merged == expected_merged
    assert source_map == {("key_to_override",): "first_override"}

    merged = remerge(defaults, first_override, source_map=None)
    assert merged == expected_merged


def test_override_subdict():
    defaults = {
        "subdict": {
            "other_subdict": {
                "key_to_override": "value_from_defaults",
                "integer_to_override": 2222,
            }
        }
    }

    first_override = {
        "subdict": {
            "other_subdict": {
                "key_to_override": "value_from_first_override",
                "integer_to_override": 5555,
            }
        }
    }

    expected_merge = {
        "subdict": {
            "other_subdict": {
                "integer_to_override": 5555,
                "key_to_override": "value_from_first_override",
            }
        }
    }

    source_map = {}
    merged = remerge(
        ("defaults", defaults),
        ("first_override", first_override),
        source_map=source_map,
    )
    assert merged == expected_merge
    assert source_map == {
        ("subdict",): "first_override",
        ("subdict", "other_subdict"): "first_override",
        ("subdict", "other_subdict", "integer_to_override"): "first_override",
        ("subdict", "other_subdict", "key_to_override"): "first_override",
    }

    merged = remerge(defaults, first_override, source_map=None)
    assert merged == expected_merge


def test_override_list_append():
    defaults = {"list_to_append": [{"a": 1}]}
    first_override = {"list_to_append": [{"b": 1}]}

    source_map = {}
    merged = remerge(
        ("defaults", defaults),
        ("first_override", first_override),
        source_map=source_map,
    )
    expected_merged = {"list_to_append": [{"a": 1}, {"b": 1}]}

    assert merged == expected_merged
    assert source_map == {("list_to_append",): "first_override"}

    merged = remerge(defaults, first_override, source_map=None)
    assert merged == expected_merged


def test_complex_dict():
    defaults = {
        "key_to_override": "value_from_defaults",
        "integer_to_override": 1111,
        "list_to_append": [{"a": 1}],
        "subdict": {
            "other_subdict": {
                "key_to_override": "value_from_defaults",
                "integer_to_override": 2222,
            },
            "second_subdict": {
                "key_to_override": "value_from_defaults",
                "integer_to_override": 3333,
            },
        },
    }

    first_override = {
        "key_to_override": "value_from_first_override",
        "integer_to_override": 4444,
        "list_to_append": [{"b": 2}],
        "subdict": {
            "other_subdict": {
                "key_to_override": "value_from_first_override",
                "integer_to_override": 5555,
            }
        },
        "added_in_first_override": "some_string",
    }

    second_override = {
        "subdict": {"second_subdict": {"key_to_override": "value_from_second_override"}}
    }

    source_map = {}
    merged = remerge(
        ("defaults", defaults),
        ("first_override", first_override),
        ("second_override", second_override),
        source_map=source_map,
    )
    print("")
    print("'merged' dictionary:")
    pprint(merged)
    print("")
    pprint(source_map)
    print(len(source_map), "paths")

    assert merged["key_to_override"] == "value_from_first_override"
    assert merged["integer_to_override"] == 4444
    assert (
        merged["subdict"]["other_subdict"]["key_to_override"]
        == "value_from_first_override"
    )
    assert merged["subdict"]["other_subdict"]["integer_to_override"] == 5555
    assert (
        merged["subdict"]["second_subdict"]["key_to_override"]
        == "value_from_second_override"
    )
    assert merged["subdict"]["second_subdict"]["integer_to_override"] == 3333
    assert merged["added_in_first_override"] == "some_string"
    assert merged["list_to_append"] == [{"a": 1}, {"b": 2}]

@mahmoud
Copy link
Author

mahmoud commented Apr 21, 2020

@ankostis Cool! Glad this old gem continues to provide utility. Thanks for sharing :)

@ankostis
Copy link

ankostis commented Apr 29, 2020

Realized the code was a bit inefficient, so i did these two changes on the code above:

  • optimize: do not to create a new remerge_visit() closure on each(!) container to merge, but decide it up-front.
  • optimize: don't recreate input container with dummy id() just to fit the remerge_visit() for when source_map is asked;

I kept the edited code it in the same comment, above, for future reference.

[edit:] bu i'm still bugged by the handling of None when extending lists :-(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment