Skip to content

Instantly share code, notes, and snippets.

@hrldcpr
Last active September 1, 2024 07:04
Show Gist options
  • Save hrldcpr/2012250 to your computer and use it in GitHub Desktop.
Save hrldcpr/2012250 to your computer and use it in GitHub Desktop.
one-line tree in python

One-line Tree in Python

Using Python's built-in defaultdict we can easily define a tree data structure:

def tree(): return defaultdict(tree)

That's it!

It simply says that a tree is a dict whose default values are trees.

(If you're following along at home, make sure to from collections import defaultdict)

(Also: Hacker News reader @zbuc points out that this is called autovivification. Cool!)

Examples

JSON-esque

Now we can create JSON-esque nested dictionaries without explicitly creating sub-dictionaries—they magically come into existence as we reference them:

users = tree()
users['harold']['username'] = 'hrldcpr'
users['handler']['username'] = 'matthandlersux'

We can print this as json with print(json.dumps(users)) and we get the expected:

{"harold": {"username": "hrldcpr"}, "handler": {"username": "matthandlersux"}}

Without assignment

We can even create structure with no assignment at all, since merely referencing an entry creates it:

taxonomy = tree()
taxonomy['Animalia']['Chordata']['Mammalia']['Carnivora']['Felidae']['Felis']['cat']
taxonomy['Animalia']['Chordata']['Mammalia']['Carnivora']['Felidae']['Panthera']['lion']
taxonomy['Animalia']['Chordata']['Mammalia']['Carnivora']['Canidae']['Canis']['dog']
taxonomy['Animalia']['Chordata']['Mammalia']['Carnivora']['Canidae']['Canis']['coyote']
taxonomy['Plantae']['Solanales']['Solanaceae']['Solanum']['tomato']
taxonomy['Plantae']['Solanales']['Solanaceae']['Solanum']['potato']
taxonomy['Plantae']['Solanales']['Convolvulaceae']['Ipomoea']['sweet potato']

We'll prettyprint this time, which requires us to convert to standard dicts first:

def dicts(t): return {k: dicts(t[k]) for k in t}

Now we can prettyprint the structure with pprint(dicts(taxonomy)):

{'Animalia': {'Chordata': {'Mammalia': {'Carnivora': {'Canidae': {'Canis': {'coyote': {},
                                                                            'dog': {}}},
                                                      'Felidae': {'Felis': {'cat': {}},
                                                                  'Panthera': {'lion': {}}}}}}},
 'Plantae': {'Solanales': {'Convolvulaceae': {'Ipomoea': {'sweet potato': {}}},
                           'Solanaceae': {'Solanum': {'potato': {},
                                                      'tomato': {}}}}}}

So the substructures we referenced now exist as dicts, with empty dicts at the leaves.

Iteration

This tree can be fun to iteratively walk through, again because structure comes into being simply by referring to it.

For example, suppose we are parsing a list of new animals to add to our taxonomy above, so we want to call a function like:

add(taxonomy,
    'Animalia>Chordata>Mammalia>Cetacea>Balaenopteridae>Balaenoptera>blue whale'.split('>'))

We can implement this simply as:

def add(t, path):
  for node in path:
    t = t[node]

Again we are never assigning to the dictionary, but just by referencing the keys we have created our new structure:

{'Animalia': {'Chordata': {'Mammalia': {'Carnivora': {'Canidae': {'Canis': {'coyote': {},
                                                                            'dog': {}}},
                                                      'Felidae': {'Felis': {'cat': {}},
                                                                  'Panthera': {'lion': {}}}},
                                        'Cetacea': {'Balaenopteridae': {'Balaenoptera': {'blue whale': {}}}}}}},
 'Plantae': {'Solanales': {'Convolvulaceae': {'Ipomoea': {'sweet potato': {}}},
                           'Solanaceae': {'Solanum': {'potato': {},
                                                      'tomato': {}}}}}}

Conclusion

This probably isn't very useful, and it makes for some perplexing code (see add() above).

But if you like Python then I hope this was fun to think about or worthwhile to understand.

There was a good discussion of this gist on Hacker News.

@wolph
Copy link

wolph commented Apr 24, 2012

JakubOboza: a B-tree is 1-dimensional from the users perspective, this object is multi-dimensional from the users perspective. They are 2 completely different things.

@JakubOboza
Copy link

@wolph now you talk about syntax :) i'm not hater or something i just pointed out that i like this in form of hack but i don't like when people use this type of hacks in real software :)

If Python would be like Ocaml or Scheme you could implement multi dimensional syntax also :) So i don't wanna talk about syntax :>

Cheers!

@wolph
Copy link

wolph commented Apr 24, 2012

@JakubOboza it is not about syntax, it is about a completely different data structure and matching usage.

But please do amuse me a little here, what kind of data structure would you use to store recursive objects which don't require sorting? I am genuinely interested, I think this structure is fairly optimal for that usage pattern.

@aJanuary
Copy link

@JakubOboza You're talking about searching the tree, while @wolph was talking about walking it.

Walking from root to a particular node without use of an auxiliary data structure will take O(m) (where m is the distance from root to node) no matter how the tree is sorted.

If you want to find a node that meets a particular condition you can sort the tree in such a way as to not need to walk the entire tree (worst case O(n) where n is the number of nodes).

Copy link

ghost commented Apr 24, 2012

I recently implemented a similar dict inside of dict data structure for a simple search program. @JakubOboza is correct, it was a memory disaster.

@crazy4groovy
Copy link

Nice inspiration for a similar Groovy implementation at: https://gist.github.com/2478499

@jul
Copy link

jul commented Apr 25, 2012

@JakubOboza : abstract data structures are mainly APIs with properties (index, keys, values, parent, next .... whatever). And API is a view on an implementation.
If you badly need a property and it worths it, then API matters.

Complexity is mostly an implementation problem,which performance will vary according to your real case of use. With python's duck typing and ctypes, performance will mostly not lie in the API but in the implementation.

For instance matrix are either 1D array accessed by [ index % line_length + int(index/line_length * line_length) ] or a 2D array, or a dict (sparse matrix). Performance is (mostly) not a matter of data API but a matter of use case and implementation. o(n) are only best case implementation. Life is not always best case, even with well known data structures.

So, I still do think that reasoning on best case complexity instead of API is a premature optimisation.

API matters, since what you mean matters. If you need, it worths it. Complexity is just an implementation detail, which is a concern for 0.01% of real developpers.

@markwatson
Copy link

Here's some code I'm using for pretty printing without json in older versions of python (that don't have dictionary comprehensions):

def dicts(t):
    try:
        return dict((k, dicts(t[k])) for k in t)
    except TypeError:
        return t

@westurner
Copy link

You can add a field delimiter and 'flatten' the keys if you would like to avoid the additional lookup overhead.

e.g. instead of data['one']['two']['three'], it's just data['one.two.three'], or data[('one','two','three')]

@JakubOboza : a trie would be faster. There are many implementations of tries for Python.

@westurner
Copy link

There exist implementations of DefaultOrderedDict : http://stackoverflow.com/a/6190500

@westurner
Copy link

This works well for serializing to Unicode:

json.dumps(obj, indent=2)

A custom JSONEncoder with support for ISO datetimes can be helpful: http://stackoverflow.com/a/2680060

@domenkozar
Copy link

Exactly the trick that I use in mr.bob (http://mrbob.readthedocs.org/en/latest/) to make namespace of keys :)

@bj0
Copy link

bj0 commented Feb 18, 2013

Adding a simple subclass:

class mydefaultdict(defaultdict):
    def __getattr__(self, attr):
        return self[attr]
    def __setattr__(self, attr, val):
        self[attr] = val

def tree(): return mydefaultdict(tree)

will turn:

taxonomy['Animalia']['Chordata']['Mammalia']['Carnivora']['Felidae']['Felis']['cat']

into:

taxonomy.Animalia.Chordata.Mammalia.Carnivora.Felidae.Felis.cat

@dgagnon
Copy link

dgagnon commented Nov 1, 2013

You can use the same trick to edit the files as well:

def add(t, tkeys, value):
  i = 0
  keys = tkeys.split('.')
  for key in keys:
    i = i + 1
    if i == len(keys) and value != None:
      t[key] = value
    t = t[key]

@ashumeow
Copy link

ashumeow commented Nov 5, 2013

Good! =)

@metaperl
Copy link

Very cool. I tend to resort to objects and methods for everything. And you did this in 1-2 lines.

@gVallverdu
Copy link

Nice solution !
I wrote that lines to print the tree :

def ptr(t, depth = 0):
    """ print a tree """
    for k in t.keys():
        print("%s %2d %s" % ("".join(depth * ["    "]), depth, k))
        depth += 1
        ptratree(t[k], depth)
        depth -= 1

using your example it gives :

  0 Animalia
      1 Chordata
          2 Mammalia
              3 Carnivora
                  4 Canidae
                      5 Canis
                          6 coyote
                          6 dog
                  4 Felidae
                      5 Felis
                          6 cat
                      5 Panthera
                          6 lion
  0 Plantae
      1 Solanales
          2 Convolvulaceae
              3 Ipomoea
                  4 sweet potato
          2 Solanaceae
              3 Solanum
                  4 tomato
                  4 potato

@drasko
Copy link

drasko commented May 18, 2014

In case someone needs it off-the-shelf, code to create JSON representation of the file tree using this technique:

import os
import collections
import json

def tree(): return collections.defaultdict(tree)

def add(t, keys):
  for key in keys:
    t = t[key]

s = tree()

# Set the directory you want to start from
path = 'path/to/directory'

for root, dirs, files in os.walk(path):
    for f in files:
        l = root.split('/')
        l.append(f)

        add(s, l)

print(json.dumps(s, indent=4, sort_keys=True))

@kzinglzy
Copy link

kzinglzy commented Jun 7, 2014

Wow! It is amazing. Thanks

@period331
Copy link

amazing. Thanks!

@adugin
Copy link

adugin commented Dec 26, 2014

def tree(levels=0, func=None):
    if levels > 0 and func:
        return defaultdict(reduce(lambda f, i: lambda: defaultdict(f), xrange(levels-1), func))
    else:
        return defaultdict(tree)

Usage:

>>> d = tree()
>>> d[1][2][3][4][5] = 6
>>> d = tree(3, float)
>>> d
defaultdict(<function <lambda> at 0x0280E3F0>, {})
>>> d[1][2][3]
0.0
>>> d[7][8][9] += 10
>>> d[7][8][9]
10.0
>>> d = tree(0, int)
>>> d
defaultdict(<function tree at 0x0280C830>, {})
>>> d = tree(1, int)
>>> d
defaultdict(<type 'int'>, {})
>>> d[1]
0
>>> 

@adugin
Copy link

adugin commented Dec 30, 2014

def tree(func=lambda: tree(), depth=0):
    return defaultdict(reduce(lambda f, i: lambda: defaultdict(f), xrange(depth-1), func))

@uralbash
Copy link

Same with ordered tree:

from collections import OrderedDict, defaultdict

def tree():
    """
    Autovivication ordered hash
    """
    return OrderedDict(defaultdict(tree))

@n3h3m
Copy link

n3h3m commented May 17, 2016

I guess this attracts more appreciation than it deserves. In the reality it will break for a very basic case.

>>> from collections import defaultdict
>>> def tree(): return defaultdict(tree)
...
>>> t = tree()
>>> t['a']['b']['c'] = 'hello'
>>> t['a']['b']['c']['d'] = 'world'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
>>>

I believe I don't have to explain why it fails.

@ionFreeman
Copy link

ionFreeman commented Aug 9, 2016

@nehemiahjacob Yes, you can break it by assigning to it. Don't assign to it.

t = tree()
t['a']['b']['c']['hello']
t['a']['b']['c']['d']['world']

@sshadmand
Copy link

sshadmand commented Dec 3, 2016

This is great! Question though: How would I programmatically create a dynamic set list of keys into a tree with depth?

For example, I have a variable passed in as a list of terms: ['a', 'b', 'c'], and, based on that list I want to create a tree where each item is the child of the previous item.

Based on your example, the output would be equivalent to the output generated by this assignment:

t = tree()
t["a"]["b"]["c"]

But again, I want to create that tree dynamically based on the list of keys I am sent.

Any ideas? :)

UPDATE

I found the answer on S.O. here:

http://stackoverflow.com/questions/14692690/access-nested-dictionary-items-via-a-list-of-keys

@c0fec0de
Copy link

There is a powerful python tree library http://anytree.readthedocs.io/

@martinym
Copy link

martinym commented Dec 27, 2017

There's a very good answer discussion on stackoverflow about the different ways of doing this and in particular one very simple one that doesn't use defaultdict. It's very portable (Python 2.5+ & 3.x) and another potentially very nice thing about it is the resulting subclass instances print just like a normal dicts.

If you're too lazy to look (TL;DR), here's the code (all of it):

class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)() # retain local pointer to value
        return value                     # faster to return than dict lookup

@jedie
Copy link

jedie commented May 3, 2018

@dvizzini
Copy link

dvizzini commented Aug 24, 2022

Added a wrapper:

class TreeBuilder:
    """Returns mutable trees structure, allowing for the following:
    tree_builder = utils.TreeBuilder()
    tree_builder["harold"]["username"] = "hrldcpr"
    tree_builder['handler']['username'] = 'matthandlersux'
    users = tree_builder.to_dict()

    See https://gist.github.com/hrldcpr/2012250.
    """

    def __init__(self):
        self._builder = self._tree()

    def __getitem__(self, key):
        return self._builder[key]

    @classmethod
    def _tree(cls) -> dict:
        return defaultdict(cls._tree)

    @classmethod
    def _to_dict(cls, node):
        if isinstance(node, dict):
            return {k: cls._to_dict(node[k]) for k in node}
        return node

    def to_dict(self) -> dict:
        """Returns trees are either values or a builtin python dict."""
        return self._to_dict(self._builder)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment