tl;dr: see this.

I stumbled across this interesting Gist called One-line Tree in Python, which demonstrates and explains the following code:


def tree(): return defaultdict(tree)

What does it do actually? Imagine that you need to create a data structure without actually knowing what would be in it, it may look like:


blog = tree()
blog['name'] = 'YJL --verbose'
blog['URL'] = 'https://yjlv.blogspot.com'
blog['author']['name'] = 'Yu Jie Lin'
blog['author']['email'] = 'someone@example.com'

Still don’t see the point? The first two can be done with dict() instead of tree(), but not the author part. You would have to set blog['author'] = {} first, that’s to initialize it as a dict object before assign name and email. But with autovivification, you don’t need to.

There are a few ways to implement autovivification in Python. The code above is just one of them. Before collections.defaultdict was born, many defined their own class.

All the codes can be found in this Gist, which includes some simple tests, dates, and links.

1   Autovivification

From Wikipedia:

Autovivification is the automatic creation of a variable reference when an undefined value is dereferenced.

It seems Perl is the first programming language to make this a feature—meaning you don’t need to do anything to have it—and known to everyone who is interested. According to Glossary of Programming Perl 3rd:

autovivification
A Greco-Roman word meaning “to bring oneself to life”. In Perl, storage locations (lvalues) spontaneously generate themselves as needed, including the creation of any hard reference values to point to the next level of storage. The assignment $a[5][5][5][5][5] = “quintet” potentially creates five scalar storage locations, plus four references (in the first four scalar locations) pointing to four new anonymous arrays (to hold the last four scalar locations). But the point of autovivification is that you don’t have to worry about it.

Greco-Roman period was the time around ancient Greece and Rome. “To bring oneself to life” is a very strange concept. If one isn’t alive, how could it do anything let along to bring itself to life? You have to be a living life form to perform an action, such as “bringing,” right?

Anyway, that isn’t the topic of this post.

2   class method

It’s basically the precursor of defaultdict, one example by Mirko Dziadzka:


class auto_dict(dict):
def __getitem__(self, key):
print(key)
return self.setdefault(key, self.__class__())

It utilizes dict.setdefault, if the key doesn’t exist, then creates a new auto_dict object and assigns it to that key.


>>> t = auto_dict()
>>> t[1][2] = 3
3
>>> t
{1: {2: 3}}

When first reference to t[1] a new auto_dict is created, so 3 can be assigned to the new auto_dict with key 2.

3   defaultdict method

The first entry I can find was by Ruslan Spivak, which has depth limitation, then a modified version with defaultdict on his blog post. It can be boiled down to just use lambda as shown below, a more elegant way than the opening code from the Gist, which squeezes function definition into an awkward one-liner.


from collections import defaultdict

tree = lambda: defaultdict(tree)

defaultdict created a dict just like previous class method, when a key doesn’t exist, it uses tree() to initialize a value for the key. Since tree is used as the default_factory, the show goes on and on when deeper key needs a default value.

4   Y combinator method

Rusian Spivak also wrote a Y combinator version as shown below, unfortunately, I am not familiar with Y combinator. I know what it does, only don’t know how to get to solution, although I can somehow derive the same solution, I just don’t know what it actually means. If you are interested, read The Y Combinator (Slight Return).


def Y(g):
return ((lambda f: f(f))
(lambda f:
g(lambda x: f(f)
(x))))

defdict = Y(lambda mk_dict:
lambda x=None: defaultdict(lambda x=None: mk_dict(x)))

A commentator on the same blog post, Adam, posted another Y combinator:


Y2 = lambda g: ( lambda f: f(f) )( lambda f: g( lambda *args: f(f)(*args) ) )
defdict2 = Y2(lambda mk_dict: lambda: defaultdict(mk_dict))

5   How about tree.trunk.branches.leaves?

Attributes, the favorites of lazy coders, we all love it because you don’t need to type bracket-quote-stuff-quote-bracket like:


tree['trunk']['branches']['leaves']

See I use single quotation mark? Why? Because double quotation mark needs Shift, I can never understand why other coders choose to type in " when both quotation marks has no actual syntactic difference, same in JavaScript, not in Bash.

Anyway, someone did already implement a code using attribute:


class AutoObject(object):
def __init__(self):
self.__store = {}


def __getattr__(self, name):
# automatically create objects accessed via an_auto_object.name
# Only get to this if normal lookup failed
obj = AutoObject()
setattr(self, name, obj)
return obj


def __getitem__(self, name):
# Automatically create objects accessed via [name] if they
# don't exist already.
return self.__store.setdefault(name, AutoObject())

Unfortunately, this doesn’t work well. It can automatically create new AutoObject if it’s via item access, and you have to assign value via attribute:


>>> t = AutoObject()
>>> t[1] = 2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'AutoObject' object does not support item assignment
>>> t[1].foo = 2

It requires some additional works.

6   Best of Both Worlds: dict and attribute

The sky’s the limit, this has been done by Roman Evstifeev in comment of this blog post:


class objdict(defaultdict):
def __getattr__(self, key):
try:
return self.__dict__[key]
except KeyError:
return self.__getitem__(key)

__setattr__ = lambda self, k, v: self.__setitem__(k,v)

objtree = lambda: objdict(objtree)

With this, you can:


>>> t = objtree()
>>> t.foo.bar = 1
>>> print(t['foo']['bar'])
1
>>> t.foo[2].bar = 4
>>>
>>> def dd2dr(dd):
... '''defaultdict to dict recursively'''
... return dict(
... (k, dd2dr(v) if isinstance(v, defaultdict) else v)
... for k, v in dd.items()
... )
...
>>> pprint.pprint(dd2dr(t))
{'foo': {2: {'bar': 4}, 'bar': 1}}

This is perfect solution, you have best of both worlds at the same time and I haven’t seen any problems with it.

7   Best of Three Worlds: dict, attribute, and one-liner

One-liner, huh? You asked and you have it:


objtree2 = lambda: type('objdict2', (defaultdict, ), dict(__getattr__=lambda self, k: self.__dict__[k] if k in self.__dict__ else self.__getitem__(k), __setattr__=lambda self, k, v: self.__setitem__(k,v)))(objtree2)

I modified the previous method and made it a one-liner. This combines the labmda form function and the class into one by using type to create a new type.

This post opened with a one-liner, then it must end with another.

The space’s the limit.