@Mercerenies
Last active November 6, 2022 00:04

Field Access in Python

This page is an attempt to describe the following Python expression.

foo.bar

It's surprisingly complicated exactly what happens behind the scenes when you access a field on a Python object. And I've written several StackOverflow answers summarizing different parts of this process. But I've never seen a canonical description of the entire process in one place. This page aims to be a one-stop source for what happens when you access a field on a Python object, that people (myself included) can link to in StackOverflow answers as a source for the behavior of Python's complex metaprogramming model.

Introduction

To delve into this, I'll be using several resources. Anytime I reference the code, I'm referring specifically to the official CPython implementation [1] of Python. My understanding is that everything I'm saying here should be fairly standardized and should apply to other Python implementations, but I'll be focusing on CPython if there are ever any discrepancies.

Further, I'll be focusing on Python 3.10 [2]. I'll try to note any differences between other recent versions of Python (say, dating back to 3.7 or so). But this article is not about Python 2. Especially back when old-style classes were a thing, a lot of this worked very differently, and differences between Python 2 and Python 3 are out of scope of this article.

Finally, this article does not claim to be an introduction to Python or an introduction to OOP. This article is targeted at experienced Python developers who want to know a bit more of the behind-the-scenes work that goes on in Python. If you're just learning Python, this is probably not the right page for you.

Objects in Python

First, we need to discuss a bit about how objects work in Python. Every object is an instance of a class. This also includes type objects, which are themselves classes and are instances of the type object [3]. Classes in Python have one or more superclasses, eventually culminating in the root class object, which is the only class in Python that has zero superclasses.

With a handful of exceptions, every object in Python has a dictionary where it stores its own fields. This dictionary is conventionally called __dict__ and, in the absence of the sort of shenanigans we'll be getting into today, it can be accessed on an object foo with foo.__dict__. An instance's dictionary stores the fields defined specific to that instance. In Java, we would conventionally call these instance variables. An instance's __dict__ does not include instance methods or class-level constants which are the same for all instances of the class.

Nearly every object in Python has a __dict__. Many built-in Python types do not have a __dict__, for efficiency reasons (we can store an int more compactly if we don't need to associate a hashtable with it, for instance). Objects whose type is object (i.e. instances created with the object() constructor which are not instances of any subclass of object) also lack a __dict__, as object() creates very minimal objects that have no additional features. Finally, a user-defined class in Python can opt out of __dict__ by defining a field called __slots__. We'll talk more about __slots__ later on, but at least for the first part of this discussion, we'll assume that the objects we're discussing have a __dict__.
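
As a small illustration of the points above, here is a sketch with a made-up Point class (the class and its field names are purely hypothetical). It shows that an instance's __dict__ holds only per-instance fields, while methods and class-level constants live on the class, and that int values and bare object() instances have no __dict__ at all.

```python
class Point:
    scale = 10  # class-level constant, shared by every instance

    def __init__(self, x, y):
        self.x = x  # instance fields land in the instance's __dict__
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)


p = Point(1, 2)
instance_fields = dict(p.__dict__)             # only the per-instance fields
class_has_method = "norm1" in Point.__dict__   # methods live on the class
int_has_dict = hasattr(5, "__dict__")          # built-in int: no __dict__
bare_has_dict = hasattr(object(), "__dict__")  # plain object(): no __dict__
print(instance_fields, class_has_method, int_has_dict, bare_has_dict)
```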

Method Resolution Order

There's one other crucial part of Python classes we need to discuss first, and that's the method resolution order. Some languages like Java are single-inheritance languages. That means that every class (with the exception of the root class Object) has a single supertype. Then, when we go to look up a name, be it a method or a public field, there is a clear order in which those names should resolve. We start at the runtime class of the object, and if we don't see the name there, then we check the immediate superclass, and then its immediate superclass, and so on. In Python pseudocode, this lookup process could be summarized as

def find_name(my_type, name):
    while my_type is not None:
        if name in my_type.__dict__:
            return my_type.__dict__[name]
        else:
            my_type = my_type.__base__
    raise AttributeError(name)

However, Python is a language that supports multiple inheritance. A class can have one or more superclasses. That complicates name lookup, even in our simple "raw lookup" case. In order to use this same algorithm, we need a method resolution order. A method resolution order, often abbreviated "MRO", takes a type (which may have a complicated inheritance hierarchy) and returns a linear list of superclasses indicating the order in which to look for fields. Basically, it tells us which superclasses should be considered first.

The MRO for Python versions prior to 2.3 was a depth-first search. That is, if a class A had superclasses B and C, then we would always look in A, then in B, then recursively in all of B's superclasses, and only if the entire lookup in B failed would we come back and check C. This makes sense at a glance, but it fails to have some nice properties, monotonicity being a key one.

Starting in Python 2.3 (and going to the present day), Python uses an MRO called the C3 linearization method [4][5]. C3 (named thusly because it's designed to be consistent with three nice properties) is a slightly more complicated algorithm, which I won't describe the details of here (the linked pages do a great job of that on their own). The C3 algorithm takes a Python type as input and produces an ordered sequence (actually, a tuple) of itself and its superclasses in MRO order. We can see this order ourselves in Python with the __mro__ attribute on a type.

>>> class A: pass
...
>>> class B(A): pass
...
>>> class C(A): pass
...
>>> class D(B, C): pass
...
>>> D.__mro__
(<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>, <class '__main__.A'>, <class 'object'>)

Every MRO in Python starts with the class itself and ends with object.

One other interesting consequence of the C3 algorithm is that we can actually get Python into a bind where it will reject our inheritance hierarchy. For example, this code will fail when we try to create the class F.

class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass
class E(C, B): pass
class F(D, E): pass # This line fails!
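
To see the rejection without crashing a script, the hierarchy above can be rebuilt with the failing class created inside a try block. This is only a sketch: it uses the three-argument form of type(), which is equivalent to the class statement, and the variable name mro_error is made up for the example.

```python
class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass   # MRO puts B before C
class E(C, B): pass   # MRO puts C before B

try:
    # Same as `class F(D, E): pass`; no linear order can satisfy both parents.
    F = type("F", (D, E), {})
except TypeError as exc:
    mro_error = exc
else:
    mro_error = None

print(type(mro_error).__name__)
```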

So, when I say to "get a field N on type T", what I mean is this. Take the type T and figure out its MRO. The MRO will start with T and end with object. Then, for each type in the MRO in order, see if that type has a slot named N. If so, return whatever is in that slot. If not, try the next one, and then the next. If none of the types have a slot named N, then fail. To summarize in a flowchart, we have

flowchart TD
    Start([Start])
    Mro[Get the MRO for type T]
    NameEx{Does the name N exist<br>on the type T?}
    IsObj{Is T the<br>root object type?}
    SetParent[Set T to be the<br>next class in<br>the MRO]
    Fail([Fail])
    Result([Return the value at N on T])

    Start-->Mro-->NameEx
    NameEx-->|No|IsObj
    IsObj-->|No|SetParent-->NameEx
    IsObj-->|Yes|Fail
    NameEx-->|Yes|Result

For types specifically, this lookup doesn't quite go through a __dict__. Type objects work a bit differently in Python, but conceptually you can still think of them as having a __dict__-like object backing them. In fact, calling object.__dict__ will return a special sort of dictionary-like object that represents the in-memory data backing object.
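
The "get a field N on type T" procedure can be approximated in plain Python. The sketch below is not what CPython runs; it walks T.__mro__ and uses each class's __dict__ view (via vars) as a stand-in for the in-memory data. The helper name lookup_on_type and the Base/Child classes are hypothetical.

```python
def lookup_on_type(tp, name):
    """Raw lookup of `name` along tp.__mro__; no descriptor handling."""
    for klass in tp.__mro__:
        namespace = vars(klass)  # read-only view of the class's own names
        if name in namespace:
            return namespace[name]
    raise AttributeError(name)


class Base:
    greeting = "hello"

    def method(self):
        return "base"


class Child(Base):
    def method(self):
        return "child"


found_greeting = lookup_on_type(Child, "greeting")  # inherited from Base
found_method = lookup_on_type(Child, "method")      # Child's own, found first
found_repr = lookup_on_type(Child, "__repr__")      # comes from object
```

Note that the helper returns whatever is stored on the class, so found_method is a plain function rather than a bound method.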

Descriptors

Next, we need to discuss the concept of descriptors [6]. In several places below, I'll say we "get a descriptor N on object X". When I say that, I mean the following process.

flowchart TD
    Start([Start])
    LetType[Let T be the type of X]
    GetD[Let D be the value of the field<br>N on the type T]
    NameEx{Does D exist?}
    Fail([Fail])
    IsBI{Is D an instance of<br>certain built-in classes?}
    CMagic[Do C-level magic<br>on object D]
    Result([Return D])
    Getter["Get field '__get__' on the type of D"]
    GetterExists{Did '__get__' exist?}
    Call["Call __get__(D, X, type(X))"]
    SetResult[Set D to the result<br>of the call]

    Start-->LetType-->GetD-->NameEx
    NameEx-->|No|Fail
    NameEx-->|Yes|IsBI

    IsBI-->|Yes|CMagic-->Result
    IsBI-->|No|Getter-->GetterExists

    GetterExists-->|No|Result
    GetterExists-->|Yes|Call-->SetResult-->Result

Let's break this down. When we get a descriptor, the __dict__ of the object X doesn't matter. We only really care about the type of X. First, we need to find the name on the type of X, using our field lookup.

If field lookup fails, then the descriptor lookup also fails. Otherwise, we have a descriptor that we've gotten our hands on; call that descriptor object D. This is still an ordinary Python object. It might be an instance of a built-in type like function, or it might be some user-defined object.

Next, we have some C-level special cases. There are a handful of types for which this "descriptor lookup" process has a hand-written C function that performs some special magic. This is done both for efficiency and to break the recursion we'll see in a minute. These are for things like built-in functions, methods, and class methods. Basically, if you have a Python object of a built-in type that's defined in C, it'll just do what you expect, for some definition of "what you expect".

In the other case, where we have a user-defined object, we get the field with name __get__ on the type of D, using the MRO of type(D) this time. Then we call __get__ with three arguments: D, X, and type(X). There's another little subtlety here. You might think __get__ takes a self argument in the same way that most instance methods do, but that's not really true. Normally, instance methods bind their self argument through the exact process we're describing right now (the function type is one of those special types with C-level magic to do the binding), so we can't use that technique here since we're not done inventing it. Instead, we get the field called __get__, which is (hopefully) a callable object, and we call whatever is there directly with three arguments: D, X, and type(X). If we do the conventional Python thing and name the first argument self, then it'll look like everything just magically worked, even though we never actually constructed a bound method in this case.

Finally, once we've made the call, we set D to the result of the call and return it. This is the mechanism used for user-defined descriptor objects. For instance, if we wanted to reimplement the property [7] descriptor directly in Python, it's actually quite straightforward and we would do so [8] using __get__. If there was no __get__, then that's not an error; we just return D as-is. This is what happens if you have an ordinary class variable (say, a constant number or something like that) and access it through an instance.
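
To make the three-argument call concrete, here is a sketch with a made-up descriptor class Constant and a made-up owner class Config. It performs the __get__ call by hand and compares it with ordinary attribute access, then does the same with a plain function to show where bound methods come from.

```python
class Constant:
    """Hypothetical descriptor that reports how it was called."""

    def __init__(self, value):
        self.value = value

    def __get__(self, instance, owner):
        return (self.value, instance, owner)


class Config:
    answer = Constant(42)
    plain = 7  # int has no __get__, so it is returned as-is


cfg = Config()

# Manual version of "get descriptor 'answer' on cfg".
raw = vars(Config)["answer"]                     # D: the object stored on the type
manual = type(raw).__get__(raw, cfg, type(cfg))  # __get__(D, X, type(X))
automatic = cfg.answer


def shout(self):
    return "hi"

# Plain functions are descriptors too; their __get__ builds a bound method.
bound = type(shout).__get__(shout, cfg, Config)
print(manual == automatic, cfg.plain, bound())
```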

Calling Functions

Now there's one more piece to this little puzzle. I said "call __get__ with three arguments", but strictly speaking, I haven't defined the word "call". Remember that, unlike in JavaScript, arbitrary objects in Python can be called, not just functions. When I say "call F with arguments Xs...", what I mean [9] is the following.

flowchart TD
    Start([Start])
    IsBI{Is F an instance of<br>certain built-in classes?}
    CMagic[Do C-level magic<br>to perform the call]
    Result([Return the result<br>of the call])
    Caller["Get descriptor '__call__'<br>on the type of F"]
    CallerExists{Did '__call__' exist?}
    Call["Call __call__(Xs...)"]
    Fail([Fail])

    Start-->IsBI

    IsBI-->|Yes|CMagic-->Result
    IsBI-->|No|Caller-->CallerExists

    CallerExists-->|No|Fail
    CallerExists-->|Yes|Call-->Result

This is basically the same process we used to get a descriptor, but there are some subtleties worth pointing out. First, we check if F is an instance of certain built-in types, such as function or method (the bound method type). If it is, we do some special-casing in C to deal with it. If not, then we get the descriptor __call__ on the type of F. Read that very carefully. Our technique for calling a function involves getting a descriptor and our technique for getting a descriptor involves calling a function. This whole thing is recursive, and we could have a __get__ which has a custom __call__ which has a custom __get__, as far down as we want to go. The only thing that can break the loop is a built-in Python object, such as function.

Once we have the descriptor __call__, we call it and return the result. If __call__ doesn't exist, then the call fails.

Note, also, that I didn't say anything about self here. None of the code being discussed in this section cares at all about self or bound methods or any of that. Bound methods are handled by __get__. The C-level special-casing for the descriptor function constructs a special "bound method" object when we first get the descriptor, and then we call that object. Once we've reached the point where we have a concrete object to call, self is no longer a party to the contract; it's been dealt with.
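
One observable consequence of looking up __call__ on the type is that an instance-level __call__ is ignored by call syntax. The sketch below uses a made-up Adder class to show this; the names are hypothetical.

```python
class Adder:
    def __init__(self, n):
        self.n = n

    def __call__(self, x):
        return x + self.n


add3 = Adder(3)
via_syntax = add3(4)                     # ordinary call syntax
via_type = type(add3).__call__(add3, 4)  # what the call boils down to

# Storing a __call__ on the instance does not change what add3(...) does,
# because the call machinery only consults the type.
add3.__call__ = lambda x: "instance attribute"
still_type_level = add3(4)
explicit_attribute = add3.__call__(4)    # plain attribute access does see it
print(via_syntax, via_type, still_type_level, explicit_attribute)
```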

Getting a Field on an Object

Now, with all of that background, we can finally get back to the premise of our question. What happens when we access a field on an object?

foo.bar

Here's the big picture of what happens. We'll still have to break down a couple of these steps in a minute, but this is what actually happens when you get a field called N on an object X.

flowchart TD
    Start([Start])
    GetAttro[Get the descriptor<br>'__getattribute__' on the<br>type of X]
    CallAttro[Call '__getattribute__' with one<br>argument: the string N]
    ResultAttro{What happened?}
    ReturnResult([Return the result of<br>the call])
    Propagate([Propagate the exception to<br>the caller])

    StoreExc[Let E be the AttributeError]
    GetAttr[Get the descriptor<br>'__getattr__' on the<br>type of X]
    GetAttrExist{Does '__getattr__' exist?}

    RaiseE([Re-raise the exception E])
    CallAttr[Call `__getattr__` with one<br>argument: the string N]
    ReturnResultA([Return the result of the call])

    Start-->GetAttro-->CallAttro-->ResultAttro

    ResultAttro-->|Function returned normally|ReturnResult
    ResultAttro-->|Raised exception other than AttributeError|Propagate
    ResultAttro-->|Raised AttributeError|StoreExc-->GetAttr-->GetAttrExist

    GetAttrExist-->|No|RaiseE
    GetAttrExist-->|Yes|CallAttr-->ReturnResultA

The first thing we do is get the descriptor __getattribute__. Again, this is a descriptor lookup, which means it goes through the whole process of calling __get__ (or doing grungy C shenanigans equivalent to __get__) on __getattribute__. Then we call __getattribute__ (which is probably, though not necessarily, a bound method object) with one argument: the string name of the field we're trying to get.

If __getattribute__ returns a value successfully, we're done. We return that value, and everyone is happy. If __getattribute__ raises an exception that isn't an instance of AttributeError, then that exception propagates back to the call site.

Then there's the final case: If __getattribute__ raises AttributeError, we fall back to __getattr__. We get the descriptor called __getattr__ (This is not a typo. There are two distinct magic methods called __getattribute__ and __getattr__. The one with shorter Huffman coding is the one you're intended to override more frequently). If this descriptor doesn't exist, simply let the original AttributeError propagate. If the descriptor does exist, then call it with the same string name and return the result. If __getattr__ raises an exception, let it propagate regardless of its type.

It's very possible for __getattr__ to not exist on a given object, and in that case we propagate the prior exception. There's no case in the above flowchart for __getattribute__ to not exist. That's because the root class object defines a function called __getattribute__, and we'll delve into its implementation in just a moment. (No, Python won't let you do del object.__getattribute__. Trust me, I just tried it, and the interpreter politely told me it was revoking my Python license.)

So that's the big picture. foo.bar expands to, essentially,

attro_descr = type(foo).__getattribute__
attro = type(attro_descr).__get__(attro_descr, foo, type(foo))
call_descr = type(attro).__call__
call = type(call_descr).__get__(call_descr, attro, type(attro))
call('bar')
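
The expansion above leaves out the __getattr__ fallback. That fallback, and the rule that only AttributeError triggers it, can be observed with two small classes. Both classes here (Lenient and Strict) are made up for the demonstration.

```python
class Lenient:
    present = "found normally"

    def __getattr__(self, name):
        # Reached only after __getattribute__ raised AttributeError.
        return f"fallback for {name}"


class Strict:
    def __getattribute__(self, name):
        raise ValueError(name)  # not an AttributeError, so no fallback

    def __getattr__(self, name):
        return "never reached"


lenient = Lenient()
normal = lenient.present    # default __getattribute__ succeeds
fallback = lenient.missing  # default lookup fails, __getattr__ answers

try:
    Strict().anything
except ValueError as exc:
    propagated = exc
else:
    propagated = None
print(normal, fallback, type(propagated).__name__)
```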

The Default __getattribute__

You're free to override __getattribute__ on your own classes. If you do so, then the above flowchart is a complete description of the way field lookups work. However, most classes will simply inherit the default __getattribute__ from object, so it's worth looking at that implementation [10] as well.

Before I show the flowchart, there's one other minor piece of terminology we need. So far, we've talked about descriptor objects having a __get__ field. It's also possible for descriptors to describe what happens when we set or delete the field, with magic methods __set__ and __delete__ (Note that __del__ is not a descriptor magic method; it does something entirely different that's not relevant to this article at all). We'll talk more about the implementation of those later, but we need them to distinguish between data descriptors and non-data descriptors. A data descriptor [11] is a descriptor that defines __set__ and/or __delete__. A non-data descriptor is a descriptor that defines only __get__. (An object that defines none of the three isn't really a descriptor at all, in any sense of the word; it's just an ordinary Python object.)

With that out of the way, let's take a look at what the implementation of object.__getattribute__ does. Remember that __getattribute__ is called with one argument: the name N we're trying to get. We also have access to the object X we're getting the field on, since we're defining a true built-in Python function (hence, the C-level special cases kick in and bind self for us).

flowchart TD
    Start([Start])
    GetName[Let D be the value<br>at the name N<br>on the type of X]
    GetNameResult{Is D a data descriptor<br>with a '__get__'?}
    DGet{Does the type of<br>D have a '__get__'?}
    GetDataD[Get the field '__get__'<br>on the type of D]
    CallData["Call __get__(D, X, type(X))"]
    ReturnData([Return the result])
    Dict{"Does X have a __dict__?"}
    NameInDict[Look up N in X.__dict__]
    NameInDictEx{"Does X.__dict__[N] exist?"}
    ReturnNameInDict(["Return X.__dict__[N]"])
    DExist{Does D exist?}
    Fail([Fail with AttributeError])
    ReturnD([Return D])

    Start-->GetName-->GetNameResult

    GetNameResult-->|Yes|GetDataD
    GetNameResult-->|No|Dict
    GetNameResult-->|D does not exist|Dict

    Dict-->|Yes|NameInDict-->NameInDictEx
    Dict-->|No|DExist
    NameInDictEx-->|Yes|ReturnNameInDict
    NameInDictEx-->|No|DExist

    DExist-->|No|Fail
    DExist-->|Yes|DGet
    DGet-->|Yes|GetDataD-->CallData-->ReturnData
    DGet-->|No|ReturnD

There's a lot to take in here. It can be broadly summarized as

  1. Try to return the __get__ of a data descriptor.
  2. If that fails, use the object's __dict__.
  3. If that fails (or __dict__ didn't exist), then try to return the __get__ of a non-data descriptor.
  4. Return a class attribute (without calling __get__), or fail if it really doesn't exist.

More in-depth, we start by getting the value of N on the type of X; call it D. We don't call __get__ yet; that comes later. Then we check if D is a data descriptor (i.e. if it defines __set__ or __delete__). If so, then we're going to use D, even if the name N exists on the object's __dict__. Data descriptors take precedence over __dict__, but __dict__ takes precedence over non-data descriptors. (One CPython subtlety: this shortcut is only taken when D also has a __get__. A data descriptor without a __get__ falls through to the __dict__ check like anything else.)

If D is not a data descriptor or if it didn't exist, we try to use __dict__. If N exists in __dict__, then we return that value. We never call __get__ on an object retrieved from __dict__; we just return the object as-is.

If the __dict__ did not contain N (or if __dict__ didn't exist at all), then we ask if D even exists (i.e. is it a non-data descriptor?). If it doesn't, then we fail, which will (in our prior flowchart) fall back to __getattr__.

If we elect to use D (whether by it being a data descriptor or by __dict__ failing), then we call __get__ if it exists, or return D if not, just like if we were getting a descriptor at the C-level.
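
The whole procedure can be approximated in pure Python. The following is a sketch under stated assumptions, not the real implementation: the helper names find_in_mro and sketch_getattribute are made up, the instance __dict__ is fetched by cheating through object.__getattribute__, and the data-descriptor shortcut is taken only when D also has a __get__, which is my reading of the CPython source.

```python
_MISSING = object()  # sentinel: "nothing found"


def find_in_mro(tp, name):
    """Raw lookup along the MRO; returns _MISSING instead of raising."""
    for klass in tp.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def sketch_getattribute(x, name):
    tp = type(x)
    d = find_in_mro(tp, name)
    getter = _MISSING
    is_data = False
    if d is not _MISSING:
        d_type = type(d)
        getter = find_in_mro(d_type, "__get__")
        is_data = (find_in_mro(d_type, "__set__") is not _MISSING
                   or find_in_mro(d_type, "__delete__") is not _MISSING)
    # 1. A data descriptor (with a __get__) beats the instance __dict__.
    if is_data and getter is not _MISSING:
        return getter(d, x, tp)
    # 2. Otherwise the instance __dict__ is consulted, if there is one.
    try:
        inst_dict = object.__getattribute__(x, "__dict__")
    except AttributeError:
        inst_dict = None
    if inst_dict is not None and name in inst_dict:
        return inst_dict[name]  # returned as-is, __get__ is never called
    # 3. Non-data descriptor, or 4. plain class attribute.
    if d is not _MISSING:
        return getter(d, x, tp) if getter is not _MISSING else d
    raise AttributeError(name)


class Demo:
    constant = 3

    @property
    def guarded(self):   # property is a data descriptor
        return "from property"

    def method(self):    # a function is a non-data descriptor
        return "from class"


demo = Demo()
demo.__dict__["guarded"] = "from instance"  # loses to the data descriptor
demo.__dict__["method"] = "from instance"   # beats the non-data descriptor
results = [sketch_getattribute(demo, n) for n in ("guarded", "method", "constant")]
print(results)
```

The same three names give the same answers through ordinary attribute access, which is a reasonable sanity check on the sketch.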

And that's that. That's how you access fields in Python, in full generality. There are other implementations of __getattribute__. Notably, type [12] implements its own [13] __getattribute__. I won't go in detail into that one, except to note that type.__getattribute__ will call __get__(descriptor_object, None, type_object) if it finds the name on the type itself. That is, __get__ should be prepared to handle None as its second argument. For example, if we define the following two classes

class Descriptor:
    def __get__(self, inst, owner):
        print((self, inst, owner))
        return None

class Foo:
    x = Descriptor()

Then Foo().x will print out (<Descriptor object>, <Foo object>, <class Foo>), by the object.__getattribute__ implementation we showed above. On the other hand, Foo.x will print out (<Descriptor object>, None, <class Foo>) by type.__getattribute__.

Setting a Field on an Object

Okay, believe it or not, the hard part is done. Getting fields is the most complicated part, for the simple reason that it's recursive. To get a field, we end up having to get __getattribute__ and __getattr__, which are also fields and therefore could also invoke the same behavior.

Setting a field is much simpler. When we have the expression

foo.bar = baz

This always does the same thing. There's no fallback to a different magic method like with __getattribute__. Setting bar on foo to the value baz will always get the descriptor __setattr__ [14] on type(foo). We're getting a descriptor, so if __setattr__ is some object with a __get__, then __get__ will be called. Yes, __get__ will be called while setting a field.

Once we have __setattr__, we call it with two arguments: the name "bar" as a string and the value baz. Like with __getattribute__, __setattr__ will always exist, since it exists on object. And the object implementation is not too crazy. Specifically, to set the field with name N on the object X to the value Y, object.__setattr__ [15] does the following.

flowchart TD
    Start([Start])
    GetName[Let D be the value<br>at the name N<br>on the type of X]
    GetNameResult{Does the type of D<br>have a '__set__'?}
    GetSet[Get the descriptor '__set__' on the type of D]
    CallSet["Call __set__(X, Y)"]
    Done([Return None])
    Dict{"Does X have a __dict__?"}
    Fail([Fail with AttributeError])
    Assign["Assign X.__dict__[N] = Y"]

    Start-->GetName-->GetNameResult
    GetNameResult-->|Yes|GetSet-->CallSet-->Done
    GetNameResult-->|No|Dict
    GetNameResult-->|D does not exist|Dict

    Dict-->|No|Fail
    Dict-->|Yes|Assign-->Done

In object.__setattr__, similar to object.__getattribute__, first we look for a descriptor. This time we're looking to see if the descriptor has a __set__ field. If it does, we call it and we're done. Remember that assignment statements in Python don't return anything, so any return value from __set__ (or from __setattr__, if you override that) will be ignored. (Assignment expressions [16], affectionately dubbed the "walrus operator", can only be used to assign to simple variables, not slots on objects, so they'll never go through this process either.)

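If the descriptor did not exist or didn't have a __set__ (even if it does have a __get__), then we look on the object's __dict__. If the object doesn't have a __dict__, then we fail. If it does, we do simple assignment to the object's own __dict__. One CPython caveat: a descriptor that defines __delete__ but not __set__ still counts as having a setter at the C level, so assigning through it raises AttributeError instead of falling back to __dict__.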
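
The three outcomes of assignment (descriptor with __set__, plain __dict__ assignment, and failure when there is no __dict__) can be seen in a short sketch. The Positive descriptor and the Account class are hypothetical, as is the "_balance" storage key.

```python
class Positive:
    """Hypothetical data descriptor that validates on assignment."""

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__["_balance"]

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("must be positive")
        instance.__dict__["_balance"] = value


class Account:
    balance = Positive()


acct = Account()
acct.balance = 50  # the type has a descriptor with __set__: it is called
acct.note = "hi"   # no descriptor: stored directly in acct.__dict__

try:
    acct.balance = -1       # __set__ is free to reject the value
except ValueError as exc:
    rejected = exc

try:
    object().anything = 1   # no descriptor and no __dict__: failure
except AttributeError as exc:
    no_dict_error = exc
print(acct.__dict__)
```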

Deletion of the form del foo.bar works identically to assignment. We get the descriptor __delattr__ [17] and call it with one argument: the string name of the slot we want deleted. object.__delattr__ works like object.__setattr__ as well, but for the sake of completeness here it is.

flowchart TD
    Start([Start])
    GetName[Let D be the value<br>at the name N<br>on the type of X]
    GetNameResult{Does the type of D<br>have a '__delete__'?}
    GetSet[Get the descriptor '__delete__' on the type of D]
    CallSet["Call __delete__(X)"]
    Done([Return None])
    Dict{"Does X have a __dict__?"}
    Fail([Fail with AttributeError])
    DictField{"Does X.__dict__<br>contain the key N?"}
    Remove["Remove N from X.__dict__"]

    Start-->GetName-->GetNameResult
    GetNameResult-->|Yes|GetSet-->CallSet-->Done
    GetNameResult-->|No|Dict
    GetNameResult-->|D does not exist|Dict

    Dict-->|No|Fail
    Dict-->|Yes|DictField

    DictField-->|No|Fail
    DictField-->|Yes|Remove-->Done

The only major difference is that we can't delete things that don't exist. So, on objects with __dict__, trying to assign to a slot will happily either create a new slot or replace an existing value. But trying to delete a slot on a __dict__ will only work if the slot actually exists, failing with AttributeError if not. Other than that, we call __delete__ [18] on descriptors, just like we did for __set__ earlier.
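
Both deletion paths can be exercised in a few lines. In this sketch the Thing, Logged, and Holder classes are made up; Logged simply records which instances it was asked to delete from.

```python
class Thing:
    pass


t = Thing()
t.x = 1
del t.x            # removes the key "x" from t.__dict__
try:
    del t.x        # the key is gone now, so this fails
except AttributeError as exc:
    second_delete_error = exc


class Logged:
    """Hypothetical descriptor that records deletions."""

    def __init__(self):
        self.deleted_from = []

    def __delete__(self, instance):
        self.deleted_from.append(instance)


class Holder:
    slot = Logged()


h = Holder()
del h.slot         # routed to Logged.__delete__, __dict__ is not touched
del h.slot         # the descriptor decides, so this works a second time
deletions = vars(Holder)["slot"].deleted_from
print(len(deletions))
```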

__slots__

Finally, there's __slots__. __slots__ is actually magic, in the sense that we can't make something as efficient as __slots__ in pure Python without the help of C. But we could actually get something semantically equivalent but slower. So it's only a little bit magic.

When a class finishes evaluating its body (that is, when we hit the dedent at the end of a class body), Python does a check to see if __slots__ exists on our newly-defined type. If it doesn't, then instances of the type get a __dict__. If it does, then we do something special.

First, when we create an instance of our special type, rather than giving it a __dict__, we allocate exactly enough memory to store all of the slots, no more and no less. Second, at class creation time, Python defines several [19] descriptor objects, one for each slot we asked for. These descriptor objects automagically know where to look in our allocated object for their own field, and they get, set, or delete that field. Note that "delete", in this context, means "set to NULL". That's not the Python object None, that's a genuine C-style null pointer.
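
The per-slot descriptors are visible from Python. The sketch below uses a made-up Pair class to show that the instance has no __dict__, that each slot name on the class is a data descriptor, and that reading a deleted slot raises AttributeError.

```python
class Pair:
    __slots__ = ("left", "right")

    def __init__(self, left, right):
        self.left = left
        self.right = right


pair = Pair(1, 2)
has_dict = hasattr(pair, "__dict__")        # no per-instance dictionary
slot_descr = vars(Pair)["left"]             # the descriptor created for "left"
descr_kind = type(slot_descr).__name__
via_descr = slot_descr.__get__(pair, Pair)  # reads the slot directly

del pair.left                               # clears the slot
try:
    pair.left
except AttributeError as exc:
    empty_slot_error = exc

try:
    pair.other = 3                          # not a declared slot, no __dict__
except AttributeError as exc:
    unknown_name_error = exc
print(has_dict, descr_kind, via_descr)
```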

Contributors

This document was written by Mercerenies. If I've made a mistake or something is unclear, feel free to add a comment, and I'd be happy to edit the document and add your name down here. If you're making a contribution, do note that you're releasing that contribution under CC BY-SA 4.0 [20].

License

I fully intend that this post be useful to Python programmers and be a resource for StackOverflow and other sites to link to. This entire article is licensed under CC BY-SA 4.0 [20].


Footnotes

  1. https://github.com/python/cpython/

  2. https://github.com/python/cpython/tree/3.10

  3. https://docs.python.org/3/library/functions.html#type

  4. https://www.python.org/download/releases/2.3/mro/

  5. https://en.wikipedia.org/wiki/C3_linearization

  6. https://docs.python.org/3/howto/descriptor.html

  7. https://docs.python.org/3/library/functions.html#property

  8. https://docs.python.org/3/howto/descriptor.html#properties

  9. https://github.com/python/cpython/blob/ae5317d309f3b730c25797b07b3fbfc3a1357e7d/Objects/call.c#L170-L225

  10. https://github.com/python/cpython/blob/ae5317d309f3b730c25797b07b3fbfc3a1357e7d/Objects/object.c#L1216-L1329

  11. https://docs.python.org/release/3.10.0/reference/datamodel.html#object.__set__

  12. https://docs.python.org/3/c-api/typeobj.html

  13. https://github.com/python/cpython/blob/ae5317d309f3b730c25797b07b3fbfc3a1357e7d/Objects/typeobject.c#L3885-L3972

  14. https://docs.python.org/release/3.10.0/reference/datamodel.html#object.__setattr__

  15. https://github.com/python/cpython/blob/982273ae799c01e5f28fb6314d77591e0379813d/Objects/object.c#L1338-L1410

  16. https://peps.python.org/pep-0572/

  17. https://docs.python.org/release/3.10.0/reference/datamodel.html#object.__delattr__

  18. https://docs.python.org/release/3.10.0/reference/datamodel.html#object.__delete__

  19. https://docs.python.org/3/howto/descriptor.html#id29

  20. https://creativecommons.org/licenses/by-sa/4.0/
