This page is an attempt to describe the following Python expression.
foo.bar
Exactly what happens behind the scenes when you access a field on a Python object is surprisingly complicated. I've written several StackOverflow answers summarizing different parts of this process, but I've never seen a canonical description of the entire process in one place. This page aims to be a one-stop source for what happens when you access a field on a Python object, one that people (myself included) can link to in StackOverflow answers as a source for the behavior of Python's complex metaprogramming model.
To delve into this, I'll be using several resources. Anytime I reference the code, I'm referring specifically to the official CPython implementation [1] of Python. My understanding is that everything I'm saying here should be fairly standardized and should apply to other Python implementations, but I'll be focusing on CPython if there are ever any discrepancies.
Further, I'll be focusing on Python 3.10 [2]. I'll try to note any differences between other recent versions of Python (say, dating back to 3.7 or so). But this article is not about Python 2. Especially back when old-style classes were a thing, a lot of this worked very differently, and differences between Python 2 and Python 3 are out of scope of this article.
Finally, this article does not claim to be an introduction to Python or an introduction to OOP. This article is targeted at experienced Python developers who want to know a bit more of the behind-the-scenes work that goes on in Python. If you're just learning Python, this is probably not the right page for you.
First, we need to discuss a bit about how objects work in Python. Every object is an instance of a class. This also includes type objects, which are themselves classes and are instances of the type object [3]. Classes in Python have one or more superclasses, eventually culminating in the root class object, which is the only class in Python that has zero superclasses.
With a handful of exceptions, every object in Python has a dictionary where it stores its own fields. This dictionary is conventionally called __dict__ and, in the absence of the sort of shenanigans we'll be getting into today, it can be accessed on an object foo with foo.__dict__. An instance's dictionary stores the fields defined specific to that instance. In Java, we would conventionally call these instance variables. An instance's __dict__ does not include instance methods or class-level constants which are the same for all instances of the class.
Nearly every object in Python has a __dict__, but there are exceptions. Instances of many built-in Python types do not have a __dict__, for efficiency reasons (we can store an int more compactly if we don't need to associate a hashtable with it, for instance). Objects whose type is object (i.e. instances created with the object() constructor which are not instances of any subclass of object) also lack a __dict__, as object() creates very minimal objects that have no additional features. Finally, a user-defined class in Python can opt out of __dict__ by defining a field called __slots__. We'll talk more about __slots__ later on, but at least for the first part of this discussion, we'll assume that the objects we're discussing have a __dict__.
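As a quick sanity check of the claims above, here is a minimal sketch. The classes Point and SlottedPoint are hypothetical names made up for illustration; the behavior shown is what I'd expect from CPython.

```python
class Point:
    """Ordinary class: each instance gets its own __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    """Hypothetical class that opts out of __dict__ via __slots__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
instance_fields = dict(p.__dict__)              # only the per-instance fields
method_in_instance = "__init__" in p.__dict__   # methods live on the class instead

int_has_dict = hasattr(5, "__dict__")
bare_object_has_dict = hasattr(object(), "__dict__")
slotted_has_dict = hasattr(SlottedPoint(1, 2), "__dict__")
```

Only the Point instance carries a __dict__, and that dictionary holds x and y but not __init__.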
There's one other crucial part of Python classes we need to discuss first, and that's the method resolution order. Some languages like Java are single-inheritance languages. That means that every class (with the exception of the root class Object) has a single supertype. Then, when we go to look up a name, be it a method or a public field, there is a clear order in which those names should resolve. We start at the runtime class of the object, and if we don't see the name there, then we check the immediate superclass, and then its immediate superclass, and so on. In Python pseudocode, this lookup process could be summarized as
def find_name(my_type, name):
    while my_type is not None:
        if name in my_type.__dict__:
            return my_type.__dict__[name]
        else:
            my_type = my_type.__base__
    raise AttributeError(name)
However, Python is a language that supports multiple inheritance. A class can have one or more superclasses. That complicates name lookup, even in our simple "raw lookup" case. In order to use this same algorithm, we need a method resolution order. A method resolution order, often abbreviated "MRO", takes a type (which may have a complicated inheritance hierarchy) and returns a linear list of superclasses indicating the order in which to look for fields. Basically, it tells us which superclasses should be considered first.
The MRO for Python versions prior to 2.3 was a depth-first search. That is, if a class A had superclasses B and C, then we would always look in A, then in B, then recursively in all of B's superclasses, and only if the entire lookup in B failed would we come back and check C. This makes sense at a glance, but it fails to have some nice properties, monotonicity being a key one.
Starting in Python 2.3 (and going to the present day), Python uses an MRO called the C3 linearization method [4][5]. C3 (named thusly because it's designed to be consistent with three nice properties) is a slightly more complicated algorithm, whose details I won't describe here (the linked pages do a great job of that on their own). The C3 algorithm takes a Python type as input and produces an ordered sequence (actually, a tuple) of itself and its superclasses in MRO order. We can see this order ourselves in Python with the __mro__ attribute on a type.
>>> class A: pass
...
>>> class B(A): pass
...
>>> class C(A): pass
...
>>> class D(B, C): pass
...
>>> D.__mro__
(<class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>, <class '__main__.A'>, <class 'object'>)
Every MRO in Python starts with the class itself and ends with object.
One other interesting consequence of the C3 algorithm is that we can actually get Python into a bind where it will reject our inheritance hierarchy. For example, this code will fail when we try to create the class F.
class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass
class E(C, B): pass
class F(D, E): pass # This line fails!
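To see the failure without crashing a script, the same hierarchy can be built with the class F created through the three-argument form of type() inside a try block. This is only a sketch; the exact wording of the error message is not something I'd rely on.

```python
class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass   # MRO: D, B, C, A, object
class E(C, B): pass   # MRO: E, C, B, A, object

try:
    # D wants B before C, but E wants C before B, so no consistent
    # linearization exists and class creation is refused.
    F = type("F", (D, E), {})
    error = None
except TypeError as exc:
    error = exc

d_order = [klass.__name__ for klass in D.__mro__]
e_order = [klass.__name__ for klass in E.__mro__]
```

The rejection happens at class creation time, as a TypeError, long before any attribute lookup is attempted.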
So, when I say to "get a field N on type T", what I mean is this. Take the type T and figure out its MRO. The MRO will start with T and end with object. Then, for each type in the MRO in order, see if that type has a slot named N. If so, return whatever is in that slot. If not, try the next one, and then the next. If none of the types have a slot named N, then fail. To summarize in a flowchart, we have
flowchart TD
Start([Start])
Mro[Get the MRO for type T]
NameEx{Does the name N exist<br>on the type T?}
IsObj{Is T the<br>root object type?}
SetParent[Set T to be the<br>next class in<br>the MRO]
Fail([Fail])
Result([Return the value at N on T])
Start-->Mro-->NameEx
NameEx-->|No|IsObj
IsObj-->|No|SetParent-->NameEx
IsObj-->|Yes|Fail
NameEx-->|Yes|Result
For types specifically, this lookup doesn't quite go through a __dict__. Type objects work a bit differently in Python, but conceptually you can still think of them as having a __dict__-like object backing them. In fact, evaluating object.__dict__ will return a special sort of read-only, dictionary-like object (a mappingproxy) that represents the in-memory data backing object.
Next, we need to discuss the concept of descriptors [6]. In several places below, I'll say we "get a descriptor N on object X". When I say that, I mean the following process.
flowchart TD
Start([Start])
LetType[Let T be the type of X]
GetD[Let D be the value of the field<br>N on the type T]
NameEx{Does D exist?}
Fail([Fail])
IsBI{Is D an instance of<br>certain built-in classes?}
CMagic[Do C-level magic<br>on object D]
Result([Return D])
Getter["Get field '__get__' on the type of D"]
GetterExists{Did '__get__' exist?}
Call["Call __get__(D, X, type(X))"]
SetResult[Set D to the result<br>of the call]
Start-->LetType-->GetD-->NameEx
NameEx-->|No|Fail
NameEx-->|Yes|IsBI
IsBI-->|Yes|CMagic-->Result
IsBI-->|No|Getter-->GetterExists
GetterExists-->|No|Result
GetterExists-->|Yes|Call-->SetResult-->Result
Let's break this down. When we get a descriptor, the __dict__ of the object X doesn't matter. We only really care about the type of X. First, we need to find the name on the type of X, using our field lookup.
If field lookup fails, then the descriptor lookup also fails. Otherwise, we have a descriptor that we've gotten our hands on; call that descriptor object D. This is still an ordinary Python object. It might be an instance of a built-in type like function, or it might be some user-defined object.
Next, we have some C-level special cases. There are a handful of types for which this "descriptor lookup" process has a hand-written C function that performs some special magic. This is done both for efficiency and to break the recursion we'll see in a minute. These are for things like built-in functions, methods, and class methods. Basically, if you have a Python object of a built-in type that's defined in C, it'll just do what you expect, for some definition of "what you expect".
In the other case, where we have a user-defined object, we get the field with name __get__ on the type of D, using the MRO of type(D) this time. Then we call __get__ with three arguments: D, X, and type(X). There's another little subtlety here. You might think __get__ takes a self argument in the same way that most instance methods do, but that's not really true. Normally, instance methods bind their self argument through the exact process we're describing right now (the type function is one of those special types with C-level magic to do the binding), so we can't use that technique here since we're not done inventing it. Instead, we get the field called __get__, which is (hopefully) a callable object, and we call whatever is there directly with three arguments: D, X, and type(X). If we do the conventional Python thing and name the first argument self, then it'll look like everything just magically worked, even though we never actually constructed a bound method in this case.
Finally, once we've made the call, we set D to the result of the call and return it. This is the mechanism used for user-defined descriptor objects. For instance, if we wanted to reimplement the property [7] descriptor directly in Python, it's actually quite straightforward and we would do so [8] using __get__. If there was no __get__, then that's not an error; we just return D as-is. This is what happens if you have an ordinary class variable (say, a constant number or something like that) and access it through an instance.
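The descriptor lookup described above can be sketched in pure Python. The function get_descriptor and the classes Const and Foo are hypothetical names for illustration, and the sketch deliberately has no C-level fast paths: it treats built-in functions like any other object whose type defines __get__.

```python
def get_descriptor(x, name):
    """Sketch of 'get a descriptor name on object x' (no C fast paths)."""
    t = type(x)
    for klass in t.__mro__:
        if name in vars(klass):
            d = vars(klass)[name]
            break
    else:
        raise AttributeError(name)
    # Look for __get__ on type(d), never on d itself, and call whatever
    # is stored there directly with three arguments.
    for klass in type(d).__mro__:
        if "__get__" in vars(klass):
            return vars(klass)["__get__"](d, x, t)
    return d  # no __get__: an ordinary class variable, returned as-is


class Const:
    """Hypothetical descriptor that reports how it was invoked."""

    def __get__(self, inst, owner):
        return ("got", inst, owner)


class Foo:
    plain = 42
    fancy = Const()

    def method(self):
        return "called"


foo = Foo()
plain_value = get_descriptor(foo, "plain")
fancy_value = get_descriptor(foo, "fancy")
bound_result = get_descriptor(foo, "method")()   # function.__get__ built a bound method
```

Because only type(x) is consulted, anything stored in foo.__dict__ has no effect on the result.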
Now there's one more piece to this little puzzle. I said "call __get__ with three arguments", but strictly speaking, I haven't defined the word "call". Remember that, unlike in JavaScript, arbitrary objects in Python can be called, not just functions. When I say "call F with arguments Xs...", what I mean [9] is the following.
flowchart TD
Start([Start])
IsBI{Is F an instance of<br>certain built-in classes?}
CMagic[Do C-level magic<br>to perform the call]
Result([Return the result<br>of the call])
Caller["Get descriptor '__call__'<br>on the type of F"]
CallerExists{Did '__call__' exist?}
Call["Call __call__(Xs...)"]
Fail([Fail])
Start-->IsBI
IsBI-->|Yes|CMagic-->Result
IsBI-->|No|Caller-->CallerExists
CallerExists-->|No|Fail
CallerExists-->|Yes|Call-->Result
This is basically the same process we used to get a descriptor, but there are some subtleties worth pointing out. First, we check if F is an instance of certain built-in types, such as function or classmethod. If it is, we do some special-casing in C to deal with it. If not, then we get the descriptor __call__ on the type of F. Read that very carefully. Our technique for calling a function involves getting a descriptor and our technique for getting a descriptor involves calling a function. This whole thing is recursive, and we could have a __get__ which has a custom __call__ which has a custom __get__, as far down as we want to go. The only thing that can break the loop is a built-in Python object, such as function.
Once we have the descriptor __call__, we call it and return the result. If __call__ doesn't exist, then the call fails.
Note, also, that I didn't say anything about self here. None of the code being discussed in this section cares at all about self or bound methods or any of that. Bound methods are handled by __get__. The C-level special-casing for the descriptor function constructs a special "bound method" object when we first get the descriptor, and then we call that object. Once we've reached the point where we have a concrete object to call, self is no longer a party to the contract; it's been dealt with.
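One observable consequence of "get the descriptor __call__ on the type of F" is that a __call__ stored on the instance is ignored by the call machinery. A small sketch, with the hypothetical classes Greeter and NotCallable:

```python
class Greeter:
    def __call__(self, name):
        return f"hello, {name}"


g = Greeter()
# Stored in g.__dict__, so the call machinery never sees it.
g.__call__ = lambda name: "instance override"

implicit = g("world")            # descriptor lookup on type(g)
explicit = g.__call__("world")   # ordinary field access finds the instance field


class NotCallable:
    pass


n = NotCallable()
n.__call__ = lambda: "never reached"
try:
    n()
    failure = None
except TypeError as exc:
    failure = exc
```

The explicit g.__call__ spelling goes through ordinary field access, which is why it sees the instance-level value while g(...) does not.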
Now, with all of that background, we can finally get back to the premise of our question. What happens when we access a field on an object?
foo.bar
Here's the big picture of what happens. We'll still have to break down a couple of these steps in a minute, but this is what actually happens when you get a field called N on an object X.
flowchart TD
Start([Start])
GetAttro[Get the descriptor<br>'__getattribute__' on the<br>type of X]
CallAttro[Call '__getattribute__' with one<br>argument: the string N]
ResultAttro{What happened?}
ReturnResult([Return the result of<br>the call])
Propagate([Propagate the exception to<br>the caller])
StoreExc[Let E be the AttributeError]
GetAttr[Get the descriptor<br>'__getattr__' on the<br>type of X]
GetAttrExist{Does '__getattr__' exist?}
RaiseE([Re-raise the exception E])
CallAttr[Call '__getattr__' with one<br>argument: the string N]
ReturnResultA([Return the result of the call])
Start-->GetAttro-->CallAttro-->ResultAttro
ResultAttro-->|Function returned normally|ReturnResult
ResultAttro-->|Raised exception other than AttributeError|Propagate
ResultAttro-->|Raised AttributeError|StoreExc-->GetAttr-->GetAttrExist
GetAttrExist-->|No|RaiseE
GetAttrExist-->|Yes|CallAttr-->ReturnResultA
The first thing we do is get the descriptor __getattribute__. Again, this is a descriptor lookup, which means it goes through the whole process of calling __get__ (or doing grungy C shenanigans equivalent to __get__) on __getattribute__. Then we call __getattribute__ (which is probably, though not necessarily, a bound method object) with one argument: the string name of the field we're trying to get.
If __getattribute__ returns a value successfully, we're done. We return that value, and everyone is happy. If __getattribute__ raises an exception that isn't an instance of AttributeError, then that exception propagates back to the call site.
Then there's the final case: If __getattribute__ raises AttributeError, we fall back to __getattr__. We get the descriptor called __getattr__. (This is not a typo. There are two distinct magic methods called __getattribute__ and __getattr__. The one with shorter Huffman coding is the one you're intended to override more frequently.) If this descriptor doesn't exist, simply let the original AttributeError propagate. If the descriptor does exist, then call it with the same string name and return the result. If __getattr__ raises an exception, let it propagate regardless of its type.
It's very possible for __getattr__ to not exist on a given object, and in that case we propagate the prior exception. There's no case in the above flowchart for __getattribute__ to not exist. That's because the root class object defines a function called __getattribute__, and we'll delve into its implementation in just a moment. (No, Python won't let you do del object.__getattribute__. Trust me, I just tried it, and the interpreter politely told me it was revoking my Python license.)
So that's the big picture. foo.bar expands to, essentially,
attro_descr = type(foo).__getattribute__
attro = type(attro_descr).__get__(attro_descr, foo, type(foo))
call_descr = type(attro).__call__
call = type(call_descr).__get__(call_descr, attro, type(attro))
call('bar')
You're free to override __getattribute__ on your own classes. If you do so, then the above flowchart is a complete description of the way field lookups work. However, most classes will simply inherit the default __getattribute__ from object, so it's worth looking at that implementation [10] as well.
Before I show the flowchart, there's one other minor piece of terminology we need. So far, we've talked about descriptor objects having a __get__ field. It's also possible for descriptors to describe what happens when we set or delete the field, with magic methods __set__ and __delete__. (Note that __del__ is not a descriptor magic method; it does something entirely different that's not relevant to this article at all.) We'll talk more about the implementation of those later, but we need them to distinguish between data descriptors and non-data descriptors. A data descriptor [11] is a descriptor that defines __set__ and/or __delete__. A non-data descriptor is a descriptor that defines only __get__. (An object that defines none of the three isn't really a descriptor at all, in any sense of the word; it's just an ordinary Python object.)
With that out of the way, let's take a look at what the implementation of object.__getattribute__ does. Remember that __getattribute__ is called with one argument: the name N we're trying to get. We also have access to the object X we're getting the field on, since we're defining a true built-in Python function (hence, the C-level special cases kick in and bind self for us).
flowchart TD
Start([Start])
GetName[Let D be the value<br>at the name N<br>on the type of X]
GetNameResult{Is D a<br>data descriptor?}
DGet{Does the type of<br>D have a '__get__'?}
GetDataD[Get the field '__get__'<br>on the type of D]
CallData["Call __get__(D, X, type(X))"]
ReturnData([Return the result])
Dict{"Does X have a __dict__?"}
NameInDict[Look up N in X.__dict__]
NameInDictEx{"Does X.__dict__[N] exist?"}
ReturnNameInDict(["Return X.__dict__[N]"])
DExist{Does D exist?}
Fail([Fail with AttributeError])
ReturnD([Return D])
Start-->GetName-->GetNameResult
GetNameResult-->|Yes|DGet
GetNameResult-->|No|Dict
GetNameResult-->|D does not exist|Dict
Dict-->|Yes|NameInDict-->NameInDictEx
Dict-->|No|DExist
NameInDictEx-->|Yes|ReturnNameInDict
NameInDictEx-->|No|DExist
DExist-->|No|Fail
DExist-->|Yes|DGet
DGet-->|Yes|GetDataD-->CallData-->ReturnData
DGet-->|No|ReturnD
There's a lot to take in here. It can be broadly summarized as
- Try to return the __get__ of a data descriptor.
- If that fails, use the object's __dict__.
- If that fails (or __dict__ didn't exist), then try to return the __get__ of a non-data descriptor.
- Return a class attribute (without calling __get__), or fail if it really doesn't exist.
More in-depth, we start by getting the value of N on the type of X; call it D. We don't call __get__ yet; that comes later. Then we check if D is a data descriptor (i.e. if it defines __set__ or __delete__). If so, then we're going to use D, even if the name N exists on the object's __dict__. Data descriptors take precedence over __dict__, but __dict__ takes precedence over non-data descriptors.
If D is not a data descriptor or if it didn't exist, we try to use __dict__. If N exists in __dict__, then we return that value. We never call __get__ on an object retrieved from __dict__; we just return the object as-is.
If the __dict__ did not contain N (or if __dict__ didn't exist at all), then we ask if D even exists (i.e. is it a non-data descriptor?). If it doesn't, then we fail, which will (in our prior flowchart) fall back to __getattr__.
If we elect to use D (whether by it being a data descriptor or by __dict__ failing), then we call __get__ if it exists, or return D if not, just like if we were getting a descriptor at the C-level.
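The precedence rules can be written out as a rough pure-Python approximation. This is a sketch, not the real implementation: generic_getattr, _type_lookup, and Managed are hypothetical names, the real work happens in C, and I'm following the ordering of the four-step summary above (so in this sketch a data descriptor that lacks __get__ falls through to the instance dictionary).

```python
_MISSING = object()


def _type_lookup(t, name):
    """Raw MRO lookup: no descriptor protocol involved."""
    for klass in t.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def generic_getattr(x, name):
    """Rough pure-Python sketch of object.__getattribute__(x, name)."""
    d = _type_lookup(type(x), name)
    getter = setter = deleter = _MISSING
    if d is not _MISSING:
        getter = _type_lookup(type(d), "__get__")
        setter = _type_lookup(type(d), "__set__")
        deleter = _type_lookup(type(d), "__delete__")
    is_data = setter is not _MISSING or deleter is not _MISSING

    # 1. A data descriptor's __get__ beats the instance dictionary.
    if is_data and getter is not _MISSING:
        return getter(d, x, type(x))
    # 2. The instance dictionary; values found here are returned as-is.
    try:
        instance_dict = vars(x)
    except TypeError:  # x has no __dict__
        instance_dict = {}
    if name in instance_dict:
        return instance_dict[name]
    # 3. A non-data descriptor's __get__.
    if getter is not _MISSING:
        return getter(d, x, type(x))
    # 4. A plain class attribute, or failure.
    if d is not _MISSING:
        return d
    raise AttributeError(name)


class Managed:
    label = "class attribute"

    @property
    def size(self):  # property defines __set__, so it is a data descriptor
        return "from property"

    def method(self):  # functions define only __get__: non-data descriptor
        return "from method"


m = Managed()
m.__dict__["size"] = "from instance dict"    # loses to the data descriptor
m.__dict__["method"] = "from instance dict"  # beats the non-data descriptor
```

For these three names the sketch agrees with the built-in lookup: m.size still goes through the property, while m.method now yields the string stored on the instance.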
And that's that. That's how you access fields in Python, in full generality. There are other implementations of __getattribute__. Notably, type [12] implements its own [13] __getattribute__. I won't go in detail into that one, except to note that type.__getattribute__ will call __get__(descriptor_object, None, type_object) if it finds the name on the type itself. That is, __get__ should be prepared to handle None as its second argument. For example, if we define the following two classes
class Descriptor:
    def __get__(self, inst, owner):
        print((self, inst, owner))
        return None

class Foo:
    x = Descriptor()
Then Foo().x will print out (<Descriptor object>, <Foo object>, <class Foo>), by the object.__getattribute__ implementation we showed above. On the other hand, Foo.x will print out (<Descriptor object>, None, <class Foo>) by type.__getattribute__.
Okay, believe it or not, the hard part is done. Getting fields is the most complicated part, for the simple reason that it's recursive. To get a field, we end up having to get __getattribute__ and __getattr__, which are also fields and therefore could also invoke the same behavior.
Setting a field is much simpler. When we have the expression
foo.bar = baz
This always does the same thing. There's no fallback to a different magic method like with __getattribute__. Setting bar on foo to the value baz will always get the descriptor __setattr__ [14] on type(foo). We're getting a descriptor, so if __setattr__ is some object with a __get__, then __get__ will be called. Yes, __get__ will be called while setting a field.
Once we have __setattr__, we call it with two arguments: the name "bar" as a string and the value baz. Like with __getattribute__, __setattr__ will always exist, since it exists on object.
And the object implementation is not too crazy. Specifically, to set the field with name N on the object X to the value Y, object.__setattr__ [15] does the following.
flowchart TD
Start([Start])
GetName[Let D be the value<br>at the name N<br>on the type of X]
GetNameResult{Does the type of D<br>have a '__set__'?}
GetSet[Get the descriptor '__set__' on the type of D]
CallSet["Call __set__(X, Y)"]
Done([Return None])
Dict{"Does X have a __dict__?"}
Fail([Fail with AttributeError])
Assign["Assign X.__dict__[N] = Y"]
Start-->GetName-->GetNameResult
GetNameResult-->|Yes|GetSet-->CallSet-->Done
GetNameResult-->|No|Dict
GetNameResult-->|D does not exist|Dict
Dict-->|No|Fail
Dict-->|Yes|Assign-->Done
In object.__setattr__, similar to object.__getattribute__, first we look for a descriptor. This time we're looking to see if the descriptor has a __set__ field. If it does, we call it and we're done. Remember that assignment statements in Python don't return anything, so any return value from __set__ (or from __setattr__, if you override that) will be ignored. (Assignment expressions [16], affectionately dubbed the "walrus operator", can only be used to assign to simple variables, not slots on objects, so they'll never go through this process either.)
If the descriptor did not exist or didn't have a __set__ (even if it does have a __get__ and a __delete__), then we look on the object's __dict__. If the object doesn't have a __dict__, then we fail. If it does, we do simple assignment to the object's own __dict__.
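Both branches of the assignment process can be exercised with a small validating descriptor. This is a sketch with hypothetical names (Positive, Account); storing the value under a key with a leading underscore is just one convention I'm assuming for the example.

```python
class Positive:
    """Hypothetical data descriptor that validates values on assignment."""

    def __set_name__(self, owner, name):
        self.key = "_" + name  # where the value lives in the instance __dict__

    def __get__(self, inst, owner):
        if inst is None:
            return self
        return inst.__dict__[self.key]

    def __set__(self, inst, value):
        if value <= 0:
            raise ValueError("must be positive")
        inst.__dict__[self.key] = value


class Account:
    balance = Positive()


acct = Account()
acct.balance = 10   # the type has a descriptor with __set__, so it is called
acct.note = "hi"    # no descriptor on the type: stored in acct.__dict__

try:
    acct.balance = -1
    rejected = False
except ValueError:
    rejected = True

try:
    object().x = 1  # no descriptor and no __dict__
    bare_error = None
except AttributeError as exc:
    bare_error = exc
```

The instance dictionary ends up holding _balance and note, but never a key called balance, because the descriptor intercepted that assignment.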
Deletion of the form del foo.bar works identically to assignment. We get the descriptor __delattr__ [17] and call it with one argument: the string name of the slot we want deleted. object.__delattr__ works like object.__setattr__ as well, but for the sake of completeness here it is.
flowchart TD
Start([Start])
GetName[Let D be the value<br>at the name N<br>on the type of X]
GetNameResult{Does the type of D<br>have a '__delete__'?}
GetSet[Get the descriptor '__delete__' on the type of D]
CallSet["Call __delete__(X)"]
Done([Return None])
Dict{"Does X have a __dict__?"}
Fail([Fail with AttributeError])
DictField{"Does X.__dict__<br>contain the key N?"}
Remove["Remove N from X.__dict__"]
Start-->GetName-->GetNameResult
GetNameResult-->|Yes|GetSet-->CallSet-->Done
GetNameResult-->|No|Dict
GetNameResult-->|D does not exist|Dict
Dict-->|No|Fail
Dict-->|Yes|DictField
DictField-->|No|Fail
DictField-->|Yes|Remove-->Done
The only major difference is that we can't delete things that don't exist. So, on objects with __dict__, trying to assign to a slot will happily either create a new slot or replace an existing value. But trying to delete a slot on a __dict__ will only work if the slot actually exists, failing with AttributeError if not. Other than that, we call __delete__ [18] on descriptors, just like we did for __set__ earlier.
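A short sketch of both deletion paths, using the hypothetical classes Plain, Logged, and Holder:

```python
class Plain:
    pass


p = Plain()
p.x = 1
del p.x                      # removes "x" from p.__dict__
try:
    del p.x                  # nothing left to delete
    second_delete_error = None
except AttributeError as exc:
    second_delete_error = exc

deleted = []  # instances whose field was "deleted" through the descriptor


class Logged:
    """Hypothetical descriptor that only records deletions."""

    def __delete__(self, inst):
        deleted.append(inst)


class Holder:
    field = Logged()


h = Holder()
del h.field                  # routed to Logged.__delete__; h.__dict__ is untouched
```

The descriptor stays on Holder after the del statement; what "deleting" means is entirely up to __delete__.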
Finally, there's __slots__. __slots__ is actually magic, in the sense that we can't make something as efficient as __slots__ in pure Python without the help of C. But we could actually get something semantically equivalent but slower. So it's only a little bit magic.
When a class finishes evaluating its body (that is, when we hit the dedent at the end of a class body), Python does a check to see if __slots__ exists on our newly-defined type. If it doesn't, then the type gets a __dict__. If it does, then we do something special.
First, when we create an instance of our special type, rather than giving it a __dict__, we allocate exactly enough memory to store all of the slots, no more and no less. Second, at class creation time, Python defines several [19] descriptor objects, one for each slot we asked for. These descriptor objects automagically know where to look in our allocated object for their own field, and they get, set, or delete that field. Note that "delete", in this context, means "set to NULL". That's not the Python object None; it's a genuine C-style null pointer.
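The slot descriptors are ordinary entries in the class's __dict__, so we can poke at them. This is a minimal sketch run against CPython; Vec and the helper raises_attribute_error are hypothetical names.

```python
class Vec:
    __slots__ = ("x", "y")


v = Vec()
v.x = 1

slot_descriptor = Vec.__dict__["x"]   # created by Python when the class was built
via_descriptor = slot_descriptor.__get__(v, Vec)


def raises_attribute_error(action):
    """Run action() and report whether it raised AttributeError."""
    try:
        action()
    except AttributeError:
        return True
    return False


unset_slot_fails = raises_attribute_error(lambda: v.y)   # slot never assigned
unknown_name_fails = raises_attribute_error(lambda: setattr(v, "z", 3))
del v.x                                                   # slot becomes empty again
deleted_slot_fails = raises_attribute_error(lambda: v.x)
```

Reading an empty slot, whether never assigned or deleted, surfaces as an AttributeError rather than as None, which matches the null-pointer description above.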
This document was written by Mercerenies. If I've made a mistake or something is unclear, feel free to add a comment, and I'd be happy to edit the document and add your name down here. If you're making a contribution, do note that you're releasing that contribution under CC BY-SA 4.0 [20].
I fully intend that this post be useful to Python programmers and be a resource for StackOverflow and other sites to link to. This entire article is licensed under CC BY-SA 4.0 [20].
Footnotes
- https://docs.python.org/3/howto/descriptor.html#properties
- https://github.com/python/cpython/blob/ae5317d309f3b730c25797b07b3fbfc3a1357e7d/Objects/call.c#L170-L225
- https://github.com/python/cpython/blob/ae5317d309f3b730c25797b07b3fbfc3a1357e7d/Objects/object.c#L1216-L1329
- https://docs.python.org/release/3.10.0/reference/datamodel.html#object.__set__
- https://github.com/python/cpython/blob/ae5317d309f3b730c25797b07b3fbfc3a1357e7d/Objects/typeobject.c#L3885-L3972
- https://docs.python.org/release/3.10.0/reference/datamodel.html#object.__setattr__
- https://github.com/python/cpython/blob/982273ae799c01e5f28fb6314d77591e0379813d/Objects/object.c#L1338-L1410
- https://docs.python.org/release/3.10.0/reference/datamodel.html#object.__delattr__
- https://docs.python.org/release/3.10.0/reference/datamodel.html#object.__delete__