OOP in Python, A Graded Knowledge Check
Table of Contents
- Level 0 - Background Knowledge
- Level 1
- Level 2
- Level 3
- Level 4
- Level 5 - Beyond
This document tries to provide a checklist of important concepts in object-oriented programming with a heavy focus on Python. This document is not:
- a complete (or even good) introduction to object oriented programming.
- intended for learning new concepts; my only goal is to help you identify holes in your knowledge.
Everything in bold is a specific, commonly used, technical term. Unless otherwise specified, the technical terms are generic and not Python-specific. I recommend that you always keep general concepts separate from their language-specific realization (e.g. how they work in Python).
The internal implementation of OOP in Python changed significantly
between versions 2 and 3; I think the latter is more consistent and easier to
follow. All code examples are in Python 3; the output of each statement is shown in the comment that follows it.
Level 0 - Background Knowledge
0A Types, Values, and Variables
We need to distinguish between types, values, and variables. For
example, when we say
x = 'abraham', this statement does a few things:
- it creates a value, 'abraham', of type str,
- it creates a variable, x, and assigns the value to it.
If we then say
print('My name is %s.' % x), this will:
- evaluate the expression 'My name is %s.' % x, which produces a new value,
'My name is abraham.'. This value is as legitimate a value as the original
'abraham' even though it is not assigned to any variable.
- call the print function with the value 'My name is abraham.'.
The definition of a type, roughly speaking, captures two
things: the kinds of data it can hold and the behavior (aka operators)
associated with it. For example, when we say
x = 'abraham':
- The fact that the sequence of bits (0s and 1s) that store this string in
memory is interpreted as a string (and not, say, an integer) is part of the
definition of the type str.
- The fact that we can do things like
x.title() (which evaluates to 'Abraham') or
x + ' lincoln' (which evaluates to
'abraham lincoln') is part of the definition of the type str.
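The points above can be checked directly in an interpreter. This is only a small sketch using built-in str behavior; the name greeting is made up for illustration.

```python
x = 'abraham'

# The type of a value determines how its bits are interpreted ...
print(type(x))          # <class 'str'>

# ... and which operations are available on it.
print(x.title())        # Abraham
print(x + ' lincoln')   # abraham lincoln

# A value exists whether or not a variable refers to it; here we
# happen to give the new value a name so we can reuse it.
greeting = 'My name is %s.' % x
print(greeting)         # My name is abraham.
```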
0B What happens when a function is executed?
At a very high level when a function is called (aka invoked, aka executed):
- caller invokes the function and provides values for its arguments,
- new scope is created,
- arguments are passed from the caller and assigned to variables in the new scope,
- function body is executed and a value is returned to the caller.
Follow the above steps in this example:
def square(x):
    return x ** 2

print(square(2))          # 4
print(square(square(2)))  # 16
Note that there are a variety of ways in which a function might "do its job":
- returning an output,
- modifying the arguments themselves,
- neither of the two (e.g. storing/sending data somewhere else)
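As a rough illustration of these three styles, here is a sketch with three hypothetical functions (doubled, double_in_place, and record are names invented for this example, and the log list stands in for "somewhere else" such as a file or a network socket).

```python
def doubled(values):
    """Return an output: builds a new list, the argument is untouched."""
    return [v * 2 for v in values]


def double_in_place(values):
    """Modify the argument itself: nothing useful is returned."""
    for i, v in enumerate(values):
        values[i] = v * 2


log = []  # hypothetical stand-in for an external destination


def record(values):
    """Neither of the two: store the data somewhere else."""
    log.append(sum(values))


nums = [1, 2, 3]
print(doubled(nums))   # [2, 4, 6]
print(nums)            # [1, 2, 3]

double_in_place(nums)
print(nums)            # [2, 4, 6]

record(nums)
print(log)             # [12]
```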
Level 1
1A Classes and Instances
Instances (aka objects) are related to classes in the same way that
values are related to their types. In Python this analogy is literally true; a
class literally defines a new type in the same way that, say,
int is a type:
x = 2
type(x) == int          # True
isinstance(int, type)   # True

class Human:
    pass

h = Human()
type(h) == Human          # True
isinstance(Human, type)   # True
The types (i.e. classes) provided by the language itself are called
built-in types. For example, int, str, and dict (and a bunch more)
are all built-in types.
1B Instantiating a class
When we create an instance of a class we say we are instantiating that
class. In Python, calling
__init__ is the last, and most commonly modified,
step of the instantiation process (more on this in level 5).
class Human:
    def __init__(self):
        print('executing __init__')

h = Human()
# executing __init__
- In most other languages the process of instantiation is handled by a
function called a constructor. Python has something sort of similar to
that, which is the pair __init__ and __new__. The subtle difference between
__init__ and __new__ (we rarely work directly with the latter) is for level 5. Ignore all of this for now, but know that the word constructor is a very commonly used word and, to a good approximation, the Python version of it is __init__.
- Notice that instantiating a class has the same syntax in Python as function
calls. This is a Python-specific feature. Many other languages (e.g. Java,
C++) have a new keyword that is used when instantiating classes (e.g. you would say
h = new Human()). In Python there is no new keyword.
1C Instances have state and behavior
An object (a class instance) has state (i.e. data) and behavior (i.e.
code). The behavior of an object is all its methods (almost all OOP
languages use this term) and its state is its attributes (this is
Python-specific terminology; most other languages call these instance
variables). It is important to know how to work with attributes and
methods of an instance, their scope, and how to access them via self.
Attributes are like variables but they belong to an instance (aka object). You can get them or set them (aka read them or write to them) like any other variable:
class Person:
    def __init__(self, name):
        self.name = name

p = Person('Mary')
print(p.name)  # Mary
p.name = 'John'
print(p.name)  # John
Methods are like functions but they belong to an instance (more specifically they are bound to that instance). You can call them like any other function:
class Person:
    def __init__(self, name):
        self.name = name

    def hello(self):
        print('Hello! My name is %s.' % self.name)

p = Person('Julie')
p.hello()  # Hello! My name is Julie.
p.name = 'Bob'
p.hello()  # Hello! My name is Bob.
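Notice the magic happening in the signature of methods: the first argument (called
self by convention) is automatically set by the language to the bound instance; you have no control over it. Many other languages (e.g. Java, C++) make the instance available implicitly through a this keyword. There is no such thing in Python. The fact that
self is explicit in Python is a reflection of its philosophy of "Explicit is better than implicit."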
Python has the notion of a property which is a method that behaves like an attribute. The whole point of this is convenience. For example:
from datetime import datetime

class Person:
    def __init__(self, yob):
        self.yob = yob

    @property
    def age(self):
        return datetime.now().year - self.yob

p = Person(2000)
p.age  # 19 (when run in 2019)
Aside: A lot of languages (e.g. Java, C++, PHP) require attributes and
methods to be either private or public, or (in some languages)
protected. None of this exists in Python. However, there are conventions
that achieve much the same goal in the end: the use of a leading
underscore (e.g. _some_func) to signal to other programmers "don't muck with
this". You can ignore this whole business at this level.
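To make the underscore convention concrete, here is a small sketch. The Account class and its members are hypothetical names chosen for illustration; the point is only that the underscore is a signal, not an enforcement mechanism.

```python
class Account:
    def __init__(self, balance):
        # Leading underscore: "internal, please don't touch" (convention only).
        self._balance = balance

    def _validate(self, amount):
        # Internal helper, not meant to be called from outside the class.
        if amount <= 0:
            raise ValueError('amount must be positive')

    def deposit(self, amount):
        self._validate(amount)
        self._balance += amount

    @property
    def balance(self):
        # Read-only public view of the internal attribute.
        return self._balance


a = Account(10)
a.deposit(5)
print(a.balance)    # 15
print(a._balance)   # 15 (nothing stops you; the underscore is only a signal)
```

Because balance is a property without a setter, assigning to a.balance raises an AttributeError, which is one common way to offer a read-only public view of an internal attribute.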
Level 2
2A Two important relationships: is-a and has-a
Two useful words to describe certain relationships are: is-a (
'hello' is-a str) and has-a (
'hello' has-a length). For example, one might say:
- Abraham Lincoln (instance) is-a Human (type).
- Every Human (type) is-a Mammal (type); every Mammal is-a Vertebrate (type); and every Vertebrate is-a Animal (type).
- Abraham Lincoln, by extension, is-a Mammal, Vertebrate, and Animal.
- In contrast, my dog (the individual, instance) is-a Mammal but not is-a Human.
- Any Mammal has-a neocortex, and therefore, both Abraham Lincoln and my dog has-a neocortex.
- Similarly, Lonesome George (a tortoise) is-a Vertebrate but not is-a Mammal; he has-a backbone but not has-a neocortex.
2B Inheritance and class hierarchies
Similar to the above intuitive idea, class hierarchies can be built through
inheritance. A class
B can be a subclass of another class
A (aka class
B extends class
A, or A is a superclass, or base class, of
B). This means that:
- Any instance of B, aside from is-a B, also is-a A (i.e. is an instance of class A).
- The relationship between an instance and its attributes and methods is has-a.
- Any instance of B inherits the behavior (i.e. methods) defined in class A.
class Mammal:
    def eat(self):
        print('eating...')

class Human(Mammal):
    def speak(self):
        print('It is I!')

h = Human()
h.speak()  # It is I!
h.eat()    # eating...
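The is-a and has-a relationships can be checked at runtime with the built-in functions isinstance, issubclass, and hasattr. The sketch below redefines the two classes so that it is self-contained.

```python
class Mammal:
    def eat(self):
        print('eating ...')


class Human(Mammal):
    def speak(self):
        print('It is I!')


h = Human()
m = Mammal()

print(isinstance(h, Human))        # True
print(isinstance(h, Mammal))       # True: every Human is-a Mammal
print(isinstance(m, Human))        # False: not every Mammal is-a Human
print(issubclass(Human, Mammal))   # True
print(hasattr(h, 'eat'))           # True: h has-a eat method, inherited from Mammal
print(hasattr(m, 'speak'))         # False: speak is defined only on Human
```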
2C Overriding inherited behavior
A subclass can override the behavior in its superclass.
class Mammal:
    def eat(self):
        print('eating ...')

class Human(Mammal):
    def eat(self):
        print('say grace ...')
        print('eating ...')

m = Mammal()
m.eat()
# eating ...

h = Human()
h.eat()
# say grace ...
# eating ...
- The mechanism through which an OOP language achieves this is part of how its
method resolution works. In the above example, when you call
h.eat(), Python needs to resolve which of the two definitions of
eat to execute: the one defined in
Mammal or the one in Human.
- The ability to override behavior inherited from a superclass allows us to
achieve what is called polymorphism. For example, the behavior (i.e. method)
eat in the above example is polymorphic between Human and
Mammal. At this level, the word polymorphism is synonymous with (and fancy speak for) overriding; just know that this term exists.
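One way to see why polymorphism is useful: code that calls eat does not need to know which class it was handed. In this sketch the methods return strings instead of printing, and Dog and feed are hypothetical names added for illustration.

```python
class Mammal:
    def eat(self):
        return 'eating ...'


class Human(Mammal):
    def eat(self):   # overrides Mammal.eat
        return 'say grace ... eating ...'


class Dog(Mammal):   # hypothetical subclass that does not override eat
    pass


def feed(mammal):
    # The caller neither knows nor cares which class it was given;
    # method resolution picks the right definition of eat.
    return mammal.eat()


for animal in [Mammal(), Human(), Dog()]:
    print(type(animal).__name__, '->', feed(animal))
# Mammal -> eating ...
# Human -> say grace ... eating ...
# Dog -> eating ...
```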
2D Simple usage of super()
It is often necessary or useful for an overriding method in a subclass to
delegate parts of its job to the superclass's (overridden) method. This is where
super() comes in. In the above example the
eat method of the Human
class is merely adding an additional step before doing the exact same thing
as its superclass. To keep the code simpler (and more DRY, standing for
don't repeat yourself) we can write:
class Human(Mammal):
    def eat(self):
        print('say grace ...')
        super().eat()

h = Human()
h.eat()
# say grace ...
# eating ...
Level 3
3A Object identity vs state
It is important to distinguish between the identity of an object and its state (its data). Two objects of the same class have the same state if they contain identical data. But they have the same identity only if they are literally stored in the same place in memory. Equal identity implies equal state, but not vice versa.
x = ['hello']
y = ['hello']
x == y  # this compares state
# True
x is y  # this compares identity
# False

z = x  # this defines a new variable (i.e. a name) that points to the
       # exact same place in memory as x
z is x
# True
z.append('world')
print(x)
# ['hello', 'world']

w = x.copy()  # this creates a new identity, a new place in memory
              # with identical contents as the original
w == x
# True
w is x
# False
We can access the identity of an object in Python by using the id
function, which returns an integer that is unique to the object for as long as it lives (in CPython this is the memory address of the object). Comparing identities, which is what the is operator does, is the
only certain way to verify that two variables have values that are identical in
identity and not just state (i.e. modifying one will modify the other one).
Aside: Not all types allow the state of their instances to be modified.
These are called immutable types (and their instances are also called
immutable). The commonly used immutable types are all built-in types, e.g. int, float, str, and
tuple. Other built-in types are mutable, e.g. list, dict, and set.
User-defined classes (types defined in code) are mutable by default too.
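The difference between mutable and immutable types shows up as soon as two names refer to the same object. A minimal sketch, with variable names chosen only for illustration:

```python
nums = [1, 2]
alias = nums
nums.append(3)              # changes the state of the one shared object
print(alias)                # [1, 2, 3]
print(alias is nums)        # True
print(id(alias) == id(nums))  # True

name = 'abraham'
original = name
name += ' lincoln'          # builds a new str and rebinds the name
print(original)             # abraham
print(name is original)     # False

point = (1, 2)
try:
    point[0] = 5            # tuples do not support item assignment
except TypeError as exc:
    print('TypeError:', exc)
```

What looks like "modifying" a string is really the creation of a new value; the original object is left untouched, which is why original still prints abraham.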
3B Passing objects as function arguments
Objects (instances of classes) can be used as any other value, specifically they can be passed as arguments to functions. It is important to understand how the passed object is treated in the new scope of the function:
- The new variable (in the function scope) has the name as defined by the signature of the function and the value as provided by the caller (equal state).
- The new variable also has the same identity as the object provided by the caller.
- This means that if one changes the state of the passed object in the function, this change in state will be seen by the caller. This is sometimes desired and sometimes undesired.
def f(some_list):
    some_list.append('world')
    return some_list

def g(some_list):
    some_list = some_list.copy()
    some_list.append('world')
    return some_list

x = ['hello']
y = f(x)
print(y)  # ['hello', 'world']
print(x)  # ['hello', 'world']

x = ['hello']
z = g(x)
print(z)  # ['hello', 'world']
print(x)  # ['hello']
3C Class attributes and methods
Classes themselves can have attributes and methods. These are variables and functions, respectively, that are shared between (aka common to) all instances of that class. In Python these are called class attributes (as opposed to instance attributes or just attributes) and class methods (as opposed to instance methods or just methods).
class HomoSapiens:
    speciation_age = 350000  # this is a class attribute

    @classmethod
    def describe_species(cls):  # this is a class method
        return '%s, a %d year old species' % (cls.__name__, cls.speciation_age)

    def __init__(self, name):
        self.name = name

    def introduce(self):
        print('Hello! I am %s. I am a %s.' % (self.name, self.describe_species()))

print(HomoSapiens.speciation_age)      # 350000
print(HomoSapiens.describe_species())  # HomoSapiens, a 350000 year old species

h = HomoSapiens('John')
print(h.name)                # John
print(h.speciation_age)      # 350000
print(h.describe_species())  # HomoSapiens, a 350000 year old species
h.introduce()                # Hello! I am John. I am a HomoSapiens, a 350000 year old species.
- Notice the magic happening in the signature of class methods: the first
argument (called cls by convention) is automatically set by the language to the class; you have no control over it. This is similar to the way the first argument of instance methods (called
self by convention) is automatically set by the language to the bound instance.
- Notice the fact that Python allows you to access class attributes and
class methods both from the class
HomoSapiens and from the instance h (e.g.
h.speciation_age). But be careful! This shorthand mechanism only works for reading class attributes, not for writing to them: if we say
h.speciation_age = 12, this would create a new instance attribute for the instance
h and set it to 12. This will not affect the class attribute value, and no other instance of
HomoSapiens will see that new attribute.
- Notice how
@ is used to define class methods in a similar way to how properties (see above) are defined. These are both examples of a feature in Python called decorators (
classmethod and property are both decorators, and @classmethod and
@property decorate the function that immediately follows them). Decorators are not particular to OOP and are very useful. You can even define your own decorators!
- It is sometimes useful to modify a method in a class from "the outside" (i.e. when we cannot or would prefer not to modify the source code of that class). There is a way to do this which is called monkeypatching (more on this in level 5).
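The read-versus-write caveat about class attributes is easy to trip over, so here is a small sketch of it. The class is a stripped-down version of the one above; the instance names a and b are illustrative.

```python
class HomoSapiens:
    speciation_age = 350000   # class attribute


a = HomoSapiens()
b = HomoSapiens()

a.speciation_age = 12              # creates an instance attribute on a only
print(a.speciation_age)            # 12
print(b.speciation_age)            # 350000
print(HomoSapiens.speciation_age)  # 350000
print(a.__dict__)                  # {'speciation_age': 12}
print(b.__dict__)                  # {}

# Writing through the class changes the shared value ...
HomoSapiens.speciation_age = 300000
print(b.speciation_age)            # 300000
# ... except for instances that shadow it with their own attribute.
print(a.speciation_age)            # 12
```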
Level 4
4A Abstract Classes
Abstract classes are a mechanism for us to define the interface of a class without specifying its implementation. What makes an abstract class abstract is its abstract methods which define the signature of a method without specifying its implementation. An abstract class cannot be instantiated. Instead one needs to define non-abstract subclasses of the abstract class which provide an implementation for all abstract methods of the abstract superclass. Such a subclass can then be instantiated as usual.
Not all OOP languages provide a mechanism for this (e.g. Ruby does not) and the ones that do (e.g. Python, Java, and PHP all do) provide it in a variety of ways.
In Python, abstract classes are defined by extending a special base class (ABC) from the
abc module, for example:
from abc import ABC, abstractmethod

class AbstractCarnivore(ABC):
    @abstractmethod
    def hunt(self):
        pass

    def eat(self):
        self.hunt()
        print('eating ...')

class Human(AbstractCarnivore):
    def hunt(self):
        print('hunting ...')

x = AbstractCarnivore()
# TypeError: Can't instantiate abstract class AbstractCarnivore with abstract methods hunt
# (the exact wording of the message varies between Python versions)

h = Human()
h.eat()
# hunting ...
# eating ...
Aside: The whole point of abstract classes is ease of extensibility: the author of an abstract class is merely communicating to other programmers the contract that their subclasses must satisfy (the contract being the abstract methods) for it to take advantage of the other (non-abstract, implemented) aspects of the base class.
4B Multiple Inheritance
Multiple inheritance is a mechanism in some programming languages that allows classes to inherit from multiple superclasses (as opposed to a single superclass in single inheritance). Under single inheritance all class hierarchies are trees in the end. With multiple inheritance class hierarchies can become DAGs instead of trees.
- Multiple inheritance can easily get really gnarly; a good simple example of
how things can get messy is what is called the diamond problem: Suppose
B and C both extend A, D extends both B and C, and
A defines a method
f() that is overridden both by B and by
C but not by
D. Which version of it should be inherited by D?
- Central to understanding Python's version of multiple inheritance is its
method resolution order (MRO) algorithm which dictates how
super() gets resolved under multiple inheritance. It is the MRO that is responsible for, say, addressing the diamond problem.
- There are two common and useful design patterns in multiple inheritance: mixins and cooperative multiple inheritance. You should probably know about them before trying to write multiple inheritance code in production.
- There is an OOP principle called composition over inheritance which recommends that it's often better to achieve the desired behavior by composing different classes through has-a relationships rather than inheritance (is-a relationships). There is a lot of truth to this; but then again, inheritance (single and multiple) are both extremely useful. Finding the right balance is a matter of problem context, experience, and to some extent, taste.
This example illustrates how to address the diamond problem using cooperative inheritance in Python:
class A:
    def __init__(self):
        print("A")
        super().__init__()

class B(A):
    def __init__(self):
        print("B")
        super().__init__()

class C(A):
    def __init__(self):
        print("C")
        super().__init__()

class D(B, C):
    def __init__(self):
        print("D")
        super().__init__()

D()
# D
# B
# C
# A
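The order in which Python searches the classes can be inspected through the __mro__ attribute of a class. The sketch below sets up the diamond described earlier with a method f (returning strings rather than printing) and shows which version D ends up with.

```python
class A:
    def f(self):
        return 'A.f'


class B(A):
    def f(self):
        return 'B.f'


class C(A):
    def f(self):
        return 'C.f'


class D(B, C):   # D does not override f
    pass


# The MRO linearizes the diamond: each class appears once, and a
# class always comes before its own superclasses.
print([cls.__name__ for cls in D.__mro__])   # ['D', 'B', 'C', 'A', 'object']

# f is looked up along the MRO, so the first match (B.f) wins.
print(D().f())                               # B.f
```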
4C Magic methods in Python
Magic (aka dunder) methods are methods with names of the form
__xxx__ (two leading and two trailing underscores; "dunder" is short for double underscore),
and they have special (magic) properties. A lot of them exist in all Python
objects (they are inherited from the
object class, the superclass of all
classes). But there are also a lot of them that can be implemented by a class
to give it special properties.
Magic methods are a very versatile bunch that provide us, the programmer, with a lot of power that is unique to Python. Here is an incomplete list that covers the majority of magic methods, in a very rough and subjective order of usefulness:
- __str__ allows an object to control how it behaves when it's cast to a string (e.g. when it's given to print or str).
- __enter__ and __exit__ allow an object to become a context manager (i.e. you can use it in a with statement).
- __getattr__, __setattr__, and __getattribute__ expose the internal mechanics of attribute resolution and allow you to have more control over how they work in a class.
- __eq__ and __hash__ allow an object to take control of how it behaves under equality comparisons (i.e. when used in ==) and when it's hashed (i.e. passed to hash, e.g. when it's used as a dictionary key), respectively. These two dunder methods are deeply related and must coordinate their behavior.
- __iter__ and __next__ allow an object to become iterable (i.e. can be iterated through, e.g. with a for loop). Related to this are __len__, which allows an iterable to specify its length (i.e. what happens when it's given to len), and __contains__, which allows an iterable to check membership (i.e. what happens when one uses the in operator).
- __call__ allows an object to become callable (i.e. can be called, just like a function).
- numeric operators, e.g. __add__, __mul__, and __ge__, that expose the internal mechanics of how common numeric operators work. The specific examples above correspond to +, *, and >= (there are many more of these). When a class overrides these methods we say that it's overloading operators (e.g. Pandas and numpy make extensive use of operator overloading to provide syntactic convenience).
- __getitem__ and __setitem__ allow an object to behave like a dictionary.
- __repr__ allows an object to control how it behaves when it's given to repr: the goal is to generate a Python expression (i.e. code) that would reproduce that object when executed. A good rule of thumb is that one should have x == eval(repr(x)).
- __get__ and __set__ allow you to define descriptors and give you even more control over how attributes work (this is how property is internally implemented).
- __getstate__, __setstate__, and friends allow an object to implement the pickle protocol.
- __dict__ and __slots__ are magic attributes (not methods) that expose the internal machinery of how attributes are stored in standard objects.
- __new__ and __del__ are the other friends of __init__ and part of the machinery of object lifetime.
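To give a feel for how a few of these fit together, here is a sketch of a small container class. Playlist is a hypothetical class invented for this example (it is not from any library), and it implements only a handful of the methods listed above.

```python
class Playlist:
    """Hypothetical container used to demonstrate a few dunder methods."""

    def __init__(self, *songs):
        self.songs = list(songs)

    def __repr__(self):
        # Aim for an expression that recreates the object.
        return 'Playlist(%s)' % ', '.join(repr(s) for s in self.songs)

    def __eq__(self, other):
        # Defining __eq__ without __hash__ makes instances unhashable,
        # which is one way the two methods "coordinate".
        return isinstance(other, Playlist) and self.songs == other.songs

    def __len__(self):
        return len(self.songs)

    def __iter__(self):
        return iter(self.songs)

    def __contains__(self, song):
        return song in self.songs

    def __getitem__(self, index):
        return self.songs[index]


p = Playlist('Imagine', 'Yesterday')
print(repr(p))                  # Playlist('Imagine', 'Yesterday')
print(len(p))                   # 2
print('Imagine' in p)           # True
print(p[1])                     # Yesterday
print([s.upper() for s in p])   # ['IMAGINE', 'YESTERDAY']
print(p == eval(repr(p)))       # True
```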
Level 5 - Beyond
The following are language-specific and advanced features of the Python OOP model. The bad news is that there is a lot of nuance and subtlety in each of them which can be quite confusing when you are new to the ideas summarized in this document. The good news is that knowledge of them is only useful in very specific scenarios; that means you can safely ignore them for a while.
- In Python, everything is an object. And everything literally means (almost) everything. Classes, functions, and modules are all objects! This obviously has a lot of implications, a lot of which you are probably already using (e.g. having functions that return functions, which is what allows decorators to be possible).
- Under the hood of object lifetime:
- Dynamic creation of new types and modification of existing ones (e.g.
monkeypatching) using the
- Reflection in Python: dynamic inspection of objects (and by extension
modules, functions, classes, etc.) using the