@amirkdv amirkdv/oop.md
Last active Dec 12, 2019

Object Oriented Programming in Python, A Graded Knowledge Check

OOP in Python, A Graded Knowledge Check

Introduction

This document tries to provide a checklist of important concepts in object-oriented programming with a heavy focus on Python. This document is not:

  • a complete (or even good) introduction to object oriented programming.
  • intended for learning new concepts; my only goal is to help you identify holes in your knowledge.

Everything in bold is a specific, commonly used, technical term. Unless otherwise specified, the technical terms are generic and not Python-specific. I recommend that you always keep general concepts separate from their language-specific realization (e.g. how they work in Python).

The internal implementation of OOP in Python changed significantly between versions 2 and 3; I think the latter is more consistent and easier to follow for mere mortals like us. All code examples are in Python 3; the output of each print call is shown as a comment on the following line.

Level 0 - Background Knowledge

0A Types, Values, and Variables

We need to distinguish between types, values, and variables. For example, the statement x = 'abraham' does two things:

  • it creates a value 'abraham' of type str,
  • it creates a variable x and assigns the value 'abraham' to it (in Python the type belongs to the value; the variable is just a name that refers to it).

If we then say print('My name is %s.' % x), this will:

  • evaluate the expression 'My name is %s.' % x, which produces a new str value 'My name is abraham.'. This value is as legitimate a value as the original 'abraham' even though it is not assigned to any variable.
  • call the print function with the argument 'My name is abraham.' (arguments must always be values).

The definition of a type, roughly speaking, captures two things: the kinds of data it can hold and the behavior (aka operators) associated with it. For example, when we say x = 'abraham':

  1. The fact that the sequence of bits (0s and 1s) that store this string in memory is interpreted as a string (and not, say, an integer) is part of the definition of the type str.
  2. The fact that we can do things like x.title() (which evaluates to 'Abraham') or x + ' lincoln' (which evaluates to 'abraham lincoln') is part of the definition of the type str.
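A minimal sketch of the two halves of a type's definition, using only built-in types. The variable names are chosen for illustration only.

```python
x = 'abraham'

# The type determines how the stored data is interpreted ...
print(type(x))
# <class 'str'>

# ... and which behavior is available on the value.
capitalized = x.title()
full_name = x + ' lincoln'
print(capitalized)
# Abraham
print(full_name)
# abraham lincoln

# A value of a different type comes with different behavior:
# int values have no title() method.
n = 42
print(hasattr(n, 'title'))
# False
```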

0B What happens when a function is executed?

At a very high level when a function is called (aka invoked, aka executed):

  1. caller invokes the function and provides values for its arguments,
  2. new scope is created,
  3. arguments are passed from the caller and assigned to variables in the new scope,
  4. function body is executed and a value is returned to the caller.

Exercise

Follow the above steps in this example:

def square(x):
    return x ** 2

print(square(2))
# 4

print(square(square(2)))
# 16

Note that there are a variety of ways in which a function might "do its job":

  • returning an output,
  • modifying the arguments themselves,
  • neither of the two (e.g. storing/sending data somewhere else)
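The three ways listed above can be sketched as follows. The function names and the module-level list log are made up for this illustration.

```python
def doubled(numbers):
    # Returns an output; the argument is left untouched.
    return [2 * n for n in numbers]

def double_in_place(numbers):
    # Modifies the argument itself and returns nothing (i.e. None).
    for i in range(len(numbers)):
        numbers[i] = 2 * numbers[i]

log = []

def record(numbers):
    # Neither of the two: stores data somewhere else (here, a module-level list).
    log.append(sum(numbers))

values = [1, 2, 3]
result = doubled(values)
print(result)
# [2, 4, 6]
print(values)
# [1, 2, 3]

double_in_place(values)
print(values)
# [2, 4, 6]

record(values)
print(log)
# [12]
```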

Level 1

1A Classes and Instances

Instances (aka objects) are related to classes in the same way that values are related to their types. In Python this analogy is literally true; a class literally defines a new type in the same way that, say, int is a type:

x = 2
type(x) == int
# True

isinstance(int, type)
# True

class Human:
    pass

h = Human()
type(h) == Human
# True

isinstance(Human, type)
# True

The types (i.e. classes) provided by the language itself are called built-in types; int, float, set, list, dict (and a bunch more) are all built-in types.

1B Instantiating a class

When we create an instance of a class we say we are instantiating that class. In Python, calling __init__ is the last, and most commonly modified, step of the instantiation process (more on this in level 5).

Example

class Human:
    def __init__(self):
        print('executing __init__')

h = Human()
# executing __init__
  • In most other languages the process of instantiation is handled by a function called a constructor. Python has something sort of similar, which is __new__. The subtle difference between __init__ and __new__ (we rarely work directly with the latter) is for level 5. Ignore all of this for now, but know that the word constructor is very commonly used and that, to a good approximation, the Python version of it is __init__.
  • Notice that instantiating a class has the same syntax in Python as a function call. Many other languages (e.g. Java, C++, PHP, JavaScript) have a new keyword that is used when instantiating classes (e.g. you would say h = new Human()). In Python there is no new keyword.

1C Instances have state and behavior

An object (a class instance) has state (i.e. data) and behavior (i.e. code). The behavior of an object is all its methods (almost all OOP languages use this term) and its state is its attributes (this is Python-specific terminology; most other languages call these instance variables). It is important to know how to work with attributes and methods of an instance, their scope and how to access them via self.

Attributes are like variables but they belong to an instance (aka object). You can get them or set them (aka read them or write to them) like any other variable:

class Person:
    def __init__(self, name):
        self.name = name

p = Person('Mary')
print(p.name)
# Mary

p.name = 'John'
print(p.name)
# John

Methods are like functions but they belong to an instance (more specifically they are bound to that instance). You can call them like any other function:

class Person:
    def __init__(self, name):
        self.name = name

    def hello(self):
        print('Hello! My name is %s.' % self.name)

p = Person('Julie')
p.hello()
# Hello! My name is Julie.

p.name = 'Bob'
p.hello()
# Hello! My name is Bob.
  • Notice the magic happening in the signature of methods: the first argument (called self by convention) is automatically set by the language to the bound instance; you have no control over it. In a lot of programming languages (e.g. Java, C++, PHP, JavaScript) this magic happens implicitly: in the body of a method you can access the bound instance via a this keyword. There is no such thing in Python. The fact that self is explicit in Python is a reflection of its philosophy of "Explicit is better than implicit."

  • Python has the notion of a property which is a method that behaves like an attribute. The whole point of this is convenience. For example:

    from datetime import datetime
    
    class Person:
        def __init__(self, yob):
            self.yob = yob
    
        @property
        def age(self):
            return datetime.now().year - self.yob
    
    p = Person(2000)
    p.age
    # 19 (when run in 2019; the value depends on the current year)
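To make the "magic" of self from the first note a bit more concrete, here is a small sketch. It reuses the Person class from above, except that hello returns the string instead of printing it so that the two calls can be compared.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def hello(self):
        return 'Hello! My name is %s.' % self.name

p = Person('Julie')

# Calling the method on the instance ...
bound_result = p.hello()
# ... is equivalent to calling the function through the class
# and passing the instance explicitly as the first argument.
explicit_result = Person.hello(p)

print(bound_result == explicit_result)
# True

# The bound method remembers which instance it is bound to.
print(p.hello.__self__ is p)
# True
```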

Aside: A lot of languages (e.g. Java, C++, PHP) require attributes and methods to be either private or public, or (in some languages) protected. None of this exists in Python. However, there are conventions that kind of achieve the same goal in the end, and that is the use of a leading underscore (e.g. _some_func) to signal to other programmers "don't muck with this". You can ignore this whole business at this level.
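A short sketch of the leading underscore convention. The Account class is hypothetical and only serves to show that the underscore is a signal to other programmers, not something the language enforces.

```python
class Account:
    def __init__(self, balance):
        # Leading underscore: "internal detail, don't muck with this".
        self._balance = balance

    def deposit(self, amount):
        self._balance += amount

    def balance(self):
        return self._balance

a = Account(100)
a.deposit(50)
print(a.balance())
# 150

# Nothing stops outside code from reaching in; it is only a convention.
a._balance = -1
print(a.balance())
# -1
```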

Level 2

2A Two important relationships: has-a and is-a

Two useful words to describe certain relationships are: is-a ('hello' is-a str) and has-a ('hello' has-a length). For example, one might say:

  • Abraham Lincoln (instance) is-a Human (type).
  • Every Human (type) is-a Mammal (type); every Mammal is-a Vertebrate (type); and every Vertebrate is-a Animal (type).
  • Abraham Lincoln, by extension, is-a Mammal, Vertebrate, and Animal.
  • In contrast, my dog (the individual, instance) is-a Mammal but not is-a Human.
  • Any Mammal has-a neocortex, and therefore, both Abraham Lincoln and my dog has-a neocortex.
  • Similarly, Lonesome George (a giant tortoise) is-a Vertebrate but not is-a Mammal; he has-a backbone but not has-a neocortex.

2B Inheritance and class hierarchies

Similar to the above intuitive idea, class hierarchies can be built through inheritance. A class B can be a subclass of another class A (aka class B extends class A, aka A is a superclass, or base class, of B). This means that:

  • Any instance of B, aside from is-a B, also is-a A (i.e. is an instance of class A as well).
  • The relationship between an instance and its attributes and methods is has-a.
  • Any instance of B inherits the behavior (i.e. methods) defined in class A.

Example

class Mammal:
    def eat(self):
        print('eating...')

class Human(Mammal):
    def speak(self):
        print('It is I!')

h = Human()
h.speak()
# It is I!

h.eat()
# eating...
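The is-a relationship created by inheritance can be checked directly with the built-in functions isinstance and issubclass. A small sketch using the same two classes:

```python
class Mammal:
    def eat(self):
        print('eating...')

class Human(Mammal):
    def speak(self):
        print('It is I!')

h = Human()
m = Mammal()

# An instance of Human is-a Human and also is-a Mammal.
print(isinstance(h, Human), isinstance(h, Mammal))
# True True

# The reverse does not hold: a plain Mammal is not a Human ...
print(isinstance(m, Human))
# False
# ... and does not have the behavior defined only in Human.
print(hasattr(m, 'speak'))
# False

print(issubclass(Human, Mammal))
# True
```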

2C Overriding inherited behavior

A subclass can override the behavior in its superclass.

Example

class Mammal:
    def eat(self):
        print('eating ...')

class Human(Mammal):
    def eat(self):
        print('say grace ...')
        print('eating ...')

m = Mammal()
m.eat()
# eating ...

h = Human()
h.eat()
# say grace ...
# eating ...
  • The mechanism through which an OOP language achieves this is part of how its method resolution works. In the above example, when you call eat on h, Python needs to resolve which of the two definitions of eat to execute: the one defined in Mammal or the one in Human.
  • The ability to override behavior inherited from a superclass allows us to achieve what is called polymorphism. For example, the behavior (i.e. method) eat in the above example is polymorphic between Human and Mammal. At this level, the word polymorphism is synonymous with (and fancy speak for) overriding; just know that this term exists.
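A small sketch of both notes: the same call, eat(), resolves to different definitions depending on the class of the instance. The Dog class is an addition for illustration, and the methods return strings instead of printing so the results can be collected in a list.

```python
class Mammal:
    def eat(self):
        return 'eating ...'

class Human(Mammal):
    def eat(self):
        return 'say grace ... eating ...'

class Dog(Mammal):
    pass  # no override: Mammal.eat is inherited as-is

# The caller does not need to know which class each object belongs to.
results = [animal.eat() for animal in (Mammal(), Human(), Dog())]
print(results)
# ['eating ...', 'say grace ... eating ...', 'eating ...']

# Python looks for eat in these classes, in this order, and uses the first match.
print([cls.__name__ for cls in Human.__mro__])
# ['Human', 'Mammal', 'object']
```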

2D Simple usage of super

It is often necessary or useful for an overriding method in a subclass to delegate part of its job to its superclass's (overridden) method. This is where super() comes in. In the above example the eat method of the Human class merely adds a step before doing the exact same thing as its superclass. To keep the code simpler (and more DRY, which stands for "don't repeat yourself") we can write:

class Human(Mammal):
    def eat(self):
        print('say grace ...')
        super().eat()

h = Human()
h.eat()
# say grace ...
# eating ...

Level 3

3A Object identity vs state

It is important to distinguish between the identity of an object and its state (its data). Two objects of the same class have the same state if they contain identical data. But they have the same identity only if they are literally stored in the same place in memory. Equal identity implies equal state, but not vice versa.

Example

x = ['hello']
y = ['hello']

x == y        # this compares state
# True

x is y        # this compares identity
# False

z = x         # this defines a new variable (i.e. a name) that points to the
              # exact same place in memory as x
z is x
# True

z.append('world')
print(x)
# ['hello', 'world']

w = x.copy() # this creates a new identity, a new place in memory
             # with identical contents as the original
w == x
# True
w is x
# False

We can access the identity of an object in Python by using the id() built-in function, which returns an integer that is unique to the object for as long as it exists (in CPython, the standard implementation, this is the object's memory address). Comparing ids, or equivalently using the is operator, is the only certain way to verify that two variables refer to the same object and not just to objects with equal state (i.e. that modifying one will modify the other).
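A brief sketch of id() next to the is operator. The actual integers returned by id() differ from run to run, so only comparisons are shown.

```python
x = ['hello']
y = ['hello']   # equal state, different object
z = x           # another name for the same object

print(id(z) == id(x))
# True
print(id(y) == id(x))
# False

# The is operator performs the same identity comparison.
print((z is x) == (id(z) == id(x)))
# True
print((y is x) == (id(y) == id(x)))
# True
```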

Aside: Not all types allow the state of their instances to be modified. These are called immutable types (and their instances are also called immutable). Common immutable built-in types include int, bool, float, str, and tuple. Other built-in types are mutable, e.g. dict, list, and set. Instances of user-defined classes (types defined in code) are mutable by default.
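A minimal sketch of the difference between immutable and mutable types. The Point class is hypothetical.

```python
t = (1, 2, 3)
try:
    t[0] = 99              # tuples do not allow their contents to be changed
    tuple_modified = True
except TypeError:
    tuple_modified = False
print(tuple_modified)
# False

s = 'abraham'
u = s.upper()              # str methods return new values instead of modifying s
print(s, u)
# abraham ABRAHAM

numbers = [1, 2, 3]
numbers[0] = 99            # lists are mutable
print(numbers)
# [99, 2, 3]

class Point:
    def __init__(self, x):
        self.x = x

pt = Point(1)
pt.x = 2                   # instances of user-defined classes are mutable by default
print(pt.x)
# 2
```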

3B Passing objects as function arguments

Objects (instances of classes) can be used as any other value, specifically they can be passed as arguments to functions. It is important to understand how the passed object is treated in the new scope of the function:

  • The new variable (in the function scope) has the name defined by the signature of the function and the value provided by the caller (equal state).
  • The new variable also has the same identity as the object provided by the caller.
  • This means that if one changes the state of the passed object in the function, this change in state will be seen by the caller. This is sometimes desired and sometimes undesired.

Example

def f(some_list):
    some_list.append('world')
    return some_list

def g(some_list):
    some_list = some_list.copy()
    some_list.append('world')
    return some_list

x = ['hello']
y = f(x)
print(y)
# ['hello', 'world']
print(x)
# ['hello', 'world']

x = ['hello']
z = g(x)
print(z)
# ['hello', 'world']
print(x)
# ['hello']

3C Class attributes and methods

Classes themselves can have attributes and methods. These are variables and functions, respectively, that are shared between (aka common to) all instances of that class. In Python these are called class attributes (as opposed to instance attributes or just attributes) and class methods (as opposed to instance methods or just methods).

In a lot of programming languages (e.g. Java, C++, PHP, JavaScript) methods that belong to a class (i.e. shared between all instances) are called static methods. In Python these are called class methods. Unfortunately for a beginner, Python also has static methods which are slightly different (and simpler, and less useful) than class methods.
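To illustrate the difference between class methods and static methods in Python, here is a small sketch. The Temperature and Kelvin classes are hypothetical and chosen only for this comparison.

```python
class Temperature:
    unit = 'C'

    @classmethod
    def describe(cls):
        # Receives the class automatically, so it can read class attributes
        # and respects subclasses.
        return 'temperatures are in %s' % cls.unit

    @staticmethod
    def to_fahrenheit(celsius):
        # Receives nothing automatically; it is a plain function that
        # happens to live in the namespace of the class.
        return celsius * 9 / 5 + 32

class Kelvin(Temperature):
    unit = 'K'

print(Temperature.describe())
# temperatures are in C
print(Kelvin.describe())
# temperatures are in K
print(Temperature.to_fahrenheit(100))
# 212.0
```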

Example

class HomoSapiens:

    speciation_age = 350000 # this is a class attribute

    @classmethod
    def describe_species(cls): # this is a class method
        return '%s, a %d year old species' % (cls.__name__, cls.speciation_age)

    def __init__(self, name):
        self.name = name

    def introduce(self):
        print('Hello! I am %s. I am a %s.' % (self.name, self.describe_species()))

print(HomoSapiens.speciation_age)
# 350000
print(HomoSapiens.describe_species())
# HomoSapiens, a 350000 year old species

h = HomoSapiens('John')
print(h.name)
# John
print(h.speciation_age)
# 350000
print(h.describe_species())
# HomoSapiens, a 350000 year old species
h.introduce()  # introduce() prints by itself and returns None
# Hello! I am John. I am a HomoSapiens, a 350000 year old species.

Notes:

  • Notice the magic happening in the signature of class methods: the first argument (called cls by convention) is automatically set by the language to the class; you have no control over it. This is similar to the way the first argument of instance methods (called self by convention) is automatically set by the language to the bound instance.
  • Notice that Python allows you to access class attributes and class methods both from the class HomoSapiens and from the instance h of that class (this is part of how Python's name resolution works). In many other languages (e.g. Java, C++, PHP, JavaScript) class-level members are normally accessed through the class itself (i.e. HomoSapiens.speciation_age and not h.speciation_age). But be careful! This shorthand mechanism only works for reading class attributes, not for writing to them: if we say h.speciation_age = 12, this creates a new instance attribute on h and sets it to 12. It does not affect the class attribute, and no instance of HomoSapiens other than h will see the new value.
  • Notice how @ is used to define class methods in a way similar to how properties (see above) are defined. These are both examples of a feature in Python called decorators (classmethod and property are both decorators, and @classmethod and @property decorate the functions that immediately follow them). Decorators are not particular to OOP and are very useful. You can even define your own decorators!
  • It is sometimes useful to modify a method in a class from "the outside" (i.e. when we cannot or would prefer not to modify the source code of that class). There is a way to do this which is called monkeypatching (more on this in level 5).
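The warning above about writing to a class attribute through an instance can be sketched as follows, using a stripped-down version of HomoSapiens:

```python
class HomoSapiens:
    speciation_age = 350000   # class attribute, shared by all instances

a = HomoSapiens()
b = HomoSapiens()

# Writing through an instance creates an instance attribute on a only.
a.speciation_age = 12
print(a.speciation_age, b.speciation_age, HomoSapiens.speciation_age)
# 12 350000 350000

# Writing through the class changes the shared value.
HomoSapiens.speciation_age = 300000
print(a.speciation_age, b.speciation_age)
# 12 300000

# Deleting the instance attribute uncovers the class attribute again.
del a.speciation_age
print(a.speciation_age)
# 300000
```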

Level 4

4A Abstract Classes

Abstract classes are a mechanism for us to define the interface of a class without specifying its implementation. What makes an abstract class abstract is its abstract methods which define the signature of a method without specifying its implementation. An abstract class cannot be instantiated. Instead one needs to define non-abstract subclasses of the abstract class which provide an implementation for all abstract methods of the abstract superclass. Such a subclass can then be instantiated as usual.

Not all OOP languages provide a mechanism for this (e.g. Ruby does not) and the ones that do (e.g. Python, Java, and PHP all do) provide it in a variety of ways.

In Python abstract classes are defined by extending a special base class from the abc module of the standard library, for example:

from abc import ABC, abstractmethod

class AbstractCarnivore(ABC):
    @abstractmethod
    def hunt(self):
        pass

    def eat(self):
        self.hunt()
        print('eating ...')

class Human(AbstractCarnivore):
    def hunt(self):
        print('hunting ...')

x = AbstractCarnivore()
# TypeError: Can't instantiate abstract class AbstractCarnivore with abstract methods hunt
# (the exact wording of the message varies between Python versions)

h = Human()
h.eat()
# hunting ...
# eating ...

Aside: The whole point of abstract classes is ease of extensibility: the author of an abstract class is merely communicating to other programmers the contract that their subclasses must satisfy (the contract being the abstract methods) for it to take advantage of the other (non-abstract, implemented) aspects of the base class.

4B Multiple Inheritance

Multiple inheritance is a mechanism in some programming languages that allows classes to inherit from multiple superclasses (as opposed to a single superclass in single inheritance). Under single inheritance all class hierarchies are trees in the end. With multiple inheritance class hierarchies can become DAGs (directed acyclic graphs) instead of trees.

General notes:

  • Not all OOP languages allow multiple inheritance (e.g. Java does not allow it for classes) and those that offer it, or something like it (e.g. C++, Ruby modules, PHP traits, mixin patterns in JavaScript ES6), provide it in different ways, with different names, and with different limitations.
  • Multiple inheritance can easily get really gnarly; a good simple example of how things can get messy is what is called the diamond problem: suppose classes B and C extend A, and that D extends both B and C. If A defines a method f() that is overridden by both B and C but not by D, which version of it should be inherited by D?
  • Central to understanding Python's version of multiple inheritance is its method resolution order (MRO) algorithm which dictates how super() gets resolved under multiple inheritance. It is the MRO that is responsible for, say, addressing the diamond problem.
  • There are two common and useful design patterns in multiple inheritance: mixins and cooperative multiple inheritance. You should probably know about them before trying to write multiple inheritance code in production.
  • There is an OOP principle called composition over inheritance which recommends that it's often better to achieve the desired behavior by composing different classes through has-a relationships rather than inheritance (is-a relationships). There is a lot of truth to this; but then again, inheritance (single and multiple) are both extremely useful. Finding the right balance is a matter of problem context, experience, and to some extent, taste.
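As a minimal sketch of composition over inheritance from the last note: instead of making Car a subclass of Engine, a Car instance holds an Engine and delegates to it. Both classes are hypothetical.

```python
class Engine:
    def start(self):
        return 'engine started'

class Car:
    def __init__(self):
        self.engine = Engine()   # a Car has-an Engine

    def start(self):
        # Delegate part of the job to the composed object.
        return self.engine.start() + ', car ready'

car = Car()
print(car.start())
# engine started, car ready

# There is no is-a relationship between the two classes.
print(isinstance(car, Engine))
# False
```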

Example

This example illustrates how to address the diamond problem using cooperative inheritance in Python:

class A:
    def __init__(self):
        print("A")
        super().__init__()

class B(A):
    def __init__(self):
        print("B")
        super().__init__()

class C(A):
    def __init__(self):
        print("C")
        super().__init__()

class D(B, C):
    def __init__(self):
        print("D")
        super().__init__()

D()

# D
# B
# C
# A
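The order in which the classes print in the example above is the method resolution order of D, which can be inspected through the __mro__ attribute. The sketch below also answers the question posed by the diamond problem for a method f (a name chosen for illustration): D gets the version from the first class in its MRO that defines it.

```python
class A:
    def f(self):
        return 'A.f'

class B(A):
    def f(self):
        return 'B.f'

class C(A):
    def f(self):
        return 'C.f'

class D(B, C):
    pass  # does not override f

mro_names = [cls.__name__ for cls in D.__mro__]
print(mro_names)
# ['D', 'B', 'C', 'A', 'object']

# B comes before C in the MRO of D, so B.f wins.
print(D().f())
# B.f
```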

4C Magic methods in Python

Magic (aka dunder) methods are methods with names of the form __X__ and they have special (magic) properties. A lot of them exist in all Python objects (they are inherited from the object class, the superclass of all classes). But there are also a lot of them that could be implemented by a class to give it special properties.

Magic methods are a very versatile bunch that provide us, the programmer, with a lot of power that is unique to Python. Here is an incomplete list that covers the majority of magic methods, in a very rough and subjective order of usefulness:

  • __str__ allows an object to control how it behaves when it's cast to a string (e.g. when it's given to print).
  • __enter__ and __exit__ allow an object to become a context manager (i.e. you can use it in a with statement).
  • __setattr__, __getattr__ and __getattribute__ expose the internal mechanics of attribute resolution and allow you to have more control over how they work in a class.
  • __eq__ and __hash__ allow an object to take control of how it behaves under equality comparisons (i.e. when used in ==) and when it's hashed (i.e. passed to hash, e.g. when it's used as a dictionary key), respectively. These two dunder methods are deeply related and must coordinate their behavior.
  • __iter__ and __next__ allow an object to become iterable (i.e. can be iterated through, e.g. with a for loop). Related to this is __len__ which allows an iterable to specify its length (i.e. what happens when it's given to len()) and __contains__ which allows an iterable to check membership (i.e. what happens when one uses the in keyword).
  • __call__ allows an object to become callable (i.e. can be called, just like a function).
  • numeric operators, e.g. __add__, __truediv__, __mul__, __eq__, __ge__ that expose the internal mechanics of how common numeric operators work. The specific list of examples above corresponds to +, /, *, ==, >= (there are many more of these). When a class overrides these methods we say that it's overloading operators (e.g. pandas and NumPy make extensive use of operator overloading to provide syntactic convenience).
  • __getitem__ and __setitem__ allow an object to support the square bracket syntax (i.e. obj[key] and obj[key] = value), and thus to behave like a dictionary or a list.
  • __repr__ allows an object to control how it behaves when it's given to repr: the goal is to generate a Python expression (i.e. code) that would reproduce that object when executed. A good rule of thumb is that one should have x == eval(repr(x)).
  • __get__ and __set__ allow you to define descriptors and give you even more control over how attributes work (this is how property is internally implemented).
  • __getstate__, __setstate__, and friends allow an object to implement the pickle protocol.
  • __dict__ and __slots__ are magic attributes (not methods) that expose the internal machinery of how attributes are stored in standard objects.
  • __new__ and __del__ are the other friends of __init__ and part of the machinery of object lifetime.
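A handful of the magic methods listed above can be sketched in a single small class. The Playlist class is hypothetical and deliberately minimal; it is one possible illustration, not a recommended design.

```python
class Playlist:
    def __init__(self, *songs):
        self.songs = list(songs)

    def __str__(self):                 # used by str() and print()
        return 'Playlist of %d songs' % len(self.songs)

    def __repr__(self):                # used by repr(); aims to be valid Python
        return 'Playlist(%s)' % ', '.join(repr(s) for s in self.songs)

    def __eq__(self, other):           # used by ==
        # Note: defining __eq__ without __hash__ makes instances unhashable.
        return isinstance(other, Playlist) and self.songs == other.songs

    def __len__(self):                 # used by len()
        return len(self.songs)

    def __contains__(self, song):      # used by the in keyword
        return song in self.songs

    def __iter__(self):                # used by for loops
        return iter(self.songs)

    def __getitem__(self, index):      # used by playlist[index]
        return self.songs[index]

    def __add__(self, other):          # used by +
        return Playlist(*(self.songs + other.songs))

p = Playlist('Help!', 'Yesterday')
q = p + Playlist('Michelle')

print(q)
# Playlist of 3 songs
print(repr(q))
# Playlist('Help!', 'Yesterday', 'Michelle')
print(len(q), 'Michelle' in q, q[0])
# 3 True Help!
print([song.lower() for song in q])
# ['help!', 'yesterday', 'michelle']
print(q == eval(repr(q)))
# True
```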

Level 5 - Beyond

The following are language-specific and advanced features of the Python OOP model. The bad news is that there is a lot of nuance and subtlety in each of them which can be quite confusing when you are new to the ideas summarized in this document. The good news is that knowledge of them is only useful in very specific scenarios; that means you can safely ignore them for a while.

  • In Python, everything is an object. And everything literally means (almost) everything. Classes, functions, and modules are all objects! This obviously has a lot of implications, a lot of which you are probably already using (e.g. having functions that return functions, which is what allows decorators to be possible).
  • Under the hood of object lifetime: __new__, __init__, __del__ and metaclasses.
  • Dynamic creation of new types and modification of existing ones (e.g. monkeypatching) using the types module.
  • Reflection in Python: dynamic inspection of objects (and by extension modules, functions, classes, etc.) using the inspect module.
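A rough taste of the last two items, as a sketch only. It uses the built-in type() to create a class at runtime (rather than the types module mentioned above) and a few functions from inspect. The names shout and Child are made up for this illustration.

```python
import inspect

class Human:
    def speak(self):
        return 'It is I!'

h = Human()

# Monkeypatching: replace a method on an existing class from the outside.
def shout(self):
    return 'IT IS I!'

Human.speak = shout
print(h.speak())   # existing instances see the change too
# IT IS I!

# Dynamic creation of a new type: type(name, bases, namespace).
Child = type('Child', (Human,), {'age': 5})
print(Child.__name__, Child().speak(), Child.age)
# Child IT IS I! 5

# Reflection: inspecting objects at runtime.
print(inspect.isclass(Child), inspect.isfunction(shout))
# True True
print(list(inspect.signature(shout).parameters))
# ['self']
```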