amirkdv/oop.md

## oop.md

      
    OOP in Python, A Graded Knowledge Check

Table of Contents


Introduction
Level 0 - Background Knowledge

0A Types, values and variables
0B What happens when a function is executed?


Level 1

1A Classes and instances
1B Instantiating a class
1C Instances have state and behavior


Level 2

2A Two important relationships: is-a and has-a
2B Inheritance and class hierarchies
2C Overriding inherited behavior
2D Simple usage of super


Level 3

3A Object identity vs state
3B Passing objects as function arguments
3C Class attributes and methods


Level 4

4A Abstract classes
4B Multiple inheritance
4C Magic methods in Python


Level 5 - Beyond

Introduction

This document tries to provide a checklist of important concepts in
object-oriented programming with a heavy focus on Python. This document is not:

a complete (or even good) introduction to object oriented programming.
intended for learning new concepts; my only goal is to help you identify holes
in your knowledge.

Everything in bold is a specific, commonly used, technical term. Unless
otherwise specified, the technical terms are generic and not Python-specific.
I recommend that you try to always keep the distinction between general concepts
and their language-specific realization (e.g. how they are in Python) separate.
The internal implementation of OOP in Python changed significantly between
versions 2 and 3; I think the latter is more consistent and easier to
follow. All code examples are in Python 3; the output of print calls is
shown as comments on the following line.
Level 0 - Background Knowledge

0A Types, Values, and Variables

We need to distinguish between types, values and variables. For
example, when we say x = 'abraham' this statement does a bunch of things:

create a value 'abraham' of type str,
create a variable x and assign the value 'abraham' to it.

If we then say print('My name is %s.' % x), this will:

evaluate the expression 'My name is %s.' % x, which produces a
new str value 'My name is abraham.'. This value is as legitimate a value
as the original 'abraham' even though it is not assigned to any variable.
call the print function with that value, 'My name is abraham.', as its argument (an argument must be a value).

The definition of a type, roughly speaking, captures two
things: the kinds of data it can hold and the behavior (aka operators)
associated with it. For example, when we say x = 'abraham':

The fact that the sequence of bits (0s and 1s) that store this string in
memory is interpreted as a string (and not say an integer) is part of the
definition of the type str.
The fact that we can do things like x.title() (which evaluates to 'Abraham') or
x + ' lincoln' (which evaluates to 'abraham lincoln') is part of the definition
of the type str.
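As a small sketch of these two halves of a type's definition (the specific expressions below are my own illustration): the type decides both how the data is interpreted and what the operators do with it.

```python
x = 'abraham'
print(type(x))
# <class 'str'>

# Behavior defined by str: methods and the + operator (concatenation).
print(x.title())
# Abraham
print(x + ' lincoln')
# abraham lincoln

# The same operator behaves differently for values of a different type.
print(2 + 3)
# 5
print('2' + '3')
# 23

# The expression below produces a value even though no variable refers to it.
print(len('My name is %s.' % x))
# 19
```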

0B What happens when a function is executed?

At a very high level when a function is called (aka invoked, aka
executed):

caller invokes the function and provides values for its arguments,
new scope is created,
arguments are passed from the caller and assigned to variables in the
new scope,
function body is executed and a value is returned to the caller.

Exercise

Follow the above steps in this example:
def square(x):
    return x ** 2

print(square(2))
# 4

print(square(square(2)))
# 16
Note that there are a variety of ways in which a function might "do its job":

returning an output,
modifying the arguments themselves,
neither of the two  (e.g. storing/sending data somewhere else)
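The three ways listed above can be sketched as follows. The function names are made up for this illustration; nothing here is special to any library.

```python
def doubled(numbers):
    # Returns an output; the argument is left untouched.
    return [n * 2 for n in numbers]

def double_in_place(numbers):
    # Modifies the argument itself and returns nothing (i.e. None).
    for i in range(len(numbers)):
        numbers[i] *= 2

def report(numbers):
    # Neither of the two: sends data somewhere else (here, the terminal).
    print('numbers are', numbers)

values = [1, 2, 3]
print(doubled(values))
# [2, 4, 6]
print(values)
# [1, 2, 3]

double_in_place(values)
print(values)
# [2, 4, 6]

report(values)
# numbers are [2, 4, 6]
```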

Level 1

1A Classes and Instances

Instances (aka objects) are related to classes in the same way that
values are related to their types. In Python this analogy is literally true; a
class literally defines a new type in the same way that, say, int is a type:
x = 2
type(x) == int
# True

isinstance(int, type)
# True

class Human:
    pass

h = Human()
type(h) == Human
# True

isinstance(Human, type)
# True
The types (i.e. classes) provided by the language itself are called
built-in types; int, float, set, list, dict (and a bunch more)
are all built-in types.
1B Instantiating a class

When we create an instance of a class we say we are instantiating that
class. In Python, calling __init__ is the last, and most commonly modified,
step of the instantiation process (more on this in level 5).
Example

class Human:
    def __init__(self):
        print('executing __init__')

h = Human()
# executing __init__

In most other languages the process of instantiation is handled by a
function called a constructor. Python has something roughly similar,
which is __new__. The subtle difference between __init__ and
__new__ (we rarely work directly with the latter) is for level 5. Ignore
all of this for now, but know that constructor is a very commonly
used word and that, to a good approximation, the Python version of it is
__init__.
Notice that instantiating a class has the same syntax in Python as function
calls. This is a Python-specific feature. Many other languages (e.g. Java,
C++, PHP, JavaScript) have a new keyword that is used when
instantiating classes (e.g. you would say h = new Human()).
In Python there is no new keyword.

1C Instances have state and behavior

An object (a class instance) has state (i.e. data) and behavior (i.e.
code). The behavior of an object is all its methods (almost all OOP
languages use this term) and its state is its attributes (this is
Python-specific terminology; most other languages call these instance
variables). It is important to know how to work with attributes and
methods of an instance, their scope and how to access them via self.
Attributes are like variables but they belong to an instance (aka
object). You can get them or set them (aka read them or write to them)
like any other variable:
class Person:
    def __init__(self, name):
        self.name = name

p = Person('Mary')
print(p.name)
# Mary

p.name = 'John'
print(p.name)
# John
Methods are like functions but they belong to an instance (more
specifically they are bound to that instance). You can call them like
any other function:
class Person:
    def __init__(self, name):
        self.name = name

    def hello(self):
        print('Hello! My name is %s.' % self.name)

p = Person('Julie')
p.hello()
# Hello! My name is Julie.

p.name = 'Bob'
p.hello()
# Hello! My name is Bob.


Notice the magic happening in the signature of methods: the first
argument (called self by convention) is automatically set by the
language to the bound instance; you have no control over it. In a lot of
programming languages (e.g. Java, C++, PHP, JavaScript) this magic
happens implicitly: in the body of a method you can access the bound
instance via a this keyword. There is no such thing in Python. The fact that
self is explicit in Python is a reflection of its philosophy of "Explicit
is better than implicit."
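To make the role of self a bit more concrete, here is a minimal sketch (a variation of the Person class above that returns the greeting instead of printing it). Calling a method on an instance is roughly the same as looking the function up on the class and passing the instance yourself.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def hello(self):
        return 'Hello! My name is %s.' % self.name

p = Person('Julie')

# The usual call: Python passes p as self for us.
print(p.hello())
# Hello! My name is Julie.

# The same call spelled out: look the function up on the class
# and pass the instance explicitly.
print(Person.hello(p))
# Hello! My name is Julie.

# p.hello is a bound method: it remembers the instance it belongs to.
greet = p.hello
print(greet.__self__ is p)
# True
```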


Python has the notion of a property which is a method that behaves like an
attribute. The whole point of this is convenience. For example:
from datetime import datetime

class Person:
    def __init__(self, yob):
        self.yob = yob

    @property
    def age(self):
        return datetime.now().year - self.yob

p = Person(2000)
p.age
# 19 (if run in 2019; the result depends on the current year)


Aside: A lot of languages (e.g. Java, C++, PHP) require attributes and
methods to be either private or public, or (in some languages)
protected. None of this exists in Python. However, there are conventions
that kind of achieve the same goal in the end, and that is the use of a leading
underscore (e.g.  _some_func) to signal to other programmers "don't muck with
this". You can ignore this whole business at this level.
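A rough sketch of the underscore convention, using a hypothetical Account class of my own: the underscore signals intent, but the language does not enforce anything.

```python
class Account:
    def __init__(self, balance):
        # Leading underscore: "internal, please use the methods instead".
        self._balance = balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError('amount must be positive')
        self._balance += amount

    @property
    def balance(self):
        return self._balance

a = Account(100)
a.deposit(50)
print(a.balance)
# 150

# Nothing in the language stops outside code from doing this;
# the underscore is only a convention.
a._balance = -1
print(a.balance)
# -1
```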
Level 2

2A Two important relationships: is-a and has-a

Two useful words to describe certain relationships are: is-a ('hello' is-a
str) and has-a ('hello' has-a length). For example, one might say:

Abraham Lincoln (instance) is-a Human (type).
Every Human (type) is-a Mammal (type); every Mammal is-a Vertebrate
(type); and every Vertebrate is-a Animal (type).
Abraham Lincoln, by extension, is-a Mammal, Vertebrate, and Animal.
In contrast, my dog (the individual, instance) is-a Mammal but not is-a Human.
Any Mammal has-a neocortex, and therefore, both Abraham Lincoln and my dog
has-a neocortex.
Similarly, Lonesome George (a giant tortoise) is-a Vertebrate but not is-a Mammal:
he has-a backbone, but not has-a neocortex.

2B Inheritance and class hierarchies

Similar to the above intuitive idea, class hierarchies can be built through
inheritance. A class B can be a subclass of another class A (aka class
B extends class A, aka A is a superclass, or base class, of
B). This means that:

Any instance of B, aside from is-a B, also is-a A (i.e. is an instance
of class A as well).
The relationship between an instance and its attributes and methods is has-a.
Any instance of B inherits the behavior (i.e. methods) defined in class A.

Example

class Mammal:
    def eat(self):
        print('eating...')

class Human(Mammal):
    def speak(self):
        print('It is I!')

h = Human()
h.speak()
# It is I!

h.eat()
# eating...
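The is-a and has-a relationships can be checked directly with the built-in functions isinstance, issubclass and hasattr. The sketch below repeats the two classes so that it runs on its own (the methods return strings instead of printing).

```python
class Mammal:
    def eat(self):
        return 'eating...'

class Human(Mammal):
    def speak(self):
        return 'It is I!'

h = Human()
m = Mammal()

# An instance of Human is-a Human and is-a Mammal.
print(isinstance(h, Human), isinstance(h, Mammal))
# True True

# The reverse does not hold: a plain Mammal is not a Human.
print(isinstance(m, Human))
# False

# The same relationship, stated about the classes themselves.
print(issubclass(Human, Mammal))
# True

# has-a: h has the inherited method eat, but m does not have speak.
print(hasattr(h, 'eat'), hasattr(m, 'speak'))
# True False
```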
2C Overriding inherited behavior

A subclass can override the behavior in its superclass.
Example

class Mammal:
    def eat(self):
        print('eating ...')

class Human(Mammal):
    def eat(self):
        print('say grace ...')
        print('eating ...')

m = Mammal()
m.eat()
# eating ...

h = Human()
h.eat()
# say grace ...
# eating ...

The mechanism through which an OOP language achieves this is part of how its
method resolution works. In the above example, when you call eat on h
Python needs to resolve which of the two definitions of eat to execute,
the one defined in Mammal or the one in Human.
The ability to override behavior inherited from a superclass allows us to
achieve what is called polymorphism. For example, the behavior (i.e. method)
eat in the above example is polymorphic between Human and Mammal. At
this level, the word polymorphism is synonymous with (and fancy speak for)
overriding; just know that this term exists.
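A minimal sketch of why this matters in practice: code that only knows it is dealing with "some Mammal" can call eat, and each object's own class decides which definition runs. Dog is a hypothetical third class added for contrast, and the methods return strings instead of printing.

```python
class Mammal:
    def eat(self):
        return 'eating ...'

class Human(Mammal):
    def eat(self):
        return 'say grace ... eating ...'

class Dog(Mammal):
    # Hypothetical subclass that does not override eat.
    pass

# The loop treats every element as "some Mammal"; which eat runs
# is resolved per object, starting from the object's own class.
for animal in [Mammal(), Human(), Dog()]:
    print(type(animal).__name__, '->', animal.eat())
# Mammal -> eating ...
# Human -> say grace ... eating ...
# Dog -> eating ...
```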

2D Simple usage of super

It is often necessary or useful for an overriding method in a subclass to
delegate part of its job to its superclass's (overridden) method. This is where
super() comes in. In the above example the eat method of the Human
class merely adds a step before doing the exact same thing
as its superclass. To keep the code simpler (and more DRY, which stands for
don't repeat yourself) we can write:
class Human(Mammal):
    def eat(self):
        print('say grace ...')
        super().eat()

h = Human()
h.eat()
# say grace ...
# eating ...
Level 3

3A Object identity vs state

It is important to distinguish between the identity of an object and its
state (its data). Two objects of the same class have the same state if they
contain identical data. But they have the same identity only if they are
literally stored in the same place in memory. Equal identity implies equal
state, but not vice versa.
Example

x = ['hello']
y = ['hello']

x == y        # this compares state
# True

x is y        # this compares identity
# False

z = x         # this defines a new variable (i.e. a name) that points to the
              # exact same place in memory as x
z is x
# True

z.append('world')
print(x)
# ['hello', 'world']

w = x.copy() # this creates a new identity, a new place in memory
             # with identical contents as the original
w == x
# True
w is x
# False
We can access the identity of an object in Python by using the id() built-in
function, which returns an integer that is unique to the object for as long as
it exists (in CPython this is its memory address). Comparing identities, with
id() or equivalently with the is operator, is the only certain way to verify
that two variables have values that are identical in
identity and not just state (i.e. modifying one will modify the other one).
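A short sketch of id() next to the operators from the example above:

```python
x = ['hello']
y = ['hello']
z = x

# Same object: same id, and `is` agrees.
print(id(z) == id(x), z is x)
# True True

# Equal state but different objects.
print(x == y, id(x) == id(y), x is y)
# True False False
```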
Aside: Not all types allow the state of their instances to be modified.
These are called immutable types (and their instances are also called
immutable). The most common immutable types are all built-in types: int, bool, float,
str, and tuple. Other built-in types are mutable: dict, list, and set.
Instances of user-defined classes (types defined in code) are mutable too, by default.
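Here is a rough illustration of what immutability looks like from the outside. Box is a hypothetical empty class; the point is only that operations which seem to "change" an immutable value actually build a new one.

```python
s = 'hello'
t = s              # a second name for the same str object

s += ' world'      # builds a NEW str and rebinds s; t is untouched
print(s, '|', t)
# hello world | hello

point = (1, 2)
try:
    point[0] = 10  # tuples do not support item assignment
except TypeError as e:
    print('TypeError:', e)
# TypeError: 'tuple' object does not support item assignment

# Instances of user-defined classes accept new state by default.
class Box:
    pass

b = Box()
b.content = 'anything'
print(b.content)
# anything
```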
3B Passing objects as function arguments

Objects (instances of classes) can be used as any other value, specifically
they can be passed as arguments to functions. It is important to understand how
the passed object is treated in the new scope of the function:

The new variable (in the function scope) has the name defined by the
signature of the function and the value provided by the caller (equal
state).
The new variable also has the same identity as the object provided by the caller.
This means that if one changes the state of the passed object in the function,
this change in state will be seen by the caller. This is sometimes desired and
sometimes undesired.

Example

def f(some_list):
    some_list.append('world')
    return some_list

def g(some_list):
    some_list = some_list.copy()
    some_list.append('world')
    return some_list

x = ['hello']
y = f(x)
print(y)
# ['hello', 'world']
print(x)
# ['hello', 'world']

x = ['hello']
z = g(x)
print(z)
# ['hello', 'world']
print(x)
# ['hello']
3C Class attributes and methods

Classes themselves can have attributes and methods. These are variables and
functions, respectively, that are shared between (aka common to) all instances
of that class.  In Python these are called class attributes (as opposed to
instance attributes or just attributes) and class methods (as opposed to
instance methods or just methods).
In a lot of programming languages (e.g. Java, C++, PHP, JavaScript) methods that
belong to a class (i.e. shared between all instances) are called static
methods. In Python these are called class methods. Unfortunately for a
beginner, Python also has static methods which are slightly different (and
simpler, and less useful) than class methods.
Example

class HomoSapiens:

    speciation_age = 350000 # this is a class attribute

    @classmethod
    def describe_species(cls): # this is a class method
        return '%s, a %d year old species' % (cls.__name__, cls.speciation_age)

    def __init__(self, name):
        self.name = name

    def introduce(self):
        print('Hello! I am %s. I am a %s.' % (self.name, self.describe_species()))

print(HomoSapiens.speciation_age)
# 350000
print(HomoSapiens.describe_species())
# HomoSapiens, a 350000 year old species

h = HomoSapiens('John')
print(h.name)
# John
print(h.speciation_age)
# 350000
print(h.describe_species())
# HomoSapiens, a 350000 year old species
h.introduce()
# Hello! I am John. I am a HomoSapiens, a 350000 year old species.
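The difference between class methods and static methods mentioned before the example can be sketched roughly like this. Temperature and Reading are hypothetical classes, used only to contrast the two decorators.

```python
class Temperature:
    def __init__(self, degrees):
        self.degrees = degrees

    @classmethod
    def from_fahrenheit(cls, f):
        # Receives the class as cls, so it can build instances
        # (of a subclass too, when called on one).
        return cls(cls.to_celsius(f))

    @staticmethod
    def to_celsius(f):
        # Receives neither the instance nor the class: a plain
        # function that merely lives in the class's namespace.
        return (f - 32) * 5 / 9

class Reading(Temperature):
    pass

print(Temperature.to_celsius(212))
# 100.0

r = Reading.from_fahrenheit(32)
print(type(r).__name__, r.degrees)
# Reading 0.0
```

A class method is the natural choice for alternative ways of building an instance, because cls follows the subclass it was called on; a static method never sees the class at all.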
Notes:

Notice the magic happening in the signature of class methods: the first
argument (called cls by convention) is automatically set by the
language to the class; you have no control over it. This is similar to the
way the first argument of instance methods (called self by convention)
is automatically set by the language to the bound instance.
Notice the fact that Python allows you to access class attributes and
class methods both from the class HomoSapiens and from the instance h
of that class (this is part of how Python's name resolution works).
Languages differ on this: in JavaScript, for example, static members are
only accessible through the class itself, and in Java access through an
instance is allowed but discouraged (i.e. one writes
HomoSapiens.speciation_age and not h.speciation_age).
But be careful! This shorthand mechanism only works for reading class attributes,
not for writing to them: if we say h.speciation_age = 12 this would create a
new instance attribute for the instance h and set it to 12. This will not
affect the class attribute value and no instance of HomoSapiens other than h will
see that new attribute.
Notice how @ is used to define class methods in a similar way as the way
properties (see above) are defined. These are both examples of a feature
in Python called decorators (classmethod and property are both
decorators, and @classmethod and @property decorate the functions that
immediately follow them). Decorators are not particular to OOP and are
very useful. You can even define your own decorators!
It is sometimes useful to modify a method in a class from "the outside" (i.e.
when we cannot or would prefer not to modify the source code of that class).
There is a way to do this which is called monkeypatching (more on this in
level 5).
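The warning in the notes above about writing to a class attribute through an instance can be seen in a stripped-down version of HomoSapiens:

```python
class HomoSapiens:
    speciation_age = 350000

a = HomoSapiens()
b = HomoSapiens()

# Writing through an instance creates an instance attribute on a
# that shadows the class attribute; the class and b are unaffected.
a.speciation_age = 12
print(a.speciation_age, b.speciation_age, HomoSapiens.speciation_age)
# 12 350000 350000
print(a.__dict__, b.__dict__)
# {'speciation_age': 12} {}

# Writing through the class changes what every instance without
# its own shadowing attribute sees.
HomoSapiens.speciation_age = 300000
print(a.speciation_age, b.speciation_age)
# 12 300000
```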

Level 4

4A Abstract Classes

Abstract classes are a mechanism for us to define the interface of a
class without specifying its implementation. What makes an abstract class
abstract is its abstract methods which define the signature of a method
without specifying its implementation. An abstract class cannot be instantiated.
Instead one needs to define non-abstract subclasses of the abstract class which
provide an implementation for all abstract methods of the abstract superclass.
Such a subclass can then be instantiated as usual.
Not all OOP languages provide a mechanism for this (e.g. Ruby does not) and the
ones that do (e.g. Python, Java, and PHP all do) provide it in a variety of
ways.
In Python abstract classes are defined by extending a special base class from
the abc module of the standard library. For example:
from abc import ABC, abstractmethod

class AbstractCarnivore(ABC):
    @abstractmethod
    def hunt(self):
        pass

    def eat(self):
        self.hunt()
        print('eating ...')

class Human(AbstractCarnivore):
    def hunt(self):
        print('hunting ...')

x = AbstractCarnivore()
# TypeError: Can't instantiate abstract class AbstractCarnivore with abstract methods hunt
# (the exact wording of the message varies between Python versions)

h = Human()
h.eat()
# hunting ...
# eating ...
Aside: The whole point of abstract classes is ease of extensibility: the
author of an abstract class is merely communicating to other programmers the
contract that their subclasses must satisfy (the contract being the abstract
methods) in order to take advantage of the other (non-abstract, implemented)
aspects of the base class.
4B Multiple Inheritance

Multiple inheritance is a mechanism in some programming languages that
allows classes to inherit from multiple superclasses (as opposed to a single
superclass in single inheritance). Under single inheritance all class
hierarchies are trees in the end. With multiple inheritance class hierarchies
can become DAGs instead of trees.
General notes:

Not all OOP languages allow multiple inheritance (e.g. Java does not) and
those that do (e.g. Ruby, PHP, JavaScript ES6) provide it in different ways,
with different names, and with different limitations.
Multiple inheritance can easily get really gnarly; a good simple example of
how things can get messy is what is called the diamond problem: Suppose
classes B and C extend A, and that D extends both B and C. If A
defines a method f() that is overridden by both B and C but not by D, which
version of it should be inherited by D?
Central to understanding Python's version of multiple inheritance is its
method resolution order (MRO) algorithm which dictates how super() gets
resolved under multiple inheritance. It is the MRO that is responsible for,
say, addressing the diamond problem.
There are two common and useful design patterns in multiple inheritance:
mixins and cooperative multiple inheritance. You should probably know
about them before trying to write multiple inheritance code in production.
There is an OOP principle called composition over inheritance which
recommends that it's often better to achieve the desired behavior by
composing different classes through has-a relationships rather than
inheritance (is-a relationships). There is a lot of truth to this; but then
again, single and multiple inheritance are both extremely useful. Finding
the right balance is a matter of problem context, experience, and to some
extent, taste.

Example

This example illustrates how to address the diamond problem using cooperative inheritance in Python:
class A:
    def __init__(self):
        print("A")
        super().__init__()

class B(A):
    def __init__(self):
        print("B")
        super().__init__()

class C(A):
    def __init__(self):
        print("C")
        super().__init__()

class D(B, C):
    def __init__(self):
        print("D")
        super().__init__()

D()

# D
# B
# C
# A
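The question posed by the diamond problem (which f() does D inherit?) can be answered by inspecting the MRO directly. This sketch redefines the classes with a method f, following the naming of the general notes; E is an extra hypothetical class that swaps the order of the bases.

```python
class A:
    def f(self):
        return 'A.f'

class B(A):
    def f(self):
        return 'B.f'

class C(A):
    def f(self):
        return 'C.f'

class D(B, C):
    pass

# The MRO is the linear order in which classes are searched.
print([cls.__name__ for cls in D.__mro__])
# ['D', 'B', 'C', 'A', 'object']

# D does not define f, so the first class after D in the MRO
# that defines it wins.
print(D().f())
# B.f

# Swapping the order of the bases swaps the answer.
class E(C, B):
    pass

print(E().f())
# C.f
```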
4C Magic methods in Python

Magic (aka dunder) methods are methods with names of the form __X__
and they have special (magic) properties. A lot of them exist in all Python
objects (they are inherited from the object class, the superclass of all
classes). But there are also a lot of them that could be implemented by a class
to give it special properties.
Magic methods are a very versatile bunch that provide us, the programmer, with a
lot of power that is unique to Python. Here is an incomplete list that covers
the majority of magic methods, in a very rough and subjective order of
usefulness:

__str__ allows an object to control how it behaves when it's cast to a
string (e.g. when it's given to print).
__enter__ and __exit__ allow an object to become a context manager (i.e.
you can use it in a with statement).
__setattr__, __getattr__ and __getattribute__ expose the internal
mechanics of attribute resolution and allow you to have more control over
how they work in a class.
__eq__ and __hash__ allow an object to take control of how it behaves
under equality comparisons (i.e. when used in ==) and when it's hashed
(i.e. passed to hash, e.g. when it's used as a dictionary key), respectively.
These two dunder methods are deeply related and must coordinate their behavior.
__iter__ and __next__ allow an object to become iterable (i.e. can be
iterated through, e.g. with a for loop). Related to this is __len__
which allows an iterable to specify its length (i.e. what happens when
it's given to len()) and __contains__ which allows an iterable to check
membership (i.e. what happens when one uses the in keyword).
__call__ allows an object to become callable (i.e. can be called, just
like a function)
numeric operators, e.g. __add__, __truediv__, __mul__, __eq__, __ge__
that expose the internal mechanics of how common operators work. The
specific list of examples above corresponds to +, /, *, ==, >=
(there are many more of these). When a class overrides these methods we
say that it's overloading operators (e.g. pandas and NumPy make extensive
use of operator overloading to provide syntactic convenience).
__getitem__ and __setitem__ allow an object to behave like a dictionary (or a list), i.e. to support obj[key] and obj[key] = value.
__repr__ allows an object to control how it behaves when it's given to
repr: the goal is to generate a Python expression (i.e. code) that
would reproduce that object when executed. A good rule of thumb is that one
should have x == eval(repr(x)).
__get__ and __set__ allow you to define descriptors and give you even
more control over how attributes work (this is how property is internally
implemented).
__getstate__, __setstate__, and friends allow an object to implement the
pickle protocol.
__dict__ and __slots__ are magic attributes (not methods) that expose the
internal machinery of how attributes are stored in standard objects.
__new__ and __del__ are the other friends of __init__ and part of the
machinery of object lifetime.
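To give a feel for a few of the entries in this list, here is a minimal sketch built around a hypothetical two-dimensional Vector class. It is not a complete or recommended implementation, just enough to trigger each dunder method once.

```python
class Vector:
    # Hypothetical 2D vector, used only to show a few dunder methods.
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Vector(%r, %r)' % (self.x, self.y)

    def __eq__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Must agree with __eq__: equal vectors get equal hashes.
        return hash((self.x, self.y))

    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)

    def __iter__(self):
        return iter((self.x, self.y))

    def __len__(self):
        return 2

v = Vector(1, 2)
print(v + Vector(3, 4))            # __add__, then __repr__ (no __str__ is defined)
# Vector(4, 6)
print(v == Vector(1, 2))           # __eq__
# True
print(eval(repr(v)) == v)          # the rule of thumb for __repr__
# True
print({v: 'found'}[Vector(1, 2)])  # __hash__ and __eq__ working together
# found
print(list(v), len(v), 2 in v)     # __iter__, __len__; `in` falls back to __iter__
# [1, 2] 2 True
```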

Level 5 - Beyond

The following are language-specific and advanced features of the Python OOP
model. The bad news is that there is a lot of nuance and subtlety in each of
them which can be quite confusing when you are new to the ideas summarized
in this document. The good news is that knowledge of them is only useful in
very specific scenarios; that means you can safely ignore them for a while.

In Python, everything is an object. And everything literally means
(almost) everything. Classes, functions, and modules are all objects! This
obviously has a lot of implications, a lot of which you are probably already
using (e.g. having functions that return functions, which is what allows
decorators to be possible).
Under the hood of object lifetime: __new__, __init__, __del__ and
metaclasses.
Dynamic creation of new types and modification of existing ones (e.g.
monkeypatching) using the types module.
Reflection in Python: dynamic inspection of objects (and by extension
modules, functions, classes, etc.) using the inspect module.
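A tiny taste of these ideas, as a sketch only. All names below (shout, greet, introduce) are made up, and the dynamic class is created with the built-in type() rather than with the types module mentioned above.

```python
import inspect

# Functions are objects: they can be passed around and returned.
# This is what makes decorators possible.
def shout(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

@shout
def greet(name):
    return 'hello %s' % name

print(greet('mary'))
# HELLO MARY

# Classes are objects too, so they can be created at run time.
# type(name, bases, namespace) builds the same kind of thing a
# class statement would.
Human = type('Human', (object,), {'species': 'HomoSapiens'})
h = Human()
print(type(h).__name__, h.species, isinstance(Human, type))
# Human HomoSapiens True

# Monkeypatching: attach a method to an existing class from the outside.
def introduce(self):
    return 'I am a %s.' % self.species

Human.introduce = introduce
print(h.introduce())
# I am a HomoSapiens.

# Reflection: ask objects about themselves.
print(inspect.isclass(Human), inspect.isfunction(introduce))
# True True
print(list(inspect.signature(introduce).parameters))
# ['self']
```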