Skip to content

Instantly share code, notes, and snippets.

@dstufft
Last active August 29, 2015 14:13
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save dstufft/d4fc00108b323ffa8034 to your computer and use it in GitHub Desktop.
Save dstufft/d4fc00108b323ffa8034 to your computer and use it in GitHub Desktop.

The Theory of Type Hinting

Guido van Rossum, Dec 19, 2014 (with a few more recent updates)

Introduction

This document lays out the theory of the new type hinting proposal for Python 3.5. It's not quite a full proposal or specification because there are many details that need to be worked out, but it lays out the theory without which it is hard to discuss more detailed specifications. We start by explaining gradual typing; then we state some conventions and general rules; then we define the new special types (such as Union) that can be used in annotations; and finally we define the approach to generic types. (The latter section needs more fleshing out; sorry!)

Summary of gradual typing

We define a new relationship, is-consistent-with, which is similar to is-subclass-of, except it is not transitive when the new type Any is involved. (Neither relationship is symmetric.) Assigning x to y is OK if the type of x is consistent with the type of y. (Compare this to “... if the type of x is a subclass of the type of y,” which states one of the fundamentals of OO programming.) The is-consistent-with relationship is defined by three rules:

  • A type t1 is consistent with a type t2 if t1 is a subclass of t2. (But not the other way around.)
  • Any is consistent with every type. (But Any is not a subclass of every type.)
  • Every type is a subclass of Any. (Which also makes every type consistent with Any, via rule 1.)

That's all! See Jeremy Siek's blog post What is Gradual Typing for a longer explanation and motivation. Note that rule 3 places Any at the root of the class graph. This makes it very similar to object. The difference is that object is not consistent with most types (e.g. you can't use an object() instance where an int is expected). IOW both Any and object mean “any type is allowed” when used to annotate an argument, but only Any can be passed no matter what type is expected (in essence, Any shuts up complaints from the static checker).

Here's an example showing how these rules work out in practice:

  • Say we have an Employee class, and a subclass Manager:
    • class Employee: ...
    • class Manager(Employee): ...
  • Let's say variable e is declared with type Employee:
    • e = Employee()  # type: Employee
  • Now it's okay to assign a Manager instance to e (rule 1):
    • e = Manager()
  • It's not okay to assign an Employee instance to a variable declared with type Manager:
    • m = Manager()  # type: Manager
    • m = Employee()  # Fails static check
  • However, suppose we have a variable whose type is Any:
    • a = some_func()  # type: Any
  • Now it's okay to assign a to e (rule 2):
    • e = a  # OK
  • Of course it's also okay to assign e to a (rule 3), but we didn't need the concept of consistency for that:
    • a = e  # OK

Notational conventions

  • t1, t2 etc.  and u1, u2 etc. are types or classes. Sometimes we write ti or tj to refer to “any of t1, t2, etc.”
  • X, Y etc. are type variables (defined with Var(), see below).
  • C, D etc. are classes defined with a class statement.
  • x, y etc. are objects or instances.
  • We use the terms  type and  class interchangeably, and we assume type(x) is x.__class__.

General rules

  • Instance-ness is  derived from class-ness, e.g. x is an instance of t1 if  type(x) is a subclass of t1.
  • No types defined below (i.e. Any, Union etc.) can be instantiated. (But non-abstract subclasses of Generic can be.)
  • No types defined below can be subclassed, except for Generic and classes derived from it.
  • Where a type is expected, None can be substituted for type(None); e.g. Union[t1, None] == Union[t1, type(None)].

Types

  • Any. Every class is a subclass of Any; however, to the static type checker it is also consistent with every class (see above).
  • Union[t1, t2, ...]. Classes that are subclass of at least one of t1 etc. are subclasses of this. So are unions whose components are all subclasses of t1 etc. (Example: Union[int, str] is a subclass of Union[int, float, str].) The order of the arguments doesn't matter. (Example: Union[int, str] == Union[str, int].) If ti is itself a Union the result is flattened. (Example: Union[int, Union[float, str]] == Union[int, float, str].) If ti and tj have a subclass relationship, the less specific type survives. (Example: Union[Employee, Manager] == Union[Employee].) Union[t1] returns just t1. Union[] is illegal, so is Union[()]. Corollary: Union[..., Any, ...] returns Any; Union[..., object, ...] returns object; to cut a tie, Union[Any, object] == Union[object, Any] == Any.
  • Optional[t1]. Alias for Union[t1, None], i.e. Union[t1, type(None)].
  • Tuple[t1, t2, ..., tn]. A tuple whose items are instances of t1 etc.. Example: Tuple[int, float] means a tuple of two items, the first is an int, the second a float; e.g., (42, 3.14). Tuple[u1, u2, ..., um] is a subclass of Tuple[t1, t2, ..., tn] if they have the same length (n==m) and each ui is a subclass of ti. To spell the type of the empty tuple, use Tuple[()]. There is no way to define a variadic tuple type. (TODO: Maybe Tuple[t1, ...] with literal ellipsis?)
  • Callable[[t1, t2, ..., tn], tr]. A function with positional argument types t1 etc., and return type tr. The argument list may be empty (n==0). There is no way to indicate optional or keyword arguments, nor varargs (we don't need to spell those often enough to complicate the syntax — however, Reticulated Python has a useful idea here). This is covariant in the return type, but contravariant in the arguments. “Covariant” here means that for two callable types that differ only in the return type, the subclass relationship for the callable types follows that of the return types. (Example: Callable[[], Manager] is a subclass of Callable[[], Employee].) “Contravariant“ here means that for two callable types that differ only in the type of one argument, the subclass relationship for the callable types goes in the opposite direction as for the argument types. (Example: Callable[[Employee], None] is a subclass of Callable[[Mananger], None]. Yes, you read that right.)

We might add:

  • Intersection[t1, t2, ...]. Classes that are subclass of each of t1, etc are subclasses of this. (Compare to Union, which has at least one instead of each in its definition.) The order of the arguments doesn't matter. Nested intersections are flattened, e.g. Intersection[int, Intersection[float, str]] == Intersection[int, float, str]. An intersection of fewer types is a subclass of an intersection of more types, e.g. Intersection[int, str] is a subclass of Intersection[int, float, str]. An intersection of one argument is just that argument, e.g. Intersection[int] is int. When argument have a subclass relationship, the more specific class survives, e.g. Intersection[str, Employee, Manager] is Intersection[str, Manager]. Intersection[] is illegal, so is Intersection[()]. Corollary: Any disappears from the argument list, e.g. Intersection[int, str, Any] == Intersection[int, str].Intersection[Any, object] is object. The interaction between Intersection and Union is complex but should be no surprise if you understand the interaction between intersections and unions in set theory (note that sets of types can be infinite in size, since there is no limit on the number of new subclasses).

Pragmatics

Some things are irrelevant to the theory but make practical use more convenient. (This is not a full list; I probably missed a few and some are still controversial or not fully specified.)

  • Type aliases, e.g.
    • point = Tuple[float, float]
    • def distance(p: point) -> float: ... 
  • Forward references via strings, e.g.
    • class C:
      • def compare(self, other: “C”) -> int: ...
  • If a default of None is specified, the type is implicitly optional, e.g.
    • def get(key: KT, default: VT = None) -> VT: ...
  • Don't use dynamic type expressions; use builtins and imported types only. No 'if'.
    • def display(message: str if WINDOWS else bytes):  # NOT OK
  • Type declaration in comments, e.g.
    • x = []  # type: Sequence[int]
  • Type declarations using Undefined, e.g.
    • x = Undefined(str)
  • Other things, e.g. casts, overloading and stub modules; best left to an actual PEP.

Generic types

(TODO: Explain more. See also the mypy docs on generics.)

  • X = Var('X'). Declares a unique type variable. The name must match the variable name.

- Y = Var('Y', t1, t2, ...). Ditto, constrained to t1 etc. Behaves   like Union[t1, t2, ...] for most purposes, but when used as a type variable, subclasses of t1 etc. are replaced by the most-derived base class among t1 etc. - Example of constrained type variables:

  • AnyStr = Var('AnyStr', str, bytes)
  • def longest(a: AnyStr, b: AnyStr) -> AnyStr:
    • return a if len(a) >= len(b) else b
  • x = longest('a', 'abc')  # The inferred type for x is str
  • y = longest('a', b'abc')  # Fails static type check
  • In this example, both arguments to longest() must have the same type (str or bytes), and moreover, even if the arguments are instances of a common str subclass, the return type is still str, not that subclass (see next example).
  • For comparison, if the type variable was unconstrained, the common subclass would be chosen as the return type, e.g.:
    • S = Var('S')
    • def longest(a: S, b: S) -> S:
      • return a if len(a) >= b else b
    • class MyStr(str): ...
    • x = longest(MyStr('a'), MyStr('abc'))
    • The inferred type of x is MyStr (whereas in the AnyStr example it would be str).
  • Also for comparison, if a Union is used, the return type also has to be a Union:
    • U = Union[str, bytes]
    • def longest(a: U, b: U) -> U:
      • return a if len(a) >- b else b
    • x = longest('a', 'abc')
    • The inferred type of x is still Union[str, bytes], even though both arguments are str.
  • class C(Generic[X, Y, ...]): ... Define a generic class C over type variables X etc. C itself becomes parameterizable, e.g. C[int, str, ...] is a specific class with substitutions X→int etc.
  • TODO: Explain use of generic types in function signatures. E.g. Sequence[X], Sequence[int], Sequence[Tuple[X, Y, Z]], and mixtures. Think about co*variance. No gimmicks like deriving from Sequence[Union[int, str]] or Sequence[Union[int, X]].
  • Protocol. Similar to Generic but uses structural equivalence. (TODO: Explain, and think about co*variance.)

Predefined generic types and Protocols in typing.py

(See also the mypy typing.py module.)

  • Everything from collections.abc (but Set renamed to AbstractSet).
  • Dict, List, Set, a few more. (FrozenSet?)
  • Pattern, Match. (Why?)
  • IO, TextIO, BinaryIO. (Why?)

Another reference

Lest mypy gets all the attention, I should mention Reticulated Python by Michael Vitousek as an example of a slightly different approach to gradual typing for Python. It is described in an actual academic paper written by Vitousek with Jeremy Siek and Jim Baker (the latter of Jython fame).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment