Abstract

This PEP proposes the following additions to generic classes:

  • Adding an __args__ attribute that returns the parameters the class was specialised with
  • Making any type parameters available directly on the class as overridable instance variables
  • Substitution of default type parameters at runtime
  • Automatically adding __orig_class__ to a class's slots if it's subscriptable (even if it's defined in C)

The following change to TypeVarLikes:

  • Adding __value__ as a way to compute the specialised value of a type parameter after subscription (if it has non-default parameters)

The following change to GenericAliases:

  • Hooking __getattr__ to handle accessing __args__ by name on the instance.

Motivation

Currently, getting the specialised types for :py:term:`Generic` types is unintuitive and unreliable:

class Foo[T]: ...

Foo[int]()  # How do I get `int` inside Foo?

>>> Foo[int]().__orig_class__.__args__
(int,)

This, however, doesn't work inside __new__/__init__ or any methods called from them, as GenericAlias.__call__(*args, **kwargs) only sets __orig_class__ after self.__origin__(*args, **kwargs) returns.

class Bar[T]:
    def __init__(self):
        self.__orig_class__

>>> Bar[int]()  # AttributeError: Bar has no attribute __orig_class__

Now what if I subclass a generic?

class Bar(Foo[str]): ...  # how do I now get `str`?

>>> types.get_original_bases(Bar)[0].__args__
(str,)

And what about a type parameter inside a generic function?

def foo[T](): ...

>>> foo[int]()

This isn't even possible without using implementation details/frame hacks.

With the new roots of runtime type checking beginning to sprout, I think it's unacceptable to have this kind of hard-to-use interface which is full of edge cases.

e.g.

class Slotted[T]:
    __slots__ = ()


Slotted[int]().__orig_class__  # AttributeError: 'Slotted' object has no attribute '__orig_class__'

I propose a new interface design which solves all of the above problems by being easy to use and much more reliable:

>>> Foo[int]().__args__
(int,)
>>> Foo[int]().T.__value__
int

>>> Bar.__args__
(str,)
>>> Bar.T.__value__
str

def foo[T]():
    return T.__value__

>>> foo[bool]()
bool

Anecdotally, I've seen many requests for such a feature, and I've needed it multiple times when writing typed code to get type parameters without duplicating values throughout the code.

Prior discussion:

TODO Maybe try getting some stats on how popular this could be? Reach out to pydantic and other such introspection libraries

Rationale

__args__ property

Adding this property allows for easy checking of the current instance's type parameters.

Substitution of default type parameters at runtime

I would also like to enforce at runtime that the arguments to a C-defined generic type are the correct length. This would allow us to handle type parameter defaults and correctly substitute them at runtime.
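
A sketch of the proposed behaviour with :pep:`696` defaults (the class Pair is illustrative):

class Pair[T, U = int]:
    pass

# Proposed: the default for U is substituted during subscription,
# so it appears in __args__.
Pair[str].__args__       # (str, int)

# Proposed: the wrong number of arguments raises, even for C-defined generics.
Pair[str, bytes, bool]   # TypeError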

This is a step towards deprecating all the typing aliases of collections.abc and contextlib classes: currently the number of parameters passed to them is not checked, and to ensure a smooth transition to removing the typing aliases they should become re-exports of the original classes. TODO Something about the typing classes supporting defaults maybe?

Adding __orig_class__ to a class's slots if it's subscriptable

This saves developers from having to remember to add this slot themselves when they create a slotted class.
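
Under this proposal, the Slotted example from the motivation section would simply work:

class Slotted[T]:
    __slots__ = ()  # proposed: "__orig_class__" is appended automatically

# Proposed: this no longer raises AttributeError.
Slotted[int]().__orig_class__  # Slotted[int]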

Adding __value__ as a way to compute the specialised value of a type parameter after subscription (if it has non-default parameters)

Making any type parameters available directly on the class

Specification

Adding properties for each TypeVarLike in __type_params__

These properties should allow access to a type parameter, which is bound to a value after substitution has occurred.

Setting and deleting should raise deprecation warnings for now, as in a future version these will be read-only. Type checkers should warn about setting or deleting these attributes.

TODO: a note about needing __eq__ (and __hash__) on them now so that Class.T == class_.T (???)

Dealing with naming collisions for the deprecation period

Code generation for accessing the specialised type parameters

class Foo[T]:
    def __init__(self, value: T):
        self.T = value  # Oops, what a strange variable name

Would generate:

class Foo[T]:
    # compiler-generated code for each type parameter
    @property
    def T(self):
        try:
            # intentionally bypass any attribute hooks as this should be entirely transparent to developers
            return object.__getattribute__(self, "__T__")
        except AttributeError:
            T = self.__orig_class__.T
            object.__setattr__(self, "__T__", T)
            return T

    @T.setter
    def T(self, value):
        warnings.warn(
            "Setting type parameters is not supported",
            DeprecationWarning,
        )
        object.__setattr__(self, "__T__", value)

    # user code
    def __init__(self, value: T):
        self.T = value

Instantiation of Foo should raise a DeprecationWarning

>>> Foo[int](1)
<stdin>:1: DeprecationWarning: Setting type parameters is not supported

With slotted classes or class variables/methods there should be a DeprecationWarning if any names overlap with the __type_params__, and then the code should look something like:

class Slotted[T]:
    __slots__ = ("T",)

Would generate:

class Slotted[T]:
    __slots__ = ("T",)  # already should be in slots but is just ignored
    warnings.warn(
        "Setting type parameters is not supported",
        DeprecationWarning,
    )

Type checkers should warn about overriding instance variables with the same name as type parameters.

TODO: something about methods called T (cough cough, numpy).

__args__ descriptor

This attribute gives access to a tuple (or None) of the type variables after any substitution has occurred and acts very similarly to GenericAlias.__args__.

This descriptor can be accessed as both a class and an instance property depending on whether the class is used as a specialised base class.

class Foo[T]:
    def __init__(self):
        print("__args__ in Foo", self.__args__)
        super().__init__()

class Baz(Foo[str]):
    def __init__(self):
        print("__args__ in Baz", self.__args__)
        super().__init__()

class Bar[T, U](Foo[T]):
    def __init__(self):
        print("__args__ in Bar", self.__args__)
        super().__init__()


>>> Foo[bool]()
__args__ in Foo (bool,)
>>> Baz()
__args__ in Baz None
__args__ in Foo (str,)
>>> Bar[int, str]()
__args__ in Bar (int, str)
__args__ in Foo (int,)

>>> Foo()  # nothing passed to __orig_class__
__args__ in Foo None

This works with multiple inheritance as you might expect.

class Spam[U, V](Baz, Bar[int, U]):
    def __init__(self):
        print("__args__ in Spam", self.__args__)
        super().__init__()


>>> Spam[complex, bool]()
__args__ in Spam (complex, bool)
__args__ in Baz None
__args__ in Bar (int, complex)
__args__ in Foo (int,)

__args__ can be accessed on the class.

class Foo[T]:
    @classmethod
    def bar(cls):
        return cls.__args__

class Baz(Foo[str]): ...

>>> Foo[int].bar()
(int,)
>>> Baz.bar()
(str,)

Type checkers should be aware of the types passed at instantiation and their associated type variables, so that __args__ is statically determinable. __args__ is erased to tuple[object, ...] outside of the instance of the class to preserve safety. We are explicitly choosing to violate the Liskov substitution principle because practicality beats purity here: the __args__ property would be almost useless without this restriction, as any subclass could have different parameters, making introspection significantly more difficult to use without any measurable benefit.

If you wanted to access the parameters whilst pretending to emulate a particular call site: TODO how?

Accessed on         Example                           Type checker type            Inferred type in example
------------------  --------------------------------  ---------------------------  --------------------------
self                def method(self: Bar[int, str])   tuple[T, U, V, ...] | None   tuple[int, str] | None
specific class      Bar[int, str]()                   tuple[T, U, V, ...] | None   tuple[int, str]
function parameter  def foo(x: Foo[int])              tuple[object, ...] | None    tuple[object, ...] | None
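
A sketch illustrating the table, with reveal_type comments showing what a conforming type checker would infer under this proposal (Bar here is illustrative):

class Bar[T, U]:
    def method(self: "Bar[int, str]") -> None:
        reveal_type(self.__args__)       # tuple[int, str] | None

reveal_type(Bar[int, str]().__args__)    # tuple[int, str]

def takes(x: Bar[int, str]) -> None:
    reveal_type(x.__args__)              # tuple[object, ...] | None (erased outside the class)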

Runtime substitution of parameters

Currently, no length checking or substitution of parameters occurs with types.GenericAlias; this PEP requires checking the length of any parameters passed.

The number of parameters required can be worked out from the class definition, both in Python and in an extension module. This proposal doesn't require any runtime changes to generics defined in Python in this regard, as they already check the number of parameters passed; however, for extensions we propose adding a new interface for defining type parameters in C.

Standard Library Changes

These changes mean that types.GenericAlias is now compatible with typing._GenericAlias for most cases. This PEP means that collections.abc and contextlib classes can use :pep:`695` syntax and typing can simply re-export the classes without wrapping them like it currently does; the same applies to all builtin classes currently defined in C using :pep:`585` that are wrapped by typing.

Automatically adding __orig_class__ to a class's slots

If a class uses type parameter syntax it should have __orig_class__ added to the class's __slots__ if required. This is not required if it's already included in a superclass's or the unmodified __slots__.

__orig_class__ should have type GenericAlias | None and is None unless the instance was created through GenericAlias.__call__ in which case it will be the self argument. GenericAlias.__call__ should reimplement object.__new__ for classes to set the attribute as early in instance creation as possible.
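
A minimal Python sketch of the proposed behaviour (the real change would live in C; generic_alias_call and its alias parameter are illustrative stand-ins for GenericAlias.__call__ and its self):

def generic_alias_call(alias, /, *args, **kwargs):
    cls = alias.__origin__
    self = cls.__new__(cls, *args, **kwargs)
    if isinstance(self, cls):
        # Proposed: set __orig_class__ before __init__ runs so that it is
        # already visible inside __init__ and anything it calls.
        object.__setattr__(self, "__orig_class__", alias)
        type(self).__init__(self, *args, **kwargs)
    return self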

TODO: provide implementations for __args__ as an attribute (don't call this attribute on self from C?) and a function to get a type parameter as an attribute.

Adding TypeVarLike.__value__

Allows for accessing the specialised value of a type parameter after the substitution has occurred.

Currently, an unused type variable in the signature which isn't bound to a parameter is a type-checking error; however, the following snippet should now type-check without any errors.

def foo[T]():
    return T.__value__

foo[int]()  # int

If someone calls a function like this without a type parameter default, accessing __value__ should raise an AttributeError.
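
A sketch of the proposed semantics (with_default and without_default are illustrative names):

def with_default[T = bytes]():
    return T.__value__

def without_default[T]():
    return T.__value__

with_default()           # bytes; the default is substituted
without_default[str]()   # str
without_default()        # proposed: AttributeError when accessing T.__value__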

To make this work we set up a cell object to catch the type parameter specialisation. Or consider just throwing in locals?

Note about type parameters now being part of stability guarantees, i.e. they need to be right the first time round. It's now advised not to include variance information in the name because it can change under your feet with infer_variance or when implementing a new method. Type checkers should give this the type type[__bound__]; if constrained, type[Union[*__constraints__]]; failing both of those, implicit Any.

Adding GenericAlias.{TypeParam}

types.GenericAlias needs special-casing in __getattr__ for type parameters so the below works:

class Foo[T]: ...

class Bar(Foo[str]): ...

Foo[int].T.__value__    # int
Foo[int]().T.__value__  # int
Bar.T.__value__         # str
Bar().T.__value__       # str

It should return the first value found for the type parameter, so if there are duplicates in the MRO it works like regular attribute access.
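
For example, where a type parameter name is reused along the MRO (A and B are illustrative):

class A[T]: ...
class B[T](A[bytes]): ...

# Proposed: like regular attribute lookup, the first T found in the MRO wins,
# so B's own T shadows A's.
B[str].T.__value__  # str, not bytes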

Backwards Compatibility

There are no backwards-incompatible changes introduced by this, as the magic methods/attributes are reserved for internal use without warning. The type parameter changes would be unlikely to cause issues for anyone even if they did override attributes inside of classes; however, special care has been taken to ensure that there are no behaviour changes for developers who do override a type parameter for at least 3 years after the PEP is accepted.

Since this PEP makes type parameters part of the public API, the normal deprecation policy would need to apply to them, matching the process for deprecating arguments to functions.

Unfortunately, to the best of my knowledge, this PEP can't be backported to typing_extensions due to the large amount of coupling to the language, short of using frame hacks.

Runtime Implementation

__args__

Access to the __args__ property needs to be recompiled to call a helper with the signature (instance: object, callee: type) -> tuple, i.e. PyObject_GetArgs(self, callee), checking isinstance(self, GenericAlias) first.

TODO: what about from C? Can you still access a field called __args__ if this is purely done in compilation?

def PyObject_GetArgs(object, callee): ...

The example from the specification would behave as if it were expanded to:

class Foo[T]:
    def __init__(self):
        print("__args__ in Foo", self.__args__)
        super().__init__()

    @magic_descriptor
    def __args__(self, class_called_from=None):
        return (self.__orig_class__.T.__value__,)


class Bar[T, U](Foo[T]):
    def __init__(self):
        print("__args__ in Bar", self.__args__)
        super().__init__()

    @magic_descriptor
    def __args__(self, class_called_from=None):
        if class_called_from is Bar:
            return (self.__orig_class__.T.__value__, self.__orig_class__.U.__value__)
        # should only expose the parameters that Foo knows about
        return super().__args__


class Baz(Foo[str]):
    def __init__(self):
        print("__args__ in Baz", self.__args__)
        super().__init__()

    @magic_descriptor
    def __args__(self, class_called_from=None):  # this works as both a class and instance property
        return (self.__orig_bases__[0].T.__value__,)


>>> Foo[bool]()
__args__ in Foo (bool,)
>>> Bar[int, str]()
__args__ in Bar (int, str)
__args__ in Foo (int,)
>>> Baz()
__args__ in Baz (str,)
__args__ in Foo (str,)

A more complicated example with multiple inheritance

class Spam[U, V](Baz, Bar[int, U]):
    def __init__(self):
        print("__args__ in Spam", self.__args__)
        super().__init__()

    @magic_descriptor
    def __args__(self, class_called_from=None):
        if class_called_from is Spam:
            return (self.__orig_class__.U.__value__, self.__orig_class__.V.__value__)
        if class_called_from is Baz:  # needs to know that this is a class property
            return (self.__orig_bases__[0].T.__value__,)
        if class_called_from is Bar:
            return (self.__orig_bases__[1].T.__value__, self.__orig_class__.U.__value__)
        if class_called_from is Foo:
            return (self.__orig_bases__[1].T.__value__,)

Implementing this requires changes to the symtable to supply __class__ when the attribute is accessed inside its class. Inside a non-class-scoped function, or on a specific class, no special action is required.
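
A sketch of which accesses would be rewritten, assuming the PyObject_GetArgs helper described above:

class Foo[T]:
    def method(self):
        # inside the class: the compiler rewrites this access to (roughly)
        # PyObject_GetArgs(self, __class__)
        return self.__args__

def free_function(x: Foo[int]):
    # outside any class scope: an ordinary attribute access, no rewriting
    return x.__args__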

tp_type_params

To enforce type parameters in a C extension module, a new way to store a class's type parameters is needed. This PEP introduces a new "tp slot", tp_type_params, to PyTypeObject, which stores all the necessary information about a class's type parameters to bring them in line with a pure Python equivalent.

Setting is again handled the same way as it is currently. TODO: mention that this also goes on function objects; should the current field be moved into __dict__, similarly to type?

This handles the functionality of PyTypeParam but can be used from both C and Python.

The result is cached in type.__dict__ after the first call, which is then used as the fast path.

Extension module

PyObject *custom_class_type_params(PyObject *self) {
    Py_TypeVar(T);
    return PyTuple_Pack(1, T);
}


static PyTypeObject CustomClass = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    .tp_name = "CustomClass",
    // Etc.
    .tp_type_params = custom_class_type_params,
};

is equivalent to

class CustomClass[T]:
    pass

and in the more complicated case

PyObject *fancy_custom_class_type_params(PyObject *self) {
    Py_TypeVar(T, .bound = "int");
    Py_TypeVar(AnyStr, .constraints = {"str", "bytes", NULL});
    Py_TypeVar(SeqT, .bound = "Sequence[bool]", .evaluation_context = "from collections.abc import Sequence");
    Py_ParamSpec(P, .default_ = "[str]");
    return PyTuple_Pack(4, T, AnyStr, SeqT, P);
}

// snip

is roughly equivalent to

from collections.abc import Sequence

class FancyCustomClass[T: int, AnyStr: (str, bytes), SeqT: Sequence[bool], **P = [str]]:
    pass

(Sequence wouldn't be put in globals or be accessible to any type parameters other than SeqT)

Example with functions

Generic list and dict subclasses to not incur increased memory costs

Return subclasses of list/dict that have a slot for __orig_class__, created via e.g. PyList_WithOrigClass.

GenericAlias.__call__ setting type parameter values

For type objects

GenericAlias.__call__ should reimplement object.__new__ to set __orig_class__, so that after calling self = super().__new__(cls) the attribute is accessible for types.

For function objects

For functions, setting locals is a big performance loss, so T.__value__ should look up the calling frame to get the GenericAlias object and fetch the attribute from there.
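
A hypothetical sketch of that lookup; the __generic_alias__ storage name and typevar_value helper are assumptions for illustration, not part of the proposal:

import sys

def typevar_value(tv):
    # Walk the stack looking for the frame of a call made through a
    # GenericAlias, then read the matching argument out of its __args__.
    frame = sys._getframe(1)
    while frame is not None:
        alias = frame.f_locals.get("__generic_alias__")  # hypothetical storage
        if alias is not None and tv in alias.__origin__.__type_params__:
            index = alias.__origin__.__type_params__.index(tv)
            return alias.__args__[index]
        frame = frame.f_back
    raise AttributeError("__value__")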

Interaction with ForwardRefs

X['Foo'](): ForwardRefs cannot be safely handled (TODO: why?), so .__value__ should return the literal string 'Foo' in this case. If a user chooses to handle this case, they can.
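
For example:

class X[T]: ...

# The unresolved name is handed back verbatim as a string; resolving it
# (if desired) is left to the caller.
X['Foo']().T.__value__  # 'Foo'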

Reference Implementation

import enum
import typing


class TypeParamKind(enum.Enum):  # make _type_param_kind from pycore_ast.h public
    TypeVar_kind      = 1
    ParamSpec_kind    = 2
    TypeVarTuple_kind = 3


def Py_CreateTypeParam(
    kind: TypeParamKind,
    name: str,
    bound: str | None = None,
    default: str | None = None,
    constraints: str | None = None,
    evaluation_context: str | None = None,
) -> object:
    locals_ = {}
    if evaluation_context is not None:
        evaluation_context_code = compile(evaluation_context, f"<evaluation-context for {name}>", "exec")
        exec(evaluation_context_code, locals=locals_)

    # now that locals_ may be populated, we can compile our args;
    # None means the corresponding argument was not provided
    evaluate_bound = evaluate_default = evaluate_constraints = None
    if default is not None:
        compiled_default = compile(default, f"<default for {name}>", "eval")
        evaluate_default = lambda: eval(compiled_default, locals=locals_)

    match kind:
        case TypeParamKind.TypeVar_kind:
            if bound is not None:
                compiled_bound = compile(bound, f"<bound for {name}>", "eval")
                evaluate_bound = lambda: eval(compiled_bound, locals=locals_)

            if constraints is not None:
                compiled_constraints = compile(constraints, f"<constraints for {name}>", "eval")
                evaluate_constraints = lambda: eval(compiled_constraints, locals=locals_)

            return typing.TypeVar(
                name,
                evaluate_bound=evaluate_bound,
                evaluate_default=evaluate_default,
                evaluate_constraints=evaluate_constraints,
            )
        case TypeParamKind.ParamSpec_kind:
            return typing.ParamSpec(
                name,
                evaluate_default=evaluate_default,
            )
        case TypeParamKind.TypeVarTuple_kind:
            return typing.TypeVarTuple(
                name,
                evaluate_default=evaluate_default,
            )

# "macros"
from functools import partial
Py_CreateTypeVar = partial(Py_CreateTypeParam, TypeParamKind.TypeVar_kind)
Py_CreateParamSpec = partial(Py_CreateTypeParam, TypeParamKind.ParamSpec_kind)
Py_CreateTypeVarTuple = partial(Py_CreateTypeParam, TypeParamKind.TypeVarTuple_kind)

Default behaviour is described by Py_GetTypeParams below.

PyObject *PyType_TypeParams(PyObject *self) {
    return type_get_type_params(_PyType_CAST(self), NULL);
}

PyObject *Py_GetTypeParams(PyObject *self) {
    PyTypeObject *type = (PyTypeObject *)self;

    PyObject *cls_dict = PyType_GetDict(type);
    int contains = PyDict_Contains(cls_dict, &_Py_ID(__type_params__));
    if (contains < 0) {
        return NULL;
    } else if (contains) {
        return PyDict_GetItemWithError(cls_dict, &_Py_ID(__type_params__));
    }

    typeparamsfunc tp_type_params = type->tp_type_params;
    if (tp_type_params == NULL) {
        return PyTuple_New(0);
    }
    PyObject *type_params = tp_type_params(self);
    if (type_params == NULL) {
        return NULL;
    }
    int res = type_set_type_params(type, type_params);  // we can bypass the immutable check here?
    if (res < 0) {
        return NULL;
    }
    return type_params;
}

Evaluation context was chosen over the alternative of having users import, construct and call all the correct methods to initialise a type. If NULL, then PyEval_EvalCode can be skipped and the bound/default/constraints can be evaluated in the normal, "empty" globals/locals. Evaluation context is also only given one value for all three type-receiving kwargs (bound, constraints and default) as a safe optimisation, since bound and constraints are mutually exclusive and default is a subtype of either.

GenericAlias.__call__: for types, reimplement object.__new__ to set __orig_class__ as soon as possible; for functions, store the frame in a dict with the value of self, and TypeVar.__value__ should look this up.

static PyObject *
ga_call(PyObject *self, PyObject *args, PyObject *kwds)
{
    gaobject *alias = (gaobject *)self;
    PyObject *obj;
    if (PyType_Check(alias->origin)) {
        obj = type_call;
        set_orig_class;
    // else if (PyCallable_Check(alias.origin)) {
    //     obj = set_type_param_values_and_call(alias->origin, PyObject_GetAttr(alias->origin, _Py_ID(__type_params__)), alias->args);
    } else if (PyFunction_Check(alias->origin)) {
        PyFunctionObject *func = (PyFunctionObject *)alias->origin;
        obj = set_frame_(alias->origin, func->func_typeparams, alias->args);
    } else if (PyMethod_Check(alias->origin)) {
        PyMethodObject *meth = (PyMethodObject *)alias->origin;
        PyFunctionObject *func = (PyFunctionObject *)meth->im_func;
        obj = set_type_param_values_and_call(alias->origin, func->func_typeparams, alias->args);
    } else {
        obj = PyObject_Call(alias->origin, args, kwds);
    }
    return set_orig_class(obj, self);
}

Rejected Ideas

Using a current design pattern

Currently, one of the recommended ways around this looks something like:

class Class[T]:
    def __init__(self, x: T, x_ty: type[T]):
        self.x = x
        # do something with `x_ty`

Class[int](1234, int)  # or Class(1234, int)

This is not desirable: not only does it duplicate information, leaving room for things to become out of sync, but it also requires each class to follow a certain specification if it wants to be interoperable, and it doesn't use the machinery already present for dealing with these cases, outlined in the motivation section.

Recompilation of instantiations to go through GenericAlias.__call__

Whilst having

class Foo[T]: ...
x: Foo[int] = Foo()
x.T.__value__

being recompiled to

class Foo[T]: ...
x: Foo[int] = Foo[int]()
x.T.__value__

would be nice, it is entirely infeasible and would require typing to be used at compile time. Not doing this also keeps the PEP more opt-in: if better performance is desired and runtime access is not required, the subscription can simply be omitted.

A PyTypeParam struct for the C-API

Original drafts of this PEP used a new struct PyTypeParam and some high level methods to operate on them.

This was placed in the tp_type_param tp slot, but that was rejected as it increased memory usage for no gain on the Python side.

typedef struct {  // New public C-API struct
    enum Kind {
        TypeVar,
        TypeVarTuple,
        ParamSpec,
    } kind;
    const char *name;
    const char *bound;
    const char *default_;
    const char *evaluation_context;
    const char *constraints[];
} PyTypeParam;

static PyTypeParam custom_class_type_params[] = {
    {.name = "T"},
    {NULL},
};

static PyTypeObject CustomClass = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    .tp_name = "CustomClass",
    // Etc.
    .tp_type_params = custom_class_type_params,
};

is equivalent to

class CustomClass[T]:
    pass

and in the more complicated case

static PyTypeParam fancy_custom_class_type_params[] = {
    {.name = "T", .bound = "int"},
    {.name = "AnyStr", .constraints = {"str", "bytes", NULL}},
    {
        .name = "SeqT",
        .bound = "Sequence[bool]",
        .evaluation_context = "from collections.abc import Sequence",
    },
    {.name = "P", .kind = ParamSpec, .default_ = "[str]"},
    {NULL},
};

// snip

is roughly equivalent to

from collections.abc import Sequence

class FancyCustomClass[
    T: int,
    AnyStr: (str, bytes),
    SeqT: Sequence[bool],
    **P = [str],
]:
    pass

Evaluating bound and constraints with evaluation_context

This requires 3 new unstable C-API functions: PyTypeParam_EvalBound, PyTypeParam_EvalDefault and PyTypeParam_EvalConstraints. These functions should execute evaluation_context, and the locals created should be captured before evaluating the bound, default or constraints. An example implementation for PyTypeParam_EvalBound is provided:

def PyTypeParam_EvalBound(self: PyTypeParam) -> PyObject:
    locals_ = {}
    if self.evaluation_context is not None:
        evaluation_context_code = compile(self.evaluation_context, f"<evaluation-context for {self.name}>", "exec")
        exec(evaluation_context_code, locals=locals_)

    # now that locals_ may be populated, we can evaluate the bound
    compiled_bound = compile(self.bound, f"<bound for {self.name}>", "eval")
    return eval(compiled_bound, locals=locals_)


Deprecation warning for missing type parameters

Whilst this would be a nice feature for people who really care about enforcing all of this, it is not practical for a number of reasons. The change would be backwards incompatible as it raises, which would mean churn. It could have been put behind an optional flag like --strict, which would allow for better gradual typing, but that doesn't offer an easy way to configure which modules care about it. The feature might also give too much confidence that Python is performing runtime type checking, which isn't possible. The final nail in this idea's coffin is that type parameters aren't always required, due to bi-directional inference.

Open Issues

Should a decision be made to make TypeAliasType callable, it might be useful to have T(*args, **kwargs) forward to T.__value__(*args, **kwargs).
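
For example (make is illustrative, and the subscripted call relies on this proposal's generic-function support):

def make[T](*args, **kwargs) -> T:
    # Open question: would this forward to T.__value__(*args, **kwargs)?
    return T(*args, **kwargs)

make[list]("abc")  # list("abc") == ['a', 'b', 'c']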

Why no __parameters__? We could do typing.get_parameters(self) if we really care.

Copyright

TODO
