Draft PEP Thread on typing-sig
By default I think we actually should allow arbitrary extra fields, but disallow any writes and only allow `get` method reads, which should return something opaque like `object | None`.
I have two arguments for why I like this:
Today, it’s considered illegal to take a literal with extra keys and store it in a variable with a given TypedDict type, but it’s perfectly legal to upcast a TypedDict.
That’s pretty weird, because assigning a literal to a TypedDict variable is no more risky - maybe even a bit safer given the lack of dangling references - than assigning a value of a child TypedDict class.
Here’s what I mean:
from typing import TypedDict

class Employee(TypedDict):
    identifier: int
    is_manager: bool

class EmployeeWithIncome(Employee):
    income: float

d0: Employee = {
    "identifier": 59,
    "is_manager": True,
}

d1: EmployeeWithIncome = {
    "identifier": 59,
    "is_manager": True,
    "income": 100.0
}

# mypy, pyright, and pyre all accept this as okay
d2: Employee = d1

# mypy, pyright, and pyre all reject this, even though it is no riskier than d2
d3: Employee = {
    "identifier": 59,
    "is_manager": True,
    "income": 100.0
}
This behavior is explicitly required in PEP 589 (“Extra keys included in TypedDict object construction should also be caught”), but doesn’t seem coherent.
There are other less-relevant static typing arguments as well - for example, assigning a literal dict to a compatible TypedDict is analogous to assigning a class to a compatible Protocol, which is permitted (Scholbach mentions this in the “Duck Typing” section of the rationale in his first email).
If we start using TypedDict for **kwargs typing, I think this may become more common, since I could imagine a lot of diamonds of TypedDicts coming into existence for kwarg-happy libraries like Airflow (Scholbach mentions this as well as part of the rationale).
To be clear, PEP 589 specifically throws up its hands and says that all runtime type use is out-of-scope. The language is a little vague: at one point it seems to ban the use of TypedDict at runtime, but then it says that “such functionality can be provided by a third party”, implicitly suggesting that the third-party behavior is arbitrary and not subject to discussion in a PEP.
As a result, a case could be made that runtime support does not require any PEP changes at all; it’s totally up to the ecosystem (although it obviously may still be in scope for typing-sig discussions)!
I still need to read PEP 655, which does specify a bit more runtime behavior.
First off, a runtime type checker doesn’t know anything about static types, so in order to support structural subtyping it has to allow extra fields - there’s no way to distinguish between assigning across consistent types vs assigning literals.
Secondly, I think we would want it to allow extra fields anyway: it is extremely common that an API tries to guarantee stability of the schema for existing keys while reserving the right to add arbitrary new keys. Using a runtime validator for these cases is going to work much better if the validator ignores unknown keys. (Scholbach mentions this use case in his first email to typing-sig describing the rationale.)
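Here’s a rough sketch of the kind of runtime validator I have in mind. The `validate` helper is hypothetical (it’s not from any existing library), it only checks plain classes like `int` and `bool`, and it assumes Python 3.9+ for `__required_keys__`. The point is just that unknown keys are never looked at.

```python
from typing import Any, List, Mapping, TypedDict, get_type_hints


class Employee(TypedDict):
    identifier: int
    is_manager: bool


def validate(data: Mapping[str, Any], schema: Any) -> List[str]:
    """Return a list of problems; an empty list means `data` conforms."""
    problems = []
    # Every required key of the schema must be present.
    for key in sorted(schema.__required_keys__):
        if key not in data:
            problems.append(f"missing required key {key!r}")
    # Declared keys that are present must have the declared type.
    # Only plain classes are checked in this sketch.
    for key, expected in get_type_hints(schema).items():
        if key in data and isinstance(expected, type):
            if not isinstance(data[key], expected):
                problems.append(f"key {key!r} is not {expected.__name__}")
    # Keys in `data` that the schema does not mention are deliberately ignored.
    return problems


payload = {"identifier": 59, "is_manager": True, "income": 100.0}
problems = validate(payload, Employee)  # the extra "income" key is not a problem
```

An API that later adds new keys to its responses keeps passing this check, which is exactly the stability guarantee described above.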
Above, I have what I consider a solid argument that it’s very problematic for TypedDict to ban extra fields at all - it’s not coherent and limits use cases. But I also argue that we should continue to ban all writes and treat all `get` reads as opaque.
In the mailing list, Eric asks why we couldn’t specify a type for unknown arguments, and possibly even make it generic. I think this is irrelevant to the question of whether the PEP 589 behavior needs tweaking, but it might be a good thing to consider.
It seems pretty clear that if we had extra keys of some type `T` (ignoring for now whether `T` is generic), then:
- we’d need to allow reads with unknown literals that resolve to `T`; failure to allow this would call into question why we’re even bothering.
- we’d likely want to allow writes as well, and it makes sense that we might want deletes.
Even with just this minimal behavior on literal unknown keys, we already must ban structural subtypes. Otherwise, we can easily introduce widespread unsoundness on both read and write operations going both up and down the type hierarchy.
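A small sketch of the problem, assuming a hypothetical rule that unknown keys of `Employee` have type `str`. Python can’t express that rule today, so the two helpers below (my own names) use `cast` to model what a checker would assume under it.

```python
from typing import TypedDict, cast


class Employee(TypedDict):
    identifier: int
    is_manager: bool


class EmployeeWithIncome(Employee):
    income: float


def read_extra(d: Employee, key: str) -> str:
    # Hypothetical rule: any key not declared on Employee holds a str.
    return cast(str, d.get(key))


def write_extra(d: Employee, key: str, value: str) -> None:
    # Hypothetical rule: any key not declared on Employee accepts a str.
    cast(dict, d)[key] = value


rich: EmployeeWithIncome = {"identifier": 59, "is_manager": True, "income": 100.0}
as_employee: Employee = rich  # accepted today, since the child is a structural subtype

# Read going up the hierarchy: annotated as str, actually the child's float.
value = read_extra(as_employee, "income")

# Write going up the hierarchy: the child's float field now holds a str.
write_extra(as_employee, "income", "a lot")
```

Both operations look fine from the point of view of `Employee`, and both break what `EmployeeWithIncome` promises, which is why typed extras and structural subtypes can’t coexist.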
Presumably one reason folks might want this is to allow not just unknown literal keys, but also arbitrary string keys. But unfortunately this doesn’t work well at all: we can’t guarantee that the key doesn’t collide with a field name, and as a result assuming either reads or writes work with type `T` will in general be unsound.
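The collision is easy to reproduce even without any subtypes. This sketch assumes the same hypothetical rule (non-literal string keys are treated as extras of type `str`); `set_extra` is a made-up helper that models it.

```python
from typing import TypedDict, cast


class Employee(TypedDict):
    identifier: int
    is_manager: bool


def set_extra(d: Employee, key: str, value: str) -> None:
    # Hypothetical rule: a non-literal str key is assumed to be an extra key of type str.
    cast(dict, d)[key] = value


employee: Employee = {"identifier": 59, "is_manager": True}

# Nothing stops a string computed at runtime from matching a declared field name.
user_supplied_key = "identifier"
set_extra(employee, user_supplied_key, "fifty-nine")

# employee["identifier"] is declared as int but now holds a str.
```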
In my opinion, these limitations suggest that we shouldn’t extend TypedDict to support extra keys, beyond the suggestion above. If anything, I’d argue that it would make more sense to introduce a proposal doing the opposite: banning extra keys explicitly, which would also have to make structural subtypes illegal.
In the body of PEP 589, “is consistent with” is a directional, asymmetric relationship and basically means “is castable to”, i.e. “can be treated as lower in the type hierarchy”.
In both the PEPs and typing-sig, “structural subtyping” is used to refer to inheritance for typed dicts. It more generally refers to any typing that isn’t “nominal” (class-based), for example Protocols. When people talk about extra fields being possible due to this, they mean that a typed dict could be an “instance” of a “child” which means it could have extra keys.
Some of the behaviors of TypedDict with respect to required-ness of keys are pretty awkward for reasons that have to do with mutation.
I think it could be beneficial to propose a TypedMapping that doesn’t allow mutations; this would behave better in some cases - in particular, a subtype that makes a NotRequired field Required would become legal, which feels more intuitive (the only reason it isn’t legal today is because of deletions).
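To spell out the deletion problem, here’s a minimal sketch. The two TypedDicts are my own illustration and use `total=False` in place of `NotRequired` so it runs on older Pythons. Checkers reject the call at the bottom today; the `type: ignore` is there so the example shows what would go wrong if they didn’t.

```python
from typing import TypedDict


class MaybeNamed(TypedDict, total=False):
    name: str  # not required


class Named(TypedDict):
    name: str  # required


def forget_name(d: MaybeNamed) -> None:
    # Deleting a non-required key is a legitimate operation on a mutable TypedDict.
    if "name" in d:
        del d["name"]


named: Named = {"name": "Ada"}

# If Named were accepted wherever MaybeNamed is expected, this call would
# remove a key that Named guarantees is always present.
forget_name(named)  # type: ignore[arg-type]
```

With a read-only mapping type there is no `del`, so treating `Named` as a `MaybeNamed` would be harmless.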
Both mypy and pyright:
- treat setting a field not defined in the TypedDict interface as an error.
- allow calls to `.get`, but assume an opaque type
  - mypy assumes `builtins.object*`
  - pyright assumes `Any | None`
This seems reasonable: you can call the get method, but you cannot make any assumptions whatsoever about the returned type.
I think the mypy approach is better: using `object` forces a cast in order to create unsoundness on child TypedDict classes. But it’s a bit of an edge case and probably not that important.
Upon a closer read, technically pyright is in violation of PEP 589, which specifically says `get` should use `object`.
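To make the `object` behavior concrete, here’s a small sketch (the function names are mine). With `object` as the result type, the safe path is to narrow with `isinstance`, and the unsafe path needs an explicit `cast`.

```python
from typing import TypedDict, cast


class Employee(TypedDict):
    identifier: int
    is_manager: bool


def income_or_zero(e: Employee) -> float:
    raw = e.get("income")  # undeclared key: the result is treated as `object`
    if isinstance(raw, float):
        return raw  # narrowed, so this is sound
    return 0.0


def income_unchecked(e: Employee) -> float:
    # Compiles only because of the cast; nothing guarantees a float here.
    return cast(float, e.get("income"))


# Build values with an extra key via cast, since literals with extras are rejected.
good = cast(Employee, {"identifier": 59, "is_manager": True, "income": 100.0})
odd = cast(Employee, {"identifier": 60, "is_manager": False, "income": "n/a"})
```

`income_or_zero` gives a float for both values, while `income_unchecked(odd)` hands back a string under a `float` annotation. With `Any | None` the second function would need no cast at all, which is why I lean towards `object`.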
The pyre behavior is almost certainly a bug rather than a design choice, so we should ignore it.
For those who are curious:
- `get` returns an optional of that type (the type of the first field), which is very weird
- pyre will allow arbitrary writes of that type to any key, which is not only weird but flagrantly unsound
No, as far as I can tell, and this seems correct because PEP 589 specifically says “A TypedDict cannot inherit from both a TypedDict type and a non-TypedDict base class”.
from typing import TypedDict, Protocol, Generic, TypeVar

T = TypeVar("T")

class StringDict(Protocol, Generic[T]):
    def __getitem__(self, key: str) -> T: ...
    def __setitem__(self, key: str, value: T) -> None: ...

# Most typecheckers reject this:
# - pyright and mypy both complain about this violation of PEP 589
# - pyre doesn't complain but also completely ignores the protocol
#   and uses the incorrect first-field-type logic described above
class DictWithExtraKeys(TypedDict, StringDict[object]):
    key: int
It depends. If we only allow TypedDicts with no children to specify extra fields, then as far as I can think of there’s no unique mutation problem.
But if we allow child TypedDicts, then without mutation
- Arbitrary children are compatible with an extras
When someone mentioned that structural subtyping