stroxler/TypedDict_extra_keys.org

## TypedDict_extra_keys.org

      
    Resources

Draft PEP
  Thread on typing-sig
General thoughts

I think existence, but not use, of extra fields should be the default

By default, I think we should allow arbitrary extra fields, but
  disallow any writes to them and allow reads only through the `get` method,
  which should return something opaque like `object | None`.
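A rough sketch of how that default might feel to a caller. The helper `read_extra` is hypothetical and only stands in for the proposed read-only `get` on undeclared keys; it is not an existing TypedDict API.

```python
from typing import Optional, TypedDict, cast


class Employee(TypedDict):
    identifier: int
    is_manager: bool


def read_extra(d: Employee, key: str) -> Optional[object]:
    """Hypothetical stand-in for the proposed opaque read of an extra key."""
    # Copy into a plain dict so an arbitrary string key is fine to look up.
    return dict(d).get(key)


# At runtime a TypedDict value is a plain dict, so extra keys can already exist.
raw = {"identifier": 59, "is_manager": True, "income": 100.0}
employee = cast(Employee, raw)

# The result is opaque, so the caller has to narrow it before using it.
income = read_extra(employee, "income")
doubled = income * 2 if isinstance(income, float) else None

# A key that is absent simply reads as None.
missing = read_extra(employee, "bonus")
```

The point of the opaque return type is that nothing can be assumed about an extra key without an explicit runtime check such as `isinstance`.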
I have two arguments for why I like this:
Sane typing seems to suggest it ought to work

Today, it’s considered illegal to
  take a literal with extra keys and store it in a variable of a given TypedDict type,
  but it’s perfectly legal to upcast a TypedDict.
That’s pretty weird, because assigning a literal to a TypedDict variable is no
  more risky - maybe even a bit safer given the lack of dangling references - than
  assigning a value of a child TypedDict class.
Here’s what I mean:
from typing import TypedDict

class Employee(TypedDict):
    identifier: int
    is_manager: bool
    
class EmployeeWithIncome(Employee):
    income: float

d0: Employee = {
    "identifier": 59,
    "is_manager": True,
}

d1: EmployeeWithIncome = {
    "identifier": 59,
    "is_manager": True,
    "income": 100.0
}

# mypy, pyright, and pyre all accept this as okay
d2: Employee = d1

# mypy, pyright, and pyre all reject this, which is no safer than d2
d3: Employee = {
    "identifier": 59,
    "is_manager": True,
    "income": 100.0
}
This behavior is explicitly required in PEP 589 (“Extra keys included in
  TypedDict object construction should also be caught”), but doesn’t
  seem coherent.
There are other less-relevant static typing arguments as well - for example,
  assigning a literal dict to a compatible TypedDict is analogous to
  assigning a class to a compatible Protocol, which is permitted (Scholbach mentions
  this in the “Duck Typing” section of the rationale in his first email).
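To make the Protocol analogy concrete, here is a small sketch. The names `HasIdentifier`, `EmployeeRecord`, and `describe` are made up for illustration.

```python
from typing import Protocol


class HasIdentifier(Protocol):
    identifier: int


class EmployeeRecord:
    identifier: int
    income: float  # extra attribute that the protocol knows nothing about

    def __init__(self, identifier: int, income: float) -> None:
        self.identifier = identifier
        self.income = income


def describe(x: HasIdentifier) -> str:
    # Only the protocol's members are visible here.
    return f"id={x.identifier}"


# The extra `income` attribute does not stop EmployeeRecord from
# satisfying the protocol structurally.
result = describe(EmployeeRecord(59, 100.0))
```

The extra attribute is tolerated because only the members named by the protocol matter, which is the same treatment being argued for with literal dicts that carry extra keys.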
If we start using TypedDict for **kwargs typing I think this may become
  more common, since I could imagine a lot of diamonds of TypedDicts coming
  into existence for kwarg-happy libraries like Airflow (Scholbach mentions
  this as well as part of the rationale).
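A sketch of the kind of diamond I have in mind. All class and field names here are invented for illustration and are not Airflow's real API.

```python
from typing import TypedDict


class BaseArgs(TypedDict):
    task_id: str


class RetryArgs(BaseArgs):
    retries: int


class TimeoutArgs(BaseArgs):
    timeout: float


# The bottom of the diamond: it inherits the keys of both branches.
class OperatorArgs(RetryArgs, TimeoutArgs):
    pass


def retries_of(args: RetryArgs) -> int:
    # This function only knows about the RetryArgs view of the dict.
    return args["retries"]


full: OperatorArgs = {"task_id": "t1", "retries": 3, "timeout": 30.0}

# Passing the wider dict is an ordinary upcast, so `timeout` rides along
# as an "extra" key from the point of view of RetryArgs.
retries = retries_of(full)
all_keys = OperatorArgs.__required_keys__
```

In a hierarchy like this, values with keys the receiving type does not declare are the normal case rather than the exception.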
It makes runtime libraries quite a bit more useful

Caveat: PEP 589 actually explicitly doesn’t define runtime behavior

To be clear, PEP 589 specifically throws up its hands and says that all runtime
  type use is out-of-scope. The language is a little vague: at one point it seems
  to ban the use of TypedDict at runtime, but then it says that “such
  functionality can be provided by a third party”, implicitly suggesting that
  the third-party behavior is arbitrary and not subject to discussion in a PEP.
As a result, a case could be made that runtime support does not require any
  PEP changes at all and is totally up to the ecosystem (although it obviously
  may still be in scope for typing-sig discussions!).
I still need to read PEP 655, which does specify a bit more runtime
  behavior.
Why extra fields by default makes sense for runtime checkers

First off, a runtime type checker doesn’t know anything about static
  types, so in order to support structural subtyping it has to allow
  extra fields - there’s no way to distinguish between assigning across
  consistent types vs assigning literals.
Secondly, I think we would want it to allow extra fields anyway: it is
  extremely common that an API tries to guarantee stability of the schema for
  existing keys while reserving the right to add arbitrary new keys. Using a
  runtime validator for these cases is going to work much better if the validator
  ignores unknown keys. (Scholbach mentions this use case in his first email
  to typing-sig describing the rationale.)
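A minimal sketch of such a validator, assuming field types are plain classes like `int` or `bool`. The `validate` function is hypothetical and much simpler than any real runtime-checking library.

```python
from typing import Any, Mapping, TypedDict, get_type_hints


class Employee(TypedDict):
    identifier: int
    is_manager: bool


def validate(schema: type, data: Mapping[str, Any]) -> bool:
    """Check every declared key; silently ignore keys the schema lacks.

    Simplifications: every declared key is treated as required, and only
    plain classes are supported as field types.
    """
    for key, expected in get_type_hints(schema).items():
        if key not in data or not isinstance(data[key], expected):
            return False
    return True


# An API response that has grown a new key still validates.
ok_with_extra = validate(
    Employee, {"identifier": 59, "is_manager": True, "income": 100.0}
)

# Declared keys are still checked for presence and type.
wrong_type = validate(Employee, {"identifier": "59", "is_manager": True})
missing_key = validate(Employee, {"identifier": 59})
```

Because the loop walks the schema rather than the data, unknown keys never come into play, which is what lets the API add keys without breaking consumers.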
What about the suggestion to explicitly support extra keys, with a type?

Above, I have what I consider a solid argument that it’s very problematic for
  TypedDict to ban extra fields at all - it’s not coherent and limits use cases.
  But I also argue that we should continue to ban all writes and treat all get
  reads as opaque.
In the mailing list, Eric asks why we couldn’t specify a type for unknown
  arguments, and possibly even make it generic. I think this is irrelevant to the
  question of whether the PEP 589 behavior needs tweaking, but it might be a good
  thing to consider.
If we do want to do this, how would it work and what edge cases occur?

It seems pretty clear that if we had extra keys of some type T
  (ignoring for now whether T is generic), then:

  we’d need to allow reads with unknown literals that resolve to T;
    failure to allow this would call into question why we’re even
    bothering.
  we’d likely want to allow writes as well, and it makes sense that
    we might want deletes.

Even with just this minimal behavior on literal unknown keys, we already
  must ban structural subtypes. Otherwise, we can easily introduce
  widespread unsoundness on both read and write operations going both
  up and down the type hierarchy.
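Here is a runtime sketch of that unsoundness, pretending that extra keys on `Employee` were typed as `str`. The helpers `read_extra_as_str` and `write_extra_str` are hypothetical stand-ins for the typed read and write; no such TypedDict API exists.

```python
from typing import TypedDict, cast


class Employee(TypedDict):
    identifier: int
    is_manager: bool
    # Assumption for this sketch: undeclared keys would be typed as `str`.


class EmployeeWithIncome(Employee):
    income: float


def read_extra_as_str(d: Employee, key: str) -> str:
    """Hypothetical typed read of an extra key."""
    return cast(str, dict(d)[key])


def write_extra_str(d: Employee, key: str, value: str) -> None:
    """Hypothetical typed write of an extra key."""
    cast(dict, d)[key] = value  # cast returns the same object, so this mutates d


child: EmployeeWithIncome = {"identifier": 59, "is_manager": True, "income": 100.0}
parent_view: Employee = child  # an ordinary upcast, legal today

# Read: statically a str, but the child stored a float under this key.
leaked_type = type(read_extra_as_str(parent_view, "income")).__name__

# Write: fine through the parent view, but it corrupts the child's field.
write_extra_str(parent_view, "income", "a lot")
corrupted_type = type(child["income"]).__name__
```

Both failures go away only if `EmployeeWithIncome` is not allowed to be treated as an `Employee`, which is why typed extras and structural subtypes cannot coexist.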
Presumably one reason folks might want this is to allow not just
  unknown literal keys, but also arbitrary string keys. But unfortunately
  this doesn’t work well at all: we can’t guarantee that the key
  doesn’t collide with a field name, and as a result assuming either
  reads or writes work with type T will in general be unsound.
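The collision needs no subtyping at all. In this sketch, `set_extra` is a hypothetical typed write for arbitrary string keys, again assuming extras were typed as `str`.

```python
from typing import TypedDict, cast


class Employee(TypedDict):
    identifier: int
    is_manager: bool


def set_extra(d: Employee, key: str, value: str) -> None:
    """Hypothetical write of an extra key whose name is only known at runtime."""
    cast(dict, d)[key] = value


employee: Employee = {"identifier": 59, "is_manager": True}

# A checker cannot see that this runtime string names a declared field.
key_from_user_input = "identifier"
set_extra(employee, key_from_user_input, "fifty-nine")

# The declared int field now holds a str.
identifier_type = type(employee["identifier"]).__name__
```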
My view on this

In my opinion, these limitations suggest that we shouldn’t extend
  TypedDict to support extra keys, beyond the suggestion above. If
  anything, I’d argue that it would make more sense to introduce
  a proposal doing the opposite: banning extra keys explicitly, which
  would also have to make structural subtypes illegal.
Miscellaneous notes

Defining some vocab

In the body of PEP 589, “is consistent with” is a directional, asymmetric
  relationship and basically means “is castable to”, i.e. “can be treated
  as lower in the type hierarchy”.
In both the PEPs and typing-sig, “structural subtyping” is used to
  refer to inheritance for typed dicts. It more generally refers to any
  typing that isn’t “nominal” (class-based), for example Protocols.
  When people talk about extra fields being possible due to this, they
  mean that a typed dict could be an “instance” of a “child” which means
  it could have extra keys.
More PEP ideas

Some of the behaviors of TypedDict with respect to required-ness of keys are pretty awkward for reasons that have to do with mutation.
I think it could be beneficial to propose a TypedMapping that doesn’t allow mutations; this would behave better in some cases - in particular,
  a subtype that makes a NotRequired field be Required would become
  legal, which feels more intuitive (the only reason it isn’t legal
  today is deletions).
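A sketch of the deletion problem. It uses two separately declared classes with `total=False` instead of `NotRequired`, purely so that it runs on older Python versions; the class names are illustrative.

```python
from typing import TypedDict


class Movie(TypedDict, total=False):
    year: int  # not required


class MovieWithYear(TypedDict):
    year: int  # required


def forget_year(m: Movie) -> None:
    # Legal for Movie, because "year" is allowed to be absent.
    if "year" in m:
        del m["year"]


strict: MovieWithYear = {"year": 1999}

# Type checkers reject this call; at runtime nothing stops it.
forget_year(strict)  # type: ignore

# The "required" key is gone, so strict["year"] would now raise KeyError.
has_year = "year" in strict
```

With a read-only mapping type there would be no `del`, so treating the required version as a subtype of the non-required one would be safe.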
Detailed notes about how existing type checkers handle some edge cases

Related question: how do typed dicts today handle extra keys in getting / setting?

mypy and pyright

Both mypy and pyright:

  treat setting a field not defined in the TypedDict interface as an error.
  allow calls to `.get`, but assume an opaque type
    
      mypy assumes builtins.object*
      pyright assumes Any | None
    
  
This seems reasonable: you can call the get method, but you cannot make
  any assumptions whatsoever about the returned type.
I think the mypy approach is better: using `object` forces a cast in
  order to create unsoundness on child TypedDict classes, but it’s a bit
  of an edge case and probably not that important.
Upon a closer read, pyright is technically in violation of PEP 589, which
  specifically says `get` should use `object`.
pyre

The pyre behavior is almost certainly a bug rather than a design choice, so we
  should ignore it.
For those who are curious:

  `get` returns an optional of the first field’s type, which is very weird
  pyre will allow arbitrary writes of that type to any key, which is
    not only weird but flagrantly unsound

Can you use a protocol with TypedDict to get this?

No, as far as I can tell, and this seems correct because PEP 589
  specifically says “A TypedDict cannot inherit from both a TypedDict
  type and a non-TypedDict base class”.
from typing import TypedDict, Protocol, Generic, TypeVar

T = TypeVar("T")

class StringDict(Protocol, Generic[T]):

    def __getitem__(self, key: str) -> T: ...
    
    def __setitem__(self, key: str, value: T) -> None: ...
    

# Most typecheckers reject this:
# - pyright and mypy both complain about this violation of PEP 589
# - pyre doesn't complain but also completely ignores the protocol
#   and uses the incorrect first-field-type logic described above
class DictWithExtraKeys(TypedDict, StringDict[object]):
    key: int
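As a side note, the same class definition also fails when actually executed. The sketch below repeats the definitions above and only records that CPython raises a `TypeError` when the class statement runs; the exact message may differ between Python versions.

```python
from typing import Generic, Protocol, TypedDict, TypeVar

T = TypeVar("T")


class StringDict(Protocol, Generic[T]):
    def __getitem__(self, key: str) -> T: ...

    def __setitem__(self, key: str, value: T) -> None: ...


try:
    # Mixing a TypedDict base with a non-TypedDict base is refused at
    # class-creation time, not only by static checkers.
    class DictWithExtraKeys(TypedDict, StringDict[object]):  # type: ignore
        key: int
except TypeError as exc:
    error = exc
else:
    error = None
```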
Is mutation a source of additional problems?

It depends. If we only allow TypedDicts with no children to specify
  extra fields, then as far as I can think of there’s no unique
  mutation problem.
But if we allow child TypedDicts, then without mutation

  Arbitrary children are compatible with an extras

When someone mentioned that structural subtyping