
@stroxler
Created March 16, 2022 02:29
TypedDict: how should we handle extra keys?

Resources

Draft PEP Thread on typing-sig

General thoughts

I think existence, but not use, of extra fields should be the default

By default I think we actually should allow arbitrary extra fields, but disallow any writes to them and only allow reads through the `get` method, which should return something opaque like object | None.
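Here is a rough sketch of what that default could look like in practice. The checker behavior described in the comments is the proposal above, not what any type checker does today, and `income_of` is a hypothetical helper made up for illustration.

```python
from typing import Optional, TypedDict


class Employee(TypedDict):
    identifier: int
    is_manager: bool


def income_of(e: Employee) -> Optional[float]:
    # Proposed: writes to undeclared keys (e["income"] = 1.0) stay illegal.
    # Proposed: reading an undeclared key through .get is allowed, but the
    # result is opaque (object | None), so it has to be narrowed before use.
    value = e.get("income")
    if isinstance(value, float):
        return value
    return None


# Proposed: accepted. Today's checkers reject the extra "income" key here.
with_extra: Employee = {"identifier": 59, "is_manager": True, "income": 100.0}
without_extra: Employee = {"identifier": 60, "is_manager": False}

print(income_of(with_extra))
print(income_of(without_extra))
```

The point is that the extra key can exist, but nothing can be assumed about it without an explicit runtime check.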

I have two arguments for why I like this:

Sane typing seems to suggest it ought to work

Today, it’s considered illegal to take a literal with extra keys and store it in a variable with a given TypedDict type, but it’s perfectly legal to upcast a TypedDict.

That’s pretty weird, because assigning a literal to a TypedDict variable is no more risky - maybe even a bit safer given the lack of dangling references - than assigning a value of a child TypedDict class.

Here’s what I mean:

from typing import TypedDict

class Employee(TypedDict):
    identifier: int
    is_manager: bool
    
class EmployeeWithIncome(Employee):
    income: float

d0: Employee = {
    "identifier": 59,
    "is_manager": True,
}

d1: EmployeeWithIncome = {
    "identifier": 59,
    "is_manager": True,
    "income": 100.0
}

# mypy, pyright, and pyre all accept this as okay
d2: Employee = d1

# mypy, pyright, and pyre all reject this, even though d2 is no safer
d3: Employee = {
    "identifier": 59,
    "is_manager": True,
    "income": 100.0
}

This behavior is explicitly required in PEP 589 (“Extra keys included in TypedDict object construction should also be caught”), but doesn’t seem coherent.

There are other less-relevant static typing arguments as well - for example, assigning a literal dict to a compatible TypedDict is analogous to assigning a class to a compatible Protocol, which is permitted (Scholbach mentions this in the “Duck Typing” section of the rationale in his first email).

If we start using TypedDict for **kwargs typing I think this may become more common, since I could imagine a lot of diamonds of TypedDicts coming into existence for kwarg-happy libraries like Airflow (Scholbach mentions this as well as part of the rationale).

It makes runtime libraries quite a bit more useful

Caveat: PEP 589 actually explicitly doesn’t define runtime behavior

To be clear, PEP 589 specifically throws up its hands and says that all runtime type use is out-of-scope. The language is a little vague: at one point it seems to ban the use of TypedDict at runtime, but then it says that “such functionality can be provided by a third party”, implicitly suggesting that the third-party behavior is arbitrary and not subject to discussion in a PEP.

As a result, a case could be made that runtime support does not require any PEP changes at all and is totally up to the ecosystem (although it obviously may still be in scope for typing-sig discussions)!

I still need to read PEP 655, which does specify a bit more runtime behavior.

Why extra fields by default makes sense for runtime checkers

First off, a runtime type checker doesn’t know anything about static types, so in order to support structural subtyping it has to allow extra fields - there’s no way to distinguish between assigning across consistent types vs assigning literals.

Secondly, I think we would want it to allow extra fields anyway: it is extremely common that an API tries to guarantee stability of the schema for existing keys while reserving the right to add arbitrary new keys. Using a runtime validator for these cases is going to work much better if the validator ignores unknown keys. (Scholbach mentions this use case in his first email to typing-sig describing the rationale.)
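To make the "ignore unknown keys" idea concrete, here is a minimal sketch of such a validator. `conforms` is a hypothetical helper, not part of any library, and it assumes every field annotation is a plain class that `isinstance` can check (no generics, unions, or nested TypedDicts).

```python
from typing import Any, Mapping, TypedDict, get_type_hints


class Employee(TypedDict):
    identifier: int
    is_manager: bool


def conforms(value: Mapping[str, Any], schema: type) -> bool:
    """Check declared keys only; keys the schema does not mention are ignored."""
    for key, expected in get_type_hints(schema).items():
        if key not in value:
            # A missing key is only a problem if the schema requires it.
            if key in schema.__required_keys__:
                return False
            continue
        if not isinstance(value[key], expected):
            return False
    return True


# An API response that has grown a new "income" key still validates.
response = {"identifier": 59, "is_manager": True, "income": 100.0}
print(conforms(response, Employee))
```

A validator that rejected `response` here would break every time the API added a field, which is exactly the situation the stability guarantee is meant to allow.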

What about the suggestion to explicitly support extra keys, with a type?

Above, I have what I consider a solid argument that it’s very problematic for TypedDict to ban extra fields at all - it’s not coherent and limits use cases. But I also argue that we should continue to ban all writes and treat all get reads as opaque.

In the mailing list, Eric asks why we couldn’t specify a type for unknown arguments, and possibly even make it generic. I think this is irrelevant to the question of whether the PEP 589 behavior needs tweaking, but it might be a good thing to consider.

If we do want to do this, how would it work and what edge cases occur?

It seems pretty clear that if we had extra keys of some type T (ignoring for now whether T is generic), then:

  • we’d need to allow reads with unknown literals that resolve to T; failure to allow this would call into question why we’re even bothering.
  • we’d likely want to allow writes as well, and it makes sense that we might want deletes.

Even with just this minimal behavior on literal unknown keys, we already must ban structural subtypes. Otherwise, we can easily introduce widespread unsoundness on both read and write operations going both up and down the type hierarchy.
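A small sketch of the read-side problem, under a stated assumption: pretend `Employee` could declare "all extra keys have type str". No such syntax is used here, so the hypothetical rule only appears in comments, and `read_extra` is a made-up helper.

```python
from typing import TypedDict


class Employee(TypedDict):
    identifier: int
    # Hypothetical: imagine Employee also declared "extra keys have type str".


class EmployeeWithIncome(Employee):
    income: float


def read_extra(e: Employee) -> object:
    # Under the hypothetical rule, a checker would infer `str` for this read.
    return e.get("income")


child: EmployeeWithIncome = {"identifier": 59, "income": 100.0}

# Passing a child where an Employee is expected is legal today.
value = read_extra(child)
print(type(value).__name__)  # the runtime value is a float, not a str
```

The write direction fails the same way: storing a str under "income" through the `Employee` view would corrupt the `EmployeeWithIncome` value that other code still holds.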

Presumably one reason folks might want this is to allow not just unknown literal keys, but also arbitrary string keys. But unfortunately this doesn’t work well at all: we can’t guarantee that the key doesn’t collide with a field name, and as a result assuming either reads or writes work with type T will in general be unsound.
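The collision is easy to reproduce at runtime. In this sketch `tag` is a hypothetical function standing in for "a write of type T through an arbitrary str key", with T assumed to be str; today's checkers reject the write because the key is not a literal.

```python
from typing import TypedDict


class Employee(TypedDict):
    identifier: int


def tag(e: Employee, key: str, value: str) -> None:
    # Hypothetical rule: extra keys have type str, so a str write through an
    # arbitrary str key would have to be accepted. The checker cannot know
    # whether `key` happens to name a declared field.
    e[key] = value  # rejected today because `key` is not a literal


e: Employee = {"identifier": 59}
user_supplied_key = "identifier"  # e.g. read from a config file or request
tag(e, user_supplied_key, "oops")

print(type(e["identifier"]).__name__)  # declared as int, now holds a str
```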

My view on this

In my opinion, these limitations suggest that we shouldn’t extend TypedDict to support extra keys, beyond the suggestion above. If anything, I’d argue that it would make more sense to introduce a proposal doing the opposite: banning extra keys explicitly, which would also have to make structural subtypes illegal.

Miscellaneous notes

Defining some vocab

In the body of PEP 589, “is consistent with” is a directional, asymmetric relationship and basically means “is castable to”, i.e. “can be treated as lower in the type hierarchy”.

In both the PEPs and typing-sig, “structural subtyping” is used to refer to inheritance for typed dicts. It more generally refers to any typing that isn’t “nominal” (class-based), for example Protocols. When people talk about extra fields being possible due to this, they mean that a typed dict could be an “instance” of a “child” which means it could have extra keys.

More PEP ideas

Some of the behaviors of TypedDict with respect to required-ness of keys are pretty awkward for reasons that have to do with mutation.

I think it could be beneficial to propose a TypedMapping that doesn’t allow mutations; this would behave better in some cases - in particular, a subtype that makes a NotRequired field Required would become legal, which feels more intuitive (the only reason it isn’t legal today is deletions).
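Here is a sketch of the deletion problem. The class and function names are made up for illustration, and `total=False` is used as a stand-in for a NotRequired field so the example runs on older Python versions.

```python
from typing import TypedDict


class Profile(TypedDict, total=False):
    nickname: str  # not required


class StrictProfile(TypedDict):
    nickname: str  # required


def forget_nickname(p: Profile) -> None:
    # Legal on Profile: a non-required key may be deleted.
    if "nickname" in p:
        del p["nickname"]


strict: StrictProfile = {"nickname": "Sam"}

# Checkers reject this call today. If they accepted it, `strict` would be
# left without a key that its own type says is always present.
forget_nickname(strict)
print("nickname" in strict)
```

With an immutable TypedMapping there would be no `del`, so treating `StrictProfile` as a `Profile` could not invalidate it.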

Detailed notes about how existing type checkers handle some edge cases

Related question: how do typed dicts today handle extra keys in getting / setting?

mypy and pyright

Both mypy and pyright:

  • treat setting a field not defined in the TypedDict interface as an error.
  • allow calls to `.get`, but assume an opaque type
    • mypy assumes builtins.object*
    • pyright assumes Any | None

This seems reasonable: you can call the get method, but you cannot make any assumptions whatsoever about the returned type.

I think the mypy approach is better: using `object` forces a cast in order to create unsoundness on child TypedDict classes, but it’s a bit of an edge case and probably not that important.

Upon a closer read, pyright is technically in violation of PEP 589, which specifically says that `get` should return object.
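To illustrate why the choice of opaque type matters, here is a sketch that models the two return types as parameter annotations. The function names are hypothetical, and nothing here calls into mypy or pyright; it only shows what each annotation lets through.

```python
from typing import Any, Optional


def bonus_from_object(value: object) -> float:
    # With `object`, arithmetic on `value` is a type error until it is
    # narrowed (or explicitly cast), so the check is forced on the caller.
    if isinstance(value, float):
        return value * 0.1
    return 0.0


def bonus_from_any(value: Optional[Any]) -> float:
    if value is None:
        return 0.0
    # With `Any`, this type checks with no narrowing at all.
    return value * 0.1


raw = {"identifier": 59, "income": "100"}  # income arrived as a string

print(bonus_from_object(raw.get("income")))
try:
    bonus_from_any(raw.get("income"))
except TypeError as exc:
    print("runtime failure:", exc)
```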

pyre

The pyre behavior is almost certainly a bug rather than a design choice, so we should ignore it.

For those who are curious:

  • get returns an optional of what appears to be the type of the first declared field, which is very weird
  • pyre will allow arbitrary writes of that same type to any key, which is not only weird but flagrantly unsound

Can you use a protocol with TypedDict to get this?

No, as far as I can tell, and this seems correct because PEP 589 specifically says “A TypedDict cannot inherit from both a TypedDict type and a non-TypedDict base class”.

from typing import TypedDict, Protocol, Generic, TypeVar

T = TypeVar("T")

class StringDict(Protocol, Generic[T]):

    def __getitem__(self, key: str) -> T: ...
    
    def __setitem__(self, key: str, value: T) -> None: ...
    

    
# Most typecheckers reject this:
# - pyright and mypy both complain about this violation of PEP 589
# - pyre doesn't complain but also completely ignores the protocol
#   and uses the incorrect first-field-type logic described above
class DictWithExtraKeys(TypedDict, StringDict[object]):
    key: int

Is mutation a source of additional problems?

It depends. If we only allow TypedDicts with no children to specify extra fields, then as far as I can think of there’s no unique mutation problem.

But if we allow child TypedDicts, then without mutation

  • Arbitrary children are compatible with an extras

When someone mentioned that structural subtyping
