@vlasovskikh
Last active June 1, 2016 20:54

Proposal #2

See also Proposal #1

TL;DR

  • Introduce typing.Text for text data in Python 2+3
  • bytes, str, unicode, typing.Text in type hints mean whatever they mean at runtime for Python 2 or 3
  • Allow str -> unicode and unicode -> str promotions for Python 2
  • Type checking for Python 2 and Python 3 actually finds most text/binary errors
  • A few false negatives for Python 2 are not worth special handling, apart from possible ad-hoc handling of conversions of non-ASCII literals

Summary for Python users

If you want your code to be Python 2+3 compatible:

  • Write text/binary type hints in 2+3 compatible comments
    • Use typing.Text for text data, bytes for binary data
    • Use str only for rare cases of "native strings"
    • Don't use unicode since it's absent in Python 3
  • Run a type checker for both Python 2 and Python 3

Summary for authors of type checkers

The semantics of the types bytes, str, unicode, and typing.Text, and the type checking rules for them, should match the runtime behavior of these types in Python 2 or Python 3, depending on the mode the type checker runs in. Runtime semantics are easy to understand and still allow catching most errors. The Python 2+3 compatibility mode is just the sum of Python 2 and Python 3 warnings.

Type checkers should promote str/bytes to unicode/Text and unicode/Text to str/bytes for Python 2. Most text/binary conversion errors can be found by running a type checker for Python 2 and for Python 3.

typing.Text: Python 2+3 compatible type for text data

The typing.Text type is a Python 2+3 compatible type for text data. It's defined as follows:

import sys

if sys.version_info < (3,):
    Text = unicode
else:
    Text = str

For a Python 2+3 compatible type for binary data, use bytes, which is available in both Python 2 and Python 3.
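
As a minimal sketch of how these two types might appear in 2+3 compatible type comments, consider the following. The function names greet and encode_greeting are hypothetical and exist only for illustration.

```python
from typing import Text  # used by the type comments below


def greet(name):
    # type: (Text) -> Text
    """Return a greeting; the argument and the result are text data."""
    return u'Hello, ' + name


def encode_greeting(name):
    # type: (Text) -> bytes
    """Return the greeting as UTF-8 encoded binary data."""
    return greet(name).encode('utf-8')
```

The conversion from text to binary data is explicit here, so the code behaves the same way in Python 2 and Python 3.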

Implicit text/binary conversions

In Python 2, text data is implicitly converted to binary data and vice versa using the ASCII encoding. A UnicodeEncodeError or a UnicodeDecodeError is raised only if the data isn't ASCII-compatible. As a result, many programs are poorly tested with respect to non-ASCII data.

In Python 3 there are no implicit conversions between text and binary data: operations that mix them, such as concatenation, raise a TypeError.

A type checker run in the Python 3 mode will find most Python 2 implicit conversion errors.
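
The difference can be observed directly at runtime. The sketch below uses two hypothetical helpers, mix and outcome, to report what happens when text and binary data are concatenated without an explicit conversion.

```python
def mix(text, data):
    """Concatenate text with binary data without an explicit conversion."""
    return text + data


def outcome(text, data):
    """Return 'ok', or the name of the exception raised by mix()."""
    try:
        mix(text, data)
    except (TypeError, UnicodeError) as exc:
        return type(exc).__name__
    return 'ok'


ascii_result = outcome(u'abc', b'def')       # ASCII-only binary data
non_ascii_result = outcome(u'abc', b'\xff')  # non-ASCII binary data
```

Under Python 3 both results are 'TypeError'. Under Python 2 the first call returns 'ok' and the second returns 'UnicodeDecodeError', which is the kind of error that goes unnoticed when a program is tested only with ASCII data.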

Checking for Python 2+3 compatibility

In order to be Python 2+3 compatible a program has to pass both Python 2 and Python 3 type checking. In other words, the warnings found in the Python 2+3 compatible mode are a simple sum of Python 2 warnings and Python 3 warnings.

Runtime type compatibility

Here is a table of types whose values are compatible at runtime. Columns are the expected types, rows are the actual types:

        | Text  | bytes | str   | unicode
--------+-------+-------+-------+---------
Text    |  . .  |  * F  |  * .  |  . F
bytes   |  * F  |  . .  |  . F  |  * F
str     |  * .  |  . F  |  . .  |  * F
unicode |  . F  |  * F  |  * F  |  . F

Each cell contains two characters: the result in Python 2 and in Python 3 respectively. Abbreviations:

  • . — types are compatible
  • F — types are not compatible
  • * — types are compatible, ignoring implicit ASCII conversions

At runtime in Python 2 str is compatible with unicode and vice versa (ignoring possible implicit ASCII conversion errors).

Using unicode in Python 3 is always an error since there is no unicode name in Python 3.

As you can see from the table above, many implicit ASCII conversion errors in a Python 2 program can be found just by running a type checker in the Python 3 mode.

The only problematic conversions, those that may fail at runtime yet pass type checking in both modes, are Text to str and str to Text in Python 2.
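
The table and the conclusion drawn from it can be expressed as data. The following is only a sketch of one possible encoding, not how any particular type checker stores these rules; the names RUNTIME, warns, warns_in_compat_mode, and undetected_ascii_conversions are hypothetical.

```python
# RUNTIME[actual][expected] is a two-character cell copied from the table:
# the Python 2 result first, the Python 3 result second.
RUNTIME = {
    'Text':    {'Text': '..', 'bytes': '*F', 'str': '*.', 'unicode': '.F'},
    'bytes':   {'Text': '*F', 'bytes': '..', 'str': '.F', 'unicode': '*F'},
    'str':     {'Text': '*.', 'bytes': '.F', 'str': '..', 'unicode': '*F'},
    'unicode': {'Text': '.F', 'bytes': '*F', 'str': '*F', 'unicode': '.F'},
}


def warns(actual, expected, python_version):
    """Return True if a checker in the given mode (2 or 3) reports a warning."""
    cell = RUNTIME[actual][expected]
    return cell[python_version - 2] == 'F'


def warns_in_compat_mode(actual, expected):
    """Python 2+3 mode: the sum of Python 2 and Python 3 warnings."""
    return warns(actual, expected, 2) or warns(actual, expected, 3)


def undetected_ascii_conversions():
    """Pairs that rely on an implicit ASCII conversion in Python 2
    and produce no warning in either mode."""
    return sorted(
        (actual, expected)
        for actual, row in RUNTIME.items()
        for expected, cell in row.items()
        if cell[0] == '*' and cell[1] != 'F'
    )
```

Calling undetected_ascii_conversions() returns only the pairs ('Text', 'str') and ('str', 'Text'), which matches the statement above.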

Example 1. Text to str

def foo(obj, x):
    # type: (Any, str) -> Any
    return getattr(obj, x)

foo(..., u'привет')  # False negative: no warning for non-ASCII data in Python 2

Example 2. str to Text

def foo(x):
    # type: (Text) -> Any
    return u'Привет, ' + x

foo('Мир')  # False negative: no warning for non-ASCII data in Python 2

For non-ASCII text literals passed to functions that expect Text or str in Python 2, a type checker can analyze the contents of the literal and show additional warnings based on this information. For non-ASCII data coming from sources other than literals, this check would be more complicated.
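
A literal check of this kind could be as simple as the sketch below. The helper name is_ascii_only is hypothetical, and the sketch assumes the type checker already has the literal's value available.

```python
def is_ascii_only(literal):
    """Return True if a text or binary literal contains only ASCII data."""
    if isinstance(literal, bytes):
        # bytearray yields integers in both Python 2 and Python 3.
        return all(byte < 128 for byte in bytearray(literal))
    return all(ord(char) < 128 for char in literal)
```

A checker could skip the extra warning when is_ascii_only returns True, as for u'upper', and report it otherwise, as for the literals in Examples 1 and 2.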

To summarize, with this type compatibility table in place, a type checker run for both Python 2 and Python 3 is able to find almost all errors related to text and binary data except for a few text to "native string" conversions and vice versa in Python 2.

Current Mypy type compatibility (non-runtime semantics)

Mypy implements str to unicode promotion for Python 2, but it doesn't promote unicode to str. Here is an example of a Python 2 program that is correct given the runtime type compatibility semantics shown in the table above, but is incorrect for Mypy:

def foo(obj, x):
    # type: (Any, str) -> Any
    return getattr(obj, x)

foo('', u'upper')  # False positive warning in Mypy for an ASCII literal in Python 2

Here is the type compatibility table for the current version of Mypy:

        | Text  | bytes | str   | unicode
--------+-------+-------+-------+---------
Text    |  . .  |  F F  |  F .  |  . F
bytes   |  * F  |  . .  |  . F  |  * F
str     |  * .  |  . F  |  . .  |  * F
unicode |  . F  |  F F  |  F F  |  . F

Running the Mypy type checker in Python 2 mode and Python 3 mode for the same program would find almost all implicit ASCII conversion errors except for str to Text conversions.

To summarize, the current Mypy type compatibility table covers almost all text and binary data handling errors when used for both Python 2 and Python 3. But it doesn't notice errors in "native string" to text conversions in Python 2 and produces false warnings for text to "native string" conversions in Python 2.
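
To see exactly where the two tables disagree, they can be compared cell by cell. The sketch below copies the rows of both tables verbatim and uses hypothetical helpers parse and differences.

```python
RUNTIME_TABLE = """
Text    |  . .  |  * F  |  * .  |  . F
bytes   |  * F  |  . .  |  . F  |  * F
str     |  * .  |  . F  |  . .  |  * F
unicode |  . F  |  * F  |  * F  |  . F
"""

MYPY_TABLE = """
Text    |  . .  |  F F  |  F .  |  . F
bytes   |  * F  |  . .  |  . F  |  * F
str     |  * .  |  . F  |  . .  |  * F
unicode |  . F  |  F F  |  F F  |  . F
"""

COLUMNS = ['Text', 'bytes', 'str', 'unicode']  # expected types


def parse(table):
    """Map (actual, expected) to a (python2, python3) pair of markers."""
    cells = {}
    for line in table.strip().splitlines():
        fields = [field.strip() for field in line.split('|')]
        actual = fields[0]
        for expected, cell in zip(COLUMNS, fields[1:]):
            cells[(actual, expected)] = tuple(cell.split())
    return cells


def differences(first, second):
    """Return the sorted (actual, expected) pairs whose cells differ."""
    a, b = parse(first), parse(second)
    return sorted(key for key in a if a[key] != b[key])
```

differences(RUNTIME_TABLE, MYPY_TABLE) lists four pairs: Text to bytes, Text to str, unicode to bytes, and unicode to str. In each of them the Python 2 marker is * in the runtime table and F in the Mypy table, which reflects the missing unicode to str promotion.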

@gvanrossum

Can you summarize the definition of typing.Text in the TL;DR section?