@gwerbin
Last active July 31, 2019 23:30
Freenode #python gets frustrated on a Sunday afternoon and rewrites Python.

Freenode #python rewrites Python

Text

The str class should represent a sequence of grapheme clusters, not codepoints. This should lead to less-surprising behavior when working with non-ASCII text.

NOTE: After further discussion with others, this may be too computationally expensive for "normal" use. A separate set of functions for working with grapheme clusters may be the better design. How does it work in Swift?

Constructor

Calling str(b'asdf') should be equivalent to str(b'asdf', encoding='utf8'). If you really want to obtain "b'asdf'", use repr instead.
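
For contrast, here is a small sketch of what current Python does with the two calls. The variable names are illustrative only.

```python
data = b'asdf'

# Current behavior: without an encoding, str() returns the repr of the bytes object.
as_repr = str(data)

# With an explicit encoding, str() decodes the bytes instead.
decoded = str(data, encoding='utf8')

print(as_repr)  # b'asdf'
print(decoded)  # asdf
```

Under the proposal, both calls would return the decoded text, and repr(data) would remain the way to get "b'asdf'".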

Iteration

Change str.__iter__ to yield grapheme clusters, not Unicode code points.

To access the sequence of Unicode code points, use str.codepoints.

Example (proposed behavior; str.codepoints does not exist today):

s1 = '\N{LATIN SMALL LETTER A WITH GRAVE}'
print(s1)
# à
print(s1.codepoints)
# ['à']

s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}'
print(s2)
# à
print(s2.codepoints)
# ['a', '̀']
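
A rough idea of what grapheme-wise iteration could look like with only the standard library is sketched below. The helper approx_graphemes is hypothetical, and it only approximates real grapheme clusters: it attaches combining marks to the preceding code point and ignores the rest of the Unicode segmentation rules (emoji ZWJ sequences, Hangul jamo, regional indicators).

```python
import unicodedata
from typing import Iterator


def approx_graphemes(text: str) -> Iterator[str]:
    """Yield each base code point together with the combining marks after it.

    Approximation only: this is not full Unicode grapheme cluster segmentation.
    """
    cluster = ''
    for ch in text:
        # combining() returns a non-zero canonical combining class
        # for most combining marks, and 0 for ordinary characters.
        if cluster and unicodedata.combining(ch):
            cluster += ch
        else:
            if cluster:
                yield cluster
            cluster = ch
    if cluster:
        yield cluster


s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}bc'
print(len(list(s2)))                    # 4 code points
print(len(list(approx_graphemes(s2))))  # 3 approximate clusters
```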

Containment

Containment should also be "grapheme-aware".

Example (proposed behavior; current Python prints False):

s1 = '\N{LATIN SMALL LETTER A WITH GRAVE}bc'
s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}bc'
print(s1)
# àbc
print(s2)
# àbc
print(s1 in s2)
# True
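
In current Python the same test is False, because containment compares code points. One way to get closer to the proposed result today is to normalize both sides first. The helper nfc_contains below is hypothetical, and normalization is a stand-in here, not true grapheme-aware matching.

```python
import unicodedata

s1 = '\N{LATIN SMALL LETTER A WITH GRAVE}bc'
s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}bc'

# Today: the precomposed and decomposed spellings do not match.
print(s1 in s2)  # False


def nfc_contains(needle: str, haystack: str) -> bool:
    """Hypothetical helper: compare after normalizing both sides to NFC."""
    return (unicodedata.normalize('NFC', needle)
            in unicodedata.normalize('NFC', haystack))


print(nfc_contains(s1, s2))  # True
```

This only works when a precomposed form exists for the sequence in question, which is part of why the proposal asks for grapheme awareness in str itself.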

Length

String length is the number of graphemes, not codepoints.

Example (proposed behavior; current Python prints False for the first comparison):

s1 = '\N{LATIN SMALL LETTER A WITH GRAVE}bc'
s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}bc'
print(s1)
# àbc
print(s2)
# àbc
print(len(s1) == len(s2))
# True
print(len(s1.codepoints) == len(s2.codepoints))
# False
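
The current lengths, and a crude grapheme count, can be checked as follows. The helper approx_grapheme_len is hypothetical and simply skips combining marks, so it is only an approximation of a real grapheme count.

```python
import unicodedata

s1 = '\N{LATIN SMALL LETTER A WITH GRAVE}bc'
s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}bc'


def approx_grapheme_len(text: str) -> int:
    """Hypothetical helper: count code points that are not combining marks."""
    return sum(1 for ch in text if not unicodedata.combining(ch))


# Today len() counts code points, so the two spellings differ.
print(len(s1), len(s2))  # 3 4

# Skipping combining marks gives the same count for both spellings.
print(approx_grapheme_len(s1), approx_grapheme_len(s2))  # 3 3
```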

Regular expressions

Regular expressions should match on grapheme clusters as well, not codepoints.
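
A short illustration of the current behavior with the standard re module, where "." matches exactly one code point (the pattern and strings are illustrative):

```python
import re

s1 = '\N{LATIN SMALL LETTER A WITH GRAVE}bc'
s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}bc'

# One "character" followed by "bc".
pattern = re.compile(r'.bc')

# The precomposed form is three code points, so it matches.
print(bool(pattern.fullmatch(s1)))  # True

# The decomposed form is four code points, so the same pattern fails.
print(bool(pattern.fullmatch(s2)))  # False
```

As far as I know, the third-party regex package documents a \X escape that matches a whole grapheme cluster, which is close to what this section asks of the standard library.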

Dicts

dict.__iter__ should yield key-value pairs (like dict.items), not keys. Better still, dicts shouldn't be iterable at all -- iter(dict) should raise TypeError.

This breaks symmetry with dict.__contains__, but I don't think anyone cares.
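
One kind of bug that motivates this, sketched with made-up data: forgetting .items() can silently "work" when the keys happen to be two-element sequences.

```python
d = {'ab': 1, 'cd': 2}

# Current behavior: iterating a dict yields its keys.
print(list(d))  # ['ab', 'cd']

# The pitfall: each two-character key is unpacked into two characters,
# so this loop runs without error and produces nonsense.
pairs_by_accident = [(k, v) for k, v in d]
print(pairs_by_accident)  # [('a', 'b'), ('c', 'd')]

# What was almost certainly intended.
pairs = [(k, v) for k, v in d.items()]
print(pairs)  # [('ab', 1), ('cd', 2)]
```

If iter(dict) raised TypeError, the accidental version would fail immediately instead of producing wrong pairs.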

asyncio.PriorityQueue

The asyncio.PriorityQueue documentation should reflect the fact that it is 1) a thin wrapper around heapq and 2) as a result requires queued elements to be orderable with <.
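
The restriction is easy to trip over with (priority, payload) tuples: when two priorities tie, tuple comparison falls through to the payloads. A minimal reproduction, assuming dict payloads (which do not support <):

```python
import asyncio


async def main() -> str:
    queue = asyncio.PriorityQueue()
    # Two entries with the same priority and unorderable payloads.
    queue.put_nowait((1, {'job': 'a'}))
    try:
        # The tie on priority forces a comparison of the two dicts.
        queue.put_nowait((1, {'job': 'b'}))
    except TypeError as exc:
        return f'TypeError: {exc}'
    return 'no error'


outcome = asyncio.run(main())
print(outcome)
```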

For a rough proof-of-concept implementation of a "generic" priority queue that does not place this restriction on elements, see here.
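
The sketch below is not the linked implementation. It shows one common way to lift the restriction, using an insertion counter as a tie-breaker so that payloads are never compared. The class name TieBreakingQueue is hypothetical, and the queue is synchronous to keep the example short.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class TieBreakingQueue:
    """Hypothetical priority queue whose payloads are never compared."""

    def __init__(self) -> None:
        self._heap: List[Tuple[Any, int, Any]] = []
        self._counter = itertools.count()

    def put(self, priority: Any, item: Any) -> None:
        # The unique, increasing counter settles ties before
        # tuple comparison can reach the payload.
        heapq.heappush(self._heap, (priority, next(self._counter), item))

    def get(self) -> Any:
        _priority, _count, item = heapq.heappop(self._heap)
        return item

    def __len__(self) -> int:
        return len(self._heap)


q = TieBreakingQueue()
q.put(1, {'job': 'a'})
q.put(1, {'job': 'b'})
q.put(0, {'job': 'urgent'})
print([q.get() for _ in range(len(q))])
```

Priorities still have to be orderable with <; only the payloads are freed from that requirement. Entries with equal priority come out in insertion order.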

Default magic methods

None of the magic methods should be implemented by default, except maybe __repr__. The defaults for __bool__ and __str__ are especially bad: they lead to surprising bugs and don't improve the language at all.
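
A small illustration of the kind of surprise meant here, using a made-up Inbox class that defines no magic methods of its own:

```python
class Inbox:
    """A container-like class that relies entirely on the defaults."""

    def __init__(self, messages):
        self.messages = list(messages)

    def has_mail(self) -> bool:
        return bool(self.messages)


empty = Inbox([])

# Default __bool__: every instance is truthy, even a logically empty one.
print(bool(empty))  # True

# A bound method is an object too, so forgetting the call
# parentheses in a condition is always truthy.
print(bool(empty.has_mail))  # True
print(empty.has_mail())      # False

# Default __str__ falls back to __repr__ (type name and address).
print(str(empty) == repr(empty))  # True
```

Without a default __bool__, both `if empty:` and `if empty.has_mail:` would raise instead of silently taking the wrong branch.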

Annotating exceptions

There should be type annotation syntax for raising exceptions.

My proposals:

import math
from typing_extensions import raises  # hypothetical: typing_extensions provides no `raises` today

@raises(TypeError, ValueError)
def f(x: float, operation: str) -> float:
    """ Does a specified operation on x

    Example:
        x = 2
        f(x, 'recip') == 1/x
    """
    if x <= 0:
        raise TypeError('x must be positive')

    if operation == 'log':
        result = math.log(x)
    elif operation == 'recip':
        result = 1 / x
    else:
        raise ValueError('Unknown operation')

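    return result

Or, as an annotation inside the function body:

import math
from typing import Union

def f(x: float, operation: str) -> float:
    """ Does a specified operation on x

    Example:
        x = 2
        f(x, 'recip') == 1/x
    """
    # Proposed convention: a local annotation declaring what may be raised.
    # It follows the docstring so that the docstring stays the first statement.
    __raises__: Union[TypeError, ValueError]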
    if x <= 0:
        raise TypeError('x must be positive')

    if operation == 'log':
        result = math.log(x)
    elif operation == 'recip':
        result = 1 / x
    else:
        raise ValueError('Unknown operation')

    return result
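
The decorator form can be approximated in today's Python as plain metadata. The sketch below is an assumption about how such a decorator might work, not an existing typing or typing_extensions API: it records the declared exception types on a __raises__ attribute and enforces nothing. The function recip is a made-up example.

```python
from typing import Callable, Type, TypeVar

F = TypeVar('F', bound=Callable[..., object])


def raises(*exc_types: Type[BaseException]) -> Callable[[F], F]:
    """Hypothetical decorator: record declared exception types on the function."""
    def decorate(func: F) -> F:
        # Metadata only; nothing is checked at call time.
        func.__raises__ = exc_types
        return func
    return decorate


@raises(TypeError, ValueError)
def recip(x: float) -> float:
    if not isinstance(x, (int, float)):
        raise TypeError('x must be a number')
    if x == 0:
        raise ValueError('x must be non-zero')
    return 1 / x


print(recip.__raises__)
```

A type checker or documentation tool could then read __raises__ the same way it reads __annotations__, which seems to be the spirit of both proposals.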

A proposal by Mark on the Python Discord:

import math
from typing import Union

def f(x: float, operation: str) -> (float, Union[TypeError, ValueError]):
    """ Does a specified operation on x

    Example:
        x = 2
        f(x, 'recip') == 1/x
    """
    if x <= 0:
        raise TypeError('x must be positive')

    if operation == 'log':
        result = math.log(x)
    elif operation == 'recip':
        result = 1 / x
    else:
        raise ValueError('Unknown operation')

    return result