gwerbin/ideas.md

## ideas.md

      
    Freenode #python rewrites Python

Text

The str class should represent a sequence of grapheme clusters, not code points. This should lead to less surprising behavior when working with non-ASCII text.
NOTE: After more discussion with others, it seems this might be too computationally intensive for "normal" use. A separate set of functions for working with grapheme clusters would be better. How does it work in Swift?
Constructor

Calling str(b'asdf') should be equivalent to str(b'asdf', encoding='utf8'). If you really want to obtain "b'asdf'", use repr instead.
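For comparison, here is a short sketch of what current Python 3 does, which is the behavior this idea would change. Nothing here is new API; it only uses the built-in `str` and `bytes.decode`.

```python
data = b'asdf'

# Today, str() on bytes without an encoding returns the repr of the bytes object.
as_repr = str(data)

# Passing an encoding (or calling bytes.decode) actually decodes the bytes.
decoded = str(data, encoding='utf8')

print(as_repr)                          # b'asdf'
print(decoded)                          # asdf
print(decoded == data.decode('utf8'))   # True
```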
Iteration

Change str.__iter__ to yield grapheme clusters, not Unicode code points.
To access the sequence of Unicode code points, use str.codepoints.
Example:
s1 = '\N{LATIN SMALL LETTER A WITH GRAVE}'
print(s1)
# à
print(s1.codepoints)
# ['à']

s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}'
print(s2)
# à
print(s2.codepoints)
# ['a', '̀']
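The two views can be roughly approximated in today's Python. The sketch below is only an illustration: `codepoints` and `graphemes` are hypothetical helper names, and the clustering rule (a base character plus any combining marks that follow it) is a simplification, not full Unicode grapheme segmentation.

```python
import unicodedata

def codepoints(s):
    """Return the code points of s as a list of 1-character strings."""
    return list(s)

def graphemes(s):
    """Yield approximate grapheme clusters: a base character followed by
    any combining marks. This is NOT full UAX #29 segmentation."""
    cluster = ''
    for ch in s:
        if cluster and unicodedata.combining(ch) == 0:
            # A non-combining character starts a new cluster.
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster

s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}bc'
print(len(codepoints(s2)))        # 4
print(len(list(graphemes(s2))))   # 3
```

Emoji sequences, Hangul jamo and similar cases would need real segmentation rules, which is part of why the cost concern in the note above matters.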
Containment

Containment should also be "grapheme-aware".
Example:
s1 = '\N{LATIN SMALL LETTER A WITH GRAVE}bc'
s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}bc'
print(s1)
# àbc
print(s2)
# àbc
print(s1 in s2)
# True
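In current Python the same check prints False, because `in` compares code points. One way to get close to the proposed behavior today is to normalize both strings first. The sketch below assumes NFC normalization is an acceptable stand-in for grapheme awareness; `contains` is a hypothetical helper name.

```python
import unicodedata

def contains(haystack, needle):
    """Containment test that treats canonically equivalent text as equal
    by comparing NFC-normalized forms. An approximation, not true
    grapheme-cluster matching."""
    return unicodedata.normalize('NFC', needle) in unicodedata.normalize('NFC', haystack)

s1 = '\N{LATIN SMALL LETTER A WITH GRAVE}bc'
s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}bc'
print(s1 in s2)            # False
print(contains(s2, s1))    # True
print('a' in s2)           # True
print(contains(s2, 'a'))   # False
```

The last two lines show the other half of the idea: a bare `a` is found inside the decomposed string today, even though no reader would say the text contains an unaccented "a".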
Length

String length is the number of graphemes, not codepoints.
Example:
s1 = '\N{LATIN SMALL LETTER A WITH GRAVE}bc'
s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}bc'
print(s1)
# àbc
print(s2)
# àbc
print(len(s1) == len(s2))
# True
print(len(s1.codepoints) == len(s2.codepoints))
# False
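Today `len` counts code points, so the two strings above have different lengths. A rough grapheme count can be sketched by skipping combining marks. `grapheme_len` is a hypothetical name, and the rule is an approximation that ignores cases like emoji joiner sequences.

```python
import unicodedata

def grapheme_len(s):
    """Approximate grapheme count: every code point that is not a
    combining mark is treated as the start of a new cluster."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

s1 = '\N{LATIN SMALL LETTER A WITH GRAVE}bc'
s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}bc'
print(len(s1), len(s2))                     # 3 4
print(grapheme_len(s1), grapheme_len(s2))   # 3 3
```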
Regular expressions

Regular expressions should match on grapheme clusters as well, not codepoints.
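A small demonstration of the current behavior with the standard `re` module, where `.` matches exactly one code point. Normalizing to NFC is used here only as a workaround for this particular pair of strings; it is not a general substitute for grapheme-aware matching.

```python
import re
import unicodedata

s1 = '\N{LATIN SMALL LETTER A WITH GRAVE}'
s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}'

# '.' matches one code point, so the decomposed spelling does not match.
match_composed = re.fullmatch('.', s1) is not None
match_decomposed = re.fullmatch('.', s2) is not None

# After NFC normalization both spellings are a single code point.
match_normalized = re.fullmatch('.', unicodedata.normalize('NFC', s2)) is not None

print(match_composed)     # True
print(match_decomposed)   # False
print(match_normalized)   # True
```

As far as I know, the third-party `regex` package supports `\X` for matching a whole grapheme cluster, which is closer to what is being asked for here.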
Dicts

dict.__iter__ should yield key-value pairs (like dict.items), not keys. Better still, dicts shouldn't be iterable at all: iter(dict) should raise a TypeError.
This breaks symmetry with dict.__contains__, but I don't think anyone cares.
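To make the difference concrete, here is today's behavior next to a sketch of the stricter variant. `NoIterDict` is a hypothetical subclass written only for illustration; it is not a proposal for how the built-in would be changed.

```python
d = {'a': 1, 'b': 2}
print(list(d))           # ['a', 'b']
print(list(d.items()))   # [('a', 1), ('b', 2)]

class NoIterDict(dict):
    """Hypothetical dict that refuses implicit iteration."""

    def __iter__(self):
        raise TypeError(
            "'NoIterDict' object is not iterable; "
            "use .keys(), .values() or .items()"
        )

nd = NoIterDict(d)
print('a' in nd)          # True: containment still checks keys
print(list(nd.items()))   # [('a', 1), ('b', 2)]
# for key in nd: ...      # would raise TypeError
```

A subclass like this only affects code that goes through `__iter__`; some C-level code paths may still read the underlying dict directly.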
asyncio.PriorityQueue

The asyncio.PriorityQueue documentation should reflect the fact that it 1) is a thin wrapper for heapq, and 2) as a result requires elements to be orderable with <.
For a rough POC implementation of a "generic" priority queue that does not place this restriction on elements, see here.
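As an illustration of the general idea (this is not the linked POC), one common way around the restriction is to store `(priority, counter, item)` tuples so that the items themselves are never compared. `GenericPriorityQueue` and its method names are hypothetical.

```python
import heapq
import itertools

class GenericPriorityQueue:
    """Hypothetical queue that orders by an explicit priority and never
    compares the stored items."""

    def __init__(self):
        self._heap = []
        # Monotonic counter: breaks priority ties in insertion order.
        self._counter = itertools.count()

    def push(self, priority, item):
        heapq.heappush(self._heap, (priority, next(self._counter), item))

    def pop(self):
        _priority, _count, item = heapq.heappop(self._heap)
        return item

    def __len__(self):
        return len(self._heap)

q = GenericPriorityQueue()
q.push(2, {'job': 'low'})
q.push(1, {'job': 'high'})
q.push(1, {'job': 'also high'})   # dicts do not support <, but that is fine here
print(q.pop())   # {'job': 'high'}
print(q.pop())   # {'job': 'also high'}
print(q.pop())   # {'job': 'low'}
```

Pushing `(1, {'job': 'high'})` and `(1, {'job': 'also high'})` straight onto a heap would raise a TypeError, because the equal priorities force a comparison between the two dicts.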
Default magic methods

No magic method should ever be implemented by default, except maybe __repr__. The defaults for __bool__ and __str__ are especially bad. They lead to surprising bugs and don't improve the language at all.
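A small example of the kind of surprise meant here, followed by a sketch of how a class can opt out today. `Result` and `StrictBase` are made-up names for illustration.

```python
class Result:
    def __init__(self, items):
        self.items = items

empty = Result([])

# With no __bool__ or __len__, every instance is truthy, even an "empty" one.
print(bool(empty))                 # True

# With no __str__, str() silently falls back to repr().
print(str(empty) == repr(empty))   # True

class StrictBase:
    """Hypothetical base class that refuses implicit truthiness."""

    def __bool__(self):
        raise TypeError(f'{type(self).__name__} has no truth value')

# `if StrictBase(): ...` would now raise TypeError instead of passing silently.
```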
Annotating exceptions

There should be type annotation syntax for the exceptions a function can raise.
My proposals:
import math
from typing_extensions import raises  # proposed; does not exist today

@raises(TypeError, ValueError)
def f(x: float, operation: str) -> float:
    """ Does a specified operation on x

    Example:
        x = 2
        f(x, 'recip') == 1/x
    """
    if x <= 0:
        raise TypeError('x must be positive')

    if operation == 'log':
        result = math.log(x)
    elif operation == 'recip':
        result = 1 / x
    else:
        raise ValueError('Unknown operation')

    return result
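
Alternatively, as an annotation in the function body:
import math
from typing import Union

def f(x: float, operation: str) -> float:
    """ Does a specified operation on x

    Example:
        x = 2
        f(x, 'recip') == 1/x
    """
    # Placed after the docstring so the string above stays the docstring.
    __raises__: Union[TypeError, ValueError]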
    if x <= 0:
        raise TypeError('x must be positive')

    if operation == 'log':
        result = math.log(x)
    elif operation == 'recip':
        result = 1 / x
    else:
        raise ValueError('Unknown operation')

    return result

By Mark on the Python discord:
import math
from typing import Union

def f(x: float, operation: str) -> (float, Union[TypeError, ValueError]):
    """ Does a specified operation on x

    Example:
        x = 2
        f(x, 'recip') == 1/x
    """
    if x <= 0:
        raise TypeError('x must be positive')

    if operation == 'log':
        result = math.log(x)
    elif operation == 'recip':
        result = 1 / x
    else:
        raise ValueError('Unknown operation')

    return result
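The decorator form could be prototyped today without any new syntax. The sketch below is one possible shape, not an existing API: `raises` is defined locally, and storing the classes on a `__raises__` attribute is an assumption borrowed from the second proposal. Nothing is enforced at runtime; the information is only recorded for tools to read.

```python
import math

def raises(*exc_types):
    """Hypothetical decorator: record the declared exceptions on the function."""
    for exc in exc_types:
        if not (isinstance(exc, type) and issubclass(exc, BaseException)):
            raise TypeError(f'{exc!r} is not an exception class')

    def decorator(func):
        func.__raises__ = exc_types
        return func

    return decorator

@raises(TypeError, ValueError)
def f(x: float, operation: str) -> float:
    if x <= 0:
        raise TypeError('x must be positive')
    if operation == 'log':
        return math.log(x)
    if operation == 'recip':
        return 1 / x
    raise ValueError('Unknown operation')

print(f.__raises__)    # (<class 'TypeError'>, <class 'ValueError'>)
print(f(2, 'recip'))   # 0.5
```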