The str class should represent a sequence of grapheme clusters, not codepoints. This should lead to less-surprising behavior when working with non-ASCII text.
NOTE: After some more discussion with other people, I think this might be too computationally intensive for "normal" use. It would be better to have a separate set of functions for working with grapheme clusters. How does it work in Swift?
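As a rough idea of what such a separate function could look like, here is a minimal sketch. The name graphemes is hypothetical, and the rule it uses (attach combining marks to the preceding character) is only an approximation: real grapheme cluster segmentation per Unicode UAX #29 also has to handle emoji ZWJ sequences, Hangul jamo, regional indicators, and more.

```python
import unicodedata
from typing import Iterator


def graphemes(text: str) -> Iterator[str]:
    """Yield approximate grapheme clusters (hypothetical helper).

    Assumption: a cluster is a character followed by any number of
    combining marks. This is NOT full UAX #29 segmentation.
    """
    cluster = ""
    for ch in text:
        if cluster and unicodedata.combining(ch):
            # Combining mark: attach it to the cluster in progress.
            cluster += ch
        else:
            if cluster:
                yield cluster
            cluster = ch
    if cluster:
        yield cluster


decomposed = "a\N{COMBINING GRAVE ACCENT}bc"
clusters = list(graphemes(decomposed))
print(len(decomposed), len(clusters))
```

Keeping this as a standalone function means ordinary str operations stay cheap, and only the code that needs cluster-level behavior pays for it.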
Calling str(b'asdf') should be equivalent to str(b'asdf', encoding='utf8'). If you really want to obtain "b'asdf'", use repr instead.
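For reference, this is what happens today versus what I'm asking for. The snippet only uses existing behavior; the variable names are just for illustration.

```python
data = b'asdf'

# Today: str() on bytes without an encoding falls back to the repr.
current = str(data)

# What I would like str(data) to mean instead.
proposed = str(data, encoding='utf8')

# repr already covers the "show me the literal" use case.
literal = repr(data)

print(current, proposed, literal)
```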
Change str.__iter__ to yield grapheme clusters, not Unicode code points. To access the sequence of Unicode code points, use str.codepoints.
Example:
s1 = '\N{LATIN SMALL LETTER A WITH GRAVE}'
print(s1)
# à
print(s1.codepoints)
# ['à']
s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}'
print(s2)
# à
print(s2.codepoints)
# ['a', '̀']
Containment should also be "grapheme-aware".
Example:
s1 = '\N{LATIN SMALL LETTER A WITH GRAVE}bc'
s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}bc'
print(s1)
# àbc
print(s2)
# àbc
print(s1 in s2)
# True
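Something close to this can be approximated today by normalizing both strings before the check. This is only a sketch: the helper name contains is made up, and NFC normalization gives canonical equivalence, which is related to but not the same thing as full grapheme awareness.

```python
import unicodedata


def contains(haystack: str, needle: str) -> bool:
    """Hypothetical helper: containment after NFC normalization."""
    def nfc(s: str) -> str:
        return unicodedata.normalize("NFC", s)

    return nfc(needle) in nfc(haystack)


s1 = '\N{LATIN SMALL LETTER A WITH GRAVE}bc'
s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}bc'

print(s1 in s2)          # plain code point comparison, as it works today
print(contains(s2, s1))  # comparison after normalization
```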
String length should be the number of grapheme clusters, not codepoints.
Example:
s1 = '\N{LATIN SMALL LETTER A WITH GRAVE}bc'
s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}bc'
print(s1)
# àbc
print(s2)
# àbc
print(len(s1) == len(s2))
# True
print(len(s1.codepoints) == len(s2.codepoints))
# False
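For comparison, here is how the two strings measure today, plus a rough stand-in for the proposed length. The function name grapheme_len is hypothetical, and counting non-combining characters is only an approximation of real grapheme cluster counting (it ignores emoji sequences, Hangul jamo, and so on).

```python
import unicodedata


def grapheme_len(text: str) -> int:
    """Hypothetical helper: count characters that are not combining marks."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


s1 = '\N{LATIN SMALL LETTER A WITH GRAVE}bc'
s2 = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}bc'

print(len(s1), len(s2))                    # code point counts today
print(grapheme_len(s1), grapheme_len(s2))  # approximate cluster counts
```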
Regular expressions should match on grapheme clusters as well, not codepoints.
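To show the behavior I'm complaining about, here is what the standard re module does with a decomposed character today. The third-party regex package has \X for matching a whole grapheme cluster; I'm sticking to the standard library here so the snippet runs anywhere.

```python
import re

composed = '\N{LATIN SMALL LETTER A WITH GRAVE}'
decomposed = '\N{LATIN SMALL LETTER A}\N{COMBINING GRAVE ACCENT}'

# '.' matches one code point, so the decomposed form is split in two.
pieces = re.findall('.', decomposed)

# A single '.' matches the composed form but not the decomposed one,
# even though both render as the same character.
composed_match = re.fullmatch('.', composed)
decomposed_match = re.fullmatch('.', decomposed)

print(pieces, composed_match is not None, decomposed_match is not None)
```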
dict.__iter__ should yield key-value pairs (like dict.items), not keys. Better still, dicts shouldn't be iterable at all -- iter(dict) should be a TypeError.
This breaks symmetry with dict.__contains__, but I don't think anyone cares.
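The "not iterable at all" variant can be tried out today with a subclass. This is just a sketch with a made-up class name, not a proposal for how it would actually be implemented in the interpreter.

```python
class NonIterableDict(dict):
    """Hypothetical dict that refuses direct iteration."""

    def __iter__(self):
        raise TypeError(f"{type(self).__name__!r} object is not iterable")


d = NonIterableDict(a=1, b=2)

# Explicit views still work, so the intent is always spelled out.
pairs = list(d.items())
keys = list(d.keys())

# Containment is unaffected.
has_a = 'a' in d

try:
    iter(d)
    iteration_refused = False
except TypeError:
    iteration_refused = True

print(pairs, keys, has_a, iteration_refused)
```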
asyncio.PriorityQueue documentation should reflect the fact that it 1) is a thin wrapper for heapq, and 2) as a result requires elements to be orderable with <.
For a rough POC implementation of a "generic" priority queue that does not place this restriction on elements, see here.
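To make the restriction concrete, here is a small demonstration. It is not the POC mentioned above. The Job class and both function names are hypothetical; the workaround is the usual tie-breaker-counter idiom for heapq, which sidesteps the ordering requirement rather than removing it.

```python
import asyncio
import itertools


class Job:
    """Hypothetical payload type that defines no ordering."""

    def __init__(self, name: str) -> None:
        self.name = name


async def unorderable_fails() -> bool:
    """Return True if equal priorities trigger a Job < Job comparison."""
    queue = asyncio.PriorityQueue()
    queue.put_nowait((1, Job("a")))
    try:
        # Same priority, so the tuples are compared by their second element.
        queue.put_nowait((1, Job("b")))
    except TypeError:
        return True
    return False


async def tie_breaker_works() -> list:
    """Insert a monotonically increasing counter so Jobs are never compared."""
    queue = asyncio.PriorityQueue()
    counter = itertools.count()
    for priority, name in [(2, "low"), (1, "first"), (1, "second")]:
        queue.put_nowait((priority, next(counter), Job(name)))
    return [queue.get_nowait()[2].name for _ in range(3)]


failed = asyncio.run(unorderable_fails())
order = asyncio.run(tie_breaker_works())
print(failed, order)
```

The counter also keeps items with equal priority in insertion order, which is usually what you want anyway.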
None of these should ever be implemented by default, except maybe __repr__. The defaults for __bool__ and __str__ are especially bad. They lead to surprising bugs and don't improve the language at all.
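A quick illustration of the kind of surprise I mean. The Inbox class is made up; everything else is just current default behavior.

```python
class Inbox:
    """Hypothetical container that defines neither __bool__ nor __len__."""

    def __init__(self, messages):
        self.messages = list(messages)


empty = Inbox([])

# The default truthiness says "yes" even though there is nothing inside,
# so `if inbox:` silently does the wrong thing.
truthy = bool(empty)

# The default __str__ just delegates to the default __repr__.
text = str(empty)

print(truthy, text)
```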
There should be type annotation syntax for raising exceptions.
My proposals:
import math
from typing_extensions import raises  # proposed; does not exist today

@raises(TypeError, ValueError)
def f(x: float, operation: str) -> float:
    """ Does a specified operation on x

    Example:
        x = 2
        f(x, 'recip') == 1/x
    """
    if x <= 0:
        raise TypeError('x must be positive')
    if operation == 'log':
        result = math.log(x)
    elif operation == 'recip':
        result = 1 / x
    else:
        raise ValueError('Unknown operation')
    return result

import math
from typing import Union

def f(x: float, operation: str) -> float:
    __raises__: Union[TypeError, ValueError]
    """ Does a specified operation on x

    Example:
        x = 2
        f(x, 'recip') == 1/x
    """
    if x <= 0:
        raise TypeError('x must be positive')
    if operation == 'log':
        result = math.log(x)
    elif operation == 'recip':
        result = 1 / x
    else:
        raise ValueError('Unknown operation')
    return result
By Mark on the Python discord:
import math
from typing import Union

def f(x: float, operation: str) -> (float, Union[TypeError, ValueError]):
    """ Does a specified operation on x

    Example:
        x = 2
        f(x, 'recip') == 1/x
    """
    if x <= 0:
        raise TypeError('x must be positive')
    if operation == 'log':
        result = math.log(x)
    elif operation == 'recip':
        result = 1 / x
    else:
        raise ValueError('Unknown operation')
    return result
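The decorator form can be approximated at runtime today. This is a sketch under my own assumptions: the raises decorator below is hypothetical (it is not part of typing or typing_extensions), and it only records the declared exceptions on a __raises__ attribute. It does no checking, static or otherwise.

```python
import math
from typing import Callable, Type, TypeVar

F = TypeVar("F", bound=Callable[..., object])


def raises(*exceptions: Type[BaseException]) -> Callable[[F], F]:
    """Hypothetical decorator: attach declared exceptions to the function."""
    def decorate(func: F) -> F:
        func.__raises__ = exceptions  # metadata only, nothing is enforced
        return func
    return decorate


@raises(TypeError, ValueError)
def f(x: float, operation: str) -> float:
    """Does a specified operation on x."""
    if x <= 0:
        raise TypeError('x must be positive')
    if operation == 'log':
        return math.log(x)
    if operation == 'recip':
        return 1 / x
    raise ValueError('Unknown operation')


print(f.__raises__)
print(f(2, 'recip'))
```

A type checker or linter could read the same attribute, but that part would need actual tooling support.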