@terrdavis
Forked from leycec/beartype.py
Last active May 22, 2020 08:07
`@beartype` Decorator and Unit Test Suite Thereof
#!/usr/bin/env python3
'''
`@beartype` decorator, implementing a rudimentary subset of PEP 484-style type
checking based on Python 3.x function annotations.

See Also
----------
https://stackoverflow.com/a/37961120/2809027
    Stackoverflow answer introducing the `@beartype` decorator.
'''
# ....................{ MAIN }....................
# If the active Python interpreter is *NOT* optimized (e.g., option "-O" was
# *NOT* passed to this interpreter), enable type checking.
if __debug__:
    import inspect
    from functools import wraps
    from inspect import Parameter, Signature

    def beartype(func: callable) -> callable:
        '''
        Decorate the passed **callable** (e.g., function, method) to validate
        both all annotated parameters passed to this callable _and_ the
        annotated value returned by this callable if any.

        This decorator performs rudimentary type checking based on Python 3.x
        function annotations, as officially documented by PEP 484 ("Type
        Hints"). While PEP 484 supports arbitrarily complex type composition,
        this decorator requires _all_ parameter and return value annotations to
        be either:

        * Classes (e.g., `int`, `OrderedDict`).
        * Tuples of classes (e.g., `(int, OrderedDict)`).

        If optimizations are enabled by the active Python interpreter (e.g., due
        to option `-O` passed to this interpreter), this decorator is a noop.

        Raises
        ----------
        NameError
            If any parameter has the reserved name `__beartype_func`.
        TypeError
            If either:

            * Any parameter or return value annotation is neither:
              * A type.
              * A tuple of types.
            * The kind of any parameter is unrecognized. This should _never_
              happen, assuming no significant changes to Python semantics.
        '''
        # Raw string of Python statements comprising the body of this wrapper,
        # including (in order):
        #
        # * A "@wraps" decorator propagating the name, docstring, and other
        #   identifying metadata of the original function to this wrapper.
        # * A private "__beartype_func" parameter initialized to this function.
        #   In theory, the "func" parameter passed to this decorator should be
        #   accessible as a closure-style local in this wrapper. For unknown
        #   reasons (presumably, a subtle bug in the exec() builtin), this is
        #   not the case. Instead, a closure-style local must be simulated by
        #   passing the "func" parameter to this function at function
        #   definition time as the default value of an arbitrary parameter. To
        #   ensure this default is *NOT* overwritten by a function accepting a
        #   parameter of the same name, this edge case is tested for below.
        # * Statements type checking parameters passed to this callable.
        # * A call to this callable.
        # * A statement type checking the value returned by this callable.
        #
        # While there exist numerous alternatives (e.g., appending to a list or
        # bytearray before joining the elements of that iterable into a string),
        # these alternatives are either slower (as in the case of a list, due to
        # the high up-front cost of list construction) or substantially more
        # cumbersome (as in the case of a bytearray). Since string concatenation
        # is heavily optimized by the official CPython interpreter, the simplest
        # approach is (curiously) the most ideal.
        func_body = '''
@wraps(__beartype_func)
def func_beartyped(*args, __beartype_func=__beartype_func, **kwargs):
    missing = object()
'''
        # "inspect.Signature" instance encapsulating this callable's signature.
        func_sig = inspect.signature(func)

        # Human-readable name of this function for use in exceptions.
        func_name = func.__name__ + '()'

        # For the name of each parameter passed to this callable and the
        # "inspect.Parameter" instance encapsulating this parameter (in the
        # passed order)...
        for func_arg_index, func_arg in enumerate(func_sig.parameters.values()):
            # If this callable redefines a parameter initialized to a default
            # value by this wrapper, raise an exception. Permitting this
            # unlikely edge case would permit unsuspecting users to
            # "accidentally" override these defaults.
            if func_arg.name == '__beartype_func':
                raise NameError(
                    'Parameter {} reserved for use by @beartype.'.format(
                        func_arg.name))
            # If this parameter is both annotated and non-ignorable for purposes
            # of type checking, type check this parameter.
            if (func_arg.annotation is not Parameter.empty and
                func_arg.kind not in _PARAMETER_KIND_IGNORED):
                # Validate this annotation.
                _check_type_annotation(
                    annotation=func_arg.annotation,
                    label='{} parameter {} type'.format(
                        func_name, func_arg.name))

                # String evaluating to this parameter's annotated type.
                func_arg_type_expr = (
                    "getattr(__beartype_func.__annotations__[{0!r}], '__origin__', None) "
                    "or __beartype_func.__annotations__[{0!r}]".format(func_arg.name))

                # String evaluating to this parameter's current value when
                # passed as a keyword.
                func_arg_value_key_expr = 'kwargs.get({!r}, missing)'.format(func_arg.name)
                # If this parameter is keyword-only, type check this parameter
                # only by lookup in the variadic "**kwargs" dictionary.
                if func_arg.kind is Parameter.KEYWORD_ONLY:
                    func_body += '''
    {arg_name}_type = {arg_type_expr}
    {arg_name}_value = {arg_value_key_expr}
    if {arg_name}_value is not missing and not isinstance({arg_name}_value, {arg_name}_type):
        raise TypeError(
            '{func_name} keyword-only parameter {arg_name}={{}} not a {{!r}}'.format(
                {arg_name}_value, {arg_name}_type))
'''.format(
                        func_name=func_name,
                        arg_name=func_arg.name,
                        arg_type_expr=func_arg_type_expr,
                        arg_value_key_expr=func_arg_value_key_expr,
                    )
                # Else, this parameter may be passed either positionally or as
                # a keyword. Type check this parameter both by lookup in the
                # variadic "**kwargs" dictionary *AND* by index into the
                # variadic "*args" tuple.
                else:
                    # String evaluating to this parameter's current value when
                    # passed positionally.
                    func_arg_value_pos_expr = 'args[{!r}]'.format(
                        func_arg_index)

                    func_body += '''
    type_{arg_index} = {arg_type_expr}
    value_{arg_index} = {arg_value_pos_expr} if {arg_index} < len(args) else {arg_value_key_expr}
    if value_{arg_index} is not missing and not isinstance(value_{arg_index}, type_{arg_index}):
        raise TypeError('{func_name} parameter {arg_name}={{}} not of {{!r}}'.format(
            value_{arg_index}, type_{arg_index}))
'''.format(
                        func_name=func_name,
                        arg_name=func_arg.name,
                        arg_index=func_arg_index,
                        arg_type_expr=func_arg_type_expr,
                        arg_value_key_expr=func_arg_value_key_expr,
                        arg_value_pos_expr=func_arg_value_pos_expr,
                    )
        # If this callable's return value is both annotated and non-ignorable
        # for purposes of type checking, type check this value.
        if func_sig.return_annotation not in _RETURN_ANNOTATION_IGNORED:
            # Validate this annotation.
            _check_type_annotation(
                annotation=func_sig.return_annotation,
                label='{} return type'.format(func_name))

            # String evaluating to this callable's annotated return type, as
            # above.
            func_return_type_expr = (
                "getattr(__beartype_func.__annotations__['return'], '__origin__', None) "
                "or __beartype_func.__annotations__['return']")

            # Call this callable, type check the returned value, and return this
            # value from this wrapper.
            func_body += '''
    return_value = __beartype_func(*args, **kwargs)
    return_type = {return_type}
    if not isinstance(return_value, return_type):
        raise TypeError('{func_name} return value {{}} not of {{!r}}'.format(return_value, return_type))
    return return_value
'''.format(func_name=func_name, return_type=func_return_type_expr)
        # Else, call this callable and return this value from this wrapper.
        else:
            func_body += '''
    return __beartype_func(*args, **kwargs)
'''

        # Dictionary mapping from local attribute name to value. For efficiency,
        # only those local attributes explicitly required in the body of this
        # wrapper are copied from the current namespace. (See below.)
        local_attrs = {'__beartype_func': func}

        # Dynamically define this wrapper as a closure of this decorator. For
        # obscure and presumably uninteresting reasons, Python fails to locally
        # declare this closure when the locals() dictionary is passed; to
        # capture this closure, a local dictionary must be passed instead.
        exec(func_body, globals(), local_attrs)

        # Return this wrapper.
        return local_attrs['func_beartyped']

    _PARAMETER_KIND_IGNORED = {
        Parameter.POSITIONAL_ONLY, Parameter.VAR_POSITIONAL, Parameter.VAR_KEYWORD,
    }
    '''
    Set of all `inspect.Parameter.kind` constants to be ignored during
    annotation-based type checking in the `@beartype` decorator.

    This includes:

    * Constants specific to variadic parameters (e.g., `*args`, `**kwargs`).
      Variadic parameters cannot be annotated and hence cannot be type checked.
    * Constants specific to positional-only parameters, which apply to non-pure-
      Python callables (e.g., defined by C extensions). The `@beartype`
      decorator applies _only_ to pure-Python callables, which provide no
      syntactic means of specifying positional-only parameters.
    '''
    _RETURN_ANNOTATION_IGNORED = {Signature.empty, None}
    '''
    Set of all annotations for return values to be ignored during annotation-
    based type checking in the `@beartype` decorator.

    This includes:

    * `Signature.empty`, signifying a callable whose return value is _not_
      annotated.
    * `None`, signifying a callable returning no value. By convention, callables
      returning no value are typically annotated to return `None`. Technically,
      callables whose return values are annotated as `None` _could_ be
      explicitly checked to return `None` rather than a non-`None` value. Since
      return values are safely ignorable by callers, however, there appears to
      be little real-world utility in enforcing this constraint.
    '''

    def _check_type_annotation(annotation: object, label: str) -> None:
        '''
        Validate the passed annotation to be a valid type supported by the
        `@beartype` decorator.

        Parameters
        ----------
        annotation : object
            Annotation to be validated.
        label : str
            Human-readable label describing this annotation, interpolated into
            exceptions raised by this function.

        Raises
        ----------
        TypeError
            If this annotation is neither a new-style class nor a tuple of
            new-style classes.
        '''
        # If this annotation is a tuple, raise an exception if any member of
        # this tuple is not a new-style class. Note that the "__name__"
        # attribute tested below is not defined by old-style classes and hence
        # serves as a helpful means of identifying new-style classes.
        if isinstance(annotation, tuple):
            for member in annotation:
                if not (
                    hasattr(member, '__origin__') or
                    isinstance(member, type) and hasattr(member, '__name__')):
                    raise TypeError(
                        '{} tuple member {} not a new-style class'.format(
                            label, member))
        # Else if this annotation is not a new-style class, raise an exception.
        elif not (
            hasattr(annotation, '__origin__') or
            isinstance(annotation, type) and hasattr(annotation, '__name__')):
            raise TypeError(
                '{} {} neither a new-style class nor '
                'tuple of such classes'.format(label, annotation))

# Else, the active Python interpreter is optimized. In this case, disable type
# checking by reducing this decorator to the identity decorator.
else:
    def beartype(func: callable) -> callable:
        return func

#!/usr/bin/env python3
'''
`py.test`-driven unit test suite for the `@beartype` decorator, implementing a
rudimentary subset of PEP 484-style type checking based on Python 3.x function
annotations.

Usage
----------
These tests assume the `@beartype` decorator and all utility functions (e.g.,
`_check_type_annotation()`) and globals (e.g., `_PARAMETER_KIND_IGNORED`)
required by this decorator to reside in a top-level module named `beartype`. If
this is the case, these tests may be run as is with:

    $ py.test -k test_beartype

See Also
----------
https://stackoverflow.com/a/37961120/2809027
    Stackoverflow answer introducing the `@beartype` decorator.
'''
# ....................{ IMPORTS }....................
import pytest
# ....................{ TESTS }....................
def test_beartype_noop() -> None:
    '''
    Test bear typing of a function with no function annotations, reducing to
    _no_ type checking.
    '''
    # Import this decorator.
    from beartype import beartype

    # Unannotated function to be type checked.
    @beartype
    def khorne(gork, mork):
        return gork + mork

    # Call this function and assert the expected return value.
    assert khorne('WAAAGH!', '!HGAAAW') == 'WAAAGH!!HGAAAW'

# ....................{ TESTS ~ pass : param }....................
def test_beartype_pass_param_keyword_and_positional() -> None:
    '''
    Test bear typing of a function call successfully passed both annotated
    positional and keyword parameters.
    '''
    # Import this decorator.
    from beartype import beartype

    # Function to be type checked.
    @beartype
    def slaanesh(daemonette: str, keeper_of_secrets: str) -> str:
        return daemonette + keeper_of_secrets

    # Call this function with both positional and keyword arguments and assert
    # the expected return value.
    assert slaanesh(
        'Seeker of Decadence', keeper_of_secrets="N'Kari") == (
        "Seeker of DecadenceN'Kari")

def test_beartype_pass_param_keyword_only() -> None:
    '''
    Test bear typing of a function call successfully passed an annotated
    keyword-only parameter following an `*` or `*args` parameter.
    '''
    # Import this decorator.
    from beartype import beartype

    # Function to be type checked.
    @beartype
    def changer_of_ways(sky_shark: str, *, chaos_spawn: str) -> str:
        return sky_shark + chaos_spawn

    # Call this function with keyword arguments and assert the expected return
    # value.
    assert changer_of_ways(
        'Screamers', chaos_spawn="Mith'an'driarkh") == (
        "ScreamersMith'an'driarkh")

def test_beartype_pass_param_tuple() -> None:
    '''
    Test bear typing of a function call successfully passed a parameter
    annotated as a tuple.
    '''
    # Import this decorator.
    from beartype import beartype

    # Function to be type checked.
    @beartype
    def genestealer(tyranid: str, hive_fleet: (str, int)) -> str:
        return tyranid + str(hive_fleet)

    # Call this function with each of the two types listed in the above tuple.
    assert genestealer(
        'Norn-Queen', hive_fleet='Behemoth') == 'Norn-QueenBehemoth'
    assert genestealer(
        'Carnifex', hive_fleet=0xDEADBEEF) == 'Carnifex3735928559'

def test_type_check_pass_param_custom() -> None:
    '''
    Test bear typing of a function call successfully passed a parameter
    annotated as a user-defined rather than builtin type.
    '''
    # Import this decorator.
    from beartype import beartype

    # User-defined type.
    class CustomTestStr(str):
        pass

    # Function to be type checked.
    @beartype
    def hrud(gugann: str, delphic_plague: CustomTestStr) -> str:
        return gugann + delphic_plague

    # Call this function with an instance of this user-defined type.
    assert hrud(
        'Troglydium hruddi', delphic_plague=CustomTestStr('Delphic Sink')) == (
        'Troglydium hruddiDelphic Sink')

def test_type_check_pass_typing_module() -> None:
    '''
    Test bear typing of a function call successfully passed a parameter
    annotated with an abstract type from the typing module.
    '''
    from beartype import beartype
    import typing

    MyMap = typing.Mapping

    @beartype
    def function(par: MyMap, ameter: MyMap) -> MyMap:
        result = par.copy()
        result.update(ameter)
        return result

    assert function({1: 1}, {2: 2}) == {1: 1, 2: 2}

def test_type_check_pass_parameterized_typing_module() -> None:
    '''
    Test bear typing of a function call successfully passed a parameter
    annotated with a parameterized abstract type from the typing module.
    '''
    from beartype import beartype
    import typing

    MyMap = typing.Mapping[str, int]

    @beartype
    def function(par: MyMap, ameter: MyMap) -> MyMap:
        result = par.copy()
        result.update(ameter)
        return result

    assert function({1: 1}, {2: 2}) == {1: 1, 2: 2}

# ....................{ TESTS ~ pass : return }....................
def test_type_check_pass_return_none() -> None:
    '''
    Test bear typing of a function call successfully returning `None` and
    annotated as such.
    '''
    # Import this decorator.
    from beartype import beartype

    # Function to be type checked.
    @beartype
    def xenos(interex: str, diasporex: str) -> None:
        interex + diasporex

    # Call this function and assert no value to be returned.
    assert xenos(
        'Luna Wolves', diasporex='Iron Hands Legion') is None

# ....................{ TESTS ~ fail }....................
def test_beartype_fail_keyword_unknown() -> None:
    '''
    Test bear typing of an annotated function call passed an unrecognized
    keyword parameter.
    '''
    # Import this decorator.
    from beartype import beartype

    # Annotated function to be type checked.
    @beartype
    def tau(kroot: str, vespid: str) -> str:
        return kroot + vespid

    # Call this function with an unrecognized keyword parameter and assert the
    # expected exception.
    with pytest.raises(TypeError) as exception:
        tau(kroot='Greater Good', nicassar='Dhow')

    # For readability, this should be a "TypeError" synopsizing the exact issue
    # raised by the Python interpreter on calling the original function rather
    # than a "TypeError" failing to synopsize the exact issue raised by the
    # wrapper type-checking the original function. Since the function
    # annotations defined above guarantee that the exception message of the
    # latter will be suffixed by "not a str", ensure this is *NOT* the case.
    assert not str(exception.value).endswith('not a str')

def test_beartype_fail_param_name() -> None:
    '''
    Test bear typing of a function accepting a parameter name reserved for
    use by the `@beartype` decorator.
    '''
    # Import this decorator.
    from beartype import beartype

    # Define a function accepting a reserved parameter name and assert the
    # expected exception.
    with pytest.raises(NameError):
        @beartype
        def jokaero(weaponsmith: str, __beartype_func: str) -> str:
            return weaponsmith + __beartype_func

# ....................{ TESTS ~ fail : type }....................
def test_beartype_fail_param_type() -> None:
    '''
    Test bear typing of an annotated function call failing a parameter type
    check.
    '''
    # Import this decorator.
    from beartype import beartype

    # Annotated function to be type checked.
    @beartype
    def eldar(isha: str, asuryan: (str, int)) -> str:
        return isha + asuryan

    # Call this function with an invalid type and assert the expected exception.
    with pytest.raises(TypeError):
        eldar('Mother of the Eldar', 100.100)

def test_beartype_fail_return_type() -> None:
    '''
    Test bear typing of an annotated function call failing a return type
    check.
    '''
    # Import this decorator.
    from beartype import beartype

    # Annotated function to be type checked.
    @beartype
    def necron(star_god: str, old_one: str) -> str:
        return 60e6

    # Call this function and assert the expected exception.
    with pytest.raises(TypeError):
        necron("C'tan", 'Elder Thing')

# ....................{ TESTS ~ fail : annotation }....................
def test_beartype_fail_annotation_param() -> None:
    '''
    Test bear typing of a function with an unsupported parameter annotation.
    '''
    # Import this decorator.
    from beartype import beartype

    # Assert the expected exception from attempting to type check a function
    # with a parameter annotation that is *NOT* a type.
    with pytest.raises(TypeError):
        @beartype
        def nurgle(nurgling: str, great_unclean_one: 'Bringer of Poxes') -> str:
            return nurgling + great_unclean_one

def test_beartype_fail_annotation_return() -> None:
    '''
    Test bear typing of a function with an unsupported return annotation.
    '''
    # Import this decorator.
    from beartype import beartype

    # Assert the expected exception from attempting to type check a function
    # with a return annotation that is *NOT* a type.
    with pytest.raises(TypeError):
        @beartype
        def tzeentch(disc: str, lord_of_change: str) -> 'Player of Games':
            return disc + lord_of_change
@gilbertohasnofb

Shouldn't the second print below raise an exception, since its argument is not a list of floats (List[float])? It seems to me that beartype is not checking the contents of the list.

from typing import List

@beartype
def foobar(l: List[float]) -> bool:
    return len(l) > 0

print(foobar([1.3, 5.4, 3.8]))
print(foobar([1, None, True]))

@terrdavis (Author)

@gilbertohasnofb
The Stack Overflow answer introducing this decorator addresses your question in its "Caveat II: No PEP 484" section:
https://stackoverflow.com/questions/19684434/best-way-to-check-function-arguments-in-python/37961120#37961120

Caveat II: No PEP 484

PEP 484 ("Type Hints") formalized the use of function annotations first introduced by PEP 3107 ("Function Annotations"). Python 3.5 superficially supports this formalization with a new top-level typing module, a standard API for composing arbitrarily complex types from simpler types (e.g., Callable[[Arg1Type, Arg2Type], ReturnType], a type describing a function accepting two arguments of type Arg1Type and Arg2Type and returning a value of type ReturnType).

Bear typing supports none of them. In theory, it could. But not in 275 lines or less and certainly not as a stackoverflow answer.

Bear typing does, however, support unions of types in the same way that the isinstance() builtin supports unions of types: as tuples. This superficially corresponds to the typing.Union type – with the obvious caveat that typing.Union supports arbitrarily complex types, while tuples accepted by @beartype support only simple classes. In my defense, 275 lines.
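
For concreteness, a minimal sketch of that tuple-as-union behaviour (the render() function here is hypothetical, not part of the gist above):

from beartype import beartype

# A tuple annotation acts like a union: the argument may be an int or a str.
@beartype
def render(value: (int, str)) -> str:
    return str(value)

render(3)        # passes: int is a member of the tuple
render('three')  # passes: str is a member of the tuple
render(3.5)      # raises TypeError: float is in neither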

@albanie commented Mar 23, 2020

This is a fantastic tool. Although it is beyond the scope of the original SO answer by @leycec, it would be tremendously useful to support complex types (beyond using tuples for Union). 🙏

@leycec commented Apr 1, 2020

Fascinating. I'm delighted that @beartype continues to draw collective interest from the community at large. This is PR speak for: "Fuck, yeah!"

Since my initial posting, I've significantly expanded upon the initial implementation in dark and forbidden secrecy. I wasn't convinced that anyone else was actually interested in a lightweight, pure-Python, efficiency-driven type checking solution, so I sequestered my work within the smelly guts of a multiphysics biology simulator co-authored by my wife and I. "So it goes," to paraphrase the Vonnegut.

Perhaps... I was wrong. If there is sufficient interest, extracting that prior work into a reusable open-source PyPI-hosted package only seems sensible. Proper support for the subset of PEP 484 that can be efficiently implemented in pure Python would definitely be on the table, too. Significant questions include:

  • @beartype? It's a unique name to be sure. I like it – and that certainly matters. But do you like it? Mash that Japanese Ogre emoji if the current name makes you want to punch me in my bald(ing) head: 👹
  • PEP 484. Which complex types exactly are of interest? I'm not as familiar with PEP 484 as I should be, 'cause... I just @beartype.
  • Container types. @beartype intentionally ignores the contents of non-string containers (e.g., dict, list, tuple). Why? Because validating the types of all n items in an arbitrarily large container is an O(n) operation in pure Python for each function and method call accepting that container – and that's optimistically assuming those items are all non-container primitives (e.g., str, int, float). Since that's a bit insane, I'm unconvinced that can be done efficiently. But maybe it either can (e.g., by only doing so under JITs like PyPy3 or Numba or by implicitly converting containers type-hinted as containing only primitive types like List[Int] into comparable NumPy arrays and then catching raised exceptions, which might be reasonably efficient assuming NumPy does so in low-level C or Fortran or [insert language here], or by only doing so once for the first function or method call accepting that container and then internally caching the result, except you then need to invalidate that cache on the next modification to that container and I have no idea if that's even pragmatically feasible) or nobody actually cares about efficiency because this is Python (except they usually do in practice, despite what they say in theory). Moreover, there be subtle dragons here. Consider self-referential nested containers like linked list loops, for example. Guarding against that sort of devilry requires careful consideration and lots of unit tests. Lots and lots and loooooots of unit tests. At the time, I couldn't be bothered. Now that other developers maybe care, though, I wonder if this dangerous and risky feature request isn't worth a second look?

Hit me with your best commentary, friendly GitHubbers.

@albanie commented Apr 1, 2020

  1. The name is great.
  2. The ones I tend to use frequently are typing.Dict, typing.List, typing.Union, typing.Tuple, typing.Set in their nested forms (not sure if this is typical). It seems to help me a lot with reading code that processes nested data structures.
  3. I agree with the concerns about runtime. It depends somewhat on your philosophy, but would it be criminal to suggest a "lightweight mode" in which we only validate the first (or first, middle, and last) element of each container? A second option would be to randomly sample a fixed number of items from the container to validate so that we get the right behaviour in expectation. I would happily use these over non-checks in code on a critical path. In both cases, there would admittedly be a challenge in communicating the behaviour to the user. One approach could be to pick an appropriate decorator argument (e.g. @beartype(mode="light")), as sketched below.
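
For what it's worth, a rough sketch of what such a decorator argument could look like (a hypothetical API, since the gist's @beartype takes no arguments, with the actual checks elided):

import functools

def beartype(mode="full"):
    # Hypothetical decorator factory: "mode" selects how deeply container
    # contents would be checked ("light" = sample only a few items).
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Annotation-driven checks would go here, consulting "mode" to
            # decide whether (and how many) container items to inspect.
            return func(*args, **kwargs)
        return wrapper
    return decorator

@beartype(mode="light")
def mean(values: list) -> float:
    return sum(values) / len(values)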

@leycec commented Apr 2, 2020

@albanie: Noice. In a time of economic bloodletting and viral fire, this is soothing mind-candy for the perennially news-addled. Thanks a metric ton for that rapid and boisterous response. (This must be why you're labelled PRO.)

  1. I can only concur. Type-checking Canadian brown bears for all! 🐻
  2. Excellent. Indeed, the self-documenting nature of nested typing (e.g., def muh_undocumented_func(muh_list: List[Dict[int, str]], muh_option: Optional[Tuple[float, ...]]) -> Union[str, List[str]]) would be almost as useful as the hard contractual guarantees afforded by type-checking. Efficiency aside, the gain in readability and maintainability is clear. Since all but one of the typing types you've listed describe containers, this leads us straightaway to...
  3. No, Sir. That would not be criminal; that would rather be sane. Because you make excellent and useful suggestions, let's break this out into a deeper dive.

Deep container type-checking for the posterity of future generations

First, I wonder if anyone who's really thought the time complexity implications through would even want a full-blown "heavyweight mode."

Exhibit A: a @beartype(mode="heavy") option forcing arbitrarily large and nested containers to be deeply and repeatedly type-checked on each decorated call. Personally, I might consider that tantamount to a security risk; I'm unsure I'd ever want that. Uninformed and/or intentionally malicious external callers could bring an innocent system to its computational knees merely by passing a sufficiently large data structure to a @beartype(mode="heavy")-decorated callable – unwittingly increasing attack surface to trivial denial-of-service (DoS) exploits. Of course, that's only under CPython. Under Cython & PyPy & JIT Friends, deeply and repeatedly type-checking arbitrarily large containers on each decorated call might be a bit more tractable... might. I'd still personally avoid it, especially on front-facing applications accepting user input. Nobody wants that awkward call from Legal during a Saturday-night bender spree.

Second, that leaves @beartype(mode="light"). I actually like both suggestions (with only mild caveats):

  1. Validate only k non-randomly-sampled items, where k is some negligibly small magic number. As you say, k=3 seems good and sane. For integer-indexable sequences (e.g., list, tuple), validating only the first, middle, and last items ala Quicksort-style partitioning algorithms also seems good and sane. For non-integer-indexable sequences (e.g., dict, set), validating only the first three key-value pairs or what not also seems good and sane. For other iterables, care would need to be taken to avoid consuming one-time-only non-reusable constructs like generators and iterators. That's probably just implementation details, so... no concerning concerns here. Yay!
  2. Validate only j randomly sampled items, where j is again some negligibly small magic number. In theory, this seems preferable; given a sufficient number of @beartype(mode="light")-decorated calls passed the same container, random sampling should eventually increase statistical significance to the point where you've got a quasi-hard contractual guarantee. In practice, the algorithmic devil's in the details.

The naive implementation simply calls the stdlib muh_random_sample = random.sample(muh_original_container, k=3) function to obtain a new list muh_random_sample containing three randomly sampled items shallow-copied from muh_original_container, which the decorator would then type-check. Of course, this naive implementation is awful. I'd hoped that random.sample() would have been implemented in C. Sadly, it isn't. For k=3, that function creates and returns one new list of size 3 and, depending on whether len(muh_original_container) >= 21, internally creates either another new list of size len(muh_original_container) for a single O(k) loop or one new set of size 3 for a double O(k**2) loop. In either case, I'm mostly unimpressed. (This also ignores the _randbelow() function repeatedly called by random.sample(), which looks to have awkward worst-case time complexity.)

The non-naive implementation avoids instantiating or iterating over containers during type-checking by:

  • Manually generating three possibly duplicate random indices in [0, n] for n = len(muh_original_container). Filtering out duplicates only needlessly consumes time and space itself and becomes increasingly irrelevant as container size increases – in theory, anyway. The trial by fire here is doing so efficiently. For such a core module, stdlib random is disturbingly inefficient. If there's one thing you want to optimize, it's core data structures. If there's two things, it's core data structures and random number generation. The private C-based random._random.Random superclass seems efficient and awesome, but the public mostly pure-Python random.Random subclass inheriting that superclass... oh, boy. For our purposes, the only "safe" callables in the random namespace appear to be the C-based random() and getrandbits() bound methods. Everything else is absolute cray-cray. If you like feeling depressed, read more about this here. Given that, the optimal method for generating three pseudo-random integers in Python (ignoring typically non-ignorable issues like distribution skewing) is: drunken drum roll, please
n = len(muh_container)
n_bit_len = n.bit_length()
muh_first_random_index = random.getrandbits(n_bit_len) % n
muh_second_random_index = random.getrandbits(n_bit_len) % n
muh_third_random_index = random.getrandbits(n_bit_len) % n
  • Manually type-checking the items at these indices without iteration.

Unhappily, the optimal method for generating three pseudo-random integers in Python still sucks. Five assignments, five method calls, and three modulo operations for each container for each decorated call is... unseemly.

Happily, nobody probably cares whether items are randomly or deterministically selected. So, the trivial first, middle, and last implementation (which still suffers edge cases for empty containers and containers containing only one or two items) seems optimal-ish.
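
A minimal sketch of that first/middle/last selection (a standalone helper with hypothetical names, not the gist's code; empty and one- or two-item containers simply check whatever exists):

def check_sample_types(sequence, expected_type):
    # Type-check only the first, middle, and last items of an indexable
    # sequence: a "light" check, not a guarantee about every item.
    n = len(sequence)
    if n == 0:
        return  # nothing to check
    # For one- or two-item sequences these indices simply collapse together.
    for index in {0, n // 2, n - 1}:
        item = sequence[index]
        if not isinstance(item, expected_type):
            raise TypeError(
                'item {!r} at index {} not a {!r}'.format(
                    item, index, expected_type))

check_sample_types([1.0, 2.0, 3.0, 4.0], float)   # passes
check_sample_types(['a', 2.0, 'c'], str)          # raises TypeError (middle item)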

It's late here in Ontario, so let's say for the sake of my sleep-deprived brainpan that that's the best we can possibly do.

@leycec commented Apr 3, 2020

[note to future self] My wife and I also resolved this prior hypothetical:

Validating the types of all n items in an arbitrarily large container... can be done efficiently... by only doing so once for the first function or method call accepting that container and then internally caching the result, except you then need to invalidate that cache on the next modification to that container and I have no idea if that's even pragmatically feasible.

That is pragmatically feasible – but only by defining @beartype-specific container subclasses inheriting stdlib container classes (e.g., class bearlist(list), class beardict(dict)), which then override various dunder methods to set internal dirty flags on external modifications. Type-checking the contents of those containers is then an O(n) operation amortized over all decorated calls rather than an O(n) operation for each decorated call, which seems more than tractable.
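
A bare-bones sketch of that dirty-flag idea (a hypothetical class; only a couple of mutators shown for brevity):

class bearlist(list):
    # List subclass setting a dirty flag on mutation, so a decorator could
    # re-validate contents only when the flag is set and then clear it.
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._bear_dirty = True  # never validated yet

    def append(self, item):
        self._bear_dirty = True
        super().append(item)

    def __setitem__(self, index, value):
        self._bear_dirty = True
        super().__setitem__(index, value)

items = bearlist([1.0, 2.0])
items._bear_dirty = False      # pretend a decorator just validated the contents
items.append(3.0)              # mutation flips the flag back on
assert items._bear_dirty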

The obvious disadvantage is that callers would then need to instantiate non-standard bearlist and beardict subclasses rather than standard list and dict classes throughout the entirety of their codebase to see any meaningful benefit. Would anyone actually do that? Probably not, but a basement-dwelling neckbeard can still dream. 😴

Up next: a new GitHub-hosted beartype project probably leveraging a post-modern poetry workflow. It's time to sling some code, folks.

@leycec commented Apr 3, 2020

A placeholder beartype GitHub organization and beartype GitHub repository are now live!

The existing implementation hidden in the BETSE codebase is our initial launching pad – with full PEP 484 compliance as our eventual end goal. I'm confident we can hit most of the goalposts, but only scarce volunteer time and copious pull requests will tell. If anyone has any further ideas (grandiose, practical, or otherwise), please do abuse our issue tracker as your own personal whiteboard.

dis goin' be fun 🎉

@albanie commented Apr 3, 2020

This looks great. Another argument in favour of deterministic checking could be: possibly not everyone fixes random seeds at the start of their programs, so it may be a benefit if, by default, they hit the same type error twice in a row when re-running their code.

Also, thanks for the rng link, this is useful knowledge :)

@leycec commented Apr 4, 2020

Right? Now I know that random is so randomly balls-to-the-walls that I no longer trust it. Oh, precious ignorance: how I sorely miss thee.

[Mildly on-topic] I'm currently packaging poetry for Gentoo Linux and faceplanting into unexpected caltrops. I still believe that poetry is the only sane path forward for Python packaging, but... dey sure ain't makin' a brother's life any easier. Once I've dispatched that temporary villain, the cleansing flame of @beartype will surely be ignited for all to bask in. 🔥

@leycec commented Apr 4, 2020

...I no longer believe that poetry is the only sane path forward for Python packaging. In fact, I now believe that poetry is fundamentally insane. Why? Because poetry itself is hostile to packaging by system package managers (e.g., apt, brew, emerge, pacman). Thanks to that, @beartype will (at least initially) leverage the standard setuptools workflow. I may hate setuptools, but I hate poetry even more. So much rage. Cue bloody-face Doom guy. :rage4:

Expect sweet, sweet GitHub action on Monday. For now, we play video games and consume legal refreshments. Friday night, I summon you!

@leycec commented May 21, 2020

beartype 0.1.0 is now live (and maybe worky) on PyPi, much to the astonishment of my weary brainpan: 😪

pip3 install beartype

Partial PEP 484 compliance (e.g., typing.Union, typing.Optional) is planned for the eventual 1.0.x release cycle. Although that may be just a mote in God's eye at the moment, equivalent functionality is already available today via tuples of types, as described in the caveat above.

Thanks again for all the interest, all! Let's make runtime type checking in pure Python the envy of the static type checking world.
