nascheme/cve-2020-10735-faq.txt

## cve-2020-10735-faq.txt
Why is this considered a security bug?
--------------------------------------

Put another way: certain computing operations can take a long time, depending
on the size of the input data.  Why is this specific issue considered a
security bug?

It is quite common for Python code implementing network protocols and data
serialization to call int(untrusted_string_or_bytes_value) on input to get a
numeric value without first limiting the input length.  It is equally common to
call log("processing thing id %s", unknowingly_huge_integer), or otherwise
convert an int to a string, without first checking its magnitude.  Examples
include http, json, xmlrpc, logging, loading large values into an integer via
linear-time conversions (such as hexadecimal stored in yaml), and anything that
computes larger values from user-controlled inputs and later attempts to output
them as decimal.  All of these can suffer a CPU-consuming DoS in the face of
untrusted data.

The size of input data required to trigger long processing times is large, but
not larger than what common network software and services may allow.  For
example, a few megabytes of input data could cause multiple seconds of
processing time.  For an operation that would normally complete in milliseconds
or microseconds, that can effectively be a DoS of the service.


Why was it required to be changed in a bugfix release?
------------------------------------------------------

Auditing all existing Python code for this problem, adding length guards, and
maintaining that practice everywhere is not feasible, nor do we believe it is
what the vast majority of our users want to do.

For the vastly smaller set of users and software that require the old behavior
of unlimited digits, it can be enabled by a global system environment variable
(PYTHONINTMAXSTRDIGITS=0) or by calling sys.set_int_max_str_digits(0).


Why not instead of changing int(), add limited_int()?
-----------------------------------------------------

As above, changing all existing code to use `limited_int()` would be a huge
task and something we deem the vast majority of our users don't want to do.
Also, some mitigation would still be needed for the int-to-str case.
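
To make the cost concrete, here is a sketch of what such a helper might look
like.  `limited_int()` does not exist in Python.  The name, signature, and the
`MAX_DIGITS` constant are assumptions for illustration only.

```python
MAX_DIGITS = 4300  # illustrative cap, mirroring the default limit


def limited_int(text, max_digits=MAX_DIGITS):
    """Hypothetical helper: parse a decimal integer, refusing overlong input."""
    stripped = text.strip()
    digits = stripped.lstrip("+-")
    if len(digits) > max_digits:
        raise ValueError(
            f"integer literal has more than {max_digits} characters"
        )
    return int(stripped)
```

Every existing `int(untrusted_value)` call site would have to be found and
switched to this helper, and it does nothing for code that formats a huge
integer as a string.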


Why choose 4300 as default limit?
---------------------------------

It was chosen as a limit that's high enough to allow commonly used libraries
to work correctly and low enough that even relatively weak CPUs will not be
vulnerable to a DoS attack.  It is fairly simple to increase the limit with a
global environment setting (PYTHONINTMAXSTRDIGITS=400000), e.g. if you have
a fast CPU.
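
One way to check the effect of the environment setting is to start a child
interpreter with the variable set and ask it for its limit.  This is only a
sketch.  The helper name `limit_in_child` is made up, and it assumes the child
interpreter includes the fix.

```python
import os
import subprocess
import sys


def limit_in_child(env_value):
    """Hypothetical helper: report the limit seen by a child interpreter."""
    env = dict(os.environ, PYTHONINTMAXSTRDIGITS=env_value)
    result = subprocess.run(
        [sys.executable, "-c",
         "import sys; print(sys.get_int_max_str_digits())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


raised_limit = limit_in_child("400000")  # the higher limit from the text
disabled_limit = limit_in_child("0")     # 0 means no limit at all
```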


That limit seems too low, why not something higher?
---------------------------------------------------

Any limit is likely going to break some code somewhere.  It is expected that
there is a "long tail" distribution in effect, so a limit of 10x the current
one would only allow slightly more code to work.  It is expected that the vast
majority of code will work fine with the default limit.  For the code that
isn't fine, it is better to let the limit be disabled or set to something
that's appropriate for that usage.  A limit chosen for such a case is unlikely
to be suitable as an out-of-the-box default.


Why global interpreter setting rather than something else?
----------------------------------------------------------

First, a global interpreter setting was the least complex to implement and
carries the least risk when backporting to older Python versions.  Second, in
most cases, it is expected that fine-grained control of the limit is not
required.  Either the default global limit is okay or a new global limit would
be set.  It is possible that newer versions of Python, such as 3.12, will have
ways to set the limit at a finer level (e.g. a context manager).


Why not keyword parameter of int?
---------------------------------

Having a keyword that defaults to the "safe" or limited mode would be an option,
but there is no convenient keyword that could be used for the int-to-str case.
So, the global interpreter setting is the simple approach.


Can’t we just fix the algorithms to be faster?
----------------------------------------------

Implementing sub-quadratic time algorithms for decimal int-to-str and
str-to-int is possible.  However, it's not something practical to put into a
bugfix release.  Some work is being done for better algorithms in Python 3.12.


Can’t we just fix the software calling int() and formatting long integers?
--------------------------------------------------------------------------

Sanitization and validation of untrusted input data is always a good idea.
However, calling `int(untrusted_string_or_bytes_value)` or
`print(f"got: {unknowingly_huge_integer}")` is very common.  The amount of code
that would need to be fixed is vast, and that is unlikely to happen, at least on
any reasonable time scale.