
@nascheme
Last active September 16, 2022 19:28

Why is this considered a security bug?
--------------------------------------
Put another way: many computing operations take longer as their input grows,
so why is this specific issue considered a security bug?

It is quite common for Python code implementing network protocols and data
serialization to call int(untrusted_string_or_bytes_value) on input to get a
numeric value without first limiting the input length, or to call
log("processing thing id %s", unknowingly_huge_integer), or to convert an int
to a string in some similar way without first checking its magnitude. Affected
areas include http, json, xmlrpc, logging, loading large values into an integer
via linear-time conversions (such as hexadecimal stored in yaml), and anything
that computes large values from user-controlled inputs and later outputs them
as decimal. All of these can suffer a CPU-consuming denial of service (DoS)
when given untrusted data.

The input size needed to trigger long processing times is large, but not
larger than what common network software and services routinely accept. For
example, a few megabytes of input data could cause multiple seconds of
processing time. For an operation that would normally complete in
milliseconds or microseconds, that can effectively be a DoS of the service.

Why was it required to be changed in a bugfix release?
------------------------------------------------------
Auditing all existing Python code for this problem, adding length guards, and
maintaining that practice everywhere is neither feasible nor, we believe, what
the vast majority of our users want to do.

For the much smaller set of users and software that require the old behavior
of unlimited digits, it can be restored with the global environment variable
PYTHONINTMAXSTRDIGITS=0 or by calling sys.set_int_max_str_digits(0).

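
A sketch that checks these settings by starting fresh child interpreters with `subprocess`. `limit_in_child()` is a hypothetical helper; it assumes an interpreter that supports `sys.get_int_max_str_digits()` and also exercises the `-X int_max_str_digits` command-line option, a third way to set the same value.

```python
import os
import subprocess
import sys

REPORT = "import sys; print(sys.get_int_max_str_digits())"


def limit_in_child(args=(), env_overrides=None, code=REPORT) -> int:
    """Run a fresh interpreter and return the digit limit it reports."""
    env = dict(os.environ)
    env.pop("PYTHONINTMAXSTRDIGITS", None)  # start from the built-in default
    env.update(env_overrides or {})
    proc = subprocess.run(
        [sys.executable, *args, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(proc.stdout)


print("default:  ", limit_in_child())
print("env var:  ", limit_in_child(env_overrides={"PYTHONINTMAXSTRDIGITS": "0"}))
print("-X option:", limit_in_child(args=["-X", "int_max_str_digits=0"]))
print("sys call: ", limit_in_child(
    code="import sys; sys.set_int_max_str_digits(0); "
         "print(sys.get_int_max_str_digits())"))
```
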
Why not add limited_int() instead of changing int()?
----------------------------------------------------
As noted above, changing all existing code to use `limited_int()` would be a
huge task and something we believe the vast majority of our users don't want
to do. Also, some separate mitigation would still be needed for the int-to-str
case.

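
For comparison, a hypothetical opt-in `limited_int()` might look like the sketch below. The name comes from the question; the `max_digits` parameter and the length check are assumptions. Every call site would have to switch to it, and it does nothing for the int-to-str direction, which would still need its own guard.

```python
def limited_int(text: str, max_digits: int = 4300) -> int:
    """Hypothetical opt-in replacement for int() on untrusted decimal text.

    Rejects over-long input before calling int(), so the quadratic
    conversion never runs on attacker-sized strings.
    """
    # Count characters after removing surrounding whitespace and one leading sign.
    body = text.strip()
    if body[:1] in ("+", "-"):
        body = body[1:]
    if len(body) > max_digits:
        raise ValueError(
            f"integer string too long: {len(body)} > {max_digits} characters")
    return int(text)


print(limited_int(" -123 "))
try:
    limited_int("1" * 10_000)
except ValueError as exc:
    print(exc)
```
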
Why choose 4300 as the default limit?
-------------------------------------
It was chosen as a limit that is high enough to allow commonly used libraries
to work correctly and low enough that even relatively weak CPUs will not be
vulnerable to a DoS attack. The limit is easy to raise with the global
environment variable (e.g. PYTHONINTMAXSTRDIGITS=400000) if, for example, you
have a fast CPU.

That limit seems too low, why not something higher?
---------------------------------------------------
Any limit is likely going to break some code somewhere. We expect a "long tail"
distribution, so a limit 10x the current one would only allow slightly more
code to work. We expect the vast majority of code to work fine with the
default limit. For the code that doesn't, it is better to disable the limit or
set it to something appropriate for that usage; such a usage-specific limit is
unlikely to be suitable as an out-of-the-box default anyway.

Why a global interpreter setting rather than something else?
------------------------------------------------------------
First, a global interpreter setting was the least complex to implement and
carries the least risk when backporting to older Python versions. Second, in
most cases, we expect that fine-grained control of the limit is not required:
either the default global limit is fine or a new global limit would be set.
Future versions of Python, such as 3.12, may provide ways to set the limit at
a finer level (e.g. a context manager).

Why not a keyword parameter for int()?
--------------------------------------
Having a keyword that defaults to the "safe" or limited mode would be an
option, but there is no convenient place for such a keyword in the int-to-str
case (str(), f-strings, %-formatting, and so on). So, the global interpreter
setting is the simpler approach.

Can’t we just fix the algorithms to be faster?
----------------------------------------------
Implementing sub-quadratic time algorithms for decimal int-to-str and
str-to-int is possible. However, it's not something practical to put into a
bugfix release. Some work is being done for better algorithms in Python 3.12.
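
To illustrate why sub-quadratic conversion is possible, here is a minimal divide-and-conquer sketch for str-to-int. It is not CPython's implementation: `str_to_int_dc()` and the 512-digit base case are assumptions, it handles only unsigned ASCII decimal digits, and its speed relies on CPython's Karatsuba multiplication for large operands, which makes the whole conversion roughly O(n^1.58) rather than O(n^2).

```python
from functools import lru_cache

BASE_CASE_DIGITS = 512  # assumption: below this, built-in int() is fast enough


@lru_cache(maxsize=None)
def _pow10(k: int) -> int:
    """10**k, cached because the same split sizes recur."""
    return 10 ** k


def _convert(digits: str) -> int:
    n = len(digits)
    if n <= BASE_CASE_DIGITS:
        return int(digits)  # small enough for the quadratic algorithm
    half = n // 2
    high, low = digits[:-half], digits[-half:]  # `low` has exactly `half` digits
    # value = high * 10**half + low; the large multiplication is sub-quadratic.
    return _convert(high) * _pow10(half) + _convert(low)


def str_to_int_dc(text: str) -> int:
    """Divide-and-conquer decimal parsing for unsigned ASCII digit strings."""
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not an unsigned decimal string: {text[:20]!r}")
    return _convert(text)


print(str_to_int_dc("1" + "0" * 20_000) == 10 ** 20_000)
```

Because the result is built arithmetically, this sketch never consults the global digit limit; it is faster, but still superlinear, so it would not remove the need for a limit on its own.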

Can’t we just fix the software calling int() and formatting long integers?
--------------------------------------------------------------------------
Sanitization and validation of untrusted input data is always a good idea.
However, calling `int(untrusted_string_or_bytes_value)` or
`print(f"got: {unknowingly_huge_integer}")` is very common. The amount of code
that would need to be fixed is vast, and fixing it all is unlikely to happen,
at least on any reasonable time scale.

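
For code that does get fixed, the int-to-str side can be guarded cheaply because `int.bit_length()` bounds the number of decimal digits without doing any decimal conversion. A minimal sketch, with hypothetical helper names and an arbitrary 1,000-digit cutoff:

```python
import math

LOG10_2 = math.log10(2)


def approx_decimal_digits(n: int) -> int:
    """Estimate len(str(abs(n))) from the bit length; may be one digit too high.

    An int with b bits is below 2**b, so it has at most
    floor(b * log10(2)) + 1 decimal digits.
    """
    return int(n.bit_length() * LOG10_2) + 1


def safe_int_str(n: int, max_digits: int = 1_000) -> str:
    """Format n in decimal only when it is known to be short; else summarize."""
    estimate = approx_decimal_digits(n)
    if estimate <= max_digits:
        return str(n)
    return f"<int with about {estimate} decimal digits>"


print(safe_int_str(-12345))
print(safe_int_str(10 ** 5_000))  # summarized without any decimal conversion
```
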
@zooba

zooba commented Sep 15, 2022

All of these can suffer a CPU consuming DoS in the face of untrusted data.

Might split out the DoS term into its own idea, given the exchange between Tim and David in the thread.

Maybe also add a section on the process:

Why wasn't it discussed publicly before making releases?

The PSRT discussed the issue internally, including with the steering council, and determined that the risk of exploitation would increase if they were to disclose it without a fix, and that there was no reasonable mitigation available without a patch. As the patch was going to be fairly invasive (multiple files and user-visible APIs), it could not have been easily applied by builders, and so security releases were scheduled for a few days after disclosure.
The initial change leaves the user in full control of the conversion limit, or else massive pressure would have been applied to every individual library developer to patch their own library for security. Future development can properly enable libraries to manage their own exceptions to the user preference, though we hope that libraries will largely respect their users' wishes.

Other than that, 👍
