Last active
September 16, 2022 19:28
Why is this considered a security bug?
--------------------------------------

Or, put differently: certain computing operations can take a long time,
depending on the size of the input data. Why is this specific issue considered
a security bug?

It is quite common for Python code implementing network protocols and data
serialization to call int(untrusted_string_or_bytes_value) on input to get a
numeric value without first limiting the input length. It is just as common to
call log("processing thing id %s", unknowingly_huge_integer), or otherwise
convert an int to a string, without first checking its magnitude. Examples
include http, json, xmlrpc, logging, loading large values into an integer via
linear-time conversions (such as hexadecimal stored in yaml), and anything that
computes larger values from user-controlled inputs and later attempts to output
them as decimal. All of these can suffer a CPU-consuming DoS in the face of
untrusted data.

The size of input data required to trigger long processing times is large, but
not larger than what common network software and services may allow. For
example, a few megabytes of input data could cause multiple seconds of
processing time. For an operation that would normally complete in milliseconds
or microseconds, that can effectively be a DoS of the service.
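
To get a feel for how conversion time grows with input size, here is a minimal
timing sketch. The helper name `time_int_parsing` and the chosen digit counts
are illustrative assumptions, and the numbers you see will depend on your CPU
and Python version. The sketch lifts the digit limit temporarily (where the
interpreter has one) so that the large conversions are permitted.

```python
import sys
import time


def time_int_parsing(digit_counts):
    """Return (digits, seconds) pairs for int() on strings of the given lengths.

    Hypothetical helper for illustration only.
    """
    # Interpreters that include the limit expose these sys functions; lift the
    # limit for the duration of the measurement and restore it afterwards.
    has_limit = hasattr(sys, "set_int_max_str_digits")
    if has_limit:
        old_limit = sys.get_int_max_str_digits()
        sys.set_int_max_str_digits(0)
    try:
        results = []
        for n in digit_counts:
            text = "9" * n
            start = time.perf_counter()
            int(text)  # the decimal str-to-int conversion being measured
            results.append((n, time.perf_counter() - start))
        return results
    finally:
        if has_limit:
            sys.set_int_max_str_digits(old_limit)


if __name__ == "__main__":
    for n, seconds in time_int_parsing([50_000, 100_000, 200_000]):
        print(f"{n:>8} digits: {seconds:.4f} s")
```

With a quadratic algorithm, doubling the number of digits should roughly
quadruple the time, which is why a few megabytes of digits become a problem.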

Why was it required to be changed in a bugfix release?
------------------------------------------------------

Auditing all existing Python code for this problem, adding length guards, and
maintaining that practice everywhere is not feasible, nor do we believe it is
what the vast majority of our users want to do.

For the much smaller set of users and software that require the old behavior
of unlimited digits, the limit can be disabled with a global environment
variable (PYTHONINTMAXSTRDIGITS=0) or by calling sys.set_int_max_str_digits(0).
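
As a rough illustration of what the limit does at run time, the sketch below
sets the limit and records which conversions raise ValueError. The function
name `demo_limit` and the result keys are made up for this example; only
`sys.get_int_max_str_digits()` and `sys.set_int_max_str_digits()` are real
APIs. It assumes an interpreter that includes the limit and returns None
otherwise.

```python
import sys


def demo_limit(limit=4300):
    """Report which conversions are refused under the given digit limit.

    Hypothetical helper; `limit` must be 0 (unlimited) or at least the
    interpreter's minimum allowed threshold.
    """
    if not hasattr(sys, "set_int_max_str_digits"):
        return None  # this interpreter predates the limit
    old_limit = sys.get_int_max_str_digits()
    sys.set_int_max_str_digits(limit)
    try:
        refused = {}
        try:
            int("1" * (limit + 1))  # one digit over the limit
            refused["decimal_str_to_int"] = False
        except ValueError:
            refused["decimal_str_to_int"] = True

        big = 10 ** limit  # limit + 1 decimal digits; arithmetic is unaffected
        try:
            str(big)
            refused["int_to_decimal_str"] = True if False else False
        except ValueError:
            refused["int_to_decimal_str"] = True

        try:
            int("f" * (limit + 1), 16)  # power-of-two bases are not limited
            refused["hex_str_to_int"] = False
        except ValueError:
            refused["hex_str_to_int"] = True
        return refused
    finally:
        sys.set_int_max_str_digits(old_limit)


if __name__ == "__main__":
    print(demo_limit())
```

Only the decimal conversions in either direction are refused; the integer
itself can still be computed, stored, and written out in hexadecimal.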

Why not instead of changing int(), add limited_int()?
-----------------------------------------------------

As above, changing all existing code to use `limited_int()` would be a huge
task and something we believe the vast majority of our users don't want to do.
Also, some mitigation would still be needed for the int-to-str case.
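
For concreteness, a `limited_int()` along these lines might look like the
sketch below. This is a hypothetical helper, not something that exists in
Python; the signature and the `max_digits` default are assumptions, and it
only handles str input. Every call site that currently uses int() would have
to be switched over to it by hand.

```python
def limited_int(text, max_digits=4300):
    """Hypothetical opt-in parser that rejects overlong decimal strings."""
    # Ignore surrounding whitespace and a leading sign when counting. Any
    # other characters (including underscores) count toward the length,
    # which errs on the conservative side.
    stripped = text.strip().lstrip("+-")
    if len(stripped) > max_digits:
        raise ValueError(
            f"refusing to parse {len(stripped)} characters as an integer "
            f"(limit is {max_digits})"
        )
    return int(text)


if __name__ == "__main__":
    print(limited_int("  -12345 "))
    try:
        limited_int("7" * 5000)
    except ValueError as exc:
        print("rejected:", exc)
```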

Why choose 4300 as default limit?
---------------------------------

It was chosen as a limit that is high enough to allow commonly used libraries
to work correctly and low enough that even relatively weak CPUs will not be
vulnerable to a DoS attack. It is fairly simple to increase the limit with a
global environment setting (e.g. PYTHONINTMAXSTRDIGITS=400000) if, for
example, you have a fast CPU.

That limit seems too low, why not something higher?
---------------------------------------------------

Any limit is likely going to break some code somewhere. It is expected that
there is a "long tail" distribution in effect, so a limit of 10x the current
one would only allow slightly more code to work. It is expected that the vast
majority of code will work fine with the default limit. For the code that
doesn't, it is better to disable the limit or set it to something appropriate
for that usage. A limit chosen for such code is unlikely to be suitable as an
out-of-the-box default.

Why global interpreter setting rather than something else?
----------------------------------------------------------

First, a global interpreter setting was the least complex to implement and
carries the least risk when backporting to older Python versions. Second, for
most cases, it is expected that fine-grained control of the limit is not
required. Either the default global limit is okay or a new global limit would
be set. It is possible that newer versions of Python, such as 3.12, will have
ways to set the limit at a finer level (e.g. a context manager).

Why not keyword parameter of int?
---------------------------------

Having a keyword that defaults to the "safe" or limited mode would be an
option, but there is no convenient keyword that could be used for the
int-to-str case. So, the global interpreter setting is the simple approach.

Can’t we just fix the algorithms to be faster?
----------------------------------------------

Implementing sub-quadratic time algorithms for decimal int-to-str and
str-to-int is possible. However, it's not something practical to put into a
bugfix release. Some work is being done on better algorithms for Python 3.12.
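
To give an idea of what a sub-quadratic str-to-int approach can look like,
here is a divide-and-conquer sketch. It is not the algorithm being developed
for CPython; the function name `parse_decimal` and the `chunk` size are
assumptions for illustration. It expects a string of ASCII digits only (no
sign, whitespace, or underscores) and relies on the interpreter multiplying
large integers faster than quadratically, which CPython does via Karatsuba
multiplication.

```python
def parse_decimal(digits, chunk=1000):
    """Convert a string of decimal digits to int by splitting it in half.

    Illustrative sketch only; assumes `digits` contains nothing but 0-9.
    """
    if len(digits) <= chunk:
        # Short pieces stay well under the default limit, so the built-in
        # conversion is both allowed and fast enough.
        return int(digits)
    mid = len(digits) // 2
    high, low = digits[:mid], digits[mid:]
    # value = high * 10**len(low) + low; the cost is dominated by one large
    # multiplication per level of recursion.
    return parse_decimal(high, chunk) * 10 ** len(low) + parse_decimal(low, chunk)


if __name__ == "__main__":
    value = parse_decimal("1" + "0" * 99_999)
    print(value == 10 ** 99_999)
```

Note that the result never passes through a single oversized int() call, so
the sketch works with the default limit in place.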

Can’t we just fix the software calling int() and formatting long integers?
--------------------------------------------------------------------------

Sanitization and validation of untrusted input data is always a good idea.
However, calling `int(untrusted_string_or_bytes_value)` or
`print(f"got: {unknowingly_huge_integer}")` is very common. The amount of code
that would need to be fixed is vast, and that is unlikely to happen, at least
on any reasonable time scale.
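
For the formatting side, the kind of guard each call site would need looks
something like the sketch below. Both function names are hypothetical. The
check uses `int.bit_length()`, which is cheap, to bound the number of decimal
digits before attempting the expensive conversion.

```python
import math


def decimal_digits_upper_bound(value):
    """Cheap upper bound on the number of decimal digits in abs(value)."""
    # An n-bit integer is smaller than 2**n, so it has at most
    # floor(n * log10(2)) + 1 decimal digits.
    return int(value.bit_length() * math.log10(2)) + 1


def safe_str_for_log(value, max_digits=4300):
    """Format an int for logging, summarizing it if it is suspiciously large."""
    if decimal_digits_upper_bound(value) > max_digits:
        return f"<int with {value.bit_length()} bits>"
    return str(value)


if __name__ == "__main__":
    print(safe_str_for_log(12345))
    print(safe_str_for_log(10 ** 5000))
```

Multiply that by every log statement, f-string, and serializer that might see
an attacker-influenced integer, and the scale of the task becomes apparent.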
Might split out the DoS term into its own idea, given the exchange between Tim and David in the thread.
Maybe also add a section on the process:
Other than that, 👍