Skip to content

Instantly share code, notes, and snippets.

@Kasimir123
Last active July 25, 2023 16:34
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save Kasimir123/9bdc38f2eac9a19f83ef6cc52c7ce82e to your computer and use it in GitHub Desktop.
Save Kasimir123/9bdc38f2eac9a19f83ef6cc52c7ce82e to your computer and use it in GitHub Desktop.
Skeleton of the patch that is being applied to projects vulnerable to CVE-2007-4559. Based off of code from pip's unpacking.py.
import os
def is_within_directory(directory, target):
abs_directory = os.path.abspath(directory)
abs_target = os.path.abspath(target)
prefix = os.path.commonprefix([abs_directory, abs_target])
return prefix == abs_directory
def safe_extract(tar, path=".", members=None, *, numeric_owner=False):
for member in tar.getmembers():
member_path = os.path.join(path, member.name)
if not is_within_directory(path, member_path):
raise Exception("Attempted Path Traversal in Tar File")
tar.extractall(path, members, numeric_owner=numeric_owner)
@mara004
Copy link

mara004 commented Nov 15, 2022

It looks like @TrellixVulnTeam may have started an automated mass submission of this patch.
But seriously, patching thousands of projects with boilerplate code can't be called a proper solution.
What we need is a simple way to extract tars safely using the standard library, like a prevent_traversal option to shutil.unpack_archive().

@mara004
Copy link

mara004 commented Feb 26, 2023

Apart from that, I find the patch both bloated and written in a fashion that is hard to understand.
Here's an alternative suggestion, using pathlib:

(Update: Warning, this code is not correct, see the following comments)

def safe_extract(tar, dest_dir: Path, **kwargs):
    dest_dir = dest_dir.resolve()
    for member in tar.getmembers():
        # if str(dest_dir) != os.path.commonprefix( [dest_dir, (dest_dir/member.name).resolve()] ):
        # if not (dest_dir/member.name).resolve().is_relative_to(dest_dir):  # python >= 3.9
        if not str( (dest_dir/member.name).resolve() ).startswith( str(dest_dir) ):
            raise RuntimeError("Attempted path traversal in tar archive (probably malicious).")
    tar.extractall(dest_dir, **kwargs)

@jowagner
Copy link

jowagner commented Feb 27, 2023

(1) False negatives, e.g. dest_dir = Path('/dest') and member.name = Path('/dest2').
(2) Why require the user to specify the target path twice, once as path and once as dest_dir = Path(path)? You could simply change dest_dir.resolve() to Path(path).resolve(), assuming that's how one initialises Path objects (you didn't show us from where you import this type).

@mara004
Copy link

mara004 commented Feb 27, 2023

(1) False negatives, e.g. dest_dir = Path('/dest') and member.name = Path('/dest2').

Oh, I see. Thanks! (I wrongly assumed str(path) would add a trailing slash to directories.)
The is_relative_to() strategy should work correctly, though?

You could simply change dest_dir.resolve() to Path(path).resolve(), assuming that's how one initialises Path objects (you didn't show us from where you import this type).

Imported from pathlib, of course. Doing Path(dest_dir).resolve() would be an option but isn't necessary if the caller uses pathlib already.

@mara004
Copy link

mara004 commented Mar 2, 2023

@jowagner Erm, doesn't the os.path.commonprefix() strategy have exactly the same false negative, also in @Kasimir123's (and pip's) code above? Shouldn't it rather be os.path.commonpath()?

@mara004
Copy link

mara004 commented Mar 2, 2023

>>> from pathlib import Path
>>> import os.path
>>> p1 = Path("/a/b/c")
>>> p2 = Path("/a/b/c2")
>>> os.path.commonprefix([p1, p2])
'/a/b/c'
>>> os.path.commonprefix([p1, p2]) == str(p1)
True
>>> os.path.commonpath([p1, p2])
'/a/b'
>>> p2.is_relative_to(p1)
False

@mara004
Copy link

mara004 commented Jul 25, 2023

See also PEP 706 for recent progress on this.

Here's an updated safe extraction implementation I'm now using for my projects:
https://gist.github.com/mara004/6fe0ac15d0cf303bed0aea2f22d8531f


And FWIW, some threads confirming my suspicion that os.path.commonprefix() is invalid in the Trellix patch:
https://stackoverflow.com/a/43613742/15547292
http://grokbase.com/t/python/python-ideas/153nyg6qt1/os-path-commonprefix-yes-that-old-chestnut

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment