-
-
Save Kasimir123/9bdc38f2eac9a19f83ef6cc52c7ce82e to your computer and use it in GitHub Desktop.
import os | |
def is_within_directory(directory, target): | |
abs_directory = os.path.abspath(directory) | |
abs_target = os.path.abspath(target) | |
prefix = os.path.commonprefix([abs_directory, abs_target]) | |
return prefix == abs_directory | |
def safe_extract(tar, path=".", members=None, *, numeric_owner=False): | |
for member in tar.getmembers(): | |
member_path = os.path.join(path, member.name) | |
if not is_within_directory(path, member_path): | |
raise Exception("Attempted Path Traversal in Tar File") | |
tar.extractall(path, members, numeric_owner=numeric_owner) |
(1) False negatives, e.g. dest_dir = Path('/dest') and member.name = Path('/dest2').
Oh, I see. Thanks! (I wrongly assumed str(path)
would add a trailing slash to directories.)
The is_relative_to()
strategy should work correctly, though?
You could simply change dest_dir.resolve() to Path(path).resolve(), assuming that's how one initialises Path objects (you didn't show us from where you import this type).
Imported from pathlib, of course. Doing Path(dest_dir).resolve()
would be an option but isn't necessary if the caller uses pathlib already.
@jowagner Erm, doesn't the os.path.commonprefix()
strategy have exactly the same false negative, also in @Kasimir123's (and pip's) code above? Shouldn't it rather be os.path.commonpath()
?
>>> from pathlib import Path
>>> import os.path
>>> p1 = Path("/a/b/c")
>>> p2 = Path("/a/b/c2")
>>> os.path.commonprefix([p1, p2])
'/a/b/c'
>>> os.path.commonprefix([p1, p2]) == str(p1)
True
>>> os.path.commonpath([p1, p2])
'/a/b'
>>> p2.is_relative_to(p1)
False
See also PEP 706 for recent progress on this.
Here's an updated safe extraction implementation I'm now using for my projects:
https://gist.github.com/mara004/6fe0ac15d0cf303bed0aea2f22d8531f
And FWIW, some threads confirming my suspicion that os.path.commonprefix()
is invalid in the Trellix patch:
https://stackoverflow.com/a/43613742/15547292
http://grokbase.com/t/python/python-ideas/153nyg6qt1/os-path-commonprefix-yes-that-old-chestnut
(1) False negatives, e.g.
dest_dir = Path('/dest')
andmember.name = Path('/dest2')
.(2) Why require the user to specify the target path twice, once as
path
and once asdest_dir = Path(path)
? You could simply changedest_dir.resolve()
toPath(path).resolve()
, assuming that's how one initialisesPath
objects (you didn't show us from where you import this type).