-
-
Save Kasimir123/9bdc38f2eac9a19f83ef6cc52c7ce82e to your computer and use it in GitHub Desktop.
import os | |
def is_within_directory(directory, target): | |
abs_directory = os.path.abspath(directory) | |
abs_target = os.path.abspath(target) | |
prefix = os.path.commonprefix([abs_directory, abs_target]) | |
return prefix == abs_directory | |
def safe_extract(tar, path=".", members=None, *, numeric_owner=False): | |
for member in tar.getmembers(): | |
member_path = os.path.join(path, member.name) | |
if not is_within_directory(path, member_path): | |
raise Exception("Attempted Path Traversal in Tar File") | |
tar.extractall(path, members, numeric_owner=numeric_owner) |
@Kasimir123 From my testing, your proposed solution would not prevent the directory traversal if the archive contains a symlink to outside of the target directory, e.g.:
~/tmp$ mkdir test_dir
~/tmp$ cd test_dir/
~/tmp/test_dir$ ln -s /tmp foo
~/tmp/test_dir$ touch foo/bar
~/tmp/test_dir$ ll
total 8
drwxr-xr-x 2 soranzon ga002 4096 Nov 4 11:34 ./
drwxr-xr-x 5 soranzon ga002 4096 Nov 4 11:34 ../
lrwxrwxrwx 1 soranzon ga002 5 Nov 4 11:34 foo -> /tmp/
~/tmp/test_dir$ ll foo/bar
-rw-r--r-- 1 soranzon ga002 0 Nov 4 11:34 foo/bar
~/tmp/test_dir$ tar cf ../test_dir.tar foo foo/bar
The extraction of the test_dir.tar
archive is forbidden by tar:
~/tmp/test_dir$ rm foo /tmp/bar
rm: remove symbolic link 'foo'? y
rm: remove regular empty file '/tmp/bar'? y
~/tmp/test_dir$ tar xf ../test_dir.tar
tar: foo/bar: Cannot open: Not a directory
tar: Exiting with failure status due to previous errors
~/tmp/test_dir$ ll
total 8
drwxr-xr-x 2 soranzon ga002 4096 Nov 4 11:36 ./
drwxr-xr-x 5 soranzon ga002 4096 Nov 4 11:35 ../
lrwxrwxrwx 1 soranzon ga002 5 Nov 4 11:34 foo -> /tmp/
~/tmp/test_dir$ ll foo/bar
ls: cannot access 'foo/bar': No such file or directory
~/tmp/test_dir$ rm foo
rm: remove symbolic link 'foo'? y
but is not blocked by your safe_extract()
:
import tarfile
tarfile("../test_dir.tar")
safe_extract(tar)
It looks like @TrellixVulnTeam may have started an automated mass submission of this patch.
But seriously, patching thousands of projects with boilerplate code can't be called a proper solution.
What we need is a simple way to extract tars safely using the standard library, like a prevent_traversal
option to shutil.unpack_archive()
.
Apart from that, I find the patch both bloated and written in a fashion that is hard to understand.
Here's an alternative suggestion, using pathlib:
(Update: Warning, this code is not correct, see the following comments)
def safe_extract(tar, dest_dir: Path, **kwargs):
dest_dir = dest_dir.resolve()
for member in tar.getmembers():
# if str(dest_dir) != os.path.commonprefix( [dest_dir, (dest_dir/member.name).resolve()] ):
# if not (dest_dir/member.name).resolve().is_relative_to(dest_dir): # python >= 3.9
if not str( (dest_dir/member.name).resolve() ).startswith( str(dest_dir) ):
raise RuntimeError("Attempted path traversal in tar archive (probably malicious).")
tar.extractall(dest_dir, **kwargs)
(1) False negatives, e.g. dest_dir = Path('/dest')
and member.name = Path('/dest2')
.
(2) Why require the user to specify the target path twice, once as path
and once as dest_dir = Path(path)
? You could simply change dest_dir.resolve()
to Path(path).resolve()
, assuming that's how one initialises Path
objects (you didn't show us from where you import this type).
(1) False negatives, e.g. dest_dir = Path('/dest') and member.name = Path('/dest2').
Oh, I see. Thanks! (I wrongly assumed str(path)
would add a trailing slash to directories.)
The is_relative_to()
strategy should work correctly, though?
You could simply change dest_dir.resolve() to Path(path).resolve(), assuming that's how one initialises Path objects (you didn't show us from where you import this type).
Imported from pathlib, of course. Doing Path(dest_dir).resolve()
would be an option but isn't necessary if the caller uses pathlib already.
@jowagner Erm, doesn't the os.path.commonprefix()
strategy have exactly the same false negative, also in @Kasimir123's (and pip's) code above? Shouldn't it rather be os.path.commonpath()
?
>>> from pathlib import Path
>>> import os.path
>>> p1 = Path("/a/b/c")
>>> p2 = Path("/a/b/c2")
>>> os.path.commonprefix([p1, p2])
'/a/b/c'
>>> os.path.commonprefix([p1, p2]) == str(p1)
True
>>> os.path.commonpath([p1, p2])
'/a/b'
>>> p2.is_relative_to(p1)
False
See also PEP 706 for recent progress on this.
Here's an updated safe extraction implementation I'm now using for my projects:
https://gist.github.com/mara004/6fe0ac15d0cf303bed0aea2f22d8531f
And FWIW, some threads confirming my suspicion that os.path.commonprefix()
is invalid in the Trellix patch:
https://stackoverflow.com/a/43613742/15547292
http://grokbase.com/t/python/python-ideas/153nyg6qt1/os-path-commonprefix-yes-that-old-chestnut
I think there is a misunderstanding: I did not mean to imply with "in line 15 [...] path is ignored for absolute member.name" that something is broken. It's purely a suggestion to add a source code comment here.
Re performance: Yes, until the tarfile module provides an interface to hook in a
path_check()
function or implements similar checks itself you have no other choice than to scan all members in advance. I am pushing for a combination ofpath_check()
hook and safety by default in python/cpython#73974 (comment). Again, it is more an issue of documentation that this check incurs a performance penality (scanning the archive twice) and breaks extracting from tar files that cannot be read twice, e.g. reading fromsys.stdin.buffer
.Edit: clearer phrasing