Last active May 24, 2023 18:50
Gist: devhero/8ae2229d9ea1a59003ced4587c9cb236
Python: download and extract a remote tar.gz file
# Create a network request with the urllib module; urlopen() returns a file-like object representing the request state.
import urllib.request
import tarfile

thetarfile = "http://file.tar.gz"
with urllib.request.urlopen(thetarfile) as ftpstream:
    # ftpstream is a file-like object representing the connection to the remote server, so tarfile can read from it directly.
    # Since we do not pass a filename, the compression must be given in the mode: "r|gz" reads a gzip-compressed stream sequentially.
    with tarfile.open(fileobj=ftpstream, mode="r|gz") as tar:
        tar.extractall()
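A minimal sketch of the same idea split into two functions, assuming Python 3 and only the standard library. The names `extract_tar_gz_stream` and `download_and_extract` are hypothetical. The `filter="data"` argument is only passed when the running Python has `tarfile.data_filter` (3.12+, plus some security backports), because older versions do not accept it.

```python
import tarfile
import urllib.request


def extract_tar_gz_stream(stream, dest="."):
    """Extract a gzip-compressed tar archive read sequentially from a file-like object."""
    # "r|gz" reads the archive as a forward-only stream, so `stream` never needs seek()
    with tarfile.open(fileobj=stream, mode="r|gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: block path traversal outside dest, escaping links and device files
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


def download_and_extract(url, dest="."):
    """Stream a remote .tar.gz straight into extract_tar_gz_stream() without saving it first."""
    with urllib.request.urlopen(url) as response:
        extract_tar_gz_stream(response, dest)
```

For example, `download_and_extract("http://file.tar.gz", "output_dir")`, with both arguments as placeholders. Keeping the extraction in its own function means it can be tried against an in-memory archive without any network access.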
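import requests

url = "http://file.tar.gz"        # placeholder URL of the archive
local_filename = "file.tar.gz"    # placeholder path to save it to
with requests.get(url, stream=True) as r:
    r.raise_for_status()
    with open(local_filename, 'wb') as f:
        # Copy the raw (still gzip-compressed) bytes to disk in 1 KiB chunks
        for chunk in r.raw.stream(1024, decode_content=False):
            if chunk:
                f.write(chunk)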
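If you would rather keep a copy of the archive, the chunked download can also be done with the standard library only (a swapped-in alternative to requests, not what the snippet above uses). `shutil.copyfileobj` runs the same read/write loop in fixed-size chunks; `save_stream` and `download_then_extract` are hypothetical names.

```python
import shutil
import tarfile
import urllib.request


def save_stream(src, local_filename, chunk_size=1024):
    """Copy a binary file-like object to disk, reading chunk_size bytes at a time."""
    with open(local_filename, "wb") as f:
        shutil.copyfileobj(src, f, chunk_size)


def download_then_extract(url, local_filename, dest="."):
    """Save the archive first, then extract it from the local copy."""
    # Step 1: write the still-compressed bytes to disk
    with urllib.request.urlopen(url) as response:
        save_stream(response, local_filename)
    # Step 2: a local file supports seek(), so the regular "r:gz" mode is fine here
    with tarfile.open(local_filename, mode="r:gz") as tar:
        tar.extractall(dest)
```

Usage would look like `download_then_extract("http://file.tar.gz", "file.tar.gz", "output_dir")`, all placeholders.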
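import requests
import tarfile

link = "http://file.tar.gz"  # placeholder URL
# rx.raw is a forward-only stream that cannot seek, so the streaming mode "r|gz" is the safer choice
with requests.get(link, stream=True) as rx, tarfile.open(fileobj=rx.raw, mode="r|gz") as tarobj:
    tarobj.extractall()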
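The streaming approach also works when the compression type is not known in advance: tarfile's "r|*" mode inspects the first bytes of the stream and picks gzip, bzip2, xz or no compression by itself. A small sketch, with `list_stream_members` as a hypothetical helper that only lists member names:

```python
import tarfile


def list_stream_members(stream):
    """Return the member names of a tar archive read from a forward-only stream.

    "r|*" detects the compression from the stream's first bytes, so the caller
    does not need to know whether the server sent .tar, .tar.gz or .tar.bz2.
    """
    with tarfile.open(fileobj=stream, mode="r|*") as tar:
        return [member.name for member in tar]
```

With requests, this could be called as `list_stream_members(rx.raw)` inside the `with requests.get(link, stream=True) as rx:` block.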
These scripts don't work for me. Do I need to log in before downloading? How do I handle authorization?
Look how easy it is using requests.
For the lazy ones, I transcribe it here:
import requests
r = requests.get('http://protected_file_url', auth=('user', 'pass'))
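The `auth=('user', 'pass')` tuple makes requests send HTTP Basic authentication. If you stick with urllib as in the first snippet, a roughly equivalent sketch builds the `Authorization` header by hand. It assumes the server really uses Basic auth (a site with a login form or tokens needs something else), and `basic_auth_request` is a hypothetical name.

```python
import base64
import urllib.request


def basic_auth_request(url, user, password):
    """Build a urllib Request that carries an HTTP Basic Authorization header."""
    # Basic auth is just base64("user:password") in the Authorization header
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
```

The request is then opened like a plain URL: `urllib.request.urlopen(basic_auth_request("http://protected_file_url", "user", "pass"))`.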
So the latest proposal could become:
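import requests
import tarfile

link = "http://protected_file_url"  # placeholder URL
# As in the first snippet, rx.raw cannot seek, so the streaming mode "r|gz" is the safer choice
with requests.get(link, stream=True, auth=('user', 'pass')) as rx, tarfile.open(fileobj=rx.raw, mode="r|gz") as tarobj:
    tarobj.extractall()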
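When the credentials are missing or wrong, the server typically answers with an HTML error or login page instead of the archive, and tarfile then fails with a `ReadError` such as "not a gzip file". A sketch that turns that into a clearer message, assuming you want to fail loudly (`extract_or_explain` is a hypothetical name):

```python
import tarfile


def extract_or_explain(stream, dest="."):
    """Stream-extract a .tar.gz, turning a 'not a gzip file' failure into a clearer error."""
    try:
        with tarfile.open(fileobj=stream, mode="r|gz") as tar:
            tar.extractall(dest)
    except tarfile.ReadError as exc:
        # Assumed common cause: the server sent an HTML login/error page, not the archive
        raise RuntimeError(
            "Response is not a valid .tar.gz; check the URL and credentials"
        ) from exc
```

With requests, calling `rx.raise_for_status()` before handing `rx.raw` to tarfile reports a 401 or 403 status directly, which is usually the quicker check.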
What if the tar file is already on a remote server and I just need to untar it remotely? I tried a couple of ways, but it takes more than 2 hours to untar 2 GB of contents.
Example snippet:
if source_filename.endswith('tar.gz'):
    cmd = f"unpigz -dc --fast -p 16 {source_filename} | (cd {destination_path} && tar xf -)"
    print(f"CMD: {cmd}")
    print("tar file extraction started...")
    output = conn.execute_command(cmd, shell=True)
    print("tarfile extraction success")
    return True
I also tried attaching more cores, but it didn't help.
Here, conn is the remote connection handle.
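On the remote untar question: per the pigz manual, decompression cannot be parallelized, so unpigz decompresses on a single thread (extra threads only handle reading, writing and checksums), which may explain why `-p 16` and more cores made no difference. Separately, the command pastes paths straight into a shell string, which breaks if a path contains spaces. A sketch of the same pipeline with `shlex.quote`, where `build_untar_command` is a hypothetical helper:

```python
import shlex


def build_untar_command(source_filename, destination_path, threads=16):
    """Build the same unpigz | tar pipeline with shell-quoted paths."""
    src = shlex.quote(source_filename)
    dst = shlex.quote(destination_path)
    # int() keeps the thread count from injecting arbitrary text into the command
    return f"unpigz -dc --fast -p {int(threads)} {src} | (cd {dst} && tar xf -)"
```

The returned string can be passed to the same `conn.execute_command(cmd, shell=True)` call as before.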
In case you use Python's requests module: