@devhero
Last active May 24, 2023 18:50
python download and extract remote file tar.gzip
# Open a network request with urllib. urlopen returns a file-like object representing the response.
import urllib.request
import tarfile

url = "http://file.tar.gz"
# The response stream is file-like, so the tarfile module can read from it directly (urlopen handles http and ftp URLs alike).
# Since we do not pass a filename, we have to specify the compression in the mode parameter:
# "r|gz" reads a gzip-compressed archive as a non-seekable stream.
# extractall() writes into the current directory; only extract archives from sources you trust.
with urllib.request.urlopen(url) as stream:
    with tarfile.open(fileobj=stream, mode="r|gz") as archive:
        archive.extractall()
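To try the streaming mode without a real server, here is a minimal sketch that builds a small .tar.gz in memory and extracts it from a stream that refuses to seek, much like a network response. The names NonSeekableStream, make_tar_gz and extract_stream are hypothetical helpers made up for this illustration, not part of the gist.

```python
import io
import os
import tarfile
import tempfile


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for a network response: readable, not seekable."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, b):
        return self._buffer.readinto(b)


def make_tar_gz(files):
    """Build a small .tar.gz archive in memory from a {name: bytes} mapping."""
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w:gz") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return out.getvalue()


def extract_stream(stream, destination):
    """Extract a gzip-compressed tar stream into destination, reading it once."""
    # "r|gz" tells tarfile to treat the input as a forward-only stream.
    with tarfile.open(fileobj=stream, mode="r|gz") as archive:
        archive.extractall(path=destination)
    return sorted(os.listdir(destination))


archive_bytes = make_tar_gz({"hello.txt": b"hello", "data.bin": b"\x00\x01"})
with tempfile.TemporaryDirectory() as tmp:
    extracted = extract_stream(NonSeekableStream(archive_bytes), tmp)
    print(extracted)
```

The same extract_stream call should work with the object returned by urllib.request.urlopen, since both are read front to back without seeking.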
@devhero
Author

devhero commented Oct 19, 2022

These scripts don't work for me. Do I need to log in before downloading? What about authorization?

Look how easy it is using requests.
For the lazy ones, I transcribe it here:

import requests
r = requests.get('http://protected_file_url', auth=('user', 'pass'))

So the latest proposal could become:

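import requests
import tarfile

link = 'http://protected_file_url'  # placeholder URL, same as in the snippet above

# rx.raw is a non-seekable stream, so use the streaming mode "r|gz" rather than "r:gz".
with requests.get(link, stream=True, auth=('user', 'pass')) as rx:
    rx.raise_for_status()  # stop here on 401/404 instead of feeding an error page to tarfile
    with tarfile.open(fileobj=rx.raw, mode="r|gz") as tarobj:
        tarobj.extractall()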
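If you would rather stay with urllib as in the gist, the same HTTP Basic authentication can be sketched with the standard library only. This is a rough sketch, assuming the server accepts Basic auth; build_basic_auth_request and download_and_extract are hypothetical helper names, and the URL and credentials are the same placeholders as above.

```python
import base64
import tarfile
import urllib.request


def build_basic_auth_request(url, user, password):
    """Return a urllib Request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def download_and_extract(url, user, password, destination="."):
    """Stream a protected .tar.gz and extract it (defined here, not called)."""
    request = build_basic_auth_request(url, user, password)
    with urllib.request.urlopen(request) as stream:
        # Streaming mode, because the HTTP response cannot seek.
        with tarfile.open(fileobj=stream, mode="r|gz") as archive:
            archive.extractall(path=destination)


# Placeholder URL and credentials; no network request is made here.
request = build_basic_auth_request("http://protected_file_url", "user", "pass")
print(request.get_header("Authorization"))
```

Calling download_and_extract with a real URL and credentials would then do the download and extraction in one pass.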

@vrahikar

What if the tar file is already on the remote server and I just need to untar it there? I tried a couple of ways, but it takes more than 2 hours to untar 2 GB of contents.
E.g. this snippet:

    if source_filename.endswith('tar.gz'):
        cmd = f"unpigz -dc --fast -p 16 {source_filename} | (cd {destination_path} && tar xf -)"
        print(f"CMD: {cmd}")
        print("tar file extraction started...")
        output = conn.execute_command(cmd, shell=True)
        print("tarfile extraction success")
        return True

I also tried allocating more cores, but it did not help.
Here, conn is the remote connection handle.
