@devhero
Last active May 24, 2023 18:50
Python: download and extract a remote tar.gz file
# Open the URL with urllib.request; urlopen() returns a file-like object for the response body.
import urllib.request
import tarfile
url = "http://file.tar.gz"
stream = urllib.request.urlopen(url)
# Stream mode ("r|gz") reads the archive sequentially, so the response never has to be seekable.
archive = tarfile.open(fileobj=stream, mode="r|gz")
archive.extractall()
archive.close()
# The stream object is file-like and represents the connection to the server (HTTP here; urllib also handles ftp:// URLs), so the tarfile module can read from it directly. Since we do not pass a filename, the compression is given in the mode parameter ("r|*" would detect it automatically).
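The point about stream mode can be checked without any network access. The following is a minimal sketch, assuming a hypothetical ReadOnlyStream class that stands in for a response body (it only offers read(), like a socket-backed object), and a hypothetical make_tar_gz helper that builds a tiny archive in memory. Neither name comes from the gist.

```python
import io
import os
import tarfile
import tempfile


class ReadOnlyStream:
    """Hypothetical stand-in for a network response: read() only, no seek()."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)


def make_tar_gz(files):
    """Build a small .tar.gz in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def extract_stream(stream, dest):
    """Extract a gzip-compressed tar from a non-seekable stream into dest."""
    # Use the "data" extraction filter where this Python version provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(fileobj=stream, mode="r|gz") as tar:
        tar.extractall(path=dest, **kwargs)


archive_bytes = make_tar_gz({"hello.txt": b"hello\n"})
dest = tempfile.mkdtemp()
extract_stream(ReadOnlyStream(archive_bytes), dest)
```

In real use the ReadOnlyStream would be replaced by the object returned from urllib.request.urlopen().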
@ozcanyarimdunya

ozcanyarimdunya commented Jun 10, 2021

In case you use python's requests module:

import requests
import tarfile

url = ".tar.gz url here"
response = requests.get(url, stream=True)
file = tarfile.open(fileobj=response.raw, mode="r|gz")
file.extractall(path=".")

@bhuiyanmobasshir94

bhuiyanmobasshir94 commented Nov 19, 2021

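import requests

url = ".tar.gz url here"
local_filename = "archive.tar.gz"  # placeholder output path

# Save the raw (still gzip-compressed) archive to disk in 1 KiB chunks.
with requests.get(url, stream=True) as r, open(local_filename, 'wb') as f:
    for chunk in r.raw.stream(1024, decode_content=False):
        if chunk:
            f.write(chunk)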
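Saving to disk first has one practical upside: a local file is seekable, so the archive can then be opened in the ordinary "r:gz" mode and inspected before extracting. A minimal sketch of that two-step flow follows, assuming a hypothetical save_stream helper and an in-memory payload standing in for the downloaded body; with requests, the stream argument would be response.raw.

```python
import io
import os
import shutil
import tarfile
import tempfile


def save_stream(stream, local_filename, chunk_size=1024 * 1024):
    """Copy a file-like object to disk in fixed-size chunks."""
    with open(local_filename, "wb") as f:
        shutil.copyfileobj(stream, f, chunk_size)
    return local_filename


# Build a tiny in-memory .tar.gz to stand in for a downloaded body.
payload = io.BytesIO()
with tarfile.open(fileobj=payload, mode="w:gz") as tar:
    data = b"saved first, extracted later\n"
    info = tarfile.TarInfo(name="note.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
payload.seek(0)

workdir = tempfile.mkdtemp()
archive_path = save_stream(payload, os.path.join(workdir, "archive.tar.gz"))

# A file on disk is seekable, so random-access mode "r:gz" is fine here.
with tarfile.open(archive_path, mode="r:gz") as tar:
    names = tar.getnames()
```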

@Jukoo

Jukoo commented Feb 8, 2022

import requests
import tarfile

# "r|gz" (stream mode) is used because rx.raw is not seekable.
with requests.get(link, stream=True) as rx, tarfile.open(fileobj=rx.raw, mode="r|gz") as tarobj:
    tarobj.extractall()

@vsobolev

These scripts don't work for me. Do I need to log in before downloading? How should I handle authorization?

@devhero
Author

devhero commented Oct 19, 2022

These scripts don't work for me. Do I need to log in before downloading? How should I handle authorization?

Look how easy it is with requests.
For the lazy ones, I'll transcribe it here:

import requests
r = requests.get('http://protected_file_url', auth=('user', 'pass'))

So the latest proposal could become:

import requests
import tarfile

# "r|gz" (stream mode) is used because rx.raw is not seekable.
with requests.get(link, stream=True, auth=('user', 'pass')) as rx, tarfile.open(fileobj=rx.raw, mode="r|gz") as tarobj:
    tarobj.extractall()
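For anyone staying with urllib from the original gist rather than requests, HTTP Basic authentication can be added by setting the Authorization header by hand. This is only a sketch: basic_auth_request is a hypothetical helper name, the URL and credentials are the placeholders used above, and it assumes the server uses Basic auth (other schemes need a different approach).

```python
import base64
import urllib.request


def basic_auth_request(url, user, password):
    """Return a urllib Request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder URL and credentials; nothing is sent until urlopen() is called.
request = basic_auth_request("http://protected_file_url", "user", "pass")
```

The resulting request object can be passed to urllib.request.urlopen() in place of the plain URL, and the returned stream handed to tarfile.open() as before.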

@vrahikax

What if the tar file is already on a remote server and I just need to untar it there? I tried a couple of approaches, but it takes more than 2 hours to extract a 2 GB archive.
Example snippet:

    if source_filename.endswith('tar.gz'):
        cmd = f"unpigz -dc --fast -p 16 {source_filename} | (cd {destination_path} && tar xf -)"
        print(f"CMD: {cmd}")
        print("tar file extraction started...")
        output = conn.execute_command(cmd, shell=True)
        print("tarfile extraction success")
        return True

I also tried adding more cores, but it made no difference.
Here conn is the remote connection handle.
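One thing that can be tightened regardless of speed is how the shell command is assembled, since paths interpolated into an f-string break on spaces or shell metacharacters. The sketch below uses a hypothetical build_untar_command helper and plain tar with its built-in gzip support; it only builds the string and makes no assumption about how conn.execute_command behaves. It is not a claimed fix for the slowness. As far as I know, pigz decompresses on essentially one thread, so a higher -p value would not be expected to help much, and --fast is a compression setting.

```python
import shlex


def build_untar_command(source_filename, destination_path):
    """Build a shell-safe command that extracts a .tar.gz into a directory."""
    src = shlex.quote(source_filename)
    dest = shlex.quote(destination_path)
    # tar -C changes into the destination before extracting.
    return f"tar -xzf {src} -C {dest}"


# Example paths are made up for illustration.
cmd = build_untar_command("/data/my archive.tar.gz", "/srv/out")
```

The resulting string could then be passed to the remote handle the same way as in the snippet above.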
