@charlesreid1
Last active May 26, 2022 08:35
Download the Large-scale CelebFaces Attributes (CelebA) Dataset from their Google Drive link
#!/bin/bash
#
# Download the Large-scale CelebFaces Attributes (CelebA) Dataset
# from their Google Drive link.
#
# CelebA: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
#
# Google Drive: https://drive.google.com/drive/folders/0B7EVK8r0v71pWEZsZE9oNnFzTm8
python3 get_drive_file.py 0B7EVK8r0v71pZjFTYXZWM3FlRnM celebA.zip

# get_drive_file.py
import requests


def download_file_from_google_drive(id, destination):
    def get_confirm_token(response):
        # Look for a 'download_warning' cookie; its value is sent back as the confirm token
        for key, value in response.cookies.items():
            if key.startswith('download_warning'):
                return value
        return None

    def save_response_content(response, destination):
        CHUNK_SIZE = 32768
        with open(destination, "wb") as f:
            for chunk in response.iter_content(CHUNK_SIZE):
                if chunk: # filter out keep-alive new chunks
                    f.write(chunk)

    URL = "https://docs.google.com/uc?export=download"
    session = requests.Session()
    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)
    if token:
        # Repeat the request with the confirm token attached
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)
    save_response_content(response, destination)


if __name__ == "__main__":
    import sys
    # Compare by value; "is not 3" tests object identity, not equality
    if len(sys.argv) != 3:
        print("Usage: python3 get_drive_file.py drive_file_id destination_file_path")
    else:
        # TAKE ID FROM SHAREABLE LINK
        file_id = sys.argv[1]
        # DESTINATION FILE ON YOUR DISK
        destination = sys.argv[2]
        download_file_from_google_drive(file_id, destination)

@flxai

flxai commented Dec 2, 2019

Unfortunately this is broken.

@charlesreid1
Author

I'll look into it!

@charlesreid1
Author

The command

$ python3 get_drive_file.py 0B7EVK8r0v71pZjFTYXZWM3FlRnM celebA.zip

is working for me. It may have been a temporary issue; please follow up if you see the error again!

@flxai

flxai commented Dec 11, 2019

Thanks for checking. It works now. Sorry for not being specific with the description last time. Seems to me like it was a networking error.

@VMBoehm

VMBoehm commented Dec 19, 2019

Thanks for providing this! The code runs, but suspiciously fast, and when I unzip the zip file I get:
End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive.
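
A quick way to tell what actually landed on disk is to inspect the file rather than trust its `.zip` name. The sketch below is only an illustration: `diagnose_download` is a hypothetical helper, not part of the gist, and the assumption that a failed download is an HTML page saved under the zip name is a guess that may not match every failure.

```python
import zipfile


def diagnose_download(path):
    """Return a short label for what the file at `path` appears to be."""
    # zipfile.is_zipfile looks for the end-of-central-directory record,
    # the same structure the unzip error above says is missing.
    if zipfile.is_zipfile(path):
        return "zip"
    with open(path, "rb") as f:
        head = f.read(512).lstrip().lower()
    # Assumption: an HTML page saved under a .zip name starts with a
    # doctype declaration or an <html> tag.
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

For example, `diagnose_download("celebA.zip")` returning `"html"` would suggest the server sent a web page instead of the archive, while `"unknown"` is consistent with a truncated download.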

@henrikmarklund

> Thanks for providing this! The code runs, but suspiciously fast, and when I unzip the zip file I get:
> End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive.

I am having the same issue. Did you find a solution?

Thanks.

@VMBoehm

VMBoehm commented Jan 23, 2020

The problem seems to be the size of the files combined with the fact that they are hosted on Google Drive, which is not really meant for sharing such big datasets. The download either fails or completes only partially. I ended up getting the dataset from a different source. Downloading it with TensorFlow Datasets worked for me at some point, so maybe give that a try?
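
A partial download like the one described above can sometimes be caught while saving, by counting the bytes written and comparing them with the size the server advertised. This is a minimal sketch under stated assumptions: `save_and_verify` is a hypothetical replacement for the gist's `save_response_content`, and it assumes the response carries a `Content-Length` header, which a streamed response may not.

```python
def save_and_verify(chunks, destination, expected_size=None):
    """Write byte chunks to `destination` and report whether the size matched.

    Returns (bytes_written, ok), where ok is True or False when an expected
    size was given and None when there is nothing to compare against.
    """
    written = 0
    with open(destination, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    if expected_size is None:
        return written, None
    # Header values arrive as strings, so normalize before comparing.
    return written, written == int(expected_size)
```

With `requests`, this could be called as `save_and_verify(response.iter_content(32768), destination, response.headers.get("Content-Length"))`. One caveat: if the server compresses the body, `Content-Length` describes the compressed size while `iter_content` yields decoded bytes, so a mismatch is a hint to investigate rather than proof of truncation.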

@theRealSuperMario

same problem here.

@sickmz

sickmz commented Jun 12, 2020

> same problem here.

> The problem seems to be the size of the files combined with the fact that they are hosted on Google Drive, which is not really meant for sharing such big datasets. The download either fails or completes only partially. I ended up getting the dataset from a different source. Downloading it with TensorFlow Datasets worked for me at some point, so maybe give that a try?

> I am having the same issue. Did you find a solution?
>
> Thanks.

https://github.com/matteodalessio/download_google_drive
