Download the Large-scale CelebFaces Attributes (CelebA) Dataset from their Google Drive link
#!/bin/bash
#
# Download the Large-scale CelebFaces Attributes (CelebA) Dataset
# from their Google Drive link.
#
# CelebA: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
#
# Google Drive: https://drive.google.com/drive/folders/0B7EVK8r0v71pWEZsZE9oNnFzTm8
python3 get_drive_file.py 0B7EVK8r0v71pZjFTYXZWM3FlRnM celebA.zip
import requests

def download_file_from_google_drive(id, destination):

    def get_confirm_token(response):
        # Google Drive shows a virus-scan warning page for large files;
        # the token needed to bypass it is stored in a download_warning cookie.
        for key, value in response.cookies.items():
            if key.startswith('download_warning'):
                return value
        return None

    def save_response_content(response, destination):
        CHUNK_SIZE = 32768
        with open(destination, "wb") as f:
            for chunk in response.iter_content(CHUNK_SIZE):
                if chunk:  # filter out keep-alive new chunks
                    f.write(chunk)

    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()
    response = session.get(URL, params={'id': id}, stream=True)
    token = get_confirm_token(response)

    if token:
        params = {'id': id, 'confirm': token}
        response = session.get(URL, params=params, stream=True)

    save_response_content(response, destination)

if __name__ == "__main__":
    import sys
    # Compare with != (not `is not`, which tests identity rather than value)
    if len(sys.argv) != 3:
        print("Usage: python google_drive.py drive_file_id destination_file_path")
    else:
        # Take the file ID from the shareable link
        file_id = sys.argv[1]
        # Destination file on your disk
        destination = sys.argv[2]
        download_file_from_google_drive(file_id, destination)
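Several comments below report an "End-of-central-directory signature not found" error when unzipping. That usually means Google Drive returned an HTML warning or quota page instead of the archive, and the script saved that page to disk. A quick sanity check after downloading (a hypothetical helper, not part of the gist, using only the standard library):

```python
import zipfile

def verify_download(path):
    """Return True if `path` is a readable zip archive, False otherwise."""
    # zipfile.is_zipfile looks for the end-of-central-directory record,
    # which is exactly what `unzip` complains is missing after a bad download.
    return zipfile.is_zipfile(path)
```

If this returns False, the first few hundred bytes of the file are often an HTML error page you can inspect to see what went wrong.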
@flxai commented Dec 2, 2019

Unfortunately this is broken.

@charlesreid1 (Owner) commented Dec 11, 2019

I'll look into it!

@charlesreid1 (Owner) commented Dec 11, 2019

The command

$ python3 get_drive_file.py 0B7EVK8r0v71pZjFTYXZWM3FlRnM celebA.zip

is working for me. It may have been a temporary issue, please follow up if you experience the error again!

@flxai commented Dec 11, 2019

Thanks for checking. It works now. Sorry for not being specific with the description last time. Seems to me like it was a networking error.

@VMBoehm commented Dec 19, 2019

Thanks for providing this! The code runs, but suspiciously fast, and when unzipping the zip file I get:

> End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive.

@henrikmarklund commented Jan 22, 2020

> Thanks for providing this! The code runs, but suspiciously fast, and when unzipping the zip file I get: End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive.

I am having the same issue. Did you find a solution?

Thanks.

@VMBoehm commented Jan 23, 2020

The problem seems to be the size of the files and the fact that they're hosted on Google Drive, which isn't really meant for sharing datasets this large. What happens is that the download fails, or only completes partially. I ended up getting the dataset from a different source. Downloading it with TensorFlow Datasets worked for me at some point, so maybe give that a try?

@theRealSuperMario commented Feb 19, 2020

same problem here.

@sickmz commented Jun 12, 2020

Same problem here.

> The problem seems to be the size of the files and the fact that it's hosted on google drive, which is not really meant to be used for sharing such big datasets. What happens is that the download fails, or it only downloads partly. I ended up getting the data set from a different source. Downloading it with tensorflow datasets worked for me at some point, maybe give this a try?

> I am having the same issue. Did you find a solution?
>
> Thanks.

https://github.com/matteodalessio/download_google_drive
