@rdinse
Created March 13, 2018 22:55
Simple Google Drive backup script with automatic authentication for Google Colaboratory (Python 3)
# Simple Google Drive backup script with automatic authentication
# for Google Colaboratory (Python 3)
# Instructions:
# 1. Run this cell and authenticate via the link and text box.
# 2. Copy the JSON output below this cell into the `mycreds_file_contents`
# variable. Authentication will occur automatically from now on.
# 3. Create a new folder in Google Drive and copy the ID of this folder
# from the URL bar to the `folder_id` variable.
# 4. Specify the directory to be backed up in `dir_to_backup`.
# Caveats:
# 1. The backup/restore functions overwrite existing files both locally and
# remotely without warning.
# 2. Empty directories and empty files are skipped.
# 3. Use at your own risk.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import google.colab
from oauth2client.client import GoogleCredentials
import glob, os
folder_id = 'GOOGLE_DRIVE_FOLDER_ID_HERE'
dir_to_backup = 'LOCAL_BACKUP_DIRECTORY_HERE'
mycreds_file_contents = 'PASTE_JSON_STRING_HERE'
mycreds_file = 'mycreds.json'
with open(mycreds_file, 'w') as f:
    f.write(mycreds_file_contents)
def authenticate_pydrive():
    gauth = GoogleAuth()
    # https://stackoverflow.com/a/24542604/5096199
    # Try to load saved client credentials
    gauth.LoadCredentialsFile(mycreds_file)
    if gauth.credentials is None:
        # Authenticate if they're not there
        google.colab.auth.authenticate_user()
        gauth.credentials = GoogleCredentials.get_application_default()
    elif gauth.access_token_expired:
        # Refresh them if expired
        gauth.Refresh()
    else:
        # Initialize the saved creds
        gauth.Authorize()
    # Save the current credentials to a file
    gauth.SaveCredentialsFile(mycreds_file)
    drive = GoogleDrive(gauth)
    return drive
def backup_pydrive():
    drive = authenticate_pydrive()
    paths = list(glob.iglob(os.path.join(dir_to_backup, '**'), recursive=True))
    print(paths)
    # Delete existing files
    files = drive.ListFile({'q': "'%s' in parents" % folder_id}).GetList()
    for file in files:
        if file['title'] in paths:
            file.Delete()
    for path in paths:
        if os.path.isdir(path) or os.stat(path).st_size == 0:
            continue
        file = drive.CreateFile({'title': path, 'parents':
                                 [{"kind": "drive#fileLink", "id": folder_id}]})
        file.SetContentFile(path)
        file.Upload()
        print('Backed up %s' % path)
def restore_pydrive():
    drive = authenticate_pydrive()
    files = drive.ListFile({'q': "'%s' in parents" % folder_id}).GetList()
    for file in files:
        os.makedirs(os.path.dirname(file['title']), exist_ok=True)
        file.GetContentFile(file['title'])
        print('Restored %s' % file['title'])
authenticate_pydrive()
!cat {mycreds_file}
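
The file-selection rule in `backup_pydrive` (recursive glob, then skip directories and zero-byte files, as in caveat 2) can be tried locally without any Drive access. The following is a minimal sketch, not part of the original script: the helper name `files_to_back_up` and the demo directory tree are hypothetical and exist only for illustration.

```python
import glob
import os
import tempfile


def files_to_back_up(root):
    """Return the paths under `root` that the backup loop would upload."""
    pattern = os.path.join(root, '**')
    paths = glob.iglob(pattern, recursive=True)
    # Same filter as in backup_pydrive: skip directories and zero-byte files.
    return sorted(p for p in paths
                  if not os.path.isdir(p) and os.stat(p).st_size > 0)


# Hypothetical demo tree, created in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'sub', 'empty_dir'))
    with open(os.path.join(root, 'sub', 'model.txt'), 'w') as f:
        f.write('weights')
    open(os.path.join(root, 'empty.txt'), 'w').close()
    selected = [os.path.relpath(p, root) for p in files_to_back_up(root)]
    print(selected)
```

Only the non-empty file in `sub` is selected. Note that `glob` does not match names starting with a dot by default, so hidden files would also be left out of a backup that uses this pattern.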

ghost commented Oct 10, 2018

Hi Robin, your technique seems to persist data in the Colab notebook. I used the following config and don't see any data in the Google Drive folder (ID obfuscated below for privacy reasons), but the data is persistent in the Colab notebook across browser reloads and GPU runtime restarts. How do we store a copy of the data in Google Drive? I originally downloaded the data from Kaggle and used your code to attempt to sync it into a Google Drive folder.

The refresh token changes every time I run the code in Colab, but I don't see any link or text box. Is that because I am already logged into Google and using Google Drive in another browser session?

folder_id = '************************'
dir_to_backup = '/content/data'
mycreds_file_contents = '{"_module": "oauth2client.client", "scopes": [], "token_expiry": null, "id_token": null, "user_agent": "Python client library", "access_token": null, "token_uri": "https://oauth2.googleapis.com/token", "invalid": false, "token_response": null, "client_id": "
.apps.googleusercontent.com", "token_info_uri": null, "client_secret": "####################", "revoke_uri": "https://oauth2.googleapis.com/revoke", "_class": "GoogleCredentials", "refresh_token": "$$$$$$$$$$$$$$$$$$$$$$$$$$$$$", "id_token_jwt": null}'
mycreds_file = 'mycreds.json'

Will the data be synced and deleted from Google Drive, and only kept in the Colab folder? Or should I begin with the data in Google Drive (meaning download the data into Google Drive and then sync with Colab)?

Thanks for your time !

@frikazoid11

Hello, I'm the most complete noob on the whole internet (since I can't find the same problem anywhere),
but the script can't find dir_to_backup even if it's root:
in backup_pydrive()
49
50 for path in paths:
---> 51 if os.path.isdir(path) or os.stat(path).st_size == 0:
52 continue
53 file = drive.CreateFile({'title': path, 'parents':

FileNotFoundError: [Errno 2] No such file or directory: '~/'

What am I doing wrong?
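
A likely cause, offered as a sketch rather than a confirmed diagnosis: the `~` shorthand is expanded by the shell, not by Python. `glob` and `os.stat` treat `'~/'` as a literal directory named `~`, which matches the `FileNotFoundError` above. Expanding the path before assigning it to `dir_to_backup` should avoid this. The helper name `resolve_backup_dir` below is hypothetical and not part of the original script.

```python
import os


def resolve_backup_dir(path):
    """Expand a leading `~` and return an absolute path."""
    return os.path.abspath(os.path.expanduser(path))


# Use the expanded path instead of the literal '~/'.
dir_to_backup = resolve_backup_dir('~/')
print(dir_to_backup, os.path.isdir(dir_to_backup))
```

Keep in mind that the script uses each local path as the Drive file title, so with an absolute `dir_to_backup` the restore step writes files back to that same absolute location.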
