Skip to content

Instantly share code, notes, and snippets.

@brokensandals
Last active November 6, 2022 09:33
Show Gist options
  • Star 9 You must be signed in to star a gist
  • Fork 6 You must be signed in to fork a gist
  • Save brokensandals/6b77f73666323d6e4b94ff1df12a532a to your computer and use it in GitHub Desktop.
Save brokensandals/6b77f73666323d6e4b94ff1df12a532a to your computer and use it in GitHub Desktop.
Google Docs/Sheets/Slides automated backups script

Update: I now use a slightly different version of this script, which creates a single zip file instead of one per document, and puts a timestamp in the filename rather than overwriting the previous backup file. That version can be found at https://github.com/brokensandals/export-google-docs.

Google Apps Script that exports all your Google Docs/Sheets/Slides into docx/xlsx/pptx files and PDFs into a folder in your Google Drive. For more info and step-by-step setup instructions, see here: http://brokensandals.net/google-docs-backup

Replace INSERT_FOLDER_ID_HERE with the ID of the folder you want backups to be placed in.

Create a trigger to run the backupAll function if you want to do this on a schedule (e.g. nightly).

Notes:

  • By default, only files that you own (as opposed to files others have shared with you) will be backed up. Remove the file.getOwner() check from the backupAll method if you want to change that.
  • For each file, both an Office file (docx/xlsx/pptx) and a PDF are generated, and combined into a zip file that's placed in the backup folder. Zipping the backup files ensures that they don't clutter up the recent activity list for Docs/Sheets/Slides.
  • The script depends on the lastUpdated dates being correct on both the input files and the files in the backup directory. If that seems problematic, you could change the createOrUpdateFileForBlob method to delete existing backup files rather than updating them.

As always, this code may have defects that prevent it from working properly. Use at your own risk and remember to periodically verify that your backups are actually working as expected.

var BACKUP_FOLDER_ID = 'INSERT_FOLDER_ID_HERE';
var NATIVE_MIME_TYPES = {};
NATIVE_MIME_TYPES[MimeType.GOOGLE_DOCS] = MimeType.MICROSOFT_WORD;
NATIVE_MIME_TYPES[MimeType.GOOGLE_SHEETS] = MimeType.MICROSOFT_EXCEL;
NATIVE_MIME_TYPES[MimeType.GOOGLE_SLIDES] = MimeType.MICROSOFT_POWERPOINT;
var NATIVE_EXTENSIONS = {};
NATIVE_EXTENSIONS[MimeType.GOOGLE_DOCS] = '.docx';
NATIVE_EXTENSIONS[MimeType.GOOGLE_SHEETS] = '.xlsx';
NATIVE_EXTENSIONS[MimeType.GOOGLE_SLIDES] = '.pptx';
var BACKUP_MIME_TYPES = Object.keys(NATIVE_MIME_TYPES);
function backupAll() {
const backupFolder = DriveApp.getFolderById(BACKUP_FOLDER_ID);
BACKUP_MIME_TYPES.forEach(function(mimeType) {
var files = DriveApp.getFilesByType(mimeType);
while (files.hasNext()) {
var file = files.next();
if (file.getOwner() && file.getOwner().getEmail() == Session.getActiveUser().getEmail()) {
backup(file, backupFolder);
}
}
});
}
function backup(file, folder) {
var targetName = file.getName() + ' ' + file.getId();
var lastUpdated = file.getLastUpdated();
var pdf = getPdfBlob(file);
var native = getNativeBlob(file);
var zip = Utilities.zip([pdf, native], targetName + '.zip');
createOrUpdateFileForBlob(zip, folder, lastUpdated);
}
function createOrUpdateFileForBlob(blob, folder, ifOlderThan) {
var existingFiles = folder.getFilesByName(blob.getName());
if (existingFiles.hasNext()) {
var file = existingFiles.next();
if (file.getLastUpdated() < ifOlderThan) {
updateFile(file, blob);
}
} else {
folder.createFile(blob);
}
}
function updateFile(file, blob) {
const url = 'https://www.googleapis.com/upload/drive/v2/files/' + file.getId() + '?uploadType=media';
const params = {
method: 'put',
headers: { Authorization: 'Bearer ' + ScriptApp.getOAuthToken() },
payload: blob
};
var response = UrlFetchApp.fetch(url, params);
if (response.getResponseCode() < 200 || response.getResponseCode() > 299) {
throw 'Failed to update file named ' + file.getName();
}
}
function getPdfBlob(file) {
var blob = file.getAs('application/pdf');
return blob;
}
function getNativeBlob(file) {
const nativeMimeType = NATIVE_MIME_TYPES[file.getMimeType()];
const extension = NATIVE_EXTENSIONS[file.getMimeType()];
const url = 'https://www.googleapis.com/drive/v2/files/' + file.getId() + '/export?mimeType=' + nativeMimeType;
const params = {
method: 'get',
headers: { Authorization: 'Bearer ' + ScriptApp.getOAuthToken() }
};
const blob = UrlFetchApp.fetch(url, params).getBlob();
blob.setName(file.getName() + extension);
return blob;
}
Copy link

ghost commented Mar 20, 2020

This is amazing. Is there a way to automatically name the exported zips as "file-YYYY-MM-DD"?

@brokensandals
Copy link
Author

@cnlbrnsllvn Thanks for comment! Try changing the first two lines of the backup function to this:

var lastUpdated = file.getLastUpdated();
var targetName = file.getName() + ' ' + file.getId() + '-' + lastUpdated.toISOString().slice(0, 10);

I haven't tested it, but I think that should work. The date in each zip file name will be the date that the original file had last been modified, at the time the backup was made.

Copy link

ghost commented Mar 20, 2020

@cnlbrnsllvn Thanks for comment! Try changing the first two lines of the backup function to this:

var lastUpdated = file.getLastUpdated();
var targetName = file.getName() + ' ' + file.getId() + '-' + lastUpdated.toISOString().slice(0, 10);

I haven't tested it, but I think that should work. The date in each zip file name will be the date that the original file had last been modified, at the time the backup was made.

Thanks!

@eCigDistributors
Copy link

I want to remove formulas and only copy values, I also want to keep formatting and colors though.

Will you please help me with that?

Thank you for sharing this

@brokensandals
Copy link
Author

@eCigDistributors try replacing the backup function with the following:

function backup(file, folder) {
  var targetName = file.getName() + ' ' + file.getId();
  var lastUpdated = file.getLastUpdated();
  
  var pdf = getPdfBlob(file);
  
  var native;
  if (file.getMimeType() === MimeType.GOOGLE_SHEETS) {
    // Create a copy of the file with all formulas replaced with their literal values
    // This code was adapted from https://support.google.com/docs/thread/17355100?hl=en&msgid=17382472
    var copy = file.makeCopy('VALUES ' + file.getName());
    var spreadsheet = SpreadsheetApp.open(copy);
    var sheets = spreadsheet.getSheets();
    for (var si = 0; si < sheets.length; si++) {
      var sheet = sheets[si];
      if (sheet.getLastRow() > 0 && sheet.getLastColumn() > 0) {
        var range = sheet.getRange(1, 1, sheet.getLastRow(), sheet.getLastColumn());
        range.setValues(range.getValues());
      }
    }
    native = getNativeBlob(copy);
    copy.setTrashed(true);
  } else {
    native = getNativeBlob(file);
  }
  
  var zip = Utilities.zip([pdf, native], targetName + '.zip');
  
  createOrUpdateFileForBlob(zip, folder, lastUpdated);
}

This doesn't work perfectly. In some of my spreadsheets, it does replace most or all formulas with values in the backups; but in some spreadsheets, I still see some formulas. So there is probably a bug in one of the three lines beginning at "if (sheet.getLastRow()...". I don't think I'll have time to investigate further, though.

@K-Langoso
Copy link

K-Langoso commented Apr 9, 2020

Hi!

This script is just what I needed. I had a little trouble running it but I think it is just when run manually. Posting here if anyone runs into the same thing.

When manually run, it gives an error at row 30, can't get property getName on “undefined”. However, when run automatically by the scheduler, it works fine. (Never mind, I was running a sub-function. 🤦‍♂️ See below.)

So, success, I guess! 😁

Thanks!

@brokensandals
Copy link
Author

@K-Langoso thanks for the comment! Make sure you run the backupAll function, not the backup function. Trying to run the backup function could give the error you described (because it depends on parameters that have to be supplied by the backupAll function).

Screen Shot 2020-04-09 at 12 15 47 PM

@K-Langoso
Copy link

Yep. Looks like it. 🙈

I'll update my comment accordingly, thanks!

@jmcastellano
Copy link

Hi,
I also need to make a copy of all the .pdf files.
I have tried adding it as NATIVE_MIME_TYPES [MimeType.pdf] but it doesn't work,
could you help me?

@brokensandals
Copy link
Author

@jmcastellano The script should already include PDF copies of your docs/sheets/slides in the .zip files it produces. If you're wanting to make backups of other PDF files you have stored in Drive, do this:

  1. Change the line var BACKUP_MIME_TYPES... to var BACKUP_MIME_TYPES = Object.keys(NATIVE_MIME_TYPES).concat([MimeType.PDF]);
  2. Rewrite the backup function as follows:
function backup(file, folder) {
  var targetName = file.getName() + ' ' + file.getId();
  var lastUpdated = file.getLastUpdated();
  
  var toZip;
  if (NATIVE_MIME_TYPES[file.getMimeType()]) {
    var pdf = getPdfBlob(file);
    var native = getNativeBlob(file);
    toZip = [pdf, native];
  } else {
    toZip = [file];
  }
  
  var zip = Utilities.zip(toZip, targetName + '.zip');
  
  createOrUpdateFileForBlob(zip, folder, lastUpdated);
}

@jmcastellano
Copy link

Thanks!! it works perfectly!!

@freetimecoder
Copy link

Hello!

This script is amazing, thank you very much for making it!

I have a problem exporting files bigger than 10mb though, I searched and it looks like the answer is to use a method called exportLinks. However I do not understand enough of the api and script to implement it. Any pointers would be appreciated.

Also is it possible to retain the folder structure in the backup? At the moment the script exports all documents into the same folder.

Kind Regards!

@brokensandals
Copy link
Author

Hi @freetimecoder! Good find regarding exportLinks, I didn't realize that existed. Using that, I was able to get the script to download exports of files larger than 10MB, but it fails when it goes to upload them. I think this is just a limitation of UrlFetchApp, or maybe just the particular way I'm using UrlFetchApp, not the Google Drive API itself. I can't devote any more time to it right now, but here's the code in case you want to play with it: https://github.com/brokensandals/export-google-docs/tree/zip-per-document-large-files (the updateFile method is what would have to be fixed).

(If you get that working, though, you may still run into another issue if you have a lot of large files: there's a limit to how long a script can run, and streaming a bunch of large files to/from the drive API might take too long.)

Regarding retaining the folder structure, that would take some work. One option would be to use the https://developers.google.com/apps-script/reference/drive/file#getParents() method on the file objects to find the parent folder, and walk that recursively until you get to the root, building a list of all the folders along the way. (This is complicated by the fact that a file can have multiple parents, but if you never use that feature you could just use the first parent. I think Google is removing the multiple-parents feature later this year anyway.) Then you could go back through the list and use https://developers.google.com/apps-script/reference/drive/folder#createFolder(String) to create corresponding folders inside the backup directory.

Alternatively, you could change the backupAll method so that instead of just requesting all files for a given mime type, it starts at https://developers.google.com/apps-script/reference/drive/drive-app#getRootFolder() and walks the directory tree recursively, so that you'd always know where you are in the tree. Then you can create corresponding folders in the backup directory as you go along.

@freetimecoder
Copy link

freetimecoder commented May 22, 2020

Thanks for your answer @brokensandals!

I've been reading into the whole thing today and made some adjustments, especially for speed as the script terminates after a few minutes. Since I don't want to zip or create pdfs I removed them from my version. A probably volatile solution to get more speed is to hard code the export links, skipping one query (I also removed the . from the extension list):

var NATIVE_EXPORT_URLS = {};
NATIVE_EXPORT_URLS[MimeType.GOOGLE_DOCS] = "https://docs.google.com/feeds/download/documents/export/Export?id=";
NATIVE_EXPORT_URLS[MimeType.GOOGLE_SHEETS] = "https://docs.google.com/spreadsheets/export?id=";
NATIVE_EXPORT_URLS[MimeType.GOOGLE_SLIDES] = "https://docs.google.com/feeds/download/presentations/Export?id=";

[...]

function getNativeBlob(file) {
  const nativeMimeType = NATIVE_MIME_TYPES[file.getMimeType()];
  const extension = NATIVE_EXTENSIONS[file.getMimeType()];
  const exportURL = NATIVE_EXPORT_URLS[file.getMimeType()];
  
  //Construct query url
  const url = exportURL + file.getId() + '&exportFormat='+extension;

  const params = {
    method: 'get',
    headers: { Authorization: 'Bearer ' + ScriptApp.getOAuthToken() }
  };
  
  //Download converted file
  const blob = UrlFetchApp.fetch(url, params).getBlob();
  
  var fileName = file.getName() +"."+ extension;
  
  //Set filename
  blob.setName(fileName);
  return blob;
}

I got another big speed gain from checking if the existing backup file is newer before downloading the blobs and removed the check in the update function:

BACKUP_MIME_TYPES.forEach(function(mimeType) {
	//Iterate through all files on the drive
	var files = DriveApp.getFilesByType(mimeType);
	while (files.hasNext()) {
	  var file = files.next();
			   
	  //Check if older backups exist of this file
	  var existingFiles = backupFolder.getFilesByName(file.getName()+"."+NATIVE_EXTENSIONS[mimeType])
	  if(existingFiles.hasNext()) {
		var existingFile = existingFiles.next();

		//Don't backup if existing backup is up to date
		if(file.getLastUpdated() < existingFile.getLastUpdated())
		{
		  continue;
		}
	  }
	  
	  //Backup file 
	  backup(file, backupFolder);
	}
});

This way the existing files are basically skipped over fast, which means the script can run multiple times and has a better chance to reach the end if the files at the front are done by a previous iteration. It's not ideal as it still suffers from the same problem as before if there are too many files. I read about being able to cache values between script runs so I'll look into that next. (Or not, depending on time I can spend on it)

Thanks for the links on folder structure, the multi parents threw me off, I thought it was supposed to list every parent kind of like this [Root, Subfolder, SubSubfolder], which it doesn't. I think walking the tree is probably faster than reconstructing the hierarchy from each file.

Thanks again, the script was a fantastic starting point to reverse engineer! I will report back if I get the folders to work.

Kind Regards

@pantha007
Copy link

@freetimecoder, can you please help me I am trying to implement your changes but can not seem to get it to run properly, is there any chance I can see you full code ??? ( without the drive link of course :-) )

@freetimecoder
Copy link

@pantha007 Sure, I put my forked changes up here:
https://gist.github.com/freetimecoder/dc916ace62b0e67f89ed80e617b6f630
Beware though, it's probably not as stable, because I optimized for speed and skipped some checks along the way.

@pantha007
Copy link

@freetimecoder, thanks so much I will be keeping an eye on that :-) :-)

@tis-andias
Copy link

Thanks so much @brokensandals!

I'm looking for a way to back-up selected google docs only. Hoping you can help or point me in the right direction?

@brokensandals
Copy link
Author

@tis-andias You'll want to change the backupAll function so that it retrieves a File object for each file you're interested in, and then calls the backup function with it. (You can get File objects using the DriveApp API which is documented at https://developers.google.com/apps-script/reference/drive/drive-app . Below I'll give a couple examples of using it.)

If you just want to back up a handful of specific docs, you can find the ID of each one by opening it in Google Docs (or sheets or whatever) and looking at your web browser's address bar; the ID is the long series of letters and numbers between /d/ and /edit. Then you can change the backupAll function to look like this:

function backupAll() {
  const backupFolder = DriveApp.getFolderById(BACKUP_FOLDER_ID);
  // Add as many files as you want on the following line.
  const ids = ['INSERT_FIRST_ID_HERE', 'INSERT_SECOND_ID_HERE', 'INSERT_THIRD_ID_HERE'];
  for (var index = 0; index < ids.length; index++) {
    backup(DriveApp.getFileById(ids[index]), backupFolder);
  }
}

Alternatively, if you wanted to back up all the files in a specific folder, you could look up its ID in the address bar on Google Drive and do something like:

function backupAll() {
  const backupFolder = DriveApp.getFolderById(BACKUP_FOLDER_ID);
  const folderToBeBackedUp = DriveApp.getFolderById('PASTE_FOLDER_ID_HERE');
  var files = folderToBeBackedUp.getFiles();
  while (files.hasNext()) {
    backup(files.next(), backupFolder);
  }
}

I haven't tested this code so it might not be quite right, but should be close.

@jake-babcock
Copy link

Now is there a way to instead of backing it up to my google drive to have it download the files locally to an external hdd?

@brokensandals
Copy link
Author

Now is there a way to instead of backing it up to my google drive to have it download the files locally to an external hdd?

You can install Google Backup & Sync so that the backup file on google drive gets automatically downloaded to your hard drive. (You could either configure Backup & Sync to use a folder on your external hard drive, or use a different backup program to do backups from your main hard drive to your external hard drive.)

Or you could write a script to download just the backup file from google drive using the google drive API. Or write a different backup script similar to this one, but running on your computer and using the API to export each file and write the backup directly to your external hard drive. I don’t really use google drive any more so I can’t help with any of that.

@jake-babcock
Copy link

Yeah I've got a python script thatll download all the zips. Now im trying to automate this as much as possible, so is there a way within this Apps Script to create that backup folder at the beginning of the script, and then pull that File ID for the BACKUP_FOLDER_ID? I know there would have to be some checks in place to keep it from breaking should the folder already exist, but Im running into an issue of not being able to pull the file id properly.

@brokensandals
Copy link
Author

is there a way within this Apps Script to create that backup folder at the beginning of the script, and then pull that File ID for the BACKUP_FOLDER_ID?

One approach could be:

  • instead of putting the zips directly inside the backup folder, have the Apps Script create a new subfolder inside that folder each time it runs, and put the current date in the folder name
  • have the python script list all the subfolders in the backup folder and download from the newest one

See the createFolder method in the Folder documentation.

You could also look at my newer version of the script which puts the entire backup into a single zip file with a date in the name - then your python script would just need to download the newest file.

@dmorgans0803
Copy link

I'm a complete newbie and am struggling to implement this code.
I've installed it in Google Drive but not sure it's in the correct spot.
I get an error message when I try to run it ...
7:53:26 PM Notice Execution started
7:53:26 PM Error
Exception: No item with the given ID could be found. Possibly because you have not edited this item or you do not have permission to access it.
backupAll @ Code.gs:16

Tried researching the error message but only went further down the rabbit hole.
Any easy fixes?
Thanks
Douglas

@brokensandals
Copy link
Author

@dmorgans0803 Based on the error, I'm guessing maybe you didn't replace the text INSERT_FOLDER_ID_HERE in the script. You need to replace it with the ID of some folder that you've created (this is where the backup files will go), as mentioned in the "Instructions" section here: https://brokensandals.net/technical/backup-tooling/google-docs/

To get the ID, you open the folder in Google Drive in your web browser, then look at the address bar - you should see something similar to "https://drive.google.com/drive/folders/asdflkjjlk23409cdsklmklmnnl290". Notice the long sequence of random-looking letters and numbers after "folders/" - that's what you need to copy. Don't include the slash or anything before it; and if you see a question mark, don't include the question mark or anything after it.

Hope this helps - if the issue is something else, I'm afraid I don't have any immediate ideas.

@dmorgans0803
Copy link

dmorgans0803 commented Jan 7, 2022 via email

@Martianno
Copy link

Hello! Thanks for your fantastic utility. It works nicely!

May I have just one request? Would it be please possible to export my Documents as plain text - as a .txt file?

Thank you!

@brokensandals
Copy link
Author

@Martianno Glad you found it useful!

Would it be please possible to export my Documents as plain text - as a .txt file?

I won't have time to test or troubleshoot this, but, one thing you could try would be to replace MimeType.MICROSOFT_WORD on line 4 with MimeType.PLAIN_TEXT, and replace .docx on line 9 with .txt

@Martianno
Copy link

Thanks for a super fast reaction! It Works, God bless you!

@cygonzales
Copy link

Hi.. Thank you for the script. If i am not mistaken, the script is to backup all kinds of file and zip it. I am new in this environment and im trying to do the basics for each type of file. Can you help me with the script that can backup google docs file? I was able to make an auto backup for the spreadsheet however, im trying to replicate the code and replaced the var name but i dont know the specific function for google docs. I hope you could help me.. Thank you.. :)

function archiveCopy() {
var file= DriveApp.getFileById("1liz6Kfh4eiXJOR95Ry03m-lVO_6fv6-sylZ3N3RAWCg");
var destination= DriveApp.getFolderById("1y-aYx3SbnlZSde_J57FEj_Jyfo4bj6ED");

var timeZone=Session.getScriptTimeZone();
var formattedDate= Utilities.formatDate(new Date(),timeZone,"yyyy-MM-dd' 'HH:mm:ss");
var name= SpreadsheetApp.getActiveSpreadsheet().getName()+"Copy"+formattedDate;

file.makeCopy(name,destination);

}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment