Skip to content

Instantly share code, notes, and snippets.

@mesgarpour
Last active March 10, 2024 00:55
Show Gist options
  • Save mesgarpour/07317e81e9ee2b3f1699 to your computer and use it in GitHub Desktop.
Save mesgarpour/07317e81e9ee2b3f1699 to your computer and use it in GitHub Desktop.
[Google Apps Script] List all files & folders in a Google Drive folder, & write into a speadsheet
/*
* Copyright 2017 Mohsen Mesgarpour
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* -----------------------------------------------------------------------------------------------------------
*
* ListFilesFolders script: It is a Google Apps Script, which lists all files and/or folders in a
* Google Drive folder, and then writes the list into a spreadsheet in batches. The script uses a
* caching mechanism to allow output recovery after possible crash; however, it won't continue
* to recover the interrupted script and continue the search.
* If you need to reduce/remove limitation on script runtime refer to the quotas for
* Google Services: https://developers.google.com/apps-script/guides/services/quotas
*
* Functions: There are two accessible functions that you may call:
* - 'run': It lists all folders and optionally files in the specified location, then writes them into
* the selected spreadsheet.
* - 'reset': It resets the script global cache. It must be run if the script was interrupted, to
* clean out the cache and triggers. Moreover, since the outputs are cached and are written
* after the whole list is created, if you run the script after the crash it would write all
* the cached output into the sheet, then clears the cache.
*
* Configurations: The following configuration parameters must be configured before running the script.
* - 'folderId' (type: string):
* The folder ID. The folder ID is everything after the 'folders/' portion of the URL.
* - 'searchDepthMax' (type: unsigned integer):
* The maximum depth for the recursive search of folders.
* - 'listFiles' (type: boolean):
* It is a flag to indicate if the listing must include files.
* - 'cacheTimeout' (type: unsigned integer, in milliseconds):
* The maximum time that internal cache persists in memory.
* - 'lockWaitTime' (type: unsigned integer, in milliseconds):
* The maximum watiting time for the cache reader/writer to wait for the memory lock.
* - 'appendToSheet' (type: boolean):
* It is a flag for appending to the selected spreadsheet.
* - 'writeBatchSize' (type: unsigned integer):
* The batch size for writing into the spreadsheet.
*
* Algorithm: The written functions uses a recursive function to list files & folders, and it uses
* a caching mechanisem to save the outputs in cache and write at the end.
*
* -----------------------------------------------------------------------------------------------------------
* Note-1: Because Apps Script services impose daily quotas and hard limitations on some features. If
* you exceed a quota or limitation, your script will throw an exception and terminate execution.
*
* Note-2: Firstly, set your folder ID ('folderId' variable)! You may copy the folder ID from the
* browser's address field. The folder ID is everything after the 'folders/' portion of the URL
* for Google Drive folder.
* Secondly, set the 'searchDepthMax' to a reasonable number, a very large number can
* significantly delay the outputs or may cause unexpected termination.
* Thirdly, set the 'listFiles' to 'false', if you only require the folders to be listed.
* Finally, other configuration parameters are preconfigured and can be left as default.
*
* Note-3: Because, this is a script and not a certified app, you must grant it permission, to run.
* When you run it for the first time, a pop-up window will open & asks you for permission.
*
* Note-4: Files and folders must NOT be modified in the selected path, as it may corrupt the
* generated list.
*
* Note-5: If you interrupt the script you might have to wait a few seconds (maximum 6 minutes),
* until you can re-run it.
*
* Note-6: It is recommended to first run the script on a small set of files and folders.
*
* Note-7: Make sure there are no other script in the current Google Sheet with similar function or
* variable names.
*
* Note-8: Refer to version 1.0 of the script, for a simplified version of the ListFilesFolders script:
* https://gist.github.com/mesgarpour/07317e81e9ee2b3f1699/23833cef09a62a3d2cf56b4143bb3cf4dbb5b827
*
* Note-9: The "teamdrive" does not support "getSharingPermission()", therefore comment out the lines
* that uses this function.
*
* Note-10: Note that when you use business version of the Google Drive, there are still limits on how
* much time can be spent on auditing (refer to Google's quotas). For instance, the script would
* timeout after ten minutes search a folder, therefore avoid auditing very big or deep folders.
*
* -----------------------------------------------------------------------------------------------------------
*
* @version 2.4 (2020.05)
* @see https://github.com/mesgarpour
*/
// Configurable variables
var folderId = 'My folder ID'; // The folder ID (everything after the 'folders/' portion of the URL).
var searchDepthMax = 100; // Max depth for recursive search of files and folders
var listFiles = true; // flag for listing files
var cacheTimeout = 24 * 60 * 60 * 1000; // set cache time-out
var lockWaitTime = 1 * 60 * 1000; // set maximium watiting time for the cache lock
var appendToSheet = false; // flag for appending to selected spreadsheet
var writeBatchSize = 100; // the write batch size
// ===========================================================================================================
// Global variables
var cacheOutputs = 'InventoryScript_outputs';
var cacheKillFlag = 'InventoryScript_killFlag';
// ===========================================================================================================
// Reset the script cache if it is required to run from the beginning
function reset() {
SpreadsheetApp.getActiveSpreadsheet().toast('Reseting script...', 'Status', -1);
// reset triggers and delete cache variables
setKillFlag_(true, this.cacheTimeout);
deleteTriggers_(this.loopResetGapTime);
deleteCache_();
SpreadsheetApp.getActiveSpreadsheet().toast('Reset is complete!', 'Status', -1);
}
// ===========================================================================================================
// List all folders and files, then write into the current spreadsheet.
function run() {
SpreadsheetApp.getActiveSpreadsheet().toast('Executing script...', 'Status', -1);
// load cache
setKillFlag_(false, this.cacheTimeout);
var outputRows = getCache_(this.lockWaitTime);
// get list
if (outputRows === undefined || outputRows === null ||
outputRows[0] === undefined || outputRows[0] === null) {
outputRows = [];
outputRows = getChildFiles_(null, DriveApp.getFolderById(this.folderId),
listFiles, cacheTimeout, outputRows);
outputRows = getFolderTree_(outputRows, this.folderId, this.listFiles, this.cacheTimeout,
this.lockWaitTime, this.searchDepthMax);
}
// write list
writeFolderTree_(outputRows, this.appendToSheet);
SpreadsheetApp.getActiveSpreadsheet().toast('Execution is complete!', 'Status', -1);
}
// ===========================================================================================================
// Get the list of folders and files
function getFolderTree_(outputRows, folderId, listFiles, cacheTimeout, lockWaitTime, searchDepthMax) {
var parentFolder, sheet = null;
var searchDepth = -1;
try {
// Get folder by id
parentFolder = DriveApp.getFolderById(folderId);
// Initialise the spreadsheet
sheet = SpreadsheetApp.getActiveSheet();
// Get files and/or folders
outputRows = getChildFolders_(searchDepth, parentFolder.getName(), parentFolder, sheet,
listFiles, cacheTimeout, lockWaitTime, outputRows, searchDepthMax);
} catch (e) {
SpreadsheetApp.getActiveSpreadsheet().toast('Timed out!', 'Status', -1);
}
return outputRows;
}
// ===========================================================================================================
// Write the list of folders and files into the spreadsheet
function writeFolderTree_(outputRows, appendToSheet) {
var sheet = null;
try {
if (getKillFlag_() === false) {
// Initialise the spreadsheet
sheet = SpreadsheetApp.getActiveSheet();
// Write to the selected spreadsheet
writeOutputs_(sheet, outputRows, appendToSheet);
// reset cache
reset();
}
} catch (e) {
SpreadsheetApp.getActiveSpreadsheet().toast('Timed out!', 'Status', -1);
}
}
// ===========================================================================================================
// Get the list of folders and files and their metadata using a recursive loop
function getChildFolders_(searchDepth, parentFolderName, parentFolder, sheet, listFiles, cacheTimeout,
lockWaitTime, outputRows, searchDepthMax) {
var childFolders = parentFolder.getFolders();
var childFolder = null;
searchDepth += 1;
try{
// List sub-folders inside the folder
while (childFolders.hasNext() && searchDepth < searchDepthMax && getKillFlag_() === false) {
childFolder = childFolders.next();
SpreadsheetApp.getActiveSpreadsheet().toast('Searching folder ' + childFolder.getName() +
' at depth ' + searchDepth + " ...", 'Status', -1);
// Get folder information
// Logger.log("Folder Name: " + childFolder.getName());
outputRows.push([
parentFolderName + "/" + childFolder.getName(),
childFolder.getName(),
"Folder",
childFolder.getDateCreated() ? childFolder.getDateCreated() : "NULL",
childFolder.getUrl() ? childFolder.getUrl() : "NULL",
childFolder.getLastUpdated() ? childFolder.getLastUpdated() : "NULL",
childFolder.getDescription() ? childFolder.getDescription() : "NULL",
childFolder.getSize() ? childFolder.getSize() : "NULL",
childFolder.getOwner() ? childFolder.getOwner() : "NULL",
childFolder.getOwner().getEmail() ? childFolder.getOwner().getEmail() : "NULL",
childFolder.getSharingPermission() ? childFolder.getSharingPermission() : "NULL",
childFolder.getSharingAccess() ? childFolder.getSharingAccess() : "NULL"
//, '=HYPERLINK("' + childFile.getUrl() + '", IMAGE("' + Drive.Files.get(childFolder.getId()).thumbnailLink + '",1))' //The 'Drive service' is a G-Suite service (commercial service)
// more info: https://developers.google.com/apps-script/reference/drive/folder
]);
// cache outputs
setCache_(outputRows, lockWaitTime, cacheTimeout);
// List files inside the folder
outputRows = getChildFiles_(
parentFolder, childFolder, listFiles, cacheTimeout, outputRows);
// Recursive call of the current sub-folder
outputRows = getChildFolders_(searchDepth++, parentFolderName + "/" + childFolder.getName(),
childFolder, sheet, listFiles, cacheTimeout, lockWaitTime, outputRows, searchDepthMax);
}
} catch (e) {
Logger.log('Timed out: Restarting! ' + e.toString());
SpreadsheetApp.getActiveSpreadsheet().toast( 'Timed out!', 'Status', -1);
}
// cache outputs
setCache_(outputRows, lockWaitTime, cacheTimeout);
return outputRows;
}
// ===========================================================================================================
// Get the list of files in the selected folder
function getChildFiles_(parentFolder, childFolder, listFiles, cacheTimeout, outputRows) {
var childFiles = childFolder.getFiles();
var childFile = null;
var path = ""
try{
// List files inside the folder
while (listFiles && childFiles.hasNext()) {
childFile = childFiles.next();
// derive path
if (parentFolder === null){
path = childFolder.getName() + "/" + childFile.getName()
}else{
path = parentFolder.getName() + "/" + childFolder.getName() + "/" + childFile.getName()
}
// Get file information
//Logger.log("File Name: " + childFile.getName());
outputRows.push([
path,
childFile.getName(),
childFile.getName().split('.').pop(),
childFile.getDateCreated() ? childFile.getDateCreated() : "NULL",
childFile.getUrl() ? childFile.getUrl() : "NULL",
childFile.getLastUpdated() ? childFile.getDescription() : "NULL",
childFile.getDescription() ? childFile.getDescription() : "NULL",
childFile.getSize() ? childFile.getSize() : "NULL",
childFile.getOwner() ? childFolder.getOwner() : "NULL",
childFile.getOwner().getEmail() ? childFile.getOwner().getEmail() : "NULL",
childFile.getSharingPermission() ? childFile.getSharingPermission() : "NULL",
childFile.getSharingAccess() ? childFile.getSharingAccess() : "NULL"
//, '=HYPERLINK("' + childFile.getUrl() + '", IMAGE("' + Drive.Files.get(childFile.getId()).thumbnailLink + '",1))' //The 'Drive service' is a G-Suite service (commercial service)
// more info: https://developers.google.com/apps-script/reference/drive/folder
]);
}
// cache outputs
setCache_(outputRows, lockWaitTime, cacheTimeout);
} catch (e) {
Logger.log('Timed out: Restarting! ' + e.toString());
SpreadsheetApp.getActiveSpreadsheet().toast('Timed out!', 'Status', -1);
}
return outputRows;
}
// ===========================================================================================================
// Set the values from cache
function setCache_(outputRows, lockWaitTime, cacheTimeout) {
try{
var cache = CacheService.getScriptCache();
var lock = LockService.getScriptLock();
lock.waitLock(lockWaitTime);
cache.put(cacheOutputs, JSON.stringify(outputRows), cacheTimeout);
lock.releaseLock();
} catch (e) {
Logger.log('Timed out: Restarting! ' + e.toString());
SpreadsheetApp.getActiveSpreadsheet().toast('Timed out!', 'Status', -1);
}
}
// ===========================================================================================================
// Get the values in cache
function getCache_(lockWaitTime) {
try{
var outputRows = [];
var cache = CacheService.getScriptCache();
var lock = LockService.getScriptLock();
lock.waitLock(lockWaitTime);
outputRows = JSON.parse(cache.get(cacheOutputs));
if (outputRows === undefined || outputRows === null ||
outputRows[0] === undefined || outputRows[0] === null) {
outputRows = JSON.parse(cache.get(cacheOutputs));
}
lock.releaseLock();
} catch (e) {
SpreadsheetApp.getActiveSpreadsheet().toast('Timed out!', 'Status', -1);
}
return outputRows;
}
// ===========================================================================================================
// Write outputs to the selected spreadsheet
function writeOutputs_(sheet, outputRows, appendToSheet) {
try{
var range, rowStart, indexStart, indexEnd = null;
var headerRow = ["Full Path", "Name", "Type", "Date", "URL", "Last Updated", "Description", "Size",
"Owner", "Sharing Permission", "Sharing Access"]; //, "Thumbnail"];
SpreadsheetApp.getActiveSpreadsheet().toast('Writing outputs...', 'Status', -1);
if (sheet !== null && outputRows.length > 0) {
if (appendToSheet === false) {
sheet.clear();
sheet.appendRow(headerRow);
rowStart = 2;
} else {
rowStart = getRowsFilled_(sheet, "A1:A") + 1;
}
indexStart = 0;
indexEnd = Math.min(writeBatchSize, outputRows.length);
while (indexStart < outputRows.length) {
range = sheet.getRange(rowStart + indexStart, 1, indexEnd - indexStart, outputRows[0].length);
range.setValues(outputRows.slice(indexStart, indexEnd));
a = outputRows.slice(indexStart, indexEnd);
indexStart = indexEnd;
indexEnd = Math.min(indexStart + writeBatchSize, outputRows.length);
}
range = sheet.getRange(getRowsFilled_(sheet, "A1:A") + 1, 1, 1, 1);
range.setValues([["End of List!"]]);
}
} catch (e) {
SpreadsheetApp.getActiveSpreadsheet().toast('Timed out!', 'Status', -1);
}
}
// ===========================================================================================================
// Get number of rows filled in the selected spreadsheet
function getRowsFilled_(sheet, selectedRange) {
var selectedMatrix = sheet.getRange(selectedRange).getValues();
return selectedMatrix.filter(String).length;
}
// ===========================================================================================================
// Delete the global cache
function deleteCache_() {
try{
var cache = CacheService.getScriptCache();
var lock = LockService.getScriptLock();
lock.waitLock(this.lockWaitTime);
cache = CacheService.getScriptCache();
cache.remove(cacheOutputs);
lock.releaseLock();
} catch (e) {
Logger.log('Failed to delete cache! ' + e.toString());
SpreadsheetApp.getActiveSpreadsheet().toast('Failed to delete cache! Try again in a few minutes.');
}
}
// ===========================================================================================================
// Delete triggers
function deleteTriggers_() {
var triggers = ScriptApp.getProjectTriggers();
try{
for (var i = 0; i < triggers.length; i++) {
if (triggers[i].getHandlerFunction() === "run") {
ScriptApp.deleteTrigger(triggers[i]);
}
}
} catch (e) {
Logger.log('Failed to delete triggers! ' + e.toString());
SpreadsheetApp.getActiveSpreadsheet().toast('Failed to delete triggers! Try again in a few minutes.');
}
}
// ===========================================================================================================
// Set kill flag
function setKillFlag_(state, cacheTimeout) {
var lock = LockService.getScriptLock();
try{
lock.waitLock(this.lockWaitTime);
cache = CacheService.getScriptCache();
cache.put(cacheKillFlag, state, cacheTimeout);
lock.releaseLock();
} catch (e) {
SpreadsheetApp.getActiveSpreadsheet().toast('Failed to set kill flag! Try again in a few minutes.');
}
}
// ===========================================================================================================
// Get kill flag
function getKillFlag_() {
killFlag = false;
try {
cache = CacheService.getScriptCache();
//lock.waitLock(this.lockWaitTime);
killFlag = cache.get(cacheKillFlag) === 'true';
//lock.releaseLock();
} catch (e) {
SpreadsheetApp.getActiveSpreadsheet().toast('Failed to set kill flag! Try again in a few minutes.');
}
return killFlag;
}
@its-dark-in-here
Copy link

@mesgarpour - Fantastic script - really useful.

However, like @asafpeled (ages ago if you look back through comments) I am getting a lot of Timeouts with error "Argument too large: value". I am using business Gsuite so limits should not necessarily be the problem, and I believe I have finally tracked it down.

The cause as @asafpeled noted is in the SetCache routine. The Google doc says there is a limit of 100Kb (102,400 bytes) on the CacheService, and with a little additional code, I can see that for one of my Drive scans I am getting the error at trying to save 102,605 bytes to the cache!

As I am using the business Gsuite, I have decided to forego the Cache function, and commented it out, and this works for me!

@papalphacharlie
Copy link

To help keep your account secure, from May 30, 2022, ​​Google no longer supports the use of third-party apps or devices which ask you to sign in to your Google Account using only your username and password.
https://support.google.com/accounts/answer/6010255?hl=en&visit_id=637902724079718537-430457167&p=less-secure-apps&rd=1#zippy=%2Cif-less-secure-app-access-is-on-for-your-account

Has anyone tried using this script recently? Cannot find a way to enable "Less secure apps" on my personal Google account anymore.

@JgLighting
Copy link

JgLighting commented Sep 6, 2022

@mesgarpour
Is it possible to have new files in the folder appended at the end of the list ?

I only have a single level of files and no child folders in the folder I wish to run the script on.

Thanks for a great script

@GJoe2
Copy link

GJoe2 commented Sep 27, 2022

I'm having issues with multiple google accounts. Every time I use var ss = SpreadsheetApp.getActiveSpreadsheet(); It returns Cannot read property 'getSheetByName' of null. To work around it I need to use var ss = SpreadsheetApp.openById("abc1234567");

I see your code use sheet = SpreadsheetApp.getActiveSheet(); in lines 165 and 186. It would be convenient if we just select the desire spreadsheet using ID or URL in another parameter.

Also, I would like to ask how to extract the number of pages if the file is a pdf

@kmnhngyn
Copy link

kmnhngyn commented Oct 19, 2022

hi, thanks for the script! When trying to run the script on a folder with a depth of 5 and more subfolders and a lot of files, I get a time out or that the output is too large ("Restarting! Exception: Argument too large). I understand that the memory/cache gets too big. Any ideas how I can adjust the script so that it doesn't store the rows until finished, but tries to write the first subfolder, then the second, the third,... and so on? Thanks!

@aalooksth
Copy link

@lutfiihsan to get all Drive tree, just replace
var parentFolder = DriveApp.getFolderById(folderId);
with
var parentFolder = DriveApp.getRootFolder();

Do note if there are too many files, it will result in an error.

@rajesh-h
Copy link

rajesh-h commented Jul 21, 2023

@mesgarpour I know you are not actively maintaining this.

can you answer this? when i set depth to 1 listFiles= False, my expectation is it will bring all folder names within the specified folder. But I get only one folder name is this expected?

var searchDepthMax = 1; // Max depth for recursive search of files and folders
var listFiles = false;

Eg.

ParentFolder
child1
grandchild1
child2
grandchild1
child3
child4.xlsx

My expectation is it should bring
child1
child2
child3

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment