Skip to content

Instantly share code, notes, and snippets.

@tanaikech
Last active January 17, 2024 12:02
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save tanaikech/cc4380692790150cc090114553e38a0e to your computer and use it in GitHub Desktop.
Save tanaikech/cc4380692790150cc090114553e38a0e to your computer and use it in GitHub Desktop.
Handling 10,000,000 cells in Google Spreadsheet using Google Apps Script

Report: Handling 10,000,000 cells in Google Spreadsheet using Google Apps Script

Introduction

On March 14, 2022, it reported about "Google Sheets doubles cell limit". Ref By this update, now, the users can use 10,000,000 cells in a Google Spreadsheet. This is great news for us. When I tried to handle 10,000,000 cells in a Google Spreadsheet using Google Apps Script, it was found that there were various important points. In this report, I would like to introduce the important points for handling 10,000,000 cells in Google Spreadsheet using Google Apps Script.

Experiences

In this report, as a sample Google Spreadsheet, a Google Spreadsheet including a sheet that has 10,000,000 cells (10 columns x 1,000,000 rows) is used.

Using this sample Spreadsheet, I would like to introduce the following patterns.

  1. Retrieve the metadata from Spreadsheet.

  2. Retrieve all values from 10,000,000 cells.

  3. Retrieve a value from a cell "A1" and a cell "J1000000".

  4. Search a value from a cell "A1" and a cell "J1000000".

  5. Put a value to a cell "A1" and a cell "J1000000".

  6. Embed all 10,000,000 cells by a value.

  7. Embed all 10,000,000 cells by the difference values.

  8. Export Spreadsheet with 10,000,000 cells.

Result and discussions

1. Retrieve the metadata from Spreadsheet

After all different values were put to 10,000,000 cells of a sheet with 10 columns x 1,000,000 rows, when the metadata of the numbers of maximum row and maximum column is retrieved using Spreadsheet service (SpreadsheetApp) and Sheets API, the following average process time is obtained.

  • Spreadsheet service: 0.3 s

  • Sheets API: 0.3 s

From the above results, it was found that when the metadata from a Spreadsheet that has all different values in 10,000,000 cells is retrieved, it can be done with the low process cost for Spreadsheet service and Sheets API. In this case, even when all 10,000,000 cells are empty is used, almost the same result was obtained.

2. Retrieve a value from a cell "A1" and a cell "J1000000"

After all different values were put to 10,000,000 cells of a sheet with 10 columns x 1,000,000 rows, when the values from the cells "A1" and "J1000000" are retrieved using Spreadsheet service (SpreadsheetApp) and Sheets API, the following average process time is obtained.

  • Spreadsheet service: 0.3 s for both "A1" and "J1000000"

  • Sheets API: 0.3 s for both "A1" and "J1000000"

From the above results, it was found that when the values from the cells "A1" and "J1000000" are retrieved, it can be done with the low process cost for Spreadsheet service and Sheets API.

IMPORTANT

As another experiment, after all different values were put to 10,000,000 cells of a sheet with 10 columns x 1,000,000 rows, when the Spreadsheet is opened soon, all values are not shown soon. But, using one script, after all different values were put to 10,000,000 cells of a sheet with 10 columns x 1,000,000 rows, when the values are continuously retrieved from the cells of "A1" and "J1000000" soon, the correct values are returned soon. From this result, it is considered that the database on the internal server side has all values even when all values are not shown by opening Spreadsheet. I thought that this might be important information.

3. Retrieve all values from 10,000,000 cells

When I tried to retrieve all values from 10,000,000 cells, I obtained the following important points.

  1. When SpreadsheetApp.openById(ssId).getDataRange().getValues() with Spreadsheet service (SpreadsheetApp) is used, an error of Exceeded maximum execution time occurs.

  2. When Sheets.Spreadsheets.Values.get(ssId, "Sheet1") with Sheets API is used, an error of response too large occurs.

Namely, it was found that all values from 10,000,000 cells cannot be directly retrieved. So, as a workaround, I proposed to retrieve all values from 10,000,000 cells by separating rows.

When this is reflected in a script, it was found that all values from 10,000,000 cells can be retrieved by one execution of the script using both Spreadsheet service (SpreadsheetApp) and Sheets API. Here, I would like to introduce the result obtained by retrieving all values from 10,000,000 cells by separating the retrieved rows.

After all different values were put to 10,000,000 cells of a sheet with 10 columns x 1,000,000 rows, when all values are retrieved using Spreadsheet service (SpreadsheetApp) and Sheets API, the following average process time is obtained. In this case, all values were retrieved from 10,000,000 cells of a sheet with 10 columns x 1,000,000 rows by separating every 200,000 rows.

  • Spreadsheet service: 85 s

  • Sheets API: 45 s

From the above results, it was found that the process cost of Sheets API is lower than that of Spreadsheet service. This is the same trend with my published report.

IMPORTANT

In this sample, the values are retrieved by separating every 200,000 rows. But I think that this is not a suitable value for obtaining the lowest process cost. This is a sample script for retrieving all values from 10,000,000 cells of a sheet with 10 columns x 1,000,000 rows by one execution of the script. Please be careful about this.

4. Search a value from a cell "A1" and a cell "J1000000".

After all different values were put to 10,000,000 cells of a sheet with 10 columns x 1,000,000 rows, when a values of cells "A1" and "J1000000" are searched using TextFinder, the following average process time is obtained.

  • For "A1": 0.3 s for findNext() and 15 s for findAll()

  • For "J1000000": 15 s for both findNext() and findAll()

From the above results, it was found that when the values from the cells "A1" and "J1000000" are searched using TextFinder, the process cost of findNext() for searching a value that can be found from "A1" is lower than that of findAll() for searching a value which can be found from "J1000000". Because it is considered that findAll() scans all cells while findNext() can be finished when a value is found.

5. Put a value to a cell "A1" and a cell "J1000000"

After all different values were put to 10,000,000 cells of a sheet with 10 columns x 1,000,000 rows, when a value is put to a cell "A1", a cell "J1000000" using Spreadsheet service (SpreadsheetApp) and Sheets API, the following average process time is obtained.

  • Spreadsheet service: Put a value to a cell "A1" and "J1000000": 0.2 s for both

  • Sheets API: Put a value to a cell "A1" and "J1000000": 0.2 s for both

From the above results, it was found that when a single value is put in a cell "A1" and "J1000000" in a Spreadsheet that has all different values in 10,000,000 cells is retrieved, it can be done with a low process cost for Spreadsheet service and Sheets API. In this case, even when all 10,000,000 cells are empty is used, almost the same result was obtained.

6. Embed all 10,000,000 cells by a value

When all cells in a sheet with 10 columns x 1,000,000 rows are embedded by one value like "sample" using Spreadsheet service (SpreadsheetApp) and Sheets API, the following average process time is obtained.

  • Spreadsheet service: 1 s.

  • Sheets API: 3 s.

From the above results, it was found that when all cells in a sheet with 10 columns x 1,000,000 rows are embedded by one value like "sample" using Spreadsheet service (SpreadsheetApp) and Sheets API, both processes can be done with the low process cost. And also, it was found that in this case, the process cost of Spreadsheet service is lower than that of Sheets API. This is an interesting result.

7. Embed all 10,000,000 cells by the difference values

When I tried to put all different values to 10,000,000 cells in a sheet with 10 columns x 1,000,000 rows, I obtained the following important points.

  1. When SpreadsheetApp.openById(ssId).getRange(1, 1, 1000000, 10).setValues(values) with Spreadsheet service (SpreadsheetApp) is used, an error of Exceeded maximum execution time occurs.

  2. When Sheets.Spreadsheets.Values.update({ values }, ssId, "Sheet1!A1", { valueInputOption: "USER_ENTERED"}) with Sheets API is used, an error of API call to sheets.spreadsheets.values.update failed with error: Empty response occurs.

Namely, it was found that the values cannot be directly put to 10,000,000 cells. So, as a workaround, I proposed to put the values to 10,000,000 cells by separating rows. This is the same method for retrieving values from 10,000,000 cells.

But, unfortunately, it was found that in this case, all 10,000,000 cells cannot be embedded using setValues of Spreadsheet service (SpreadsheetApp) even when this workaround is used, because of an error of Exceeded maximum execution time.

When this workaround is reflected in a script, it was found that the values are put to 10,000,000 cells by one execution of the script using Sheets API. Here, I would like to introduce the result obtained by putting values to 10,000,000 cells by separating the retrieved rows.

When the values are put using Sheets API, the following average process time is obtained. In this case, the values of 10 columns x 1,000,000 rows were put on a sheet by separating every 300,000 rows.

  • Spreadsheet service: fail

  • Sheets API: 220 s

From the above results, it was found that when the values are put into a sheet of 10 columns x 1,000,000 rows by separating the rows using Sheets API, 10,000,000 cells can be embedded by one execution of the script.

IMPORTANT

In this sample, the values are put on a sheet by separating every 300,000 rows. But I think that this is not a suitable value for obtaining the lowest process cost. This is a sample script for putting the values to 10,000,000 cells of a sheet with 10 columns x 1,000,000 rows by one execution of the script. Please be careful about this.

8. Export Spreadsheet with 10,000,000 cells

When a Spreadsheet including a sheet of 10 columns x 1,000,000 rows are exported using a script, the following result is obtained.

  • Spreadsheet can be exported as an XLSX file and CSV file.

    • When all cell values from #0000000A1 to #0J1000000 are used, the file size of the exported XLSX and CSV are 49,459,938 bytes and 52,427,565 bytes, respectively.

    • The average process times for exporting XLSX data and CSV data are as follows.

      • XLSX: 100 s
      • CSV: 40 s
  • Spreadsheet cannot be exported as a PDF file.

Summary

In this report, handling 10,000,000 cells in Google Spreadsheet using Google Apps Script has been investigated. And, the following results could be obtained.

  • Process costs for retrieving the metadata of Spreadsheet and a single value from a cell, and also searching a value, putting the same value to 10,000,000 cells are very low.

  • In order to retrieve all cell values from 10,000,000 cells using Spreadsheet service and Sheets API, it is required to retrieve them by separating the rows. Because all values cannot be directly retrieved by one request.

  • In order to put all different values to 10,000,000 cells using Sheets API, it is required to put them by separating the values. Because all values cannot be directly put by one request. And also, in this case, setValues of the Spreadsheet service cannot be used.

  • Spreadsheet embedded by all cell values to 10,000,000 cells can be exported as an XLSX file and a CSV file. But, it cannot be exported as a PDF file.

Note

  • At GAS, the processing time is not stable as you know. So the average value for more than 100 times measurements was used for each value which is shown by the showing process time. At this time, the fluctuation of the average values was less than 1 %. I worry that each detailed data in my environment might be different from that of other user’s environments. But I believe that the trend of this result can be used.

  • In my experience, it was found that a sheet of 1 column and 10,000,000 rows can be created.

  • These results are the current results (April 25, 2022). I would like to expect that this situation will be modified in the future update.

Appendix

These are the sample scripts for measuring process costs for each experiment.

1. Retrieve the metadata from Spreadsheet

For Spreadsheet service

const sheet = SpreadsheetApp.openById(ssId).getSheetByName("Sheet1");
const maxRow = sheet.getMaxRows();
const maxCol = sheet.getMaxColumns();

For Sheets API

const { rowCount, columnCount } = Sheets.Spreadsheets.get(ssId, {
  fields: "sheets(properties)",
}).sheets[0].properties.gridProperties;

2. Retrieve a value from a cell "A1" and a cell "J1000000"

For Spreadsheet service

const res = SpreadsheetApp.openById(ssId)
  .getSheetByName("Sheet1")
  .getRange(1, 1) // and .getRange(1000000, 10)
  .getValue();

For Sheets API

const range = "A1"; // and "J1000000"
const res = Sheets.Spreadsheets.Values.get(ssId, `Sheet1!${range}`)
  .values[0][0];

3. Retrieve all values from 10,000,000 cells

For Spreadsheet service

const limit = 200000; // Number of rows per one request of getValues().

let start = 1;
const ss = SpreadsheetApp.openById(ssId);
const res = [...Array(Math.ceil(maxRow / limit))].flatMap((_) => {
  const last = start - 1 + limit;
  const range = `'Sheet1'!A${start}:${last > maxRow ? maxRow : last}`;
  const temp = ss.getRange(range).getValues();
  start += limit;
  return temp;
});

For Sheets API

const limit = 200000; // Number of rows per one API call.

let start = 1;
const res = [...Array(Math.ceil(maxRow / limit))].flatMap((_) => {
  const last = start - 1 + limit;
  const range = `'Sheet1'!A${start}:${last > maxRow ? maxRow : last}`;
  const temp = Sheets.Spreadsheets.Values.get(ssId, range).values;
  start += limit;
  return temp;
});

4. Search a value from a cell "A1" and a cell "J1000000".

For searching a value from a cell "A1"

const r = SpreadsheetApp.openById(ssId)
  .getSheetByName("Sheet1")
  .createTextFinder("#0000000A1")
  .matchEntireCell(true)
  .findNext(); // and findAll()

For searching a value from a cell "J1000000"

const r = SpreadsheetApp.openById(ssId)
  .getSheetByName("Sheet1")
  .createTextFinder("#0J1000000")
  .matchEntireCell(true)
  .findNext(); // and findAll()

5. Put a value to a cell "A1" and a cell "J1000000"

For Spreadsheet service

SpreadsheetApp.openById(ssId)
  .getSheetByName("Sheet1")
  .getRange(1, 1) // and .getRange(1000000, 10)
  .setValue("sample");

For Sheets API

Sheets.Spreadsheets.Values.update(
  { values: [["sample"]] },
  ssId,
  "Sheet1!A1", // and "Sheet1!J1000000"
  { valueInputOption: "USER_ENTERED" }
);

6. Embed all 10,000,000 cells by a value

For Spreadsheet service

SpreadsheetApp.openById(ssId)
  .getSheetByName("Sheet1")
  .getRange(1, 1, 1000000, 10)
  .setValue("sample");

For Sheets API

const requests = [
  {
    repeatCell: {
      range: { sheetId: 0 },
      cell: { userEnteredValue: { stringValue: "sample" } },
      fields: "userEnteredValue",
    },
  },
];
Sheets.Spreadsheets.batchUpdate({ requests }, ssId);

7. Embed all 10,000,000 cells by the difference values

For Spreadsheet service

const limit = ###; // Number of rows per one request of getValues().
const values = [[]]; // Please set the array including 10 columns and 1,000,000 rows.

let start = 1;
const sheet = SpreadsheetApp.openById(ssId).getSheetByName("Sheet1");
Array.from(Array(Math.ceil(maxRow / limit))).forEach((_) => {
  const v = values.splice(0, limit);
  sheet.getRange(start, 1, v.length, v[0].length).setValues(v);
  start += limit;
});
  • In this case, all 10,000,000 cells couldn't be embedded using setValues.

For Sheets API

const limit = 200000; // Number of rows per one API call.
const values = [[]]; // Please set the array including 10 columns and 1,000,000 rows.

let start = 1;
Array.from(Array(Math.ceil(maxRow / limit))).forEach((_) => {
  const v = values.splice(0, limit);
  Sheets.Spreadsheets.Values.update({ values: v }, ssId, `Sheet1!A${start}`, {
    valueInputOption: "USER_ENTERED",
  });
  start += limit;
});

A sample script for retrieving the sample array of values is as follows.

// Ref: https://stackoverflow.com/a/21231012/7108653
const columnToLetter = (column) => {
  let temp,
    letter = "";
  while (column > 0) {
    temp = (column - 1) % 26;
    letter = String.fromCharCode(temp + 65) + letter;
    column = (column - temp - 1) / 26;
  }
  return letter;
};

function getSampleValues() {
  const cols = [...Array(maxCol)].map((_, i) => columnToLetter(i + 1));
  const values = [...Array(maxRow)].map((_, i) =>
    [...Array(maxCol)].map(
      (_, j) => "#" + `${cols[j]}${i + 1}`.padStart(9, "0")
    )
  );
  return values;
}

8. Export Spreadsheet with 10,000,000 cells

const obj = {
  xlsx: `https://docs.google.com/spreadsheets/export?id=${ssId}&exportFormat=xlsx`,
  csv: `https://docs.google.com/spreadsheets/export?id=${ssId}&exportFormat=csv`,
  pdf: `https://docs.google.com/spreadsheets/export?id=${ssId}&exportFormat=pdf`,
};
const token = ScriptApp.getOAuthToken();
Object.entries(obj).forEach(([k, v], i) => {
  const blob = UrlFetchApp.fetch(v, {
    headers: { authorization: "Bearer ###your access token###" },
  }).getBlob();
  DriveApp.createFile(blob.setName(k));
});
@GEC1227
Copy link

GEC1227 commented Jan 3, 2023

Thanks for taking the time to create this report--It's helpful!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment