@tanaikech
Created April 26, 2024 06:19
Batch Processing Powerhouse: Leverage Gemini 1.5 API and Google Apps Script for Efficient Content Workflows

Abstract

A new Google Apps Script library, "GeminiWithFiles", simplifies using the powerful Gemini 1.5 AI model. It lets users directly upload files for content generation or create descriptions for many images at once, making it much faster than prior methods. This is helpful for tasks involving large amounts of text or images.

Introduction

Gemini, a family of Google's most capable AI models, makes it practical to treat unstructured data, such as free text and images, as structured data. This is particularly impactful for tasks involving large amounts of text or images.

The Gemini 1.5 API, released in February 2024, significantly expands capabilities compared to Gemini 1.0. Here are some key improvements:

  • Context window: Gemini 1.5 can take up to 1 million tokens of input in a single request, enabling comprehensive and detailed processing of long content.
  • Image processing: Gemini 1.5 can now process up to 3,000 image files, a vast leap from the 16-image limit of Gemini 1.0.

While Gemini cannot yet work directly with Google Docs, Sheets, or Slides files or with PDFs (I expect future updates to address this), workarounds exist. A previous report explored a method to convert PDFs to image data for Gemini processing. Ref This approach proved successful for parsing invoices and various text formats.

This report builds on that concept with a Google Apps Script library, "GeminiWithFiles", that simplifies Gemini integration. The library mainly offers two functionalities:

  • Content generation from uploaded files: Users can directly upload files, and the script will process them through Gemini to generate content.
  • Batch image description creation: This script allows generating descriptions for a large number of images in a single API call. This significantly improves efficiency compared to the one-image-at-a-time approach used in the previous report. Ref

By streamlining the process and enabling batch operations, this script offers a powerful tool for various use cases.

Repository

You can see the script of "GeminiWithFiles" in the following repository. https://github.com/tanaikech/GeminiWithFiles

Origins of this library

I created this library based on the following reports.

Features

The GeminiWithFiles library lets you work with the Gemini API, including its file storage, through an easy-to-use interface. Here's what you can achieve with this library:

File Management:

  • Upload files to Gemini for storage and future processing with an asynchronous process.
  • Retrieve a list of files currently stored in your Gemini account.
  • Delete files from your Gemini account with an asynchronous process.

Content Upload:

  • Upload various file formats, including Google Docs files (Documents, Spreadsheets, Slides) and PDFs. Each page of the uploaded file is converted into an image for further processing.

Chat History Management:

  • Save your chat history for later analysis or retrieval.

Content Generation:

  • Process multiple files at once (e.g., images, papers, invoices) using a single API call to generate new content based on the uploaded data.

Output Specification:

  • Specify the desired output format for the results generated by the Gemini API.

Usage

In order to test this script, please do the following steps.

1. Create an API key

Please access https://makersuite.google.com/app/apikey and create your API key. At that time, please enable Generative Language API at the API console. This API key is used for this sample script.

The official documentation is also available for reference. Ref

2. Create a Google Apps Script project

Please create a standalone Google Apps Script project. This script can also be used in a container-bound script.

Then, open the script editor of the Google Apps Script project.

3. How to use GeminiWithFiles

There are 2 patterns for using GeminiWithFiles.

1. Use GeminiWithFiles as a Google Apps Script library

If you use this library as a Google Apps Script library, please install the library to your Google Apps Script project as follows.

  1. Create a Google Apps Script project. Or, open your Google Apps Script project.

    • You can use this library for the Google Apps Script project of both the standalone and container-bound script types.
  2. Install this library.

    • The library's project key is as follows.
12DIn1iEX4ebDB3DZAj0TQAgANGmbxZr1jlA4e7vZCdD7gXqYyhj7XePP

This library uses another library, PDFApp, to convert PDF data to image data. PDFApp is already installed in this library, so no additional setup is required.

2. Use GeminiWithFiles in your own Google Apps Script project

If you use this library in your own Google Apps Script project, please copy and paste Class GeminiWithFiles into your Google Apps Script project. The script can then be used directly.

In this case, please also install the library PDFApp, which is used for converting PDF data to image data. If an error like ReferenceError: PDFApp is not defined occurs, check that PDFApp is installed.

4. Constructor

Before the sample scripts are shown, please confirm how to create a GeminiWithFiles object.

When you install GeminiWithFiles as a library in your Google Apps Script project, please use the following script. The scripts in this report use this form.

const g = GeminiWithFiles.geminiWithFiles(object);

or

When you directly copy and paste the script of Class GeminiWithFiles into your Google Apps Script project, please use the following script.

const g = new GeminiWithFiles(object);

The value of object is as follows.

{Object} object API key or access token for using Gemini API.
{String} object.apiKey API key.
{String} object.accessToken Access token.
{String} object.model Model. Default is "models/gemini-1.5-pro-latest".
{String} object.version Version of API. Default is "v1beta".
{Boolean} object.doCountToken Default is false. If this is true, the token count of each request to Gemini API is shown in the log.
{Array} object.history History for continuing chat.
{Array} object.functions If you want to give the custom functions, please use this.
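
As a minimal sketch of how the object above could be assembled, the following helper fills in the documented defaults and fails early when neither credential is given. The name buildGeminiOptions is hypothetical and not part of GeminiWithFiles. Whether the library itself performs such a check is not covered here.

```javascript
// Hypothetical helper (not part of GeminiWithFiles): builds the constructor
// argument and fails early when no credential is supplied.
function buildGeminiOptions({ apiKey, accessToken, ...rest } = {}) {
  if (!apiKey && !accessToken) {
    throw new Error("Set either apiKey or accessToken.");
  }
  // Defaults copied from the property list above; values in rest override them.
  const options = {
    model: "models/gemini-1.5-pro-latest",
    version: "v1beta",
    doCountToken: false,
    ...rest,
  };
  if (apiKey) options.apiKey = apiKey;
  if (accessToken) options.accessToken = accessToken;
  return options;
}

console.log(buildGeminiOptions({ apiKey: "###", doCountToken: true }));
```

The returned object could then be passed as it is, for example GeminiWithFiles.geminiWithFiles(buildGeminiOptions({ apiKey: "###" })).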

Samples

1. Generate content

This script generates content from a text.

function myFunction() {
  const apiKey = "###"; // Please set your API key.

  const g = GeminiWithFiles.geminiWithFiles({ apiKey }); // This is for installing GeminiWithFiles as a library.
  // const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.

  const res = g.generateContent({ q: "What is Google Apps Script?" });
  console.log(res);
}
  • If you use this by installing it as a library using the library key, please use const g = GeminiWithFiles.geminiWithFiles({ apiKey });.
  • If you use this by directly copying and pasting, please use const g = new GeminiWithFiles({ apiKey });.

2. Chat 1

This script generates content with a chat.

function myFunction() {
  const apiKey = "###"; // Please set your API key.

  const g = GeminiWithFiles.geminiWithFiles({ apiKey }); // This is for installing GeminiWithFiles as a library.
  // const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.

  // Question 1
  const res1 = g.generateContent({ q: "What is Google Apps Script?" });
  console.log(res1);

  // Question 2
  const res2 = g.generateContent({ q: "What is my 1st question?" });
  console.log(res2);
}

When this script is run, res1 and res2 are as follows.

res1

Google Apps Script is a rapid application development platform that makes it fast and easy to create business applications that integrate with Google Workspace.

res2

Your first question was "What is Google Apps Script?"
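
The second answer is possible because the earlier turns are sent together with the new question. How the library stores them internally is not shown in this report. As an illustration only, the Gemini REST API represents a conversation as an array of turns, each with a role ("user" or "model") and a list of parts. The helper appendTurn below is hypothetical and only demonstrates that shape.

```javascript
// Illustrative only: appendTurn is a hypothetical helper, not a library method.
// It returns a new history array with one more turn in the shape used by the
// "contents" field of the Gemini REST API.
function appendTurn(history, role, text) {
  if (role !== "user" && role !== "model") {
    throw new Error('role must be "user" or "model"');
  }
  return [...history, { role, parts: [{ text }] }];
}

let history = [];
history = appendTurn(history, "user", "What is Google Apps Script?");
history = appendTurn(history, "model", "(answer to the 1st question)");
history = appendTurn(history, "user", "What is my 1st question?");
console.log(JSON.stringify(history, null, 2));
```

An array like this is the kind of value that object.history is meant to carry when a chat is continued later, although the exact format the library expects should be checked in the repository.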

Chat 2

function myFunction() {
  const apiKey = "###"; // Please set your API key.

  const g = GeminiWithFiles.geminiWithFiles({ apiKey, doCountToken: true }); // This is for installing GeminiWithFiles as a library.
  // const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.

  // Question 1
  const q =
    "Return the current population of Kyoto, Osaka, Aichi, Fukuoka, Tokyo in Japan as JSON data with the format that the key and values are the prefecture name and the population, respectively.";
  const res1 = g.generateContent({ q });
  console.log(res1);

  // Question 2
  const res2 = g.generateContent({
    q: "Also, return the current area of them as JSON data with the format that the key and values are the prefecture name and the area (km^2), respectively.",
  });
  console.log(res2);
}

When this script is run, the following values can be seen in the log. With doCountToken: true, the total tokens of each request are also logged.

{
  "totalTokens": 40
}

res1

{
  Kyoto: 1464956,
  Fukuoka: 5135214,
  Osaka: 8838716,
  Tokyo: 14047594,
  Aichi: 7552873
}
{
  "totalTokens": 77
}

res2

{
  Kyoto: 4612.71,
  Tokyo: 2194.07,
  Aichi: 5172.4,
  Osaka: 1904.99,
  Fukuoka: 4986.51
}

Upload files to Gemini

In this case, async/await is used in the function.

async function myFunction() {
  const apiKey = "###"; // Please set your API key.
  const folderId = "###"; // Please set your folder ID including images.

  let fileIds = [];
  const files = DriveApp.getFolderById(folderId).getFiles();
  while (files.hasNext()) {
    const file = files.next();
    fileIds.push(file.getId());
  }
  const g = GeminiWithFiles.geminiWithFiles({ apiKey, doCountToken: true }); // This is for installing GeminiWithFiles as a library.
  // const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
  const res = await g.setFileIds(fileIds, false).uploadFiles();
  console.log(res);
}
  • When this script is run, the files are uploaded to Gemini. The uploaded files can then be used to generate content with Gemini API.
  • In my test, uploading up to 100 files with this script always succeeded. With more than 100 files, however, an Exceeded maximum execution time error sometimes occurred. Please be careful about this.
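
One way to stay under the 100-file observation above is to split the file IDs into batches before uploading. The following is a minimal sketch. The helper chunk is hypothetical, and the dummy IDs only stand in for real Drive file IDs.

```javascript
// Hypothetical helper: splits an array into consecutive batches of at most
// `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 250 dummy IDs stand in for real Drive file IDs.
const sampleFileIds = Array.from({ length: 250 }, (_, i) => `fileId${i + 1}`);
const batches = chunk(sampleFileIds, 100);
console.log(batches.map((batch) => batch.length)); // [ 100, 100, 50 ]
```

Each batch could then be passed to setFileIds(batch, false).uploadFiles() as in the script above, preferably in separate executions so that a single run stays within the execution time limit.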

Upload image files and create descriptions of images

In this sample, multiple image files are uploaded and descriptions are created from the uploaded image files. This sample is an expanded version of my previous report "Automatically Creating Descriptions of Files on Google Drive using Gemini Pro API with Google Apps Script".

async function myFunction() {
  const apiKey = "###"; // Please set your API key.
  const folderId = "###"; // Please set your folder ID including images.

  const q = [
    `Create each description from each image file within 100 words in the order of given fileData.`,
    `Return the results as an array`,
    `Return only raw Array without a markdown. No markdown format.`,
    `The required properties of each element in the array are as follows`,
    ``,
    `[Properties of each element in the array]`,
    `"name": "Name of file"`,
    `"description": "Created description"`,
    ``,
    `If the requirement information is not found, set "no value".`,
    `Return only raw Array without a markdown. No markdown format. No markdown tags.`,
  ].join("\n");

  const fileIds = [];
  const files = DriveApp.searchFiles(
    `(mimeType = 'image/png' or mimeType = 'image/jpeg') and trashed = false and '${folderId}' in parents`
  );
  while (files.hasNext()) {
    fileIds.push(files.next().getId());
  }
  if (fileIds.length == 0) return;
  const g = GeminiWithFiles.geminiWithFiles({ apiKey, doCountToken: true }); // This is for installing GeminiWithFiles as a library.
  // const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
  const fileList = await g.setFileIds(fileIds).uploadFiles();
  const res = g
    .withUploadedFilesByGenerateContent(fileList)
    .generateContent({ q });
  // g.deleteFiles(fileList.map(({ name }) => name)); // If you want to delete the uploaded files, please use this.
  console.log(res);
}

When this script is run, the following result is obtained. In this case, the value of name is the file ID.

[
  {
    "name": "###",
    "description": "###"
  },
  ,
  ,
  ,
]

When 20 sample images generated by Gemini were used, all 20 images were uploaded and their descriptions were obtained by one API call.

As an important point, in my test, when the number of image files was large, the script had to be split into two runs: one for the file upload and one for the content generation. With 50 image files, the descriptions were created correctly, but with more than 50 images an error sometimes occurred. So, please adjust the number of files to your situation.
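
The prompt in this sample asks for a raw array without markdown, but a model may not always follow such an instruction. The following sketch shows a defensive way to turn a reply into an array. The helper toArray is hypothetical. It assumes the reply arrives either as an already parsed array or as a string that may be wrapped in a markdown code fence. The type that generateContent actually returns should be checked in the repository.

```javascript
// Three backticks, built at runtime so this sample contains no literal fence.
const FENCE = "`".repeat(3);

// Hypothetical helper: turns a model reply into an array, tolerating a
// surrounding markdown code fence. Already parsed arrays pass through.
function toArray(reply) {
  if (Array.isArray(reply)) return reply;
  if (typeof reply !== "string") {
    throw new TypeError("Expected a string or an array.");
  }
  // Remove an optional opening fence (with a language tag) and closing fence.
  const cleaned = reply
    .trim()
    .replace(new RegExp("^" + FENCE + "[a-zA-Z]*\\s*"), "")
    .replace(new RegExp("\\s*" + FENCE + "$"), "");
  const parsed = JSON.parse(cleaned);
  if (!Array.isArray(parsed)) {
    throw new TypeError("The reply is not a JSON array.");
  }
  return parsed;
}

const fenced = FENCE + 'json\n[{"name":"###","description":"###"}]\n' + FENCE;
console.log(toArray(fenced));
```

JSON.parse still throws when the text is not valid JSON, which is a reasonable signal to adjust the prompt or retry.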

Upload invoices of PDF data and parse them

In this sample, multiple invoices in PDF format are uploaded and parsed into objects. This sample is an expanded version of my previous report "Parsing Invoices using Gemini 1.5 API with Google Apps Script".

async function myFunction_parseInvoices() {
  const apiKey = "###"; // Please set your API key.

  // Please set file IDs of PDF files of invoices.
  const fileIds = ["###fileID1###", "###fileID2###"];

  const q = [
    `Create an array including JSON object parsed the following images of the invoices.`,
    `The giving images are the invoices.`,
    `Return an array including JSON object.`,
    `No descriptions and explanations. Return only raw array including JSON objects without markdown. No markdown format.`,
    `The required properties in each JSON object in an array are as follows.`,
    ``,
    `[Properties in JSON object]`,
    `"name": "Name given as 'Filename'"`,
    `"invoiceTitle": "title of invoice"`,
    `"invoiceDate": "date of invoice"`,
    `"invoiceNumber": "number of the invoice"`,
    `"invoiceDestinationName": "Name of destination of invoice"`,
    `"invoiceDestinationAddress": "address of the destination of invoice"`,
    `"totalCost": "total cost of all costs"`,
    `"table": "Table of invoice. This is a 2-dimensional array. Add the first header row to the table in the 2-dimensional array."`,
    ``,
    `[Format of 2-dimensional array of "table"]`,
    `"title or description of item", "number of items", "unit cost", "total cost"`,
    ``,
    `If the requirement information is not found, set "no value".`,
    `Return only raw array including JSON objects without markdown. No markdown format. No markdown tags.`,
  ].join("\n");

  const g = GeminiWithFiles.geminiWithFiles({ apiKey, doCountToken: true }); // This is for installing GeminiWithFiles as a library.
  // const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
  const fileList = await g.setFileIds(fileIds, true).uploadFiles();
  const res = g
    .withUploadedFilesByGenerateContent(fileList)
    .generateContent({ q });
  // g.deleteFiles(fileList.map(({ name }) => name)); // If you want to delete the uploaded files, please use this.
  console.log(res);
}

Two sample invoices in PDF format were used for this test. Both are from the invoice design templates of Microsoft.

The following result was obtained by one API call. It shows that the uploaded invoices, converted from PDF data to image data, can be parsed correctly.

[
  {
    "name": "###fileID1###",
    "invoiceDate": "4/1/2024",
    "totalCost": "$192.50",
    "invoiceNumber": "100",
    "invoiceDestinationAddress": "The Palm Tree Nursery\\n987 6th Ave\\nSanta Fe, NM 11121",
    "invoiceTitle": "Invoice",
    "invoiceDestinationName": "Maria Sullivan",
    "table": [
      [
        "Salesperson",
        "Job",
        "Sales",
        "Description",
        "Unit Price",
        "Line Total"
      ],
      ["Sonu Jain", "", "20.00", "Areca palm", "$2.50", "$50.00"],
      ["", "", "35.00", "Majesty palm", "$3.00", "$105.00"],
      ["", "", "15.00", "Bismarck palm", "$2.50", "$37.50"]
    ]
  },
  {
    "name": "###fileID2###",
    "invoiceDate": "4/5, 2024",
    "invoiceTitle": "INVOICE",
    "invoiceDestinationAddress": "Downtown Pets\\n132 South Street\\nManhattan, NY 15161",
    "totalCost": "$4350",
    "table": [
      ["DESCRIPTION", "HOURS", "RATE", "AMOUNT"],
      ["Pour cement foundation", "4.00", "$150.00", "$600"],
      ["Framing and drywall", "16.00", "$180.00", "$2880"],
      ["Tiling and flooring install", "9.00", "$150.00", "$1350"]
    ],
    "invoiceDestinationName": "Nazar Neill",
    "invoiceNumber": "4/5"
  }
]
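
Because parsed values come from a generative model, a simple arithmetic check can help decide which invoices need a manual look. The following sketch adds up the last column of each table and compares the sum with totalCost. The helpers toCents and checkInvoiceTotal are hypothetical, and the check assumes that the last column of every row holds the line total, which holds for both samples above.

```javascript
// Hypothetical helper: converts strings such as "$192.50" or "$4350" to
// integer cents so that sums are compared without rounding noise.
function toCents(value) {
  const n = Number(String(value).replace(/[^0-9.-]/g, ""));
  if (Number.isNaN(n)) throw new Error(`Not a number: ${value}`);
  return Math.round(n * 100);
}

// Hypothetical helper: sums the last column of every row after the header row
// and compares the sum with totalCost.
function checkInvoiceTotal({ name, table, totalCost }) {
  const lineCents = table
    .slice(1)
    .reduce((sum, row) => sum + toCents(row[row.length - 1]), 0);
  const totalCents = toCents(totalCost);
  return {
    name,
    lineSum: lineCents / 100,
    totalCost: totalCents / 100,
    ok: lineCents === totalCents,
  };
}

// Values trimmed from the sample result above.
const invoices = [
  {
    name: "###fileID1###",
    totalCost: "$192.50",
    table: [
      ["Salesperson", "Job", "Sales", "Description", "Unit Price", "Line Total"],
      ["Sonu Jain", "", "20.00", "Areca palm", "$2.50", "$50.00"],
      ["", "", "35.00", "Majesty palm", "$3.00", "$105.00"],
      ["", "", "15.00", "Bismarck palm", "$2.50", "$37.50"],
    ],
  },
  {
    name: "###fileID2###",
    totalCost: "$4350",
    table: [
      ["DESCRIPTION", "HOURS", "RATE", "AMOUNT"],
      ["Pour cement foundation", "4.00", "$150.00", "$600"],
      ["Framing and drywall", "16.00", "$180.00", "$2880"],
      ["Tiling and flooring install", "9.00", "$150.00", "$1350"],
    ],
  },
];

const checks = invoices.map((invoice) => checkInvoiceTotal(invoice));
console.log(checks);
```

With these values, the first invoice passes (the line totals add up to 192.5), while the second is flagged because its line amounts add up to 4830 and totalCost reads 4350. From the output alone, it cannot be determined whether that gap reflects something on the original invoice, such as a discount, or a reading error. A flagged invoice is therefore a cue for manual review rather than proof of a mistake.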

Upload papers of PDF data and summarize them

In this sample, multiple papers of PDF data are uploaded, and the summarized texts for each paper are output.

async function myFunction_parsePapers() {
  const apiKey = "###"; // Please set your API key.

  // Please set file IDs of the papers of PDF files.
  const fileIds = ["###fileID1###", "###fileID2###"];

  const q = [
    `Summarize the following manuscripts within 500 words.`,
    `Return the results as an array`,
    `Return only raw Array without a markdown. No markdown format.`,
    `The required properties of each element in the array are as follows`,
    ``,
    `[Properties of each element in the array]`,
    `"name": "Name given as 'Filename'"`,
    `"title": "Title of manuscript"`,
    `"summary": "Created description"`,
    ``,
    `If the requirement information is not found, set "no value".`,
    `Return only raw Array without a markdown. No markdown format. No markdown tags.`,
  ].join("\n");

  const g = GeminiWithFiles.geminiWithFiles({ apiKey, doCountToken: true }); // This is for installing GeminiWithFiles as a library.
  // const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
  const fileList = await g.setFileIds(fileIds, true).uploadFiles();
  const res = g
    .withUploadedFilesByGenerateContent(fileList)
    .generateContent({ q });
  // g.deleteFiles(fileList.map(({ name }) => name)); // If you want to delete the uploaded files, please use this.
  console.log(res);
}

Two sample papers in PDF format were used for this test, and the following result was obtained by one API call. It shows that the uploaded papers, converted from PDF data to image data, can be processed.

[
  {
    "name": "###fileID1###",
    "title": "The Particle Problem in the General Theory of Relativity",
    "summary": "This paper investigates the possibility of a singularity-free solution to the field equations in general relativity. The authors propose a new theoretical approach that eliminates singularities by introducing a new variable into the equations. They explore the implications of this approach for the understanding of particles, suggesting that particles can be represented as \"bridges\" connecting different sheets of spacetime."
  },
  {
    "name": "###fileID2###",
    "title": "Attention Is All You Need",
    "summary": "This paper proposes a novel neural network architecture called the Transformer, which relies entirely on an attention mechanism to draw global dependencies between input and output sequences. The Transformer model achieves state-of-the-art results on machine translation tasks and offers significant advantages in terms of parallelization and computational efficiency compared to recurrent neural networks."
  }
]

IMPORTANT

  • If an error occurs, please try again after several minutes.
  • In generative AI, the output is highly dependent on the input prompt (the question you provide). Therefore, if the generated text doesn't meet your expectations, try reformulating your prompt and try again.
  • As of April 26, 2024, the following mimeTypes can be used with generateContent. Ref I believe that this list will be expanded in a future update. For example, I believe that PDF data will be directly usable with generateContent in the future.
    • Images: image/png,image/jpeg,image/webp,image/heic,image/heif
    • Audio: audio/wav,audio/mp3,audio/aiff,audio/aac,audio/ogg,audio/flac
  • In my test, uploading up to 100 files with this script always succeeded. With more than 100 files, however, an Exceeded maximum execution time error sometimes occurred. Please be careful about this.
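
The MIME type list above can be used to sort files before a request. The following is a minimal sketch. The helper partitionByMimeType and the sample records are hypothetical, and the set only reflects the list as of April 26, 2024.

```javascript
// MIME types copied from the list above (as of April 26, 2024).
const SUPPORTED_MIME_TYPES = new Set([
  "image/png",
  "image/jpeg",
  "image/webp",
  "image/heic",
  "image/heif",
  "audio/wav",
  "audio/mp3",
  "audio/aiff",
  "audio/aac",
  "audio/ogg",
  "audio/flac",
]);

// Hypothetical helper: splits file records into those whose MIME type is in
// the list and those that are not.
function partitionByMimeType(files) {
  const supported = [];
  const unsupported = [];
  for (const file of files) {
    const target = SUPPORTED_MIME_TYPES.has(file.mimeType) ? supported : unsupported;
    target.push(file);
  }
  return { supported, unsupported };
}

// Dummy records standing in for Drive files.
const sampleFiles = [
  { id: "fileId1", mimeType: "image/png" },
  { id: "fileId2", mimeType: "application/pdf" },
  { id: "fileId3", mimeType: "audio/mp3" },
];
console.log(partitionByMimeType(sampleFiles));
```

In Google Apps Script, such records could be built from file.getId() and file.getMimeType() while iterating a folder. Files in the unsupported group, such as PDFs, would need the conversion to image data described earlier in this report.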