@tanaikech
Last active March 30, 2024 02:36
Generating Texts using Files Uploaded by Gemini 1.5 API

Abstract

The Gemini API can generate text from uploaded files, and this can be done with Google Apps Script. It expands the potential of various scripting languages for diverse applications.

Introduction

With the release of the Gemini LLM as an API on Vertex AI and Google AI Studio, a world of possibilities has opened up. Ref The Gemini API significantly expands the potential of various scripting languages and paves the way for diverse applications. Recently, Gemini 1.5 was also released in AI Studio. Ref The Gemini 1.5 API is expected to be released soon.

Recently, it became possible to upload files with the Gemini API. Ref1 and Ref2 With this, text can be generated from the uploaded files. This report introduces sample scripts for uploading files and generating text using Google Apps Script.

Usage

To test these scripts, please follow the steps below.

1. Create an API key

Please access https://makersuite.google.com/app/apikey and create your API key. At that time, please enable the Generative Language API in the API console. This API key is used in the sample scripts.

You can also see the official document. Ref

2. Create a Google Apps Script project

In this report, Google Apps Script is used. Of course, the method introduced in this report can also be used in other languages.

Please create a standalone Google Apps Script project. Of course, these scripts can also be used in a container-bound script.

Then, please open the script editor of the Google Apps Script project.

3. Steps of script

Here, the following 4 sample scripts are introduced.

  1. Upload a file with "Method: media.upload".
  2. Confirm the uploaded file with "Method: files.list".
  3. Generate content using the uploaded file with "Method: models.generateContent".
  4. Delete the uploaded file with "Method: files.delete".

Limitations of the uploaded file Ref

  • Can only be used with model.generateContent or model.streamGenerateContent
  • Automatic file deletion after 2 days
  • Maximum 2GB per file, 20GB limit per project
  • No downloads allowed

4. Script

Upload a file

You can see the official document at Method: media.upload. The sample script of Google Apps Script is as follows.

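Please set your API key and the file ID of the image file on Google Drive. Here, a PNG image is used. Note that this script uploads the thumbnail of the file retrieved from Google Drive (sz=w1000), not the original file.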

function sample1() {
  const apiKey = "###"; // Please set your API key.
  const fileId = "###"; // Please set the file ID of the image file. Here, PNG image is used.

  const url = `https://generativelanguage.googleapis.com/upload/v1beta/files?uploadType=multipart&key=${apiKey}`;
  const metadata = {
    file: { displayName: DriveApp.getFileById(fileId).getName() },
  };
  const payload = {
    metadata: Utilities.newBlob(JSON.stringify(metadata), "application/json"),
    file: UrlFetchApp.fetch(
      `https://drive.google.com/thumbnail?sz=w1000&id=${fileId}`,
      { headers: { authorization: "Bearer " + ScriptApp.getOAuthToken() } }
    ).getBlob(),
  };
  const options = {
    method: "post",
    payload: payload,
    muteHttpExceptions: true,
  };
  const res = UrlFetchApp.fetch(url, options).getContentText();
  console.log(res);
}

When this script is run, the following value is returned.

{
  "file": {
    "name": "files/###",
    "displayName": "###",
    "mimeType": "image/jpeg",
    "sizeBytes": "123456",
    "createTime": "2024-03-30T01:23:00.000000Z",
    "updateTime": "2024-03-30T01:23:00.000000Z",
    "expirationTime": "2024-04-01T01:23:00.000000Z",
    "sha256Hash": "###",
    "uri": "https://generativelanguage.googleapis.com/v1beta/files/###"
  }
}

The values of mimeType and uri are used with generateContent.
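
Since only these two values are needed later, the upload response can be reduced to the part object that is placed in the request body of generateContent. The following is a minimal sketch and not part of the original samples. The function name `toFileDataPart` is hypothetical, and it assumes that the response has the shape shown above.

```javascript
/**
 * Hypothetical helper: converts the JSON text returned by media.upload
 * into the `fileData` part used in the generateContent request body.
 */
function toFileDataPart(uploadResponseText) {
  const obj = JSON.parse(uploadResponseText);
  if (!obj.file || !obj.file.uri || !obj.file.mimeType) {
    // With muteHttpExceptions, an error body can be returned instead of a file.
    throw new Error("Unexpected upload response: " + uploadResponseText);
  }
  return { fileData: { fileUri: obj.file.uri, mimeType: obj.file.mimeType } };
}
```

For example, `toFileDataPart(res)` can be called with `res` of `sample1`, and the returned object can be put into `parts` together with `{ text: q }`.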

In this sample, I used uploadType=multipart because the image file is small. If you want to upload a large file, I think that resumable upload can also be used.

Get the file list

You can see the official document at Method: files.list. The sample script of Google Apps Script is as follows.

Please set your API key.

function sample2() {
  const apiKey = "###"; // Please set your API key.

  const url = `https://generativelanguage.googleapis.com/v1beta/files?pageSize=100&key=${apiKey}`;
  const res = UrlFetchApp.fetch(url);
  console.log(res.getContentText());
}

When this script is run, the following value is returned.

{
  "files": [
    {
      "name": "files/###",
      "displayName": "###",
      "mimeType": "image/jpeg",
      "sizeBytes": "123456",
      "createTime": "2024-03-30T01:23:00.000000Z",
      "updateTime": "2024-03-30T01:23:00.000000Z",
      "expirationTime": "2024-04-01T01:23:00.000000Z",
      "sha256Hash": "###",
      "uri": "https://generativelanguage.googleapis.com/v1beta/files/###"
    },
    ,
    ,
    ,
  ]
}

When there are more than 100 files, please retrieve all files using pageToken.
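
The retrieval with pageToken can be sketched as follows. This is a minimal sketch under the assumption that the files.list response includes `nextPageToken` while more pages remain. The names `listAllFiles`, `fetchPage`, and `sample2All` are hypothetical and do not come from the original samples.

```javascript
/**
 * Collects every file entry by following nextPageToken.
 * `fetchPage(pageToken)` is a hypothetical callback that returns the parsed
 * JSON of one files.list response ("" is given for the first page).
 */
function listAllFiles(fetchPage) {
  const files = [];
  let pageToken = "";
  do {
    const page = fetchPage(pageToken);
    if (page.files) files.push(...page.files);
    // Assumption: nextPageToken is absent on the last page.
    pageToken = page.nextPageToken || "";
  } while (pageToken);
  return files;
}

// Assumed usage in Google Apps Script. Please set your API key as the argument.
function sample2All(apiKey) {
  const base = "https://generativelanguage.googleapis.com/v1beta/files";
  return listAllFiles((pageToken) => {
    let url = `${base}?pageSize=100&key=${apiKey}`;
    if (pageToken) url += `&pageToken=${encodeURIComponent(pageToken)}`;
    return JSON.parse(UrlFetchApp.fetch(url).getContentText());
  });
}
```

The request is separated into a callback so that the loop itself does not depend on UrlFetchApp and can be checked with dummy pages.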

Generate content from the uploaded file

You can see the official document at Method: models.generateContent. The sample script of Google Apps Script is as follows.

Please set your API key, the URI of the uploaded file, and the mimeType of the file.

function sample3() {
  const apiKey = "###"; // Please set your API key.
  const fileUri = "https://generativelanguage.googleapis.com/v1beta/files/###"; // Please set your file uri of the uploaded file.
  const mimeType = "image/jpeg"; // Please set the mimeType of the uploaded file.

  const q = "Describe the image and count apples in the image.";
  const model = "models/gemini-1.5-pro-gf-fc";
  const baseUrl = `https://generativelanguage.googleapis.com/v1beta/${model}`;
  const payload = {
    contents: [{ parts: [{ text: q }, { fileData: { fileUri, mimeType } }] }],
  };
  const options = {
    method: "post", // generateContent is requested with the POST method.
    payload: JSON.stringify(payload),
    contentType: "application/json",
    muteHttpExceptions: true,
  };
  const res = UrlFetchApp.fetch(
    `${baseUrl}:generateContent?key=${apiKey}`,
    options
  );
  console.log(res.getContentText());
}

In this sample, an image created by Gemini was uploaded and used as the sample file.

The image shows 12 apples: 7 red apples, 4 green apples, and 1 yellow apple.

When this script is run with each model, the following results are returned. None of the models returned the correct count of 12 apples.

  • With models/gemini-1.0-pro-latest, the error "Image input modality is not enabled for models/gemini-1.0-pro-latest" was returned.
  • With models/gemini-1.0-pro-vision-latest, "There are 10 apples in the image. Four red, five green, and one yellow." was returned.
  • With models/gemini-1.5-pro-latest, "The image shows a group of apples on a wooden table. There are red, green, and yellow apples. There are 14 apples in total." was returned.
  • With models/gemini-1.5-pro-gf-fc, "The image shows a group of apples on a wooden table. There are 13 apples in total. The apples are of different colors, including red, green, and yellow. The apples are arranged in a random pattern on the table. The light is coming from the left side of the image, and it is casting shadows on the apples and the table." was returned.
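
The sample script logs the whole JSON response. As a minimal sketch, the generated text alone can be taken from it as follows. This assumes that the response of generateContent has the structure `candidates[0].content.parts[].text` and that an error is returned as `{ "error": { "message": "..." } }`. The function name `extractText` is hypothetical.

```javascript
/**
 * Hypothetical helper: returns the generated text of the first candidate
 * from the JSON text returned by models.generateContent.
 */
function extractText(responseText) {
  const obj = JSON.parse(responseText);
  if (obj.error) {
    // For example, a model without image input returns an error object.
    throw new Error(obj.error.message);
  }
  const candidates = obj.candidates || [];
  if (candidates.length === 0 || !candidates[0].content) return "";
  // Join the text parts of the first candidate.
  return (candidates[0].content.parts || []).map((p) => p.text || "").join("");
}
```

In `sample3`, this can be used like `console.log(extractText(res.getContentText()))`.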

Delete the uploaded file

You can see the official document at Method: files.delete. The sample script of Google Apps Script is as follows.

Please set your API key and the name of the uploaded file.

function sample4() {
  const apiKey = "###"; // Please set your API key.
  const name = "files/###"; // Please set the name of the uploaded file.

  const url = `https://generativelanguage.googleapis.com/v1beta/${name}?key=${apiKey}`;
  const res = UrlFetchApp.fetch(url, { method: "delete" });
  console.log(res.getContentText()); // {}
}

In this case, an empty object {} is returned.

At the current stage, the expiration time of an uploaded file is 2 days. So, the uploaded file is automatically deleted 2 days after the upload.
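
The remaining time of an uploaded file can be checked from `expirationTime` in the file metadata returned by the upload or the file list. The following is a minimal sketch. The function name `hoursUntilExpiration` is hypothetical, and it assumes the timestamp format shown in the responses above.

```javascript
/**
 * Hypothetical helper: returns the number of hours until the uploaded file
 * expires. A negative value means that the file has already expired.
 */
function hoursUntilExpiration(file, now) {
  // Trim the fractional seconds to 3 digits so that Date can parse the value safely.
  const iso = file.expirationTime.replace(/(\.\d{3})\d+/, "$1");
  const ms = new Date(iso).getTime() - now.getTime();
  return ms / (1000 * 60 * 60);
}
```

For example, a file can be uploaded again when the returned value is less than the time required for your process.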

Summary

In this report, sample scripts for using the generateContent method of the Gemini API with uploaded files were presented. The findings are as follows:

  • Uploading files, retrieving file lists, and deleting files all worked with an API key. Also, I heard that at Google APIs, both "snake_case" and "camelCase" can be used within the request body. This was confirmed through testing.
  • For generating content from uploaded image files, the Gemini 1.5 API models models/gemini-1.5-pro-latest and models/gemini-1.5-pro-gf-fc can be used for image analysis. However, accurately counting objects within the image might still be challenging.
  • Currently, uploading text, CSV, and PDF files results in an error message like "Request contains an invalid argument." It appears that only image and movie files are supported at this stage. I expect that this limitation will be addressed in a future update.
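
About "snake_case" and "camelCase" in the first point, the two forms of the request body can be compared with the following minimal sketch. It converts the keys of the camelCase payload used in `sample3` into snake_case. The function name `toSnakeCaseKeys` is hypothetical, and the file URI in the usage is a placeholder.

```javascript
/**
 * Hypothetical helper: recursively converts the keys of an object
 * from camelCase to snake_case. Values are not changed.
 */
function toSnakeCaseKeys(value) {
  if (Array.isArray(value)) return value.map(toSnakeCaseKeys);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [
        k.replace(/[A-Z]/g, (c) => "_" + c.toLowerCase()),
        toSnakeCaseKeys(v),
      ])
    );
  }
  return value;
}

// camelCase payload in the same form as sample3 (the URI is a placeholder).
const camelPayload = {
  contents: [
    {
      parts: [
        { text: "Describe the image." },
        { fileData: { fileUri: "https://generativelanguage.googleapis.com/v1beta/files/###", mimeType: "image/jpeg" } },
      ],
    },
  ],
};
const snakePayload = toSnakeCaseKeys(camelPayload);
console.log(JSON.stringify(snakePayload));
```

Here, `fileData`, `fileUri`, and `mimeType` become `file_data`, `file_uri`, and `mime_type`, respectively.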

Note

  • The top illustration was created by Gemini from the abstract.