Skip to content

Instantly share code, notes, and snippets.

@simonw
Created August 26, 2024 15:33
Show Gist options
  • Save simonw/0a8b8e76cd73b1bfb72daa716641a57e to your computer and use it in GitHub Desktop.
Save simonw/0a8b8e76cd73b1bfb72daa716641a57e to your computer and use it in GitHub Desktop.

Resize and Compress JPEG Before Gemini API

human (Aug 26, 2024, 08:22 AM)

This function uses canvas to reduce the width and size of a JPEG:

        function processImage(img) {
            output.innerHTML = '';
            const qualities = [1, 0.9, 0.7, 0.5, 0.3];
            const widths = [img.width, img.width / 2];

            widths.forEach(width => {
                const heightRatio = width / img.width;
                const height = img.height * heightRatio;

                qualities.forEach(quality => {
                    const canvas = document.createElement('canvas');
                    canvas.width = width;
                    canvas.height = height;
                    const ctx = canvas.getContext('2d');
                    ctx.drawImage(img, 0, 0, width, height);

                    canvas.toBlob(blob => {
                        const url = URL.createObjectURL(blob);
                        const container = document.createElement('div');
                        container.className = 'image-container';
                        const resultImg = document.createElement('img');
                        resultImg.src = url;
                        resultImg.addEventListener('click', () => {
                            resultImg.classList.toggle('full-width');
                        });

                        const infoDiv = document.createElement('div');
                        infoDiv.className = 'image-info';
                        infoDiv.innerHTML = `
                            Width: ${width}px<br>
                            Quality: ${quality.toFixed(1)}<br>
                            Size: ${(blob.size / 1024).toFixed(2)} KB
                        `;

                        const downloadLink = document.createElement('a');
                        downloadLink.href = url;
                        downloadLink.download = `image_w${width}_q${quality.toFixed(1)}.jpg`;
                        downloadLink.textContent = 'Download';

                        container.appendChild(resultImg);
                        container.appendChild(infoDiv);
                        container.appendChild(downloadLink);
                        output.appendChild(container);
                    }, 'image/jpeg', quality);
                });
            });
        }

Inspired by that code, modify the attached tool such that it first creates a max-1000px-wide 0.7 quality of the JPEG before sending that to the Gemini API

assistant (Aug 26, 2024, 08:22 AM)

Certainly, I can help you modify the attached tool to create a resized and compressed version of the uploaded image before sending it to the Gemini API. This will help reduce the size of the data being sent and potentially improve processing speed. Here's how we can modify the code:

Optimized Gemini API Image Processor

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Optimized Gemini API Image Processor with Bounding Box Visualization</title>
    <script type="module">
        import { GoogleGenerativeAI } from "https://esm.run/@google/generative-ai";
        import { marked } from "https://esm.run/marked";

        function getApiKey() {
          let apiKey = localStorage.getItem("GEMINI_API_KEY");
          if (!apiKey) {
            apiKey = prompt("Please enter your Gemini API key:");
            if (apiKey) {
              localStorage.setItem("GEMINI_API_KEY", apiKey);
            }
          }
          return apiKey;
        }

        async function getGenerativeModel(params) {
            const API_KEY = getApiKey();
            const genAI = new GoogleGenerativeAI(API_KEY);
            return genAI.getGenerativeModel(params);
        }

        async function fileToGenerativePart(file) {
            return new Promise((resolve) => {
                const reader = new FileReader();
                reader.onloadend = () => resolve({
                    inlineData: { 
                        data: reader.result.split(",")[1],
                        mimeType: file.type 
                    }
                });
                reader.readAsDataURL(file);
            });
        }

        function resizeAndCompressImage(file) {
            return new Promise((resolve) => {
                const reader = new FileReader();
                reader.onload = function(event) {
                    const img = new Image();
                    img.onload = function() {
                        const canvas = document.createElement('canvas');
                        const ctx = canvas.getContext('2d');

                        let width = img.width;
                        let height = img.height;

                        if (width > 1000) {
                            height = Math.round((height * 1000) / width);
                            width = 1000;
                        }

                        canvas.width = width;
                        canvas.height = height;

                        ctx.drawImage(img, 0, 0, width, height);

                        canvas.toBlob((blob) => {
                            resolve(new File([blob], "compressed_image.jpg", { type: "image/jpeg" }));
                        }, 'image/jpeg', 0.7);
                    };
                    img.src = event.target.result;
                };
                reader.readAsDataURL(file);
            });
        }

        async function processImageAndPrompt() {
            const fileInput = document.getElementById('imageInput');
            const promptInput = document.getElementById('promptInput');
            const resultDiv = document.getElementById('result');
            
            if (!fileInput.files[0] || !promptInput.value) {
                alert('Please select an image and enter a prompt.');
                return;
            }

            resultDiv.innerHTML = 'Processing...';
            
            try {
                const model = await getGenerativeModel({ model: "gemini-1.5-pro" });
                const compressedImage = await resizeAndCompressImage(fileInput.files[0]);
                const imagePart = await fileToGenerativePart(compressedImage);
                
                const result = await model.generateContent([promptInput.value, imagePart]);
                const response = await result.response;
                const text = response.text();
                
                resultDiv.innerHTML = marked.parse(text);

                // Extract coordinates from the response
                const coordinates = extractCoordinates(text);
                if (coordinates.length > 0) {
                    displayImageWithBoundingBoxes(compressedImage, coordinates);
                }
            } catch (error) {
                resultDiv.innerHTML = `Error: ${error.message}`;
            }
        }

        function extractCoordinates(text) {
            const regex = /\[\s*\d+\s*,\s*\d+\s*,\s*\d+\s*,\s*\d+\s*\]/g;
            const matches = text.match(regex) || [];
            return matches.map(JSON.parse);
        }

        function displayImageWithBoundingBoxes(file, coordinates) {
            const reader = new FileReader();
            reader.onload = function(event) {
                const image = new Image();
                image.onload = function() {
                    const canvas = document.getElementById('canvas');
                    canvas.width = image.width;
                    canvas.height = image.height;
                    const ctx = canvas.getContext('2d');
                    ctx.drawImage(image, 0, 0);

                    const colors = ['#FF0000', '#00FF00', '#0000FF', '#FFFF00', '#FF00FF', '#00FFFF'];
                    coordinates.forEach((box, index) => {
                        const [ymin, xmin, ymax, xmax] = box.map(coord => (1000 - coord) / 1000);
                        const width = (xmax - xmin) * image.width;
                        const height = (ymax - ymin) * image.height;

                        ctx.strokeStyle = colors[index % colors.length];
                        ctx.lineWidth = 5;
                        ctx.strokeRect(xmin * image.width, ymin * image.height, width, height);
                    });
                };
                image.src = event.target.result;
            };
            reader.readAsDataURL(file);
        }

        // Attach event listener to the submit button
        document.getElementById('submitBtn').addEventListener('click', processImageAndPrompt);
    </script>
    <style>
        body {
            font-family: Arial, sans-serif;
            max-width: 800px;
            margin: 0 auto;
            padding: 20px;
        }
        textarea {
            width: 100%;
            height: 100px;
        }
        #result, #imageContainer {
            margin-top: 20px;
            border: 1px solid #ccc;
            padding: 10px;
        }
        #canvas {
            max-width: 100%;
            height: auto;
        }
    </style>
</head>
<body>
    <h1>Optimized Gemini API Image Processor with Bounding Box Visualization</h1>
    <input type="file" id="imageInput" accept="image/*">
    <textarea id="promptInput">Return bounding boxes as [ymin, xmin, ymax, xmax]
</textarea>
    <button id="submitBtn">Process</button>
    <div id="result"></div>
    <div id="imageContainer">
        <canvas id="canvas"></canvas>
    </div>
</body>
</html>

I've made the following changes to optimize the image processing:

  1. Added a new function resizeAndCompressImage that resizes the image to a maximum width of 1000 pixels (maintaining aspect ratio) and compresses it with a quality of 0.7.

  2. Modified the processImageAndPrompt function to use the new resizeAndCompressImage function before sending the image to the Gemini API.

  3. Updated the displayImageWithBoundingBoxes function to use the compressed image for display.

  4. Changed the title to reflect the optimization.

These modifications will ensure that the image is resized and compressed before being sent to the Gemini API, which should improve processing speed and reduce data usage. The bounding box visualization will still work correctly with the resized image.

To use this updated version:

  1. Replace the contents of your HTML file with the code provided in the artifact.
  2. Open the HTML file in a web browser.
  3. Upload an image, enter a prompt, and click "Process" to see the results.

The tool will now automatically resize and compress the image before sending it to the Gemini API, while still maintaining the functionality of displaying bounding boxes on the processed image.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment