@simonw
Last active July 9, 2024 16:29
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>PDF Comparison Tool</title>
<script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/3.4.120/pdf.min.js"></script>
<style>
html { font-family: verdana; }
#dropZone {
border: 2px dashed #ccc;
border-radius: 20px;
width: 480px;
padding: 20px;
text-align: center;
margin: 20px auto;
}
#dropZone.dragover {
background-color: #e1f5fe;
border-color: #03a9f4;
}
.page-container {
display: flex;
margin-bottom: 20px;
}
.page-container > div {
margin-right: 10px;
}
#error {
color: red;
text-align: center;
}
</style>
</head>
<body>
<div id="dropZone">
<p>Drag and drop two PDF files here, or click to select files</p>
<input type="file" id="fileInput" multiple accept=".pdf" style="display: none;">
</div>
<div id="error"></div>
<div id="pdfComparison"></div>
<script>
// Initialize PDF.js worker
pdfjsLib.GlobalWorkerOptions.workerSrc = 'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/3.4.120/pdf.worker.min.js';
const dropZone = document.getElementById('dropZone');
const fileInput = document.getElementById('fileInput');
const errorDiv = document.getElementById('error');
const pdfComparison = document.getElementById('pdfComparison');
let pdf1, pdf2;
dropZone.addEventListener('click', () => fileInput.click());
dropZone.addEventListener('dragover', (e) => {
e.preventDefault();
dropZone.classList.add('dragover');
});
dropZone.addEventListener('dragleave', () => {
dropZone.classList.remove('dragover');
});
dropZone.addEventListener('drop', handleFiles);
fileInput.addEventListener('change', handleFiles);
function handleFiles(e) {
e.preventDefault();
dropZone.classList.remove('dragover');
const files = e.dataTransfer ? e.dataTransfer.files : e.target.files;
if (files.length === 2) {
pdf1 = files[0];
pdf2 = files[1];
errorDiv.textContent = '';
renderPDFs();
} else {
errorDiv.textContent = 'Please select exactly two PDF files.';
}
}
async function renderPDFs() {
try {
const [doc1, doc2] = await Promise.all([
pdfjsLib.getDocument(URL.createObjectURL(pdf1)).promise,
pdfjsLib.getDocument(URL.createObjectURL(pdf2)).promise
]);
const pageCount = Math.max(doc1.numPages, doc2.numPages);
pdfComparison.innerHTML = '';
for (let i = 1; i <= pageCount; i++) {
const [page1, page2] = await Promise.all([
i <= doc1.numPages ? doc1.getPage(i) : null,
i <= doc2.numPages ? doc2.getPage(i) : null
]);
const [canvas1, canvas2] = await Promise.all([
page1 ? renderPageToCanvas(page1) : createEmptyCanvas(),
page2 ? renderPageToCanvas(page2) : createEmptyCanvas()
]);
const diffCanvas = compareCanvases(canvas1, canvas2);
displayPage(i, canvas1, canvas2, diffCanvas);
}
} catch (error) {
console.error('Error rendering PDFs:', error);
errorDiv.textContent = 'Error rendering PDFs. Please try again with valid PDF files.';
}
}
async function renderPageToCanvas(page) {
const scale = 1.5;
const viewport = page.getViewport({ scale });
const canvas = document.createElement('canvas');
const context = canvas.getContext('2d');
canvas.height = viewport.height;
canvas.width = viewport.width;
await page.render({ canvasContext: context, viewport }).promise;
return canvas;
}
function createEmptyCanvas() {
const canvas = document.createElement('canvas');
canvas.width = 1;
canvas.height = 1;
return canvas;
}
function compareCanvases(canvas1, canvas2) {
const width = Math.max(canvas1.width, canvas2.width);
const height = Math.max(canvas1.height, canvas2.height);
const diffCanvas = document.createElement('canvas');
diffCanvas.width = width;
diffCanvas.height = height;
const ctx = diffCanvas.getContext('2d');
const img1 = canvas1.getContext('2d').getImageData(0, 0, canvas1.width, canvas1.height);
const img2 = canvas2.getContext('2d').getImageData(0, 0, canvas2.width, canvas2.height);
const diff = ctx.createImageData(width, height);
// Look up one channel using the image's own row width, so canvases of
// different sizes stay aligned. Pixels outside the image, or fully
// transparent ones, count as white (255). A real value of 0 is kept as 0.
const channel = (img, x, y, c) => {
if (x >= img.width || y >= img.height) return 255;
const p = (y * img.width + x) * 4;
return img.data[p + 3] === 0 ? 255 : img.data[p + c];
};
for (let y = 0; y < height; y++) {
for (let x = 0; x < width; x++) {
const i = (y * width + x) * 4;
const r1 = channel(img1, x, y, 0);
const g1 = channel(img1, x, y, 1);
const b1 = channel(img1, x, y, 2);
const r2 = channel(img2, x, y, 0);
const g2 = channel(img2, x, y, 1);
const b2 = channel(img2, x, y, 2);
if (r1 !== r2 || g1 !== g2 || b1 !== b2) {
// Mark a differing pixel as semi-transparent red
diff.data[i] = 255;
diff.data[i + 1] = 0;
diff.data[i + 2] = 0;
diff.data[i + 3] = 128;
}
}
}
ctx.putImageData(diff, 0, 0);
return diffCanvas;
}
function displayPage(pageNumber, canvas1, canvas2, diffCanvas) {
const container = document.createElement('div');
container.className = 'page-container';
const addCanvas = (canvas, title) => {
const div = document.createElement('div');
const h3 = document.createElement('h3');
h3.textContent = title;
div.appendChild(h3);
div.appendChild(canvas);
container.appendChild(div);
};
addCanvas(canvas1, `PDF 1 - Page ${pageNumber}`);
addCanvas(canvas2, `PDF 2 - Page ${pageNumber}`);
addCanvas(diffCanvas, `Differences - Page ${pageNumber}`);
pdfComparison.appendChild(container);
}
</script>
<h3>Claude 3.5 Sonnet prompts used:</h3>
<ol>
<li><code>Build a tool where I can drag and drop on two PDF files and it uses PDF.js to turn each of their pages into canvas elements and then displays those pages side by side with a third image that highlights any differences between them, if any differences exist</code></li>
<li><code>rewrite that code to not use React at all</code></li>
</ol>
</body>
</html>

simonw commented Jul 2, 2024

Screenshot:

Screenshot of the tool in action. It shows pages in three columns. The leftmost column shows pages from PDF 1. The middle column shows pages from PDF 2; in this case I've edited it and added a rough red circle around some text. The third column shows only the difference, which here is the red circle that I drew.


simonw commented Jul 2, 2024

Full transcript from Claude:


Build a tool where I can drag and drop on two PDF files and it uses PDF.js to turn each of their pages into canvas elements and then displays those pages side by side with a third image that highlights any differences between them, if any differences exist

Certainly! I can help you create a React component that implements this PDF comparison tool. This tool will allow users to drag and drop two PDF files, render their pages using PDF.js, and display them side by side along with a third image highlighting the differences.

import React, { useState, useEffect, useCallback } from 'react';
import { useDropzone } from 'react-dropzone';
import * as pdfjs from 'pdfjs-dist';
import { AlertCircle } from 'lucide-react';
import { Alert, AlertDescription, AlertTitle } from '@/components/ui/alert';

// Initialize PDF.js worker
pdfjs.GlobalWorkerOptions.workerSrc = `//cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjs.version}/pdf.worker.min.js`;

const PDFComparison = () => {
  const [pdf1, setPdf1] = useState(null);
  const [pdf2, setPdf2] = useState(null);
  const [pages, setPages] = useState([]);
  const [error, setError] = useState(null);

  const onDrop = useCallback((acceptedFiles) => {
    if (acceptedFiles.length === 2) {
      setPdf1(acceptedFiles[0]);
      setPdf2(acceptedFiles[1]);
      setError(null);
    } else {
      setError('Please drop exactly two PDF files.');
    }
  }, []);

  const { getRootProps, getInputProps, isDragActive } = useDropzone({
    onDrop,
    accept: { 'application/pdf': ['.pdf'] },
    multiple: true,
    maxFiles: 2,
  });

  useEffect(() => {
    const renderPDFs = async () => {
      if (!pdf1 || !pdf2) return;

      try {
        const [doc1, doc2] = await Promise.all([
          pdfjs.getDocument(URL.createObjectURL(pdf1)).promise,
          pdfjs.getDocument(URL.createObjectURL(pdf2)).promise,
        ]);

        const pageCount = Math.max(doc1.numPages, doc2.numPages);
        const newPages = [];

        for (let i = 1; i <= pageCount; i++) {
          const [page1, page2] = await Promise.all([
            i <= doc1.numPages ? doc1.getPage(i) : null,
            i <= doc2.numPages ? doc2.getPage(i) : null,
          ]);

          const [canvas1, canvas2] = await Promise.all([
            page1 ? renderPageToCanvas(page1) : createEmptyCanvas(),
            page2 ? renderPageToCanvas(page2) : createEmptyCanvas(),
          ]);

          const diffCanvas = compareCanvases(canvas1, canvas2);

          newPages.push({ canvas1, canvas2, diffCanvas });
        }

        setPages(newPages);
      } catch (error) {
        console.error('Error rendering PDFs:', error);
        setError('Error rendering PDFs. Please try again with valid PDF files.');
      }
    };

    renderPDFs();
  }, [pdf1, pdf2]);

  const renderPageToCanvas = async (page) => {
    const scale = 1.5;
    const viewport = page.getViewport({ scale });
    const canvas = document.createElement('canvas');
    const context = canvas.getContext('2d');
    canvas.height = viewport.height;
    canvas.width = viewport.width;

    await page.render({ canvasContext: context, viewport }).promise;
    return canvas;
  };

  const createEmptyCanvas = () => {
    const canvas = document.createElement('canvas');
    canvas.width = 1;
    canvas.height = 1;
    return canvas;
  };

  const compareCanvases = (canvas1, canvas2) => {
    const width = Math.max(canvas1.width, canvas2.width);
    const height = Math.max(canvas1.height, canvas2.height);

    const diffCanvas = document.createElement('canvas');
    diffCanvas.width = width;
    diffCanvas.height = height;
    const ctx = diffCanvas.getContext('2d');

    // Read pixels from the two source canvases, not from the blank diff canvas
    const img1 = canvas1.getContext('2d').getImageData(0, 0, canvas1.width, canvas1.height);
    const img2 = canvas2.getContext('2d').getImageData(0, 0, canvas2.width, canvas2.height);

    const diff = ctx.createImageData(width, height);

    // Look up one channel using the image's own row width. Pixels outside
    // the image, or fully transparent ones, count as white (255).
    const channel = (img, x, y, c) => {
      if (x >= img.width || y >= img.height) return 255;
      const p = (y * img.width + x) * 4;
      return img.data[p + 3] === 0 ? 255 : img.data[p + c];
    };

    for (let y = 0; y < height; y++) {
      for (let x = 0; x < width; x++) {
        const i = (y * width + x) * 4;
        const r1 = channel(img1, x, y, 0);
        const g1 = channel(img1, x, y, 1);
        const b1 = channel(img1, x, y, 2);
        const r2 = channel(img2, x, y, 0);
        const g2 = channel(img2, x, y, 1);
        const b2 = channel(img2, x, y, 2);

        if (r1 !== r2 || g1 !== g2 || b1 !== b2) {
          diff.data[i] = 255;
          diff.data[i + 1] = 0;
          diff.data[i + 2] = 0;
          diff.data[i + 3] = 128;
        }
      }
    }

    ctx.putImageData(diff, 0, 0);
    return diffCanvas;
  };

  return (
    <div className="p-4">
      <div
        {...getRootProps()}
        className={`border-2 border-dashed rounded-lg p-4 mb-4 text-center ${
          isDragActive ? 'border-blue-500 bg-blue-50' : 'border-gray-300'
        }`}
      >
        <input {...getInputProps()} />
        <p>{isDragActive ? 'Drop the PDFs here' : 'Drag and drop two PDF files here, or click to select files'}</p>
      </div>

      {error && (
        <Alert variant="destructive" className="mb-4">
          <AlertCircle className="h-4 w-4" />
          <AlertTitle>Error</AlertTitle>
          <AlertDescription>{error}</AlertDescription>
        </Alert>
      )}

      {pages.map((page, index) => (
        <div key={index} className="flex mb-4 space-x-4">
          <div>
            <h3 className="text-lg font-semibold mb-2">PDF 1 - Page {index + 1}</h3>
            <img src={page.canvas1.toDataURL()} alt={`PDF 1 Page ${index + 1}`} className="border" />
          </div>
          <div>
            <h3 className="text-lg font-semibold mb-2">PDF 2 - Page {index + 1}</h3>
            <img src={page.canvas2.toDataURL()} alt={`PDF 2 Page ${index + 1}`} className="border" />
          </div>
          <div>
            <h3 className="text-lg font-semibold mb-2">Differences - Page {index + 1}</h3>
            <img src={page.diffCanvas.toDataURL()} alt={`Differences Page ${index + 1}`} className="border" />
          </div>
        </div>
      ))}
    </div>
  );
};

export default PDFComparison;

I've created a React component that implements the PDF comparison tool you requested. Here's a breakdown of the main features:

  1. Drag and drop functionality for two PDF files using react-dropzone.
  2. PDF rendering using the PDF.js library.
  3. Side-by-side display of PDF pages.
  4. Difference highlighting between corresponding pages.

To use this component, you'll need to install the following dependencies:

npm install react-dropzone pdfjs-dist lucide-react

Also, make sure you have the necessary UI components from the shadcn/ui library installed, specifically the Alert component.

The component works as follows:

  1. Users can drag and drop two PDF files onto the designated area.
  2. The PDFs are rendered using PDF.js, converting each page to a canvas element.
  3. The canvases are displayed side by side for each page.
  4. A third canvas is generated for each page, highlighting the differences between the two PDFs.
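
The page-matching part of these steps can be sketched on its own, without PDF.js or the DOM. This is a minimal illustration, assuming a hypothetical helper named `pairPages` that is not part of the component; it only shows how two documents of different lengths end up with `null` standing in for a missing page.

```javascript
// Hypothetical helper (not in the original component): pair up page numbers
// from two documents that may have different page counts.
// A page that does not exist in one document is represented by null.
function pairPages(numPages1, numPages2) {
  const pageCount = Math.max(numPages1, numPages2);
  const pairs = [];
  for (let i = 1; i <= pageCount; i++) {
    pairs.push([
      i <= numPages1 ? i : null,
      i <= numPages2 ? i : null,
    ]);
  }
  return pairs;
}

// A 2-page document compared against a 3-page document:
console.log(JSON.stringify(pairPages(2, 3))); // [[1,1],[2,2],[null,3]]
```

In the component, a `null` entry corresponds to the case where `createEmptyCanvas()` is used instead of rendering a page.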

Note that this implementation uses a simple pixel-by-pixel comparison for difference detection. For more advanced difference detection, you might want to consider using image processing libraries or more sophisticated algorithms.
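
One small step beyond an exact pixel match is a per-channel tolerance, so that tiny anti-aliasing differences are not flagged. The sketch below is an assumption-laden illustration rather than the tool's method: `diffPixels` and its `tolerance` parameter are hypothetical names, and it works on plain RGBA buffers (the same layout as `ImageData.data`) of identical dimensions.

```javascript
// Hypothetical helper (not in the original tool): compare two RGBA buffers
// of the same dimensions. Returns a highlight mask in the same RGBA layout
// and the number of pixels whose colour differs by more than `tolerance`
// on any of the R, G or B channels. `tolerance` is an assumed parameter.
function diffPixels(a, b, width, height, tolerance = 16) {
  if (a.length !== width * height * 4 || b.length !== a.length) {
    throw new Error('Buffers must both be width * height * 4 bytes long');
  }
  const mask = new Uint8ClampedArray(a.length); // starts fully transparent
  let changed = 0;
  for (let i = 0; i < a.length; i += 4) {
    const differs =
      Math.abs(a[i] - b[i]) > tolerance ||
      Math.abs(a[i + 1] - b[i + 1]) > tolerance ||
      Math.abs(a[i + 2] - b[i + 2]) > tolerance;
    if (differs) {
      mask[i] = 255;     // red
      mask[i + 3] = 128; // half transparent, as in the tool
      changed++;
    }
  }
  return { mask, changed };
}

// Two 2x1 images: the first pixel is nearly identical, the second differs.
const before = new Uint8ClampedArray([255, 255, 255, 255, 255, 255, 255, 255]);
const after = new Uint8ClampedArray([250, 255, 255, 255, 0, 0, 0, 255]);
console.log(diffPixels(before, after, 2, 1).changed); // 1
```

Setting `tolerance` to 0 gives the same exact-match behaviour as the simple comparison described above.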

Would you like me to explain any part of the code in more detail or make any modifications to the component?

rewrite that code to not use React at all

Certainly! I'll rewrite the PDF comparison tool using vanilla JavaScript without React. This version will still use PDF.js for rendering PDFs and will implement the same functionality.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>PDF Comparison Tool</title>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/3.4.120/pdf.min.js"></script>
    <style>
        #dropZone {
            border: 2px dashed #ccc;
            border-radius: 20px;
            width: 480px;
            padding: 20px;
            text-align: center;
            margin: 20px auto;
        }
        #dropZone.dragover {
            background-color: #e1f5fe;
            border-color: #03a9f4;
        }
        .page-container {
            display: flex;
            margin-bottom: 20px;
        }
        .page-container > div {
            margin-right: 10px;
        }
        #error {
            color: red;
            text-align: center;
        }
    </style>
</head>
<body>
    <div id="dropZone">
        <p>Drag and drop two PDF files here, or click to select files</p>
        <input type="file" id="fileInput" multiple accept=".pdf" style="display: none;">
    </div>
    <div id="error"></div>
    <div id="pdfComparison"></div>

    <script>
        // Initialize PDF.js worker
        pdfjsLib.GlobalWorkerOptions.workerSrc = 'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/3.4.120/pdf.worker.min.js';

        const dropZone = document.getElementById('dropZone');
        const fileInput = document.getElementById('fileInput');
        const errorDiv = document.getElementById('error');
        const pdfComparison = document.getElementById('pdfComparison');

        let pdf1, pdf2;

        dropZone.addEventListener('click', () => fileInput.click());

        dropZone.addEventListener('dragover', (e) => {
            e.preventDefault();
            dropZone.classList.add('dragover');
        });

        dropZone.addEventListener('dragleave', () => {
            dropZone.classList.remove('dragover');
        });

        dropZone.addEventListener('drop', handleFiles);
        fileInput.addEventListener('change', handleFiles);

        function handleFiles(e) {
            e.preventDefault();
            dropZone.classList.remove('dragover');
            const files = e.dataTransfer ? e.dataTransfer.files : e.target.files;

            if (files.length === 2) {
                pdf1 = files[0];
                pdf2 = files[1];
                errorDiv.textContent = '';
                renderPDFs();
            } else {
                errorDiv.textContent = 'Please select exactly two PDF files.';
            }
        }

        async function renderPDFs() {
            try {
                const [doc1, doc2] = await Promise.all([
                    pdfjsLib.getDocument(URL.createObjectURL(pdf1)).promise,
                    pdfjsLib.getDocument(URL.createObjectURL(pdf2)).promise
                ]);

                const pageCount = Math.max(doc1.numPages, doc2.numPages);
                pdfComparison.innerHTML = '';

                for (let i = 1; i <= pageCount; i++) {
                    const [page1, page2] = await Promise.all([
                        i <= doc1.numPages ? doc1.getPage(i) : null,
                        i <= doc2.numPages ? doc2.getPage(i) : null
                    ]);

                    const [canvas1, canvas2] = await Promise.all([
                        page1 ? renderPageToCanvas(page1) : createEmptyCanvas(),
                        page2 ? renderPageToCanvas(page2) : createEmptyCanvas()
                    ]);

                    const diffCanvas = compareCanvases(canvas1, canvas2);

                    displayPage(i, canvas1, canvas2, diffCanvas);
                }
            } catch (error) {
                console.error('Error rendering PDFs:', error);
                errorDiv.textContent = 'Error rendering PDFs. Please try again with valid PDF files.';
            }
        }

        async function renderPageToCanvas(page) {
            const scale = 1.5;
            const viewport = page.getViewport({ scale });
            const canvas = document.createElement('canvas');
            const context = canvas.getContext('2d');
            canvas.height = viewport.height;
            canvas.width = viewport.width;

            await page.render({ canvasContext: context, viewport }).promise;
            return canvas;
        }

        function createEmptyCanvas() {
            const canvas = document.createElement('canvas');
            canvas.width = 1;
            canvas.height = 1;
            return canvas;
        }

        function compareCanvases(canvas1, canvas2) {
            const width = Math.max(canvas1.width, canvas2.width);
            const height = Math.max(canvas1.height, canvas2.height);

            const diffCanvas = document.createElement('canvas');
            diffCanvas.width = width;
            diffCanvas.height = height;
            const ctx = diffCanvas.getContext('2d');

            const img1 = canvas1.getContext('2d').getImageData(0, 0, canvas1.width, canvas1.height);
            const img2 = canvas2.getContext('2d').getImageData(0, 0, canvas2.width, canvas2.height);

            const diff = ctx.createImageData(width, height);

            // Look up one channel using the image's own row width, so canvases of
            // different sizes stay aligned. Pixels outside the image, or fully
            // transparent ones, count as white (255). A real value of 0 is kept as 0.
            const channel = (img, x, y, c) => {
                if (x >= img.width || y >= img.height) return 255;
                const p = (y * img.width + x) * 4;
                return img.data[p + 3] === 0 ? 255 : img.data[p + c];
            };

            for (let y = 0; y < height; y++) {
                for (let x = 0; x < width; x++) {
                    const i = (y * width + x) * 4;
                    const r1 = channel(img1, x, y, 0);
                    const g1 = channel(img1, x, y, 1);
                    const b1 = channel(img1, x, y, 2);
                    const r2 = channel(img2, x, y, 0);
                    const g2 = channel(img2, x, y, 1);
                    const b2 = channel(img2, x, y, 2);

                    if (r1 !== r2 || g1 !== g2 || b1 !== b2) {
                        diff.data[i] = 255;
                        diff.data[i + 1] = 0;
                        diff.data[i + 2] = 0;
                        diff.data[i + 3] = 128;
                    }
                }
            }

            ctx.putImageData(diff, 0, 0);
            return diffCanvas;
        }

        function displayPage(pageNumber, canvas1, canvas2, diffCanvas) {
            const container = document.createElement('div');
            container.className = 'page-container';

            const addCanvas = (canvas, title) => {
                const div = document.createElement('div');
                const h3 = document.createElement('h3');
                h3.textContent = title;
                div.appendChild(h3);
                div.appendChild(canvas);
                container.appendChild(div);
            };

            addCanvas(canvas1, `PDF 1 - Page ${pageNumber}`);
            addCanvas(canvas2, `PDF 2 - Page ${pageNumber}`);
            addCanvas(diffCanvas, `Differences - Page ${pageNumber}`);

            pdfComparison.appendChild(container);
        }
    </script>
</body>
</html>

I've rewritten the PDF comparison tool using vanilla JavaScript, HTML, and CSS. This version maintains the same functionality as the React version but doesn't require any additional JavaScript frameworks or libraries except for PDF.js.

Here's a breakdown of the changes and how to use this version:

  1. The entire application is contained in a single HTML file, including the JavaScript code and CSS styles.

  2. It uses the same drag-and-drop functionality, but implemented with native JavaScript event listeners.

  3. PDF.js is loaded from a CDN, so you don't need to install any dependencies.

  4. The rendering logic remains similar, but it's now using vanilla JavaScript DOM manipulation instead of React components.
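
With native event listeners, the `accept=".pdf"` attribute only filters the file picker, so files that arrive through a drop are not checked by it. A minimal sketch of a stricter check follows, assuming a hypothetical helper named `pickTwoPdfs` (not part of the code above) that takes any array-like of objects with `name` and `type`, such as a `FileList`.

```javascript
// Hypothetical helper (not in the original tool): validate a dropped or
// selected file list. Works with a FileList or any array-like of objects
// that have `name` and `type` properties.
function pickTwoPdfs(files) {
  const list = Array.from(files);
  if (list.length !== 2) {
    return { ok: false, error: 'Please select exactly two PDF files.' };
  }
  // Accept either the PDF MIME type or a .pdf file extension
  const isPdf = (f) => f.type === 'application/pdf' || /\.pdf$/i.test(f.name);
  if (!list.every(isPdf)) {
    return { ok: false, error: 'Both files must be PDFs.' };
  }
  return { ok: true, files: list };
}

// Plain objects stand in for File objects here:
const picked = pickTwoPdfs([
  { name: 'old.pdf', type: 'application/pdf' },
  { name: 'notes.txt', type: 'text/plain' },
]);
console.log(picked.error); // Both files must be PDFs.
```

In `handleFiles`, the result's `error` string could be written to `errorDiv.textContent` before deciding whether to call `renderPDFs()`.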

To use this tool:

  1. Save the code as an HTML file (e.g., pdf-comparison-tool.html).
  2. Open the file in a modern web browser.
  3. Drag and drop two PDF files onto the designated area, or click to select files.

The tool will then render the PDFs side by side and show the differences, just like the React version.

Note that this version doesn't have as sophisticated error handling or styling as the React version, but it demonstrates the core functionality without any framework dependencies.
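
One way the error handling could be tightened is to check a file's first bytes before handing it to PDF.js. The sketch below assumes a hypothetical helper named `looksLikePdf`, which is not part of the tool. It relies only on the fact that PDF files begin with the ASCII marker `%PDF-`, and it is deliberately strict: a file with junk before the marker is rejected here even if a lenient reader might still open it.

```javascript
// Hypothetical helper (not in the original tool): report whether a byte
// array starts with the PDF header marker "%PDF-".
function looksLikePdf(bytes) {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-" in ASCII
  if (bytes.length < magic.length) return false;
  return magic.every((byte, i) => bytes[i] === byte);
}

const pdfHeader = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37]); // "%PDF-1.7"
const pngHeader = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
console.log(looksLikePdf(pdfHeader)); // true
console.log(looksLikePdf(pngHeader)); // false
```

In the browser, the bytes could come from `new Uint8Array(await file.slice(0, 5).arrayBuffer())`, which reads only the start of the file.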

Would you like me to explain any part of the code in more detail or make any modifications to this vanilla JavaScript version?
