@m0o0scar
Created July 26, 2024 10:23
ol-th/pdf-img-convert.js. Continue this conversation at https://readfm.vercel.app?gist=590b8745c88719e03ea24127e66256b1

[github] ol-th/pdf-img-convert.js

Source

JavaScript / 297 lines of code. Simple node package to convert a PDF into images.

URL: https://github.com/ol-th/pdf-img-convert.js

Conversation

▲ Explain the major function of the project and how it works by referencing the most important and relevant code snippets.

The project, pdf-img-convert.js, is a pure JavaScript library designed to convert PDF files into image buffers without requiring filesystem operations or external binaries. It uses Mozilla's PDF.js library for PDF parsing and rendering.

The core functionality is encapsulated in the convert function, which takes a PDF file as input and returns an array of image buffers (either as Uint8Array or base64-encoded strings).

Here's a breakdown of how it works:

  1. Input Handling: The convert function accepts various input formats for the PDF:

    • URL: If the input is a URL, it fetches the PDF data using fetch.
    • Local Path: If the input is a local file path, it reads the file using fs.readFile.
    • Buffer: If the input is a Buffer object, it converts it to a Uint8Array.
    • Uint8Array: If the input is already a Uint8Array, it's used directly.
    • Base64-encoded String: If the input is a base64-encoded string, it decodes it into a Uint8Array.
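    // Start from the raw input; each branch below normalizes it to a Uint8Array.
    let pdfData = pdf;

    if (typeof pdf === 'string') {
      // ...
    } else if (Buffer.isBuffer(pdf)) {
      pdfData = new Uint8Array(pdf);
    } else if (!(pdf instanceof Uint8Array)) {
      // Parenthesized: `!pdf instanceof Uint8Array` negates `pdf` first and never matches.
      return pdf;
    }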
  2. PDF Parsing: The pdfjs.getDocument function from PDF.js is used to parse the PDF data into a pdfDocument object.

    var loadingTask = pdfjs.getDocument({data: pdfData, disableFontFace: true, verbosity: 0});
    var pdfDocument = await loadingTask.promise;
  3. Page Rendering: The doc_render function iterates through each page of the PDF (either all pages or a specific set of pages specified in the conversion_config). For each page, it:

    • Gets the page object using pdfDocument.getPage.
    • Creates a viewport based on the desired scale (either default or specified in the conversion_config).
    • Renders the page onto a Canvas object using page.render.
    • Converts the Canvas to an image buffer using canvas.toBuffer.
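    // Fetch the page (PDF.js page numbers start at 1).
    let page = await pdfDocument.getPage(pageNo);
    // Compute the page dimensions at the requested scale.
    let viewport = page.getViewport({ scale: outputScale });
    // Draw the page onto the canvas and wait for rendering to finish.
    let renderTask = await page.render(renderContext).promise;
    // Encode the rendered canvas as an image buffer.
    let image = canvasAndContext.canvas.toBuffer();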
  4. Output Handling: The convert function returns an array of image buffers, either as Uint8Array or base64-encoded strings, depending on the base64 flag in the conversion_config.

    if (conversion_config.base64)
      outputPages.push(currentPage.toString('base64'));
    else
      outputPages.push(new Uint8Array(currentPage));
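
To make steps 1 and 4 concrete, here is a minimal sketch of the in-memory input normalization and the output encoding described above. It is not the library's code: the helper names `normalizePdfInput` and `encodePage` and their `base64` option are assumptions made for illustration, and URL and file-path inputs are left out so the example stays self-contained.

```javascript
// Hypothetical helper (not part of pdf-img-convert.js): normalize
// in-memory PDF input to a Uint8Array, mirroring step 1 above.
function normalizePdfInput(input, { base64 = false } = {}) {
  if (typeof input === 'string') {
    // Assumption: this sketch only treats strings as base64 payloads.
    if (!base64) {
      throw new TypeError('string input is only handled as base64 in this sketch');
    }
    // Buffer.from decodes the base64 text; copy the bytes into a plain Uint8Array.
    return new Uint8Array(Buffer.from(input, 'base64'));
  }
  // Check Buffer first: a Buffer is also an instance of Uint8Array.
  if (Buffer.isBuffer(input)) {
    return new Uint8Array(input);
  }
  if (input instanceof Uint8Array) {
    return input;
  }
  throw new TypeError('unsupported PDF input type');
}

// Hypothetical helper: encode one rendered page as in step 4,
// either as a base64 string or as a Uint8Array.
function encodePage(pageBuffer, { base64 = false } = {}) {
  return base64 ? pageBuffer.toString('base64') : new Uint8Array(pageBuffer);
}
```

Unlike the excerpt in step 1, this sketch throws on unsupported input instead of returning it unchanged, which makes a bad argument fail at the call site.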

In summary, pdf-img-convert.js converts PDF files into images in pure JavaScript, without relying on external binaries. It uses PDF.js for parsing and rendering, and lets the caller choose the output format (Uint8Array or base64-encoded strings) and which pages to render.
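
The page selection mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the `selectPages` helper and its `requested` argument are hypothetical stand-ins for whatever page list the conversion_config carries, and the library may handle out-of-range pages differently.

```javascript
// Hypothetical helper (not part of pdf-img-convert.js): decide which
// 1-based page numbers to render for a document with `pageCount` pages.
function selectPages(pageCount, requested) {
  if (!Array.isArray(requested)) {
    // No explicit selection: render every page, 1..pageCount.
    return Array.from({ length: pageCount }, (_, i) => i + 1);
  }
  // Keep only whole page numbers that exist in the document.
  return requested.filter(
    (n) => Number.isInteger(n) && n >= 1 && n <= pageCount
  );
}
```

Each number returned here could then be passed to pdfDocument.getPage, which counts pages from 1.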

{
"url": "https://github.com/ol-th/pdf-img-convert.js",
"type": "github",
"title": "ol-th/pdf-img-convert.js",
"picture": "https://avatars.githubusercontent.com/u/30042235?v=4",
"description": "JavaScript / 297 lines of code.\nSimple node package to convert a PDF into images."
}