JavaScript / 297 lines of code. Simple node package to convert a PDF into images.
URL: https://github.com/ol-th/pdf-img-convert.js
Explain the major function of the project and how it works by referencing the most important and relevant code snippets.
The project, `pdf-img-convert.js`, is a pure JavaScript library designed to convert PDF files into image buffers without relying on filesystem operations or external binaries. It leverages Mozilla's PDF.js library for PDF parsing and rendering.
The core functionality is encapsulated in the `convert` function, which takes a PDF file as input and returns an array of image buffers (either as `Uint8Array` or base64-encoded strings).
Here's a breakdown of how it works:
- Input Handling: The `convert` function accepts various input formats for the PDF:
  - URL: If the input is a URL, it fetches the PDF data using `fetch`.
  - Local Path: If the input is a local file path, it reads the file using `fs.readFile`.
  - Buffer: If the input is a `Buffer` object, it converts it to a `Uint8Array`.
  - Uint8Array: If the input is already a `Uint8Array`, it's used directly.
  - Base64-encoded String: If the input is a base64-encoded string, it decodes it into a `Uint8Array`.

      let pdfData = pdf;
      if (typeof pdf === 'string') {
        // ...
      } else if (Buffer.isBuffer(pdf)) {
        pdfData = new Uint8Array(pdf);
      } else if (!(pdf instanceof Uint8Array)) {
        // Parenthesized: `!pdf instanceof Uint8Array` would always be false.
        return pdf;
      }
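The input cases listed above can be illustrated with a minimal sketch. The helper names (`classifyString`, `decodeBase64Pdf`, `bytesFromMemory`) and the rules used to tell a URL, a base64 data URI, and a file path apart are assumptions made for illustration; they are not taken from the library's source, whose string branch is elided in the snippet above.

```javascript
// Sketch only: helper names and string-detection rules are assumptions,
// not the library's actual code.

// Guess how a string input should be interpreted.
function classifyString(input) {
  if (/^https?:\/\//i.test(input)) return 'url';
  if (input.startsWith('data:application/pdf;base64,')) return 'base64';
  return 'path';
}

// Decode the payload of a base64 data URI into raw bytes.
function decodeBase64Pdf(dataUri) {
  const payload = dataUri.slice(dataUri.indexOf(',') + 1);
  return new Uint8Array(Buffer.from(payload, 'base64'));
}

// Normalize in-memory inputs to a plain Uint8Array.
function bytesFromMemory(input) {
  if (Buffer.isBuffer(input)) return new Uint8Array(input); // copies the bytes
  if (input instanceof Uint8Array) return input;            // used as-is
  throw new TypeError('Expected a Buffer or Uint8Array');
}
```

Because `Buffer` is a subclass of `Uint8Array`, the `Buffer.isBuffer` check has to come first if the two cases are to be treated differently.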
- PDF Parsing: The `pdfjs.getDocument` function from PDF.js is used to parse the PDF data into a `pdfDocument` object.

      var loadingTask = pdfjs.getDocument({data: pdfData, disableFontFace: true, verbosity: 0});
      var pdfDocument = await loadingTask.promise;
- Page Rendering: The `doc_render` function iterates through each page of the PDF (either all pages or a specific set of pages specified in the `conversion_config`). For each page, it:
  - Gets the page object using `pdfDocument.getPage`.
  - Creates a viewport based on the desired scale (either default or specified in the `conversion_config`).
  - Renders the page onto a `Canvas` object using `page.render`.
  - Converts the `Canvas` to an image buffer using `canvas.toBuffer`.

      let page = await pdfDocument.getPage(pageNo);
      let viewport = page.getViewport({ scale: outputScale });
      let renderTask = await page.render(renderContext).promise;
      let image = canvasAndContext.canvas.toBuffer();
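The choice between rendering all pages and rendering a selected subset, described above, might look like the following sketch. The function name `pagesToRender` and the config key `page_numbers` are assumptions for illustration, as is the decision to silently drop out-of-range entries; the one fixed point is that PDF.js page numbers passed to `getPage` are 1-based.

```javascript
// Sketch only: `pagesToRender` and the `page_numbers` key are assumed names.
// Returns the 1-based page numbers that should be rendered.
function pagesToRender(numPages, config = {}) {
  if (Array.isArray(config.page_numbers)) {
    // Keep only integers that refer to pages that exist in the document.
    return config.page_numbers.filter(
      (n) => Number.isInteger(n) && n >= 1 && n <= numPages
    );
  }
  // No selection given: render every page, 1..numPages.
  return Array.from({ length: numPages }, (_, i) => i + 1);
}
```

Each returned number would then be passed to `pdfDocument.getPage` in turn.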
- Output Handling: The `convert` function returns an array of image buffers, either as `Uint8Array` or base64-encoded strings, depending on the `base64` flag in the `conversion_config`.

      if (conversion_config.base64)
        outputPages.push(currentPage.toString('base64'));
      else
        outputPages.push(new Uint8Array(currentPage));
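The output step can be tried in isolation with a small sketch. The function name `encodePage` is hypothetical; the body mirrors the branch quoted above and assumes the rendered page arrives as a Node.js `Buffer`.

```javascript
// Sketch only: `encodePage` is a hypothetical wrapper around the
// base64 / Uint8Array branch described above.
function encodePage(imageBuffer, config = {}) {
  if (config.base64) {
    // Buffer#toString('base64') yields a plain base64 string.
    return imageBuffer.toString('base64');
  }
  // Copy the bytes into a plain Uint8Array (no longer a Buffer).
  return new Uint8Array(imageBuffer);
}

const asString = encodePage(Buffer.from('hi'), { base64: true });
const asBytes = encodePage(Buffer.from([1, 2, 3]));
console.log(asString);       // aGk=
console.log(asBytes.length); // 3
```

Returning a base64 string is convenient for embedding the image in JSON or HTML, while the `Uint8Array` form suits writing to disk or further binary processing.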
In summary, `pdf-img-convert.js` provides a convenient way to convert PDF files into images using pure JavaScript, without relying on external binaries or filesystem operations. It leverages the capabilities of PDF.js for PDF parsing and rendering, and offers flexibility in terms of output format and page selection.