@QuanticPotatoes
Created February 21, 2022 15:25
Extract images from a PDF with pdf-lib, Jimp and file-type
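import { PDFDocument, PDFRawStream } from 'pdf-lib';
import * as FileType from 'file-type';
import Jimp from 'jimp';

// Collect every raw stream that file-type recognizes as an image and
// re-encode it with Jimp in its detected format.
const extractImageFromPDF = async (file: Buffer): Promise<Buffer[]> => {
  const pdfDoc = await PDFDocument.load(file, { ignoreEncryption: true });
  const indirects = pdfDoc.context.enumerateIndirectObjects();
  const images: Buffer[] = [];

  for (const [, pdfObject] of indirects) {
    if (!(pdfObject instanceof PDFRawStream)) continue;

    const buffer = Buffer.from(pdfObject.contents);
    // fromBuffer resolves to undefined when the format is not recognized,
    // which is the case for most non-image streams (fonts, page content).
    const fileType = await FileType.fromBuffer(buffer);
    if (!fileType || !fileType.mime.startsWith('image/')) continue;

    try {
      const image = await Jimp.read(buffer);
      images.push(await image.getBufferAsync(fileType.mime));
    } catch {
      // Jimp could not decode or re-encode this format; skip the stream.
    }
  }

  return images;
};

export {
  extractImageFromPDF,
};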
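
The extractor relies on file-type to decide what each stream contains. As a rough illustration of that kind of check, here is a minimal sketch that recognizes only the JPEG and PNG signatures from the leading bytes of a buffer. `sniffImageMime`, `startsWith` and the two signature constants are hypothetical names made up for this example. They are not part of pdf-lib, Jimp or file-type, and the real library covers far more formats.

```typescript
// Hypothetical helper for illustration only; file-type does much more.
// JPEG files start with FF D8 FF; PNG files start with an 8-byte signature.
const JPEG_SIGNATURE = [0xff, 0xd8, 0xff];
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// True when `buffer` begins with exactly the bytes in `signature`.
const startsWith = (buffer: Uint8Array, signature: number[]): boolean =>
  buffer.length >= signature.length &&
  signature.every((byte, index) => buffer[index] === byte);

// Returns a MIME type for the two known signatures, otherwise undefined.
const sniffImageMime = (buffer: Uint8Array): string | undefined => {
  if (startsWith(buffer, JPEG_SIGNATURE)) return 'image/jpeg';
  if (startsWith(buffer, PNG_SIGNATURE)) return 'image/png';
  return undefined;
};
```

Streams that hold raw or compressed pixel data instead of a complete image file carry no such signature, so a check like this returns `undefined` for them and they would be skipped.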