Document Processing documentprocessing

@documentprocessing
documentprocessing / render-or-view-pdf-document-in-browser-using-pdfjs-javascript-library.html
Last active September 20, 2023 18:06
Render or View PDF Document in Browser using PDF.js JavaScript Library. Check https://products.documentprocessing.com/viewer/javascript/pdf.js/ for more details.
<!-- This example contains the HTML and JavaScript needed to demonstrate the PDF.js library
     by rendering a PDF document in the browser -->
<html>
<head>
<!-- Link to the PDF.js library -->
<script src="../build/pdf.js"></script>
</head>
<body>
documentprocessing / convert-html-to-pdf-via-web-url-in-python-using-weasyprint-library.html
Last active September 20, 2023 18:05
Convert HTML to PDF via Web URL and also with Inline CSS in Python using WeasyPrint Library. Check https://products.documentprocessing.com/conversion/python/weasyprint/ for more details.
# Import the HTML class from the WeasyPrint library
from weasyprint import HTML
# Instantiate the HTML class and call write_pdf() to convert a website URL to PDF
HTML('https://www.groupdocs.com/').write_pdf('groupdocs-weasyprint.pdf')
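The description also mentions inline CSS, which the preview above does not show. A minimal sketch of that case, assuming WeasyPrint's `HTML(string=...)` constructor; the helper names `build_markup` and `html_string_to_pdf` are illustrative and not taken from the gist:

```python
from html import escape


def build_markup(heading: str, color: str = "navy") -> str:
    """Return a small HTML document whose heading is styled with inline CSS."""
    return (
        "<html><body>"
        f'<h1 style="color: {color}; font-family: sans-serif;">{escape(heading)}</h1>'
        "</body></html>"
    )


def html_string_to_pdf(markup: str, output_path: str) -> None:
    """Render an HTML string (inline CSS included) to a PDF file."""
    # Imported lazily so build_markup() stays usable without WeasyPrint installed.
    from weasyprint import HTML

    HTML(string=markup).write_pdf(output_path)
```

Calling `html_string_to_pdf(build_markup("Inline CSS example"), "inline-css.pdf")` would write the styled heading to a PDF; the output file name here is only a placeholder.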
documentprocessing / add-annotations-to-images-in-javascript-using-annotorious-library.html
Last active February 21, 2024 15:52
Add annotations to images manually or automatically using JSON in JavaScript using Annotorious Library. Check https://products.documentprocessing.com/annotation/javascript/annotorious/ for more details.
<html>
<head>
<!-- Linking Annotorious Stylesheet -->
<link rel="stylesheet" href="dist/annotorious.min.css">
<!-- Integrating Annotorious JavaScript Library -->
<script type="text/javascript" src="dist/annotorious.min.js"></script>
</head>
<body>
documentprocessing / extract-images-from-pdf-in-python-using-pymupdf-library.py
Last active October 20, 2023 10:37
Explore PDF parsing features of PyMuPDF like extracting text, images & tables from PDF, inserting text into PDF or text recognition using OCR etc. Check https://products.documentprocessing.com/parser/python/pymupdf/ for more details.
# Import PyMuPDF
import fitz
# File path you want to extract images from
file = "data.pdf"
# Open the file
pdf_file = fitz.open(file)
# Iterate over PDF pages
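The preview is cut off at the page loop. One way the extraction could proceed is sketched below; it is not the gist's actual code. It assumes PyMuPDF's `Page.get_images()` and `Document.extract_image()`, and the function names and the output naming scheme are hypothetical:

```python
from pathlib import Path


def image_filename(page_number: int, image_index: int, ext: str) -> str:
    """Build an output name such as 'page1_image2.png' (1-based numbers)."""
    return f"page{page_number}_image{image_index}.{ext}"


def extract_images(pdf_path: str, out_dir: str = ".") -> list:
    """Save every image embedded in the PDF and return the written paths."""
    import fitz  # PyMuPDF, imported lazily

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with fitz.open(pdf_path) as doc:
        for page_number, page in enumerate(doc, start=1):
            # Each entry describes one image on the page; item 0 is its xref.
            for image_index, img in enumerate(page.get_images(full=True), start=1):
                info = doc.extract_image(img[0])
                target = out / image_filename(page_number, image_index, info["ext"])
                target.write_bytes(info["image"])
                saved.append(target)
    return saved
```

An image referenced on several pages is written once per page in this sketch; deduplicating by xref would be a reasonable refinement.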
documentprocessing / combine-or-join-multiple-pdfs-in-python-using-pymupdf-library.py
Last active October 10, 2023 08:21
Learn to combine or join multiple PDFs into one, split a PDF into multiple PDFs, rotate and delete PDF pages in Python using PyMuPDF library. Check https://products.documentprocessing.com/merger/python/pymupdf/ for more details.
# Import PyMuPDF
import fitz
# Open first document
doc1 = fitz.open("documentprocessing.pdf")
# Open second document
doc2 = fitz.open("data.pdf")
# Append document 2 after document 1
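The preview stops before the append step. A hedged sketch of merging with PyMuPDF's `Document.insert_pdf()` follows; the function name `merge_pdfs` and the choice to build a fresh output document (rather than appending to `doc1` in place) are assumptions for illustration:

```python
def merge_pdfs(paths, output_path: str) -> None:
    """Concatenate the PDFs in `paths`, in order, into `output_path`."""
    paths = list(paths)
    if not paths:
        raise ValueError("at least one input PDF is required")

    import fitz  # PyMuPDF, imported lazily

    merged = fitz.open()  # a new, empty PDF
    try:
        for path in paths:
            with fitz.open(path) as src:
                # Appends all pages of src after the pages already in merged.
                merged.insert_pdf(src)
        merged.save(output_path)
    finally:
        merged.close()
```

With the two files from the preview, `merge_pdfs(["documentprocessing.pdf", "data.pdf"], "merged.pdf")` would place the pages of `data.pdf` after those of `documentprocessing.pdf`; the output name is a placeholder.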
documentprocessing / add-rotate-and-crop-pdf-pages-in-python-using-pypdf-library.py
Last active October 18, 2023 07:37
Add, Rotate, Crop, Merge & Split PDF Files in Python using pypdf Library. Check https://products.documentprocessing.com/merger/python/pypdf/ for more details.
# Import the PdfWriter & PdfReader classes from the pypdf library
from pypdf import PdfWriter, PdfReader
# Open PDF document and instantiate writer object for performing operations on the PDF
reader = PdfReader("documentprocessing.pdf")
writer = PdfWriter()
# Add page 1 from reader to output document, unchanged:
writer.add_page(reader.pages[0])
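The rotate and crop steps named in the description fall outside the preview. The sketch below follows the pattern in the pypdf documentation (`PageObject.rotate()` and assigning to `mediabox.upper_right`); the function name and the decision to crop the third page to its lower-left quarter are illustrative assumptions:

```python
def rotate_and_crop(src_path: str, dst_path: str, angle: int = 90) -> None:
    """Copy page 1 unchanged, rotate page 2, and crop page 3 (when present)."""
    if angle % 90 != 0:
        raise ValueError("angle must be a multiple of 90 degrees")

    from pypdf import PdfReader, PdfWriter  # imported lazily

    reader = PdfReader(src_path)
    writer = PdfWriter()

    # Page 1 is added as-is.
    writer.add_page(reader.pages[0])

    # Page 2 is rotated clockwise by `angle` degrees.
    if len(reader.pages) > 1:
        writer.add_page(reader.pages[1].rotate(angle))

    # Page 3 is cropped by shrinking its media box to the lower-left quarter.
    if len(reader.pages) > 2:
        page3 = reader.pages[2]
        page3.mediabox.upper_right = (
            page3.mediabox.right / 2,
            page3.mediabox.top / 2,
        )
        writer.add_page(page3)

    with open(dst_path, "wb") as fh:
        writer.write(fh)
```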
documentprocessing / extract-attachments-from-pdf-in-python-using-pypdf-library.py
Last active October 18, 2023 13:08
Extract text, images and attachments from PDF files in Python using pypdf Library. Check https://products.documentprocessing.com/parser/python/pypdf/ for more details.
# Import the PdfReader class from the pypdf library
from pypdf import PdfReader
# Open a PDF file
reader = PdfReader("data.pdf")
# Iterate through the attachments in the PDF
for name, content_list in reader.attachments.items():
    # Iterate through the contents in each attachment
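The inner loop is not visible in the preview. A sketch of writing each attachment to disk, assuming `reader.attachments` maps each name to a list of `bytes` objects as the pypdf documentation describes; `attachment_target` and `save_attachments` are hypothetical helper names:

```python
from pathlib import Path


def attachment_target(out_dir: str, name: str, index: int) -> Path:
    """Return a safe output path; any directory part of `name` is dropped."""
    return Path(out_dir) / f"{index}-{Path(name).name}"


def save_attachments(pdf_path: str, out_dir: str = ".") -> list:
    """Write every embedded attachment to `out_dir` and return the paths."""
    from pypdf import PdfReader  # imported lazily

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    reader = PdfReader(pdf_path)
    written = []
    for name, content_list in reader.attachments.items():
        # One name can carry several payloads, hence the index prefix.
        for index, content in enumerate(content_list):
            target = attachment_target(out_dir, name, index)
            target.write_bytes(content)
            written.append(target)
    return written
```

Stripping the directory part of the attachment name is a defensive choice: the name comes from the PDF itself and should not be trusted as a path.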
documentprocessing / extract-font-information-from-pdf-document-in-python-using-pdfminersix-library.py
Last active October 24, 2023 14:26
Extract Text and Font Information from PDF documents in Python using pdfminer.six Library. Check https://products.documentprocessing.com/parser/python/pdfminer.six/ for more details.
# Import required classes from the pdfminer.six library
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import PDFPageAggregator
# Open the PDF file
with open('documentprocessing.pdf', 'rb') as pdf_file:
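The body of the `with` block is not shown. The sketch below gathers font names and sizes using a different, shorter route than the imports above suggest: it swaps in `pdfminer.high_level.extract_pages()` in place of the `PDFPageAggregator` pipeline. The function names are illustrative:

```python
from collections import Counter


def dominant_font(counts: Counter):
    """Return the (fontname, size) pair covering the most characters, or None."""
    return counts.most_common(1)[0][0] if counts else None


def font_usage(pdf_path: str) -> Counter:
    """Count characters per (fontname, rounded size) across the whole PDF."""
    # Imported lazily so dominant_font() works without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    counts = Counter()
    for page_layout in extract_pages(pdf_path):
        for element in page_layout:
            if not isinstance(element, LTTextContainer):
                continue
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                for char in line:
                    # Only LTChar objects carry font name and size.
                    if isinstance(char, LTChar):
                        counts[(char.fontname, round(char.size, 1))] += 1
    return counts
```

`dominant_font(font_usage("documentprocessing.pdf"))` would then report the font and size used for most of the text in the file from the preview.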
documentprocessing / convert-pdf-to-html-in-python-using-pdfminersix-library.py
Last active October 27, 2023 02:45
Convert PDF to HTML and PDF to XML in Python using pdfminer.six Library. Check https://products.documentprocessing.com/conversion/python/pdfminer.six/ for more details.
# Import extract_text_to_fp function from pdfminer.high_level module
from pdfminer.high_level import extract_text_to_fp
# Import BytesIO class from io module
from io import BytesIO
# Specify the PDF file you want to convert to HTML
pdf_file = 'documentprocessing.pdf'
# Create an in-memory buffer to store the HTML output
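The preview ends before the conversion call. A sketch of how the buffer might be filled, assuming the `output_type`, `laparams`, and `codec` keyword arguments of `extract_text_to_fp`; the wrapper name `pdf_to_markup` and its restricted set of output types are assumptions made for this example:

```python
from io import BytesIO


def pdf_to_markup(pdf_path: str, output_type: str = "html") -> str:
    """Convert a PDF to an HTML, XML, or plain-text string."""
    if output_type not in {"html", "xml", "text"}:
        raise ValueError(f"unsupported output type: {output_type!r}")

    # Imported lazily so argument validation works without pdfminer.six.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    buffer = BytesIO()
    with open(pdf_path, "rb") as pdf_handle:
        extract_text_to_fp(
            pdf_handle,
            buffer,
            output_type=output_type,
            laparams=LAParams(),  # enable layout analysis
            codec="utf-8",
        )
    return buffer.getvalue().decode("utf-8")
```

Passing `output_type="xml"` covers the PDF-to-XML case from the description with the same call.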
documentprocessing / add-crossed-out-text-to-pdf-in-javascript-using-pdfkit.js
Last active November 28, 2023 08:24
Add Links, Crossed-Out Text & Interactive Notes Annotations to PDF documents in JavaScript using PDFKit Library. Check https://products.documentprocessing.com/annotation/javascript/pdfkit/ for more details.
// Include pdfkit library and fs module of Node.js
const PDFDocument = require('pdfkit');
const fs = require('fs');
// Create a new PDF document
const doc = new PDFDocument();
// Create a writable stream to save the PDF
const stream = fs.createWriteStream('annotations.pdf');