fileformat-blog-gists

@fileformat-blog-gists
fileformat-blog-gists / extract-text-from-pdf-using-pypdf-in-python.py
Created January 29, 2025 02:47
This Python script demonstrates how to extract text from a PDF file using the pypdf library.
from pypdf import PdfReader
# Path to the PDF file
pdf_file = "pdf-to-extract-text/input.pdf"
# Create a PDF reader object
reader = PdfReader(pdf_file)
# Loop through the pages and extract text
for page_number, page in enumerate(reader.pages, start=1):
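The preview above stops at the loop header. As a minimal sketch of how such a loop might continue, assuming each page's text is read with pypdf's `page.extract_text()` and printed under a page heading: the helper names `iter_page_text` and `print_pdf_text` are hypothetical and do not come from the gist.

```python
from typing import Iterable, Iterator, Tuple


def iter_page_text(pages: Iterable) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, text) pairs, numbering pages from 1.

    `pages` is assumed to be pypdf's `reader.pages`, or any iterable of
    objects that expose an `extract_text()` method.
    """
    for page_number, page in enumerate(pages, start=1):
        # Fall back to an empty string if nothing comes back for a page
        text = page.extract_text() or ""
        yield page_number, text


def print_pdf_text(pdf_file: str) -> None:
    """Open a PDF with pypdf and print the text of every page."""
    # Imported here so iter_page_text stays usable without pypdf installed
    from pypdf import PdfReader

    reader = PdfReader(pdf_file)
    for page_number, text in iter_page_text(reader.pages):
        print(f"--- Page {page_number} ---")
        print(text)
```

Calling `print_pdf_text("pdf-to-extract-text/input.pdf")` would then print each page in order. Pages that contain only scanned images typically yield no text with this approach.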
fileformat-blog-gists / convert-pdf-to-image-in-python.py
Created January 29, 2025 02:05
Convert PDF to Image in Python
# Import required libraries
from pdf2image import convert_from_path
from PIL import Image
# Specify the path to the PDF file
pdf_path = 'sample.pdf'
# Convert PDF to a list of images
try:
    images = convert_from_path(pdf_path)
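The preview above ends inside the `try` block. The following is a sketch of one way the conversion could be finished, under these assumptions: `convert_from_path` returns a list of PIL images, each image is saved as a PNG, and the output directory `output_images` and the `page_<n>.png` naming are illustrative choices. The helper names `save_pages` and `convert_pdf_to_images` are hypothetical. Note that pdf2image relies on Poppler being installed on the system.

```python
from pathlib import Path
from typing import Iterable, List


def save_pages(images: Iterable, output_dir: str) -> List[Path]:
    """Save each image as page_<n>.png in output_dir and return the paths."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for number, image in enumerate(images, start=1):
        target = out / f"page_{number}.png"
        # PIL images expose save(fp, format)
        image.save(target, "PNG")
        saved.append(target)
    return saved


def convert_pdf_to_images(pdf_path: str, output_dir: str = "output_images") -> List[Path]:
    """Convert every page of a PDF to a PNG file (sketch, not the gist's code)."""
    # Imported here so save_pages stays usable without pdf2image installed
    from pdf2image import convert_from_path

    try:
        images = convert_from_path(pdf_path)
    except Exception as error:
        # Report the failure (for example a missing file or missing Poppler)
        print(f"Could not convert {pdf_path}: {error}")
        return []
    return save_pages(images, output_dir)
```

With this sketch, `convert_pdf_to_images("sample.pdf")` would return the list of written PNG paths, or an empty list if the conversion failed.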
fileformat-blog-gists / extract-text-from-pdf-using-pypdf-in-python.py
Last active January 28, 2025 13:39
This Python script demonstrates how to extract text from PDF files using the pypdf library.
from pypdf import PdfReader
# Path to the PDF file
pdf_file = "pdf-to-extract-text/input.pdf"
# Create a PDF reader object
reader = PdfReader(pdf_file)
# Loop through the pages and extract text
for page_number, page in enumerate(reader.pages, start=1):
fileformat-blog-gists / output-extract-text-from-pdf-using-pymupdf
Created January 15, 2025 22:15
Output - Extracting text from a PDF using PyMuPDF
Extracted Text using PyMuPDF:
This is a sample pdf. Page 1
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been
the
industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type
and scrambled it to make a type specimen book. It has survived not only five centuries, but also the
leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with
the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop
publishing software like Aldus PageMaker including versions of Lorem Ipsum.
fileformat-blog-gists / extract-text-from-pdf-using-pymupdf.py
Created January 15, 2025 22:10
Here is a complete code example for extracting text from a PDF using PyMuPDF.
import fitz # PyMuPDF library
# Specify the PDF file path
pdf_file_path = "sample.pdf"
# Open the PDF file
pdf_document = fitz.open(pdf_file_path)
# Initialize a variable to store the extracted text
extracted_text = ""
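The preview above stops right after `extracted_text` is initialized. A minimal sketch of how the accumulation might proceed, assuming the document is iterated page by page and each page's text is read with PyMuPDF's `page.get_text()`: the helper names `collect_text` and `extract_text_with_pymupdf` are hypothetical.

```python
from typing import Iterable


def collect_text(pages: Iterable) -> str:
    """Concatenate the text of every page into one string.

    `pages` is assumed to be an open PyMuPDF document, or any iterable of
    objects that expose a `get_text()` method.
    """
    extracted_text = ""
    for page in pages:
        extracted_text += page.get_text()
    return extracted_text


def extract_text_with_pymupdf(pdf_file_path: str) -> str:
    """Open a PDF with PyMuPDF, gather its text, and close the file."""
    # Imported here so collect_text stays usable without PyMuPDF installed
    import fitz  # PyMuPDF library

    pdf_document = fitz.open(pdf_file_path)
    try:
        return collect_text(pdf_document)
    finally:
        # Release the file handle even if extraction fails
        pdf_document.close()
```

A call such as `print(extract_text_with_pymupdf("sample.pdf"))` would then print the combined text of all pages.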
fileformat-blog-gists / output-extract-text-from-pdf-using-pypdf
Created January 15, 2025 22:05
Output - Extracting text from a PDF using pypdf
Extracted Text using pypdf:
This is a sample pdf. Page 1
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the
industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type
and scrambled it to make a type specimen book. It has survived not only five centuries, but also the
leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with
the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop
publishing software like Aldus PageMaker including versions of Lorem Ipsum.
This is sample pdf. Page 2
fileformat-blog-gists / extract-text-from-pdf-using-pypdf.py
Created January 15, 2025 21:56
Here is a complete code example for extracting text from a PDF using pypdf.
from pypdf import PdfReader
# Specify the PDF file path
pdf_file_path = "sample.pdf"
# Create a PDF reader object
reader = PdfReader(pdf_file_path)
# Initialize a variable to store the extracted text
extracted_text = ""
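The preview above ends once `extracted_text` is initialized. As a sketch of how the whole document's text could be gathered into one string with pypdf: this version collects the per-page strings in a list and joins them with a newline, which is a swapped-in alternative to appending onto `extracted_text` and may differ from the gist. The helper names `join_page_text` and `extract_text_with_pypdf` and the `separator` parameter are hypothetical.

```python
from typing import Iterable


def join_page_text(pages: Iterable, separator: str = "\n") -> str:
    """Join the text of all pages into one string.

    `pages` is assumed to be pypdf's `reader.pages`, or any iterable of
    objects that expose an `extract_text()` method.
    """
    parts = []
    for page in pages:
        # Treat a page with no extractable text as an empty string
        parts.append(page.extract_text() or "")
    return separator.join(parts)


def extract_text_with_pypdf(pdf_file_path: str) -> str:
    """Read a PDF with pypdf and return the text of all pages."""
    # Imported here so join_page_text stays usable without pypdf installed
    from pypdf import PdfReader

    reader = PdfReader(pdf_file_path)
    return join_page_text(reader.pages)
```

Joining a list avoids repeated string concatenation on long documents, and the separator keeps the last line of one page from running into the first line of the next.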