@alsiesta
alsiesta / chunk_pdf_with_overlap.py
Last active September 2, 2024 11:18
Splits a PDF into chunks of chunk_size pages each, with overlap pages shared between consecutive chunks
#!/usr/bin/env python3
# ------------------------------------------------------------------
# Script Name: chunk_pdf_with_overlap.py
# Description: This script splits one PDF file into chunks of
# chunk_size=n pages, with an overlap of overlap=n
# pages between consecutive chunks.
# Website: https://gist.github.com/alsiesta
# Version: 1.0
# Usage: py chunk_pdf_with_overlap.py <input_pdf> <output_prefix>
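The preview above ends before the chunking logic itself, so the following is only a minimal sketch of how page ranges with overlap could be computed, not the gist author's implementation. The function name `chunk_page_ranges` and the 0-based, end-exclusive range convention are assumptions made for illustration; `chunk_size` and `overlap` mirror the parameters named in the description.

```python
def chunk_page_ranges(total_pages, chunk_size, overlap):
    """Return (start, end) page ranges, 0-based and end-exclusive.

    Hypothetical helper: consecutive chunks share `overlap` pages,
    so each new chunk starts `chunk_size - overlap` pages after the
    previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    ranges = []
    start = 0
    while start < total_pages:
        end = min(start + chunk_size, total_pages)
        ranges.append((start, end))
        if end == total_pages:
            # The last page is covered; stop so no redundant tail chunk is emitted.
            break
        start += step
    return ranges


if __name__ == "__main__":
    # A 10-page document, 4 pages per chunk, 1 page shared between neighbours.
    for start, end in chunk_page_ranges(10, 4, 1):
        print(f"pages {start + 1}-{end}")
```

Each range could then be written to its own output file with a PDF library of choice (for example, reading pages with pypdf's `PdfReader` and adding them to a `PdfWriter`); which library the original script uses is not visible in the preview.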
alsiesta / countcharacter_in_pdf_range.py
Last active September 2, 2024 08:34
This script processes a range of PDF files and identifies those with a character count exceeding a specified threshold.
#!/usr/bin/env python3
# ------------------------------------------------------------------
# Script Name: countcharacter_in_pdf_range.py
# Description: This script processes a range of PDF files and
# identifies those with a character count exceeding
# a specified threshold.
# Website: https://gist.github.com/ostechnix
# Version: 1.0
# Usage: py countcharacter_in_pdf_range.py <range_param> <char_threshold>
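The preview stops at the usage line, so the filtering step is not shown. As a rough sketch under stated assumptions, the threshold check could look like the code below. The name `files_over_threshold` is hypothetical, the text is assumed to have already been extracted from each PDF by some library, and counting every character (including whitespace) is an assumption rather than the script's confirmed behaviour.

```python
def files_over_threshold(texts, char_threshold):
    """Return the names whose text is longer than char_threshold.

    Hypothetical helper: `texts` maps a file name to its already
    extracted text. "Exceeding" is read as strictly greater than
    the threshold, and all characters are counted.
    """
    return [
        name
        for name, text in texts.items()
        if len(text) > char_threshold
    ]


if __name__ == "__main__":
    # Stand-in strings instead of real PDF text, purely for illustration.
    sample = {
        "1.pdf": "short",
        "2.pdf": "a considerably longer body of text",
    }
    for name in files_over_threshold(sample, char_threshold=10):
        print(name)
```

Keeping the comparison separate from text extraction makes it easy to test the threshold logic without any PDF files on disk.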