#!/usr/bin/env python3
# ------------------------------------------------------------------
# Script Name: chunk_pdf_with_overlap.py
# Description: This script processes one PDF file and splits it into
# chunks of chunk_size=n pages with an overlap of
# overlap=n pages.
# Website: https://gist.github.com/alsiesta
# Version: 1.0
# Usage: py chunk_pdf_with_overlap.py <input_pdf> <output_prefix>
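The chunking described in the header above can be sketched without any PDF library by computing only the page ranges. This is a minimal sketch, not the script's actual code: the function name `chunk_page_ranges` is hypothetical, and it assumes pages are indexed from 0, that the overlap is smaller than the chunk size, and that the last chunk may be shorter than `chunk_size`.

```python
def chunk_page_ranges(num_pages, chunk_size, overlap):
    """Return (start, end) page-index pairs, end exclusive, for overlapping chunks.

    Hypothetical helper: the original script's internals are not shown,
    so this only illustrates the chunk/overlap arithmetic.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new chunk advances
    ranges = []
    start = 0
    while start < num_pages:
        end = min(start + chunk_size, num_pages)
        ranges.append((start, end))
        if end == num_pages:  # last page reached, stop to avoid a redundant tail chunk
            break
        start += step
    return ranges


if __name__ == "__main__":
    # A 10-page document, 4 pages per chunk, 1 page shared between neighbours.
    for start, end in chunk_page_ranges(10, 4, 1):
        print(f"pages {start + 1} to {end}")
```

Each range could then be passed to a PDF library of your choice to write one output file per chunk, for example named after `<output_prefix>` plus a running number; that naming scheme is an assumption as well.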
#!/usr/bin/env python3
# ------------------------------------------------------------------
# Script Name: pdfcwcount_range.py
# Description: This script processes a range of PDF files and
# identifies those with a character count exceeding
# a specified threshold.
# Website: https://gist.github.com/ostechnix
# Version: 1.0
# Usage: py pdf_processor.py <range_param> <char_threshold>
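The filtering step described in the header above might look roughly like the sketch below. Everything here is an assumption, since the script body is not shown: the helper names `parse_range` and `exceeding_threshold` are hypothetical, `<range_param>` is assumed to have the inclusive form `start-end` (or a single number), and the text of each PDF is assumed to have already been extracted into a string by some PDF library.

```python
def parse_range(range_param):
    """Parse 'start-end' (inclusive) or a single number into a list of ints.

    The 'start-end' syntax is an assumption about <range_param>.
    """
    first, sep, last = range_param.partition("-")
    start = int(first)
    end = int(last) if sep else start
    if end < start:
        raise ValueError("range end must not be smaller than range start")
    return list(range(start, end + 1))


def exceeding_threshold(texts_by_name, char_threshold):
    """Return the sorted names whose extracted text has more than char_threshold characters."""
    return sorted(
        name for name, text in texts_by_name.items() if len(text) > char_threshold
    )


if __name__ == "__main__":
    # Stand-in for text extracted from PDFs; the file names are illustrative only.
    texts = {"1.pdf": "x" * 1200, "2.pdf": "x" * 300, "3.pdf": "x" * 5000}
    print(parse_range("1-3"))
    print(exceeding_threshold(texts, 1000))
```

Keeping the range parsing and the threshold check separate from text extraction makes both easy to test without any PDF files on disk.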