@kordless
Created December 1, 2023 22:08
PDF File Splitter

To run the Python script for splitting a PDF into segments of just under 25MB each, you'll need to follow these steps:

Prerequisites

Python Installation: Ensure that Python is installed on your system. If not, you can download and install it from python.org.

PyPDF2 Library: The script uses the PyPDF2 library, which you can install with pip, Python's package installer. pip comes bundled with Python 3.4 and later, so a recent Python installation already includes it.

Installation Steps

Open Terminal or Command Prompt: On Windows, you can open Command Prompt by searching for cmd in the Start menu. On macOS or Linux, open Terminal.

Install PyPDF2: Run the following command to install the PyPDF2 library:

pip install PyPDF2
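To confirm the install worked before running the splitter, a small check along these lines may help. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later); the helper name installed_version is hypothetical and not part of the script.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing.

    Hypothetical helper for illustration; it is not part of the splitter script.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("PyPDF2")
    print(f"PyPDF2 {version}" if version else "PyPDF2 is not installed")
```

If the check reports that PyPDF2 is missing, pip may have installed it for a different Python interpreter than the one running the check.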

Running the Script

Save the Script: Save the provided Python script to a file on your computer. Let's name it split_pdf_25MB.py.

Locate the PDF: Make sure you know the path of the PDF file you want to split and that it is accessible.
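One way to verify the path up front is sketched below. The helper name resolve_pdf is hypothetical; the default folder mirrors the one hard-coded in the script (~/Desktop/mitta/), and the check itself is an addition that the script does not perform.

```python
from pathlib import Path


def resolve_pdf(file_name, directory="~/Desktop/mitta/"):
    """Return the full path to file_name inside directory, or raise if it is missing.

    Hypothetical helper: the script itself simply joins the two parts and opens the file.
    """
    path = Path(directory).expanduser() / file_name
    if not path.is_file():
        raise FileNotFoundError(f"No such PDF: {path}")
    return path
```

Calling resolve_pdf("filename.pdf") before the split gives a clear error message that names the full path that was tried.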

Open Terminal/Command Prompt in the Script's Directory:

On Windows: Navigate to the folder where split_pdf_25MB.py is saved using the cd command. For example, if it's saved in C:\Users\YourUsername\Documents, use cd C:\Users\YourUsername\Documents.

On macOS/Linux: Use the cd command to navigate to the directory where the script is saved.

Run the Script: Execute the script by typing:

python split_pdf_25MB.py

Follow the on-screen prompts to enter the filename and the output prefix.

Notes

The script will ask for the file name of the PDF and a prefix for the output files. It looks for the file in a default directory that is hard-coded in the script (~/Desktop/mitta/), so make sure the file name is correct and the file exists there, or edit default_directory in the script to point at another folder.

The output PDFs will be saved in the same directory as the input file, named with the provided prefix and the page range each one covers.

If you encounter any issues during installation or while running the script, check the error messages in the command prompt or terminal; they usually indicate what went wrong.
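The naming scheme can be illustrated in isolation. The sketch below mirrors the f-string the script uses to build each output path; the function name output_name is hypothetical and only wraps that expression for demonstration.

```python
import os


def output_name(file_path, output_prefix, start_page, end_page):
    """Build the output path for one segment, next to the input file.

    start_page is 0-based and end_page is exclusive, as in the script's loop,
    so the name shows 1-based, inclusive page numbers.
    """
    return os.path.join(
        os.path.dirname(file_path),
        f"{output_prefix}_pages_{start_page + 1}_to_{end_page}.pdf",
    )
```

For example, with the prefix part, a segment holding the first twelve pages of book.pdf would be saved as part_pages_1_to_12.pdf in the same folder as book.pdf.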

# written by ChatGPT 4 and Kord Campbell.
# do what you will with it
import os
from io import BytesIO

import PyPDF2


def get_pdf_size(writer):
    """Get the size in bytes of the PDF currently in the writer."""
    temp_buffer = BytesIO()
    writer.write(temp_buffer)
    size = len(temp_buffer.getvalue())
    return size


def build_writer(reader, start_page, end_page):
    """Return a new writer holding pages start_page..end_page - 1 of the reader."""
    writer = PyPDF2.PdfWriter()
    for page_number in range(start_page, end_page):
        writer.add_page(reader.pages[page_number])
    return writer


def split_pdf(file_path, output_prefix, max_size=25*1024*1024):  # max_size in bytes
    with open(file_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        total_pages = len(reader.pages)
        start_page = 0
        while start_page < total_pages:
            writer = PyPDF2.PdfWriter()
            end_page = start_page
            while end_page < total_pages:
                writer.add_page(reader.pages[end_page])
                if get_pdf_size(writer) > max_size:
                    if end_page == start_page:
                        # A single page is larger than the max size, so we have to include it.
                        end_page += 1
                    else:
                        # The page just added pushed the segment over the limit.
                        # Rebuild the writer without it so the output stays under
                        # max_size; that page opens the next segment instead.
                        writer = build_writer(reader, start_page, end_page)
                    break
                end_page += 1
            output_filename = os.path.join(os.path.dirname(file_path), f"{output_prefix}_pages_{start_page + 1}_to_{end_page}.pdf")
            with open(output_filename, 'wb') as output_file:
                writer.write(output_file)
            start_page = end_page


if __name__ == "__main__":
    default_directory = os.path.expanduser("~/Desktop/mitta/")
    file_name = input("Enter the file name (e.g., filename.pdf): ")
    file_path = os.path.join(default_directory, file_name)
    output_prefix = input("Enter the prefix for the output files: ")
    split_pdf(file_path, output_prefix)
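The splitting loop is a greedy grouping: keep adding consecutive pages until the next one would push the segment over the limit, then start a new segment. The sketch below shows that idea on plain numbers, without PyPDF2. It assumes per-page sizes simply add up, which is only an approximation (a real PDF carries shared resources and file overhead, which is why the script measures the serialized writer instead). The function name group_pages is hypothetical.

```python
def group_pages(page_sizes, max_size=25 * 1024 * 1024):
    """Greedily group consecutive pages so each group's summed size stays within max_size.

    Returns a list of (start, end) pairs, 0-based with end exclusive.
    A single page larger than max_size gets a group of its own.
    Assumption: sizes are additive, which real PDFs only approximate.
    """
    groups = []
    start = 0
    total_pages = len(page_sizes)
    while start < total_pages:
        end = start
        running_total = 0
        while end < total_pages:
            if end > start and running_total + page_sizes[end] > max_size:
                break  # this page would overflow; it opens the next group
            running_total += page_sizes[end]
            end += 1
            if running_total > max_size:
                break  # only reachable when a single page exceeds max_size
        groups.append((start, end))
        start = end
    return groups


if __name__ == "__main__":
    # Sizes in arbitrary units with a limit of 25 units.
    print(group_pages([10, 10, 10, 40, 5], max_size=25))
    # → [(0, 2), (2, 3), (3, 4), (4, 5)]
```

Each pair maps onto an output file name by adding one to the start index, so (0, 2) corresponds to pages 1 to 2.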