@tommorris
Last active September 20, 2024 22:13
a simple script to extract nested PDFs (aka "PDF Portfolios") using pypdf
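#! /usr/bin/env nix-shell
#! nix-shell -i python3 -p python311 python311Packages.pypdf
import sys

from pypdf import PdfReader


def extract_subpdfs():
    # Treat every command-line argument as a PDF portfolio to unpack.
    for filename in sys.argv[1:]:
        print("Extracting " + filename)
        # Keep the source file open only while its attachments are read.
        with open(filename, "rb") as fr:
            root_doc = PdfReader(fr)
            # attachments maps each embedded file name to a list of payloads.
            for name, doc in root_doc.attachments.items():
                print(" Found " + name)
                # Write the first payload to the current directory under its embedded name.
                doc_bytes = doc[0]
                with open(name, "wb") as fw:
                    fw.write(doc_bytes)


if __name__ == "__main__":
    extract_subpdfs()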
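The script writes each attachment under the name stored inside the PDF, so a name containing directory parts (for example `../x.pdf`) or two attachments sharing a name could write outside the current directory or overwrite each other. A minimal sketch of one way to guard against that follows. The helper `safe_attachment_name` and its `taken` set are hypothetical names introduced here for illustration and are not part of the original script or of pypdf.

```python
import os


def safe_attachment_name(name, taken):
    """Return a file name based on `name` with no directory parts and not already in `taken`.

    Hypothetical helper: `taken` is a set of names already used in this run.
    """
    # Normalise Windows separators, then keep only the final path component.
    base = os.path.basename(name.replace("\\", "/"))
    # Fall back to a placeholder when nothing usable is left.
    if base in ("", ".", ".."):
        base = "attachment"
    stem, ext = os.path.splitext(base)
    candidate = base
    counter = 1
    # Append a numeric suffix until the name is unused.
    while candidate in taken:
        candidate = f"{stem}_{counter}{ext}"
        counter += 1
    taken.add(candidate)
    return candidate
```

In the loop over `root_doc.attachments.items()`, this could be used as `open(safe_attachment_name(name, taken), "wb")`, with `taken = set()` created once per input file. Files that already exist on disk from earlier runs are not considered in this sketch.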