@tommorris
Last active March 9, 2024 07:37
a simple script to extract nested PDFs (aka "PDF Portfolios") using pypdf
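#! /usr/bin/env nix-shell
#! nix-shell -i python3 -p python311 python311Packages.pypdf
import os
import sys

from pypdf import PdfReader


def extract_subpdfs():
    """Write the files embedded in each PDF named on the command line to the current directory."""
    for filename in sys.argv[1:]:
        print("Extracting " + filename)
        # PdfReader accepts a path and opens/closes the file itself.
        root_doc = PdfReader(filename)
        # pypdf maps each embedded file name to a list of byte strings
        # (a name can occur more than once); only the first entry is written.
        for name, contents in root_doc.attachments.items():
            print("  Found " + name)
            # basename() keeps a crafted name such as "../x" from escaping the cwd.
            with open(os.path.basename(name), "wb") as fw:
                fw.write(contents[0])


if __name__ == "__main__":
    extract_subpdfs()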
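The script writes only the first byte string for each attachment name and saves every embedded file, whether or not it is actually a PDF. Below is a minimal sketch of a hypothetical helper, `plan_outputs` (not part of pypdf or of the script), that assumes the `name -> [bytes, ...]` shape returned by pypdf's `attachments`. It keeps every entry, gives repeated names a numeric suffix, strips directory components, and flags PDFs by the `%PDF-` header at the start of the data (a simplification: only the first bytes are checked).

```python
import os
from typing import Dict, List, Tuple

PDF_MAGIC = b"%PDF-"  # PDF files begin with this header


def plan_outputs(attachments: Dict[str, List[bytes]]) -> List[Tuple[str, bytes, bool]]:
    """Hypothetical helper: flatten {name: [bytes, ...]} into (safe_name, data, is_pdf).

    Every entry is kept, not only the first per name. Repeated names get a
    numeric suffix before the extension: "a.pdf", "a-1.pdf", "a-2.pdf".
    """
    used = set()
    plan = []
    for name, blobs in attachments.items():
        # Drop directory parts so output stays in the current directory.
        base = os.path.basename(name)
        if base in ("", ".", ".."):
            base = "attachment"
        stem, ext = os.path.splitext(base)
        for data in blobs:
            candidate, n = base, 0
            while candidate in used:
                n += 1
                candidate = f"{stem}-{n}{ext}"
            used.add(candidate)
            # Only checks whether the data starts with the PDF header.
            plan.append((candidate, data, data.startswith(PDF_MAGIC)))
    return plan


if __name__ == "__main__":
    demo = {"report.pdf": [b"%PDF-1.7 a", b"%PDF-1.4 b"], "../notes.txt": [b"hello"]}
    for out_name, data, is_pdf in plan_outputs(demo):
        print(out_name, is_pdf)
```

With the demo mapping, the planned names are `report.pdf`, `report-1.pdf` and `notes.txt`, and only the first two are flagged as PDFs. In the script above, the inner loop could iterate over `plan_outputs(root_doc.attachments)` instead and write each `data` to `out_name`, optionally skipping entries where `is_pdf` is false.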