@dreua
Forked from Geekfish/merge_pdfs.py
Last active April 28, 2023 01:30
pyPDF2 merge 2 pdf pages into one
#!/usr/bin/env python3
from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2 import PageObject

# These files are just for testing; there is no point in merging these
reader = PdfFileReader(open("Nextcloud Manual.pdf", 'rb'))
# This defines the output page format (relevant if the two files differ)
sup_reader = PdfFileReader(open("Cplusplus.pdf", 'rb'))
writer = PdfFileWriter()

# Only merge as many pages as the shorter document has
for pageNo in range(min(reader.getNumPages(), sup_reader.getNumPages())):
    print("Merging page:", pageNo)
    invoice_page = reader.getPage(pageNo)
    sup_page = sup_reader.getPage(pageNo)
    # Start from a blank page with the size of sup_page
    translated_page = PageObject.createBlankPage(None, sup_page.mediaBox.getWidth(), sup_page.mediaBox.getHeight())
    # Draw sup_page unscaled (factor 1) and unshifted (offset 0, 0) ...
    translated_page.mergeScaledTranslatedPage(sup_page, 1, 0, 0)
    # ... then draw invoice_page on top of it
    translated_page.mergePage(invoice_page)
    writer.addPage(translated_page)

with open('out.pdf', 'wb') as f:
    writer.write(f)
@digitalix-ai

This helped me a lot, thanks!

@dreua
Author

dreua commented Jul 15, 2020

@digitalix-ai That's great, you're welcome!

Just one word of advice if you run into issues or plan to use this on a regular basis: since I wrote this, I have realised that PyPDF has quite a number of bugs, and last time I checked it was very much unmaintained. (I think there are newer PDF versions or features that were simply never implemented there, so it depends on which PDFs you are working with.)

What I can really recommend is qpdf (C++ and CLI) and pikepdf for Python bindings. Both libraries are extremely well maintained. Unfortunately, this exact use case is not supported yet in pikepdf; see pikepdf/pikepdf#42.

@digitalix-ai

Thank you. I just used your example to complete an exercise from the online Python course I'm taking. I didn't want to check the answer before solving it myself (with some googling). I modified it a little to fit my needs. Still, my mentor's solution was somewhat better: I don't actually need to create a new PageObject; I can just merge what is invoice_page in your example with the other page(s). If I need to work with PDF files in the future, I'll probably just google 'pdf python' and choose the most popular library. But your work helped me for now, so thank you for that and for your advice too.
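For readers curious what merging without a new PageObject might look like, here is a minimal sketch. It is not the mentor's actual solution, which was not posted. The function names merge_in_place and merge_files are hypothetical; the PyPDF2 calls are the same legacy ones the gist uses. One difference from the gist: the output page keeps the size of the base page, since no blank page with a chosen size is created.

```python
def merge_in_place(base_pages, overlay_pages):
    """Draw each overlay page on top of the matching base page.

    Accepts any page objects that expose the legacy PyPDF2 ``mergePage``
    method. zip() stops at the shorter sequence, like min() in the gist.
    """
    merged = []
    for base_page, overlay_page in zip(base_pages, overlay_pages):
        # mergePage modifies base_page itself; no blank page is needed
        base_page.mergePage(overlay_page)
        merged.append(base_page)
    return merged


def merge_files(base_path, overlay_path, out_path):
    """Hypothetical wrapper: merge two PDF files page by page."""
    # Imported here so merge_in_place can be tried without PyPDF2 installed
    from PyPDF2 import PdfFileReader, PdfFileWriter

    with open(base_path, "rb") as base_file, open(overlay_path, "rb") as overlay_file:
        base_reader = PdfFileReader(base_file)
        overlay_reader = PdfFileReader(overlay_file)
        base_pages = [base_reader.getPage(i) for i in range(base_reader.getNumPages())]
        overlay_pages = [overlay_reader.getPage(i) for i in range(overlay_reader.getNumPages())]

        writer = PdfFileWriter()
        for page in merge_in_place(base_pages, overlay_pages):
            writer.addPage(page)
        # Write while the input files are still open; pages are read lazily
        with open(out_path, "wb") as out_file:
            writer.write(out_file)
```

To reproduce the stacking order of the gist, something like merge_files("Cplusplus.pdf", "Nextcloud Manual.pdf", "out.pdf") should be close, because the overlay is drawn on top of the base page.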

@fomightez

fomightez commented Sep 16, 2022

As noted by @dreua below, this has been addressed in the code posted here. I'm going to leave an edited version of this note though because others may still find code with from PyPDF2.pdf import PageObject in places and wonder about the difference.

Note that, until last week, the second import statement of this code was outdated with regard to current use. The documentation for PyPDF states: "PyPDF2.pdf no longer exists. You can import from PyPDF2 directly". So the second import line became from PyPDF2 import PageObject. With the old import line, you get the error ModuleNotFoundError: No module named 'PyPDF2.pdf'.
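If a script has to run against both older and newer PyPDF2 installs, one possible workaround is to look for PageObject in both locations. This is only a sketch: find_attribute is a hypothetical helper, not part of PyPDF2, and it returns None when nothing is found instead of raising.

```python
import importlib


def find_attribute(module_names, attribute_name):
    """Return the first matching attribute among the candidate modules, else None."""
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module missing in this install, try the next candidate
        if hasattr(module, attribute_name):
            return getattr(module, attribute_name)
    return None


# Try the current location first, then the old PyPDF2.pdf module
PageObject = find_attribute(("PyPDF2", "PyPDF2.pdf"), "PageObject")
if PageObject is None:
    print("PageObject not found; is PyPDF2 installed?")
```

For a script that only targets one PyPDF2 version, the plain import line from the gist is simpler and is probably the better choice.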

@dreua
Author

dreua commented Sep 16, 2022

@fomightez Thank you, I just edited this Gist accordingly.
