```python
from PyPDF2 import PdfFileReader, PdfFileWriter
from PyPDF2 import PageObject

# These files are just for testing, no point in merging these
reader = PdfFileReader(open("Nextcloud Manual.pdf", 'rb'))
# this defines the output page format (relevant if not the same)
sup_reader = PdfFileReader(open("Cplusplus.pdf", 'rb'))
writer = PdfFileWriter()

for pageNo in range(min(reader.getNumPages(), sup_reader.getNumPages())):
    print("Merging page:", pageNo)
    invoice_page = reader.getPage(pageNo)
    sup_page = sup_reader.getPage(pageNo)
    # blank page with the size of sup_page
    translated_page = PageObject.createBlankPage(None, sup_page.mediaBox.getWidth(), sup_page.mediaBox.getHeight())
    # scale 1, no translation: draw sup_page first ...
    translated_page.mergeScaledTranslatedPage(sup_page, 1, 0, 0)
    # ... then invoice_page on top (assumed: same 1:1 placement at the origin)
    translated_page.mergeScaledTranslatedPage(invoice_page, 1, 0, 0)
    writer.addPage(translated_page)

with open('out.pdf', 'wb') as f:
    writer.write(f)
```
Jul 15, 2020
@digitalix-ai That's great, you're welcome!
Just one word of advice if you run into issues or plan to use this on a regular basis: since I wrote this, I realised that PyPDF has quite a number of bugs, and last time I checked it was very much unmaintained. (I think there are newer PDF versions or features that were simply never implemented there, so it depends on which PDFs you are working with.)
What I can really recommend is qpdf (C++ library and CLI) and pikepdf, its Python bindings. Both are extremely well maintained. Unfortunately, this exact use case is not supported yet in pikepdf; see pikepdf/pikepdf#42.
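qpdf can do a page overlay straight from its command line with `--overlay`. A minimal sketch that drives the qpdf CLI from Python, assuming a reasonably recent `qpdf` executable on `PATH`; `qpdf_overlay_cmd` and `qpdf_overlay` are hypothetical helper names. The first file keeps its page sizes, and qpdf may shrink and center an overlay page to fit, so the placement is not necessarily identical to the gist's 1:1 merge at the origin.

```python
import shutil
import subprocess


def qpdf_overlay_cmd(base_pdf, overlay_pdf, out_pdf, qpdf="qpdf"):
    """Build the argv for: qpdf BASE --overlay OVERLAY -- OUT."""
    # "--" closes the --overlay option group before the output file name.
    return [qpdf, base_pdf, "--overlay", overlay_pdf, "--", out_pdf]


def qpdf_overlay(base_pdf, overlay_pdf, out_pdf):
    """Run qpdf; raise if it is missing or exits with a non-zero status."""
    exe = shutil.which("qpdf")
    if exe is None:
        raise FileNotFoundError("qpdf executable not found on PATH")
    # check=True also raises on qpdf's exit code 3 (success with warnings).
    subprocess.run(
        qpdf_overlay_cmd(base_pdf, overlay_pdf, out_pdf, qpdf=exe), check=True
    )
```

To mirror the gist, `qpdf_overlay("Cplusplus.pdf", "Nextcloud Manual.pdf", "out.pdf")` would draw the Nextcloud pages over the C++ pages, with the C++ file setting the page size. Building the argument list in a separate function keeps it easy to inspect or log without running qpdf.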
Thank you. I just used your example to complete an exercise from the online Python course I'm taking. I didn't want to check the answer before solving it myself (with some googling), and I modified your code a little for my needs. My mentor's solution was somewhat better, though: I don't actually need to create a new PageObject; I can merge the page that is `invoice_page` in your example directly with the other page(s). If I need to work with PDF files in the future, I'll probably just google 'pdf python' and choose the most popular library. But your work helped me for now, so thank you for that, and for your advice too.
Sep 16, 2022
As noted by @dreua below, this has been addressed in the code posted here. I'm going to leave an edited version of this note, though, because others may still find code with `from PyPDF2.pdf import PageObject` in places and wonder about the difference.
Note that the second import statement of this code was outdated until last week. The PyPDF2 documentation states: "PyPDF2.pdf no longer exists. You can import from PyPDF2 directly". So the second import line became `from PyPDF2 import PageObject`; otherwise you get the error `ModuleNotFoundError: No module named 'PyPDF2.pdf'`.
Sep 16, 2022
@fomightez Thank you, I just edited this Gist accordingly.
This helped me a lot, thanks!