@MuhammadSawalhy
Last active July 25, 2022 19:03
Replace images in a pdf using python
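import glob
import math

from PIL import Image

# Print the path of every extracted image whose bottom-center pixel has the
# dark theme background color.
for img in glob.glob("images/*.png"):
# for img in ["imageroot-014.png"]:
    with Image.open(img) as im:
        x = math.floor(im.size[0] / 2)
        # y = -1 addresses the last (bottom) row of the image
        px = im.getpixel((x, -1))
        if px == (12, 12, 12):
            print(img)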

Plan A

  1. list images with command: pdfimages -j -png file.pdf img
  2. run list-code-images.py to find the dark theme code images, saving its output to file.code-images.txt
  3. invert these images to turn them into light theme code:
    for f in `cat file.code-images.txt`; do
      convert "$f" -channel RGB -negate "inversed/$f"
    done
  4. find a way to replace images in a pdf with code (but I gave up here)
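The `-channel RGB -negate` step in item 3 amounts to replacing every red, green and blue value v with 255 - v while leaving any alpha channel untouched. A minimal sketch of that arithmetic in plain Python, assuming 8-bit channels; the function name `negate_rgb` is hypothetical and not part of the gist:

```python
def negate_rgb(pixel):
    """Invert the first three channels of an 8-bit pixel, keeping any alpha."""
    inverted = tuple(255 - value for value in pixel[:3])
    # Anything after the third channel (for example alpha) is passed through.
    return inverted + tuple(pixel[3:])


# The dark theme background (12, 12, 12) becomes a near-white background.
print(negate_rgb((12, 12, 12)))
```

Applied to every pixel, this turns the dark code screenshots into light ones, which is what the shell loop above delegates to ImageMagick.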

Plan B

I ended up using PyMuPDF to invert the dark theme code images and save them in the same position with replace-images.py.

You will need to install this package (fitz is the import name of PyMuPDF; the separate fitz package on PyPI is unrelated and not needed):

pip install PyMuPDF

Alhamdulillah, all images were replaced. This is only an illusion of replacement, though, because the new images are placed on top of the old ones, which remain in the file. I think a real replacement may be possible using Document.update_object or Document.update_stream provided by the PyMuPDF package.

from os import listdir

import fitz  # PyMuPDF

for file in listdir("./files"):
    # This creates the Document object doc
    doc: fitz.Document = fitz.open(f"./files/{file}")
    for page in doc:
        for img in page.get_images(full=True):
            xref = img[0]
            pix = fitz.Pixmap(doc, xref)
            # sample the background at the middle of the right edge
            bg_color = pix.pixel(pix.width - 1, pix.height // 2)
            if tuple(bg_color) == (12, 12, 12):
                pix.invert_irect()
                rect = page.get_image_bbox(img)
                # draw the inverted copy over the original image
                page.insert_image(rect, pixmap=pix, keep_proportion=False)
    # doc.save(filename=r"file.new.pdf", clean=True)
    # doc.save(filename=r"file.new.pdf", clean=True, garbage=4)
    # without deflate_images=1 the file size is 112MB, but now it is just 12MB
    doc.save(filename=f"./processed-files/{file}", clean=True, deflate=True, deflate_images=True, deflate_fonts=True)
    doc.close()
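The script above overlays the inverted image on the original. For a true replacement, one possible sketch is shown below. It assumes a PyMuPDF release that provides `Page.replace_image` (to my knowledge 1.21.0 or later, which is newer than this gist), and it assumes the images are RGB or RGBA. The names `is_dark_background` and `replace_dark_images` are hypothetical, and this has not been run against the author's files.

```python
def is_dark_background(pixel, target=(12, 12, 12)):
    """Return True when the first three channels equal the dark theme color."""
    return tuple(pixel[:3]) == target


def replace_dark_images(src_path, dst_path):
    """Invert dark-background images and swap them in at the same xref."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    doc = fitz.open(src_path)
    seen = set()  # an image xref can be referenced from several pages
    for page in doc:
        for img in page.get_images(full=True):
            xref = img[0]
            if xref in seen:
                continue
            seen.add(xref)
            pix = fitz.Pixmap(doc, xref)
            # Same sampling point as the script above: middle of the right edge.
            background = pix.pixel(pix.width - 1, pix.height // 2)
            if is_dark_background(background):
                pix.invert_irect()
                # Assumption: replace_image swaps the image stored at this xref.
                page.replace_image(xref, pixmap=pix)
    doc.save(dst_path, garbage=4, deflate=True)
    doc.close()
```

A call such as `replace_dark_images("./files/file.pdf", "./processed-files/file.pdf")` would mirror the paths used above. Because the old image data is replaced rather than covered, the output should not carry both copies of each image.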