@StevenMapes
Created February 18, 2021 14:20
Extract JPG from PDF
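# Extract embedded JPEG images from a PDF by scanning its raw bytes.
with open("test.pdf", "rb") as f:
    pdf = f.read()

startmark = b"\xff\xd8"  # JPEG start-of-image (SOI) marker
startfix = 0
endmark = b"\xff\xd9"    # JPEG end-of-image (EOI) marker
endfix = 2               # include the two EOI bytes in the slice

i = 0
njpg = 0
while True:
    # Locate the next "stream" keyword.
    istream = pdf.find(b"stream", i)
    if istream < 0:
        break
    # A JPEG stream should begin with SOI within a few bytes of the keyword.
    istart = pdf.find(startmark, istream, istream + 20)
    if istart < 0:
        i = istream + 20
        continue
    iend = pdf.find(b"endstream", istart)
    if iend < 0:
        raise Exception("Didn't find end of stream!")
    # The EOI marker is expected just before "endstream".
    iend = pdf.find(endmark, iend - 20)
    if iend < 0:
        raise Exception("Didn't find end of JPG!")

    istart += startfix
    iend += endfix
    print("JPG %d from %d to %d" % (njpg, istart, iend))
    jpg = pdf[istart:iend]
    with open("jpg%d.jpg" % njpg, "wb") as jpgfile:
        jpgfile.write(jpg)

    njpg += 1
    i = iend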
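
The scan above can also be tried on an in-memory buffer without a real PDF. The following is a minimal sketch, not part of the original gist: `find_jpeg_spans`, its `window` parameter, and the `sample` bytes are hypothetical names introduced for illustration. It differs from the script in two assumed ways: it limits the end-marker search to the bytes before `endstream`, and it skips a stream it cannot parse instead of raising.

```python
def find_jpeg_spans(data: bytes, window: int = 20):
    """Return (start, end) offsets of JPEG payloads found after 'stream' keywords.

    Hypothetical helper: mirrors the marker scan used by the script, but works
    on any bytes object and returns offsets instead of writing files.
    """
    soi = b"\xff\xd8"  # JPEG start-of-image marker
    eoi = b"\xff\xd9"  # JPEG end-of-image marker
    spans = []
    i = 0
    while True:
        istream = data.find(b"stream", i)
        if istream < 0:
            break
        # SOI must appear within `window` bytes of the "stream" keyword.
        istart = data.find(soi, istream, istream + window)
        if istart < 0:
            i = istream + window
            continue
        iendstream = data.find(b"endstream", istart)
        if iendstream < 0:
            break  # truncated input: stop scanning (assumption: no error raised)
        # Look for EOI only in the last `window` bytes before "endstream".
        iend = data.find(eoi, max(istart, iendstream - window), iendstream)
        if iend < 0:
            i = iendstream + len(b"endstream")
            continue
        spans.append((istart, iend + len(eoi)))
        i = iend + len(eoi)
    return spans


# Synthetic buffer shaped like a PDF stream object holding a tiny fake JPEG.
sample = b"1 0 obj\nstream\n" + b"\xff\xd8IMAGEDATA\xff\xd9" + b"\nendstream\nendobj\n"
spans = find_jpeg_spans(sample)
for start, end in spans:
    print("JPG from %d to %d (%d bytes)" % (start, end, end - start))
```

Each returned pair can be used to slice the buffer, for example `sample[start:end]`, and the slice written to a `.jpg` file as the script does. Note that this byte-level approach only finds images stored as raw JPEG data in the stream; anything wrapped in an additional filter would not match the markers.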