Decompress FlateDecode Objects in PDF
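#!/usr/bin/env python3
import re
import zlib

# Read the whole PDF as bytes.
with open("some_doc.pdf", "rb") as f:
    pdf = f.read()

# Capture the raw data between "stream" and "endstream" in FlateDecode objects.
stream = re.compile(rb'.*?FlateDecode.*?stream(.*?)endstream', re.S)

for s in stream.findall(pdf):
    s = s.strip(b'\r\n')
    try:
        # latin-1 maps every byte to a character, so decoding never fails.
        print(zlib.decompress(s).decode("latin-1"))
        print("")
    except zlib.error:
        # Skip streams that zlib cannot decompress.
        pass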

trthhrtz commented Aug 19, 2017

It actually worked. Cool, man. Thanks!

bmwiedemann commented Aug 22, 2018

Only the first line is wrong. It works with #!/usr/bin/python

Mitmischer commented Dec 1, 2018

You must add b before the strings to make it work with current Python. Other than that, it works like a charm!

Jonaphant commented Jan 11, 2019

Where exactly does the "b" need to be added?

jsawruk commented Jan 15, 2019

@Jonaphant: In the regular expression definition:

stream = re.compile(rb'.*?FlateDecode.*?stream(.*?)endstream', re.S)
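
The reason for the prefix: the file is opened in binary mode, so in Python 3 `pdf` is a `bytes` object, and `re` refuses to apply a `str` pattern to bytes. A minimal sketch of the difference follows. The `sample` data is a made-up stand-in for a real PDF, not taken from the gist:

```python
import re
import zlib

# Hypothetical stand-in for PDF content: a single FlateDecode stream object.
payload = zlib.compress(b"hello from a stream")
sample = b"<< /Filter /FlateDecode >>\nstream\n" + payload + b"\nendstream"

str_pattern = re.compile(r'.*?FlateDecode.*?stream(.*?)endstream', re.S)
bytes_pattern = re.compile(rb'.*?FlateDecode.*?stream(.*?)endstream', re.S)

try:
    str_pattern.findall(sample)
    str_pattern_error = None
except TypeError as exc:
    # Python 3 rejects a str pattern applied to a bytes-like object.
    str_pattern_error = exc

# The bytes pattern matches, and the captured data decompresses normally.
matches = bytes_pattern.findall(sample)
decoded = zlib.decompress(matches[0].strip(b'\r\n'))
```

The same rule applies to every literal that touches `pdf` or the captured groups, not only the pattern.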

karan-ta commented Feb 23, 2019

On one of my PDF files I get this error:
zlib.error: Error -5 while decompressing data: incomplete or truncated stream

Only 3 out of 4 pages are compressed ...
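
Error -5 means zlib reached the end of its input before the end of the compressed stream. The cause in this particular PDF is unknown; one possibility is that the captured bytes were cut short. As a hedged workaround, `zlib.decompressobj()` returns whatever it could decode instead of raising on truncated input. The helper name `inflate_lenient` and the sample content are assumptions for illustration:

```python
import zlib

def inflate_lenient(data):
    """Decompress as much of a possibly truncated zlib stream as possible.

    Returns (output, complete). complete is False when the input ended
    before zlib saw the end of the compressed stream.
    """
    d = zlib.decompressobj()
    try:
        out = d.decompress(data)
    except zlib.error:
        # Not zlib data, or corrupted beyond recovery.
        return b"", False
    return out, d.eof

# Made-up page content, compressed and then cut short by three bytes.
original = b"BT /F1 12 Tf (Hello) Tj ET " * 20
full = zlib.compress(original)
truncated = full[:-3]

text, complete = inflate_lenient(truncated)
```

Checking `complete` tells you which streams were recovered only partially, so they can be inspected by hand.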

johnebgood commented Mar 5, 2019

I also needed to add a 'b' to the newline removal: s = s.strip(b'\r\n')
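
One caveat, offered as an observation rather than a diagnosis of anyone's file: `strip(b'\r\n')` removes every leading and trailing CR or LF byte, including any that belong to the compressed data itself. The sketch below builds such a case and shows an alternative that trims only the end-of-line marker after `stream` and lets `zlib.decompressobj()` ignore whatever follows the compressed data. The helper name `trim_leading_eol` is hypothetical:

```python
import zlib

# A single tab compresses to a zlib stream whose checksum ends in 0x0A ("\n").
payload = zlib.compress(b"\t")
captured = b"\r\n" + payload + b"\r\n"   # what the regex group would hold

def trim_leading_eol(raw):
    """Drop only the single EOL that follows the 'stream' keyword."""
    if raw.startswith(b"\r\n"):
        return raw[2:]
    if raw.startswith(b"\n"):
        return raw[1:]
    return raw

# strip() also eats the final checksum byte, so decompression fails.
try:
    zlib.decompress(captured.strip(b"\r\n"))
    strip_failed = False
except zlib.error:
    strip_failed = True

# decompressobj() stops at the end of the zlib stream; the trailing EOL
# lands in unused_data instead of corrupting the input.
d = zlib.decompressobj()
result = d.decompress(trim_leading_eol(captured))
```

Reading the stream's `/Length` entry would be more precise still, but that requires parsing the object dictionary, which this regex-based script does not do.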
