@NanoDano
Created July 31, 2016 18:23
Extract PNGs from a file using Python
# extract_pngs.py
# Extract PNGs from a file and put them in a pngs/ directory
import sys
with open(sys.argv[1], "rb") as binary_file:
    binary_file.seek(0, 2)  # Seek the end
    num_bytes = binary_file.tell()  # Get the file size
    count = 0
    for i in range(num_bytes):
        binary_file.seek(i)
        eight_bytes = binary_file.read(8)
        if eight_bytes == b"\x89\x50\x4e\x47\x0d\x0a\x1a\x0a":  # PNG signature
            count += 1
            print("Found PNG Signature #" + str(count) + " at " + str(i))
            # Next four bytes after signature is the IHDR with the length
            png_size_bytes = binary_file.read(4)
            png_size = int.from_bytes(png_size_bytes, byteorder='little', signed=False)
            # Go back to beginning of image file and extract full thing
            binary_file.seek(i)
            # Read the size of image plus the signature
            png_data = binary_file.read(png_size + 8)
            with open("pngs/" + str(i) + ".png", "wb") as outfile:
                outfile.write(png_data)
@arpruss
arpruss commented Jan 11, 2024

Byte order in PNG files is big-endian. Also, I think the four bytes after the signature are the length of the IHDR chunk's data, not the size of the whole PNG.
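
To illustrate the point above, here is a minimal sketch of reading the first chunk header that follows the signature. The function name `read_first_chunk_header` and the sample bytes are made up for this example; the only assumption about the format is the standard PNG layout of a 4-byte big-endian length followed by a 4-byte chunk type.

```python
import struct

PNG_SIGNATURE = b"\x89\x50\x4e\x47\x0d\x0a\x1a\x0a"

def read_first_chunk_header(data):
    """Return (length, chunk_type) for the chunk right after the signature.

    Hypothetical helper for illustration only.
    """
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("data does not start with a PNG signature")
    # ">I" is a big-endian unsigned 32-bit length, "4s" is the 4-byte chunk type
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    return length, chunk_type

# Sample bytes: signature followed by the start of an IHDR chunk
sample = PNG_SIGNATURE + b"\x00\x00\x00\x0dIHDR"
print(read_first_chunk_header(sample))  # (13, b'IHDR')
```

The length covers only the IHDR data, so it says nothing about the size of the whole file.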

@arpruss
arpruss commented Jan 11, 2024

Here's a working version:

# extract_pngs.py
# Extract PNGs from a file and put them in a pngs/ directory
import sys
import os

# Create the output directory if it does not already exist
os.makedirs("pngs", exist_ok=True)

with open(sys.argv[1], "rb") as binary_file:
    binary_file.seek(0, 2)  # Seek the end
    num_bytes = binary_file.tell()  # Get the file size

    count = 0
    for i in range(num_bytes):
        binary_file.seek(i)
        eight_bytes = binary_file.read(8)
        if eight_bytes == b"\x89\x50\x4e\x47\x0d\x0a\x1a\x0a":  # PNG signature
            count += 1
            print("Found PNG Signature #%d at 0x%08x" % (count,i))
            
            with open("pngs/%08x.png" % i, "wb") as outfile:
                outfile.write(eight_bytes)

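                while True:
                    # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC
                    sizeData = binary_file.read(4)
                    chunk = binary_file.read(4)
                    if len(sizeData) < 4 or len(chunk) < 4:
                        break  # Hit end of file: truncated PNG or false signature match

                    # Data length plus 4 bytes for the trailing CRC
                    size = 4+int.from_bytes(sizeData, byteorder='big', signed=False)

                    outfile.write(sizeData)
                    outfile.write(chunk)

                    data = binary_file.read(size)
                    outfile.write(data)

                    if chunk == b'IEND':
                        break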

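The chunk walk above trusts every length field it reads. One possible way to reject false signature matches is to also check each chunk's CRC, which PNG computes over the chunk type and data using the same CRC-32 that `zlib.crc32` implements. The sketch below is not part of the script above: `png_end_offset` and `make_chunk` are hypothetical names, and it works on bytes already loaded into memory rather than on a file handle.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89\x50\x4e\x47\x0d\x0a\x1a\x0a"

def png_end_offset(data, start):
    """Return the offset just past IEND for a PNG at `start`, or None if invalid."""
    if data[start:start + 8] != PNG_SIGNATURE:
        return None
    pos = start + 8
    while True:
        header = data[pos:pos + 8]
        if len(header) < 8:
            return None  # Ran out of data before reaching IEND
        length, chunk_type = struct.unpack(">I4s", header)
        body_end = pos + 8 + length
        crc_bytes = data[body_end:body_end + 4]
        if len(crc_bytes) < 4:
            return None  # Truncated chunk
        # The CRC covers the chunk type and data, not the length field
        if zlib.crc32(data[pos + 4:body_end]) != struct.unpack(">I", crc_bytes)[0]:
            return None
        pos = body_end + 4
        if chunk_type == b"IEND":
            return pos

def make_chunk(chunk_type, payload):
    """Build a chunk with a valid CRC (helper for the demo below)."""
    crc = zlib.crc32(chunk_type + payload)
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)

# Demo: a structurally valid (but imageless) PNG embedded in other bytes
png = PNG_SIGNATURE + make_chunk(b"IHDR", b"\x00" * 13) + make_chunk(b"IEND", b"")
blob = b"junk" + png + b"trailing"
end = png_end_offset(blob, 4)
print(blob[4:end] == png)
```

Loading the whole file into memory keeps the sketch short; for very large inputs the same checks could be done with `seek` and `read` as in the script above.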
@NanoDano
Author

Thanks. I’m certain I would have tested this... Now I will have to go back and try it.

@arpruss
arpruss commented Jan 11, 2024

The two bugs in the original script more or less canceled out. Reading the length as little-endian took the IHDR length (13, stored as the bytes 00 00 00 0d) and turned it into a very large number (0x0d000000, about 218 million), so the script copied a long stretch of the file that would probably include all of the original PNG. As a result, you got a usable PNG, but with a lot of junk at the end.
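
The effect described above can be reproduced in a few lines. This is only an illustration: the in-memory "file" and its contents are placeholders, not real image data.

```python
import io

# The IHDR length field as stored in a PNG (IHDR data is always 13 bytes)
ihdr_length_field = b"\x00\x00\x00\x0d"

correct = int.from_bytes(ihdr_length_field, byteorder="big", signed=False)
misread = int.from_bytes(ihdr_length_field, byteorder="little", signed=False)
print(correct)  # 13
print(misread)  # 218103808

# Placeholder container: 4 bytes of header, a stand-in "PNG", then other data
fake_file = io.BytesIO(b"HEAD" + b"<png bytes>" + b"TRAILING JUNK")
fake_file.seek(4)
# read() stops at end of file, so the oversized request returns
# the PNG plus everything that follows it
extracted = fake_file.read(misread + 8)
print(extracted)  # b'<png bytes>TRAILING JUNK'
```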
