@GrayedFox
Last active April 22, 2023 03:02
Extract JPG image from binary data
#!/usr/bin/env python
# please ensure python means python3 on your system
# the file can be any binary file that contains a JPG image
# note that it's hungry and doesn't chunk the read so careful with large files
# usage: extract-jpg file_name
import sys


def extract_jpg_image(file_name):
    jpg_byte_start = b'\xff\xd8'  # JPG start of image (SOI) marker
    jpg_byte_end = b'\xff\xd9'    # JPG end of image (EOI) marker
    jpg_image = bytearray()

    with open(file_name, 'rb') as f:
        req_data = f.read()

    start = req_data.find(jpg_byte_start)
    if start == -1:
        print('Could not find JPG start of image marker!')
        return

    # find() returns -1 when the end marker is missing, so check before adding its length
    end = req_data.find(jpg_byte_end, start)
    if end == -1:
        print('Could not find JPG end of image marker!')
        return
    end += len(jpg_byte_end)

    jpg_image += req_data[start:end]
    print(f'Size: {end - start} bytes')

    with open(f'{file_name}-extracted-img.jpg', 'wb') as f:
        f.write(jpg_image)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit('usage: extract-jpg file_name')
    extract_jpg_image(sys.argv[1])
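
The header comment warns that the script reads the whole file into memory. One possible way around that, sketched here and not part of the original gist, is to memory-map the file and search the mapping instead. The function name `extract_jpg_mmap` and its `src_path`/`dst_path` parameters are made up for illustration, and the sketch assumes a non-empty input file (mapping an empty file raises `ValueError`).

```python
import mmap

SOI = b'\xff\xd8'  # JPG start of image marker
EOI = b'\xff\xd9'  # JPG end of image marker


def extract_jpg_mmap(src_path, dst_path):
    """Copy the first SOI..EOI span from src_path to dst_path.

    Returns the number of bytes copied, or None if either marker is missing.
    Hypothetical helper for illustration only.
    """
    with open(src_path, 'rb') as f:
        # Length 0 maps the whole file; the OS pages data in on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = mm.find(SOI)
            if start == -1:
                return None
            # Start looking for the end marker after the start marker.
            end = mm.find(EOI, start + len(SOI))
            if end == -1:
                return None
            end += len(EOI)  # include the two-byte end marker
            with open(dst_path, 'wb') as out:
                out.write(mm[start:end])
            return end - start
```

Only the extracted slice is copied into Python memory here, so the cost scales with the image size and not with the size of the container file.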
@countingpine
countingpine commented Mar 30, 2023

Hey, thanks for posting this.

I think Line 28 should be something like start:end+len(jpg_byte_end), because the footer is two bytes. Otherwise it chops off the last byte.
EDIT: Actually, probably better to do end = req_data.find(jpg_byte_end, start)+len(jpg_byte_end). And then +1 isn't needed on the following line.

There's also a typo on Line 32 ("exracted").
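
To make the off-by-one concrete, here is a small sketch on made-up bytes (the `data` value is invented for illustration, not taken from a real image). Because `find` returns the index of the first byte of the two-byte end marker, adding 1 keeps only half of it, while adding `len(jpg_byte_end)` keeps both bytes.

```python
jpg_byte_start = b'\xff\xd8'
jpg_byte_end = b'\xff\xd9'

# Invented sample: junk, a fake "image", then more junk.
data = b'junk' + jpg_byte_start + b'pixels' + jpg_byte_end + b'more junk'

start = data.find(jpg_byte_start)
end_index = data.find(jpg_byte_end, start)  # index of the marker's FIRST byte

too_short = data[start:end_index + 1]                 # drops the trailing 0xd9
correct = data[start:end_index + len(jpg_byte_end)]   # keeps the full marker

print(too_short[-2:])  # b's\xff'
print(correct[-2:])    # b'\xff\xd9'
```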

@GrayedFox
Author

Thanks @countingpine, happy it served you. I took your advice and added the correct length in the variable declaration. I haven't tested it, but it looks fine. Also fixed the typo ⚡

@countingpine
Hey. Thanks for responding.
Unfortunately it looks like the jpg_image += req_data[start:end+1] line got deleted?
There's a declaration for an empty array at the top of the function body, which isn't needed, but allows the code to still "compile".
I think that one could be deleted, and the other one fixed and brought back.

@GrayedFox
Author

Added back the line I deleted by mistake. I don't mind having the bytearray initialised as empty, as I find it makes the snippet easier to read, but feel free to write the JPG chunk of data directly instead if you prefer 👍
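
For anyone who prefers the direct-write variant mentioned above, a minimal sketch could look like this. The helper name `write_jpg_slice` and its parameters are hypothetical, and it assumes `start` and `end` have already been found and validated as in the gist.

```python
def write_jpg_slice(req_data, start, end, out_path):
    """Write req_data[start:end] straight to out_path, skipping the bytearray.

    Hypothetical helper; returns the number of bytes written.
    """
    with open(out_path, 'wb') as f:
        return f.write(req_data[start:end])
```

The behaviour is the same either way; the intermediate bytearray only adds one extra copy of the image bytes.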
