Last active
April 22, 2023 03:02
Extract JPG image from binary data
#!/usr/bin/env python
# please ensure python means python3 on your system
# the file can be any binary file that contains a JPG image
# note that it's hungry and doesn't chunk the read, so be careful with large files
# usage: extract-jpg file_name
import sys

file_name = sys.argv[1]


def extract_jpg_image():
    jpg_byte_start = b'\xff\xd8'
    jpg_byte_end = b'\xff\xd9'
    jpg_image = bytearray()

    with open(file_name, 'rb') as f:
        req_data = f.read()

    start = req_data.find(jpg_byte_start)
    if start == -1:
        print('Could not find JPG start of image marker!')
        return

    # find() returns -1 when the end marker is missing, so check before adding its length
    end_marker = req_data.find(jpg_byte_end, start)
    if end_marker == -1:
        print('Could not find JPG end of image marker!')
        return

    # include the two-byte end marker in the extracted slice
    end = end_marker + len(jpg_byte_end)
    jpg_image += req_data[start:end]
    print(f'Size: {end - start} bytes')

    with open(f'{file_name}-extracted-img.jpg', 'wb') as f:
        f.write(jpg_image)


if __name__ == "__main__":
    extract_jpg_image()
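The header comment warns that the script reads the whole input into memory. One possible way to soften that for large files, offered only as a sketch, is to memory-map the input with the standard-library mmap module, whose find method takes the same arguments as bytes.find. The function name extract_jpg_mmap and its two path parameters are hypothetical and not part of the gist.

```python
import mmap


def extract_jpg_mmap(src_path, dst_path):
    """Copy the first JPG-looking span from src_path to dst_path.

    Hypothetical helper, not part of the original gist. Returns the number
    of bytes written, or None if either marker is missing.
    """
    jpg_byte_start = b'\xff\xd8'
    jpg_byte_end = b'\xff\xd9'

    with open(src_path, 'rb') as f:
        # length 0 maps the whole file; mmap raises ValueError on an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
            start = data.find(jpg_byte_start)
            if start == -1:
                return None
            end_marker = data.find(jpg_byte_end, start)
            if end_marker == -1:
                return None
            end = end_marker + len(jpg_byte_end)
            with open(dst_path, 'wb') as out:
                # slicing an mmap returns bytes, so only the image span is copied
                out.write(data[start:end])
            return end - start
```

Like the script above, this stops at the first end marker it finds after the start marker, so the two should pick out the same span.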
Thanks @countingpine - happy it served you - took your advice and added the correct length in the variable declaration - haven't tested but it looks fine, also fixed the typo ⚡
Hey. Thanks for responding.
Unfortunately it looks like the jpg_image += req_data[start:end+1] line got deleted?
There's a declaration for an empty array at the top of the function body, which isn't needed, but allows the code to still "compile".
I think that one could be deleted, and the other one fixed and brought back.
Added back the line I deleted by mistake - I don't mind having the bytearray initialised as empty, I find it makes the snippet easier to read - but feel free to instead just write the JPG chunk of data directly if you prefer 👍
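For anyone who prefers the alternative mentioned here, a minimal sketch of slicing the data directly without the intermediate bytearray might look like the following. The helper name slice_jpg is made up for illustration and is not in the gist.

```python
def slice_jpg(req_data):
    """Return the first JPG-looking span in req_data, or None if not found.

    Hypothetical helper, not part of the original gist.
    """
    jpg_byte_start = b'\xff\xd8'
    jpg_byte_end = b'\xff\xd9'

    start = req_data.find(jpg_byte_start)
    if start == -1:
        return None
    end_marker = req_data.find(jpg_byte_end, start)
    if end_marker == -1:
        return None
    # the slice is already a bytes object, so it can be passed to f.write() as-is
    return req_data[start:end_marker + len(jpg_byte_end)]
```

The result can be handed straight to f.write() once it has been checked for None.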
Hey, thanks for posting this.
I think Line 28 should be something like start:end+len(jpg_byte_end), because the footer is two bytes. Otherwise it chops off the last byte.
EDIT: Actually, probably better to do end = req_data.find(jpg_byte_end, start)+len(jpg_byte_end). And then +1 isn't needed on the following line.
There's also a typo on Line 32 ("exracted").
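To make the off-by-one concrete, here is a small illustration with made-up bytes (the sample data and the variable names no_footer, half_footer and full are assumptions for the demo). find returns the index where the end marker starts, so slicing up to that index drops the marker, and +1 keeps only its first byte.

```python
data = b'junk\xff\xd8pixels\xff\xd9tail'  # made-up sample, not a real JPG
jpg_byte_start = b'\xff\xd8'
jpg_byte_end = b'\xff\xd9'

start = data.find(jpg_byte_start)
end = data.find(jpg_byte_end, start)  # index of the first byte of the end marker

no_footer = data[start:end]                 # stops just before the marker
half_footer = data[start:end + 1]           # keeps \xff but chops off \xd9
full = data[start:end + len(jpg_byte_end)]  # keeps both marker bytes

print(no_footer.endswith(jpg_byte_end))    # False
print(half_footer.endswith(jpg_byte_end))  # False
print(full.endswith(jpg_byte_end))         # True
```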