@leommoore
Last active December 26, 2024 06:16

File Magic Numbers

Magic numbers are the first few bytes of a file, which uniquely identify the type of file. This makes programming easier because complicated file structures need not be searched in order to identify the file type.

For example, a JPEG file starts with:

ffd8 ffe0 0010 4a46 4946 0001 0101 0047 ......JFIF.....G

ffd8 shows that it's a JPEG file, and ffe0 identifies a JFIF type structure. There is an ASCII encoding of "JFIF" which comes after a length code, but that is not necessary in order to identify the file. The first 4 bytes do that uniquely.
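The check described above can be sketched in a few lines of Python. This is only an illustration: the helper names `looks_like_jfif` and `read_header` are hypothetical, and the sample bytes are the ones from the hex dump in the example.

```python
# SOI marker (ff d8) followed by the APP0 marker (ff e0), as in the example.
JFIF_MAGIC = bytes.fromhex("ffd8ffe0")


def looks_like_jfif(header: bytes) -> bool:
    """Return True if the leading bytes start with ff d8 ff e0."""
    return header[:4] == JFIF_MAGIC


def read_header(path: str, size: int = 4) -> bytes:
    """Read only the first `size` bytes of a file; the rest is never touched."""
    with open(path, "rb") as f:
        return f.read(size)


# The 16 bytes from the hex dump in the example.
sample = bytes.fromhex("ffd8ffe000104a464946000101010047")
print(looks_like_jfif(sample))  # True
print(sample[6:10])             # b'JFIF' (comes after the 2-byte length code)
```

For a file on disk, `looks_like_jfif(read_header("photo.jpg"))` would perform the same check, where `photo.jpg` is a placeholder path.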

This gives an ongoing list of file-type magic numbers.

Image Files

| File type | Typical extension | Hex digits (xx = variable) | ASCII digits (. = not an ASCII char) |
|---|---|---|---|
| Bitmap format | .bmp | 42 4d | BM |
| FITS format | .fits | 53 49 4d 50 4c 45 | SIMPLE |
| GIF format | .gif | 47 49 46 38 | GIF8 |
| Graphics Kernel System | .gks | 47 4b 53 4d | GKSM |
| IRIS rgb format | .rgb | 01 da | .. |
| ITC (CMU WM) format | .itc | f1 00 40 bb | .... |
| JPEG File Interchange Format | .jpg | ff d8 ff e0 | .... |
| NIFF (Navy TIFF) | .nif | 49 49 4e 31 | IIN1 |
| PM format | .pm | 56 49 45 57 | VIEW |
| PNG format | .png | 89 50 4e 47 | .PNG |
| Postscript format | .[e]ps | 25 21 | %! |
| Sun Rasterfile | .ras | 59 a6 6a 95 | Y.j. |
| Targa format | .tga | xx xx xx | ... |
| TIFF format (Motorola - big endian) | .tif | 4d 4d 00 2a | MM.* |
| TIFF format (Intel - little endian) | .tif | 49 49 2a 00 | II*. |
| X11 Bitmap format | .xbm | xx xx | |
| XCF Gimp file structure | .xcf | 67 69 6d 70 20 78 63 66 20 76 | gimp xcf v |
| Xfig format | .fig | 23 46 49 47 | #FIG |
| XPM format | .xpm | 2f 2a 20 58 50 4d 20 2a 2f | /* XPM */ |
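A table like this maps naturally onto a prefix lookup. The following Python sketch uses a handful of rows from the image table; the names `IMAGE_SIGNATURES` and `identify_image` are made up for illustration, and rows with variable bytes (xx) are left out because a plain prefix comparison cannot express them.

```python
from typing import Optional

# (signature bytes, description) pairs taken from the image table.
IMAGE_SIGNATURES = [
    (bytes.fromhex("424d"), "Bitmap format"),
    (bytes.fromhex("47494638"), "GIF format"),
    (bytes.fromhex("ffd8ffe0"), "JPEG File Interchange Format"),
    (bytes.fromhex("89504e47"), "PNG format"),
    (bytes.fromhex("4d4d002a"), "TIFF format (big endian)"),
    (bytes.fromhex("49492a00"), "TIFF format (little endian)"),
]


def identify_image(header: bytes) -> Optional[str]:
    """Return the description of the first matching signature, or None."""
    # Try longer signatures first so a short one cannot shadow a longer one.
    for magic, name in sorted(IMAGE_SIGNATURES, key=lambda s: len(s[0]), reverse=True):
        if header.startswith(magic):
            return name
    return None


print(identify_image(b"GIF89a"))       # GIF format
print(identify_image(b"hello world"))  # None
```

Short signatures such as the two-byte BM are more likely to match unrelated data by chance, so a match is a hint about the file type rather than proof.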

Compressed files

| File type | Typical extension | Hex digits (xx = variable) | ASCII digits (. = not an ASCII char) |
|---|---|---|---|
| Bzip | .bz | 42 5a | BZ |
| Compress | .Z | 1f 9d | .. |
| gzip format | .gz | 1f 8b | .. |
| pkzip format | .zip | 50 4b 03 04 | PK.. |

Archive files

| File type | Typical extension | Hex digits (xx = variable) | ASCII digits (. = not an ASCII char) |
|---|---|---|---|
| TAR (pre-POSIX) | .tar | xx xx | (a filename) |
| TAR (POSIX) | .tar | 75 73 74 61 72 | ustar (offset by 257 bytes) |
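The POSIX tar entry is the one case in these tables where the signature is not at the start of the file. A minimal Python sketch of the offset check follows; `looks_like_posix_tar` is a hypothetical helper, and the demo archive is built in memory with the standard `tarfile` module so that nothing needs to exist on disk.

```python
import io
import tarfile

USTAR_OFFSET = 257      # position of the magic inside the first header block
USTAR_MAGIC = b"ustar"  # 75 73 74 61 72


def looks_like_posix_tar(data: bytes) -> bool:
    """Check for the ustar magic at byte offset 257."""
    return data[USTAR_OFFSET:USTAR_OFFSET + len(USTAR_MAGIC)] == USTAR_MAGIC


# Build a tiny archive in memory to try the check on.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    payload = b"hi"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

print(looks_like_posix_tar(buf.getvalue()))  # True
print(looks_like_posix_tar(b"PK\x03\x04"))   # False
```

At least 262 bytes have to be read before this check can succeed, unlike the other signatures in the tables.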

Executable files

| File type | Typical extension | Hex digits (xx = variable) | ASCII digits (. = not an ASCII char) |
|---|---|---|---|
| MS-DOS, OS/2 or MS Windows | | 4d 5a | MZ |
| Unix elf | | 7f 45 4c 46 | .ELF |

Miscellaneous files

| File type | Typical extension | Hex digits (xx = variable) | ASCII digits (. = not an ASCII char) |
|---|---|---|---|
| pgp public ring | | 99 00 | .. |
| pgp security ring | | 95 01 | .. |
| pgp security ring | | 95 00 | .. |
| pgp encrypted data | | a6 00 | ¦. |

Source

@spixi

spixi commented Aug 13, 2021

There are two different metadata formats for JPEG files: JFIF and EXIF. JFIF starts with ff d8 ff e0 and EXIF starts with ff d8 ff e1.

@BathriNathan

> There are two different metadata formats for JPEG files: JFIF and EXIF. JFIF starts with ff d8 ff e0 and EXIF starts with ff d8 ff e1.

Thanks for sharing the link with the explanation. I now understand that there are other metadata formats for JPEG; however, I still could not get the image file from the link. Do you have a solution for this? Can we add ff d8 ff e0, ff d8 ff e1, ff d8 ff e2, and ff d8 ff e8 for JPEG validation?

@spixi

spixi commented Aug 17, 2021

@BathriNathan Here are two example files.
JPEG image data, Exif standard: [TIFF image data, little-endian, direntries=13, manufacturer=Panasonic, model=DMC-TZ10, orientation=upper-left, xresolution=196, yresolution=204, resolutionunit=2, software=Ver.1.1 , datetime=2014:07:03 16:26:13], baseline, precision 8, 4000x3000, components 3
JPEG image data, JFIF standard 1.01, resolution (DPI), density 180x180, segment length 16, Exif Standard: [TIFF image data, big-endian, direntries=11, manufacturer=Panasonic, model=DMC-TZ10, orientation=upper-left, software=Ver.1.1 , datetime=2014:07:15 13:17:54], baseline, precision 8, 2597x1948, components 3

I would consider all headers from ff d8 ff e0 to ff d8 ff ef as valid JPEG files. I could find further examples on the Internet:
e2 Canon
e3 Samsung
e4 (maybe a corruption, so no 100 % evidence, see espressif/esp32-camera#4 )
e8 SPIFF (Still Picture Interchange File Format)
ee unknown, mentioned on Wikipedia

The JPEG specification can be found here. If you look at page 32, you can see that ff e0 to ff ef are used for application segments. If there is no application segment at all (which is totally fine, since application metadata is optional), you could expect ff db (define quantization table(s)) after ff d8 (start of image), which is also mentioned as valid in the Wikipedia article.
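The rule suggested here (ff d8, then either an application marker ff e0 to ff ef or ff db) could be written roughly like this in Python. The function name `looks_like_jpeg` is hypothetical, and the sketch deliberately accepts only the markers named in this comment, so it may reject valid files whose first segment is something else.

```python
def looks_like_jpeg(header: bytes) -> bool:
    """Accept ff d8 followed by ff e0..ff ef (APPn) or ff db (DQT)."""
    if len(header) < 4:
        return False
    if header[:3] != b"\xff\xd8\xff":
        return False
    marker = header[3]
    # Application segments, or a quantization table when no APPn is present.
    return 0xE0 <= marker <= 0xEF or marker == 0xDB


print(looks_like_jpeg(bytes.fromhex("ffd8ffe0")))  # True  (JFIF)
print(looks_like_jpeg(bytes.fromhex("ffd8ffe1")))  # True  (Exif)
print(looks_like_jpeg(bytes.fromhex("ffd8ffdb")))  # True  (no application segment)
print(looks_like_jpeg(bytes.fromhex("89504e47")))  # False (PNG)
```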

@BathriNathan

@spixi Thanks for sharing the images. To be on the safe side, I will also include the other numbers you mentioned in the validation. Do we have this problem only with JPEG, or also with PNG and GIF?

@spixi

spixi commented Aug 17, 2021

@BathriNathan Although PNG allows a wide range of header fields, the magic word .PNG always appears in the first four bytes. So everything is fine. GIF always starts with GIF87a or GIF89a, which both have GIF8 in common. Everything is fine. I don’t know much about TIFF, because it has many variants like Exif, sDCF, TIFF/EP, TIFF/IT and GeoTIFF, and it may also come with different byte orders in the header.
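As a small Python sketch of the two checks just described (the helper names are hypothetical): PNG is matched on its first four bytes, and GIF on the two version strings named above rather than only the shared GIF8 prefix.

```python
def looks_like_png(header: bytes) -> bool:
    """89 50 4e 47, i.e. a non-ASCII byte followed by 'PNG'."""
    return header[:4] == b"\x89PNG"


def looks_like_gif(header: bytes) -> bool:
    """Either of the two version strings; both begin with GIF8."""
    return header[:6] in (b"GIF87a", b"GIF89a")


print(looks_like_png(bytes.fromhex("89504e470d0a1a0a")))  # True
print(looks_like_gif(b"GIF89a\x01\x00"))                  # True
print(looks_like_gif(b"GIF8xx"))                          # False
```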

@BathriNathan

@spixi I have all the mandatory hex numbers in my validation for JPEG, PNG, and GIF as per our discussion. It is working fine as expected.

@racg0092

racg0092 commented Mar 26, 2022

What about .pdf files? What are the magic numbers in that case? Is it <Buffer 25 50 44 46>?

@leommoore
Author

leommoore commented Mar 26, 2022

@racg0092 If you check out the Wikipedia list of file signatures, it looks like the signature for a .pdf is 25 50 44 46 2D or DF BF 34 EB CE.

@racg0092

@leommoore @BathriNathan

Thanks for the feedback! Much appreciated.

@merajhasan88

merajhasan88 commented Jul 13, 2022

My JPEG has several ffd8. One at the start and several throughout the image. What are the middle ones supposed to be?

Edit: these are not markers. Rather, 'ff' may come from one marker and 'd8' from another, and they are adjacent by chance.
And no, these aren't thumbnails.

@mrpc4

mrpc4 commented Feb 23, 2023

Great information

@spixi

spixi commented Feb 23, 2023

> My JPEG has several ffd8. One at the start and several throughout the image. What are the middle ones supposed to be?
>
> Edit: these are not markers. Rather, 'ff' may come from one marker and 'd8' from another, and they are adjacent by chance. And no, these aren't thumbnails.

@merajhasan88 ffd8 is the start of the image. IIRC, a JPEG container may contain more than one image: not only thumbnails, but also preview images for the web. A web browser could show a low-quality image while the high-quality image is loading. TIFF has a similar feature.

@nishantbhadke

@qti3e There is no magic number for plain text files. When I debugged, the text file's magic number changed every time. Other magic numbers work, but for text files I get new magic numbers every time. Can anyone help me here?

@spixi

spixi commented Apr 17, 2023

> @qti3e There is no magic number for plain text files. When I debugged, the text file's magic number changed every time. Other magic numbers work, but for text files I get new magic numbers every time. Can anyone help me here?

Plain text files have no magic number. However, they may contain a byte order mark. You can consider a file a plain text file when it contains only characters valid for a certain code page like Windows-1252 or Unicode UTF-8 and is not source code (e.g. Java, C++), a script (e.g. bash, PowerShell), markup (e.g. XML, TeX), or a config file (e.g. INI, YAML).
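A rough Python sketch of that heuristic, under some assumptions: it looks for a byte order mark using the constants in the standard `codecs` module, falls back to UTF-8 when there is none, and treats control characters other than tab, newline, and carriage return as a sign of binary data. The function names are hypothetical, and the sketch does not try to tell plain text apart from source code, scripts, markup, or config files.

```python
import codecs
from typing import Optional

# UTF-32 must be tested before UTF-16: the UTF-32 LE mark starts with ff fe too.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def encoding_from_bom(data: bytes) -> Optional[str]:
    """Return a codec name if the data starts with a byte order mark."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding
    return None


def looks_like_text(data: bytes) -> bool:
    """Heuristic: decodes cleanly and has no unexpected control characters."""
    encoding = encoding_from_bom(data) or "utf-8"
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return not any(ord(ch) < 32 and ch not in "\t\n\r" for ch in text)


print(looks_like_text("héllo\n".encode("utf-8")))   # True
print(looks_like_text(codecs.BOM_UTF8 + b"hi"))     # True
print(looks_like_text(b"\x89PNG\r\n\x1a\n"))        # False
```

A legacy code page such as Windows-1252 would need its own decode attempt; that is left out here because almost any byte sequence decodes in a single-byte code page, which makes that test much weaker.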

@KikyoShaw

> > What you tell over JPEG files is not true. There also exist JPEG files with a ff d8 ff e1 magic number and an Exif data block, which come without the JFIF string.
>
> I have tested for the file formats of JPEG, JPG, and JFIF. All the three have the same result ff d8 ff e0. kindly share your image file to test it.

You should also test the Exif format. Exif is now more popular, and it starts with the bytes FF E1.

@MahalaksmiSR

Hi,
Please help me convert from .bmp/.jpg to the ff d8 ff e0 bitmap format.

Which tool do I need to use to convert from .jpg to the ff d8 ff e0 bitmap format?

Please provide a solution.

Thanks,

@spixi

spixi commented Oct 11, 2023

> Hi, Please help me to convert from .bmp/.jpg to ff d8 ff e0 bitmap format.
>
> Which tool need to use for conversion from .jpg to ff d8 ff e0 bitmap format.
>
> please provide the solution.
>
> Thanks,

Try ImageMagick

@RevoltEnergy

Hi, sorry for a possibly silly question: you are talking about the start bytes, but how do I detect the end of the file? Where can I find the end-byte signature for a specific file extension?

@spixi

spixi commented Oct 19, 2023

> Hi, sorry for a possibly silly question: you are talking about the start bytes, but how do I detect the end of the file? Where can I find the end-byte signature for a specific file extension?

The magic numbers refer to the starting bytes of a file. End-byte signatures are not very useful, because they would require you to read the entire file. Some file types end with checksums and/or padding (usually filled with 0x00).

@spixi

spixi commented Oct 23, 2023

> where can i have magic number for csv file or how can i make changes to magic.mgc to identify csv file

Short answer: You can't.
Explanation: CSV files are just plaintext files. They have no special header. You can detect an XML file, because it usually begins with <?xml (Unicode plaintext files may start with a byte order mark), but there is no way to detect a CSV file reliably. Only heuristics are possible (such as checking that only ASCII characters are used and that semicolons follow a regular pattern).
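The kind of heuristic mentioned here might look like the following Python sketch. It is only a guess at a file's nature, not a detection: the function name `might_be_csv` is hypothetical, the semicolon delimiter comes from the comment above, and quoted fields that contain the delimiter would defeat the simple count.

```python
def might_be_csv(data: bytes, delimiter: str = ";") -> bool:
    """Heuristic: ASCII only, and every line has the same number of delimiters."""
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError:
        return False
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        return False  # too little data to see a regular pattern
    counts = {line.count(delimiter) for line in lines}
    return len(counts) == 1 and counts.pop() > 0


print(might_be_csv(b"name;age;city\nAda;36;London\n"))  # True
print(might_be_csv(b"just some\nplain text\n"))         # False
print(might_be_csv(b"a;b\n1;2;3\n"))                    # False
```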

@RevoltEnergy

> The magic numbers refer to the starting bytes of a file. End-byte signatures are not very useful, because they would require you to read the entire file. Some file types end with checksums and/or padding (usually filled with 0x00).

I suppose the various tools for recovering files from a corrupt USB stick or hard drive work by reading byte by byte and detecting the start and end byte signatures of files. Correct me if I'm wrong. If I'm not, they somehow know where the file ends...

@spixi

spixi commented Oct 23, 2023

> > The magic numbers refer to the starting bytes of a file. End-byte signatures are not very useful, because they would require you to read the entire file. Some file types end with checksums and/or padding (usually filled with 0x00).
>
> I suppose the various tools for recovering files from a corrupt USB stick or hard drive work by reading byte by byte and detecting the start and end byte signatures of files. Correct me if I'm wrong. If I'm not, they somehow know where the file ends...

The size of a file is saved in the directory information of the file system. When this information is corrupt, recovery tools can try to find the boundary of files (e. g. by scanning for magic numbers), but this is not 100 % reliable, especially when a file is fragmented over multiple data block ranges.
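Scanning for magic numbers, as mentioned above, can be illustrated with a short Python sketch. This is not how any particular recovery tool works; the names `SIGNATURES` and `find_candidates` are made up, the signatures are taken from the tables in this gist, and the result is only a list of offsets where a file might begin.

```python
from typing import List, Tuple

# A few signatures from the tables above.
SIGNATURES = {
    "jpeg": bytes.fromhex("ffd8ffe0"),
    "png": bytes.fromhex("89504e47"),
    "pkzip": bytes.fromhex("504b0304"),
}


def find_candidates(blob: bytes) -> List[Tuple[int, str]]:
    """Return (offset, type) for every place a known signature occurs."""
    hits = []
    for name, magic in SIGNATURES.items():
        start = blob.find(magic)
        while start != -1:
            hits.append((start, name))
            start = blob.find(magic, start + 1)
    return sorted(hits)


# A fake "disk image": padding, a PNG signature, some data, a ZIP signature.
blob = (b"\x00" * 10 + bytes.fromhex("89504e47") + b"abc"
        + bytes.fromhex("504b0304") + b"\x00" * 5)
print(find_candidates(blob))  # [(10, 'png'), (17, 'pkzip')]
```

The offsets say nothing about where each file ends or whether its blocks are contiguous, which is the unreliability described above.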

@FLAK-ZOSO

What about 7z? It is 37 7A BC AF 27 1C.