File Magic Numbers

Magic numbers are the first few bytes of a file, which uniquely identify the type of file. This makes programming easier because complicated file structures need not be parsed in order to identify the file type.

For example, a JPEG file starts with

`ffd8 ffe0 0010 4a46 4946 0001 0101 0047` (`......JFIF.....G` in ASCII)

Here `ffd8` shows that it is a JPEG file, and `ffe0` identifies a JFIF type structure. The ASCII encoding of "JFIF" comes after a two-byte length code (`0010`), but it is not needed to identify the file: the first 4 bytes are enough to recognize a JFIF file. Not every JPEG is a JFIF file, though. As the comments below point out, Exif files start with `ff d8 ff e1` instead.
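As a minimal sketch of the idea, the snippet below prints the leading bytes of a buffer as hex and picks the "JFIF" string out of the example header quoted above. The helper names `leading_bytes_hex` and `read_magic` are made up for illustration, and `bytes.hex(" ")` assumes Python 3.8 or later.

```python
def leading_bytes_hex(data: bytes, count: int = 4) -> str:
    """Return the first `count` bytes as space-separated lowercase hex."""
    return data[:count].hex(" ")


def read_magic(path: str, count: int = 4) -> bytes:
    """Read only the first `count` bytes of a file (hypothetical helper)."""
    with open(path, "rb") as handle:
        return handle.read(count)


# First ten bytes of the JFIF example: SOI, APP0 marker, length, "JFIF".
sample = bytes.fromhex("ffd8ffe000104a464946")

print(leading_bytes_hex(sample))  # ff d8 ff e0
print(sample[6:10])               # b'JFIF'
```

For a real file, `leading_bytes_hex(read_magic("photo.jpg"))` would do the same, where `photo.jpg` is a placeholder path.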

Below is an ongoing list of file-type magic numbers.

Image Files

| File type | Typical extension | Hex digits (xx = variable) | ASCII digits (. = not an ASCII char) |
| --- | --- | --- | --- |
| Bitmap format | .bmp | `42 4d` | `BM` |
| FITS format | .fits | `53 49 4d 50 4c 45` | `SIMPLE` |
| GIF format | .gif | `47 49 46 38` | `GIF8` |
| Graphics Kernel System | .gks | `47 4b 53 4d` | `GKSM` |
| IRIS rgb format | .rgb | `01 da` | `..` |
| ITC (CMU WM) format | .itc | `f1 00 40 bb` | `....` |
| JPEG File Interchange Format | .jpg | `ff d8 ff e0` | `....` |
| NIFF (Navy TIFF) | .nif | `49 49 4e 31` | `IIN1` |
| PM format | .pm | `56 49 45 57` | `VIEW` |
| PNG format | .png | `89 50 4e 47` | `.PNG` |
| Postscript format | .[e]ps | `25 21` | `%!` |
| Sun Rasterfile | .ras | `59 a6 6a 95` | `Y.j.` |
| Targa format | .tga | `xx xx xx` | `...` |
| TIFF format (Motorola - big endian) | .tif | `4d 4d 00 2a` | `MM.*` |
| TIFF format (Intel - little endian) | .tif | `49 49 2a 00` | `II*.` |
| X11 Bitmap format | .xbm | `xx xx` | |
| XCF Gimp file structure | .xcf | `67 69 6d 70 20 78 63 66 20 76` | `gimp xcf v` |
| Xfig format | .fig | `23 46 49 47` | `#FIG` |
| XPM format | .xpm | `2f 2a 20 58 50 4d 20 2a 2f` | `/* XPM */` |

Compressed files

| File type | Typical extension | Hex digits (xx = variable) | ASCII digits (. = not an ASCII char) |
| --- | --- | --- | --- |
| Bzip | .bz | `42 5a` | `BZ` |
| Compress | .Z | `1f 9d` | `..` |
| gzip format | .gz | `1f 8b` | `..` |
| pkzip format | .zip | `50 4b 03 04` | `PK..` |

Archive files

| File type | Typical extension | Hex digits (xx = variable) | ASCII digits (. = not an ASCII char) |
| --- | --- | --- | --- |
| TAR (pre-POSIX) | .tar | `xx xx` | (a filename) |
| TAR (POSIX) | .tar | `75 73 74 61 72` | `ustar` (offset by 257 bytes) |
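Unlike the other entries, the POSIX tar signature does not sit at the start of the file, so a check has to look at offset 257. The sketch below builds a small archive in memory with Python's standard `tarfile` module and tests for `ustar` at that offset. The function name `looks_like_posix_tar` is hypothetical.

```python
import io
import tarfile

USTAR_OFFSET = 257       # offset of the magic field, as listed in the table
USTAR_MAGIC = b"ustar"   # 75 73 74 61 72


def looks_like_posix_tar(data: bytes) -> bool:
    """Check for the 'ustar' magic at byte offset 257."""
    return data[USTAR_OFFSET:USTAR_OFFSET + len(USTAR_MAGIC)] == USTAR_MAGIC


# Build a one-file archive in memory so the example needs no files on disk.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    payload = b"hi"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
archive = buffer.getvalue()

print(archive[:9])                    # the header begins with the file name
print(looks_like_posix_tar(archive))  # True
```

The first bytes of the archive are the member's file name, which is why the pre-POSIX row can only list "a filename" as its signature.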

Executable files

| File type | Typical extension | Hex digits (xx = variable) | ASCII digits (. = not an ASCII char) |
| --- | --- | --- | --- |
| MS-DOS, OS/2 or MS Windows | | `4d 5a` | `MZ` |
| Unix ELF | | `7f 45 4c 46` | `.ELF` |

Miscellaneous files

| File type | Typical extension | Hex digits (xx = variable) | ASCII digits (. = not an ASCII char) |
| --- | --- | --- | --- |
| pgp public ring | | `99 00` | `..` |
| pgp security ring | | `95 01` | `..` |
| pgp security ring | | `95 00` | `..` |
| pgp encrypted data | | `a6 00` | `¦.` |
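A list like this is typically used as a lookup table. The sketch below copies a handful of fixed-prefix entries from the tables above into a Python list and returns the first match. The names `SIGNATURES` and `identify` are made up for illustration. Entries with variable bytes or non-zero offsets (Targa, X11 Bitmap, tar) are left out, and two-byte signatures such as `BM` or `MZ` are short enough that false positives are possible.

```python
# (magic prefix, file type) pairs taken from the tables above.
# Longer prefixes come first so that they win over shorter ones.
SIGNATURES = [
    (bytes.fromhex("89504e47"), "PNG format"),
    (bytes.fromhex("47494638"), "GIF format"),
    (bytes.fromhex("ffd8ffe0"), "JPEG File Interchange Format"),
    (bytes.fromhex("4d4d002a"), "TIFF format (Motorola - big endian)"),
    (bytes.fromhex("49492a00"), "TIFF format (Intel - little endian)"),
    (bytes.fromhex("504b0304"), "pkzip format"),
    (bytes.fromhex("7f454c46"), "Unix ELF"),
    (bytes.fromhex("1f8b"), "gzip format"),
    (bytes.fromhex("1f9d"), "Compress"),
    (bytes.fromhex("425a"), "Bzip"),
    (bytes.fromhex("424d"), "Bitmap format"),
    (bytes.fromhex("4d5a"), "MS-DOS, OS/2 or MS Windows"),
]


def identify(data: bytes):
    """Return the file type whose magic number prefixes `data`, or None."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


print(identify(b"GIF89a"))       # GIF format
print(identify(b"plain text"))   # None
```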

Source

qti3e commented Sep 3, 2017

Hi
I've also published another gist but all data is in JSON format:
https://gist.github.com/Qti3e/6341245314bf3513abb080677cd1c93b

spixi commented Apr 4, 2021

What you say about JPEG files is not quite true. There are also JPEG files with a ff d8 ff e1 magic number and an Exif data block, which come without the JFIF string.

BathriNathan commented Aug 13, 2021

> What you say about JPEG files is not quite true. There are also JPEG files with a ff d8 ff e1 magic number and an Exif data block, which come without the JFIF string.

I have tested files in the JPEG, JPG, and JFIF formats. All three give the same result: ff d8 ff e0. Could you share your image file so that I can test it?

spixi commented Aug 13, 2021

There are two different metadata formats for JPEG files: JFIF and EXIF. JFIF starts with ff d8 ff e0 and EXIF starts with ff d8 ff e1.

BathriNathan commented Aug 16, 2021

> There are two different metadata formats for JPEG files: JFIF and EXIF. JFIF starts with ff d8 ff e0 and EXIF starts with ff d8 ff e1.

Thanks for sharing the link with the explanation. I now understand that there are other metadata formats for JPEG, but I still could not get the image file from the link. Do you have a solution for this? Can we add ff d8 ff e0, ff d8 ff e1, ff d8 ff e2, and ff d8 ff e8 for JPEG validation?

spixi commented Aug 17, 2021

@BathriNathan Here are two example files.
JPEG image data, Exif standard: [TIFF image data, little-endian, direntries=13, manufacturer=Panasonic, model=DMC-TZ10, orientation=upper-left, xresolution=196, yresolution=204, resolutionunit=2, software=Ver.1.1 , datetime=2014:07:03 16:26:13], baseline, precision 8, 4000x3000, components 3
JPEG image data, JFIF standard 1.01, resolution (DPI), density 180x180, segment length 16, Exif Standard: [TIFF image data, big-endian, direntries=11, manufacturer=Panasonic, model=DMC-TZ10, orientation=upper-left, software=Ver.1.1 , datetime=2014:07:15 13:17:54], baseline, precision 8, 2597x1948, components 3

I would consider all headers from ff d8 ff e0 to ff d8 ff ef as valid JPEG files. I could find further examples on the Internet:
- e2: Canon
- e3: Samsung
- e4: maybe a corruption, so no 100 % evidence (see espressif/esp32-camera#4)
- e8: SPIFF (Still Picture Interchange File Format)
- ee: unknown, mentioned on Wikipedia

The JPEG specification can be found here. If you look at page 32, you can see that ff e0 to ff ef are used for application segments. If there is no application segment at all (which is totally fine, since application metadata is optional), you could expect ff db (define quantization table(s)) after ff d8 (start of image), which is also mentioned as valid in the Wikipedia article.
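The rule described in this comment can be sketched as a small check: accept ff d8 followed by any application marker from ff e0 to ff ef, or by ff db. The function name `looks_like_jpeg` is hypothetical, and the rule is only the one proposed here. Other markers, such as a comment segment (ff fe), can also follow the start-of-image marker and would be rejected by this sketch.

```python
def looks_like_jpeg(data: bytes) -> bool:
    """Header check following the rule proposed in the comment above."""
    # Start of image (SOI) marker must come first.
    if len(data) < 4 or data[:2] != b"\xff\xd8":
        return False
    # The next marker must begin with ff.
    if data[2] != 0xFF:
        return False
    marker = data[3]
    # ff e0 .. ff ef: application segments; ff db: quantization table(s).
    return 0xE0 <= marker <= 0xEF or marker == 0xDB


print(looks_like_jpeg(bytes.fromhex("ffd8ffe000104a464946")))  # True (JFIF)
print(looks_like_jpeg(bytes.fromhex("ffd8ffe1")))              # True (Exif)
print(looks_like_jpeg(bytes.fromhex("89504e47")))              # False (PNG)
```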

BathriNathan commented Aug 17, 2021

@spixi Thanks for sharing the images. To be on the safe side, I will also accept the other markers you mentioned in my validation. Do we have this problem only with JPEG, or also with PNG and GIF?

spixi commented Aug 17, 2021

@BathriNathan Although PNG allows a wide range of header fields, the magic word .PNG always appears in the first four bytes, so everything is fine. GIF always starts with GIF87a or GIF89a, which both have GIF8 in common, so that is fine too. I don't know much about TIFF, because it has many variants like Exif, sDCF, TIFF/EP, TIFF/IT and GeoTIFF, and may also come with different byte orders in the header.
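A minimal sketch of these two checks follows. It uses the full 8-byte PNG signature (`89 50 4e 47 0d 0a 1a 0a`), which begins with the four bytes mentioned above, and both GIF version strings. The function names `is_png` and `is_gif` are made up for illustration.

```python
# Full PNG signature; its first four bytes are 89 50 4e 47 (".PNG").
PNG_SIGNATURE = bytes.fromhex("89504e470d0a1a0a")

# Both GIF versions share the prefix "GIF8".
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def is_png(data: bytes) -> bool:
    return data.startswith(PNG_SIGNATURE)


def is_gif(data: bytes) -> bool:
    # bytes.startswith accepts a tuple of candidate prefixes.
    return data.startswith(GIF_SIGNATURES)


print(is_png(PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"))  # True
print(is_gif(b"GIF89a\x01\x00"))                      # True
print(is_gif(b"GIF8xx"))                              # False
```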

BathriNathan commented Aug 19, 2021

@spixi I have all the mandatory hex numbers in my validation for JPEG, PNG, and GIF as per our discussion. It is working fine as expected.

racg0092 commented Mar 26, 2022

What about .pdf files? What are the magic numbers in that case? Is it <Buffer 25 50 44 46>?

leommoore (author) commented Mar 26, 2022

@racg0092 If you check out the Wikipedia list of file signatures, it looks like the signature for a .pdf is 25 50 44 46 2D or DF BF 34 EB CE.
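The first of these byte sequences is simply the ASCII text `%PDF-`, which a quick sketch can confirm. The function name `is_pdf` is hypothetical, and this check covers only the first signature quoted above, not the second one.

```python
PDF_MAGIC = bytes.fromhex("255044462d")  # 25 50 44 46 2d

print(PDF_MAGIC)  # b'%PDF-'


def is_pdf(data: bytes) -> bool:
    """Check only for the '%PDF-' prefix at the start of the data."""
    return data.startswith(PDF_MAGIC)


print(is_pdf(b"%PDF-1.7\n"))  # True
print(is_pdf(b"GIF89a"))      # False
```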

BathriNathan commented Mar 26, 2022

racg0092 commented Mar 26, 2022

@leommoore @BathriNathan

Thanks for the feedback! Much appreciated.

merajhasan88 commented Jul 13, 2022

My JPEG has several ffd8. One at the start and several throughout the image. What are the middle ones supposed to be?

Edit: these are not markers. Rather, 'ff' may come from one marker and 'd8' from another, and they just happen to be adjacent.
And no, these aren't thumbnails.
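To see where such byte pairs occur, a small sketch can list every offset of ff d8 in a buffer. The helper `find_all` is made up for illustration, and the sample bytes are synthetic rather than taken from a real image.

```python
def find_all(data: bytes, pattern: bytes) -> list:
    """Return every offset at which `pattern` occurs in `data`."""
    offsets = []
    start = 0
    while True:
        index = data.find(pattern, start)
        if index == -1:
            return offsets
        offsets.append(index)
        start = index + 1


# Synthetic data: ff d8 at the start and once more further in.
sample = bytes.fromhex("ffd8ffe0") + b"\x00" * 4 + bytes.fromhex("12ffd834")

print(find_all(sample, b"\xff\xd8"))  # [0, 9]
```

Only the occurrence at offset 0 is the start-of-image marker. What the later ones mean depends on which segment they fall in.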
