Skip to content

Instantly share code, notes, and snippets.

@magnetikonline
Last active August 15, 2017 23:00
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save magnetikonline/251fa678668b2898bd9f9c8c28d9ff96 to your computer and use it in GitHub Desktop.
Save magnetikonline/251fa678668b2898bd9f9c8c28d9ff96 to your computer and use it in GitHub Desktop.
Unicode and the Byte order mark (BOM).

Unicode and the Byte order mark (BOM)

The byte order mark (BOM) is a Unicode character(U+FEFF) at the start of a text stream/file to signify the endianness and encoding used to a high level of confidence.

Examples

# utf-8
$ hexdump -C utf-8-bom.txt
00000000  ef bb bf 54 65 73 74 20  73 74 72 69 6e 67 20 e2  |...Test string .|
00000010  98 ba 0a                                          |...|
00000013

# utf-16 - big endian
$ hexdump -C utf-16be-bom.txt
00000000  fe ff 00 54 00 65 00 73  00 74 00 20 00 73 00 74  |...T.e.s.t. .s.t|
00000010  00 72 00 69 00 6e 00 67  00 20 26 3a 00 0a        |.r.i.n.g. &:..|
0000001e

# utf-16 - little endian
$ hexdump -C utf-16le-bom.txt
00000000  ff fe 54 00 65 00 73 00  74 00 20 00 73 00 74 00  |..T.e.s.t. .s.t.|
00000010  72 00 69 00 6e 00 67 00  20 00 3a 26 0a 00        |r.i.n.g. .:&..|
0000001e

Therefore, byte order marks are:

  • UTF-8: ef,bb,bf
  • UTF-16-BE: fe,ff
  • UTF-16-LE: ff,fe

Reference

Test string ☺
Test string ☺
Test string ☺
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment