jtacoma/index.md

## index.md

      
Unicode Transformation Format (UTF) is not a single character encoding. It is a family of mutually incompatible character encodings, each capable of expressing the full range of possible Unicode characters.

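
As a rough illustration of that point, the sketch below encodes one sample string in four UTF variants using Python's built-in codecs. The sample text and variable names are arbitrary choices for the example; each variant round-trips the text, yet each produces different bytes.

```python
# One string, four Unicode transformation formats.
text = "na\u00efve \u20ac"  # "naïve €"

encodings = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le"]
encoded = {name: text.encode(name) for name in encodings}

for name, data in encoded.items():
    # Every format round-trips the text without loss...
    assert data.decode(name) == text
    print(f"{name:10} {len(data):2d} bytes  {data.hex(' ')}")

# ...but the byte sequences all differ, so the formats are not interchangeable.
assert len(set(encoded.values())) == len(encodings)

# Reading bytes with the wrong member of the family yields different text.
misread = encoded["utf-16-le"].decode("utf-16-be", errors="replace")
assert misread != text
```

The byte counts alone (10, 14, 14 and 28 for this sample) show that a reader has to know which format was used before it can make sense of the file.
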
Microsoft desktop applications that handle plain text files, such as Notepad and Excel, use UTF-16LE under the name "Unicode". Newer versions also offer UTF-16BE under the name "Unicode big endian". An idiosyncrasy of Microsoft applications is that they declare a plain text file's character encoding with a byte order mark (BOM) at the beginning of the file. This works like magic in many cases, but software that does not recognize the BOM shows it as a few garbled characters at the start of the file.

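
A minimal sketch of BOM sniffing, assuming only the three BOMs relevant here (the helper name `decode_with_bom` and the `BOMS` table are made up for this example, and a fuller table would also need the UTF-32 BOMs). It also shows what a reader that ignores the BOM ends up with.

```python
import codecs

# BOMs this sketch knows about, mapped to Python codec names.
# Assumption: UTF-32 is out of scope. A fuller table would have to test the
# UTF-32LE BOM first, because it starts with the same bytes as UTF-16LE's.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode data using its BOM if present, else the default encoding."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    return data.decode(default)

# The bytes a UTF-16LE file with a BOM would contain for the text "hi".
saved = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

respected = decode_with_bom(saved)   # BOM recognized and stripped
ignored = saved.decode("latin-1")    # BOM treated as ordinary characters

print(repr(respected))
print(repr(ignored))
```

When the BOM is honored the result is just `hi`; when the same bytes are read as Latin-1, the BOM bytes `FF FE` surface as the stray characters `ÿþ` ahead of the text.
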
While the preferred encoding for web applications these days is UTF-8, not all platforms allow custom content to declare its character encoding. Even Microsoft's own IIS doesn't respect the BOM. Plain text formats like CSS and JavaScript, which cannot be relied on to declare their own character encoding the way XML and HTML can (JavaScript has no such mechanism, and CSS has only its `@charset` rule), are therefore safest encoded in ASCII. JavaScript supports Unicode escape sequences, so no expressivity is lost.
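
One way to produce such an ASCII-only file is sketched below. The function name `to_js_ascii` is hypothetical, and the sketch naively assumes that non-ASCII characters occur only inside string literals, where `\uXXXX` escapes are interchangeable with the characters they stand for.

```python
def to_js_ascii(text: str) -> str:
    """Replace every non-ASCII character with JavaScript \\uXXXX escapes."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            # JavaScript's \uXXXX escapes name UTF-16 code units, so a
            # character outside the Basic Multilingual Plane becomes a
            # surrogate pair (two escapes).
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("\\u%04X" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)

source = 'var greeting = "h\u00e9llo \u20ac";'  # contains é and €
escaped = to_js_ascii(source)
print(escaped)

# Raises UnicodeEncodeError if any non-ASCII character survived.
escaped.encode("ascii")
```

For the sample line this prints `var greeting = "h\u00E9llo \u20AC";`, which a JavaScript engine reads as the same string no matter which encoding the server declares.
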