@cmbuckley
Last active December 31, 2020 02:26
<?php
$char = 'イ';
// What's the UTF-8 encoding of this character (U+30A4 KATAKANA LETTER I)?
// This file is saved in UTF-8, so $char holds exactly those bytes.
var_dump(bin2hex($char)); // e382a4
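// A hand-rolled sketch of how a code point becomes UTF-8 bytes, to show where
// e382a4 comes from. encodeUtf8CodePoint() is a hypothetical helper, not a PHP
// built-in (mb_chr() does this for real). It assumes a valid code point and
// does not reject surrogates or values above U+10FFFF.
function encodeUtf8CodePoint(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);                            // 0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))               // 110xxxxx
             . chr(0x80 | ($cp & 0x3F));            // 10xxxxxx
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))              // 1110xxxx
             . chr(0x80 | (($cp >> 6) & 0x3F))      // 10xxxxxx
             . chr(0x80 | ($cp & 0x3F));            // 10xxxxxx
    }
    return chr(0xF0 | ($cp >> 18))                  // 11110xxx
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}
// U+30A4 needs 3 bytes: bits 0011 000010 100100 -> E3 82 A4
var_dump(bin2hex(encodeUtf8CodePoint(0x30A4))); // e382a4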
// What does the same character look like in another encoding, such as EUC-JP?
// See http://www.fileformat.info/info/unicode/char/30a4/charset_support.htm
var_dump(bin2hex(mb_convert_encoding($char, "EUC-JP", "UTF-8"))); // a5a4
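// Sketch: the same character is a different byte sequence in each encoding,
// so bytes mean nothing until you know which encoding produced them.
// $eucJp is a name introduced here for illustration.
$eucJp = mb_convert_encoding('イ', 'EUC-JP', 'UTF-8');
var_dump(bin2hex($eucJp)); // a5a4
// Decoding with the right encoding recovers イ...
var_dump(mb_convert_encoding($eucJp, 'UTF-8', 'EUC-JP') === 'イ'); // bool(true)
// ...but the same two bytes are not valid UTF-8 (0xA5 is a continuation byte)
var_dump(mb_check_encoding($eucJp, 'UTF-8')); // bool(false)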
// ISO-8859-1 isn't listed on that page because イ can't be represented in it,
// so mbstring silently substitutes "?" instead:
var_dump(bin2hex(mb_convert_encoding($char, "ISO-8859-1", "UTF-8"))); // 3f ("?")
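// Sketch: because the substitution above is silent, one way to detect loss is
// a round trip. survivesRoundTrip() is a hypothetical helper, not a built-in.
function survivesRoundTrip(string $utf8, string $target): bool
{
    $converted = mb_convert_encoding($utf8, $target, 'UTF-8');
    return mb_convert_encoding($converted, 'UTF-8', $target) === $utf8;
}
var_dump(survivesRoundTrip('イ', 'ISO-8859-1'));     // bool(false)
var_dump(survivesRoundTrip("\u{E9}", 'ISO-8859-1')); // bool(true) (é is 0xE9)
var_dump(survivesRoundTrip('イ', 'EUC-JP'));         // bool(true)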
// iconv() fails outright instead of substituting: it returns false and raises
// E_NOTICE: iconv(): Detected an illegal character in input string
var_dump(iconv("UTF-8", "ISO-8859-1", $char)); // false
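// Sketch: since iconv() signals failure with false rather than a string,
// compare strictly before using the result. This assumes glibc's iconv; other
// iconv implementations may substitute instead of failing. é (U+00E9) does
// exist in ISO-8859-1 and converts to the single byte 0xE9.
// $candidate and $latin1 are illustrative names.
foreach (['イ', "\u{E9}"] as $candidate) {
    $latin1 = @iconv('UTF-8', 'ISO-8859-1', $candidate); // @ silences the E_NOTICE
    var_dump($latin1 === false ? false : bin2hex($latin1));
}
// bool(false)
// string(2) "e9"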
// And what does the HTML-ENTITIES conversion produce? A numeric character reference:
var_dump(mb_convert_encoding($char, "HTML-ENTITIES", "UTF-8")); // &#12452;
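// Sketch: passing "HTML-ENTITIES" to mbstring functions is deprecated as of
// PHP 8.2. mb_encode_numericentity() yields the same entity for this character.
// The convmap [start, end, offset, mask] below is an assumption chosen to
// encode every code point from U+0080 upwards and leave ASCII untouched.
$entity = mb_encode_numericentity('イ', [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
var_dump($entity); // &#12452;
// html_entity_decode() reverses it
var_dump(html_entity_decode($entity, ENT_QUOTES | ENT_HTML5, 'UTF-8') === 'イ'); // bool(true)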
// The entity is plain ASCII: its bytes are just the characters & # 1 2 4 5 2 ;
var_dump(bin2hex(mb_convert_encoding($char, "HTML-ENTITIES", "UTF-8"))); // 262331323435323b
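// Sketch: 12452 is the code point U+30A4 written in decimal, so the entity can
// also be built from mb_ord() (PHP 7.2+). The hex form &#x30a4; is equivalent.
// $codePoint, $decimal and $hex are illustrative names.
$codePoint = mb_ord('イ', 'UTF-8');
var_dump($codePoint); // 12452
$decimal = sprintf('&#%d;', $codePoint); // &#12452;
$hex = sprintf('&#x%x;', $codePoint);    // &#x30a4;
var_dump(bin2hex($decimal)); // 262331323435323b (one ASCII byte per character)
var_dump(html_entity_decode($hex, ENT_QUOTES | ENT_HTML5, 'UTF-8') === 'イ'); // bool(true)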