@makryl
Last active February 23, 2020 22:30
file_get_contents with detection of any UTF encoding or one specified 8-bit encoding (default Windows-1251)
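<?php
function file_get_contents_utf_ansi($filename, $defAnsiEnc = 'Windows-1251')
{
    $buf = file_get_contents($filename);
    // Propagate a read failure instead of trying to decode it.
    if ($buf === false) return false;
    // UTF-8 BOM: strip it, the data is already UTF-8.
    if (substr($buf, 0, 3) === "\xEF\xBB\xBF") return substr($buf, 3);
    // UTF-32 BOMs are tested before UTF-16 ones, because the UTF-32LE BOM
    // (FF FE 00 00) starts with the UTF-16LE BOM (FF FE).
    else if (substr($buf, 0, 4) === "\x00\x00\xFE\xFF") return mb_convert_encoding(substr($buf, 4), 'UTF-8', 'UTF-32BE');
    else if (substr($buf, 0, 4) === "\xFF\xFE\x00\x00") return mb_convert_encoding(substr($buf, 4), 'UTF-8', 'UTF-32LE');
    else if (substr($buf, 0, 2) === "\xFE\xFF") return mb_convert_encoding(substr($buf, 2), 'UTF-8', 'UTF-16BE');
    else if (substr($buf, 0, 2) === "\xFF\xFE") return mb_convert_encoding(substr($buf, 2), 'UTF-8', 'UTF-16LE');
    // No BOM: convert from the fallback 8-bit encoding if it is detected or if
    // the data does not survive a UTF-8 round trip.
    // Note: utf8_encode() and utf8_decode() are deprecated as of PHP 8.2.
    else if (mb_detect_encoding(trim($buf), $defAnsiEnc)
        || utf8_encode(utf8_decode($buf)) != $buf) return mb_convert_encoding($buf, 'UTF-8', $defAnsiEnc);
    else return $buf;
}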
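The BOM-less branch relies on utf8_encode() and utf8_decode(), which are deprecated as of PHP 8.2. Below is a minimal sketch of that branch using mb_check_encoding() instead. This is a swapped-in technique, not the gist author's method. The helper name bomless_to_utf8 is hypothetical, and the sketch assumes that any input which is not valid UTF-8 is in the fallback 8-bit encoding.

```php
<?php
// Hypothetical helper (not part of the gist): converts BOM-less data to UTF-8.
function bomless_to_utf8(string $buf, string $defAnsiEnc = 'Windows-1251'): string
{
    // Valid UTF-8 (which includes plain ASCII) is returned unchanged.
    if (mb_check_encoding($buf, 'UTF-8')) {
        return $buf;
    }
    // Assumption: anything else is in the fallback 8-bit encoding.
    return mb_convert_encoding($buf, 'UTF-8', $defAnsiEnc);
}

// "Привет" encoded as Windows-1251 is not valid UTF-8, so it gets converted.
echo bomless_to_utf8("\xCF\xF0\xE8\xE2\xE5\xF2"), "\n";
```

This is a heuristic: a short 8-bit string can happen to form valid UTF-8, in which case it is returned unconverted.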

feloy commented Feb 4, 2015

Seems that UTF-32LE will be detected as UTF-16LE because of the order of the tests: the UTF-16LE check (FF FE) also matches the start of the UTF-32LE BOM (FF FE 00 00).
Very interesting function though. Got it! ;)
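
The ordering issue raised in this comment comes down to a shared prefix: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE). Here is a minimal sketch of a BOM lookup that tests the longer marks first. The helper name detect_bom_encoding is hypothetical and not part of the gist.

```php
<?php
// Hypothetical helper (not part of the gist): returns the encoding implied by
// a leading BOM, or null when the data does not start with a known BOM.
function detect_bom_encoding(string $buf): ?string
{
    // The 4-byte UTF-32 BOMs are listed before the 2-byte UTF-16 BOMs so that
    // FF FE 00 00 is not mistaken for a UTF-16LE BOM.
    $boms = [
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE\x00\x00" => 'UTF-32LE',
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFE\xFF"         => 'UTF-16BE',
        "\xFF\xFE"         => 'UTF-16LE',
    ];
    foreach ($boms as $bom => $encoding) {
        // strncmp() is binary safe, so NUL bytes in the BOM are compared too.
        if (strncmp($buf, $bom, strlen($bom)) === 0) {
            return $encoding;
        }
    }
    return null;
}

// "A" in UTF-32LE with a BOM, then "A" in UTF-16LE with a BOM.
var_dump(detect_bom_encoding("\xFF\xFE\x00\x00A\x00\x00\x00"));
var_dump(detect_bom_encoding("\xFF\xFEA\x00"));
```

One ambiguity remains with this order: a UTF-16LE file whose first character after the BOM is U+0000 would be read as UTF-32LE, which should be rare in practice.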
