@masakielastic
Created October 11, 2023 08:07
Round-trip conversion between Shift_JIS (CP932) and Unicode
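<?php
// Requires the mbstring extension.
// Report every two-byte sequence that changes after an $enc -> UTF-8 -> $enc round trip.
count_unsafe_chars('cp932');

// Convert $char from $enc to UTF-8 and back to $enc.
function roundtrip($char, $enc) {
    return mb_convert_encoding(mb_convert_encoding($char, 'utf-8', $enc), $enc, 'utf-8');
}

// Format a byte string as uppercase hexadecimal.
function tohexupper($char) {
    return strtoupper(bin2hex($char));
}

// Print each two-byte sequence whose round trip differs as [original result],
// followed by the encoding name and the number of such sequences.
function count_unsafe_chars($enc) {
    $count = 0;

    // The range 0x1000-0xFFFF also covers pairs of single-byte characters.
    for ($cp = 0x1000; $cp < 0x10000; ++$cp) {
        // Split the 16-bit value into a high byte and a low byte.
        $c = chr($cp >> 8).chr($cp & 0xff);

        // Skip byte pairs that are not valid in $enc.
        if (!mb_check_encoding($c, $enc)) {
            continue;
        }

        $ret = roundtrip($c, $enc);

        if ($c !== $ret) {
            echo '[', tohexupper($c), ' ', tohexupper($ret), ']';
            ++$count;
        }
    }

    echo PHP_EOL, PHP_EOL;
    echo $enc, PHP_EOL;
    echo 'count:', $count, PHP_EOL;
}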
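
The script above prints mismatches as it finds them. As a rough sketch of a variation, the same round-trip check can return the mismatches as data and visit only byte pairs that look like double-byte characters. The names `roundtrip_bytes` and `collect_unsafe_chars` are hypothetical and not part of the gist, and the lead-byte (0x81-0x9F, 0xE0-0xFC) and trail-byte (0x40-0xFC except 0x7F) ranges are assumptions based on the usual Shift_JIS layout, not something the gist states.

```php
<?php
// Sketch only: both functions below are hypothetical helpers, not part of the gist.
// Requires the mbstring extension.

// Convert $bytes from $enc to UTF-8 and back to $enc.
function roundtrip_bytes(string $bytes, string $enc): string {
    return mb_convert_encoding(mb_convert_encoding($bytes, 'UTF-8', $enc), $enc, 'UTF-8');
}

// Return a list of [original hex, round-tripped hex] pairs for every
// double-byte candidate that does not survive the round trip.
function collect_unsafe_chars(string $enc): array {
    $unsafe = [];

    // Assumed Shift_JIS-style lead bytes: 0x81-0x9F and 0xE0-0xFC.
    $leads = array_merge(range(0x81, 0x9F), range(0xE0, 0xFC));

    foreach ($leads as $lead) {
        // Assumed trail bytes: 0x40-0xFC, excluding 0x7F.
        for ($trail = 0x40; $trail <= 0xFC; ++$trail) {
            if ($trail === 0x7F) {
                continue;
            }

            $c = chr($lead).chr($trail);

            // Skip byte pairs that mbstring does not accept as valid $enc.
            if (!mb_check_encoding($c, $enc)) {
                continue;
            }

            $ret = roundtrip_bytes($c, $enc);

            if ($c !== $ret) {
                // Store pairs in a plain list: all-digit hex strings such as
                // "8790" would be cast to integer keys if used as array keys.
                $unsafe[] = [strtoupper(bin2hex($c)), strtoupper(bin2hex($ret))];
            }
        }
    }

    return $unsafe;
}

$unsafe = collect_unsafe_chars('CP932');

foreach ($unsafe as [$from, $to]) {
    echo '[', $from, ' ', $to, ']';
}

echo PHP_EOL, 'count:', count($unsafe), PHP_EOL;
```

Returning a list keeps the check separate from the output, so the result can be compared across encodings or written to a file. The count may differ from the original script's if the assumed byte ranges leave out sequences that the original loop covers.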