
@heiglandreas
Created April 11, 2022 07:55
Never use "unicode" as the charset in a Content-Type declaration
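<?php
declare(strict_types=1);

// An HTML fragment whose meta tag declares the non-standard charset "unicode".
$string = <<<'EOF'
Bar
   <meta content="text/html; charset=unicode" http-equiv="Content-Type">
   Foo
EOF;

// Collect libxml's parse warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($string);

// 1. Serialize the document: the text after the meta tag comes back as unrelated characters.
$f = $dom->saveHTML();
echo $f . PHP_EOL . PHP_EOL;

// 2. Extract the run of HTML entities that replaced the text after the meta tag.
if (preg_match('/>([&#a-zA-Z0-9;]+)</', $f, $result) !== 1) {
    exit('No entity run found' . PHP_EOL);
}
echo $result[1] . PHP_EOL . PHP_EOL;

// 3. Decode the entities to UTF-8 to see the actual characters.
$utf8 = html_entity_decode($result[1], ENT_QUOTES | ENT_HTML401, 'UTF-8');
echo $utf8 . PHP_EOL . PHP_EOL;

// 4. Encode them again with the "unicode" charset: where iconv knows that name,
//    the original bytes reappear, showing libxml had read them as UTF-16.
echo iconv('UTF-8', 'unicode', $utf8) . PHP_EOL . PHP_EOL;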
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>Bar
   <meta content="text/html; charset=unicode" http-equiv="Content-Type">&#2622;&dagger;&#17952;&#28527;</p></body></html>


&#2622;&dagger;&#17952;&#28527;

ਾ†䘠潯

��>
   Foo
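The entity run can be reproduced without libxml. Assuming libxml mapped "unicode" to UTF-16 and re-read the 8 bytes left after the meta tag's attribute (`>`, a newline, three spaces, `Foo`) as UTF-16 code units, decoding those bytes little-endian yields exactly the code points above (8224 is U+2020, which saveHTML() writes as `&dagger;`), and decoding them big-endian yields the run in the 3v4l.org output further down. A minimal sketch using mbstring; the helper name codePoints() is made up for illustration:

```php
<?php
declare(strict_types=1);

// Assumption: these are the bytes libxml re-read after switching to UTF-16.
$tail = ">\n   Foo";

/**
 * Hypothetical helper: decode $bytes with the given UTF-16 variant and
 * return the resulting Unicode code points.
 *
 * @return list<int>
 */
function codePoints(string $bytes, string $encoding): array
{
    $utf8 = mb_convert_encoding($bytes, 'UTF-8', $encoding);
    // mb_str_split() splits by character, not by byte.
    $chars = mb_str_split($utf8, 1, 'UTF-8');
    return array_map(static fn (string $char) => mb_ord($char, 'UTF-8'), $chars);
}

$le = codePoints($tail, 'UTF-16LE'); // low byte first
$be = codePoints($tail, 'UTF-16BE'); // high byte first

echo implode(' ', $le) . PHP_EOL; // 2622 8224 17952 28527
echo implode(' ', $be) . PHP_EOL; // 15882 8224 8262 28527
```

Every pair of ASCII bytes collapses into one CJK-looking or otherwise unrelated character, which is why "Foo" disappears from the DOM.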

// Check out https://3v4l.org/kJ0tV, where the same script produces this output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>Bar
   <meta content="text/html; charset=unicode" http-equiv="Content-Type">&#15882;&dagger;&#8262;&#28527;</p></body></html>


&#15882;&dagger;&#8262;&#28527;

㸊†⁆潯


Warning: iconv(): Wrong encoding, conversion from "UTF-8" to "unicode" is not allowed in /in/kJ0tV on line 22
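The warning only means that this iconv build has no encoding called "unicode". In the earlier run, iconv's output looks like a byte-order mark (shown as ��) followed by the original `>\n   Foo`, so it appears to have encoded the text as UTF-16. The same recovery can be done portably with mbstring by naming the byte order explicitly, which also avoids the BOM. A sketch, assuming each entity run came from the byte order noted in its comment:

```php
<?php
declare(strict_types=1);

// Entity runs copied from the two outputs above. The byte order for each is an
// assumption inferred from the code points (low byte first vs. high byte first).
$runs = [
    'UTF-16LE' => '&#2622;&dagger;&#17952;&#28527;',  // first run
    'UTF-16BE' => '&#15882;&dagger;&#8262;&#28527;',  // 3v4l.org run
];

$recovered = [];
foreach ($runs as $encoding => $entities) {
    // Turn the entities back into UTF-8 characters.
    $text = html_entity_decode($entities, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    // Re-encode as UTF-16 in the given byte order (no BOM is written),
    // which returns the raw bytes libxml misread.
    $recovered[$encoding] = mb_convert_encoding($text, $encoding, 'UTF-8');
}

var_dump($recovered);
```

Both entries hold the same 8-byte string `>\n   Foo`, i.e. the two environments disagree only on byte order, not on treating "unicode" as UTF-16.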

