@superhero
Last active April 23, 2024 06:14
UTF-8 Encoding Debugging
<?php
$search = array(
'á',
'ä',
'Ä',
'ç',
'é',
'É',
'è',
'ì',
'ê',
'í',
'ï',
'Ä©',
'ó',
'ø',
'ö',
'Ö',
'Å¡',
'ü',
'Lú',
'Å©',
'ñ',
'Ã¥',
'ä',
'ö',
'Ã…',
'Ä',
'Ö',
'é',
'ø',
'æ',
'Ø',
'õ',
'•',
'ú',
'Ã',
'Ã',
'Ç',
'â€',
'“',
'É',
'”',
'Ù',
'„',
'´',
'†',
'ÿ',
'ë',
'›',
'À',
'Â',
'Ã',
'È',
'É',
'Ê',
'Ë',
'Ì',
);
$replace = array(
'á',
'ä',
'ä',
'ç',
'é',
'É',
'è',
'ě',
'ê',
'í',
'ï',
'ĩ',
'ó',
'ø',
'ö',
'ö',
'š',
'ü',
'ú',
'ũ',
'ñ',
'å',
'ä',
'ö',
'Å',
'Ä',
'Ö',
'©',
'œ',
'æ',
'Ø',
'õ',
'-',
'ú',
'À',
'Ã',
'Ç',
'"',
'"',
'É',
'ö',
'Ù',
'Ä',
'ô',
'Æ',
'ÿ',
'ë',
'Û',
'À',
'Â',
'Ã',
'È',
'É',
'Ê',
'Ë',
'Ì',
);
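// Open the input for reading and the output for writing; stop if either fails.
$handle = fopen("in.txt", "r");
$file = fopen("out.txt", "w");
if ($handle === false || $file === false)
{
    exit("Could not open in.txt or out.txt\n");
}
// Apply the replacement table to the input, one line at a time.
while (($buffer = fgets($handle)) !== false)
{
    // str_replace applies the pairs in array order, so earlier pairs run first.
    $buffer = str_replace($search, $replace, $buffer);
    fwrite($file, $buffer);
}
fclose($handle);
fclose($file);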
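
The lookup table above fixes known sequences one pair at a time. As a complement, here is a minimal sketch of a table-free approach. It assumes the damage came from UTF-8 bytes being misread as Windows-1252 and then saved as UTF-8 again, which may not hold for every file. The function name `undoMojibake` is hypothetical, and the input strings use `\u{...}` escapes so the example does not depend on how this page itself is encoded. Sequences that involve bytes Windows-1252 leaves undefined (such as 0x81) may not round-trip, so those can still need a manual table.

```php
<?php
/**
 * Try to undo one layer of "UTF-8 read as Windows-1252" mojibake.
 * Returns the input unchanged when the reversal does not look safe.
 */
function undoMojibake(string $text): string
{
    // Map every character back to the single Windows-1252 byte it was misread from.
    $bytes = mb_convert_encoding($text, 'Windows-1252', 'UTF-8');

    // If the recovered bytes are not valid UTF-8, the text was probably not mojibake.
    if (!mb_check_encoding($bytes, 'UTF-8'))
    {
        return $text;
    }

    // Characters outside Windows-1252 are replaced by a substitute character,
    // so require a lossless round trip before accepting the result.
    if (mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252') !== $text)
    {
        return $text;
    }

    return $bytes;
}

// "\u{00C3}\u{00A9}" is the two-character mojibake form of "\u{00E9}" (e with acute).
echo undoMojibake("caf\u{00C3}\u{00A9}"), "\n";
```

Text that is already clean is returned as it was, because a lone Windows-1252 byte such as 0xE9 is not valid UTF-8 and fails the first check.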

rkok commented Apr 23, 2024

Thanks for sharing this. In my case, I had an even more severely garbled MySQL table where "é" had become "é", "è" had become "è", etc.

I first had to convert that using the query shown here: https://stackoverflow.com/a/74092827/3018750, and then convert that down using your conversion table.

That whole thing converted to an SQL query became:

SELECT 

REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(

CONVERT(BINARY(CONVERT(CONVERT(post_content USING utf8mb4) USING latin1)) USING utf8mb4),

'á', 'á'), 'ä', 'ä'), 'Ä', 'ä'), 'ç', 'ç'), 'é', 'é'), 'É', 'É'), 'è', 'è'), 'ì', 'ě'), 'ê', 'ê'), 'í', 'í'), 'ï', 'ï'), 'Ä©', 'ĩ'), 'ó', 'ó'), 'ø', 'ø'), 'ö', 'ö'), 'Ö', 'ö'), 'Å¡', 'š'), 'ü', 'ü'), 'Lú', 'ú'), 'Å©', 'ũ'), 'ñ', 'ñ'), 'Ã¥', 'å'), 'ä', 'ä'), 'ö', 'ö'), 'Ã…', 'Å'), 'Ä', 'Ä'), 'Ö', 'Ö'), 'é', '©'), 'ø', 'œ'), 'æ', 'æ'), 'Ø', 'Ø'), 'õ', 'õ'), '•', '-'), 'ú', 'ú'), 'Ã', 'À'), 'Ã', 'Ã'), 'Ç', 'Ç'), 'â€', '"'), '“', '"'), 'É', 'É'), '”', 'ö'), 'Ù', 'Ù'), '„', 'Ä'), '´', 'ô'), '†', 'Æ'), 'ÿ', 'ÿ'), 'ë', 'ë'), '›', 'Û'), 'À', 'À'), 'Â', 'Â'), 'Ã', 'Ã'), 'È', 'È'), 'É', 'É'), 'Ê', 'Ê'), 'Ë', 'Ë'), 'ÃŒ', 'Ì'), '„', 'Ä'), '†', 'Æ'), '”', 'ö'), '›', 'Û'), 'ÃŒ', 'Ì');
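
A nested expression like the one above is tedious to write by hand. The following is a rough sketch of how it could be generated from the same kind of parallel search/replace arrays used in the PHP script. The helper names `buildNestedReplace` and `quoteSqlString` are hypothetical, and the quoting assumes MySQL's default SQL mode, where backslash is an escape character inside string literals.

```php
<?php
/**
 * Quote a value as a MySQL string literal (assumes the default SQL mode).
 */
function quoteSqlString(string $value): string
{
    // Double backslashes first, then double single quotes.
    return "'" . str_replace(["\\", "'"], ["\\\\", "''"], $value) . "'";
}

/**
 * Wrap $expr in one REPLACE() call per search/replace pair.
 * The first pair ends up innermost, so pairs are applied in array order,
 * the same order str_replace() uses.
 */
function buildNestedReplace(string $expr, array $search, array $replace): string
{
    $search = array_values($search);
    $replace = array_values($replace);
    if (count($search) !== count($replace))
    {
        throw new InvalidArgumentException('search and replace must have the same length');
    }

    foreach ($search as $i => $from)
    {
        $expr = sprintf(
            'REPLACE(%s, %s, %s)',
            $expr,
            quoteSqlString($from),
            quoteSqlString($replace[$i])
        );
    }

    return $expr;
}

// Tiny illustration with two placeholder pairs.
echo buildNestedReplace('post_content', ['a', 'b'], ['x', 'y']), "\n";
```

The first argument can be any SQL expression, so the `CONVERT(...)` chain from the query above could be passed in place of the bare column name.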
