@kasperkamperman
Created July 3, 2018 11:58
Function to fix UTF-8 special characters displayed as 2 characters (UTF-8 interpreted as ISO-8859-1 or Windows-1252)
<?php header('Content-Type: text/html; charset=utf-8'); ?>
<html>
<head>
<title>Fix wrong encoded UTF8 characters</title>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
</head>
<body>
<pre>
<?php
/* Problem description:
A common problem is for characters encoded as UTF-8 to have their individual bytes interpreted as ISO-8859-1 or Windows-1252.
Instead of an expected character, a sequence of Latin characters is shown, typically starting with Ã or Â. For example, instead of "è" these characters occur: "Ã¨".
A Web page is encoded as UTF-8 characters. The Web server mistakenly declares the charset to be ISO-8859-1 in the HTTP protocol that delivers the page to the browser.
The browser will then display each of the UTF-8 bytes in the Web page as Latin-1 characters.
source: http://www.i18nqa.com/debug/bug-utf-8-latin1.html
code source: https://github.com/devgeniem/wp-sanitize-accented-uploads/blob/master/plugin.php#L152
table source: http://www.i18nqa.com/debug/utf8-debug.html
https://www.kasperkamperman.com/ 2018-07-03
*/
$str = 'BLÃ˜F - ZOUTELANDE, MÃ˜ - FINAL SONG, FÃ©dÃ©ration Camerounaise de Football, Itâ€™s Getting the Best of Me';
echo "original string: ".$str.'<br/>';
echo "fixed string: ".fixWrongUTF8Encoding($str).'<br/>';
// displays: BLØF - ZOUTELANDE, MØ - FINAL SONG, Fédération Camerounaise de Football, It’s Getting the Best of Me
function fixWrongUTF8Encoding($inputString) {
    // code source: https://github.com/devgeniem/wp-sanitize-accented-uploads/blob/master/plugin.php#L152
    // table source: http://www.i18nqa.com/debug/utf8-debug.html
    // Keys are UTF-8 byte sequences misread as Windows-1252; values are the intended characters.
    // str_replace() applies the pairs in array order, so the longest sequences must come first.
    // Invisible characters in a key are written as "\u{...}" escapes, which require PHP 7.0+.
    $fix_list = array(
        // 3 char errors first
        'â€š' => '‚', 'â€ž' => '„', 'â€¦' => '…', 'â€¡' => '‡',
        'â€°' => '‰', 'â€¹' => '‹', 'â€˜' => '‘', 'â€™' => '’',
        'â€œ' => '“', 'â€¢' => '•', 'â€“' => '–', 'â€”' => '—',
        'â„¢' => '™', 'â€º' => '›', 'â‚¬' => '€',
        // 2 char errors (the keys for † and ” end in an invisible third character)
        'Ã‚' => 'Â', 'Æ’' => 'ƒ', 'Ãƒ' => 'Ã', 'Ã„' => 'Ä',
        'Ã…' => 'Å', "â€\u{A0}" => '†', 'Ã†' => 'Æ', 'Ã‡' => 'Ç',
        'Ë†' => 'ˆ', 'Ãˆ' => 'È', 'Ã‰' => 'É', 'ÃŠ' => 'Ê',
        'Ã‹' => 'Ë', 'Å’' => 'Œ', 'ÃŒ' => 'Ì', 'Å½' => 'Ž',
        'ÃŽ' => 'Î', 'Ã‘' => 'Ñ', 'Ã’' => 'Ò', 'Ã“' => 'Ó',
        "â€\u{9D}" => '”', 'Ã”' => 'Ô', 'Ã•' => 'Õ', 'Ã–' => 'Ö',
        'Ã—' => '×', 'Ëœ' => '˜', 'Ã˜' => 'Ø', 'Ã™' => 'Ù',
        'Å¡' => 'š', 'Ãš' => 'Ú', 'Ã›' => 'Û', 'Å“' => 'œ',
        'Ãœ' => 'Ü', 'Å¾' => 'ž', 'Ãž' => 'Þ', 'Å¸' => 'Ÿ',
        'ÃŸ' => 'ß', 'Â¡' => '¡', 'Ã¡' => 'á', 'Â¢' => '¢',
        'Ã¢' => 'â', 'Â£' => '£', 'Ã£' => 'ã', 'Â¤' => '¤',
        'Ã¤' => 'ä', 'Â¥' => '¥', 'Ã¥' => 'å', 'Â¦' => '¦',
        'Ã¦' => 'æ', 'Â§' => '§', 'Ã§' => 'ç', 'Â¨' => '¨',
        'Ã¨' => 'è', 'Â©' => '©', 'Ã©' => 'é', 'Âª' => 'ª',
        'Ãª' => 'ê', 'Â«' => '«', 'Ã«' => 'ë', 'Â¬' => '¬',
        'Ã¬' => 'ì', 'Â®' => '®', 'Ã®' => 'î', 'Â¯' => '¯',
        'Ã¯' => 'ï', 'Â°' => '°', 'Ã°' => 'ð', 'Â±' => '±',
        'Ã±' => 'ñ', 'Â²' => '²', 'Ã²' => 'ò', 'Â³' => '³',
        'Ã³' => 'ó', 'Â´' => '´', 'Ã´' => 'ô', 'Âµ' => 'µ',
        'Ãµ' => 'õ', 'Â¶' => '¶', 'Ã¶' => 'ö', 'Â·' => '·',
        'Ã·' => '÷', 'Â¸' => '¸', 'Ã¸' => 'ø', 'Â¹' => '¹',
        'Ã¹' => 'ù', 'Âº' => 'º', 'Ãº' => 'ú', 'Â»' => '»',
        'Ã»' => 'û', 'Â¼' => '¼', 'Ã¼' => 'ü', 'Â½' => '½',
        'Ã½' => 'ý', 'Â¾' => '¾', 'Ã¾' => 'þ', 'Â¿' => '¿',
        'Ã¿' => 'ÿ', 'Ã€' => 'À',
        // 1 char errors last (the second character of each key is invisible)
        "Ã\u{81}" => 'Á', "Å\u{A0}" => 'Š', "Ã\u{8D}" => 'Í', "Ã\u{8F}" => 'Ï',
        "Ã\u{90}" => 'Ð', "Ã\u{9D}" => 'Ý', "Ã\u{A0}" => 'à', "Ã\u{AD}" => 'í'
    );
    $error_chars = array_keys($fix_list);
    $real_chars = array_values($fix_list);
    return str_replace($error_chars, $real_chars, $inputString);
}
?>
</pre>
</body>
</html>
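
The problem description in the file says that each UTF-8 byte ends up displayed as a separate Windows-1252 character. The sketch below reproduces that mechanism and reverses it with a round trip through `mb_convert_encoding()`. It is only an illustration, not part of the gist: the helper names `simulateMojibake` and `undoMojibake` are hypothetical, and it assumes PHP 7.0+ with the mbstring extension enabled.

```php
<?php
// Hypothetical helpers for illustration; they are not part of the gist.

// Reproduce the bug: read correct UTF-8 bytes as if they were Windows-1252,
// then store the resulting characters as UTF-8 again.
function simulateMojibake(string $utf8): string {
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

// Reverse it: map every character back to its single Windows-1252 byte,
// which restores the original UTF-8 byte sequence.
function undoMojibake(string $garbled): string {
    return mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
}

$original = "F\u{E9}d\u{E9}ration"; // "Fédération", written with escapes
$garbled  = simulateMojibake($original);

echo bin2hex("\u{E9}"), "\n";                    // c3a9     (é as UTF-8)
echo bin2hex(simulateMojibake("\u{E9}")), "\n";  // c383c2a9 (Ã followed by ©)
var_dump(undoMojibake($garbled) === $original);  // bool(true)
```

The round trip only works when the whole string is consistently garbled. Bytes that Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) and strings that mix correct and garbled text may not survive it, which is where a lookup table like `fixWrongUTF8Encoding()` can still be the more forgiving choice.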
@georgechalhoub

That's very useful, thank you!

@Mehdise00

thank you 🙏

@Mrcel01

Mrcel01 commented May 31, 2023
