Skip to content

Instantly share code, notes, and snippets.

@Daniel-KM
Created June 9, 2018 05:18
Show Gist options
  • Star 4 You must be signed in to star a gist
  • Fork 2 You must be signed in to fork a gist
  • Save Daniel-KM/9754f18f9632423fb1a08909e9f01c04 to your computer and use it in GitHub Desktop.
Save Daniel-KM/9754f18f9632423fb1a08909e9f01c04 to your computer and use it in GitHub Desktop.
Check and fix Unicode issues on a web server with php
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
</head>
<body>
<h1>Check Unicode on a web server with php</h1>
<p>If your filenames requires characters with diacritics or any Unicode characters and not only the strict latin characters (26 letters + 10 numbers + some symbols and punctuation marks), the web server should be checked and fixed.</p>
<h2>Checks</h2>
<h3>First character check.</h3>
<pre class="prettyprint">
$filename = 'éfilé.jpg';
if (basename($filename) != $filename) {
echo sprintf('An error occurs when testing function "basename(\'%s\') : %s".', $filename, basename($filename));
} else {
echo 'Success!';
}
</pre>
<?php
$success = 0;
$filename = 'éfilé.jpg';
if (basename($filename) != $filename) {
echo sprintf('An error occurs when testing function "basename(\'%s\') : %s".', $filename, basename($filename));
} else {
echo 'Success!';
++$success;
}
?>
<h3>Command line via web check (comparaison with a trivial function).</h3>
<pre class="prettyprint">
// @see http://www.php.net/manual/function.escapeshellarg.php#111919
function escapeshellarg_unicode($string)
{
return "'" . str_replace("'", "'\\''", $string) . "'";
}
$filename = "File~1 -À-é-ï-ô-ů-ȳ-Ø-ß-ñ-Ч-Ł-'.Test.png";
if (escapeshellarg($filename) != escapeshellarg_unicode($filename)) {
echo sprintf('An error occurs when testing function "escapeshellarg(\'%s\')": %s', $filename, escapeshellarg_unicode($filename));
} else {
echo 'Success!';
}
</pre>
<?php
// @see http://www.php.net/manual/function.escapeshellarg.php#111919
function escapeshellarg_unicode($string)
{
return "'" . str_replace("'", "'\\''", $string) . "'";
}
$filename = "File~1 -À-é-ï-ô-ů-ȳ-Ø-ß-ñ-Ч-Ł-'.Test.png";
if (escapeshellarg($filename) != escapeshellarg_unicode($filename)) {
echo sprintf('An error occurs when testing function "escapeshellarg(\'%s\')": %s', $filename, escapeshellarg_unicode($filename));
} else {
echo 'Success!';
++$success;
}
?>
<h2>Fix for Apache (here for Debian and derivative distribution)</h2>
<?php if ($success !== 2): ?>
<p>Your server is not fully compatible with Unicode. The following fix (or another one) is required.</p>
<?php else: ?>
<p>Your server is fully compatible with Unicode. The following fix is not required.</p>
<?php endif; ?>
<br />
<p>Two solutions are possible, and they require to config the file /etc/apache2/envvars, where it is indicated:</p>
<pre class="prettyprint">
## The locale used by some modules like mod_dav
export LANG=C
## Uncomment the following line to use the system default locale instead:
#. /etc/default/locale
</pre>
<ul>
<li>First solution is to uncomment the line as specified and to add a generic value for numbers to avoid other issues:</li>
<pre class="prettyprint">
. /etc/default/locale
export LC_NUMERIC=C
</pre>
<li>The second solution is more generic: don‘t uncomment the line, but replace "export LANG=C" by "C.UTF-8":</li>
<pre class="prettyprint">
#export LANG=C
export LANG="C.UTF-8"
</pre>
</ul>
<p>Don‘t forget to relaunch the server between two tests.</p>
<pre class="prettyprint">
sudo systemctl restart apache2
</pre>
<p>In fact, the default locale of Apache is "C" for historic and geographic reasons (USA based), so it should be changed to any UTF-8 compliant locale, for example the default locale of Debian, "en_US.UTF-8". Apache does not apply it by default, so it should be fixed.</p>
<p>Ideally, the default locale of Apache should be the generic "C.UTF-8", but it is not possible, because American people wouldn't understand why they would lose their "en_US.UTF-8".</p>
<br />
<script src="https://cdn.rawgit.com/google/code-prettify/master/loader/run_prettify.js"></script>
</body>
</html>
@Lee76m
Copy link

Lee76m commented Jul 25, 2021

If your filenames requires characters with diacritics or any Unicode characters and not only the strict latin characters (26 letters + 10 numbers + some symbols and punctuation marks), the web server should be checked and fixed.

Checks

First character check.

$filename = 'éfilé.jpg';
if (basename($filename) != $filename) {
    echo sprintf('An error occurs when testing function "basename(\'%s\') : %s".', $filename, basename($filename));
} else {
    echo 'Success!';
}

Command line via web check (comparaison with a trivial function).

// @see http://www.php.net/manual/function.escapeshellarg.php#111919
function escapeshellarg_unicode($string)
{
    return "'" . str_replace("'", "'\\''", $string) . "'";
}
$filename = "File~1 -À-é-ï-ô-ů-ȳ-Ø-ß-ñ-Ч-Ł-'.Test.png";
if (escapeshellarg($filename) != escapeshellarg_unicode($filename)) {
    echo sprintf('An error occurs when testing function "escapeshellarg(\'%s\')": %s', $filename, escapeshellarg_unicode($filename));
} else {
    echo 'Success!';
}

Fix for Apache (here for Debian and derivative distribution)

Your server is not fully compatible with Unicode. The following fix (or another one) is required.

Your server is fully compatible with Unicode. The following fix is not required.


Two solutions are possible, and they require to config the file /etc/apache2/envvars, where it is indicated:

    ## The locale used by some modules like mod_dav
    export LANG=C
    ## Uncomment the following line to use the system default locale instead:
    #. /etc/default/locale
  • First solution is to uncomment the line as specified and to add a generic value for numbers to avoid other issues:
  • . /etc/default/locale
    export LC_NUMERIC=C
    
  • The second solution is more generic: don‘t uncomment the line, but replace "export LANG=C" by "C.UTF-8":
  • #export LANG=C
    export LANG="C.UTF-8"
    

Don‘t forget to relaunch the server between two tests.

    sudo systemctl restart apache2

In fact, the default locale of Apache is "C" for historic and geographic reasons (USA based), so it should be changed to any UTF-8 compliant locale, for example the default locale of Debian, "en_US.UTF-8". Apache does not apply it by default, so it should be fixed.

Ideally, the default locale of Apache should be the generic "C.UTF-8", but it is not possible, because American people wouldn't understand why they would lose their "en_US.UTF-8".


https://gist.github.com/Daniel-KM/9754f18f9632423fb1a08909e9f01c04#gistcomment-3827544 <script src="https://cdn.rawgit.com/google/code-prettify/master/loader/run_prettify.js"></script>

@Daniel-KM
Copy link
Author

Thanks, a standard text is more readable to understand the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment