The ICU library provides facilities for working with Unicode and globalization. The problem is that it is installed system-wide, and updating it is not trivial. We are going to update only some parts of the ICU database, but the rest of the database can be updated the same way. You just need to know exactly what to change.
This manual is not bound to PHP; it works for other languages as well. PHP uses the php-intl extension, which in turn uses libicu, and we are going to update some parts of the libicu database.
This manual may not suit Java users: Java uses ICU4J, and the approach can differ.
We'll try to solve two problems: an outdated time zone database, and IDN-to-ASCII conversion of domain names (IDN to Punycode conversion, i.e. the idn_to_ascii() and idn_to_utf8() functions in PHP).
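To figure out the libicu version on your system, run the command below. Note that apt-cache search lists the packages available for your distribution; dpkg -l 'libicu*' lists only the installed ones.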
apt-cache search '^libicu'
# or
apt search '^libicu'
In my case it outputs:
libicu52 - International Components for Unicode
This means ICU version 52 is installed on my system.
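If you need the major version in a script, for example to build the data directory name used later, you can extract it from the output above. Below is a minimal sketch. The function name icu_major_version is hypothetical, and it assumes the package is named libicu<digits> followed by " - ", as in the sample output.

```shell
# Hypothetical helper: read "apt-cache search '^libicu'" output on stdin
# and print the major ICU version of the first "libicu<N>" package found.
icu_major_version() {
    # "libicu52 - International Components ..." -> "52";
    # lines such as "libicu-dev - ..." do not match and are skipped.
    sed -n 's/^libicu\([0-9][0-9]*\)[a-z]* .*/\1/p' | head -n 1
}

# Usage: apt-cache search '^libicu' | icu_major_version
```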
Here is a code snippet that reproduces the time zone problem (reproducible with libicu version 52 and probably 55):
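<?php
// tztest.php: print the current time in Europe/Moscow together with its UTC offset
$date = new DateTime();
$x = IntlDateFormatter::create('en', IntlDateFormatter::MEDIUM, IntlDateFormatter::MEDIUM,
    'Europe/Moscow', IntlDateFormatter::GREGORIAN,
    'yyyy-MM-dd HH:mm:ss Z'); // "y" is the calendar year ("Y" would be the week-based year)
echo $x->format($date) . PHP_EOL;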
Expected output: the current time in the Europe/Moscow time zone, with the offset +0300.
Actual output: the offset is +0400, which Moscow used before October 2014.
The idn_to_utf8 and idn_to_ascii functions convert domain names between the Unicode (IDN) form and the IDNA ASCII (Punycode) form. The problem is reproducible with libicu up to version 57 (maybe even 58):
<?php
// idntest.php: decode a Punycode ("xn--") domain name back to its Unicode form
$xn = 'xn--4s9haa.ws';
$result = idn_to_utf8($xn, IDNA_NONTRANSITIONAL_TO_ASCII, INTL_IDNA_VARIANT_UTS46);
if ($result === false) {
    throw new \InvalidArgumentException("Could not convert Punycode '$xn' to IDN.");
}
echo "SUCCESS: $result\n";
Expected output: SUCCESS, followed by the decoded domain name.
Actual output: the exception "Could not convert Punycode ... to IDN."
According to the ICU Data documentation, if the ICU_DATA environment variable is set, ICU loads its data from that path. For example, suppose libicu52 is installed on your system and you run your PHP script like this:
ICU_DATA=/opt/icu php tztest.php
ICU will then look for its data files in the /opt/icu/icudt52l/ directory. The pattern is /opt/icu/icudt<version><byte ordering>/, where:
- <version> is the version of libicu installed on your system;
- <byte ordering> can be l (little-endian ASCII), b (big-endian ASCII) or e (big-endian EBCDIC); see Sharing ICU Data Between Platforms. On a regular x86_64 platform it will be l.
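To avoid guessing the directory name, you can compute it from the version and the host byte order. The sketch below is one way to do that. The function names are hypothetical, and it only distinguishes l from b; EBCDIC ("e") systems are not handled.

```shell
# Hypothetical helper: guess the byte-ordering letter of the host.
icu_byte_order() {
    # od reads the bytes 01 00 as one native 16-bit unsigned integer:
    # 1 on a little-endian host, 256 on a big-endian host.
    if [ "$(printf '\001\000' | od -An -tu2 | tr -d ' \n')" = "1" ]; then
        echo l
    else
        echo b
    fi
}

# Hypothetical helper: print icudt<version><byte ordering>.
# $1 = ICU major version (e.g. 52), $2 = optional byte-order letter.
icu_data_dir_name() {
    echo "icudt${1}${2:-$(icu_byte_order)}"
}

# Usage: mkdir -p "/opt/icu/$(icu_data_dir_name 52)"   # /opt/icu/icudt52l on x86_64
```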
So, the algorithm for finding ICU data is as follows:
- If ICU_DATA is not set, load the data directly from libicudata.so.52.
- If ICU_DATA is set, try to load the data from $ICU_DATA/icudt52l.
- If the requested file is not found or the directory doesn't exist, fall back to libicudata.so.52.
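The lookup order above can be modelled in a few lines. This is a simplified sketch under stated assumptions, not ICU's actual code: it checks only for a loose file in $ICU_DATA/icudt<version><byte ordering>/ and ignores the .dat archive that ICU may also try. The function name icu_resolve_item is hypothetical.

```shell
# Simplified model (an assumption, not ICU's implementation) of where one
# data item, e.g. "zoneinfo64.res", would be taken from.
# $1 = item name, $2 = ICU major version, $3 = byte-order letter (default l)
icu_resolve_item() {
    dir="${ICU_DATA:-}/icudt${2}${3:-l}"
    if [ -n "${ICU_DATA:-}" ] && [ -f "$dir/$1" ]; then
        echo "$dir/$1"               # ICU_DATA is set and the file exists there
    else
        echo "libicudata.so.$2"      # fall back to the data built into the library
    fi
}

# Usage: ICU_DATA=/opt/icu; icu_resolve_item zoneinfo64.res 52
```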
There is a great article (in Russian) about this. In short, strace shows which files libicu requests:
export ICU_DATA=/opt/icu
strace php -f tztest.php
(Note that on Debian Jessie, when strace -o 'output.txt' php -f tztest.php is used instead of the example above, libicu ignores ICU_DATA. It would be interesting to figure out why.)
In the strace log you'll see the following:
stat("/opt/icu/icudt52l/zoneinfo64.res", 0x7ffe20d1a7c0) = -1 ENOENT (No such file or directory)
stat("/opt/icu/icudt52l/timezoneTypes.res", 0x7ffe20d1a120) = -1 ENOENT (No such file or directory)
stat("/opt/icu/icudt52l/metaZones.res", 0x7ffe20d1a450) = -1 ENOENT (No such file or directory)
... and many more
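Reading this log by hand gets tedious, so a small filter can list the files that were looked up under ICU_DATA but not found. The sketch below assumes the stat(...)/open(...) = -1 ENOENT format shown above, with the path as the first quoted string on the line. The function name icu_missing_files is hypothetical. Keep in mind that strace writes its trace to stderr, hence the 2>&1 in the usage line.

```shell
# Hypothetical filter: read strace output on stdin and print, once each,
# the files under the prefix $1 (e.g. /opt/icu) that failed with ENOENT.
icu_missing_files() {
    grep -F "\"$1/" \
        | grep -F 'ENOENT' \
        | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' \
        | sort -u
}

# Usage: ICU_DATA=/opt/icu strace php -f tztest.php 2>&1 | icu_missing_files /opt/icu
```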
According to the official documentation (Updating the Time Zone Data), only four files are needed:
- zoneinfo64.res
- windowsZones.res
- timezoneTypes.res
- metaZones.res
They can be downloaded from the repository: click on the latest year, then 44, then le for little-endian systems, and place those four files into the /opt/icu/icudt52l/ directory. A complete download link is also available.
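Before re-running the test, it is worth confirming that the files actually landed in the right directory. Here is a small sketch. The function name check_icu_files is hypothetical, and it only checks that each file exists as a regular file, not that its contents are valid.

```shell
# Hypothetical helper: $1 = ICU data directory, remaining args = file names.
# Prints each missing file and returns non-zero if any is missing.
check_icu_files() {
    dir="$1"; shift
    status=0
    for f in "$@"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            status=1
        fi
    done
    return "$status"
}

# Usage:
# check_icu_files /opt/icu/icudt52l zoneinfo64.res windowsZones.res timezoneTypes.res metaZones.res
```

The same helper works for the uts46.nrm file used in the IDN fix.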
Check the output of tztest.php: it will now contain "+0300" instead of "+0400". You can also check the strace output to make sure the files are loaded:
stat("/opt/icu/icudt52l/metaZones.res", {st_mode=S_IFREG|0644, st_size=40960, ...}) = 0
open("/opt/icu/icudt52l/metaZones.res", O_RDONLY) = 4
stat("/opt/icu/icudt52l/zoneinfo64.res", {st_mode=S_IFREG|0644, st_size=151872, ...}) = 0
open("/opt/icu/icudt52l/zoneinfo64.res", O_RDONLY) = 4
stat("/opt/icu/icudt52l/timezoneTypes.res", {st_mode=S_IFREG|0644, st_size=20032, ...}) = 0
open("/opt/icu/icudt52l/timezoneTypes.res", O_RDONLY) = 4
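The opposite check, listing the ICU files that were opened successfully, can be sketched the same way. This assumes open(...) or openat(...) lines as above, where a failed call ends in "= -1". Newer systems usually show openat(AT_FDCWD, "..."). The function name icu_loaded_files is hypothetical.

```shell
# Hypothetical filter: read strace output on stdin and print, once each,
# the files under the prefix $1 that open/openat returned a descriptor for.
icu_loaded_files() {
    grep -E '^(\[pid +[0-9]+] )?open(at)?\(' \
        | grep -F "\"$1/" \
        | grep -Ev '= -1 ' \
        | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' \
        | sort -u
}

# Usage: ICU_DATA=/opt/icu strace php -f tztest.php 2>&1 | icu_loaded_files /opt/icu
```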
For the IDN problem, first run strace to see which files libicu requests:
export ICU_DATA=/opt/icu
strace php -f idntest.php
Grep the output for /opt/icu, and here it is:
stat("/opt/icu/icudt52l/uts46.nrm", 0x7fffc812b2c0) = -1 ENOENT (No such file or directory)
This file is "Unicode Character Data (Normalization since ICU 4.4)"; see ICU Data File Formats for a more detailed description.
Where do you get this file? On the Download page, click on the latest ICU version in the ICU4C column (as of mid-2017, version 59 is the latest), then find the link to the repository. The uts46.nrm file you need is in icu4c/source/data/in. Just download it from there (or clone the whole repository, of course) and place it into /opt/icu/icudt52l/. That repository directory contains some pre-compiled Unicode data files, so you won't need to compile the whole ICU database.
Now, if you run the script:
export ICU_DATA=/opt/icu
php -f idntest.php
you'll see the SUCCESS output. Let's check strace:
stat("/opt/icu/icudt52l/uts46.nrm", {st_mode=S_IFREG|0644, st_size=60668, ...}) = 0
open("/opt/icu/icudt52l/uts46.nrm", O_RDONLY) = 4
If you need other, newer parts of the ICU database, it seems you'll have to compile the whole database yourself. If you don't want to do that, you can use the ICU Data Library Customizer tool. However, as of mid-2017, the latest ICU version available in that tool is 57.
Select the ICU version you need (the latest, of course!), click the "Get Data Library" button, then download and extract the zip archive. You'll get a .dat file, say icudt57l.dat. This file is an archive containing all the data.
To see what is inside, make sure the icu-devtools package is installed, then run:
icupkg -l icudt57l.dat
To extract a single file:
icupkg -x uts46.nrm icudt57l.dat
This way you can extract the files you need and place them into the /opt/icu/icudt52l directory on your server.
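To extract several items in one go, you can wrap the icupkg -x <item> <file.dat> invocation shown above in a loop. The sketch below is hypothetical: the function name extract_icu_items is made up, and it assumes icupkg writes each extracted item into the current directory, so it runs icupkg from inside the destination directory.

```shell
# Hypothetical helper: $1 = .dat archive, $2 = destination directory,
# remaining args = item names to extract (as listed by "icupkg -l").
extract_icu_items() {
    dat="$1"; dest="$2"; shift 2
    case "$dat" in /*) ;; *) dat="$PWD/$dat" ;; esac   # make the archive path absolute
    mkdir -p "$dest" || return 1
    for item in "$@"; do
        # Assumption: the extracted item is written relative to the current directory.
        (cd "$dest" && icupkg -x "$item" "$dat") || return 1
    done
}

# Usage: extract_icu_items icudt57l.dat /opt/icu/icudt52l uts46.nrm zoneinfo64.res
```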
Tracing further with strace, it seems that libicu sometimes tries to open the /opt/icu/icudt52l.dat file. I tried feeding it this file (by renaming icudt57l.dat to icudt52l.dat). It didn't work as expected, although strace showed that the file was loaded. The reason could be that the .dat file format differs between ICU versions.