Steffen Winkler perl-ws@steffen-winkler.de
I was born in 1960.
I've been programming Perl since the end of 2000, first privately and then professionally.
Currently I am working at SIEMENS AG in Erlangen, primarily in the area of web programming.
I have been attending the German Perlworkshop since 2003.
Why use Locale::TextDomain when so many frameworks on CPAN use Locale::Maketext?
Following my presentation on DBD::PO in Frankfurt/Main, there was a lively discussion, both in Frankfurt and at Erlangen-PM.
CPAN offers two internationalization frameworks: Locale::TextDomain (Perl interface to Uniform Message Translation) and Locale::Maketext (framework for localization).
What are the differences?
Where are the limits?
From source to multilingual application in 2 ways.
No matter which internationalization framework from CPAN you use, you have to live with limitations. A good choice greatly reduces them.
print 'You can log out here.';
printf 'He lives in %s, %s.', $town, $address;
printf '%d people live here.', $people;
printf 'These are %d books.', $books;
printf 'He has %s houses in %s, %s.', $houses, $town, $address;
printf '%s books are on %s shelves.', $books, $shelves;
PO is an abbreviation for "portable object".
GNU gettext PO files can be used to make programs multilingual.
Along with the original text and its translation the file contains various comments and flags.
MO files are the binary version of PO files.
Here we use the basic module Locale::Maketext and a module which reads gettext PO/MO files, namely Locale::Maketext::Lexicon::Gettext. Locale::Maketext::Simple exports the function "loc".
[_n] where n = 1, 2, ...
is the general notation for placeholders. Within [] a function name can be used as a prefix followed by its parameters, separated by ",". "quant", or "*", is the function name for plural processing.
print loc('You can log out here.');
print loc(
'He lives in [_1], [_2].',
$town,
$address,
);
print loc(
'[quant,_1,person lives,people live] here.',
$people,
);
I don't know how to write the following phrase with "quant". With "quant" you write something along the lines of a value followed by a unit, but here the plural-dependent text starts before the value. The problem is that the forms inside "quant" must omit "_1" and the space after it, because "quant" inserts both by itself.
print loc(
'[myplural,_1,It is _1 book,These are _1 books].',
# ???????? ^^^^^ ??? ^^^^^^^^^ ???
$books,
);
print loc(
'He has [quant,_1,house,houses] in [_2], [_3].',
$houses,
$town,
$address,
);
print loc(
'[quant,_1,book is,books are] on [*,_2,shelf,shelves].',
$books,
$shelves,
);
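The loc() calls above rely on Locale::Maketext::Simple and a lexicon read from PO/MO files. To try the bracket notation in isolation, here is a minimal sketch that uses only the core module Locale::Maketext. The class names MyApp::L10N and MyApp::L10N::en are hypothetical, and `_AUTO => 1` makes every unknown key serve as its own English translation.

```perl
use strict;
use warnings;

# Hypothetical project base class, derived from the core module Locale::Maketext.
package MyApp::L10N;
use parent 'Locale::Maketext';

# Hypothetical English lexicon. _AUTO => 1: an unknown key is compiled and
# used as its own translation, so the source language needs no entries.
package MyApp::L10N::en;
use parent -norequire, 'MyApp::L10N';
our %Lexicon = (_AUTO => 1);

package main;

my $lh = MyApp::L10N->get_handle('en') or die "no language handle\n";

# [_1], [_2]: numbered placeholders
my $lives_in = $lh->maketext('He lives in [_1], [_2].', 'Erlangen', 'Main Street 1');

# [quant,_1,singular,plural]: the number, a space, then the matching form
my $one  = $lh->maketext('[quant,_1,person lives,people live] here.', 1);
my $many = $lh->maketext('[quant,_1,person lives,people live] here.', 3);

print "$lives_in\n$one\n$many\n";
```

The space between the number and the form comes from quant itself, not from the phrase.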
Locale::TextDomain is part of the libintl-perl distribution. There are several exported functions. Function names follow a simple scheme.
x for a placeholder,
n for plural and
p for context.
The order of parameters, when present:
Context,
singular,
plural,
number for plural selection and
finally a hash with placeholder data.
Not all combinations of n, p and x are implemented. If you always use the x variant, even when there is no placeholder, and stick to alphabetical order, then __x, __nx, __px and __npx remain.
__('msgid')
__x(
'msgid',
name1 => $value1, name2 => $value2, ...
)
__n('msgid', 'msgid_plural', $count)
__nx(
'msgid', 'msgid_plural', $count,
name1 => $value1, name2 => $value2, ...
)
__xn(
'msgid', 'msgid_plural',
$count, name1 => $value1, name2 => $value2, ...
)
__p('context', 'msgid')
__px(
'context', 'msgid',
name1 => $value1, name2 => $value2, ...
)
__np('context', 'msgid', 'msgid_plural', $count)
__npx(
'context', 'msgid', 'msgid_plural', $count,
name1 => $value1, name2 => $value2, ...
)
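To make the {name} notation concrete, here is a small pure-Perl sketch of the substitution step that __x performs. It is not Locale::TextDomain's own code: the helper name expand_placeholders is hypothetical, and leaving unknown placeholders in place is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: replace {name} with the value passed as name => value.
# Placeholders without a value are left untouched (assumption of this sketch).
sub expand_placeholders {
    my ($text, %vars) = @_;
    $text =~ s/\{(\w+)\}/exists $vars{$1} ? $vars{$1} : "{$1}"/ge;
    return $text;
}

my $sentence = expand_placeholders(
    'He lives in {town}, {address}.',
    town    => 'Erlangen',
    address => 'Main Street 1',
);
print "$sentence\n";    # He lives in Erlangen, Main Street 1.

# A translation may reorder named placeholders freely.
my $reordered = expand_placeholders(
    '{address}, {town}: he lives here.',
    town    => 'Erlangen',
    address => 'Main Street 1',
);
print "$reordered\n";
```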
print __('You can log out here.');
print __x(
'He lives in {town}, {address}.',
town => $town,
address => $address,
);
print __nx(
'{num} person lives here.',
'{num} people live here.',
$people,
num => $people,
);
print __nx(
'It is {num} book.',
'These are {num} books.',
$books,
num => $books,
);
print __nx(
'He has {num} house in {town}, {address}.',
'He has {num} houses in {town}, {address}.',
$houses,
num => $houses,
town => $town,
address => $address,
);
print
__nx(
'{num} book is',
'{num} books are',
$books,
num => $books,
),
__nx(
' on {num} shelf.',
' on {num} shelves.',
$shelves,
num => $shelves,
);
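When no translation is found, gettext returns msgid for a count of 1 and msgid_plural otherwise. The sketch below imitates that fallback together with the placeholder expansion, to show how the two __nx calls of the last example combine. The function name nx is hypothetical; the real __nx also looks up the catalog and applies its Plural-Forms formula.

```perl
use strict;
use warnings;

# Hypothetical stand-in for __nx when no catalog entry exists:
# count == 1 -> msgid, otherwise msgid_plural; then {name} expansion.
sub nx {
    my ($msgid, $msgid_plural, $count, %vars) = @_;
    my $text = $count == 1 ? $msgid : $msgid_plural;
    $text =~ s/\{(\w+)\}/exists $vars{$1} ? $vars{$1} : "{$1}"/ge;
    return $text;
}

my ($books, $shelves) = (3, 1);
my $sentence =
    nx('{num} book is', '{num} books are', $books, num => $books)
  . nx(' on {num} shelf.', ' on {num} shelves.', $shelves, num => $shelves);
print "$sentence\n";    # 3 books are on 1 shelf.
```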
Locale::Maketext has numbered parameters. If there are many, they are easily mixed up. All the translator knows is that something is inserted, but not what.
[_1] is a [_2] in [_3].
Locale::Maketext can handle multiple plural forms in a text phrase.
[quant,_1,book is,books are] on [*,_2,shelf,shelves].
The text in plural forms (quant) is not automatically translatable because it's contained in a kind of "or" block.
Within this "or" block, placeholders such as _1 are absent, so plural forms that start before the number cannot be represented.
[myplural,_1,It is _1 book,These are _1 books].
Of course this "myplural" function does not exist.
***
Locale::TextDomain has named parameters, which are easier to translate because the translator can understand the meaning of the sentence in spite of the placeholders.
{name} is a {locality} in {country}.
A text phrase containing several plural forms has to be split, which means it can no longer be translated automatically.
Locale::Maketext:
singular
singular + plural
singular + plural + zero
Locale::TextDomain:
2 in the source language
arbitrarily many in the target language
The header of each PO/MO file contains an entry called "Plural-Forms". It is a formula written in C syntax, with one exception: "OR" is allowed in place of "||". Its value differs from language to language. Locale::Maketext ignores this entry.
German/English:
"Plural-Forms: nplurals=2; plural=n != 1;\n"
Russian:
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
An example from the Russian language:
0 books -> книг (Plural 2)
1 book -> книга (Singular)
2 .. 4 books -> книги (Plural 1)
5 .. 20 books -> книг (Plural 2)
21 books -> книга (Singular)
22 .. 24 books -> книги (Plural 1)
25 .. 30 books -> книг (Plural 2)
...
100 books -> книг (Plural 2)
101 books -> книга (Singular)
102 .. 104 books -> книги (Plural 1)
105 .. 120 books -> книг (Plural 2)
121 books -> книга (Singular)
122 .. 124 books -> книги (Plural 1)
125 .. 130 books -> книг (Plural 2)
...
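The table can be checked mechanically. For the two headers shown above, the C expression after plural= happens to be valid Perl once n is written as $n, so this sketch turns it into a Perl closure with a string eval. The function name plural_function is hypothetical, and the character check is only a rough guard: eval runs code, so a real implementation must not evaluate untrusted headers this way.

```perl
use strict;
use warnings;

# Hypothetical helper: build a closure n -> plural index from a Plural-Forms header.
# Assumes the expression uses only n, integers, parentheses and C operators
# that Perl shares (?:, &&, ||, %, ==, !=, <, <=, >, >=).
sub plural_function {
    my ($header) = @_;
    my ($nplurals, $expr) =
        $header =~ /nplurals\s*=\s*(\d+)\s*;\s*plural\s*=\s*([^;]+)/
        or die "no Plural-Forms found\n";
    die "unexpected characters in plural expression\n"
        if $expr =~ /[^n\d\s%!=<>&|?:()]/;
    (my $code = $expr) =~ s/\bn\b/\$n/g;                # n -> $n
    my $sub = eval "sub { my \$n = shift; 0 + ($code) }" or die $@;
    return ($nplurals, $sub);
}

my ($ru_n, $ru) = plural_function(
    'Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11'
  . ' ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;'
);
my ($de_n, $de) = plural_function('Plural-Forms: nplurals=2; plural=n != 1;');

for my $n (0, 1, 2, 5, 11, 21, 22, 25, 101, 111, 112, 121, 125) {
    printf "%3d books -> Russian form %d, German form %d\n",
        $n, $ru->($n), $de->($n);
}
```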
Three plural forms also exist in, e.g., Czech, Lithuanian, Polish, Romanian and Slovak; four exist in, e.g., Slovenian and Celtic. So within the EU we can get by with four plural forms. Arabic has six plural forms.
Because Locale::Maketext ignores "Plural-Forms" in PO/MO files, it can only support languages with two plural forms, that is, singular and plural, as we know them from German and English. Its function "quant" in principle corresponds to a "quant2" (singular + 1st plural), if we ignore the zero form. One could define functions "quant3" to "quant6" for Locale::Maketext, but then the programmer would need to know in advance which text phrases need 2, 3, 4, 5 or 6 plural forms. Because he cannot know that, he would always have to use "quant6". That's a whole lot of typing.
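As an illustration of that idea, here is a quant3 for Russian. Bracket notation calls a method of the language handle by name, so a lexicon class can add one. A sketch with the core module Locale::Maketext; the class names, the method quant3 and the lexicon entry are hypothetical, and the Russian forms are transliterated (kniga, knigi, knig) only to keep the example ASCII.

```perl
use strict;
use warnings;

package MyApp::L10N;                       # hypothetical project base class
use parent 'Locale::Maketext';

package MyApp::L10N::ru;                   # hypothetical Russian lexicon
use parent -norequire, 'MyApp::L10N';

our %Lexicon = (
    # English key with two forms -> Russian value with three forms
    '[quant,_1,book,books]' => '[quant3,_1,kniga,knigi,knig]',
);

# Hypothetical method: number, a space, then one of three forms chosen
# with the Russian Plural-Forms formula.
sub quant3 {
    my ($handle, $num, @forms) = @_;
    my $index = $num % 10 == 1 && $num % 100 != 11 ? 0
              : $num % 10 >= 2 && $num % 10 <= 4
                && ($num % 100 < 10 || $num % 100 >= 20) ? 1
              : 2;
    return $handle->numf($num) . ' ' . $forms[$index];
}

package main;

my $lh = MyApp::L10N->get_handle('ru') or die "no language handle\n";
print $lh->maketext('[quant,_1,book,books]', $_), "\n" for 1, 3, 5, 21, 22;
```

Note that the translator has to replace quant with quant3 in the lexicon value, so the programmer's source text stays unaware of the third form.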
The position of the individual words can differ in different languages e.g. in one language it is
I have 2 books.
and in another
2 books I have.
If that is so, then with Locale::Maketext you have to write complete sentences as the plural forms. The English-native programmer cannot know that. The conflict is thus only discovered during translation.
If you want to avoid the conflict, you always write entire sentences.
But even that doesn't always work, because Locale::Maketext always expects "quant" to be followed by "_1" and then implicitly adds a space and then the text.
Yet what's needed is:
[myplural,_1,It is _1 book.,These are _1 books.]
But then that's no different than Locale::TextDomain.
Because commas separate the arguments, the texts inside a bracket group cannot contain commas, for example in enumerations.
Is there any simple quoting mechanism as in Text::CSV? I know of none.
I need 1 book, computer or notebook to do this.
Here's a dirty workaround using ";".
I need [*,_1,book; computer or notebook,books; computers or notebooks] to do this.
Because value and unit are joined with an ordinary space, a line break may occur between them.
Depending on line length you get
I have
1 book.
or
I have 1
book.
With Locale::TextDomain you can write:
I have {num}\N{NO-BREAK SPACE}book.
I have {num}\N{NO-BREAK SPACE}books.
In Locale::Maketext the space is hard-coded into the module.
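A quick check of the difference: after expansion, the character between number and unit is U+00A0, which renderers do not break at, whereas quant always inserts an ordinary space (U+0020). The substitution below is a hand-written stand-in for the expansion done by __nx, not the library call.

```perl
use strict;
use warnings;
use charnames ':full';    # needed for \N{...} names on Perls older than 5.16

my $template = "I have {num}\N{NO-BREAK SPACE}books.";
(my $text = $template) =~ s/\{num\}/2/;

# Character 8 is the one right after the number "2".
printf "U+%04X\n", ord substr($text, 8, 1);
print $text =~ /2\x{A0}books/ ? "no break between value and unit\n"
                              : "breakable\n";
```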
# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"..."
msgid "You can log out here."
msgstr "Sie können sich hier abmelden."
msgid "He lives in %1, %2."
msgstr "Er wohnt in %1, %2."
msgid "%quant(%1,person lives,people live) here."
msgstr "%quant(%1,Mensch wohnt,Leute wohnen) hier."
# a bad workaround (no singular before placeholder)
msgid "These are %quant(%1,book,books)."
msgstr "Das sind %quant(%1,Buch,Bücher)."
msgid "%quant(%1,book is,books are) on %quant(%2,shelf,shelves)."
msgstr "%quant(%1,Buch ist,Bücher sind) in %quant(%2,Regal,Regalen)."
# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"..."
msgid "You can log out here."
msgstr "Sie können sich hier abmelden."
msgid "He lives in {town}, {address}."
msgstr "Er wohnt in {town}, {address}."
msgid "{num} person lives here."
msgid_plural "{num} people live here."
msgstr[0] "{num} Mensch wohnt hier."
msgstr[1] "{num} Leute wohnen hier."
msgid "It is {num} book."
msgid_plural "These are {num} books."
msgstr[0] "Es ist {num} Buch."
msgstr[1] "Es sind {num} Bücher."
msgid "He has {num} house in {town}, {address}."
msgid_plural "He has {num} houses in {town}, {address}."
msgstr[0] "Er hat {num} Haus in {town}, {address}."
msgstr[1] "Er hat {num} Häuser in {town}, {address}."
msgid "{num} book is"
msgid_plural "{num} books are"
msgstr[0] "{num} Buch ist"
msgstr[1] "{num} Bücher sind"
msgid " on {num} shelf."
msgid_plural " on {num} shelves."
msgstr[0] " in {num} Regal."
msgstr[1] " in {num} Regalen."
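To see how such entries are structured, here is a minimal PO reader for just the subset used in these examples: msgctxt, msgid, msgid_plural, msgstr, msgstr[N], continuation lines and comments. It is a hypothetical sketch, not a full PO parser; escapes other than \n, \t, \" and \\ and obsolete "#~" entries are not handled.

```perl
use strict;
use warnings;

# Undo the C-style escapes that occur in the examples.
sub po_unescape {
    my ($s) = @_;
    my %map = (n => "\n", t => "\t", '"' => '"', '\\' => '\\');
    $s =~ s/\\(.)/exists $map{$1} ? $map{$1} : $1/ge;
    return $s;
}

# Hypothetical minimal PO reader: returns one hash per entry,
# keyed by msgctxt, msgid, msgid_plural, msgstr or msgstr[N].
sub parse_po {
    my ($text) = @_;
    my (@entries, %entry, $key);
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:#|$)/;    # comments and blank lines
        if ($line =~ /^(msgctxt|msgid_plural|msgid|msgstr(?:\[\d+\])?)\s+"(.*)"\s*$/) {
            my ($keyword, $string) = ($1, $2);
            # msgctxt or msgid after a complete msgid starts the next entry
            if ($keyword =~ /^msg(?:ctxt|id)$/ && exists $entry{msgid}) {
                push @entries, {%entry};
                %entry = ();
            }
            $key = $keyword;
            $entry{$key} = po_unescape($string);
        }
        elsif (defined $key && $line =~ /^\s*"(.*)"\s*$/) {
            $entry{$key} .= po_unescape($1);    # continuation line
        }
        else {
            die "cannot parse PO line: $line\n";
        }
    }
    push @entries, {%entry} if exists $entry{msgid};
    return @entries;
}

my $po = <<'PO';
# header
msgid ""
msgstr ""
"Plural-Forms: nplurals=2; plural=n != 1;\n"

msgid "{num} person lives here."
msgid_plural "{num} people live here."
msgstr[0] "{num} Mensch wohnt hier."
msgstr[1] "{num} Leute wohnen hier."
PO

my @entries = parse_po($po);
print scalar(@entries), " entries\n";
print "$entries[1]{'msgstr[1]'}\n";
```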
# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
"..."
msgid "You can log out here."
msgstr "Выйти из системы."
# Declension of the town name would be needed here:
# Москва -> в Москве
# Киев -> в Киеве
# Мытищи -> в Мытищах (not regular)
msgid "He lives in %1, %2."
msgstr "Он живёт в %1, %2."
# This is not correctly translatable.
# The plural form for numbers 2 to 4 (человека живут) cannot be stored.
msgid "%quant(%1,person lives,people live) here."
msgstr "%quant(%1,человек живёт,человек живут) здесь."
# This is not correctly translatable.
# The plural form for numbers 2 to 4 (дома) cannot be stored.
msgid "He has %quant(%1,house,houses) in %2, %3."
msgstr "У него %quant(%1,дом,домов) в %2, %3."
# This is not correctly translatable.
# The plural form for numbers 2 to 4 (книги) cannot be stored.
msgid "%quant(%1,book is,books are) on %quant(%2,shelf,shelves)."
msgstr "%quant(%1,книга,книг) на %quant(%2,полке,полках)."
# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
"..."
msgid "You can log out here."
msgstr "Выйти из системы."
# Declension of the town name would be needed here:
# Москва -> в Москве
# Киев -> в Киеве
# Мытищи -> в Мытищах (not regular)
msgid "He lives in {town}, {address}."
msgstr "Он живёт в {town}, {address}."
msgid "{num} person lives here."
msgid_plural "{num} people live here."
msgstr[0] "{num} человек живёт здесь."
msgstr[1] "{num} человека живут здесь."
msgstr[2] "{num} человек живут здесь."
msgid "It is {num} book."
msgid_plural "These are {num} books."
msgstr[0] "Это {num} книга."
msgstr[1] "Это {num} книги."
msgstr[2] "Это {num} книг."
msgid "He has {num} house in {town}, {address}."
msgid_plural "He has {num} houses in {town}, {address}."
msgstr[0] "У него {num} дом в {town}, {address}."
msgstr[1] "У него {num} дома в {town}, {address}."
msgstr[2] "У него {num} домов в {town}, {address}."
# Translate this phrase together with the next one.
msgid "{num} book is"
msgid_plural "{num} books are"
msgstr[0] "{num} книга"
msgstr[1] "{num} книги"
msgstr[2] "{num} книг"
# Translate this phrase together with the previous one.
msgid " on {num} shelf."
msgid_plural " on {num} shelves."
msgstr[0] " на {num} полке."
msgstr[1] " на {num} полках."
msgstr[2] " на {num} полках."
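Putting the pieces together for the house example: the Plural-Forms formula picks the index, msgstr[index] is taken from the entry, and then the named placeholders are filled in. A hypothetical sketch with the entry hard-coded as a Perl hash rather than read from a PO/MO file; the names ru_plural and translate_n are assumptions, not library functions.

```perl
use strict;
use warnings;
use utf8;                                    # Cyrillic literals in this file
binmode STDOUT, ':encoding(UTF-8)';

# Russian Plural-Forms formula from the header above.
sub ru_plural {
    my ($n) = @_;
    return $n % 10 == 1 && $n % 100 != 11 ? 0
         : $n % 10 >= 2 && $n % 10 <= 4 && ($n % 100 < 10 || $n % 100 >= 20) ? 1
         : 2;
}

# The entry as it appears in the Russian PO file.
my %entry = (
    msgid        => 'He has {num} house in {town}, {address}.',
    msgid_plural => 'He has {num} houses in {town}, {address}.',
    msgstr       => [
        'У него {num} дом в {town}, {address}.',
        'У него {num} дома в {town}, {address}.',
        'У него {num} домов в {town}, {address}.',
    ],
);

# Hypothetical helper: choose the plural form, then expand {name} placeholders.
sub translate_n {
    my ($entry, $count, %vars) = @_;
    my $text = $entry->{msgstr}[ ru_plural($count) ];
    $text =~ s/\{(\w+)\}/exists $vars{$1} ? $vars{$1} : "{$1}"/ge;
    return $text;
}

for my $houses (1, 3, 5) {
    print translate_n(\%entry, $houses,
        num => $houses, town => 'Erlangen', address => 'Main Street 1'), "\n";
}
```

The town is inserted as is; its declension is exactly the problem described next.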
Berlin -> Берлин
in Berlin -> в Берлине
If you want this, you need to also translate placeholder values and only then insert them.
That's doable, but it makes it impossible to automatically translate the phrase into which the value is inserted. Moreover, that phrase is then also hard to translate manually, because again some of the context is lost.
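One way to tinker, as a sketch: keep a hand-maintained table of ready-made prepositional phrases per town, translate the value first and insert it afterwards. The msgid "He lives {in_town}." and the table are hypothetical. The translator now only sees {in_town}, which shows the loss of context described above.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical table: town -> complete Russian phrase "in <town>" (declined by hand).
my %in_town_ru = (
    'Berlin' => 'в Берлине',
    'Kiev'   => 'в Киеве',
    'Moscow' => 'в Москве',
);

# Hypothetical translation of the msgid 'He lives {in_town}.'
my $translation = 'Он живёт {in_town}.';

for my $town (sort keys %in_town_ru) {
    (my $text = $translation) =~ s/\{in_town\}/$in_town_ru{$town}/;
    print "$text\n";
}
```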
You can only tinker.
Declension of nouns (German "doctor"):
masculine singular -> Arzt
feminine singular -> Ärztin
masculine plural -> Ärzte
feminine plural -> Ärztinnen
Conjugation of verbs:
Masha went to school. -> Маша пошла в школу.
Petya went to school. -> Петя пошёл в школу.
msgid "design"
msgstr "Design"
msgctxt "automobile"
msgid "design"
msgstr "Konstruktion"
msgctxt "verb"
msgid "design"
msgstr "zeichnen"
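In the binary MO format, gettext stores a context by prefixing the msgid with the context and the EOT character (\x04). The following hypothetical sketch models a catalog as a Perl hash keyed that way and shows the lookup that __p performs conceptually: with a context, only the entry for that context counts, and if it is missing, the untranslated msgid is returned.

```perl
use strict;
use warnings;

my $EOT = "\x04";    # separator between msgctxt and msgid in MO files

# Hypothetical in-memory catalog built from the three entries above.
my %catalog = (
    'design'                          => 'Design',
    'automobile' . $EOT . 'design'    => 'Konstruktion',
    'verb' . $EOT . 'design'          => 'zeichnen',
);

# Hypothetical lookup: with a context, only the context-specific entry is used;
# if nothing is found, the untranslated msgid is returned.
sub lookup {
    my ($context, $msgid) = @_;
    my $key = defined $context ? $context . $EOT . $msgid : $msgid;
    return exists $catalog{$key} ? $catalog{$key} : $msgid;
}

print lookup(undef,        'design'), "\n";    # Design
print lookup('automobile', 'design'), "\n";    # Konstruktion
print lookup('verb',       'design'), "\n";    # zeichnen
print lookup('fashion',    'design'), "\n";    # design (no entry for this context)
```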
Sean M. Burke writes:
Since I wrote this article in 1998, I now see that the gettext docs are now trying more to come to terms with plurality. Whether useful conclusions have come from it is another question altogether. --- SMB, May 2001
Many years later, an all-in-one solution still does not exist.
In the case I currently know of, the translation agency uses the software "SDL Trados". Like similar software, it is based on a "translation memory". This works very well for static documents.
For the dynamic content that plural forms and context bring into software localization, such software seems less suited, because it assumes a 1:1 relation between source text and translation. Therefore one has to expect that the relatively small portion needing context or plural forms cannot be handled well with the aid of such software.
In this case, the POT file had to be converted into XML, and the target language had to be pre-filled from the source language. This kind of work would normally be expected from the translation agency.
Recommendation: Have a translation done with a smaller test file. This should contain all the typical constructs. Do this per language because subcontractors may be involved.
GNU gettext
wikipedia http://en.wikipedia.org/wiki/Gettext
gettext homepage http://www.gnu.org/software/gettext/gettext.html
Singular, Plural, Dual, Trial, Quadral
wikipedia - dual http://en.wikipedia.org/wiki/Dual_%28grammatical_number%29
wikipedia - all forms http://en.wikipedia.org/wiki/Sursurunga_language
sourceforge - which language - which plural form http://translate.sourceforge.net/wiki/l10n/pluralforms
CPAN module Locale::Maketext
CPAN module Locale::Maketext::Simple
obsolete article by Sean M. Burke about software localization
CPAN module Locale::TextDomain
Thanks for the support, the many ideas, examples and corrections.
Nikolai Prokoschenko http://rassie.org/
Nikolai Prokoschenko - On the state of i18n in Perl http://rassie.org/archives/247