Selecting an Internationalization Framework (GPW10)


Author

Steffen Winkler perl-ws@steffen-winkler.de

Bio

I have been around since 1960.

I've been programming Perl since the end of 2000, first privately and then professionally.

Currently I am working at SIEMENS AG in Erlangen, primarily in the area of web programming.

I have been attending the German Perl Workshop since 2003.

Abstract

Why use Locale::TextDomain when so many frameworks on CPAN use Locale::Maketext?

Following my presentation on DBD::PO in Frankfurt/Main there was a lively discussion, both in Frankfurt and at Erlangen-PM.

There are 2 internationalization frameworks on CPAN, Locale::TextDomain (Perl interface to Uniform Message Translation) and Locale::Maketext (framework for localization).

What are the differences?

Where are the limits?

What I want to talk about today

From source to multilingual application in 2 ways.

No matter which internationalization framework from CPAN you use, you have to live with limitations. A good choice greatly reduces them.

It begins with the application's source code

 print  'You can log out here.';
 printf 'He lives in %s, %s.', $town, $address;
 printf '%d people live here.', $people;
 printf 'These are %d books.', $books;
 printf 'He has %s houses in %s, %s.', $houses, $town, $address;
 printf '%s books are in %s shelves.', $books, $shelves;

PO files - what's that?

PO is an abbreviation for "portable object".

GNU gettext PO files can be used to make programs multilingual.

Along with the original text and its translation the file contains various comments and flags.

MO files are the binary version of PO files.

Rewriting to Locale::Maketext::Simple

Here we use the basic module Locale::Maketext and a module which reads gettext PO/MO files, namely Locale::Maketext::Lexicon::Gettext. Locale::Maketext::Simple exports the function "loc".

 [_n] where n = 1, 2, ...

is the general notation for placeholders. Within [] a function name can be used as a prefix followed by its parameters, separated by ",". "quant", or "*", is the function name for plural processing.

 print loc('You can log out here.');

 print loc(
     'He lives in [_1], [_2].',
     $town,
     $address,
 );

 print loc(
     '[quant,_1,person lives,people live] here.',
     $people,
 );

I have no idea how to write the following phrase with "quant". "quant" is meant for a value followed by its unit, but here the plural form already starts before the value. The problem is that "quant" inserts the value and the following space itself, so "_1" and that space have to be left out of the plural forms.

 print loc(  
     '[myplural,_1,It is _1 book,These are _1 books].',
     # ????????    ^^^^^ ???     ^^^^^^^^^ ???
     $books, 
 );

 print loc(
     'He has [quant,_1,house,houses] in [_2], [_3].',
     $houses,
     $town,
     $address,
 );

 print loc(
     '[quant,_1,book is,books are] in [*,_2,shelf,shelves].',
     $books,
     $shelves,
 );
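For readers who want to try the bracket notation directly, here is a minimal sketch that uses plain Locale::Maketext without Locale::Maketext::Simple and without PO files. The class names MyApp::L10N and MyApp::L10N::en are assumptions made up for this example. The `_AUTO` lexicon entry lets the English source strings serve as their own lexicon.

```perl
use strict;
use warnings;

# Hypothetical project base class; Locale::Maketext itself is a core module.
{
    package MyApp::L10N;
    use parent 'Locale::Maketext';
}

# English lexicon. _AUTO => 1 lets unknown keys be compiled on the fly,
# so the source strings double as the English texts.
{
    package MyApp::L10N::en;
    use parent -norequire, 'MyApp::L10N';
    our %Lexicon = (_AUTO => 1);
}

my $lh = MyApp::L10N->get_handle('en')
    or die 'No language handle available';

# "quant" emits the number, a space, and the matching form.
for my $people (1, 3) {
    print $lh->maketext(
        '[quant,_1,person lives,people live] here.', $people,
    ), "\n";
}

# Two plural decisions in one phrase; "*" is the short name of "quant".
print $lh->maketext(
    '[quant,_1,book is,books are] in [*,_2,shelf,shelves].', 5, 1,
), "\n";
```

In a real application the lexicon would come from PO/MO files via Locale::Maketext::Lexicon, as described above.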

Rewriting to Locale::TextDomain

Locale::TextDomain is part of the libintl-perl distribution. There are several exported functions. Function names follow a simple scheme.

 x for a placeholder,
 n for plural and
 p for context.

The order of parameters, when present:

 Context,
 singular,
 plural,
 number for plural selection and
 finally a hash with placeholder data.

Not all combinations of n, p and x are implemented. If you always use x, even when there are no placeholders, and stick to alphabetical order, then __x, __nx, __px and __npx are left.

 __('msgid')
 __x(
     'msgid',
     name1 => $value1, name2 => $value2, ...
 )
 __n('msgid', 'msgid_plural', $count)
 __nx(
     'msgid', 'msgid_plural', $count,
     name1 => $value1, name2 => $value2, ...
 )
 __xn(
     'msgid', 'msgid_plural',
     $count, name1 => $value1, name2 => $value2, ...
 )
 __p('context', 'msgid')
 __px(
     'context', 'msgid',
     name1 => $value1, name2 => $value2, ...
 )
 __np('context', 'msgid', 'msgid_plural', $count)
 __npx(
     'context', 'msgid', 'msgid_plural', $count,
     name1 => $value1, name2 => $value2, ...
 )

 print __('You can log out here.');

 print __x(
     'He lives in {town}, {address}.',
     town    => $town,
     address => $address,
 );

 print __nx(
     '{num} person lives here.',
     '{num} people live here.',
     $people,
     num => $people,
 );


 print __nx(
     'It is {num} book.',
     'These are {num} books.',
     $books,
     num => $books,
 );

 print __nx(
     'He has {num} house in {town}, {address}.',
     'He has {num} houses in {town}, {address}.',
     $houses,
     num     => $houses,
     town    => $town,
     address => $address,
 );

 print
     __nx(
         '{num} book is',
         '{num} books are',
         $books,
         num => $books,
     ),
     __nx(
         ' in {num} shelf.',
         ' in {num} shelves.',
         $shelves,
         num => $shelves,
     );

What do you see at first glance?

Locale::Maketext has numbered parameters. If there are many, you may confuse them. All the translator knows is that something is included, but not what.

 [_1] is a [_2] in [_3].

Locale::Maketext can handle multiple plural forms in a text phrase.

 [quant,_1,book is,books are] in [*,_2,shelf,shelves].

The text in plural forms (quant) is not automatically translatable because it's contained in a kind of "or" block.

Within this "or" block, placeholders such as _1 are absent. Therefore plural forms that start before the number cannot be represented.

 [myplural,_1,It is _1 book,These are _1 books].

Of course this "myplural" function does not exist.
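Bracket notation calls methods on the language handle, so a project could add such a method to its own handle class. The following is only a sketch under that assumption: "myplural" and the class names are hypothetical and not part of Locale::Maketext. The method has to substitute "_1" itself, because Locale::Maketext only replaces arguments that consist of a placeholder alone. It also still knows only two forms.

```perl
use strict;
use warnings;

{
    package MyApp::L10N;    # hypothetical project base class
    use parent 'Locale::Maketext';

    # Hypothetical helper: pick one of two complete phrases and
    # replace the literal text "_1" in it with the formatted number.
    sub myplural {
        my ($lh, $num, $singular, $plural) = @_;
        my $text = $num == 1 ? $singular : $plural;
        $text =~ s/_1/$lh->numf($num)/e;
        return $text;
    }
}

{
    package MyApp::L10N::en;
    use parent -norequire, 'MyApp::L10N';
    our %Lexicon = (_AUTO => 1);
}

my $lh = MyApp::L10N->get_handle('en')
    or die 'No language handle available';

for my $books (1, 3) {
    print $lh->maketext(
        '[myplural,_1,It is _1 book,These are _1 books].', $books,
    ), "\n";
}
```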

***

Locale::TextDomain has named parameters, which are easier to translate because the translator can understand the meaning of the sentence in spite of the placeholders.

 {name} is a {locality} in {country}.
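To illustrate how named placeholders behave, here is a small stand-in for the expansion step. It is a sketch, not the code of Locale::TextDomain. The function name expand_named and the sample values are assumptions for this example.

```perl
use strict;
use warnings;

# Replace every {name} by the value passed for that name.
# Unknown placeholders are left untouched so that mistakes stay visible.
sub expand_named {
    my ($template, %args) = @_;
    $template =~ s/\{([^{}]+)\}/defined $args{$1} ? $args{$1} : "{$1}"/ge;
    return $template;
}

print expand_named(
    '{name} is a {locality} in {country}.',
    name     => 'Erlangen',
    locality => 'town',
    country  => 'Germany',
), "\n";
```

Because the values are looked up by name, a translator can reorder the placeholders freely within the translated phrase.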

A text phrase containing several plural forms needs to be split up, which means it can no longer be translated automatically.

Things you won't spot immediately

Number of plural forms

Locale::Maketext:

 singular
 singular + plural
 singular + plural + zero

Locale::TextDomain:

 2 in the source language
 arbitrarily many in the target language

The header of each PO/MO file contains an entry called "Plural-Forms". This is a calculation formula written as a C expression, with one exception: "OR" is allowed in place of "||". The formula differs by language and is stored in each PO/MO file accordingly. Locale::Maketext ignores this entry.

German/English:

 "Plural-Forms: nplurals=2; plural=n != 1;\n"

Russian:

 "Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
 " ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"

An example from the Russian language:

 0          books -> книг  (Plural 2)
 1          book  -> книга (Singular)
 2 .. 4     books -> книги (Plural 1)
 5 .. 20    books -> книг  (Plural 2)
 21         books -> книга (Singular)
 22 .. 24   books -> книги (Plural 1)
 25 .. 30   books -> книг  (Plural 2)
 ...
 100        books -> книг  (Plural 2)
 101        books -> книга (Singular)
 102 .. 104 books -> книги (Plural 1)
 105 .. 120 books -> книг  (Plural 2)
 121        books -> книга (Singular)
 122 .. 124 books -> книги (Plural 1)
 125 .. 130 books -> книг  (Plural 2)
 ...
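The Russian formula can be checked by transcribing it into Perl. This is a sketch for illustration only. The function name plural_index_ru is an assumption, and gettext evaluates the header expression itself rather than using code like this.

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

# The Russian "Plural-Forms" expression from the PO header,
# written as Perl. Returns the index of the msgstr to use.
sub plural_index_ru {
    my ($n) = @_;
    return 0 if $n % 10 == 1 && $n % 100 != 11;
    return 1 if $n % 10 >= 2 && $n % 10 <= 4
             && ($n % 100 < 10 || $n % 100 >= 20);
    return 2;
}

# Index 0 = Singular, 1 = Plural 1, 2 = Plural 2.
my @forms = ('книга', 'книги', 'книг');

for my $n (0, 1, 2, 5, 11, 21, 22, 25, 101, 112, 122) {
    printf "%3d -> %s\n", $n, $forms[ plural_index_ru($n) ];
}
```

Note the numbers 11 to 14 and 111 to 114: they end in 1 to 4 but still take the second plural form, which is what the `n%100` conditions express.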

3 plural forms also exist in e.g. Czech, Lithuanian, Polish, Romanian and Slovak. 4 plural forms exist in e.g. Slovenian and Celtic. So in the EU we can get by with 4 plural forms. Arabic has 6 plural forms.

Because Locale::Maketext ignores "Plural-Forms" in PO/MO files, it can only support languages with 2 plural forms, that is, singular and plural, as we know them from German and English. There is a function "quant" which in principle corresponds to a "quant2" (singular + 1st plural), if we ignore the zero form. One could define functions "quant3" to "quant6" for Locale::Maketext. But then the programmer would already need to know which text phrases need 2, 3, 4, 5 or 6 plural forms. Because the programmer does not know this, "quant6" would have to be used everywhere. That's a whole lot of typing.

Position of words in a sentence in different languages

The position of the individual words can differ in different languages e.g. in one language it is

 I have 2 books.

and in another

 2 books I have.

If that is so, then with Locale::Maketext you have to write complete sentences as the plural forms. A programmer writing the English source texts cannot know that. The conflict is thus only discovered during translation.

If you want to avoid the conflict, you always write entire sentences.

But even that doesn't always work, because Locale::Maketext always expects "quant" to be followed by "_1" and then implicitly adds a space and then the text.

Yet what's needed is:

 [myplural,_1,It is _1 book.,These are _1 books.]

But then that's nothing else than Locale::TextDomain.

Comma in plural forms, or the "join and can never split" trap

Because the parts are simply joined with commas as separators, the joined texts themselves must not contain any commas.

Is there any simple quoting mechanism as in Text::CSV? I know of none.

 I need 1 book, computer or notebook to do this.

Here's a dirty workaround using ";".

 I need [*,_1,book; computer or notebook,books; computers or notebooks] to do this.
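One thing worth trying: Locale::Maketext's bracket notation treats "~" as an escape character, and "~," inside a bracket group stands for a literal comma. The sketch below exercises that with plain Locale::Maketext. The class names are assumptions, and whether the escape survives the "%quant(...)" style used in PO files is not verified here.

```perl
use strict;
use warnings;

{
    package MyApp::L10N;    # hypothetical project base class
    use parent 'Locale::Maketext';
}

{
    package MyApp::L10N::en;
    use parent -norequire, 'MyApp::L10N';
    our %Lexicon = (_AUTO => 1);
}

my $lh = MyApp::L10N->get_handle('en')
    or die 'No language handle available';

# "~," keeps the comma inside one plural form instead of
# splitting the form into two arguments.
my $phrase = 'I need [*,_1,book~, computer or notebook,'
           . 'books~, computers or notebooks] to do this.';

print $lh->maketext($phrase, $_), "\n" for 1, 2;
```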

Value and unit may be wrapped

Because value and unit are concatenated with a plain space, a line break may occur between them.

Depending on line length you get

 I have
 1 book.

or

 I have 1
 book.

With Locale::TextDomain you can write:

 I have {num}\N{NO-BREAK SPACE}book.
 I have {num}\N{NO-BREAK SPACE}books.

In Locale::Maketext the space is hard-coded in the module code.

Excerpt from a PO file for Locale::Maketext

 # header
 msgid ""
 msgstr ""
 "...\n"
 "Plural-Forms: nplurals=2; plural=n != 1;\n"
 "..."

 msgid  "You can log out here."
 msgstr "Sie können sich hier abmelden."

 msgid  "He lives in %1, %2."
 msgstr "Er wohnt in %1, %2."

 msgid  "%quant(%1,person lives,people live) here."
 msgstr "%quant(%1,Mensch wohnt,Menschen wohnen) hier."
 
 # a bad workaround (no singular before placeholder)
 msgid  "These are %quant(%1,book,books)."
 msgstr "Das sind %quant(%1,Buch,Bücher)."

 msgid  "%quant(%1,book is,books are) in %quant(%2,shelf,shelves)."
 msgstr "%quant(%1,Buch ist,Bücher sind) in %quant(%2,Regal,Regalen)."

Excerpt from a PO file for Locale::TextDomain

 # header
 msgid ""
 msgstr ""
 "...\n"
 "Plural-Forms: nplurals=2; plural=n != 1;\n"
 "..."

 msgid        "You can log out here."
 msgstr       "Sie können sich hier abmelden."

 msgid        "He lives in {town}, {address}."
 msgstr       "Er wohnt in {town}, {address}."

 msgid        "{num} person lives here."
 msgid_plural "{num} people live here."
 msgstr[0]    "{num} Mensch wohnt hier."
 msgstr[1]    "{num} Menschen wohnen hier."

 msgid        "It is {num} book."
 msgid_plural "These are {num} books."
 msgstr[0]    "Es ist {num} Buch."
 msgstr[1]    "Es sind {num} Bücher."

 msgid        "He has {num} house in {town}, {address}."
 msgid_plural "He has {num} houses in {town}, {address}."
 msgstr[0]    "Er hat {num} Haus in {town}, {address}."
 msgstr[1]    "Er hat {num} Häuser in {town}, {address}."

 msgid        "{num} book is"
 msgid_plural "{num} books are"
 msgstr[0]    "{num} Buch ist"
 msgstr[1]    "{num} Bücher sind"

 msgid        " in {num} shelf."
 msgid_plural " in {num} shelves."
 msgstr[0]    " in {num} Regal."
 msgstr[1]    " in {num} Regalen."

PO file for English/Russian translation

for Locale::Maketext

 # header
 msgid ""
 msgstr ""
 "...\n"
 "Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
 " ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
 "..."

 msgid  "You can log out here."
 msgstr "Выход из системы."

 # The town name should be inflected here: 
 # Москва -> в Москве
 # Киев   -> в Киеве
 # Мытищи -> в Мытищах (irregular)
 msgid  "He lives in %1, %2."
 msgstr "Он живет в %1, %2."

 # This is not correctly translatable.
 # The plural form for number 2 to 4 (человека живут) is not storable.
 msgid  "%quant(%1,person lives,people live) here."
 msgstr "%quant(%1,человек живет,человек живут) здесь."

 # This is not correctly translatable.
 # The plural form for number 2 to 4 (дома) is not storable.
 msgid  "He has %quant(%1,house,houses) in %2, %3."
 msgstr "У него %quant(%1,дом,домов) в %2, %3."
 
 # This is not correctly translatable.
 # The plural form for number 2 to 4 (книги) is not storable.
 msgid  "%quant(%1,book is,books are) in %quant(%2,shelf,shelves)."
 msgstr "%quant(%1,книга,книг) на %quant(%2,полке,полках)."

for Locale::TextDomain

 # header
 msgid ""
 msgstr ""
 "...\n"
 "Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
 " ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
 "..."

 msgid        "You can log out here."
 msgstr       "Выход из системы."

 # The town name should be inflected here: 
 # Москва -> в Москве
 # Киев   -> в Киеве
 # Мытищи -> в Мытищах (irregular)
 msgid        "He lives in {town}, {address}."
 msgstr       "Он живет в {town}, {address}."

 msgid        "{num} person lives here."
 msgid_plural "{num} people live here."
 msgstr[0]    "{num} человек живет здесь."
 msgstr[1]    "{num} человека живут здесь."
 msgstr[2]    "{num} человек живут здесь."

 msgid        "It is {num} book."
 msgid_plural "These are {num} books."
 msgstr[0]    "Это {num} книга."
 msgstr[1]    "Это {num} книги."
 msgstr[2]    "Это {num} книг."

 msgid        "He has {num} house in {town}, {address}."
 msgid_plural "He has {num} houses in {town}, {address}."
 msgstr[0]    "У него {num} дом в {town}, {address}."
 msgstr[1]    "У него {num} дома в {town}, {address}."
 msgstr[2]    "У него {num} домов в {town}, {address}."

 # Translate this phrase together with the next one.
 msgid        "{num} book is"
 msgid_plural "{num} books are"
 msgstr[0]    "{num} книга"
 msgstr[1]    "{num} книги"
 msgstr[2]    "{num} книг"

 # Translate this phrase together with the previous one.
 msgid        " in {num} shelf."
 msgid_plural " in {num} shelves."
 msgstr[0]    " на {num} полке."
 msgstr[1]    " на {num} полках."
 msgstr[2]    " на {num} полках."

Inflecting "in {town}"

 Berlin    -> Берлин
 in Berlin -> в Берлине

If you want this, you also need to translate the placeholder values and only then insert them.

That's doable, but it makes it impossible to automatically translate the phrase into which the value is inserted. Moreover, that phrase then becomes hard to translate manually as well, because some of the context is lost again.

You can only tinker.

neuter/masculine/feminine singular/plural

Inflection of nouns:

 masculine singular -> Arzt
 feminine  singular -> Ärztin
 masculine plural   -> Ärzte
 feminine  plural   -> Ärztinnen

Inflection of verbs:

 Mascha ist zur Schule gegangen. -> Маша пошла в школу.
 Petja ist zur Schule gegangen.  -> Петя пошёл в школу.

Context

 msgid   "design"
 msgstr  "Design"

 msgctxt "automobile"
 msgid   "design"
 msgstr  "Konstruktion"

 msgctxt "verb"
 msgid   "design"
 msgstr  "zeichnen"
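In compiled MO files the context is stored as part of the lookup key: context and msgid are joined with an EOT character (0x04). The sketch below mimics that with a plain hash. The catalog contents come from the example above, while the function name lookup and the hash layout are assumptions for illustration.

```perl
use strict;
use warnings;

my $SEP = "\x04";    # EOT, separates context and msgid in the key

# In-memory stand-in for a German catalog.
my %catalog = (
    'design'                           => 'Design',
    join($SEP, 'automobile', 'design') => 'Konstruktion',
    join($SEP, 'verb',       'design') => 'zeichnen',
);

# Look up a msgid, optionally within a context.
# Falls back to the msgid itself if no translation exists.
sub lookup {
    my ($context, $msgid) = @_;
    my $key = defined $context ? join($SEP, $context, $msgid) : $msgid;
    return exists $catalog{$key} ? $catalog{$key} : $msgid;
}

print lookup(undef,        'design'), "\n";
print lookup('automobile', 'design'), "\n";
print lookup('verb',       'design'), "\n";
```

With Locale::TextDomain the same distinction is made by calling `__p('context', 'msgid')` instead of `__('msgid')`.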

Locale::Maketext::TPJ13 - Article by Sean M. Burke about software localization

He writes:

Since I wrote this article in 1998, I now see that the gettext docs are now trying more to come to terms with plurality. Whether useful conclusions have come from it is another question altogether. -- SMB, May 2001

It is many years later now and a jack of all trades still does not exist.

Software for translation agencies

In the current case known to me the translation agency uses the software "SDL Trados". Like other similar software it is based on a "translation memory". This works very well for static documents.

Such software seems less suited to the dynamic aspects that plural forms and context bring to software localization. It assumes a 1:1 relation between source text and translation. Therefore one has to expect that the relatively small portion of texts needing context or plural forms cannot be handled well with software support.

In the current case the POT file had to be converted into XML and the target language had to be filled from the source language. This kind of work would normally be expected from the translation agency.

Recommendation: Have a translation done with a smaller test file. This should contain all the typical constructs. Do this per language, because subcontractors may be involved.

