@ap
Last active January 13, 2023 11:35
Selecting an Internationalization Framework (GPW10)


Author

Steffen Winkler perl-ws@steffen-winkler.de

Bio

I’ve existed since 1960.

I've been programming Perl since late 2000, first privately and then professionally.

Currently I work for SIEMENS AG in Erlangen, primarily in the area of web programming.

I have been attending the German Perlworkshop since 2003.

Abstract

Why use Locale::TextDomain when so many frameworks on CPAN use Locale::Maketext?

Following my presentation on DBD::PO in Frankfurt/Main there was a lively discussion, both in Frankfurt and at Erlangen-PM.

There are 2 internationalization frameworks on CPAN, Locale::TextDomain (Perl interface to Uniform Message Translation) and Locale::Maketext (framework for localization).

What are the differences?

What are the limitations?

What I want to talk about today

From source to multilingual application in 2 ways.

No matter which internationalization framework from CPAN you use, you have to live with limitations. A good choice greatly reduces them.

It begins with the application's source code

print  'You can log out here.';
printf 'He lives in %s, %s.', $town, $address;
printf '%d people live here.', $people;
printf 'These are %d books.', $books;
printf 'He has %s houses in %s, %s.', $houses, $town, $address;
printf '%s books are in %s shelves.', $books, $shelves;

PO files - what are they?

PO is an abbreviation for "portable object".

GNU gettext PO files can be used to make programs multilingual.

Along with the original text and its translation the file contains various comments and flags.

MO files are the binary version of PO files.

Rewriting to Locale::Maketext::Simple

Here we use the basic module Locale::Maketext together with a module which reads gettext PO/MO files. It is called Locale::Maketext::Lexicon::Gettext. Locale::Maketext::Simple exports the function "loc".

[_n] where n = 1, 2, ...

is the general notation for placeholders. Within [] a function name can be used as a prefix followed by its parameters. They are separated by ",". quant, abbreviated *, is the name of the function for plural processing.

print loc('You can log out here.');

print loc(
    'He lives in [_1], [_2].',
    $town,
    $address,
);

print loc(
    '[quant,_1,person lives,people live] here.',
    $people,
);

I have no idea how to write the following phrase with "quant". With "quant" you write something along the lines of value followed by unit. But here the plural form starts before the value. The problem is that "quant" requires omitting "_1" in the plural forms and also omitting the following space.

print loc(  
    '[myplural,_1,It is _1 book,These are _1 books].',
    # ????????    ^^^^^ ???     ^^^^^^^^^ ???
    $books, 
);

print loc(
    'He has [quant,_1,house,houses] in [_2], [_3].',
    $houses,
    $town,
    $address,
);

print loc(
    '[quant,_1,book is,books are] in [*,_2,shelf,shelves].',
    $books,
    $shelves,
);
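
The bracket notation can be tried out without any PO file. The following is a minimal sketch that uses only the core module Locale::Maketext. The class names My::L10N and My::L10N::en are hypothetical, and the _AUTO lexicon simply treats every phrase as its own translation.

```perl
use strict;
use warnings;

# Hypothetical project base class; Locale::Maketext expects one base class
# and one subclass per language.
package My::L10N;
use parent 'Locale::Maketext';

package My::L10N::en;
use parent -norequire, 'My::L10N';

# _AUTO makes every unknown key act as its own translation,
# so no PO file is needed for this experiment.
our %Lexicon = ( _AUTO => 1 );

package main;

my $lh = My::L10N->get_handle('en')
    or die 'No language handle for "en"';

# quant joins the number, a space and the matching form.
for my $people ( 1, 2 ) {
    print $lh->maketext(
        '[quant,_1,person lives,people live] here.',
        $people,
    ), "\n";
}

# Two independent plural decisions in a single phrase.
print $lh->maketext(
    '[quant,_1,book is,books are] in [*,_2,shelf,shelves].',
    3,
    1,
), "\n";
```

This prints "1 person lives here.", "2 people live here." and "3 books are in 1 shelf.".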

Rewriting to Locale::TextDomain

Locale::TextDomain is part of the libintl-perl distribution. There are several exported functions. Function names follow a simple scheme.

x for a placeholder,
n for plural and
p for context.

The order of parameters, when present:

Context,
singular,
plural,
number for plural selection and
finally a hash with placeholder data.

Not all combinations of n, p and x are implemented. If you use x even when there are no placeholders and stick to alphabetical order, then __x, __nx, __px and __npx are the possibilities left.

__('msgid')
__x(
    'msgid',
    name1 => $value1, name2 => $value2, ...
)
__n('msgid', 'msgid_plural', $count)
__nx(
    'msgid', 'msgid_plural', $count,
    name1 => $value1, name2 => $value2, ...
)
__xn(
    'msgid', 'msgid_plural',
    $count, name1 => $value1, name2 => $value2, ...
)
__p('context', 'msgid')
__px(
    'context', 'msgid',
    name1 => $value1, name2 => $value2, ...
)
__np('context', 'msgid', 'msgid_plural', $count)
__npx(
    'context', 'msgid', 'msgid_plural', $count,
    name1 => $value1, name2 => $value2, ...
)

print __('You can log out here.');

print __x(
    'He lives in {town}, {address}.',
    town    => $town,
    address => $address,
);

print __nx(
    '{num} person lives here.',
    '{num} people live here.',
    $people,
    num => $people,
);


print __nx(
    'It is {num} book.',
    'These are {num} books.',
    $books,
    num => $books,
);

print __nx(
    'He has {num} house in {town}, {address}.',
    'He has {num} houses in {town}, {address}.',
    $houses,
    num     => $houses,
    town    => $town,
    address => $address,
);

print
    __nx(
        '{num} book is',
        '{num} books are',
        $books,
        num => $books,
    ),
    __nx(
        ' in {num} shelf.',
        ' in {num} shelves.',
        $shelves,
        num => $shelves,
    );

What do you see at first glance?

Locale::Maketext has numbered parameters. If there are many, this may be confusing. All the translator can tell is that something is being included, but not what.

[_1] is a [_2] in [_3].

Locale::Maketext can handle multiple plural forms in a text phrase.

[quant,_1,book is,books are] in [*,_2,shelf,shelves].

The text in plural forms (quant) is not automatically translatable because it's contained in a kind of "or" block.

Within this "or" block, placeholders such as _1 are no longer present. Thus it is impossible to represent plural forms which start before the number.

[myplural,_1,It is _1 book,These are _1 books].

Of course this "myplural" function does not exist.
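
Locale::Maketext calls any method of the language handle that is named inside the brackets, so a project could write such a function itself. The following is only a sketch under that assumption: myplural and the My::L10N classes are hypothetical, the function still knows just two forms, and it still ignores "Plural-Forms".

```perl
use strict;
use warnings;

package My::L10N;
use parent 'Locale::Maketext';

# Hypothetical helper, not part of Locale::Maketext: picks one of two
# complete phrases and replaces the literal text "_1" with the number.
sub myplural {
    my ( $lh, $num, $singular, $plural ) = @_;
    my $text      = $num == 1 ? $singular : $plural;
    my $formatted = $lh->numf($num);
    $text =~ s/_1/$formatted/g;
    return $text;
}

package My::L10N::en;
use parent -norequire, 'My::L10N';
our %Lexicon = ( _AUTO => 1 );

package main;

my $lh = My::L10N->get_handle('en')
    or die 'No language handle for "en"';

for my $books ( 1, 5 ) {
    print $lh->maketext(
        '[myplural,_1,It is _1 book,These are _1 books].',
        $books,
    ), "\n";
}
```

Only a parameter that consists of "_1" alone is replaced by Locale::Maketext itself. Inside a longer parameter such as "It is _1 book" it arrives as plain text, which is why the helper has to do the substitution.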

***

Locale::TextDomain has named parameters, which are easier to translate because the translator can understand the meaning of the sentence in spite of the placeholders.

{name} is a {locality} in {country}.
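
To illustrate why named placeholders are robust against reordering, here is a small stand-in for the placeholder step. expand_named is a hypothetical helper written for this example. It is not the code of Locale::TextDomain, and the sample values are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the placeholder expansion of __x:
# every {name} is replaced by the value stored under that name.
# Unknown names are left untouched.
sub expand_named {
    my ( $text, %value_of ) = @_;
    $text =~ s/\{(\w+)\}/exists $value_of{$1} ? $value_of{$1} : '{' . $1 . '}'/ge;
    return $text;
}

my %data = (
    name     => 'Erlangen',
    locality => 'city',
    country  => 'Germany',
);

# The order of placeholders in the text does not matter,
# so a translation may rearrange them freely.
print expand_named( '{name} is a {locality} in {country}.',  %data ), "\n";
print expand_named( 'In {country}, {name} is a {locality}.', %data ), "\n";
```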

A text phrase containing several plural forms needs to be divided which makes it not automatically translatable.

Things you won't spot immediately

Number of plural forms

Locale::Maketext:

singular
singular + plural
singular + plural + zero

Locale::TextDomain:

2 in the source language
arbitrarily many in the target language

The header of each PO/MO file contains an entry called "Plural-Forms". It is a formula written as a C expression, with one exception: "OR" is allowed in place of "||". The formula differs from language to language and is stored in the PO/MO file of that language. Locale::Maketext ignores this entry.

German/English:

"Plural-Forms: nplurals=2; plural=n != 1;\n"

Russian:

"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"

An example from the Russian language:

0          books -> книг  (Plural 2)
1          book  -> книга (Singular)
2 .. 4     books -> книги (Plural 1)
5 .. 20    books -> книг  (Plural 2)
21         books -> книга (Singular)
22 .. 24   books -> книги (Plural 1)
25 .. 30   books -> книг  (Plural 2)
...
100        books -> книг  (Plural 2)
101        books -> книга (Singular)
102 .. 104 books -> книги (Plural 1)
105 .. 120 books -> книг  (Plural 2)
121        books -> книга (Singular)
122 .. 124 books -> книги (Plural 1)
125 .. 130 books -> книг  (Plural 2)
...
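
The two "Plural-Forms" formulas can be written down directly in Perl to check the table. This is only a sketch: the function names are made up for this example, and gettext implementations evaluate the header expression themselves.

```perl
use strict;
use warnings;

# German/English: plural=n != 1
sub plural_index_de_en {
    my ($n) = @_;
    return $n != 1 ? 1 : 0;
}

# Russian: the three-way formula from the PO header shown above.
sub plural_index_ru {
    my ($n) = @_;
    return 0 if $n % 10 == 1 && $n % 100 != 11;
    return 1 if $n % 10 >= 2 && $n % 10 <= 4
        && ( $n % 100 < 10 || $n % 100 >= 20 );
    return 2;
}

# The returned index selects msgstr[0], msgstr[1] or msgstr[2].
my @form = ( 'книга', 'книги', 'книг' );
binmode STDOUT, ':encoding(UTF-8)';
use utf8;

for my $n ( 0, 1, 2, 5, 11, 21, 22, 25, 101, 111, 122 ) {
    printf "%3d -> msgstr[%d] %s\n", $n, plural_index_ru($n), $form[ plural_index_ru($n) ];
}
```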

There are also 3 plural forms in e.g. Czech, Lithuanian, Polish, Romanian and Slovak. There are 4 plural forms in e.g. Slovenian and Celtic. So in the EU we can get by with 4 plural forms. Arabic has 6 plural forms.

Because Locale::Maketext ignores "Plural-Forms" in PO/MO files, it can only support languages with 2 plural forms, i.e. singular and plural, like we are familiar with in German and English. There is a function "quant" which essentially corresponds to "quant2" (singular + 1st plural) assuming we ignore the zero form. It is quite possible to imagine functions "quant3" to "quant6" for Locale::Maketext. But then the programmer would need to already know which text phrases need 2, 3, 4, 5 or 6 plural forms. Because he does not know, he would have to always use "quant6". That's a whole lot of typing.
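
What such an imagined "quant3" might look like is sketched below for Russian. Everything here is an assumption made for illustration: the function name quant3, the My::L10N classes and the hand-written lexicon entry. The plural rule is hardcoded in the language class, because Locale::Maketext does not read it from the PO header.

```perl
use strict;
use warnings;
use utf8;

package My::L10N;
use parent 'Locale::Maketext';

package My::L10N::ru;
use parent -norequire, 'My::L10N';

# Hand-written lexicon entry for this experiment.
our %Lexicon = (
    '[quant,_1,book,books]' => '[quant3,_1,книга,книги,книг]',
);

# Hypothetical three-form counterpart of quant for Russian.
sub quant3 {
    my ( $lh, $num, @forms ) = @_;
    my $index
        = $num % 10 == 1 && $num % 100 != 11 ? 0
        : $num % 10 >= 2 && $num % 10 <= 4
            && ( $num % 100 < 10 || $num % 100 >= 20 ) ? 1
        :                                                2;
    return $lh->numf($num) . ' ' . $forms[$index];
}

package main;

binmode STDOUT, ':encoding(UTF-8)';

my $lh = My::L10N->get_handle('ru')
    or die 'No language handle for "ru"';

for my $books ( 1, 3, 5, 21 ) {
    print $lh->maketext( '[quant,_1,book,books]', $books ), "\n";
}
```

The number of forms is fixed in the function name, so every language with a different number of plural forms needs its own function and its own rule in code.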

Position of words in a sentence in different languages

The positions of individual words can differ in different languages e.g. in one language it is

I have 2 books.

and in another

2 books I have.

If that is so, then with Locale::Maketext you have to write complete sentences as the plural forms. A programmer writing in English cannot know that. The conflict is thus only discovered during translation.

If you want to avoid the conflict, you always write entire sentences.

But even that doesn't always work, because Locale::Maketext always expects "quant" to be followed by "_1" and then implicitly adds a space and then the text.

Yet what's needed is:

[myplural,_1,It is _1 book.,These are _1 books.]

But that would make it nothing other than Locale::TextDomain.

Comma in plural forms, or the "join and never can split" trap

Because the forms are separated by commas, the texts themselves must not contain any commas.

Is there any simple quoting mechanism such as in Text::CSV? I know of none.

I need 1 book, computer or notebook to do this.

Here's a dirty workaround using ";".

I need [*,_1,book; computer or notebook,books; computers or notebooks] to do this.
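
The trap can be reproduced with the core module alone. The sketch below also tries "~," inside the brackets, which the bracket-notation parser of Locale::Maketext appears to treat as a literal comma. Whether the gettext-style "%quant(...)" notation read from PO files passes this through was not checked here, and the My::L10N classes are hypothetical.

```perl
use strict;
use warnings;

package My::L10N;
use parent 'Locale::Maketext';

package My::L10N::en;
use parent -norequire, 'My::L10N';
our %Lexicon = ( _AUTO => 1 );

package main;

my $lh = My::L10N->get_handle('en')
    or die 'No language handle for "en"';

# Plain commas: the two intended forms are split into four parameters.
my $broken
    = 'I need [*,_1,book, computer or notebook,'
    . 'books, computers or notebooks] to do this.';

# "~," inside the brackets: assumed to survive as a literal comma.
my $escaped
    = 'I need [*,_1,book~, computer or notebook,'
    . 'books~, computers or notebooks] to do this.';

print $lh->maketext( $broken,  2 ), "\n";
print $lh->maketext( $escaped, 2 ), "\n";
```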

Value and unit may get wrapped

Due to string concatenation using spaces, line breaks may occur between value and unit.

Depending on line length you get

I have
1 book.

or

I have 1
book.

With Locale::TextDomain you can write:

I have {num}\N{NO-BREAK SPACE}book.
I have {num}\N{NO-BREAK SPACE}books.

Locale::Maketext has the space hardcoded.
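
One conceivable way around the hardcoded space is to override "quant" in the project's own language class. The following is a sketch under that assumption, not a feature of Locale::Maketext. The My::L10N classes are hypothetical, and every language class that needs the behaviour would have to inherit or repeat the override.

```perl
use strict;
use warnings;
use charnames ':full';

package My::L10N;
use parent 'Locale::Maketext';

package My::L10N::en;
use parent -norequire, 'My::L10N';
our %Lexicon = ( _AUTO => 1 );

# Hypothetical override: same result as the inherited quant, but the
# space between number and unit becomes a NO-BREAK SPACE.
sub quant {
    my ( $lh, $num, @forms ) = @_;
    my $text   = $lh->SUPER::quant( $num, @forms );
    my $number = $lh->numf($num);
    $text =~ s/\A\Q$number\E /$number\N{NO-BREAK SPACE}/;
    return $text;
}

package main;

binmode STDOUT, ':encoding(UTF-8)';

my $lh = My::L10N->get_handle('en')
    or die 'No language handle for "en"';

for my $books ( 1, 2 ) {
    print $lh->maketext( 'I have [quant,_1,book,books].', $books ), "\n";
}
```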

Excerpt from a PO file for Locale::Maketext

# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"..."

msgid  "You can log out here."
msgstr "Sie können sich hier abmelden."

msgid  "He lives in %1, %2."
msgstr "Er wohnt in %1, %2."

msgid  "%quant(%1,person lives,people live) here."
msgstr "%quant(%1,Mensch wohnt,Menschen wohnen) hier."

# a bad workaround (no singular before placeholder)
msgid  "These are %quant(%1,book,books)."
msgstr "Das sind %quant(%1,Buch,Bücher)."

msgid  "%quant(%1,book is,books are) in %quant(%2,shelf,shelves)."
msgstr "%quant(%1,Buch ist,Bücher sind) in %quant(%2,Regal,Regalen)."

Excerpt from a PO file for Locale::TextDomain

# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"..."

msgid        "You can log out here."
msgstr       "Sie können sich hier abmelden."

msgid        "He lives in {town}, {address}."
msgstr       "Er wohnt in {town}, {address}."

msgid        "{num} person lives here."
msgid_plural "{num} people live here."
msgstr[0]    "{num} Mensch wohnt hier."
msgstr[1]    "{num} Menschen wohnen hier."

msgid        "It is {num} book."
msgid_plural "These are {num} books."
msgstr[0]    "Es ist {num} Buch."
msgstr[1]    "Es sind {num} Bücher."

msgid        "He has {num} house in {town}, {address}."
msgid_plural "He has {num} houses in {town}, {address}."
msgstr[0]    "Er hat {num} Haus in {town}, {address}."
msgstr[1]    "Er hat {num} Häuser in {town}, {address}."

msgid        "{num} book is"
msgid_plural "{num} books are"
msgstr[0]    "{num} Buch ist"
msgstr[1]    "{num} Bücher sind"

msgid        " in {num} shelf."
msgid_plural " in {num} shelves."
msgstr[0]    " in {num} Regal."
msgstr[1]    " in {num} Regalen."

PO file for English/Russian translation

for Locale::Maketext

# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
"..."

msgid  "You can log out here."
msgstr "Выход из системы."

# The town name should be inflected here:
# Москва -> в Москве
# Киев   -> в Киеве
# Мытищи -> в Мытищах (irregular)
msgid  "He lives in %1, %2."
msgstr "Он живет в %1, %2."

# This is not correctly translatable.
# The plural form for number 2 to 4 (человека живут) is not storable.
msgid  "%quant(%1,person lives,people live) here."
msgstr "%quant(%1,человек живет,человек живут) здесь."

# This is not correctly translatable.
# The plural form for number 2 to 4 (дома) is not storable.
msgid  "He has %quant(%1,house,houses) in %2, %3."
msgstr "У него %quant(%1,дом,домов) в %2, %3."

# This is not correctly translatable.
# The plural form for number 2 to 4 (книги) is not storable.
msgid  "%quant(%1,book is,books are) in %quant(%2,shelf,shelves)."
msgstr "%quant(%1,книга,книг) на %quant(%2,полке,полках)."

for Locale::TextDomain

# header
msgid ""
msgstr ""
"...\n"
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11"
" ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
"..."

msgid        "You can log out here."
msgstr       "Выход из системы."

# The town name should be inflected here:
# Москва -> в Москве
# Киев   -> в Киеве
# Мытищи -> в Мытищах (irregular)
msgid        "He lives in {town}, {address}."
msgstr       "Он живет в {town}, {address}."

msgid        "{num} person lives here."
msgid_plural "{num} people live here."
msgstr[0]    "{num} человек живет здесь."
msgstr[1]    "{num} человека живут здесь."
msgstr[2]    "{num} человек живут здесь."

msgid        "It is {num} book."
msgid_plural "These are {num} books."
msgstr[0]    "Это {num} книга."
msgstr[1]    "Это {num} книги."
msgstr[2]    "Это {num} книг."

msgid        "He has {num} house in {town}, {address}."
msgid_plural "He has {num} houses in {town}, {address}."
msgstr[0]    "У него {num} дом в {town}, {address}."
msgstr[1]    "У него {num} дома в {town}, {address}."
msgstr[2]    "У него {num} домов в {town}, {address}."

# Translate this phrase together with the next one.
msgid        "{num} book is"
msgid_plural "{num} books are"
msgstr[0]    "{num} книга"
msgstr[1]    "{num} книги"
msgstr[2]    "{num} книг"

# Translate this phrase together with the previous one.
msgid        " in {num} shelf."
msgid_plural " in {num} shelves."
msgstr[0]    " на {num} полке."
msgstr[1]    " на {num} полках."
msgstr[2]    " на {num} полках."

Inflecting "in {town}"

Berlin    -> Берлин
in Berlin -> в Берлине

If you want this, you need to also translate placeholder values and only then insert them.

That's doable, but it makes it impossible to automatically translate the phrase in which it is to be inserted. Moreover, that one is then also hard to translate manually because again to some extent the context is lost.

You can only tinker here.

neuter/masculine/feminine singular/plural

Inflection of nouns:

masculine singular -> Arzt
feminine  singular -> Ärztin
masculine plural   -> Ärzte
feminine  plural   -> Ärztinnen

Inflection of verbs:

Masha went to school. -> Маша пошла в школу.
Petya went to school. -> Петя пошёл в школу.

Context

msgid   "design"
msgstr  "Design"

msgctxt "automobile"
msgid   "design"
msgstr  "Konstruktion"

msgctxt "verb"
msgid   "design"
msgstr  "zeichnen"

Locale::Maketext::TPJ13 - Article by Sean M. Burke about software localization

He writes:

Since I wrote this article in 1998, I now see that the gettext docs are now trying more to come to terms with plurality. Whether useful conclusions have come from it is another question altogether. -- SMB, May 2001


Many more years have passed since then, and a jack of all trades still does not exist.

Software for translation agencies

In the one current case I know of, the translation agency uses the software "SDL Trados". Like other comparable software it is based on a "translation memory". This works very well for static documents.

Such software seems less suited to the dynamics that plural forms and context bring to software localization. It assumes a 1:1 relation between source text and translation. Therefore one has to expect that the relatively small portion of phrases needing context or plural forms cannot be handled well with software support.

In the current case the POT file had to be converted into XML, and the target language had to be pre-filled with the source language. One would rather have expected the translation agency to provide this service.

Recommendation: Have a test translation done with a smaller file. The file should contain all the typical constructs. Repeat this for each language, because subcontractors may be involved.



zby commented Apr 8, 2011

Looks like some double encoding sneaked in. In particular the Russian examples are not intelligible - which is a pity because they have the interesting cases.

ap commented Apr 8, 2011

I was going to say it is a problem with Github’s POD formatter, since clicking the “raw” link would reveal the source to be fine. But then I remembered the =encoding utf8 directive of POD and realised neither file had them. I added them and now the documents render fine. Thanks for the impetus to figure it out!
