@ptomato
Last active May 16, 2016 00:08
Localize using gettext

How to localize using gettext

This document does not try to explain how to enable gettext support in an application. There are other documents that describe that process better. (Need links!)

The PO format and the PO files

Even as a developer, it is very useful to know the basics of the PO format, the format in which the actual translations are stored.

The PO format is simple, which probably explains at least part of its success and widespread use. A PO file is essentially a hash table of ''msgid'' and ''msgstr'' pairs: the msgid is the original English string and serves as the key, and the msgstr is its translated value. Because the English string is the key, all instances of the exact same English string in the code are represented by exactly one key/value pair, referred to as a ''message'', in the PO file. Usually this is a benefit of the format rather than a problem, since the translator never has to translate the exact same string more than once. Below is an example of a message.

#: gedit/dialogs/gedit-plugin-program-location-dialog.c:78
#: gedit/dialogs/program-location-dialog.glade2.h:2
msgid "Set program location..."
msgstr "Ställ in programplats..."

In addition to the msgid and msgstr parts, a message usually also has lines starting with #: that tell which source files and lines the msgid string was extracted from. These lines have no syntactic value; they are only there to help translators and developers see where a message came from. For all PO parsing tools, the msgid is the key, and it alone distinguishes one message from another.
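
To make the "hash table" view concrete, here is a minimal Python sketch of a reader that turns simple PO text into a dictionary keyed by msgid. The function names (parse_po, _unquote, _store) are made up for this illustration, and the sketch assumes a simplified PO file without plural forms or msgctxt entries. It is not how the gettext tools themselves are implemented.

```python
import ast


def _unquote(token):
    # PO strings are double-quoted with C-style escapes; ast.literal_eval
    # decodes the common ones (\n, \t, \", \\). Good enough for a sketch.
    return ast.literal_eval(token)


def _store(messages, fields):
    # A message is complete once both parts have been seen.
    if "msgid" in fields and "msgstr" in fields:
        messages[fields["msgid"]] = fields["msgstr"]


def parse_po(text):
    """Return {msgid: msgstr}. Assumes no plural forms and no msgctxt."""
    messages = {}
    fields = {}
    current = None  # field that continuation lines are appended to
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comments, including "#:" locations, carry no syntax
        if line.startswith("msgid "):
            _store(messages, fields)  # finish the previous message
            fields = {"msgid": _unquote(line[len("msgid "):])}
            current = "msgid"
        elif line.startswith("msgstr "):
            fields["msgstr"] = _unquote(line[len("msgstr "):])
            current = "msgstr"
        elif line.startswith('"') and current is not None:
            fields[current] += _unquote(line)  # multi-line string
    _store(messages, fields)
    return messages


SAMPLE = '''\
#: gedit/dialogs/gedit-plugin-program-location-dialog.c:78
#: gedit/dialogs/program-location-dialog.glade2.h:2
msgid "Set program location..."
msgstr "Ställ in programplats..."
'''

catalog = parse_po(SAMPLE)
print(catalog["Set program location..."])
```

Both #: lines are skipped, and the two source locations end up sharing a single dictionary entry, which is the behaviour described above.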

A message in a PO file can be in one of three states: ''translated'', ''fuzzy'', or ''untranslated''. A message counts as translated as soon as its msgstr is non-empty and the message is not marked fuzzy. Likewise, an untranslated message is one whose msgstr is empty. The fuzzy state is special: there is a translation in the msgstr, but it is most likely not entirely correct and therefore needs manual attention from a translator. A message can become fuzzy in one of two ways:

  • The original string that the msgid represents was changed in the source code. A typo in the string may have been fixed or the string altered in some other way. The translator needs to check that the msgstr is still valid and make changes if necessary.
  • A new string has been added to the source, and the string is very similar, but not identical, to the msgid of an already existing, translated message. Then the msgstr of that message will be automatically reused for the new message, but the new message will also at the same time be marked fuzzy so that the translator knows there is some difference that he or she needs to adapt the translation to match.
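
The three states can be sketched as a small Python classifier. In PO files the fuzzy mark appears as a flag on a comment line starting with "#," (for example "#, fuzzy"). The helper names parse_flags and message_state are hypothetical and only illustrate the rules described above; treating a fuzzy message with an empty msgstr as untranslated is an assumption of this sketch.

```python
def parse_flags(comment_line):
    """Return the set of flags on a '#,' line, e.g. '#, fuzzy, c-format'."""
    if not comment_line.startswith("#,"):
        return set()
    return {flag.strip() for flag in comment_line[2:].split(",") if flag.strip()}


def message_state(flags, msgstr):
    """Classify one message as 'untranslated', 'fuzzy' or 'translated'."""
    if not msgstr:
        return "untranslated"  # assumption: an empty msgstr wins over the flag
    if "fuzzy" in flags:
        return "fuzzy"         # has a translation, but it needs review
    return "translated"


flags = parse_flags("#, fuzzy, c-format")
print(message_state(flags, "Ställ in programplats..."))
```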

There is always one special message in every valid PO file: the PO file header. Its key is the msgid of the empty string (""), and the actual header values are stored in the msgstr. Unfortunately, this means that if you mark an empty string for translation, you get the entire PO file header back as the "translation", which is almost never what you want. Hence, do not mark [[../Don't mark empty strings for translation|empty strings]] for translation.
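
As an illustration of what sits behind the empty msgid, the following Python sketch splits a header msgstr into its fields. The header text is a made-up example and parse_header is a hypothetical helper; real headers usually carry more fields than shown here.

```python
def parse_header(header_msgstr):
    """Split the msgstr of the empty-msgid message into 'Name: value' fields."""
    fields = {}
    for line in header_msgstr.splitlines():
        # Split at the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


# Hypothetical header, as it would appear once the msgstr is decoded.
HEADER = (
    "Project-Id-Version: example 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
)

info = parse_header(HEADER)
print(info["Content-Type"])
```

This block of metadata is what a lookup of the empty string returns, which is why marking "" for translation gives surprising results.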

Another useful thing to know is that the validity of any particular PO file can always be checked by running msgfmt -cv file.po on it. This also displays the translation status of that PO file.
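
If you want to run this check from a script, a thin Python wrapper could look like the sketch below. It assumes msgfmt is installed and on the PATH. The -o /dev/null option is an addition that is not part of the command above: without -o, msgfmt also writes a compiled messages.mo file into the current directory, and /dev/null assumes a Unix-like system. The function names are hypothetical.

```python
import subprocess


def msgfmt_check_command(po_path):
    # -c runs the validity checks, -v prints the translation statistics.
    # "-o /dev/null" (an addition, Unix-only) discards the compiled catalog.
    return ["msgfmt", "-cv", "-o", "/dev/null", po_path]


def check_po(po_path):
    """Run msgfmt on one PO file; return (is_valid, combined tool output)."""
    result = subprocess.run(
        msgfmt_check_command(po_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # collect messages from either stream
        text=True,
    )
    return result.returncode == 0, result.stdout.strip()


# Example call (requires msgfmt and an existing file):
# ok, report = check_po("po/sv.po")
```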

Running make -C po update-po (from the root directory of the repository) refreshes all PO files against the current state of the code. Just remember not to commit these altered files afterwards. The PO files themselves are the domain of the translators, and developers committing updated PO files usually just clutter the Git history and increase the danger of accidental Git conflicts. GNOME translators take care of updating their PO files themselves by always refreshing their PO file before updating the translation itself. Thus, there is usually no need for a developer to update the PO files, even though make dist usually wants to do just that. Please do not commit to Git any PO files that have been altered by anything other than changes to the actual translation. If you do need to, please ask the translators in advance.

POTFILES.in and POTFILES.skip

The file po/POTFILES.in specifies which source files should be used for building the .pot and .po files. It should list the file names, with paths relative to the project root, each on a single line.

In a similar way, a file po/POTFILES.skip can be added that specifies the files with marked-up messages that for some reason shouldn't be translated and hence shouldn't be in POTFILES.in. The format is the same as POTFILES.in.

Since developers usually know which files are used in the project and which ones aren't (and hence shouldn't be translated even though they contain marked-up messages), it is the developers' responsibility to keep both POTFILES.in and POTFILES.skip up to date. Forgetting to add files to POTFILES.in is a rather common mistake.
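
One way to catch forgotten files is to look for source files that contain marked-up strings but appear in neither list. The Python sketch below is a rough heuristic rather than an existing tool: it assumes strings are marked with _("...") or N_("..."), that only .c and .h files matter, and that paths are listed relative to the project root. All of the names are hypothetical.

```python
import os
import re

# Assumption: translatable strings are marked as _("...") or N_("...").
MARKER = re.compile(r'\b(?:_|N_)\(\s*"')


def unlisted_files(root, listed, extensions=(".c", ".h")):
    """Return files under root with marked-up strings that are not in `listed`.

    `listed` is the set of paths from POTFILES.in plus POTFILES.skip,
    relative to root and using forward slashes.
    """
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if rel in listed:
                continue
            with open(full, encoding="utf-8", errors="replace") as handle:
                if MARKER.search(handle.read()):
                    missing.append(rel)
    return sorted(missing)


# Example call from the project root:
# print(unlisted_files(".", {"src/main.c", "src/dialog.c"}))
```

Each reported file then needs a decision by the developer: add it to POTFILES.in if it is used, or to POTFILES.skip if it is not.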

Please remember that only files that are present in a fresh Git checkout should be listed in POTFILES.in or POTFILES.skip. This means that these two files should not list any generated files. The reason is that translators need to be able to work on a fresh Git checkout without having to build anything, and listing generated files would cause errors or useless warnings on such a checkout.

Please also keep the POTFILES.in and POTFILES.skip files sorted alphabetically, using the C collating order if possible (LC_COLLATE=C). This helps catch duplicates in the listings, and it makes manual inspection easier when comparing the content of these files with directory listings.
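
The rules in this section can be checked mechanically. The Python sketch below flags duplicate entries, entries that do not exist in the checkout, and lists that are not in C collating order. It handles only plain paths and # comment lines, and the function names are made up for this example. Python compares strings by code point, which gives the same order as LC_COLLATE=C for ASCII paths.

```python
import os


def read_potfiles(text):
    """Return the listed paths, ignoring blank lines and '#' comments."""
    paths = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            paths.append(line)
    return paths


def lint_potfiles(paths, exists=os.path.exists):
    """Return a list of human-readable problems found in a POTFILES listing."""
    problems = []
    seen = set()
    for path in paths:
        if path in seen:
            problems.append(f"duplicate entry: {path}")
        seen.add(path)
        if not exists(path):
            problems.append(f"not in checkout (generated or missing?): {path}")
    # Plain string comparison matches the C collating order for ASCII paths.
    if paths != sorted(paths):
        problems.append("entries are not sorted in C collating order")
    return problems


# Made-up listing; pretend src/b.c is a generated file absent from Git.
LISTING = "# example\nsrc/b.c\nsrc/a.c\nsrc/a.c\n"
for problem in lint_potfiles(read_potfiles(LISTING), exists=lambda p: p != "src/b.c"):
    print(problem)
```

Run from the project root against a fresh checkout, the default exists check reports any listed file that would be missing for a translator.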
