@nucleogenesis
Last active July 31, 2019 21:49
Using CSV with Crowdin & Managing Context

Crowdin - Moving to CSV from JSON

Definitions

Identifier - Refers to a Namespace.key combination such as UserTable.headingLabel or SignInPage.userName. Crowdin treats this roughly as a unique ID for a translation.

Source string - Refers to the string of text, particularly the English string of text, that is defined in Kolibri in relation to a specific Identifier. For example, the UserTable.headingLabel identifier could refer to the Source string "Users".

Context string - Refers to text added in the Crowdin UI by someone. This is used to give translators meaningful context for a string that might not be obvious just by seeing the Source string.

Overview

So that we can transfer context between releases, we must move from JSON to CSV as the format in which we upload our strings.

There are a few important steps and I will refer to them by their CLI commands.

  1. yarn makemessages - This scans the entire Kolibri codebase for definitions of Identifiers mapped to Source strings in each front-end module (e.g., Coach, Facility, Device). Previously, this created JSON files mapping Identifiers to Source strings; it now writes them to CSV files instead.
  2. make i18n-upload branch={branch} - This first calls yarn makemessages, then takes the files generated by that command and uploads them to the specified branch on Crowdin. Previously, this pushed up the JSON files; it now uploads the CSV files.
  3. make i18n-download branch={branch} - This downloads all of the files on Crowdin associated with the specified branch. Previously, this downloaded a zip file for each language and extracted it to the properly namespaced locale directory (e.g., {kolibri_root}/kolibri/locale/ar/LC_MESSAGES/*-messages.json). I have implemented a change that instead downloads only one set of CSVs and places them in their own directory. Those CSV files include every translation that has been approved on Crowdin for every language, so the command then uses them to produce the same language-code namespaced JSON files that were previously downloaded directly from Crowdin.
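The CSV-to-JSON step in i18n-download can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the column names ("Identifier", "Source String", "Context", plus one column per language code) are assumptions about the export layout, and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Assumed column names; the real Crowdin export layout may differ.
ID_COL, SOURCE_COL, CONTEXT_COL = "Identifier", "Source String", "Context"


def csv_to_locale_messages(csv_text):
    """Return {language_code: {identifier: translation}} from one CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    non_language = {ID_COL, SOURCE_COL, CONTEXT_COL}
    messages = defaultdict(dict)
    for row in reader:
        for column, value in row.items():
            # Skip metadata columns, overflow cells, and untranslated cells.
            if column is None or column in non_language or not value:
                continue
            messages[column][row[ID_COL]] = value
    return dict(messages)


# Illustrative data only: one string, translated into es-ES but not yet into ar.
sample = (
    "Identifier,Source String,Context,ar,es-ES\n"
    "UserTable.headingLabel,Users,Heading of the users table,,Usuarios\n"
)
result = csv_to_locale_messages(sample)
```

Each inner dictionary could then be serialized with json.dump into the matching locale directory, which keeps the files the front-end reads identical in shape to those previously downloaded from Crowdin.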

Complexities & Questions

Directory naming and git tracking for CSV files

  1. Downloaded CSV Files from Crowdin

These files effectively serve as our "single source of truth" for exactly which translations will be available to Kolibri. When they are downloaded, they are immediately parsed into the language-code namespaced directories where our front-end i18n system expects to find translations.

Here is an example of a CSV that includes translations - downloaded from Crowdin.

Question #1: Should the "convert from CSV to JSON" process become its own separate command? Are there any reasons why someone might want to directly manipulate the "single source of truth" CSV files - then re-parse them into the JSON files used by the front-end manually so that the local "single source of truth" differs from what would be on Crowdin?

Question #2: Should these files be tracked in git? If we do - then our entire set of translations would live in our repo. Note that this would include all context information that we downloaded from Crowdin.

  2. CSV Files Created by yarn makemessages

These files are generated and represent the Identifiers and Source strings defined in the Kolibri code base. Very importantly - these are the files that are uploaded to Crowdin when updating a current branch or creating a new branch altogether.
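As a rough sketch of what writing such a file could look like, assuming the extracted messages are available as a dictionary of Identifier to Source string: the column names and the function name here are hypothetical, and the Context column is left empty because nothing has been transferred into it yet.

```python
import csv
import io


def messages_to_csv(messages):
    """Serialize {identifier: source_string} to CSV text, sorted by identifier."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Assumed header; the layout Crowdin is configured to expect may differ.
    writer.writerow(["Identifier", "Source String", "Context"])
    for identifier in sorted(messages):
        writer.writerow([identifier, messages[identifier], ""])
    return buffer.getvalue()


# Illustrative data only.
csv_text = messages_to_csv(
    {"UserTable.headingLabel": "Users", "SignInPage.userName": "Username"}
)
```

Sorting by Identifier keeps the output stable between runs, which would make diffs easier to read if these files were ever tracked in git.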

Question #1: Is there any value in storing these files in the English locale directory? Functionally, these files are just a representation of the state of Kolibri's Identifier => Source string mapping at the time that yarn makemessages is run. Perhaps we could have a new CSV-specific directory structure where we store two separate sets of CSVs: one being the files generated here, the other being those downloaded from Crowdin (which include context and all translations).

Question #1.5: I don't imagine that these files need to be tracked in git since we don't track the English JSON files in the previous process, but I figure it is worth considering again whether we change the directory placement or not.

Question #1.75: In any case - should we also generate a set of English JSON files in the {kolibri_root}/kolibri/locale/en/LC_MESSAGES directory for any reason?

Context Transfer

Given that we have files downloaded from Crowdin that have the Context strings mapped to their associated Identifiers & Source strings, the transfer of context should be relatively simple.

This could very well be done during the yarn makemessages command. Since we are storing the CSV files downloaded from Crowdin - we could load the CSV for the same module, map the Context string to its Identifier in a dictionary - then when we go to write the CSV file during yarn makemessages - we insert the context from the previous release into that CSV file.
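A minimal sketch of that idea, assuming the downloaded CSV has "Identifier" and "Context" columns (hypothetical names) and that the new rows are held as dictionaries before being written out:

```python
import csv
import io


def load_context(previous_csv_text):
    """Map each Identifier to its non-empty Context string from a downloaded CSV."""
    reader = csv.DictReader(io.StringIO(previous_csv_text))
    return {row["Identifier"]: row["Context"] for row in reader if row["Context"]}


def apply_context(new_rows, context_by_id):
    """Return copies of the new rows with Context filled in where one is known."""
    return [
        dict(row, Context=context_by_id.get(row["Identifier"], ""))
        for row in new_rows
    ]


# Illustrative data only.
previous_csv = (
    "Identifier,Source String,Context\n"
    "Component.thingLabel,Thing,For labeling a single thing\n"
)
new_rows = [
    {"Identifier": "Component.thingLabel", "Source String": "Thing", "Context": ""},
    {"Identifier": "Component.otherLabel", "Source String": "Other", "Context": ""},
]
rows_with_context = apply_context(new_rows, load_context(previous_csv))
```

Identifiers that did not exist in the previous release simply keep an empty Context string.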

Question #1: When mapping context, should we check only against the Identifier - or should we transfer context only when the Identifier from a previous release maps to the exact same Source string? For example, if UserTable.fullNameLabel mapped to Full name in a previous release, and in the next release we decided to capitalize both words so that UserTable.fullNameLabel maps to Full Name - should we still transfer the context? In this example the change is trivial with regard to context, so the real question is whether we should be strict and err on the side of caution, to avoid mistakenly transferring context when a Namespace.key combination has been changed to refer to a totally different string than it did in the previous release.
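The two options could be expressed as a single flag. The sketch below is only one way to frame it; the function name, the data shapes, and the context text are all hypothetical. In strict mode, Identifiers whose Source string changed are reported rather than silently dropped, so someone can review them by hand.

```python
def transfer_context(new_messages, previous_rows, strict=True):
    """
    new_messages: {identifier: source_string} for the upcoming release.
    previous_rows: {identifier: (source_string, context)} from the last download.
    Returns (contexts, skipped), where skipped lists the identifiers whose
    context was withheld because the source string changed.
    """
    contexts, skipped = {}, []
    for identifier, source in new_messages.items():
        if identifier not in previous_rows:
            continue  # new string, nothing to transfer
        previous_source, previous_context = previous_rows[identifier]
        if not previous_context:
            continue  # no context was ever written for this string
        if strict and previous_source != source:
            skipped.append(identifier)
            continue
        contexts[identifier] = previous_context
    return contexts, skipped


# Illustrative data only.
previous = {"UserTable.fullNameLabel": ("Full name", "Column heading")}
current = {"UserTable.fullNameLabel": "Full Name"}

strict_result = transfer_context(current, previous, strict=True)
lenient_result = transfer_context(current, previous, strict=False)
```

With the capitalization example above, strict mode withholds the context and flags the Identifier, while lenient mode carries the context over unchanged.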

Question #2: Whenever we decide to i18n-upload, it seems that we should always precede that with an i18n-download so that we avoid overwriting context that has been added since the last i18n-download with an empty string. Consider the following possible flow of events:

  1. Download branch release-1.
  2. Crowdin users add context to various Identifiers - so the Context string for the Component.thingLabel Identifier is changed from '' to 'For labeling a single thing'.
  3. Make changes to the defined Identifiers and Source strings - not changing Component.thingLabel or its Source string.
  4. We run i18n-upload or i18n-update, which runs yarn makemessages. That command only has access to the Context strings downloaded in step 1, where the Context string for Component.thingLabel is still ''. The upload then pushes that empty value to Crowdin, overwriting the context a user added in step 2.

So - would always downloading before uploading cause any unnecessary complications that I am not seeing?
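To make the risk concrete, here is a small sketch of the scenario above together with one possible guard: merging the locally stored context with a fresh download before writing the upload CSV. The function name and data shapes are hypothetical.

```python
def merge_context(local_context, fresh_context):
    """Prefer non-empty context from a fresh download over the local copy."""
    merged = dict(local_context)
    for identifier, context in fresh_context.items():
        if context:
            merged[identifier] = context
    return merged


# Step 1: the context as downloaded from release-1.
stale = {"Component.thingLabel": ""}
# Step 2: what Crowdin holds after a user adds context.
fresh = {"Component.thingLabel": "For labeling a single thing"}

# Uploading from the stale copy alone would send '' and erase the user's work.
# Merging with a fresh download first keeps it.
to_upload = merge_context(stale, fresh)
```

Note that an empty value in the fresh download never replaces a non-empty local one here, so this sketch cannot represent a context string that was deliberately cleared on Crowdin.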

Next Steps: Managing Screenshots between releases

This is a bit more involved. However, it won't depend on or interfere with any of the above, and could likely be written as its own command that gathers all screenshots and moves them between branches.
