@SphinxKnight
Last active February 19, 2020 07:12
Stumptown - l10n thoughts
Hello all,
Sorry for the delay*, thanks for the pointers to stumptown-experiment, I'm glad to see this idea coming to life :) I took some time to dig into the repo and will continue to do so before Whistler.
Some "structure" thoughts
As for localization, I think one way to "prepare" for it would be to create a "locales" directory with an "en-US" subdirectory in it. This would allow for some basic pseudo-localization. I would also prefer to have all of the prose of a given section under the same directory.
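To make the pseudo-localization idea concrete, here is a minimal Python sketch. It assumes a hypothetical pseudo-locale whose text is derived from the en-US prose by swapping vowels for accented ones and wrapping the result in brackets, so that strings which bypass the locale directory are easy to spot. The function name and the character mapping are illustrative and not part of stumptown.

```python
# Hypothetical helper: derive pseudo-localized text from en-US prose.
# The accent mapping and the bracket markers are assumptions for illustration.
ACCENTS = str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ")


def pseudo_localize(text: str) -> str:
    """Return an accented, bracketed variant of an English string."""
    return "[" + text.translate(ACCENTS) + "]"


print(pseudo_localize("The color property"))
```

A real version would need to skip markup and macros; this one only transforms plain text.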
Some "user stories" thoughts
As a new localizer, I want to know "what there is to be done" so that I can contribute efficiently (à la Doc status pages; I know they are not perfect but they are still great tools :))
As a maintainer, I want to know which content needs updating so that localized content does not go stale
As a "locale" reviewer, I want to have a way to review new translations to guide newcomers
Some tools/processes/others that might inspire what to use/build
https://gitlocalize.com
Seems pretty neat in terms of workflow. I've tested it against my fork and I managed to create a French version of the content. After updating the original English file, I received an email alerting me that an update was necessary :)
Closed source though :s (the docs are on GH)
Translation of React docs : https://github.com/reactjs/reactjs.org-translation (thx Jérémie for the pointer :))
I will try to contribute to it in the coming days, I did not dig too much here. I'm particularly interested in the "maintenance" phase.
Crowdin https://crowdin.com/
Several projects are here, I'm not a user but I find the localization UI more cluttered.
I'll check Pontoon to see how it could handle this.
If you know some other stuff around localization and docs, please add them :)
Migration thoughts
As was expected at the beginning of the BCD project, we should find a way (automated or not...) to migrate the current content to this format.
I'm aware this is not yet the goal of this experiment but I cannot help but think about the volume of content :s
With the resulting dataset (targeting HTML / CSS / JavaScript / Mozilla docs),
1914 localized pages with the {{languages}} macro have not been updated in the last two years while their English counterpart has been updated in this timeframe
87% (1668p) of these are under /Mozilla
among which 509 have not been updated for two years while the English page has been edited in the last year
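The "outdated" criterion used for these counts can be written down as a small predicate. The Python sketch below is only an illustration of that criterion: the slugs and dates are made-up sample data, and the 730-day window is an assumed reading of "two years".

```python
from datetime import date, timedelta


def is_outdated(localized_edit: date, english_edit: date, today: date,
                window_days: int = 730) -> bool:
    """True if the localized page has had no edit within the window
    while its English counterpart was edited within it."""
    cutoff = today - timedelta(days=window_days)
    return localized_edit < cutoff and english_edit >= cutoff


# Hypothetical sample data: slug -> (last localized edit, last English edit).
pages = {
    "Mozilla/Firefox": (date(2015, 3, 1), date(2019, 1, 10)),
    "Web/CSS/color": (date(2019, 2, 1), date(2019, 4, 2)),
    "Web/HTML/Element/a": (date(2014, 6, 5), date(2015, 1, 1)),
}
today = date(2019, 6, 1)
outdated = [slug for slug, (loc, en) in pages.items()
            if is_outdated(loc, en, today)]
print(outdated)
```

Only the first sample page is flagged: the second was localized recently, and the third is old in both languages, so the translation is not behind its source.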
Targeting the Mozilla section specifically:
5387 pages in English with
52% (2834p) not being translated in any language
6444 localized pages
57 locales
"Top" locales being ja (1811p - 28% of 6444), fr (1306p - 20%), pl (917p - 14.2%), zh-CN (583p - 9%), es (324p - 5%), ko (196p - 3%), de (195p), ru (178p), pt-PT (154p), pt-BR (133p), zh-TW (119p)
(no "freshness" taken into account here)
Locales with less than (or equal to) 10 pages : el, he, ms, ka, sq, fi, af, bn-IN, my, hr, wo, bg, ml, az, kab, bm, ee, ff, fy-NL, ga-IE, ha, ig, ln, son, sw, ta, tl, xh, yo, zu
Overall maintenance
66% (4279p) have not been updated in the last three years
71% (4592p) have not been updated in the last two years
81% (5024p) have not been updated in the last year
Maintenance per locale
28 Locales without any edit in this section for the last year: fi, af, bn-IN, ca, hi-IN, kab, ro, hr, sq, az, bm, ee, ff, fy-NL, ga-IE, ha, ig, ln, my, son, sw, ta, tl, xh, yo, zu, te, sr-Latn
21 Locales without any edit in this section for the last two years: fi, af, ca, bm, ee, ff, fy-NL, ga-IE, ha, ig, ln, my, son, sw, ta, tl, xh, yo, zu, te, sr-Latn
18 Locales without any edit in this section for the last three years: ca, bm, ee, ff, fy-NL, ga-IE, ha, ig, ln, son, sw, ta, tl, xh, yo, zu, te, sr-Latn
Targeting the CSS section specifically:
989 pages in English
4292 localized pages
31 locales
"Top" locales being fr 989, ja 685, zh-CN 481, es 460, de 414, ru 254, pt-BR 202, ko 196, pl 152, ca 102
(no "freshness" taken into account here)
Locales with less than (or equal to) 10 pages : he, ro, th, cs, hu, ka, az, hi-IN, kab
28 locales without any page: fi, af, bn-IN, el, ms, sv-SE, hr, sq, bm, ee, ff, fy-NL, ga-IE, ha, ig, ln, ml, my, son, sw, ta, tl, wo, xh, yo, zu, te, sr-Latn
Targeting the JavaScript section specifically:
876 pages in English
7090 localized pages
38 locales
"Top" locales being fr 876, ja 813, zh-CN 788, ru 653, de 578, ko 493, es 461, pt-BR 423, pl 323, ca 317
(no "freshness" taken into account here)
Locales with less than (or equal to) 10 pages: sv-SE, el, ka, my, sq, sr-Latn, af, te, kab
21 locales without any page: fi, bn-IN, ms, hr, az, ee, ff, fy-NL, ga-IE, ha, ig, ln, ml, son, sw, ta, tl, wo, xh, yo, zu
Targeting the HTML section specifically:
211 pages in English
1802 localized pages
32 locales
"Top" locales being fr 211, ja 206, zh-CN 197, es 173, ca 163, ko 139, pt-BR 120, de 115, ru 95, it 59
(no "freshness" taken into account here)
Locales with less than (or equal to) 10 pages: id, fa, fi, ar, tr, vi, bg, hu, cs, ka, ms
27 Locales without any page: af, bn-IN, el, hi-IN, th, hr, sq, az, bm, ee, ff, fy-NL, ga-IE, ha, ig, ln, ml, son, sw, ta, tl, wo, xh, yo, zu, te, sr-Latn
Following actions, that need consensus, could be:
Disable obsolete locales (18): Call for localizers across various channels/l10n individuals listed on wikimoz then remove the 18 locales without any page under HTML/CSS/JavaScript : bn-IN, hr, ee, ff, fy-NL, ga-IE, ha, ig, ln, ml, son, sw, ta, tl, wo, xh, yo, zu
Remove Mozilla Internal locale "islands": Remove (delete or archive if at all possible) all localized pages of the /Mozilla section for the locales with fewer than 10 pages
Remove Mozilla Internal content that is useless for the audience: Remove all pages of the /Mozilla section for the locales which have not been maintained for two years and that have "very few" (to be defined) pageviews
Warn "at-great-risk" locales (26): Call for localizers/l10n individuals listed on wikimoz to improve the state of HTML/CSS/JavaScript for at least l10n-priority flagged pages
Needs to be done for the 26 following locales which have 0 or under 10 pages in any of HTML/CSS/JavaScript : he, ro, th, cs, hu, ka, az, hi-IN, kab, fi, af, el, ms, sv-SE, sq, bm, my, te, sr-Latn, zu, id, fa, ar, tr, vi, bg
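The 18 locales proposed for disabling can be re-derived as the intersection of the three "without any page" lists given above for CSS, JavaScript and HTML. A minimal Python sketch, using only the locale codes already listed in this message:

```python
# Locales listed above as having no page in each section.
css_empty = set(
    "fi af bn-IN el ms sv-SE hr sq bm ee ff fy-NL ga-IE ha ig ln ml my "
    "son sw ta tl wo xh yo zu te sr-Latn".split()
)
js_empty = set(
    "fi bn-IN ms hr az ee ff fy-NL ga-IE ha ig ln ml son sw ta tl wo "
    "xh yo zu".split()
)
html_empty = set(
    "af bn-IN el hi-IN th hr sq az bm ee ff fy-NL ga-IE ha ig ln ml son "
    "sw ta tl wo xh yo zu te sr-Latn".split()
)

# A locale is a candidate for disabling only if it is empty in all three.
obsolete = css_empty & js_empty & html_empty
print(len(obsolete), sorted(obsolete))
```

Keeping the per-section lists as sets makes it cheap to recompute the candidates whenever the underlying page counts are refreshed.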
Respectfully,
Julien

Notes from session about potential localization structure for Stumptown

  • Objectives: Draft ideas for potential "locations" of localized content under Stumptown
  • Participants: Will / Julien

The current structure of content on Stumptown is:

  • stumptown-experiment (root)
    • content
      • css
      • html
    • scripts
    • recipes
    • packaged
      • html
      • css

I'll use fr as an example for the following structures but it is only for illustration purposes. Any locale (TRL / vertical / etc.) could be substituted.

The hypotheses are sorted by "probability"/"usefulness".

TL;DR: Most probably, locales should be stored under content with folders for en-US, fr,… (Hypothesis 1), or as git submodules (Hypothesis 2).

Hypothesis #1 - Locale dir. under content

Have a directory for each locale under content.

Structure

  • stumptown-experiment (root)
    • content
      • en-US
        • css
        • html
      • fr
        • css
        • html
    • scripts
    • recipes
    • packaged
      • en-US
        • css
        • html
      • fr
        • css
        • html

Advantages

  • Quite easy to build a "health report"/to do stats for a given locale
  • Clear structure, easy to understand
  • Easy to build "localization doc status" pages for maintenance (answering the question "What has changed in English that needs an update in French?")
  • Less friction for migrating existing content with links (since the structure approx. matches Kuma structure)
  • Possible to have mass edits across docs for all locales
  • Easy to adapt the build process to create the JSON files under packaged
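As an illustration of the "health report" point, the sketch below walks the en-US tree and lists the files that have no counterpart under another locale. It is a minimal Python sketch that assumes the content/<locale>/<section> layout of this hypothesis; the function name and the sample file names are hypothetical.

```python
import tempfile
from pathlib import Path


def missing_translations(content_dir: Path, locale: str,
                         reference: str = "en-US") -> list:
    """List files present under the reference locale but absent from `locale`."""
    ref_root = content_dir / reference
    loc_root = content_dir / locale
    missing = []
    for path in sorted(ref_root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(ref_root)
            if not (loc_root / rel).exists():
                missing.append(rel.as_posix())
    return missing


# Build a throwaway tree with hypothetical file names to try the function.
with tempfile.TemporaryDirectory() as tmp:
    content = Path(tmp) / "content"
    for rel in ["en-US/css/color.md", "en-US/html/a.md", "fr/css/color.md"]:
        target = content / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("placeholder")
    report = missing_translations(content, "fr")
    print(report)
```

Because both locales share the same relative paths, the report is a plain directory comparison and needs no extra metadata.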

Drawbacks

  • Duplication of some files depending on the content of the YAML file. This could be mitigated by some linting step.
  • Governance: giving access to a trusted "peer" for a locale gives access to the whole repo. Technically, anyone who can merge/commit under content/fr can also merge/commit under scripts.

Hypothesis #2 - Have a git submodule for content and for each locale

The "logic" part of content is isolated under a main git module and each locale (incl. en-US) is stored under another repo which is a git submodule.

Structures

Modules

  • stumptown - Git repo
    • stumptown-content-en-US - Git submodule
    • stumptown-content-fr - Git submodule

Files

  • stumptown-experiment
    • scripts
    • recipes
    • submodule imported for en-US
    • submodule imported for fr
  • Root of Git submodule for en-US
    • css
    • html
  • Root of Git submodule for fr
    • css
    • html

Advantages

  • Governance issue of first hypothesis is solved. Each locale community is responsible for its own submodule.
  • Each locale has the same "level"; English is not a "privileged" locale. This makes it easier for the maintenance/editing tooling to choose a locale other than en-US for reference content (though not really useful today)
  • Quite easy to build a "health report"/to do stats for a given locale
  • Less friction for migrating existing content with links (since the structure approx. matches Kuma structure)
  • The build process will be locale-agnostic
  • One can add specific CI tooling (ex. locale specific spellchecking) for each locale.

Drawbacks

  • Maintenance tooling ("l10n doc status pages") needs to rely on two repos. One needs to gather commit info somewhere.
  • Makes mass edits across all of the locales harder. An "en-US" peer must be aware that some changes must be applied globally.
  • Linting tooling/CI may be more complex
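The first drawback (gathering commit info from two repos) could be handled along these lines. This is a minimal Python sketch, assuming both submodules are checked out locally and share the same relative file paths; the helper names are hypothetical, and the only git features used are `git -C <dir>` and `git log -1 --format=%ct -- <path>`.

```python
import subprocess
from pathlib import Path


def last_commit_timestamp(repo: Path, rel_path: str) -> int:
    """Unix timestamp of the last commit touching rel_path in repo (0 if none)."""
    out = subprocess.run(
        ["git", "-C", str(repo), "log", "-1", "--format=%ct", "--", rel_path],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return int(out) if out else 0


def needs_update(reference_ts: int, locale_ts: int) -> bool:
    """A localized file needs attention when the reference changed after it."""
    return reference_ts > locale_ts


def outdated_files(ref_repo: Path, loc_repo: Path, rel_paths) -> list:
    """Files whose reference copy was committed after the localized copy."""
    return [p for p in rel_paths
            if needs_update(last_commit_timestamp(ref_repo, p),
                            last_commit_timestamp(loc_repo, p))]
```

For example, `outdated_files(Path("stumptown-content-en-US"), Path("stumptown-content-fr"), ["css/color.md"])` would compare one (hypothetical) file across the two submodules. A file that was never committed in the locale repo gets timestamp 0 and is therefore reported as needing an update.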

Hypothesis #3 - Locale dir. under each section

Have a directory for each locale under each section.

Structure

  • stumptown-experiment (root)
    • content
      • css
        • en-US
        • fr
      • html
        • en-US
        • fr
    • scripts
    • recipes
    • packaged
      • css
        • en-US
        • fr
      • html
        • en-US
        • fr

Advantages

  • Easier to reduce the scope of maintenance to a given section.

Drawbacks

  • More difficult to get the state of a whole locale

Hypothesis #4 - Git branches for each locale

Each locale is managed under a different git branch, en-US content being on the master branch.

Structure

  • master git branch
    • stumptown-experiment
      • content (English content)
        • css
        • html
      • scripts
      • recipes
      • packaged (English built content)
        • css
        • html
  • fr git branch
    • stumptown-experiment
      • content (French content)
        • css
        • html
      • scripts
      • recipes
      • packaged (French built content)
        • css
        • html

Advantages

  • Sounds "interesting" when drawing histories of revisions in a graphical way

Drawbacks

  • How to distinguish "locale" branches from development branches?
  • How to mass edit across locales?
  • Apart from using a git concept, does it help with anything or just add more complexity?
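To illustrate the first drawback, here is a minimal Python sketch of one possible convention: treating any branch whose name looks like a locale code as a locale branch. The pattern and the sample branch names are assumptions made for illustration, not something agreed in the session.

```python
import re

# Assumed convention: a branch is a locale branch when its name looks like a
# locale code (e.g. "fr", "zh-CN", "sr-Latn"); everything else is development.
LOCALE_BRANCH = re.compile(r"[a-z]{2,3}(-[A-Z][A-Za-z]{1,3})?")


def split_branches(names):
    """Partition branch names into (locale branches, other branches)."""
    locales = [n for n in names if LOCALE_BRANCH.fullmatch(n)]
    others = [n for n in names if not LOCALE_BRANCH.fullmatch(n)]
    return locales, others


locales, others = split_branches(
    ["master", "fr", "zh-CN", "sr-Latn", "fix-build", "wip"]
)
print(locales)
print(others)
```

Note that the hypothetical development branch "wip" ends up classified as a locale, which shows how fragile a name-based distinction is.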