## pisa-tt-handover.md
    PISA 2025 -- list of tech tasks


Period: 18 December 2023 -- 7 January 2024

Backups: Gergoe, Kos, Adrien

Updates: check revisions of this document

History


Date	Task	Comment
2024-01-29	Task 2a	Updated instructions to update the en-ZZ base TM
2024-03-19	Task 2b	Updated instructions to update the fr-ZZ base TM
2024-03-23	Task 2a	Updated instructions to update the en-ZZ base TM


Table of contents

General info

Batches
Workflow steps
Team projects and repos
PB adaptation project

Tasks

Source files technical signoff -- Manuel
Update base versions -- Adrien

English master version
French master version
Chinese common reference version


Source updates
Initialization and setup of OmegaT projects -- Gergoe
Testing MoM -- Manuel
UI Translations -- Kos/Adrien
Trend Transfer -- Kos
Move target files to final repo -- Gergoe
Helpdesk -- @all
Set up reconciliation project -- Kos/Gergoe
Add files to batch folders -- Manuel
Arrange TMs after batch transition -- Gergoe

General info

Batches

Files are organized and travel through workflows in batches. Batches are defined in this monitoring sheet: PISA2025ft-batches. This file must be considered the source of truth about how files and batches are named as well as what batch each unit belongs to.
File source/files.yaml in the common repo should reflect the information in the monitoring sheet, and should be updated if any changes are made to the above (updates are done manually, for now -- script to automate it welcome).
Workflow steps

There are different workflow types, each with its own steps. They are defined in either of these two sources:

220202_PISA25_Workflow_master_CONS
https://github.com/capstanlqc/mk-omegat-team-projs/blob/master/config/workflow_steps.yaml

Again, the file workflow_steps.yaml (which is a config file for our app) must be in sync with the monitoring above.
Team projects and repos

Production (FT) team projects are hosted in AWS CodeCommit, in domain https://git-codecommit.eu-central-1.amazonaws.com/v1/repos/. Repo names start with pisa_2025ft_ and file names start with PISA_2025FT_.
Testing (staging) team projects are hosted in AWS CodeCommit, in the same domain as production team projects. Unlike in production, repo names in staging start with pisa_2025stg_ and file names start with PISA_2025STG_. If any files from production (FT) must be used in staging for testing, they must be renamed accordingly.
There is one main repo for each step for each locale. They have the following URL template: pisa_2025ft_translation_{LOCALE}_{STEP}.git .
Each main repository hosts an OmegaT team project, which pulls source files, config files and language assets from the common repo: pisa_2025ft_translation_common.git.
For the purposes of persistent previews, final target files are to be pushed to the final repo: pisa_2025ft_translation_final.git.
For the initial phase of the trend transfer task, we use our own team projects, hosted on GitHub (organization: capstanlqc-pisa).
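The naming conventions above can be captured in a couple of small helpers. This is only a sketch: the function names are made up for illustration, and it assumes that staging repos follow the same translation_{LOCALE}_{STEP} template as production repos, which the notes above do not state explicitly.

```python
# Base URL of the AWS CodeCommit domain mentioned above.
BASE_URL = "https://git-codecommit.eu-central-1.amazonaws.com/v1/repos/"


def repo_url(locale: str, step: str, staging: bool = False) -> str:
    """Build the URL of the main repo for one locale and one workflow step.

    ASSUMPTION: staging repos use the same template as production ones,
    only with the pisa_2025stg_ prefix.
    """
    prefix = "pisa_2025stg_" if staging else "pisa_2025ft_"
    return f"{BASE_URL}{prefix}translation_{locale}_{step}.git"


def to_staging_name(filename: str) -> str:
    """Rename a production (FT) file so that it can be used in staging."""
    ft_prefix = "PISA_2025FT_"
    if not filename.startswith(ft_prefix):
        raise ValueError(f"not a production file name: {filename}")
    return "PISA_2025STG_" + filename[len(ft_prefix):]


if __name__ == "__main__":
    print(repo_url("ja-JP", "trend-prepare-files"))
    print(to_staging_name("PISA_2025FT_example.xml"))  # hypothetical file name
```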
PB adaptation project

This project is a bit special in a few regards:

the target language is not really a language (pb stands for "paper based")
the source text is the computer-based version (CBA) and the target version is the paper-based version (PBA)
files have been added one by one (one repository mapping per file) rather than by adding a full batch (in order to remove a file from the project, the repository mapping for that file must be removed or commented out in the project settings file, i.e. omegat.project)
Dara often requests that target files from this project be moved to the final repo.


Tasks

TASK 1. Source files technical signoff

Responsible person: Manuel
Whenever there are updates in the source folder of the common repo, a number of actions are required for the technical signoff of the source files. In this context, "updates" means files being pushed to that repo, under /source/batch1, and those files can be new files that are being released to the repo after being authored, or a new version of already released files.
The technical signoff involves the following steps:

reviewing the new files or the new parts in re-released files
fixing (or linting) the identified issues (scripts are ready for that)
copying the files to the correct batch folder

We can skip step 1 in this handover, and be confident that there will be no more issues other than the ones that have been already identified.
Then, to fix the issues that have already been identified in previous reviews of already released files, we have a script that runs a series of string substitutions based on the selected configuration file. All the necessary code and config files are available here: https://github.com/capstanlqc/source-xml-linter


If there are updates in files belonging to batch 05_QQA_N, the linting script must be run with config config_qqa_zwsp.xlsx


If there are updates in files belonging to any "new" batch (i.e. any batch starting with 01 .. 06 and ending with _N), the normal config config.xlsx must be used.


If there are updates in "trend" files, the config config_trend.xlsx must be used. Trend files belong to batches ending with _T and can be recognized because they normally have a unit ID that follows the pattern _P?[RMS]\d{3}, where the optional P indicates that it's the PBA version and R/M/S is the initial of the domain (reading, math, science).
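The three rules above could be written down as a small lookup, for example to avoid picking the wrong config by hand. This is a sketch only: pick_config and looks_like_trend are hypothetical helpers, not part of the source-xml-linter repo, and the batch names in the usage lines (other than 05_QQA_N) are invented.

```python
import re

# Pattern for trend unit IDs, as given above: optional P (PBA version),
# then the domain initial (R/M/S), then three digits.
TREND_UNIT_ID = re.compile(r"_P?[RMS]\d{3}")


def pick_config(batch: str) -> str:
    """Return the linter config file name for a batch name."""
    # The QQA batch is checked first because it also matches the "new" rule.
    if batch == "05_QQA_N":
        return "config_qqa_zwsp.xlsx"
    if batch.endswith("_T"):
        return "config_trend.xlsx"
    if re.fullmatch(r"0[1-6].*_N", batch):
        return "config.xlsx"
    raise ValueError(f"no linter config known for batch {batch!r}")


def looks_like_trend(file_name: str) -> bool:
    """Heuristic check: does the file name contain a trend-style unit ID?"""
    return TREND_UNIT_ID.search(file_name) is not None


if __name__ == "__main__":
    print(pick_config("05_QQA_N"))
    print(pick_config("01_XXX_N"))  # hypothetical batch name
    print(looks_like_trend("PISA_2025FT_example_PS123.xml"))  # hypothetical
```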


What I normally do to run the script above is:

Create a folder and copy there only the files that I want to lint, e.g. tolint
Run the script using the path to the tolint folder as the input argument
As output argument, use the path to the folder where I want to write the linted files, e.g. linted
Review the action of the script just to make sure no unexpected damage happened (the best way to do this check is to open the file in OmegaT, and if it's a new version of an already released file, then it's handy to add the file to a project containing the translations of all segments in the previous version to see what changes and becomes untranslated -- other than that, a diff comparison is useful)
If everything is okay in the linted files, copy them or move them from the linted folder into the corresponding batch folder (according to the info in files.yaml).

In other words, after activating the virtual environment and installing dependencies, I do:
app=/path/to/local/repo
tolint=/path/to/the/files/tolint
linted=/path/to/the/files/linted
python $app/str_subs.py -i $tolint -o $linted -c $app/config.xlsx
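For the diff comparison mentioned in the review step, something along these lines could be used instead of a graphical diff tool. It is a minimal sketch, assuming that linted files keep the same file name as the originals and are UTF-8 encoded; diff_linted is a made-up helper and is not part of the linter repo.

```python
import difflib
from pathlib import Path


def diff_linted(tolint: Path, linted: Path) -> dict[str, list[str]]:
    """Return a unified diff for every file that the linter changed.

    ASSUMPTION: each file in `tolint` has a counterpart with the same
    name in `linted`, and both are UTF-8 text.
    """
    report: dict[str, list[str]] = {}
    for original in sorted(tolint.iterdir()):
        if not original.is_file():
            continue
        fixed = linted / original.name
        if not fixed.exists():
            report[original.name] = ["(no linted counterpart found)"]
            continue
        before = original.read_text(encoding="utf-8").splitlines()
        after = fixed.read_text(encoding="utf-8").splitlines()
        diff = list(difflib.unified_diff(
            before, after,
            fromfile=f"tolint/{original.name}",
            tofile=f"linted/{original.name}",
            lineterm="",
        ))
        if diff:  # unchanged files are left out of the report
            report[original.name] = diff
    return report


if __name__ == "__main__":
    for name, lines in diff_linted(Path("tolint"), Path("linted")).items():
        print("\n".join(lines))
```

Files that the script left untouched do not appear in the report, so an empty result means the linter changed nothing.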
Additionally, there's a separate script for a different kind of issue, with named entities and escaped hex entity references:

If there are updates in "trend" files, any entity issues found must be removed with the script decode_entities.sh (which uses entities.json as config)

In the video below, to avoid confusion please skip or ignore the part between 11:05 and 14:37. Sorry about that.


TASK 2. Update base versions

This task also includes generating the target files from the prepare-files step of adapting versions (en-*, fr-*, zh-*).
a) en-ZZ

This must be done when a new batch is released or the files in an already released batch are updated.

Action point for @Eli or @Tanya: mention in our Skype's PISA25 TWG chat group that a new batch is released and tag @Adrien and @Manuel

Some countries which have English locales adapt the English master, e.g. en-PS. en-* projects have a repository mapping in their settings that adds file tm/auto/base/en-ZZ.tmx. The remote version of the file is in the common repo, on path assets/base/en-ZZ.tmx.zip. This file needs to be updated with every new batch released.
So, every time a new batch is released to countries:

Add the new batch (mapping) to the pisa_2025ft_translation_en-ZZ_prepare-files project (note: from the capstanlqc-pisa github organization).
Pack the project as pisa_2025ft_translation_en-ZZ_prepare-files_OMT to have an offline version. Unpack the offline version of the project and close it.
Run the following command:
java -jar /path/to/omegat/build/install/OmegaT/OmegaT.jar /path/to/omegat/project --config-dir=/path/to/config/dir --mode=console-createpseudotranslatetmx --pseudotranslatetmx=/path/to/omegat/project/tm/auto/en-ZZ.tmx --pseudotranslatetype=equal


Re-open the project to confirm that all segments are pre-translated with the source text. You can search for regex ^(.+)\ue000(?!\1).+$ in both source and target to find any segments where the translation is different from the source.
Zip en-ZZ.tmx and commit the new base TM en-ZZ.tmx.zip to pisa_2025ft_translation_common/assets/base/en-ZZ.tmx.zip overwriting the file there (use commit message "Update English master base TM").
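As a cross-check outside OmegaT for the confirmation step in this list, the generated TMX can also be inspected directly. The sketch below assumes that the pseudo-translated TMX contains one tu per segment with a source tuv and a target tuv, so that any tu whose seg texts differ is suspicious; mismatched_units is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def mismatched_units(tmx_path) -> list[tuple[str, ...]]:
    """Return the seg texts of every tu whose variants are not identical.

    ASSUMPTION: in the base TM every translation unit should carry the
    same text in all of its tuv elements (translation equal to source).
    """
    mismatches = []
    root = ET.parse(str(tmx_path)).getroot()
    for tu in root.iter("tu"):
        # itertext() also picks up text nested inside inline elements.
        segs = ["".join(seg.itertext()) for seg in tu.iter("seg")]
        if len(set(segs)) > 1:
            mismatches.append(tuple(segs))
    return mismatches


if __name__ == "__main__":
    for segs in mismatched_units("tm/auto/en-ZZ.tmx"):
        print(segs)
```

An empty list suggests that every segment was pre-translated with its source text.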

Finally, run code/commit_target_files.sh for en-* locales and for the new/updated batch.
Caveats


This project is hosted in the capstanlqc-pisa github organization but is pulling all files (except QQ units) from the pisa_2025ft_translation_common repo on AWS.


Questionnaire batches are added to pisa_2025ft_translation_en-ZZ_prepare-files directly in the source folder rather than through a mapping because the files in the pisa_2025ft_translation_common repo on AWS have filtering properties that would affect what is exposed for en-ZZ. If the en-ZZ base TM must be updated, the files in the common repo must be copied to the pisa_2025ft_translation_en-ZZ_prepare-files > source and modified to remove all filtering properties (e.g. remove everything matched by regex  its:(localeFilterList|localeFilterType)="[^"]+").
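Removing the filtering properties from the copied questionnaire files can be scripted with the regex quoted above. A minimal sketch (strip_locale_filters is a made-up name, and the sample markup in the usage line is invented for illustration):

```python
import re

# Same expression as in the caveat above, including the leading space,
# so that no stray whitespace is left behind in the opening tag.
FILTER_ATTR = re.compile(r' its:(localeFilterList|localeFilterType)="[^"]+"')


def strip_locale_filters(xml_text: str) -> str:
    """Remove its:localeFilterList / its:localeFilterType attributes."""
    return FILTER_ATTR.sub("", xml_text)


if __name__ == "__main__":
    sample = '<p its:localeFilterList="en-PS" its:localeFilterType="include">x</p>'
    print(strip_locale_filters(sample))
```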


b) fr-ZZ


Add new batch to project: https://git-codecommit.eu-central-1.amazonaws.com/v1/repos/pisa_2025ft_translation_fr-ZZ_signoff.git
Make sure that all segments are pre-translated and press Ctrl+D to generate the master TM
Remove any changeid, changedate, creationid and/or creationdate properties from entries in the new master TM.

Tip: replace (<tuv lang="(?:en|fr|zh-Hant)-ZZ")[^>]+ with $1 (first captured group)


Rename pisa_2025ft_translation_fr-ZZ_signoff-omegat.tmx as fr-ZZ.tmx and zip it. You need both fr-ZZ.tmx and fr-ZZ.tmx.zip.
Replace both fr-ZZ.tmx and fr-ZZ.tmx.zip in pisa_2025ft_translation_common/assets/base/ with the files generated in the previous step.
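The clean-up, renaming and zipping steps in this list could be done in one go with a short script. This is a sketch under stated assumptions: the regex is the one from the tip, the helper names are invented, and the master TM is assumed to be UTF-8.

```python
import re
import zipfile
from pathlib import Path

# Regex from the tip above: keep the opening of the tuv element and drop
# every other attribute (changeid, changedate, creationid, creationdate).
TUV_ATTRS = re.compile(r'(<tuv lang="(?:en|fr|zh-Hant)-ZZ")[^>]+')


def clean_master_tm(tmx_text: str) -> str:
    """Strip change/creation properties from the base-version tuv elements."""
    return TUV_ATTRS.sub(r"\1", tmx_text)


def write_base_tm(master_tm: Path, out_dir: Path, name: str) -> tuple[Path, Path]:
    """Write {name}.tmx and {name}.tmx.zip to out_dir from the master TM."""
    cleaned = clean_master_tm(master_tm.read_text(encoding="utf-8"))
    tmx = out_dir / f"{name}.tmx"
    tmx.write_text(cleaned, encoding="utf-8")
    archive = out_dir / f"{name}.tmx.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(tmx, arcname=tmx.name)
    return tmx, archive


if __name__ == "__main__":
    write_base_tm(
        Path("pisa_2025ft_translation_fr-ZZ_signoff-omegat.tmx"),
        Path("."),
        "fr-ZZ",
    )
```

The two resulting files still have to be copied to assets/base/ in the common repo and committed by hand.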

Finally, run code/commit_target_files.sh for fr-* locales.
c) zh-Hant-ZZ


Add new batch to project: https://github.com/capstanlqc-pisa/pisa_2025ft_translation_zh-Hant-ZZ_signoff.git
Make sure that all segments are pre-translated and press Ctrl+D to generate the master TM
Remove any changeid, changedate, creationid and/or creationdate properties from entries in the new master TM.

Tip: replace (<tuv lang="(?:en|fr|zh-Hant)-ZZ")[^>]+ with $1 (first captured group)


Rename pisa_2025ft_translation_zh-Hant-ZZ_signoff-omegat.tmx as zh-Hant-ZZ.tmx and zip it.
Replace pisa_2025ft_translation_common/assets/base/zh-Hant-ZZ.tmx.zip with the version generated in the previous step.

Finally, run code/commit_target_files.sh for zh-* locales.
TASK 3. Source updates

Source files might be updated for whatever reason during the project -- that means that a new file is pushed from TAO to the common repo inside source/batch1, overwriting a previous version if it exists. For example, this may happen after errata are fixed.
Any new files need to be linted and signed off again as described in task 1 above, just as it was done with the original version. Then the base versions need to be updated too as explained in task 2.

Signoff / lint source files [task #1]
Update en-ZZ base version [task #2]
Add batch again to fr-ZZ final-proofreading and zh-Hant-ZZ proofreading projects
Update fr-ZZ and zh-Hant-ZZ base versions [task #2]

Step 3 above is done by adding the repository mapping in the project settings file (e.g. omegat.project) of those two projects and it's necessary if the batch containing the updated files was already proofread some time ago and therefore removed from those two projects. Only if the batch is added will the proofreader have access to the files and be able to edit the translations.
TASK 4. Initialization and setup of OmegaT projects

This is an application to create and/or set up OmegaT team projects according to the information indicated in the translation workflow monitoring sheet.

Main repo: https://github.com/capstanlqc/mk-omegat-team-projs
translation workflow monitoring sheet

The readme file in the repo explains how to use it. The docs folder contains links that explain how to set up the git-remote-codecommit package.
TASK 5. Testing MoM


Responsible: @Kos

If this task is requested, let's discuss it.
https://github.com/capstanlqc/its-filter-validation/
TASK 6. UI Translations


Responsible: @Kos

Nothing else to do unless there are issues or any unforeseen additional request.
https://rentry.org/ui_translation_repos
TASK 7. Trend transfer


Responsible: @Gergoe / @Kos?

Our linguists are working on our team projects hosted on Github. When the trend transfer is complete for one locale, we must transfer those translations to the AWS repos.
General info:

The URLs of our repos on Github are listed here: https://rentry.org/github-trend-transfer-repos (created by Kos with this script).
The URLs of the repos on AWS have the following name template: https://git-codecommit.eu-central-1.amazonaws.com/v1/repos/pisa_2025ft_translation_{LOCALE}_trend-prepare-files.git
The linguists follow these instructions: https://capps.capstan.be/doc/pisa2025_trend-transfer_guide.php

ACER has created the repos and we must create the OmegaT projects in them. That has been done only for 5 locales: ar-IL, de-AT, ja-JP, ru-UZ, th-TH.

pisa_2025ft_translation_ar-IL_trend-prepare-files
pisa_2025ft_translation_de-AT_trend-prepare-files
pisa_2025ft_translation_ja-JP_trend-prepare-files
pisa_2025ft_translation_ru-UZ_trend-prepare-files
pisa_2025ft_translation_th-TH_trend-prepare-files

When the trend transfer is completed for one version, the steps now are (with examples on the command line for ja-JP):


Clone the target AWS repo:
git clone https://git-codecommit.eu-central-1.amazonaws.com/v1/repos/pisa_2025ft_translation_ja-JP_trend-prepare-files.git


If the team project is not created in the AWS repo, create it with OmegaT CLI and set it up correctly (the 5 projects above can be used as templates/models):
/opt/omegat/OmegaT_5.7.2/jre/bin/java -jar /opt/omegat/OmegaT_5.7.2/OmegaT.jar team init en ja-JP

Then, make the appropriate changes in the omegat.project file (e.g. adding repository mappings, etc. just like in the five projects that were created first, mentioned above).


Download the github repo
gh repo clone capstanlqc-pisa/pisa_2025ft_transfer_ja-JP_trend-prepp


Copy the working TM (omegat/project_save.tmx) from the github repo to the AWS project as tm/auto/PISA_{LOCALE}_MS2022_trend25.tmx and commit changes on the AWS repo:
cp pisa_2025ft_transfer_ja-JP_trend-prepp/omegat/project_save.tmx pisa_2025ft_translation_ja-JP_trend-prepare-files/tm/auto/PISA_ja-JP_MS2022_trend25.tmx
cd pisa_2025ft_translation_ja-JP_trend-prepare-files
git add . && git commit -m "Added TM with transferred trend version" && git push


Download the team project on AWS and commit target files:

Download https://git-codecommit.eu-central-1.amazonaws.com/v1/repos/pisa_2025ft_translation_ja-JP_trend-prepare-files.git in OmegaT
Commit target files


Copy target files to the final repo (see below how to do this)
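The copy step in the list above can be generalized from the ja-JP example to any locale. The sketch below assumes that the GitHub repo for every locale follows the same naming as the ja-JP one (pisa_2025ft_transfer_{LOCALE}_trend-prepp), which should be checked against the list of repos linked above; the function names are hypothetical. Committing and pushing is still done with git as shown in the commands.

```python
import shutil
from pathlib import Path


def github_repo(locale: str) -> str:
    # ASSUMPTION: naming generalized from the ja-JP example above.
    return f"pisa_2025ft_transfer_{locale}_trend-prepp"


def aws_repo(locale: str) -> str:
    return f"pisa_2025ft_translation_{locale}_trend-prepare-files"


def copy_trend_tm(workdir: Path, locale: str) -> Path:
    """Copy the working TM from the GitHub clone into the AWS clone.

    Both repos are expected to be cloned side by side inside `workdir`.
    """
    source = workdir / github_repo(locale) / "omegat" / "project_save.tmx"
    dest_dir = workdir / aws_repo(locale) / "tm" / "auto"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"PISA_{locale}_MS2022_trend25.tmx"
    shutil.copy2(source, dest)
    return dest


if __name__ == "__main__":
    print(copy_trend_tm(Path("."), "ja-JP"))
```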


TASK 8. Move target files to final repo

Steps:

Go to your local copy of the project repo
Sync with remote version (git pull)
Make a copy of the target folder, e.g. final
Change directory to that folder
Flatten the structure of files so that all files are now at the same level
Remove directories (which are now empty)
Remove the locale extension from all files
Move all the renamed files to the location /translations/{LOCALE}/batch1/ in the final repo
Go to the final repo and push the new files
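For reference, the flatten, rename and move steps above might look roughly like this in Python. This is not the script linked below, only a sketch: it assumes that the "locale extension" is a _{LOCALE} suffix in the file name (pass locale_suffix explicitly if the real file names differ), and it copies files straight into the final repo instead of working on an intermediate copy. Pulling and pushing is left to git.

```python
import shutil
from pathlib import Path
from typing import Optional


def stage_final_files(target_dir: Path, final_repo: Path, locale: str,
                      locale_suffix: Optional[str] = None,
                      batch: str = "batch1") -> list[Path]:
    """Flatten target_dir, drop the locale suffix and copy to the final repo.

    ASSUMPTION: the locale extension looks like "_ja-JP" inside the file
    name; locale_suffix is a hypothetical parameter to override that.
    """
    suffix = locale_suffix if locale_suffix is not None else f"_{locale}"
    dest = final_repo / "translations" / locale / batch
    dest.mkdir(parents=True, exist_ok=True)
    seen: set[str] = set()
    copied = []
    for path in sorted(target_dir.rglob("*")):
        if not path.is_file():
            continue
        new_name = path.name.replace(suffix, "")
        # Flattening can make two files from different folders collide.
        if new_name in seen:
            raise ValueError(f"name collision after flattening: {new_name}")
        seen.add(new_name)
        shutil.copy2(path, dest / new_name)
        copied.append(dest / new_name)
    return copied


if __name__ == "__main__":
    for f in stage_final_files(Path("target"), Path("../final_repo"), "ja-JP"):
        print(f)
```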

To move target files to the final repo, you can use this script:
https://gist.github.com/msoutopico/4bbe0ac90b71f709a4f5d8fc3bdf91c1
For other language versions, you can adapt the script above accordingly, or just read through it to confirm what steps are necessary.
TASK 9. Helpdesk

We might get tickets from users who get the upgrade wrong. Get familiar with the upgrading instructions:
https://capstanlqc.github.io/omegat-guides/verification/install-and-setup/
New: no manual customization is needed for users who install OmegaT 5.7.2 from scratch. The configuration script is included in the installer and runs automatically when OmegaT is run if the scripts folder hasn't been customized yet and set to the user config folder.
TASK 10. Set up reconciliation project (zh-Hant-ZZ)


Responsible PM: Tanya Sonolenko

Responsible TT: @Kos (when he has creds, otherwise Gergoe to push the two TMs)

Two translators are producing zh-Hant-ZZ translations of a certain batch in offline projects. When they are done, they will hand back project packages.
Steps:

Unpack those two projects
In each of them, press Ctrl+D to produce the master TM
Rename those two TMs as {BATCH}_zh-Hant-ZZ_T1.tmx and {BATCH}_zh-Hant-ZZ_T2.tmx respectively.
Commit those two TMs to folder tm/rec/ in the reconciliation project: https://git-codecommit.eu-central-1.amazonaws.com/v1/repos/pisa_2025ft_translation_zh-Hant-ZZ_reconciliation.git

TASK 11. Add files to batch folders


Responsible person: Manuel

This only needs to be done after there are source updates and they go through technical signoff. The script below can be run to sort files in their correct batch folders:
Script: https://gist.github.com/msoutopico/72cee9a221860fedb9f876372ffc8e80
Improvements todo:

Parameters source_dir, root and config are currently hardcoded. It would be nice to add source_dir as a CLI argument (the other two parameters are based on that one) so that the script doesn't need to be edited when it is run by different people.
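One possible shape for that improvement, using argparse. The way root and config are derived here is a guess for illustration only (root as the parent of source_dir, config as files.yaml inside source_dir, following the source/files.yaml location mentioned in the general info); the actual derivation in the script should be kept.

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    """Read source_dir from the command line and derive the other paths."""
    parser = argparse.ArgumentParser(
        description="Sort source files into their batch folders."
    )
    parser.add_argument(
        "source_dir", type=Path,
        help="path to the source folder of the local clone of the common repo",
    )
    args = parser.parse_args(argv)
    # ASSUMPTION: both derivations below are hypothetical placeholders for
    # whatever the script currently hardcodes.
    args.root = args.source_dir.parent
    args.config = args.source_dir / "files.yaml"
    return args


if __name__ == "__main__":
    options = parse_args()
    print(options.source_dir, options.root, options.config)
```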

TASK 12. Arrange TMs after batch transition


Responsible person: Gergoe

This is an action that is expected from ACER, but they don't have a working implementation yet. In the meantime, we can do this manually. The sequence of steps must be the following:

A batch transition (a batch is added to or removed from a certain step)
We (TTT) arrange TMs at that step according to the new batches at that step
The user can then download the project with the new TM arrangement

To perform step 2 above, follow these steps:

After the batch transition, clone or sync the repository where the omegat project for that step is hosted
Run the script arrange_tmx_files_with_extension on the repository, and push changes.

The script is available in two flavours: Python and Node.js (both have been provided to ACER but James' team will base their feature on the Node.js version).
Run as:
python arrange_tmx_files_with_extension.py /path/to/local/clone

or
node arrange_tmx_files_with_extension.js /path/to/local/clone

Remember to install dependencies first.