@sevein · Last active November 21, 2016 15:27
Internationalization of Archivematica workflow data

Redmine: https://projects.artefactual.com/issues/8161

Intro

There is a particularly challenging piece of work in the Archivematica I18N project that this document tries to break down into smaller units. Archivematica's workflow data is currently kept in a relational database managed by the Dashboard. It includes information about the microservices, including human-readable strings (labels) that are shown in the user interface. These strings need to be translated into languages other than English.

The workflow data is currently considered static data that is not editable by the user. Keeping it in the database makes it hard to maintain (e.g. changes require database migrations) and hard to translate. This document suggests pulling the workflow data out of the database and storing it in a simple JSON file: the simplest approach I can think of, with the goal of making it much easier to introduce new ways to describe our workflow rules in the future. This is considered an implementation detail specific to MCPServer that we want to encapsulate; in other words, we want to hide this mechanism from the consumers: MCPClient and Dashboard. MCPServer will expose a new API based on the gRPC protocol.

gRPC's Python implementation offers us the possibility of building scalable API clients and servers without extra gateway interfaces like gunicorn or Apache/mod_wsgi. gRPC implements its own C shared library, used in many of its platform-specific libraries such as C++, Ruby, Python, PHP or NodeJS.
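
As a rough sketch of what this could look like, the following is a minimal gRPC server in Python. Everything here is illustrative: mcpserver_pb2 and mcpserver_pb2_grpc are assumed to be generated from a hypothetical mcpserver.proto, and the service and method names are not a final API.

import time
from concurrent import futures

import grpc

import mcpserver_pb2       # hypothetical generated module
import mcpserver_pb2_grpc  # hypothetical generated module


class MCPServerServicer(mcpserver_pb2_grpc.MCPServerServicer):

    def ApproveTransfer(self, request, context):
        # Resolve the transfer and the chosen chain, then resume processing
        # (details omitted in this sketch).
        return mcpserver_pb2.ApproveTransferResponse(approved=True)


server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
mcpserver_pb2_grpc.add_MCPServerServicer_to_server(MCPServerServicer(), server)
server.add_insecure_port('[::]:50051')
server.start()
try:
    while True:
        time.sleep(3600)
except KeyboardInterrupt:
    server.stop(0)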

Project plan

Groups 1) and 2) could be developed in parallel.

1️⃣ MCPServer API

🕙 Estimation: 40h

  • MCPServer: implement gRPC-based MCPServer
  • archivematicaCommon: implement gRPC-based MCPServer client
  • Dashboard: replace contrib.mcp.client.MCPClient with new gRPC-based MCPServer client
  • Dashboard: replace uses of MCPServer models; more here: workflow-models-in-consumers.md

2️⃣ JSON Archivematica Workflow with i18n support

JAW reader

🕙 Estimation: 30h

  • Define schema based on JSON Schema (see the validation sketch after this list)
  • JAW: build a JSON-based workflow reader library (requirement: i18n-aware human-readable strings)
  • JAW: [OPTIONAL] add extra tooling, e.g. links-graph, links-edit, ...
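
A minimal validation sketch, assuming the jsonschema library; the schema fragment is purely illustrative, not the real schema.

import json

import jsonschema

# Illustrative fragment only; the real schema is yet to be defined.
WORKFLOW_SCHEMA = {
    'type': 'object',
    'required': ['tasks', 'watched_directories'],
    'properties': {
        'tasks': {'type': 'object'},
        'watched_directories': {'type': 'array'},
    },
}

with open('workflow.json') as stream:
    document = json.load(stream)

try:
    jsonschema.validate(document, WORKFLOW_SCHEMA)
except jsonschema.ValidationError as err:
    print('Invalid workflow document: %s' % err.message)
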
Transifex integration

🕙 Estimation: 30h

  • JAW: add message extraction + Transifex publishing (see the extraction sketch after this list)
  • JAW: add translation pull task
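
A sketch of the extraction step, assuming Babel for catalog handling and the illustrative per-language JSON encoding shown later in this document; pushing the resulting POT file to Transifex (and pulling translations back) would be handled by the standard tx client.

import json

from babel.messages.catalog import Catalog
from babel.messages.pofile import write_po

with open('workflow.json') as stream:
    workflow = json.load(stream)

catalog = Catalog(project='archivematica-workflow', charset='utf-8')
for uuid, task in workflow.get('tasks', {}).items():
    # Field names are assumptions mirroring the jaw sketch at the end of this
    # document (task.description, task.group).
    for field in ('description', 'group'):
        message = task.get(field, {}).get('en')
        if message:
            catalog.add(message, locations=[('workflow.json', 0)])

with open('workflow.pot', 'wb') as stream:
    write_po(stream, catalog)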

3️⃣ Update Archivematica

🕙 Estimation: 20h

  • MCPServer: include workflow.json seed in distribution (see chain-links.json)
  • MCPServer: accept a new --workflow-file argument (it should default to the one provided in the distribution; see the sketch after this list)
  • MCPServer: update task managers (hard) and directory watcher
  • Dashboard: delete models, migrate database (eliminate unused tables)
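
A minimal sketch of the new flag, using argparse; the default path is illustrative only.

import argparse

parser = argparse.ArgumentParser(description='Archivematica MCPServer')
parser.add_argument(
    '--workflow-file',
    # Hypothetical location of the copy shipped with the distribution.
    default='/usr/lib/archivematica/MCPServer/assets/workflow.json',
    help='Path to the JSON workflow document.')
args = parser.parse_args()
print('Using workflow file: %s' % args.workflow_file)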

Development notes

Workflow models used outside MCPServer

Model name                        Coupling
--------------------------------  --------------------------------
TaskType                          Not shared
MicroServiceChain                 Dashboard (low)
MicroServiceChainChoice           Dashboard (low)
MicroServiceChainLink             Dashboard (low)
MicroServiceChoiceReplacementDic  Dashboard (low), MCPClient (low)
StandardTaskConfig                Dashboard (low)
MicroServiceChainLinkExitCode     Not shared
TaskConfigAssignMagicLink         Not shared
TaskConfigSetUnitVariable         Not shared
TaskConfigUnitVariableLinkPull    Not shared
WatchedDirectory                  Not shared

chain-links.json (generated with devtools) includes the data from all the models listed above. It's a proof of concept.

Workflow models used in MCPServer

Once the workflow data is encoded in JSON, we'll need a thin layer to read from it. The following is the list of places in the MCPServer where workflow data is accessed. Understanding how the current models are used will help shape the API of the module replacing them (see the jaw sketch at the end of this document).

archivematicaMCP.py

  • WatchedDirectory: simple all()

jobChain.py

  • MicroServiceChain: simple get() without FKL (foreign-key lookup)

jobChainLink.py

  • MicroServiceChainLink: simple get(id=?) with FKL(currenttask=>TaskConfig)

  • TaskType: simple get(id=?)

  • MicroServiceChainLinkExitCode: simple get(microservicechainlink_id=?, exitcode=?)

linkTaskManagerAssignMagicLink.py

  • TaskConfigAssignMagicLink: simple get(id=?)

linkTaskManagerChoice.py

  • MicroServiceChainChoice: simple filter(choiceavailableatlink_id=?)

linkTaskManagerDirectories.py

  • StandardTaskConfig: simple get(id=?)

linkTaskManagerFiles.py

  • StandardTaskConfig: simple get(id=?)

linkTaskManagerGetMicroserviceGeneratedListInStdOut.py

  • StandardTaskConfig: simple get(id=?)

linkTaskManagerGetUserChoiceFromMicroserviceGeneratedList.py

  • StandardTaskConfig: simple get(id=?)

linkTaskManagerReplacementDicFromChoice.py

  • MicroServiceChoiceReplacementDic: simple filter(choiceavailableatlink=?)

linkTaskManagerSetUnitVariable.py

  • TaskConfigSetUnitVariable: simple get(id=?)

linkTaskManagerUnitVariableLinkPull.py

  • TaskConfigUnitVariableLinkPull: simple get(id=?)

Processing models used in MCPServer

These are all models frequently used from the Dashboard and/or MCPClient.

  • UserProfile
  • UnitVariable
  • File
  • Job
  • SIP
  • Task
  • Transfer

The database functions databaseFunctions.{logTaskCreatedSQL,logTaskCompletedSQL,logJobCreatedSQL,createSIP} (imported from archivematicaCommon) used in the MCPServer interact with the models above.

Fields in MCPServer models needing i18n

From workflow models:

  • MicroServiceChain.description (e.g. "Skip quarantine", "Store AIP", "Yes", "No", "Approve transfer")
  • MicroServiceChainLink.microserviceGroup (e.g. "Quarantine", "Prepare AIP")
  • MicroServiceChoiceReplacementDic.description (e.g. "Identify using Fido", "7 - maximum compression")
  • MicroServiceChainLinkExitCode.exitMessage (e.g. "Completed successfully")
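
One possible encoding in workflow.json, purely illustrative and not a final schema, is to embed per-language variants keyed by language code. The task UUID below is the one used in the jaw example at the end of this document; the Spanish strings are made-up examples.

{
  "tasks": {
    "0c94e6b5-4714-4bec-82c8-e187e0c04d77": {
      "description": {
        "en": "Approve standard transfer",
        "es": "Aprobar transferencia estándar"
      },
      "group": {
        "en": "Approve transfer",
        "es": "Aprobar transferencia"
      }
    }
  }
}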

The Job processing model keeps copies of some of these values in Job.jobType and Job.microserviceGroup. We'll look into that later. The Job should likely reference the corresponding chain link instead.

Usage of workflow models in other microservices

Notice that we're ignoring cases where processing models are used.

Dashboard: components/administration/{forms,views}.py

This is where we generate the processing configuration form, which involves mostly the MicroServiceChoiceReplacementDic and MicroServiceChainChoice models.

Solution: use new RPCs to populate the forms. I'll build an RPC to retrieve the details of any chain link (MSCL) regardless of its type and configuration. The RPC should be able to serialize responses embedding different message types.

Also, this is the place where we use MicroServiceChoiceReplacementDic to store the configuration of DIP upload (AT, ATK). StandardTaskConfig is used to store AtoM's configuration.

Solution: use DashboardSetting.objects.{get,set}_dict
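
A minimal usage sketch, assuming the get_dict/set_dict manager methods referenced above; the scope name and keys are illustrative only.

from main.models import DashboardSetting

# Store all the AtoM DIP upload parameters as one dict under a single scope.
DashboardSetting.objects.set_dict('upload-qubit_v0.0', {
    'url': 'https://atom.example.com/index.php',
    'email': 'demo@example.com',
    'password': 'demo',
})

# Read them back wherever they are needed.
settings = DashboardSetting.objects.get_dict('upload-qubit_v0.0')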

Dashboard: components/api/views.py

The MicroServiceChainChoice model is used to extract the available choices for a specific transfer that needs to be approved. Once the right choice is determined, this view asks MCPServer to proceed with the choice via MCPClient.

Solution: use a new RPC to approve transfers (see the client sketch below).
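
On the Dashboard side, the view could then call something like the following, reusing the illustrative mcpserver_pb2 names from the server sketch in the intro; the request fields are assumptions.

import grpc

import mcpserver_pb2       # hypothetical generated module
import mcpserver_pb2_grpc  # hypothetical generated module

transfer_uuid = 'ad33890c-55c7-42e6-87e9-cd4366a039b1'  # the transfer being approved
choice_uuid = '...'  # resolved by the view from the available choices

channel = grpc.insecure_channel('localhost:50051')
stub = mcpserver_pb2_grpc.MCPServerStub(channel)
response = stub.ApproveTransfer(mcpserver_pb2.ApproveTransferRequest(
    transfer_uuid=transfer_uuid, choice_uuid=choice_uuid))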

Dashboard: components/ingest/views_as.py

The MicroServiceChoiceReplacementDic model is used to extract the configuration parameters to set up the AS client.

Solution: use DashboardSetting.objects.{get,set}_dict

Dashboard: components/ingest/views_atk.py

The MicroServiceChoiceReplacementDic model is used to extract the configuration parameters to set up the ATK client.

Solution: use DashboardSetting.objects.{get,set}_dict

Dashboard: main/forms.py

Dead code, not needed.

Solution: artefactual/archivematica#529.

MCPClient: lib/clientScripts/dip_generation_helper.py

The MicroServiceChoiceReplacementDic model is used to extract the configuration parameters to set up the AS client.

Solution: use DashboardSetting.objects.{get,set}_dict
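
For reference, this gist also attaches the following sample output, the jobs listing of a single transfer, which shows where strings such as currentstep, microservicegroup and type surface at runtime.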

{
  "mcp": true,
  "objects": [
    {
      "directory": "data",
      "id": "ad33890c-55c7-42e6-87e9-cd4366a039b1",
      "jobs": [
        {
          "currentstep": "Completed successfully",
          "microservicegroup": "Failed transfer",
          "subjobof": "",
          "timestamp": "1477053122.9002850000",
          "type": "Move to the failed directory",
          "uuid": "070238dd-5e0f-43e9-8b62-263f76db1db3"
        },
        {
          "currentstep": "Completed successfully",
          "microservicegroup": "Failed transfer",
          "subjobof": "",
          "timestamp": "1477053119.1954610000",
          "type": "Email fail report",
          "uuid": "fc8ad4f3-299c-4fa5-bdac-85b5677c1948"
        },
        {
          "currentstep": "Failed",
          "microservicegroup": "Assign file UUIDs and checksums",
          "subjobof": "",
          "timestamp": "1477052091.8755110000",
          "type": "Assign checksums and file sizes to objects",
          "uuid": "8faf6425-42e4-4ce2-a853-dbc965b69c0b"
        },
        {
          "currentstep": "Completed successfully",
          "microservicegroup": "Assign file UUIDs and checksums",
          "subjobof": "",
          "timestamp": "1477049895.5044030000",
          "type": "Assign file UUIDs to objects",
          "uuid": "15f2ad74-6ffe-4d3a-b909-fb429811f346"
        },
        {
          "currentstep": "Completed successfully",
          "microservicegroup": "Assign file UUIDs and checksums",
          "subjobof": "",
          "timestamp": "1477049894.3064290000",
          "type": "Set file permissions",
          "uuid": "506889ec-6285-4862-a449-b9ec819706c1"
        },
        {
          "currentstep": "Completed successfully",
          "microservicegroup": "Include default Transfer processingMCP.xml",
          "subjobof": "",
          "timestamp": "1477049893.8868000000",
          "type": "Include default Transfer processingMCP.xml",
          "uuid": "10203b96-e266-44f2-a068-9b2ce9d5b207"
        },
        {
          "currentstep": "Completed successfully",
          "microservicegroup": "Rename with transfer UUID",
          "subjobof": "",
          "timestamp": "1477049892.7750450000",
          "type": "Rename with transfer UUID",
          "uuid": "a252de5b-2fc1-4d67-9b3d-6ccd7a0b60be"
        },
        {
          "currentstep": "Completed successfully",
          "microservicegroup": "Verify transfer compliance",
          "subjobof": "",
          "timestamp": "1477049892.1042560000",
          "type": "Verify mets_structmap.xml compliance",
          "uuid": "ac61c061-00f4-4fb7-9d20-4a910fa6f3d1"
        },
        {
          "currentstep": "Completed successfully",
          "microservicegroup": "Verify transfer compliance",
          "subjobof": "",
          "timestamp": "1477049891.8154010000",
          "type": "Verify transfer compliance",
          "uuid": "688a4d86-2747-4713-8ead-d0d5a10b44dd"
        },
        {
          "currentstep": "Completed successfully",
          "microservicegroup": "Verify transfer compliance",
          "subjobof": "",
          "timestamp": "1477049890.7011870000",
          "type": "Attempt restructure for compliance",
          "uuid": "7c11923a-bbd4-4385-893c-751762dbdc6d"
        },
        {
          "currentstep": "Completed successfully",
          "microservicegroup": "Verify transfer compliance",
          "subjobof": "",
          "timestamp": "1477049094.9984680000",
          "type": "Remove unneeded files",
          "uuid": "4d897307-7b6e-4493-b433-a5fd063aea98"
        },
        {
          "currentstep": "Completed successfully",
          "microservicegroup": "Verify transfer compliance",
          "subjobof": "",
          "timestamp": "1477049094.8226930000",
          "type": "Remove hidden files and directories",
          "uuid": "c475aaf1-611b-4479-af6c-ab2c51b6e251"
        },
        {
          "currentstep": "Completed successfully",
          "microservicegroup": "Verify transfer compliance",
          "subjobof": "",
          "timestamp": "1477049093.7722900000",
          "type": "Set transfer type: Standard",
          "uuid": "fe1e6432-e318-4749-9963-edc00b6d7be3"
        },
        {
          "currentstep": "Completed successfully",
          "microservicegroup": "Verify transfer compliance",
          "subjobof": "",
          "timestamp": "1477049092.3934260000",
          "type": "Move to processing directory",
          "uuid": "4773549b-e266-4019-a5aa-346657c57800"
        },
        {
          "currentstep": "Completed successfully",
          "microservicegroup": "Verify transfer compliance",
          "subjobof": "",
          "timestamp": "1477049091.4555820000",
          "type": "Set file permissions",
          "uuid": "498d2437-e286-4eee-ad55-5c3062cd03db"
        },
        {
          "currentstep": "Completed successfully",
          "microservicegroup": "Approve transfer",
          "subjobof": "",
          "timestamp": "1477035177.4652960000",
          "type": "Approve standard transfer",
          "uuid": "3e9769b4-dabd-4f82-902e-82ebe2e1d9b7"
        }
      ],
      "timestamp": 1477053122,
      "uuid": "ad33890c-55c7-42e6-87e9-cd4366a039b1"
    }
  ]
}
"""
This example is being drafted after the use cases described in: Workflow
models used in MCPServer (above). The jaw module aims to replace the current
code that makes use of the ORM to access the workflow data.
~~ WORK IN PROGRESS ~
Class index:
class TaskBase
├── class StandardTaskBase
│ ├── class OneInstanceTask
│ ├── class ForEachFileTask
│ ├── class GetMicroserviceGeneratedListInStdOutTask
│ └── class GetUserChoiceFromMicroserviceGeneratedList
├── class ChoiceTask
├── class ReplacementDictFromChoiceTask
├── class SetUnitVariableTask
├── class PullUnitVariableTask
├── class AssignMagicLinkTask
└── class LoadMagicLinkTask
class WatchedDirectory
"""
import logging

from jaw import v1 as jawreader

# Read workflow data from a file.
wow = jawreader.fromfile('my_custom_workflow.json')

# English is the default, but you can change it at any time.
wow.set_preferred_language('en')

# Our current workflow data has this notion of watched directories as a
# mechanism to wire different processing chains, but also as a way to list the
# different locations from where you can start a transfer depending on the
# type of package you are submitting. This should be cleaned up in the future.
for item in wow.watched_directories():
    logging.debug("Watched directory '%s' starts processing in task '%s' (%s).",
                  item.path, item.task.name, item.task.uuid)
    # Watched directory 'activeTransfers/standardTransfer' starts processing
    # in task 'Approve standard transfer' (0c94e6b5-4714-4bec-82c8-e187e0c04d77)

task_uuid = '0c94e6b5-4714-4bec-82c8-e187e0c04d77'
task = wow.task_by_uuid(task_uuid)  # This could raise a jawreader.TaskNotFound

# task.uuid => "0c94e6b5-4714-4bec-82c8-e187e0c04d77"
# task.description (i18n) => "Approve standard transfer"
# task.group (i18n) => "Approve transfer"
# task.fallback_task => It could be None or some lazy-loaded Task you can follow
# task.__class__.__name__ =>
# isinstance(task, TaskBase) => True
# isinstance(task, StandardTaskBase) => False
# isinstance(task, ChoiceTask) => True
if isinstance(task, jawreader.ChoiceTask):
    for choice in task.choices:
        logging.debug('Choice %s => task %s', choice.name, choice.task_uuid)