Skip to content

Instantly share code, notes, and snippets.

@sevein
Last active January 29, 2019 17:27
Show Gist options
  • Save sevein/63e751acd37bde5590d680344ad8b3d5 to your computer and use it in GitHub Desktop.
Save sevein/63e751acd37bde5590d680344ad8b3d5 to your computer and use it in GitHub Desktop.

Introduction

Issues found in the JSON-encoded workflow during CR: artefactual/archivematica#1251.

Also consider archivematica/Issues#45.

Smaller issues (less than 30m each)

Version the document

E.g., the document could include a new attribute indicating the version of the schema.

It could use a semver defintion, e.g.:

{
  "type": "string",
  "pattern": "^(0|[1-9][0-9]*)\\.(0|[1-9][0-9]*)\\.(0|[1-9][0-9]*)(-([0-9A-Za-z-]+\\.)*[0-9A-Za-z-]+)?(\\+([0-9A-Za-z-]+\\.)*[0-9A-Za-z-]+)?$"
}

JSON-Schema document uses draft-6 instead of draft-7

The jsonschema library does not support draft-7 yet but it seems to work out of the box, e.g. I've tried using the new @comment attribute and validation still works.

Add dev docs to schema

Use @comment fields when possible. The latter is only available in draft-7.

See artefactual/archivematica#1251 (comment).

Significant changes

Stop using null values

Estimate: 8h

Instead, make fields non required. This is a big change but feasible. It requires:

  • Changes in json_links so fields with null values are excluded
  • Changes in JSON-schema document so these fields are not listed as required
  • Changes in MCPServer to avoid AttributeError

Null values take up a lot of space in the workflow.json when the schema can dictate what is there, and what can be implied to be null. It would be good to consider making a change that can reduce null values in the json itself in future.

Create typed task manager configs

Estimate: 4h

Chain link config attributes are not typed properly. It should be typed after the manager, i.e. @manager. This is a big change but feasible.

  • Changes in json_links to generate the suggested structure
  • Changes in JSON-Schema document to match the new structure
  • Changes in MCPServer to access the new structure

Normalize translations

Estimate: ?

It's nonoptimal in that the same messages may appear more than once when the same labels are used in more than one entity, e.g. "Approve transfer" is probably used in more than one chain link. Also, it happens in microservice groups where the same message dictionary is used across many chain links.

It's unclear to me how this could be done at this time, it will require further analysis. There's been other attempts to encode translations in JSON (e.g. Chrome's format: https://developer.chrome.com/extensions/i18n-messages) but it's spread across multiple files.

This will require changes in json_links, JSON-Schema document and MCPServer.

Watched directories don't describe whether they are used to start transfers

Estimate: 6h

We need determine how this is going to be expressed in the workflow data. In our initial convo, we discussed the option of adding new attributes to watched_directory so we can tell which of them are used to approve transfers. Solving this problem would remove coupling between the code and the workflow data.

It affects existing functionality that will need to be refactor to take advantage of the new knowledge added to the workflow data, e.g. creation of transfers via package.py or api/views.py contains hard-coded references to workflow.

This will require changes in json_links, JSON-Schema document, MCPServer and Dashboard.

Convert "chains" and "links" into lists

Estimate: 30m-4h (depending on the approach)

It was suggested that we could convert "chains" and "links" (currently in the form of JSON keyed objects) into lists with the purpose of being able to validate the IDs are UUIDs. This will require changes in json_links, JSON-Schema document and MCPServer.

An alternative could be to use patternProperties (JSON-Schema) to validate keys

linkTaskManagerSetUnitVariable should use link_id not chain_id

Estimate: 1h

This is a small issue in the field name, it reads chain_id but it's actually referring to links so it should read link_id instead. Same problem in linkTaskManagerUnitVariableLinkPull?

Find more intuitive name for exit_code.link_id

Estimate: 1h

See https://github.com/artefactual/archivematica/pull/1251/files/e68c925920027ca066a9e200f3b38e5a9a8d06b8#diff-aa6e573d3fff1a051b3b09e2094aba59.

Make English ("en") required

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment