Nicolas Dupont Ndpnt

Ndpnt / README.md
Last active November 14, 2022 14:07
[POC] Open Terms Archive versions history cleaning script

⚠️ This script is an experimental proof of concept for versions history cleaning

Over the life of an instance, unsatisfactory versions of documents might be extracted from snapshots. For example, a version might record changes unrelated to the terms, an empty document, or a change of language. Such unsatisfactory versions decrease the value of the dataset: for example, it becomes impossible to measure the actual number of changes.

Reviewing and cleaning the dataset entails correcting the history of declarations, identifying some snapshots to skip, and extracting new versions from the snapshots based on this information. In the end, the whole versions history will be rewritten and overwritten. The declarations will be completed. All the original snapshots are left unchanged and the previous state of the versions is still available, allowing auditability.

This script recreates a history of versions from existing snapshots and declarations, based on the current configuration.
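
The idea of recreating versions from snapshots can be sketched as follows. This is a minimal illustration in Python, not the actual script: the `Snapshot` and `Version` types, the `rebuild_versions` function, the `extract` callback and the `skipped_ids` parameter are all hypothetical names introduced here. It also assumes that a version is recorded only when the extracted content is non-empty and differs from the previous version.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical shape of a stored snapshot (assumption, not the real schema).
    id: str
    fetched_at: str  # ISO 8601 date, so string order matches chronological order
    content: str


@dataclass(frozen=True)
class Version:
    # Hypothetical shape of a version extracted from a snapshot.
    snapshot_id: str
    fetched_at: str
    content: str


def rebuild_versions(
    snapshots: Iterable[Snapshot],
    extract: Callable[[Snapshot], str],
    skipped_ids: Set[str] = frozenset(),
) -> List[Version]:
    """Recreate a versions history from snapshots, oldest first."""
    versions: List[Version] = []
    for snapshot in sorted(snapshots, key=lambda s: s.fetched_at):
        if snapshot.id in skipped_ids:
            continue  # snapshot marked as "to skip" during review
        content = extract(snapshot)  # stands in for applying the declaration
        if not content.strip():
            continue  # assumption: empty documents are not recorded
        if versions and versions[-1].content == content:
            continue  # assumption: no new version when nothing changed
        versions.append(Version(snapshot.id, snapshot.fetched_at, content))
    return versions


if __name__ == "__main__":
    history = [
        Snapshot("a", "2022-01-01", "Terms v1"),
        Snapshot("b", "2022-02-01", "Terms v1"),
        Snapshot("c", "2022-03-01", "Unrelated page"),
        Snapshot("d", "2022-04-01", "Terms v2"),
    ]
    for version in rebuild_versions(history, lambda s: s.content, {"c"}):
        print(version.fetched_at, version.content)
```

The function only reads the snapshots and returns a new list, which mirrors the point above that the original snapshots are left unchanged while the versions history is regenerated.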

It allows you to review the generated versions

Test

Keybase proof

I hereby claim:

  • I am ndpnt on github.
  • I am ndpnt (https://keybase.io/ndpnt) on keybase.
  • I have a public key ASAZynNJpqF9tHYBYKgVzzcfEwukAID5THtqf-eiUl4gkQo

To claim this, I am signing this object: