
@jamieparkinson
Created October 7, 2020 13:55
Ingest/merger versioning

Desired behaviour for ingest versioning

Works

Given a work already in the index:

  • The same work with a higher transformer version always overrides it
  • The same work with the same transformer version and more sources merged in overrides it
  • The same work with the same transformer version which has been unlinked from some or all of its sources overrides it
  • The same work with the same transformer version which has been redirected overrides it
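The four works rules can be read as a single "should the incoming work replace the indexed one?" check. The following is a minimal sketch. It assumes each indexed work can be summarised by three things: its transformer version, the set of source records merged into it, and a redirect flag. `WorkState`, `work_should_override` and the identifiers are hypothetical names and are not part of the pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkState:
    """Hypothetical summary of an indexed work; field names are illustrative only."""
    transformer_version: int
    merged_sources: frozenset = frozenset()  # ids of the source records merged into this work
    redirected: bool = False                 # True once the work has become a redirect


def work_should_override(existing: WorkState, incoming: WorkState) -> bool:
    """Return True if `incoming` should replace `existing` in the index."""
    # Rule 1: a different transformer version decides on its own (higher wins).
    if incoming.transformer_version != existing.transformer_version:
        return incoming.transformer_version > existing.transformer_version
    # Rule 2: same transformer version, more sources merged in.
    if len(incoming.merged_sources) > len(existing.merged_sources):
        return True
    # Rule 3: same transformer version, unlinked from some or all sources (strict subset).
    if incoming.merged_sources < existing.merged_sources:
        return True
    # Rule 4: same transformer version, newly redirected.
    return incoming.redirected and not existing.redirected


single = WorkState(1, frozenset({"sierra/b1"}))
merged = WorkState(1, frozenset({"sierra/b1", "miro/V1"}))
print(work_should_override(single, merged))  # True: more sources merged in
print(work_should_override(merged, single))  # True: unlinked from a source
print(work_should_override(single, single))  # False: nothing changed
```

An identical state returns False, so re-sending the same work is a no-op.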

Images

Given an image already in the index:

  • The same image with a higher transformer version always overrides it
  • The same image with the same transformer version but a higher transformer version of any source always overrides it
  • The same image with the same transformer version and the same transformer version of all sources, but where any of the sources have more sources merged in, overrides it
  • The same image with the same transformer version and a new redirected source overrides it

All but the first point are really just applying the works rules to the works in image.source, but:

  1. This can't be done "literally" as the image source works are just work data and do not contain the work state directly
  2. image.source can contain two works (canonical and redirected), but perhaps we can apply the rules to the canonical work only, as this will (?) contain the information about redirects?
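Point 1 notes that image.source holds only work data. The following sketch therefore assumes a hypothetical per-source summary (transformer version and merged-source count) is available on the image, and then applies the image rules literally. It compares every work in image.source, both canonical and redirected, rather than only the canonical one. All names are illustrative. The precedence among rules 2 to 4 is also an interpretation: rule 1 is checked first, then the new-redirect rule, then the per-source comparisons.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SourceState:
    """Hypothetical summary of one work in image.source."""
    transformer_version: int
    merged_source_count: int = 1  # how many source records have been merged into this work


@dataclass
class ImageState:
    """Hypothetical summary of an indexed image; names are illustrative only."""
    transformer_version: int
    sources: Dict[str, SourceState]  # keyed by source work id (canonical and, optionally, redirected)
    redirected_source_id: Optional[str] = None


def image_should_override(existing: ImageState, incoming: ImageState) -> bool:
    """Return True if `incoming` should replace `existing` in the index."""
    # Rule 1: the image's own transformer version decides first, in either direction.
    if incoming.transformer_version != existing.transformer_version:
        return incoming.transformer_version > existing.transformer_version
    # Rule 4: a new redirected source overrides.
    if (incoming.redirected_source_id is not None
            and incoming.redirected_source_id != existing.redirected_source_id):
        return True
    # Sources are only compared like-for-like when they appear in both images.
    common = existing.sources.keys() & incoming.sources.keys()
    old, new = existing.sources, incoming.sources
    # Rule 2: a higher transformer version of any source overrides.
    if any(new[s].transformer_version > old[s].transformer_version for s in common):
        return True
    # Rule 3: all source versions equal, but some source has more sources merged in.
    same_versions = all(new[s].transformer_version == old[s].transformer_version for s in common)
    more_merged = any(new[s].merged_source_count > old[s].merged_source_count for s in common)
    return same_versions and more_merged
```

In this sketch, a source that appears in only one of the two images (for example, a canonical work that changed identity) is ignored by rules 2 and 3. That gap is why the new-redirect rule has to carry the Miro-to-Sierra case on its own.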
@alicefuzier

alicefuzier commented Oct 7, 2020

"The same work with the same transformer version and more sources merged in overrides it"

Why is that, though? The matcher sends the works that may be merged together along with their versions, so I don't think we gain anything by requiring the number of sources to go up. If anything, it makes the unlinking case harder.

@jamieparkinson (author)

Good point. I'll try to work through the example that originally led me to that and see what the "real" issue is:

  • A Miro work comes in via the transformer. It reaches the matcher and passes straight through, as the Sierra work that refers to it hasn't been seen yet.
  • The merger creates a work and an image from that Miro source work. The image has source.canonicalWork referring to the work created at this stage.
  • This image and work are ingested. All is well.
  • The Sierra work comes in and the matcher picks up that it's linked to the Miro work. Both are passed to the merger.
  • The merger merges the Miro work into the Sierra work and creates a redirect for the Miro work. It also creates a new image with source.canonicalWork referring to the merged Sierra work and source.redirectedWork referring to the Miro work that will now be redirected.
  • The redirect and the merged Sierra work are ingested. However, for the new image to be ingested, the change in its source state needs to be reflected in its ingest version.
  • Extending this, if the canonical (Sierra) work is then merged with something else (METS or whatever), this needs to be reflected as well; this is where I got the number of sources thing from.
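The walkthrough can be replayed as a minimal sketch, under two assumptions. The first is a hypothetical ingestor that only replaces a document when the new version is strictly greater. The second is an ingest version built as a tuple. The image id `image-1`, the field names such as `canonical_source_count`, and the source identifiers are made up. When the version uses the image transformer version alone, the second and third images tie with the first and are dropped. Adding the canonical work's source count lets each step through.

```python
from typing import Callable, Dict, Tuple

Version = Tuple[int, ...]

# The three images produced in the walkthrough, all for the same image id.
# Field names and identifiers are hypothetical.
WALKTHROUGH = [
    {"transformer_version": 1, "canonical": "miro/V1", "canonical_source_count": 1},
    {"transformer_version": 1, "canonical": "sierra/b1", "redirected": "miro/V1",
     "canonical_source_count": 2},
    {"transformer_version": 1, "canonical": "sierra/b1", "redirected": "miro/V1",
     "canonical_source_count": 3},  # after the Sierra work is also merged with METS
]


def ingest(index: Dict[str, Tuple[Version, dict]], image_id: str, version: Version, doc: dict) -> bool:
    """Hypothetical ingestor: only a strictly newer version replaces what is indexed."""
    current = index.get(image_id)
    if current is not None and version <= current[0]:
        return False  # same or older version: dropped
    index[image_id] = (version, doc)
    return True


def replay(version_of: Callable[[dict], Version]) -> dict:
    """Ingest the walkthrough images in order and return the document left in the index."""
    index: Dict[str, Tuple[Version, dict]] = {}
    for image in WALKTHROUGH:
        ingest(index, "image-1", version_of(image), image)
    return index["image-1"][1]


# Version from the image's transformer version alone: later images tie and are dropped.
naive = replay(lambda image: (image["transformer_version"],))
# Version that also reflects the canonical source's state: each later image wins.
source_aware = replay(lambda image: (image["transformer_version"], image["canonical_source_count"]))

print(naive["canonical"])         # miro/V1
print(source_aware["canonical"])  # sierra/b1
```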

@alicefuzier

Ah, I see. I think that could also be solved with state, though. It would complicate the infrastructure, but we could have a store (just a plain key/value store) where we keep the latest version emitted for every source identifier.
The first time the merger emits a work, its version is 0. Every time after that, the merger increments the version it finds in the store, regardless of whether the operation was linking or unlinking. Would this solve the image version issue too? I believe so.
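The proposed store can be sketched minimally as follows, with an in-memory dict standing in for whatever key/value store would actually be used. `EmittedVersionStore`, `next_version` and the identifiers are hypothetical names.

```python
from typing import Dict


class EmittedVersionStore:
    """Hypothetical key/value store: source identifier -> latest version the merger emitted.

    A plain dict stands in for a persistent key/value store.
    """

    def __init__(self) -> None:
        self._latest: Dict[str, int] = {}

    def next_version(self, source_identifier: str) -> int:
        # First emission for an identifier gets version 0; each later emission
        # increments the stored value, whether the merge linked or unlinked sources.
        version = self._latest.get(source_identifier, -1) + 1
        self._latest[source_identifier] = version
        return version


store = EmittedVersionStore()
first = store.next_version("miro/V1")   # Miro work emitted on its own
second = store.next_version("miro/V1")  # Miro work emitted again, now as a redirect
other = store.next_version("sierra/b1")
print(first, second, other)  # 0 1 0
```

The counter moves on every emission, so the ingestor only has to compare one integer per work. A persistent version of this would presumably need an atomic increment or a conditional write if more than one merger instance could emit the same identifier at the same time.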

@jamieparkinson (author)

I think you're right, yes. Do you think that's possible to do within the matcher store (since it already exists)?

@alicefuzier

Um, I don't think so. The matcher doesn't know whether something gets merged, and it doesn't keep a history either: it keeps the current state of the graph but no record of how it changed, so it doesn't know how many versions of a work have been emitted.
