@mwiencek
Created March 28, 2024 16:21
artwork-indexer Deployment Plan

Switching services

  • Create the artwork_indexer schema and install the new triggers. This will cause events to be inserted into the artwork_indexer.event_queue table that duplicate existing CAA-indexer events in RabbitMQ.

  • Drop the CAA-indexer functions/triggers. This will cause new events to cease being pushed into RabbitMQ.

  • Allow the caa-indexer container to process all remaining events in RabbitMQ.

  • Once all events are processed, we can make note of the last event that was processed (available in the docker logs) and delete any already-handled events prior to and including it from the artwork_indexer.event_queue table.

  • A new artwork-indexer container can be started to continue processing events from the artwork_indexer.event_queue table.

Cleanup

Once all CAA-indexer queues are empty (including those for retries), we can drop them from RabbitMQ and remove the caa-indexer container.
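A rough way to check that the queues are empty is to parse the output of `rabbitmqctl list_queues name messages`. The sketch below assumes tab-separated `name<TAB>messages` rows and skips anything else (headers, banners). The queue names in the sample are hypothetical, not the real CAA-indexer queue names.

```python
def nonempty_queues(listing: str) -> dict[str, int]:
    """Return {queue_name: message_count} for queues that still hold messages."""
    pending = {}
    for line in listing.splitlines():
        parts = line.split("\t")
        # Skip headers, banners and blank lines: keep only "name<TAB>count" rows.
        if len(parts) != 2 or not parts[1].strip().isdigit():
            continue
        name, count = parts[0].strip(), int(parts[1])
        if count > 0:
            pending[name] = count
    return pending


# Hypothetical queue names, for illustration only.
sample = "name\tmessages\ncaa.index\t0\ncaa.retry\t3\ncaa.failed\t0\n"

if __name__ == "__main__":
    # An empty dict would mean every listed queue is drained.
    print(nonempty_queues(sample))
```

The queues are only safe to drop once this returns an empty dict for every CAA-indexer queue, including the retry ones.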

@yvanzo

yvanzo commented Mar 28, 2024

Create the artwork_indexer schema

If that is just for the CAA, shouldn’t it be cover_art_indexer or caa_indexer instead?
Otherwise, I thought that we were going to use images in general circumstances?

Once all events are processed, we can make note of the last event that was processed (available in the docker logs) and delete any already-handled events prior to and including it from the artwork_indexer.event_queue table.

How does that go practically? Are all the event numbers in logs? In ascending order? From where to delete [prior] events? By include do you mean an INSERT statement?

Also, is there any issue with processing an event twice? When are retry events processed?

Cleanup
Once all CAA-indexer queues are empty (including those for retries), we can drop them from RabbitMQ and remove the caa-indexer container.

Maybe wait for a few days to see if the new indexer isn’t experiencing major production issues?

@mwiencek
Author

mwiencek commented Mar 28, 2024

If that is just for the CAA, shouldn’t it be cover_art_indexer or caa_indexer instead?

It's actually for both the CAA and EAA. Events for both projects are stored in the same table: artwork_indexer.event_queue.

Otherwise, I thought that we were going to use images in general circumstances?

Well, the schema is just named after the project, artwork-indexer. It's all created by https://github.com/metabrainz/artwork-indexer/blob/master/sql/create_schema.sql

How does that go practically? Are all the event numbers in logs? In ascending order? From where to delete [prior] events? By include do you mean an INSERT statement?

The CAA-indexer logs mostly show only index events and the contents of the index.json files they produce:

[debug] Produced {"images":[{"approved":false,"back":false,"comment":"","edit":110222291,"front":true,"id":38411743322,"image":"http://coverartarchive.org/release/832935ec-2df3-4823-a794-b42144e9e5c2/38411743322.jpg","thumbnails":{"1200":"http://coverartarchive.org/release/832935ec-2df3-4823-a794-b42144e9e5c2/38411743322-1200.jpg","250":"http://coverartarchive.org/release/832935ec-2df3-4823-a794-b42144e9e5c2/38411743322-250.jpg","500":"http://coverartarchive.org/release/832935ec-2df3-4823-a794-b42144e9e5c2/38411743322-500.jpg","large":"http://coverartarchive.org/release/832935ec-2df3-4823-a794-b42144e9e5c2/38411743322-500.jpg","small":"http://coverartarchive.org/release/832935ec-2df3-4823-a794-b42144e9e5c2/38411743322-250.jpg"},"types":["Front"]}],"release":"https://musicbrainz.org/release/832935ec-2df3-4823-a794-b42144e9e5c2"}
[info] Upload of index.json succeeded
[info] Upload of metadata.xml succeeded

However, this contains enough information (the release MBID) to identify the same index event in the artwork_indexer.event_queue table.
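As a rough sketch of how the MBID could be pulled out of the logs: the helper below scans log lines for the `[debug] Produced {...}` format shown above and returns the release MBID from the last one. The function name is made up for illustration, and it assumes each "Produced" entry sits on a single line ending with the JSON document.

```python
import json
import re

# Matches the "[debug] Produced {...}" lines shown above; any prefix that
# docker adds before "[debug]" is ignored.
PRODUCED = re.compile(r"\[debug\] Produced (\{.*\})\s*$")
MBID = re.compile(
    r"/release/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$"
)


def last_indexed_release(log_lines):
    """Return the release MBID of the last "Produced" line, or None."""
    last = None
    for line in log_lines:
        match = PRODUCED.search(line)
        if not match:
            continue
        document = json.loads(match.group(1))
        found = MBID.search(document.get("release", ""))
        if found:
            last = found.group(1)
    return last


# Shortened version of the log excerpt above (the "images" list is emptied).
sample_log = [
    '[debug] Produced {"images":[],"release":"https://musicbrainz.org/release/832935ec-2df3-4823-a794-b42144e9e5c2"}',
    "[info] Upload of index.json succeeded",
    "[info] Upload of metadata.xml succeeded",
]

if __name__ == "__main__":
    print(last_indexed_release(sample_log))
```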

We'd delete duplicate events from that same table. I meant that all events would be deleted "prior to and including [the last logged CAA-indexer event]," but I constructed the sentence poorly. :) So no insertions needed.
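To make the deletion concrete, here is a minimal sketch using an in-memory SQLite table as a stand-in for artwork_indexer.event_queue. The column names (`id`, `action`, `release_gid`) are assumptions for illustration only; the real columns are defined in create_schema.sql and should be checked first. It also assumes that ids increase in insertion order.

```python
import sqlite3


def demo_connection():
    """Build a stand-in event_queue table (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event_queue ("
        "id INTEGER PRIMARY KEY, action TEXT, release_gid TEXT)"
    )
    conn.executemany(
        "INSERT INTO event_queue (action, release_gid) VALUES (?, ?)",
        [
            ("index", "11111111-1111-1111-1111-111111111111"),
            ("index", "832935ec-2df3-4823-a794-b42144e9e5c2"),
            ("index", "22222222-2222-2222-2222-222222222222"),
        ],
    )
    conn.commit()
    return conn


def candidate_events(conn, mbid):
    """List ids of index events for the release last seen in the old logs."""
    rows = conn.execute(
        "SELECT id FROM event_queue "
        "WHERE action = 'index' AND release_gid = ? ORDER BY id",
        (mbid,),
    )
    return [row[0] for row in rows]


def delete_through(conn, cutoff_id):
    """Delete every event up to and including cutoff_id; return the count."""
    cursor = conn.execute("DELETE FROM event_queue WHERE id <= ?", (cutoff_id,))
    conn.commit()
    return cursor.rowcount


if __name__ == "__main__":
    conn = demo_connection()
    print(candidate_events(conn, "832935ec-2df3-4823-a794-b42144e9e5c2"))
```

If the same release shows up more than once in the queue, the cutoff has to be picked by hand from the candidates, since a later event for that release may not have been handled by the CAA-indexer.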

Also, is there any issue with processing an event twice? When are retry events processed?

There's no issue with processing index events twice. Other types of events may fail the second time, but that causes no harm beyond having to clean up the failed events.

For the CAA-indexer, events in the retry queue are processed four hours after insertion IIRC.

Maybe wait for a few days to see if the new indexer isn’t experiencing major production issues?

Yes, no hurry to clean up everything.
