@yuvalif
Last active September 13, 2023 14:14

notification retries

  • limit number of retries (ephemeral)
  • notification TTL (persistent)
  • spacing the retries (ephemeral)
  • global config via options
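
A minimal sketch of how the three knobs above could combine into one decision per queued notification. All names here (`RetryPolicy`, `PendingNotification`, `next_action`) are hypothetical and are not RGW code. It assumes "ephemeral" means the retry counter and last-attempt time live only in memory, while the creation time used for the TTL is stored with the entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetryPolicy:
    # Hypothetical names; None means "no limit".
    max_retries: Optional[int] = None
    ttl_seconds: Optional[float] = None
    sleep_seconds: float = 0.0


@dataclass
class PendingNotification:
    created_at: float                     # assumed to be persisted with the entry
    failed_attempts: int = 0              # assumed in-memory only (ephemeral)
    last_attempt: Optional[float] = None  # assumed in-memory only (ephemeral)


def next_action(entry: PendingNotification, policy: RetryPolicy, now: float) -> str:
    """Return 'drop', 'wait' or 'send' for a queued notification."""
    # TTL: drop entries that have been queued for too long.
    if policy.ttl_seconds is not None and now - entry.created_at > policy.ttl_seconds:
        return "drop"
    # Retry limit: the first attempt is not a retry, so N retries allow N+1 attempts.
    if policy.max_retries is not None and entry.failed_attempts > policy.max_retries:
        return "drop"
    # Spacing: do not retry before the sleep duration has passed.
    if entry.last_attempt is not None and now - entry.last_attempt < policy.sleep_seconds:
        return "wait"
    return "send"
```

With this shape, the global options would only supply default values for `RetryPolicy`, and a per topic config could override them later.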

phase 1

  • per topic config via REST
  • notification TTL migration
  • documentation
  • integration tests
  • retry migration

use cls FIFO

  • replace cls 2pc queue with cls FIFO
  • support maximum size for cls FIFO
  • implement simple reserve/commit mechanism (that allows for one FIFO node overshoot)
  • migration
  • support topic stats (approximate) for topic with cls FIFO
  • unit tests
  • integration tests should be agnostic of the underlying queue
  • performance tests

topic stats

  • via radosgw-admin
  • number of pending reservations
  • number of entries
  • size in bytes
  • integration tests
  • fix to the phase 0 PR

phase 1

  • via labeled perf counters
  • end2end demo through prometheus and grafana

yuvalif commented Aug 9, 2023

inputs from the RGW refactoring call (9-aug-23):

notification retries

dead-letter queue

  • add a dead-letter queue that would hold only failed notifications
  • to avoid performance degradation of the system, persistent notifications must use a fast media pool
  • a fast pool often has limited size, so we might want to use a slower media pool for the dead-letter queue and allow for a bigger queue
  • would probably be a followup project
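
A rough sketch of the dead-letter idea, under the assumption that a notification that fails delivery is moved from the small (fast) main queue to a larger (slower) dead-letter queue. `BoundedQueue` and `process_once` are hypothetical names, and the in-memory deques only stand in for the two pools.

```python
from collections import deque


class BoundedQueue:
    """In-memory stand-in for a queue living in a pool of limited size."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = deque()

    def push(self, item):
        if len(self.entries) >= self.max_entries:
            return False  # queue is full
        self.entries.append(item)
        return True


def process_once(main, dead_letter, send):
    """Try to deliver the head of the main queue once.

    On failure the entry moves to the dead-letter queue, so the main
    queue holds only notifications that were not attempted yet.
    """
    if not main.entries:
        return "empty"
    item = main.entries[0]
    if send(item):
        main.entries.popleft()
        return "delivered"
    if dead_letter.push(item):
        main.entries.popleft()
        return "dead-lettered"
    return "blocked"  # both queues are stuck on this entry
```

Retrying from the dead-letter queue is left out here; it could follow the same retry rules as the main queue.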

use cls FIFO

migration

  • we can add a per bucket operation that would disconnect the existing topic from the bucket, create a new topic and associate with the bucket
  • we would need to find a way to use the same topic name at amqp/kafka level, since the subscribers of the topic would not know when to subscribe to the new topic (might be a good idea to allow this extra freedom in topic creation regardless)
  • another option would be "seamless" migration, where the code would handle the two types of queues for the same topic. this would be more complex to implement and harder to debug, and would also need an additional conf parameter to indicate when this mode is not needed anymore
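
To make the "seamless" option more concrete, here is a small sketch of one way the two queue types could coexist for a single topic. It assumes new notifications go only to the new FIFO while the consumer drains the legacy queue first, which keeps delivery order. `MigratingTopicQueue` is a hypothetical name and both queues are plain in-memory deques.

```python
from collections import deque


class MigratingTopicQueue:
    """One topic backed by a legacy queue and a new FIFO during migration."""

    def __init__(self, legacy_entries=()):
        self.legacy = deque(legacy_entries)  # stands in for the cls 2pc queue
        self.fifo = deque()                  # stands in for the cls FIFO

    def push(self, entry):
        # New notifications are written only to the new queue.
        self.fifo.append(entry)

    def pop(self):
        # Older entries are delivered first to keep ordering.
        if self.legacy:
            return self.legacy.popleft()
        if self.fifo:
            return self.fifo.popleft()
        return None

    def migration_done(self):
        # Once the legacy queue is drained, the dual mode is no longer needed.
        return not self.legacy
```

A check like `migration_done()` on every topic is the kind of signal an operator would need before switching the dual mode off with the conf parameter.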

FIFO limit

  • we should add a limit (in bytes), since unlike TTL and retries, going over the limit would push back to the client, and won't "silently" drop
  • limit does not have to be enforced accurately, and could be implemented by limiting the number of nodes in the queue
  • this should preserve the "reserve/commit" semantics we currently have, so we don't have an issue with the atomicity of the RADOS operation and the operation with the queue
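
A sketch of one possible reading of the points above: space is accounted for at reserve time, the limit is checked only against the node count, and a reservation is rejected only when the queue is already over the limit. That lets the queue grow by at most one node past the limit. `BoundedFifo` and its methods are hypothetical, trimming of consumed nodes is not modeled, and this is not the cls FIFO API.

```python
class BoundedFifo:
    """Toy model of a node-based FIFO with an approximate size limit."""

    def __init__(self, node_size, max_nodes):
        self.node_size = node_size
        self.max_nodes = max_nodes
        self.node_count = 0  # nodes holding committed or reserved space
        self.tail_used = 0   # bytes accounted for in the newest node
        self.pending = {}    # reservation id -> reserved size
        self.entries = []    # committed payloads, in commit order
        self._next_id = 0

    def reserve(self, size):
        """Reserve space before the RADOS operation; None means push back."""
        if size > self.node_size:
            raise ValueError("entry does not fit in a single node")
        if self.node_count > self.max_nodes:
            return None  # already over the limit
        if self.node_count == 0 or self.tail_used + size > self.node_size:
            # Start a new node; this may be the single overshoot node.
            self.node_count += 1
            self.tail_used = 0
        self.tail_used += size
        rid = self._next_id
        self._next_id += 1
        self.pending[rid] = size
        return rid

    def commit(self, rid, payload):
        """Store the payload after the RADOS operation succeeded."""
        size = self.pending.pop(rid)
        if len(payload) > size:
            raise ValueError("payload is larger than the reservation")
        self.entries.append(payload)

    def abort(self, rid):
        """Cancel a reservation after the RADOS operation failed."""
        # The space stays accounted for until the node would be trimmed.
        self.pending.pop(rid)
```

Since a rejected `reserve()` happens before the RADOS operation, the client sees the push back before anything was written.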

topic stats

  • the size of the FIFO could be estimated as (number of nodes × size of a node) + the actual size of the last node
  • the number of entries in the FIFO will have to be maintained in each node, and the command to fetch it will have to traverse all the nodes in the FIFO
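
The two estimates above could look roughly like this. The sketch assumes "number of nodes" refers to the nodes before the last one, which are treated as full. `FifoNode` and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FifoNode:
    used_bytes: int   # actual bytes stored in the node
    num_entries: int  # per-node entry counter


def estimate_size_bytes(nodes, node_size):
    """Approximate size: all nodes but the last count as full."""
    if not nodes:
        return 0
    return (len(nodes) - 1) * node_size + nodes[-1].used_bytes


def count_entries(nodes):
    """Exact entry count; needs to visit every node."""
    return sum(node.num_entries for node in nodes)
```

The size estimate only needs the node count and the last node, while the entry count costs one read per node, which matches the traversal noted above.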

topic stats

  • add the user information (the user that created the topic) into the topic data
  • when listing all topics, allow filtering by user (both in radosgw-admin and REST)
  • add topic information to the bucket stats. should be possible, similarly to how we do cascade deletion of all notifications of a bucket when the bucket is deleted
  • add bucket information to the topic stats (may not be possible without maintaining a new list inside an object)

yuvalif commented Aug 10, 2023

yuvalif commented Sep 7, 2023
