Persistent bucket notifications are a very useful and powerful feature. To learn more about them, you can look at this tech talk and use case example.
Persistent notifications are usually better than synchronous notifications, for several reasons:
- the queue they use is, in fact, a RADOS object, which gives the queue the reliability level of RADOS
- sending the notification to the broker does not add to the client's request round-trip time
- they tolerate temporary disconnects from the broker, or broker restarts, without affecting the service
- they have a retry mechanism, based on both time and number of attempts
However, they can pose a performance issue: the notifications for a specific bucket are written to a single RADOS queue, and are therefore handled by a single OSD. The actual objects, on the other hand, are written to RADOS objects that are sharded across multiple OSDs. So, even though the notification entries are relatively small (usually under 1K), they do not enjoy the parallelism that the object writes do. This means that when small objects are written to the bucket, the overhead of the notifications is considerable. In this project, our goal would be to create a sharded bucket notification queue, to allow for better performance when sending persistent bucket notifications.
The first step would be to have a Linux-based development environment; as a minimum, you would need a machine with 4 CPUs, 8GB RAM, and 50GB of disk. Unless you already have a Linux distro you like, I would recommend choosing from:
- Fedora (40/41) - my favorite!
- Ubuntu (22.04 LTS)
- WSL (Windows Subsystem for Linux), though it would probably take much longer...
- RHEL9/CentOS9
- Other Linux distros - try at your own risk :-)
Once you have that up and running, you should clone the Ceph repo from GitHub (https://github.com/ceph/ceph). If you don't know what GitHub and git are, this is the right time to close these gaps :-) And yes, you should have a GitHub account, so you can later share your work on the project.
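For example, to get the sources together with the git submodules that Ceph uses:
git clone https://github.com/ceph/ceph.git
cd ceph
git submodule update --init --recursive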
To install any missing system dependencies, use:
./install-deps.sh
Note that the first build may take a long time, so the following cmake parameters could be used to minimize the build time. With a fresh Ceph clone, use the following:
./do_cmake.sh -DBOOST_J=$(nproc) -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DWITH_MGR_DASHBOARD_FRONTEND=OFF \
-DWITH_DPDK=OFF -DWITH_SPDK=OFF -DWITH_SEASTAR=OFF -DWITH_CEPHFS=OFF -DWITH_RBD=OFF -DWITH_KRBD=OFF -DWITH_CCACHE=OFF
If the build directory already exists, you can rebuild the ninja files by using the following (from the build directory):
cmake -DBOOST_J=$(nproc) -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DWITH_MGR_DASHBOARD_FRONTEND=OFF \
-DWITH_DPDK=OFF -DWITH_SPDK=OFF -DWITH_SEASTAR=OFF -DWITH_CEPHFS=OFF -DWITH_RBD=OFF -DWITH_KRBD=OFF -DWITH_CCACHE=OFF ..
Then invoke the build process (using ninja) from within the build directory (created by do_cmake.sh).
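For example, from the repository root (ninja builds in parallel by default; you can pass -j with a lower value if the machine runs out of memory):
cd build
ninja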
Assuming the build was completed successfully, you can run the unit tests (see: https://github.com/ceph/ceph#running-unit-tests).
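For example, one common way is to use ctest from within the build directory (this assumes the tests were built; the link above also explains how to run individual tests):
cd build
ctest -j$(nproc)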
Now you are ready to run the Ceph processes, as explained here: https://github.com/ceph/ceph#running-a-test-cluster. You would probably also like to check the developer guide (https://docs.ceph.com/docs/master/dev/developer_guide/) and learn more about how to build Ceph and run it locally (https://docs.ceph.com/docs/master/dev/quick_guide/). Ceph's bucket notification documentation:
- https://docs.ceph.com/en/latest/radosgw/notifications/
- notification as part of the bucket operations API: https://docs.ceph.com/en/latest/radosgw/s3/bucketops/#create-notification
- S3 compatibility: https://docs.ceph.com/en/latest/radosgw/s3-notification-compatibility/
Run bucket notification tests for persistent notifications using an HTTP endpoint:
- start the vstart cluster:
$ MON=1 OSD=1 MDS=0 MGR=0 RGW=1 ../src/vstart.sh -n -d
- on a separate terminal start an HTTP endpoint:
$ wget https://gist.githubusercontent.com/mdonkers/63e115cc0c79b4f6b8b3a6b797e485c7/raw/a6a1d090ac8549dac8f2bd607bd64925de997d40/server.py
$ python server.py 10900
- install the AWS CLI tool
- configure the tool with the access and secret keys shown in the output of the vstart.sh command, and set the region to default, as shown in the example below
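For example, using aws configure set (the values below are placeholders; copy the actual keys from the vstart.sh output):
$ aws configure set aws_access_key_id <access-key-from-vstart-output>
$ aws configure set aws_secret_access_key <secret-key-from-vstart-output>
$ aws configure set region default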
- create a persistent topic pointing to the above HTTP endpoint:
$ aws --endpoint-url http://localhost:8000 sns create-topic --name=fishtopic \
--attributes='{"push-endpoint": "http://localhost:10900", "persistent": "true"}'
- create a bucket:
$ aws --endpoint-url http://localhost:8000 s3 mb s3://fish
- create a notification on that bucket, pointing to the above topic:
$ aws --endpoint-url http://localhost:8000 s3api put-bucket-notification-configuration --bucket fish \
--notification-configuration='{"TopicConfigurations": [{"Id": "notif1", "TopicArn": "arn:aws:sns:default::fishtopic", "Events": []}]}'
Leaving the event list empty is equivalent to setting it to ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"].
- create a file, and upload it:
$ head -c 512 </dev/urandom > myfile
$ aws --endpoint-url http://localhost:8000 s3 cp myfile s3://fish
- on the HTTP terminal, see the JSON output of the notifications
Try to address one of these (relatively) small features. Please provide a draft PR with your code (it does not have to be a complete implementation of the feature):
- sharded implementation of persistent topic queue
- stretch goal: performance test proving performance improvement
- shards creation should happen when the topic and queue are created
- shard ownership should be implemented similarly to how queue ownership is implemented
- should a single RGW own all shards of a queue, or should we allow split ownership?
- we should find the right shard when making the reservation
- we should hash an identifier from the notification into a number and take that number modulo the number of shards (see the sketch after this list)
- hashing should distribute uniformly regardless of the values of the identifier
- we should decide which field(s) to use for the hash; at a minimum, we should avoid reordering of notifications of a single object
- the number of shards should be a config option
- what should we do when this number is changed?
- do we want to allow changes to existing queues or only new ones?
- should we act differently when the number increases/decreases?
- should we handle migration of existing queues? Or apply sharding only to new queues?
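To make the shard-selection points above more concrete, here is a minimal standalone C++ sketch (not Ceph code; the function name, the choice of bucket + object name as the identifier, and the shard count are assumptions for illustration) of hashing a notification identifier into a shard index:

#include <cstdint>
#include <functional>
#include <iostream>
#include <string>

// Hypothetical helper: map a notification to one of num_shards queue shards.
// Hashing bucket+object keeps all notifications of a single object on the
// same shard, so they cannot be reordered relative to each other.
std::uint64_t notification_shard(const std::string& bucket,
                                 const std::string& object,
                                 std::uint64_t num_shards) {
  // std::hash is used only for illustration; the real implementation would
  // need a stable, uniformly distributing hash (e.g. whatever RGW already
  // uses elsewhere for sharding), since std::hash gives no such guarantees.
  const std::size_t h = std::hash<std::string>{}(bucket + "/" + object);
  return h % num_shards;
}

int main() {
  const std::uint64_t num_shards = 11;  // would come from a config option
  std::cout << notification_shard("fish", "myfile", num_shards) << "\n";
  return 0;
}

The modulo step also shows why changing the number of shards is tricky: existing notifications would map to different shards, which is exactly the migration question raised above.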
Hello there, I have made an attempt at addressing issue #68788.
This is the link to the draft PR.