Basic Bucket Logging Testing

  • to enable our extension to the API when using python (boto3 or the AWS CLI), the following file has to be placed under: ~/.aws/models/s3/2006-03-01/ (the directory should be created if it does not exist)
  • currently there is no generic solution for other client SDKs
  • start a vstart cluster
  • create a bucket:
aws --endpoint-url http://localhost:8000 s3 mb s3://fish
  • create a log bucket:
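Once the source and log buckets exist, the logging configuration is set through the extended `put_bucket_logging` API. A minimal sketch of the configuration document, assuming the extension keeps the standard AWS `BucketLoggingStatus` shape and adds a Ceph-specific `LoggingType` field (the "Standard"/"Journal" modes referred to later in these notes); exact field names beyond that are assumptions:

```python
import json

# Sketch of a bucket-logging configuration for the extended S3 API.
# "LoggingType" is assumed to be the Ceph extension field ("Standard" or
# "Journal"); the rest follows the standard AWS PutBucketLogging shape.
def make_logging_status(target_bucket, prefix, logging_type="Standard"):
    return {
        "LoggingEnabled": {
            "TargetBucket": target_bucket,  # the log bucket created above
            "TargetPrefix": prefix,
            "LoggingType": logging_type,    # assumption: Ceph-specific field
        }
    }

status = make_logging_status("all-logs", "fish/")
# with boto3 this would be passed as:
# client.put_bucket_logging(Bucket="fish", BucketLoggingStatus=status)
print(json.dumps(status))
```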

how much time did this refactoring work take?

Looking at the timestamps in our conversation, the refactoring work took approximately 45-50 minutes from start to finish.

Timeline Breakdown:

Phase 1: Analysis & Planning (5-10 minutes)

  • Understanding the codebase structure
  • Identifying the RGWOp class hierarchy
  • Finding all canonical_name() overrides

This is a follow-up on: https://gist.github.com/yuvalif/d50bcc3d27121732ffbbafe7b7146112 for an RGW of version "squid" or above, where bucket notifications and topics are synced between zones

  • deploy a zonegroup with 2 zones:
MON=1 OSD=1 MDS=0 MGR=0 ../src/test/rgw/test-rgw-multisite.sh 2
  • export credentials:
export AWS_ACCESS_KEY_ID=1234567890
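With the credentials exported, topics are created on one zone through the SNS-compatible API and should then appear on the other zone. A sketch of building the topic attributes, assuming the documented Ceph topic attributes "push-endpoint" and "persistent"; the Kafka endpoint URL and topic name below are just examples:

```python
# Sketch: attributes for an SNS CreateTopic call against RGW.
# "push-endpoint" and "persistent" are Ceph-specific topic attributes;
# the kafka URL is only an example endpoint.
def make_topic_attributes(endpoint, persistent=True):
    return {
        "push-endpoint": endpoint,
        "persistent": "true" if persistent else "false",
    }

attrs = make_topic_attributes("kafka://localhost:9092")
# with boto3: sns.create_topic(Name="fishtopic", Attributes=attrs)
print(attrs)
```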
-- Lua script to auto-tier S3 object PUT requests
-- based on this: https://ceph.io/en/news/blog/2024/auto-tiering-ceph-object-storage-part-2/
-- exit the script quickly if it is not a PUT request
if Request == nil or Request.RGWOp ~= "put_obj" then
  return
end
local threshold = 1024*1024 -- 1MB
local debug = true
-- route objects at or above the size threshold to a colder storage class
-- (the storage class name is an example; it must exist in the zone's placement)
if Request.ContentLength >= threshold then
  Request.HTTP.StorageClass = "COLD"
  if debug then
    RGWDebugLog("auto-tiering object of size "..tostring(Request.ContentLength))
  end
end
  • start a vstart cluster
  • create a tenanted user:
bin/radosgw-admin user create --display-name "Ka Boom" --tenant boom --uid ka --access_key ka --secret_key boom
  • create a bucket on that tenant
AWS_ACCESS_KEY_ID=ka AWS_SECRET_ACCESS_KEY=boom aws --endpoint-url http://localhost:8000 s3 mb s3://fish
  • create a log bucket with no tenant
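When administering these buckets with radosgw-admin, the tenanted bucket is addressed as `<tenant>/<bucket>` (e.g. `boom/fish`), while the untenanted log bucket keeps its plain name. A small helper sketching that naming rule:

```python
# radosgw-admin addresses tenanted buckets as "tenant/bucket";
# buckets without a tenant are addressed by their plain name
def admin_bucket_name(bucket, tenant=None):
    return f"{tenant}/{bucket}" if tenant else bucket

print(admin_bucket_name("fish", tenant="boom"))  # boom/fish
print(admin_bucket_name("all-logs"))             # all-logs
```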

Warm and Fuzzy

Background

The RGW's frontend is an S3 REST API server, and in this project we would like to use a REST API fuzzer to test the RGW for security issues (and other bugs). We would recommend exploring the RESTler tool; there is a very good intro in this video. Feed it the AWS S3 OpenAPI spec, and see what happens when we let it connect to the RGW.

Project

Initial (evaluation) Phase

  • run Ceph with a radosgw. You can use cephadm to install and run Ceph in containers, or build it from source and run it in a vstart cluster
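Independently of RESTler, the core idea of REST API fuzzing can be illustrated with a tiny mutator: take a valid S3 request line and generate random variants to throw at the gateway. This is only a conceptual sketch with a made-up mutation strategy; a real fuzzer like RESTler works grammar-based from the OpenAPI spec rather than flipping bytes:

```python
import random

# Conceptual sketch of request-line mutation for API fuzzing:
# replace one byte of a valid request line with a random printable char.
def mutate_request(template, rng, n=5):
    variants = []
    for _ in range(n):
        chars = list(template)
        pos = rng.randrange(len(chars))
        chars[pos] = chr(rng.randrange(32, 127))
        variants.append("".join(chars))
    return variants

rng = random.Random(42)  # seeded for reproducibility
for v in mutate_request("PUT /fish/obj1 HTTP/1.1", rng):
    print(v)
```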

The More the Merrier

Background

Persistent bucket notifications are a very useful and powerful feature. To learn more about it, you can look at this tech talk and use case example.

Persistent notifications are usually better than synchronous notifications, for several reasons:

  • the queue they use is, in fact, a RADOS object, which gives the queue the reliability level of RADOS
  • they do not add the delay of sending the notification to the broker to the client's request round-trip time
  • they allow for temporary disconnects with the broker or broker restarts without affecting the service
  • they have a time and attempts retry mechanism
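The time-and-attempts retry behavior from the last bullet can be modeled with a simple loop: a notification is retried until delivery succeeds or the attempt budget is exhausted. This is a conceptual model with made-up parameter names and defaults, not RGW's actual implementation:

```python
import time

# Conceptual model of the time-and-attempts retry mechanism of
# persistent notifications; not the actual RGW implementation.
def deliver_with_retry(send, max_attempts=3, sleep_s=0.0):
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        if send():
            return attempts  # delivered on this attempt
        time.sleep(sleep_s)  # back off before the next attempt
    return None  # gave up; the entry would remain queued / be reaped

# a broker that fails twice, then accepts the notification
outcomes = iter([False, False, True])
print(deliver_with_retry(lambda: next(outcomes)))  # 3
```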
################################################################################################################
# Define the settings for the rook-ceph cluster with common settings for a small test cluster.
# All nodes with available raw devices will be used for the Ceph cluster. One node is sufficient
# in this example.

Test

this test assumes a Ceph cluster with RGW deployed via vstart

  • create the "log" bucket:
aws --endpoint-url http://localhost:8000 s3 mb s3://all-logs

Standard Mode

  • create a bucket for standard logging:

Phase 0

  • draft PR
  • initial PR
  • initial test PR

Phase 1

  • add "flush" REST API call to fix the issue of lazy-commit. use POST /<bucket name>/?logging as the command
  • add admin command to get bucket logging info: radosgw-admin bucket logging get
  • handle copy correctly:
  • in "Journal" mode, we should just see the "PUT" of the new object (existing behavior)