@Matan-B
Created November 23, 2023 10:07

Reproduce the scenario (Steps 1-3) and apply the fix (Step 4):

1) Create clone object:

Create pool (single pg, no autoscale):

ceph osd pool create <pool_id> 1 1 --autoscale-mode=off

Put an object:

rados -p <pool_id> put objectone <obj_1>

Make a snapshot:

rados -p <pool_id> mksnap <snap_name>

Re-put the object (create clone object):

rados -p <pool_id> put objectone <obj_2>

Sleep 5 (let the changes take effect)

A clone object now exists (see rados df), along with a corresponding SnapMapper entry in the db.
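Step 1 can be sketched as a single shell function. This is only an illustration: the function name `create_clone`, the `run` helper, and the `DRY_RUN` variable are hypothetical and not part of Ceph. By default the sketch only prints the commands; set `DRY_RUN=0` to actually run them against a test cluster.

```shell
# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: create_clone <pool_id> <snap_name> <obj_1> <obj_2>
create_clone() {
    pool=$1; snap=$2; obj1=$3; obj2=$4
    run ceph osd pool create "$pool" 1 1 --autoscale-mode=off  # single pg, no autoscale
    run rados -p "$pool" put objectone "$obj1"                 # initial object
    run rados -p "$pool" mksnap "$snap"                        # pool snapshot
    run rados -p "$pool" put objectone "$obj2"                 # overwrite, which creates the clone
    run sleep 5                                                # let the changes take effect
    run rados df                                               # inspect the clone count by hand
}
```

For example, `create_clone testpool snap1 ./obj_1 ./obj_2` prints the six commands in order without touching the cluster.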

2) Leak the clone object:

Stop the OSD.

Find the relevant entry keys (with the URL encoded prefix):

ceph-kvstore-tool bluestore-kv <store path> list p | grep SNA_

Save the value to a file of your choice (use the entire key, including the URL-encoded prefix):

ceph-kvstore-tool bluestore-kv <store path> get p <key-listed-in-previous-command> out <val-path>

Remove the entry:

ceph-kvstore-tool bluestore-kv <store path> rm p <key-listed-in-previous-command>

Start the OSD.

Remove the snapshot:

rados -p <pool_id> rmsnap <snap_name>

Sleep 5 (let the changes take effect)

Verify that the clone object is leaked (rados df).
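The offline part of Step 2 (list, save, remove) could be scripted roughly as follows. This is a sketch under assumptions: the names `sna_keys`, `leak_clone`, and `KVTOOL` are hypothetical, and it assumes each line printed by `list p` has the form `<prefix><TAB><url-encoded key>`, so the key is the second tab-separated field. Check the output on your build before relying on it. The OSD must be stopped before calling it.

```shell
# Tool to invoke; overridable so the sketch can be exercised with a stub.
KVTOOL=${KVTOOL:-ceph-kvstore-tool}

# Print the URL-encoded SNA_ keys found under prefix p.
# Assumption: list output is "<prefix><TAB><key>" per line.
sna_keys() {
    "$KVTOOL" bluestore-kv "$1" list p 2>/dev/null | grep 'SNA_' | cut -f2
}

# Usage: leak_clone <store path> <backup dir>
# Saves each SNA_ value (and its key) under <backup dir>, then removes the key.
leak_clone() {
    store=$1; backup_dir=$2
    mkdir -p "$backup_dir"
    # Collect the keys before modifying anything, so the listing has
    # finished before the store is opened again.
    keys=$(sna_keys "$store")
    n=0
    printf '%s\n' "$keys" | while IFS= read -r key; do
        [ -n "$key" ] || continue
        n=$((n + 1))
        "$KVTOOL" bluestore-kv "$store" get p "$key" out "$backup_dir/sna_$n.val" </dev/null
        printf '%s\n' "$key" > "$backup_dir/sna_$n.key"   # keep the key for the restore in Step 3
        "$KVTOOL" bluestore-kv "$store" rm p "$key" </dev/null
    done
}
```

Keeping the key next to the saved value makes the later restore with `set p <key> in <val-path>` a copy-paste operation.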

Stop the cluster.

3) Omit OSD purged_snaps keys

Find PSN keys (same for all OSDs):

ceph-kvstore-tool bluestore-kv <store path> list p 2>&1 | grep PSN

Remove the PSN_ key from the OSDs (for each OSD):

ceph-kvstore-tool bluestore-kv <store path> rm p <PSN_ key>

Restore the SNA_ key that was removed in Step 2 (for each OSD), using the key and the value file saved there:

ceph-kvstore-tool bluestore-kv <store path> set p <SNA_ key> in <val-path>
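Since Step 3 repeats for every OSD, a small per-store helper may be convenient. The sketch below is illustrative only: `omit_purged_snaps` and `KVTOOL` are hypothetical names, and the key extraction again assumes that `list p` prints `<prefix><TAB><url-encoded key>` per line. Run it only while the cluster is stopped.

```shell
# Tool to invoke; overridable so the sketch can be exercised with a stub.
KVTOOL=${KVTOOL:-ceph-kvstore-tool}

# Usage: omit_purged_snaps <store path> <SNA_ key> <val-path>
omit_purged_snaps() {
    store=$1; sna_key=$2; val_path=$3
    # Gather the PSN keys first so the listing is done before any removal.
    # Assumption: the key is the second tab-separated field of each line.
    psn_keys=$("$KVTOOL" bluestore-kv "$store" list p 2>/dev/null | grep 'PSN' | cut -f2)
    printf '%s\n' "$psn_keys" | while IFS= read -r key; do
        [ -n "$key" ] || continue
        "$KVTOOL" bluestore-kv "$store" rm p "$key" </dev/null
    done
    # Put back the SNA_ entry that was saved in Step 2.
    "$KVTOOL" bluestore-kv "$store" set p "$sna_key" in "$val_path"
}
```

Call it once per OSD store path, passing the same SNA_ key and value file each time.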

Optional:

Verify scenario:

ceph-monstore-tool <mon store path> dump-keys 2>&1 | grep osd_snap
ceph-kvstore-tool bluestore-kv <osd store path> list p 2>&1 | grep PSN

Purged snaps keys should exist in the monitor but not in the OSDs.


4) Apply Fix (1/2)

Start the cluster.

Run Scrub purged Snaps (for each OSD):

ceph daemon osd.<id> scrub_purged_snaps

Sleep 5 (let the changes take effect)

Verify that the clone object is still leaked (rados df).


Reset purged snaps epoch (for each OSD):

ceph tell osd.<id> reset_purged_snaps_last

Restart the OSDs so the fix takes effect.

The monitor should share the purged snaps keys with the OSDs.
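The first half of the fix can be looped over all OSD ids. In this sketch the names `apply_fix_part1`, `run`, and `DRY_RUN` are hypothetical; it prints the commands by default and only executes them with `DRY_RUN=0`. The OSD restart is left out because it depends on how the cluster is deployed.

```shell
# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: apply_fix_part1 <osd id>...
apply_fix_part1() {
    # `ceph daemon` talks to the admin socket, so run it on the OSD's host.
    for id in "$@"; do
        run ceph daemon "osd.$id" scrub_purged_snaps
    done
    run sleep 5     # let the changes take effect
    run rados df    # the clone is expected to still be leaked here
    for id in "$@"; do
        run ceph tell "osd.$id" reset_purged_snaps_last
    done
    # Restart the OSDs afterwards using your deployment's own mechanism.
}
```

For example, `apply_fix_part1 0 1 2` lists the scrub and reset commands for three OSDs.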


Optional (offline cluster):

Verify that purged_snaps keys exist in the OSDs after applying the fix:

ceph-kvstore-tool bluestore-kv <osd store path> list p 2>&1 | grep PSN

4) Apply Fix (2/2)

Run Scrub purged Snaps (for each OSD):

ceph daemon osd.<id> scrub_purged_snaps

Sleep 5 (let the changes take effect)

Verify that the clone object is now deleted (rados df) and the leak is gone.
