Skip to content

Instantly share code, notes, and snippets.

@amathuria
Last active April 27, 2023 07:26
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save amathuria/26f5e9ecfc3f04a70c9795039fdf0c35 to your computer and use it in GitHub Desktop.
Save amathuria/26f5e9ecfc3f04a70c9795039fdf0c35 to your computer and use it in GitHub Desktop.
osdmap pruning tracker
Failed osdmap trimming:
2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).osd e1028 get_trim_to explicit mon_osd_force_trim_to = 11
2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).osd e1028 get_trim_to trim_to = 11
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) maybe_trim 1~11
2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).paxosservice(osdmap 1..1028) maybe_trim trimming to 11, 10 states
2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).paxosservice(osdmap 1..1028) trim from 1 to 11
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 1
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim full_1
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 2
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 3
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 4
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 5
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 6
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 7
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 8
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 9
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 10
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) compacting prefix osdmap
2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).osd e1028 encode_trim_extra including full map for e 11
2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).osd e1028
2023-03-31T06:50:26.528+0000 7fca17031700 10 mon.a@0(leader).paxos(paxos updating c 3013..3711) trigger_propose not active, will propose later
Trimming successfully takes place:
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).osd e1028 get_trim_to explicit mon_osd_force_trim_to = 11
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).osd e1028 get_trim_to trim_to = 11
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) maybe_trim 1~11
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxosservice(osdmap 1..1028) maybe_trim trimming to 11, 10 states
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxosservice(osdmap 1..1028) trim from 1 to 11
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 1
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim full_1
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 2
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 3
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 4
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 5
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 6
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 7
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 8
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 9
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 10
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) compacting prefix osdmap
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).osd e1028 encode_trim_extra including full map for e 11
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).osd e1028 _prune_update_trimmed first 11 last_pinned 921
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxos(paxos active c 3013..3715) trigger_propose active, proposing now
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxos(paxos active c 3013..3715) propose_pending 3716 4037 bytes
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxos(paxos updating c 3013..3715) begin for 3716 4037 bytes
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxos(paxos updating c 3013..3715) sending begin to mon.1
2023-03-31T06:50:41.529+0000 7fca17031700 1 -- v1:172.21.15.132:6789/0 send_to--> mon v1:172.21.15.151:6789/0 -- paxos(begin lc 3715 fc 0 pn 12800 opn 0) v4 -- ?+0 0x5628ee866e00
2023-03-31T06:50:41.529+0000 7fca17031700 1 -- v1:172.21.15.132:6789/0 --> v1:172.21.15.151:6789/0 -- paxos(begin lc 3715 fc 0 pn 12800 opn 0) v4 -- 0x5628ee866e00 con 0x5628ed447800
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).log v1535 log
2023-03-31T06:50:41.530+0000 7fca12027700 10 mon.a@0(leader).paxosservice(osdmap 11..1028) refresh
2023-03-31T06:50:41.530+0000 7fca12027700 20 mon.a@0(leader).osd e1028 load_osdmap_manifest osdmap manifest detected in store; reload.
2023-03-31T06:50:41.530+0000 7fca12027700 10 mon.a@0(leader).osd e1028 load_osdmap_manifest store osdmap manifest pinned (11 .. 921)
void OSDMonitor::_prune_update_trimmed(
MonitorDBStore::TransactionRef tx,
version_t first)
{
dout(10) << __func__
<< " first " << first
<< " last_pinned " << osdmap_manifest.get_last_pinned()
<< dendl;
osdmap_manifest_t manifest = osdmap_manifest;
if (!manifest.is_pinned(first)) {
manifest.pin(first);
}
set<version_t>::iterator p_end = manifest.pinned.find(first);
set<version_t>::iterator p = manifest.pinned.begin();
manifest.pinned.erase(p, p_end);
ceph_assert(manifest.get_first_pinned() == first);
if (manifest.get_last_pinned() == first+1 ||
manifest.pinned.size() == 1) {
// we reached the end of the line, as pinned maps go; clean up our
// manifest, and let `should_prune()` decide whether we should prune
// again.
tx->erase(get_service_name(), "osdmap_manifest");
return;
}
bufferlist bl;
manifest.encode(bl);
tx->put(get_service_name(), "osdmap_manifest", bl);
}
void PaxosService::maybe_trim()
{
dout(10) << __func__ << " trimming to " << trim_to << ", " << to_remove << " states" << dendl;
MonitorDBStore::TransactionRef t = paxos.get_pending_transaction();
trim(t, first_committed, trim_to);
put_first_committed(t, trim_to);
cached_first_committed = trim_to;
// let the service add any extra stuff
encode_trim_extra(t, trim_to);
paxos.trigger_propose();
}
@amathuria
Copy link
Author

In the function maybe_trim(), cached_first_committed is updated to trim_to. The _prune_update_trimmed function (called as part of encode_trim_extra) is the one that updates the osdmap_manifest's first_pinned value.
However if Paxos is not active and the proposal does not happen, the cached_first_committed has been updated without a trim happening.
get_first_committed() returns the cached_first_committed value.
In the test_mon_osdmap_prune test, we determine that a trim is complete by checking that osdmap_first_committed = number of maps trimmed.
We also expect osdmap_first_committed to be equal to osdmap_manifest.first_pinned, which would be true if the trim had gone through.
However when the trim proposal fails, we receive the new value for osdmap_first_committed (as cached_first_committed has been updated) and the old value for osdmap_manifest.first_pinned since the trim never happened.

NOTE: cached_first_committed gets updated to the first_committed value stored in MonDB if there is a refresh. When the proposal goes through later and trimming actually takes place, osdmap_manifest.first_pinned and get_first_committed() return the same value.
@amathuria

@amathuria
Copy link
Author

@rzarzynski
Copy link

Thanks for the root cause analysis, @amathuria. It looks very sane!
It also brings a new question: is this an actual code defect or just a unit test's bug, basically a wrong assumption?

void PaxosService::maybe_trim()
{
  cached_first_committed = trim_to;
...
  paxos.trigger_propose();
  /**
   * Get the first committed version
   *
   * @returns Our first committed version (that is available)
   */
  version_t get_first_committed() const{
    return cached_first_committed;
  }

The getter has plenty of users across the monitor's code and I wonder whether we aren't updating cached_first_committed too early. I don't have a conclusive answer yet. For now I'm linking the RCA to the ticket. Thanks!


Just for ease of reading applied some markdown formatting (`code identifier`).

In the function maybe_trim(), cached_first_committed is updated to trim_to. The _prune_update_trimmed function (called as part of encode_trim_extra) is the one that updates the osdmap_manifest's first_pinned value.
However if Paxos is not active and the proposal does not happen, the cached_first_committed has been updated without a trim happening.
get_first_committed() returns the cached_first_committed value.
In the test_mon_osdmap_prune test, we determine that a trim is complete by checking that osdmap_first_committed = number of maps trimmed.
We also expect osdmap_first_committed to be equal to osdmap_manifest.first_pinned, which would be true if the trim had gone through.
However when the trim proposal fails, we receive the new value for osdmap_first_committed (as cached_first_committed has been updated) and the old value for osdmap_manifest.first_pinned since the trim never happened.

NOTE: cached_first_committed gets updated to the first_committed value stored in MonDB if there is a refresh. When the proposal goes through later and trimming actually takes place, osdmap_manifest.first_pinned and get_first_committed() return the same value.
@amathuria

@amathuria
Copy link
Author

@rzarzynski yeah even I'm not sure if we should be updating cached_first_committed after the transaction goes through. I am going to look around a little more to see if there is a reason we update cached_first_committed first.
Then we can decide whether we should change the point where cached_first_committed is updated or change the test to wait for osdmap_manifest.first_pinned to get updated https://github.com/ceph/ceph/blob/main/qa/workunits/mon/test_mon_osdmap_prune.sh#L35 instead of osdmap_first_committed.

@rzarzynski
Copy link

yeah even I'm not sure if we should be updating cached_first_committed after the transaction goes through. I am going to look around a little more to see if there is a reason we update cached_first_committed first.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment