Failed osdmap trimming:

2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).osd e1028 get_trim_to explicit mon_osd_force_trim_to = 11
2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).osd e1028 get_trim_to trim_to = 11
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) maybe_trim 1~11
2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).paxosservice(osdmap 1..1028) maybe_trim trimming to 11, 10 states
2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).paxosservice(osdmap 1..1028) trim from 1 to 11
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 1
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim full_1
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 2
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 3
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 4
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 5
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 6
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 7
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 8
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 9
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 10
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) compacting prefix osdmap
2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).osd e1028 encode_trim_extra including full map for e 11
2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).osd e1028
2023-03-31T06:50:26.528+0000 7fca17031700 10 mon.a@0(leader).paxos(paxos updating c 3013..3711) trigger_propose not active, will propose later
Trimming successfully takes place:

2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).osd e1028 get_trim_to explicit mon_osd_force_trim_to = 11
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).osd e1028 get_trim_to trim_to = 11
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) maybe_trim 1~11
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxosservice(osdmap 1..1028) maybe_trim trimming to 11, 10 states
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxosservice(osdmap 1..1028) trim from 1 to 11
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 1
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim full_1
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 2
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 3
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 4
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 5
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 6
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 7
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 8
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 9
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 10
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) compacting prefix osdmap
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).osd e1028 encode_trim_extra including full map for e 11
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).osd e1028 _prune_update_trimmed first 11 last_pinned 921
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxos(paxos active c 3013..3715) trigger_propose active, proposing now
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxos(paxos updating c 3013..3715) propose_pending 3716 4037 bytes
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxos(paxos updating c 3013..3715) begin for 3716 4037 bytes
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxos(paxos updating c 3013..3715) sending begin to mon.1
2023-03-31T06:50:41.529+0000 7fca17031700 1 -- v1:172.21.15.132:6789/0 send_to--> mon v1:172.21.15.151:6789/0 -- paxos(begin lc 3715 fc 0 pn 12800 opn 0) v4 -- ?+0 0x5628ee866e00
2023-03-31T06:50:41.529+0000 7fca17031700 1 -- v1:172.21.15.132:6789/0 --> v1:172.21.15.151:6789/0 -- paxos(begin lc 3715 fc 0 pn 12800 opn 0) v4 -- 0x5628ee866e00 con 0x5628ed447800
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).log v1535 log
2023-03-31T06:50:41.530+0000 7fca12027700 10 mon.a@0(leader).paxosservice(osdmap 11..1028) refresh
2023-03-31T06:50:41.530+0000 7fca12027700 20 mon.a@0(leader).osd e1028 load_osdmap_manifest osdmap manifest detected in store; reload.
2023-03-31T06:50:41.530+0000 7fca12027700 10 mon.a@0(leader).osd e1028 load_osdmap_manifest store osdmap manifest pinned (11 .. 921)
void OSDMonitor::_prune_update_trimmed(
    MonitorDBStore::TransactionRef tx,
    version_t first)
{
  dout(10) << __func__
           << " first " << first
           << " last_pinned " << osdmap_manifest.get_last_pinned()
           << dendl;

  osdmap_manifest_t manifest = osdmap_manifest;

  if (!manifest.is_pinned(first)) {
    manifest.pin(first);
  }

  set<version_t>::iterator p_end = manifest.pinned.find(first);
  set<version_t>::iterator p = manifest.pinned.begin();
  manifest.pinned.erase(p, p_end);
  ceph_assert(manifest.get_first_pinned() == first);

  if (manifest.get_last_pinned() == first+1 ||
      manifest.pinned.size() == 1) {
    // we reached the end of the line, as pinned maps go; clean up our
    // manifest, and let `should_prune()` decide whether we should prune
    // again.
    tx->erase(get_service_name(), "osdmap_manifest");
    return;
  }

  bufferlist bl;
  manifest.encode(bl);
  tx->put(get_service_name(), "osdmap_manifest", bl);
}
void PaxosService::maybe_trim()
{
  dout(10) << __func__ << " trimming to " << trim_to
           << ", " << to_remove << " states" << dendl;

  MonitorDBStore::TransactionRef t = paxos.get_pending_transaction();
  trim(t, first_committed, trim_to);
  put_first_committed(t, trim_to);
  cached_first_committed = trim_to;

  // let the service add any extra stuff
  encode_trim_extra(t, trim_to);

  paxos.trigger_propose();
}
Thanks for the root cause analysis, @amathuria. It looks very sane!
It also brings a new question: is this an actual code defect, or just a bug in the unit test, i.e. a wrong assumption?
void PaxosService::maybe_trim()
{
  cached_first_committed = trim_to;
  ...
  paxos.trigger_propose();
}

/**
 * Get the first committed version
 *
 * @returns Our first committed version (that is available)
 */
version_t get_first_committed() const {
  return cached_first_committed;
}
The getter has plenty of users across the monitor's code, and I wonder whether we aren't updating `cached_first_committed` too early. I don't have a conclusive answer yet. For now I'm linking the RCA to the ticket. Thanks!
Just for ease of reading applied some markdown formatting (`code identifier`).
In the function `maybe_trim()`, `cached_first_committed` is updated to `trim_to`. The `_prune_update_trimmed` function (called as part of `encode_trim_extra`) is the one that updates the `osdmap_manifest`'s `first_pinned` value.

However, if `Paxos` is not active and the proposal does not happen, `cached_first_committed` has been updated without a trim happening. `get_first_committed()` returns the `cached_first_committed` value.

In the `test_mon_osdmap_prune` test, we determine that a trim is complete by checking that `osdmap_first_committed` = number of maps trimmed. We also expect `osdmap_first_committed` to be equal to `osdmap_manifest.first_pinned`, which would be true if the trim had gone through. However, when the trim proposal fails, we receive the new value for `osdmap_first_committed` (as `cached_first_committed` has been updated) and the old value for `osdmap_manifest.first_pinned`, since the trim never happened.

NOTE: `cached_first_committed` gets updated to the `first_committed` value stored in MonDB if there is a refresh. When the proposal goes through later and trimming actually takes place, `osdmap_manifest.first_pinned` and `get_first_committed()` return the same value.
@amathuria
@rzarzynski Yeah, even I'm not sure if we should be updating `cached_first_committed` after the transaction goes through. I am going to look around a little more to see if there is a reason we update `cached_first_committed` first.

Then we can decide whether we should change the point where `cached_first_committed` is updated, or change the test to wait for `osdmap_manifest.first_pinned` to get updated (https://github.com/ceph/ceph/blob/main/qa/workunits/mon/test_mon_osdmap_prune.sh#L35) instead of `osdmap_first_committed`.
yeah even I'm not sure if we should be updating cached_first_committed after the transaction goes through. I am going to look around a little more to see if there is a reason we update cached_first_committed first.
👍