-
-
Save amathuria/26f5e9ecfc3f04a70c9795039fdf0c35 to your computer and use it in GitHub Desktop.
osdmap pruning tracker
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Failed osdmap trimming: | |
2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).osd e1028 get_trim_to explicit mon_osd_force_trim_to = 11 | |
2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).osd e1028 get_trim_to trim_to = 11 | |
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) maybe_trim 1~11 | |
2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).paxosservice(osdmap 1..1028) maybe_trim trimming to 11, 10 states | |
2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).paxosservice(osdmap 1..1028) trim from 1 to 11 | |
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 1 | |
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim full_1 | |
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 2 | |
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 3 | |
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 4 | |
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 5 | |
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 6 | |
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 7 | |
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 8 | |
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 9 | |
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 10 | |
2023-03-31T06:50:26.527+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) compacting prefix osdmap | |
2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).osd e1028 encode_trim_extra including full map for e 11 | |
2023-03-31T06:50:26.527+0000 7fca17031700 10 mon.a@0(leader).osd e1028 | |
2023-03-31T06:50:26.528+0000 7fca17031700 10 mon.a@0(leader).paxos(paxos updating c 3013..3711) trigger_propose not active, will propose later | |
Trimming successfully takes place: | |
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).osd e1028 get_trim_to explicit mon_osd_force_trim_to = 11 | |
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).osd e1028 get_trim_to trim_to = 11 | |
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) maybe_trim 1~11 | |
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxosservice(osdmap 1..1028) maybe_trim trimming to 11, 10 states | |
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxosservice(osdmap 1..1028) trim from 1 to 11 | |
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 1 | |
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim full_1 | |
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 2 | |
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 3 | |
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 4 | |
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 5 | |
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 6 | |
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 7 | |
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 8 | |
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 9 | |
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) trim 10 | |
2023-03-31T06:50:41.529+0000 7fca17031700 20 mon.a@0(leader).paxosservice(osdmap 1..1028) compacting prefix osdmap | |
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).osd e1028 encode_trim_extra including full map for e 11 | |
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).osd e1028 _prune_update_trimmed first 11 last_pinned 921 | |
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxos(paxos active c 3013..3715) trigger_propose active, proposing now | |
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxos(paxos active c 3013..3715) propose_pending 3716 4037 bytes | |
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxos(paxos updating c 3013..3715) begin for 3716 4037 bytes | |
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).paxos(paxos updating c 3013..3715) sending begin to mon.1 | |
2023-03-31T06:50:41.529+0000 7fca17031700 1 -- v1:172.21.15.132:6789/0 send_to--> mon v1:172.21.15.151:6789/0 -- paxos(begin lc 3715 fc 0 pn 12800 opn 0) v4 -- ?+0 0x5628ee866e00 | |
2023-03-31T06:50:41.529+0000 7fca17031700 1 -- v1:172.21.15.132:6789/0 --> v1:172.21.15.151:6789/0 -- paxos(begin lc 3715 fc 0 pn 12800 opn 0) v4 -- 0x5628ee866e00 con 0x5628ed447800 | |
2023-03-31T06:50:41.529+0000 7fca17031700 10 mon.a@0(leader).log v1535 log | |
2023-03-31T06:50:41.530+0000 7fca12027700 10 mon.a@0(leader).paxosservice(osdmap 11..1028) refresh | |
2023-03-31T06:50:41.530+0000 7fca12027700 20 mon.a@0(leader).osd e1028 load_osdmap_manifest osdmap manifest detected in store; reload. | |
2023-03-31T06:50:41.530+0000 7fca12027700 10 mon.a@0(leader).osd e1028 load_osdmap_manifest store osdmap manifest pinned (11 .. 921) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
void OSDMonitor::_prune_update_trimmed( | |
MonitorDBStore::TransactionRef tx, | |
version_t first) | |
{ | |
dout(10) << __func__ | |
<< " first " << first | |
<< " last_pinned " << osdmap_manifest.get_last_pinned() | |
<< dendl; | |
osdmap_manifest_t manifest = osdmap_manifest; | |
if (!manifest.is_pinned(first)) { | |
manifest.pin(first); | |
} | |
set<version_t>::iterator p_end = manifest.pinned.find(first); | |
set<version_t>::iterator p = manifest.pinned.begin(); | |
manifest.pinned.erase(p, p_end); | |
ceph_assert(manifest.get_first_pinned() == first); | |
if (manifest.get_last_pinned() == first+1 || | |
manifest.pinned.size() == 1) { | |
// we reached the end of the line, as pinned maps go; clean up our | |
// manifest, and let `should_prune()` decide whether we should prune | |
// again. | |
tx->erase(get_service_name(), "osdmap_manifest"); | |
return; | |
} | |
bufferlist bl; | |
manifest.encode(bl); | |
tx->put(get_service_name(), "osdmap_manifest", bl); | |
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
void PaxosService::maybe_trim() | |
{ | |
dout(10) << __func__ << " trimming to " << trim_to << ", " << to_remove << " states" << dendl; | |
MonitorDBStore::TransactionRef t = paxos.get_pending_transaction(); | |
trim(t, first_committed, trim_to); | |
put_first_committed(t, trim_to); | |
cached_first_committed = trim_to; | |
// let the service add any extra stuff | |
encode_trim_extra(t, trim_to); | |
paxos.trigger_propose(); | |
} |
@rzarzynski yeah even I'm not sure if we should be updating cached_first_committed after the transaction goes through. I am going to look around a little more to see if there is a reason we update cached_first_committed first.
Then we can decide whether we should change the point where cached_first_committed is updated or change the test to wait for osdmap_manifest.first_pinned to get updated https://github.com/ceph/ceph/blob/main/qa/workunits/mon/test_mon_osdmap_prune.sh#L35 instead of osdmap_first_committed.
yeah even I'm not sure if we should be updating cached_first_committed after the transaction goes through. I am going to look around a little more to see if there is a reason we update cached_first_committed first.
👍
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Thanks for the root cause analysis, @amathuria. It looks very sane!
It also brings a new question: is this an actual code defect or just a unit test's bug, basically a wrong assumption?
The getter has plenty of users across the monitor's code and I wonder whether we aren't updating
cached_first_committed
too early. I don't have a conclusive answer yet. For now I'm linking the RCA to the ticket. Thanks!Just for ease of reading applied some markdown formatting (`code identifier`).
In the function
maybe_trim()
,cached_first_committed
is updated totrim_to
. The_prune_update_trimmed
function (called as part ofencode_trim_extra
) is the one that updates theosdmap_manifest
'sfirst_pinned
value.However if
Paxos
is not active and the proposal does not happen, thecached_first_committed
has been updated without a trim happening.get_first_committed()
returns thecached_first_committed
value.In the
test_mon_osdmap_prune
test, we determine that a trim is complete by checking thatosdmap_first_committed
= number of maps trimmed.We also expect
osdmap_first_committed
to be equal toosdmap_manifest.first_pinned
, which would be true if the trim had gone through.However when the trim proposal fails, we receive the new value for
osdmap_first_committed
(ascached_first_committed
has been updated) and the old value forosdmap_manifest.first_pinned
since the trim never happened.NOTE:
cached_first_committed
gets updated to thefirst_committed
value stored in MonDB if there is a refresh. When the proposal goes through later and trimming actually takes place,osdmap_manifest.first_pinned
andget_first_committed()
return the same value.@amathuria