Skip to content

Instantly share code, notes, and snippets.

@Matan-B
Last active April 15, 2024 12:01
Show Gist options
  • Save Matan-B/02e88c375760c1cb29f87d39de9122c9 to your computer and use it in GitHub Desktop.
Save Matan-B/02e88c375760c1cb29f87d39de9122c9 to your computer and use it in GitHub Desktop.

https://pulpito.ceph.com/matan-2024-04-11_11:23:27-crimson-rados-wip-matanb-crimson-replicated-testing-distro-crimson-smithi/7652234/

Error racing read got wrong version on smithi11232367-65:

2024-04-11T12:39:31.043 INFO:tasks.rados.rados.0.smithi112.stdout:65: write initial oid 65
2024-04-11T12:39:31.043 INFO:tasks.rados.rados.0.smithi112.stdout:65:  seq_num 64 ranges {675648=546760,2012024=568349}
2024-04-11T12:39:31.052 INFO:tasks.rados.rados.0.smithi112.stdout:65:  writing smithi11232367-65 from 675648 to 1222408 tid 1
2024-04-11T12:39:31.064 INFO:tasks.rados.rados.0.smithi112.stdout:65:  writing smithi11232367-65 from 2012024 to 2580373 tid 2
2024-04-11T12:39:31.210 INFO:tasks.rados.rados.0.smithi112.stdout:65:  finishing write tid 1 to smithi11232367-65
2024-04-11T12:39:31.237 INFO:tasks.rados.rados.0.smithi112.stdout:65:  finishing write tid 2 to smithi11232367-65

2024-04-11T12:39:31.250 INFO:tasks.thrashosds.thrasher:Killing osd 3, live_osds are [0, 1, 2, 3]

2024-04-11T12:39:36.322 INFO:tasks.thrashosds.thrasher:Reviving osd 3

2024-04-11T12:39:43.108 INFO:tasks.rados.rados.0.smithi112.stdout: waiting on 16
2024-04-11T12:39:43.108 INFO:tasks.rados.rados.0.smithi112.stdout:65:  finishing write tid 3 to smithi11232367-65
2024-04-11T12:39:43.111 INFO:tasks.rados.rados.0.smithi112.stdout:65:  finishing write tid 4 to smithi11232367-65
2024-04-11T12:39:43.111 INFO:tasks.rados.rados.0.smithi112.stdout:65:  oid 65 updating version 0 to 11
2024-04-11T12:39:43.111 INFO:tasks.rados.rados.0.smithi112.stdout:65:  oid 65 version 11 is already newer than 0
2024-04-11T12:39:43.111 INFO:tasks.rados.rados.0.smithi112.stdout:65:  oid 65 version 11 is already newer than 10
2024-04-11T12:39:43.111 INFO:tasks.rados.rados.0.smithi112.stdout:update_object_version oid 65 v 11 (ObjNum 64 snap 0 seq_num 64) dirty exists
2024-04-11T12:39:43.111 INFO:tasks.rados.rados.0.smithi112.stderr:Error: racing read on 65 returned version 12 rather than version 11

From osd3 (was killed and revived):

DEBUG 2024-04-11 12:39:40,193 [shard 0:main] osd - read_log_and_missing 18'10 (0'0) modify   3:414dc9d4:::smithi11232367-65:head by client.4281.0:266 2024-04-11T12:39:31.055978+0000 0 ObjectCleanRegions clean_offsets: [(0, 675648), (1222408, 18446744073708329207)], clean_omap: true, new_object: false
DEBUG 2024-04-11 12:39:40,193 [shard 0:main] osd - read_log_and_missing 18'11 (18'11) modify   3:414dc9d4:::smithi11232367-65:head by client.4281.0:267 2024-04-11T12:39:31.065106+0000 0 ObjectCleanRegions clean_offsets: [(0, 2012024), (2580373, 18446744073706971242)], clean_omap: true, new_object: false

DEBUG 2024-04-11 12:39:42,901 [shard 0:main] osd - merge_log log((18'11,18'12], crt=18'13) from osd.1 into log((0'0,18'11], crt=18'11)
DEBUG 2024-04-11 12:39:42,901 [shard 0:main] osd - merge_log extending head to 18'12
DEBUG 2024-04-11 12:39:42,901 [shard 2:main] osd - 0x603000387080 PeeringEvent<T>::complete_rctx: peering_event(id=115, detail=PeeringEvent(from=0 pgid=3.1 sent=21 requested=21 evt=epoch_sent: 21 epoch_requested: 21 MNotifyRec 3.1 from 0 notify: (query:21 sent:21 3.1( v 18'7 (0'0,18'7] local-lis/les=15/16 n=0 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=20) ([15,19] all_participants={0, 1, 3} intervals=[([15,18] acting={0, 1, 3})])) features: 0x3f03cffffffdffff +create_info)): submitting ctx
DEBUG 2024-04-11 12:39:42,901 [shard 2:main] osd - OSDSingletonState::dispatch_context_transaction: empty transaction
DEBUG 2024-04-11 12:39:42,902 [shard 0:main] osd -   ? 18'12 (18'12) modify   3:414dc9d4:::smithi11232367-65:head by client.4281.0:268 2024-04-11T12:39:31.065171+0000 0 ObjectCleanRegions clean_offsets: [(0, 2580373), (3075648, 18446744073706475967)], clean_omap: true, new_object: false
DEBUG 2024-04-11 12:39:42,902 [shard 0:main] osd - merge_log cut point (usually last shared) is 18'11
DEBUG 2024-04-11 12:39:42,902 [shard 0:main] osd - merge_log original_crt = 18'11
DEBUG 2024-04-11 12:39:42,902 [shard 0:main] osd - update missing, append 18'12 (18'12) modify   3:414dc9d4:::smithi11232367-65:head by client.4281.0:268 2024-04-11T12:39:31.065171+0000 0 ObjectCleanRegions clean_offsets: [(0, 2580373), (3075648, 18446744073706475967)], clean_omap: true, new_object: false
INFO  2024-04-11 12:39:42,902 [shard 0:main] osd - merge_log_dups log.dups.size()=0olog.dups.size()=11
DEBUG 2024-04-11 12:39:42,902 [shard 0:main] osd - merge_log copying olog dups to log 18'1 to 18'11
DEBUG 2024-04-11 12:39:42,902 [shard 0:main] osd - merge_log removed dups overlapping log entries (0'0,18'11]
INFO  2024-04-11 12:39:42,902 [shard 0:main] osd - end of merge_log_dups changed=1 log.dups.size()=0 olog.dups.size()=11
DEBUG 2024-04-11 12:39:42,902 [shard 0:main] osd - merge_log result log((0'0,18'12], crt=18'12) missing(1 may_include_deletes = 1) changed=1
DEBUG 2024-04-11 12:39:42,902 [shard 0:main] osd -  pg_epoch 21 pg[3.2( v 18'12 lc 18'11 (0'0,18'12] local-lis/les=15/16 n=4 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=20) [1,0,3] r=2 lpr=20 pi=[15,20)/1 crt=18'12 lcod 0'0 mlcod 0'0 unknown NOTIFY PeeringState::proc_lease pg_lease(ru 0.000000000s ub 51.259567261s int 16.000000000s)

DEBUG 2024-04-11 12:39:43,087 [shard 0:main] osd - background_recovery_sub(id=182, detail=MOSDPGPush(3.2 21/20 {PushOp(3:414dc9d4:::smithi11232367-65:head, version: 18'13, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 3, recovery_info: ObjectRecoveryInfo(3:414dc9d4:::smithi11232367-65:head@18'13, size: 3075648, copy_subset: [(2580373, 495275)], clone_subset: {}, snapset: 0=[]:{}, object_exist: true), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 3075648, data_complete: true, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: true, error: false))})): starting start_pg_operation
DEBUG 2024-04-11 12:39:43,089 [shard 0:main] osd - background_recovery_sub(id=182, detail=MOSDPGPush(3.2 21/20 {PushOp(3:414dc9d4:::smithi11232367-65:head, version: 18'13, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 3, recovery_info: ObjectRecoveryInfo(3:414dc9d4:::smithi11232367-65:head@18'13, size: 3075648, copy_subset: [(2580373, 495275)], clone_subset: {}, snapset: 0=[]:{}, object_exist: true), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 3075648, data_complete: true, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: true, error: false))})): have_pg
DEBUG 2024-04-11 12:39:43,090 [shard 0:main] osd -  pg_epoch 21 pg[3.2( v 18'12 lc 18'11 (0'0,18'12] local-lis/les=20/21 n=4 ec=15/15 lis/c=20/15 les/c/f=21/16/0 sis=20) [1,0,3] r=2 lpr=20 pi=[15,20)/1 luod=0'0 crt=18'13 lcod 0'0 mlcod 0'0 active PeeringState::recover_got got missing 3:414dc9d4:::smithi11232367-65:head v 18'13
DEBUG 2024-04-11 12:39:43,097 [shard 0:main] osd - 0x603000346ca0 RecoverySubRequest::with_pg: background_recovery_sub(id=182, detail=MOSDPGPush(3.2 21/20 {PushOp(3:414dc9d4:::smithi11232367-65:head, version: 18'13, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 3, recovery_info: ObjectRecoveryInfo(3:414dc9d4:::smithi11232367-65:head@18'13, size: 3075648, copy_subset: [(2580373, 495275)], clone_subset: {}, snapset: 0=[]:{}, object_exist: true), after_progress: ObjectRecoveryProgress(!first, data_recovered_to: 3075648, data_complete: true, omap_recovered_to: , omap_complete: true, error: false), before_progress: ObjectRecoveryProgress(first, data_recovered_to: 0, data_complete: false, omap_recovered_to: , omap_complete: true, error: false))})): complete
DEBUG 2024-04-11 12:40:00,417 [shard 0:main] osd -  pg_epoch 29 pg[3.2( v 18'12 (0'0,18'12] local-lis/les=20/21 n=4 ec=15/15 lis/c=20/20 les/c/f=21/21/0 sis=20) [1,0,3] r=2 lpr=20 luod=0'0 crt=18'13 mlcod 0'0 active  ScrubScan::scan_object: obj: #3:414dc9d4:::smithi11232367-65:head#

PeeringState::recover_got got missing 3:414dc9d4:::smithi11232367-65:head v 18'13

From osd.1 (serving the read):

DEBUG 2024-04-11 12:39:41,936 [shard 0:main] osd - OSD::handle_peering_op: 3.2 from 3
DEBUG 2024-04-11 12:39:41,936 [shard 0:main] osd - peering_event(id=381, detail=PeeringEvent(from=3 pgid=3.2 sent=20 requested=20 evt=epoch_sent: 20 epoch_requested: 20 MLogRec from 3 log log((0'0,18'11], crt=18'11) pi ([0,0] all_participants={} intervals=[]) +create_info)): starting start_pg_operation
DEBUG 2024-04-11 12:39:41,937 [shard 2:main] osd - peering_event(id=381, detail=PeeringEvent(from=3 pgid=3.2 sent=20 requested=20 evt=epoch_sent: 20 epoch_requested: 20 MLogRec from 3 log log((0'0,18'11], crt=18'11) pi ([0,0] all_participants={} intervals=[]) +create_info)): have_pg

DEBUG 2024-04-11 12:39:41,937 [shard 2:main] osd - 0x6030002dd5e0 PeeringEvent<T>::with_pg: 0x6030002dd5e0 peering_event(id=381, detail=PeeringEvent(from=3 pgid=3.2 sent=20 requested=20 evt=epoch_sent: 20 epoch_requested: 20 MLogRec from 3 log log((0'0,18'11], crt=18'11) pi ([0,0] all_participants={} intervals=[]) +create_info)): pg present
DEBUG 2024-04-11 12:39:41,937 [shard 2:main] osd - do_peering_event handling epoch_sent: 20 epoch_requested: 20 MLogRec from 3 log log((0'0,18'11], crt=18'11) pi ([0,0] all_participants={} intervals=[]) +create_info for pg: 3.2
DEBUG 2024-04-11 12:39:41,937 [shard 2:main] osd -  pg_epoch 20 pg[3.2( v 18'12 (0'0,18'12] local-lis/les=15/16 n=4 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=20) [1,0,3] r=0 lpr=20 pi=[15,20)/1 crt=18'13 lcod 18'11 mlcod 0'0 peering PeeringState::proc_replica_log proc_replica_log for osd.3: 3.2( v 18'11 (0'0,18'11] local-lis/les=15/16 n=0 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=20) log((0'0,18'11], crt=18'11) missing(0 may_include_deletes = 1)
DEBUG 2024-04-11 12:39:41,937 [shard 2:main] osd - proc_replica_log for osd.3: 3.2( v 18'11 (0'0,18'11] local-lis/les=15/16 n=0 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=20) log((0'0,18'11], crt=18'11) missing(0 may_include_deletes = 1)
DEBUG 2024-04-11 12:39:41,938 [shard 2:main] osd - merge_log point (usually last shared) is 18'11 (18'11) modify   3:414dc9d4:::smithi11232367-65:head by client.4281.0:267 2024-04-11T12:39:31.065106+0000 0 ObjectCleanRegions clean_offsets: [(0, 2012024), (2580373, 18446744073706971242)], clean_omap: true, new_object: false
DEBUG 2024-04-11 12:39:41,938 [shard 2:main] osd -  pg_epoch 20 pg[3.2( v 18'12 (0'0,18'12] local-lis/les=15/16 n=4 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=20) [1,0,3] r=0 lpr=20 pi=[15,20)/1 crt=18'13 lcod 18'11 mlcod 0'0 peering PeeringState::proc_replica_log  peer osd.3 now 3.2( v 18'11 (0'0,18'11] local-lis/les=15/16 n=0 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=20) missing(0 may_include_deletes = 1)
DEBUG 2024-04-11 12:39:41,938 [shard 2:main] osd -  pg_epoch 20 pg[3.2( v 18'12 (0'0,18'12] local-lis/les=15/16 n=4 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=20) [1,0,3] r=0 lpr=20 pi=[15,20)/1 crt=18'13 lcod 18'11 mlcod 0'0 peering state<Started/Primary/Peering/GetMissing>:  still need up_thru update before going active
INFO  2024-04-11 12:39:41,938 [shard 2:main] osd -  pg_epoch 20 pg[3.2( v 18'12 (0'0,18'12] local-lis/les=15/16 n=4 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=20) [1,0,3] r=0 lpr=20 pi=[15,20)/1 crt=18'13 lcod 18'11 mlcod 0'0 peering exit Started/Primary/Peering/GetMissing 0.045601 3 0.003369
INFO  2024-04-11 12:39:41,938 [shard 2:main] osd - Exiting state: Started/Primary/Peering/GetMissing, entered at 1712839181.892836, 0.003369422 spent on 3 events

DEBUG 2024-04-11 12:39:42,986 [shard 2:main] osd - do_recover_missing check for recovery, 3:414dc9d4:::smithi11232367-65:head
DEBUG 2024-04-11 12:39:42,986 [shard 2:main] osd - do_recover_missing need to wait for recovery, 3:414dc9d4:::smithi11232367-65:head

DEBUG 2024-04-11 12:39:42,986 [shard 2:main] osd - 0x6030004dd440  pg_epoch 21 pg[3.2( v 18'12 (0'0,18'12] local-lis/les=20/21 n=4 ec=15/15 lis/c=20/15 les/c/f=21/16/0 sis=20) [1,0,3] r=0 lpr=20 pi=[15,20)/1 crt=18'13 lcod 18'11 mlcod 0'0 active+degraded : UrgentRecovery::do_recovery: do_recovery
DEBUG 2024-04-11 12:39:42,986 [shard 0:main] osd - peering_event(id=430, detail=PeeringEvent(from=0 pgid=3.d sent=21 requested=21 evt=epoch_sent: 21 epoch_requested: 21 MLeaseAck epoch 21 from osd.0 pg_lease_ack(ruub 51.333808899s))): starting start_pg_operation
INFO  2024-04-11 12:39:42,986 [shard 2:main] osd - recover_missing 3:414dc9d4:::smithi11232367-65:head v 18'12
INFO  2024-04-11 12:39:42,987 [shard 2:main] osd - recover_missing 3:414dc9d4:::smithi11232367-65:head v 18'12, new recovery
DEBUG 2024-04-11 12:39:42,987 [shard 2:main] osd - recover_object: 3:414dc9d4:::smithi11232367-65:head, 18'12
DEBUG 2024-04-11 12:39:42,987 [shard 2:main] osd - maybe_pull_missing_obj: 3:414dc9d4:::smithi11232367-65:head, 18'12

DEBUG 2024-04-11 12:39:43,006 [shard 2:main] osd - prep_push_to_replica: 3:414dc9d4:::smithi11232367-65:head, 18'12
DEBUG 2024-04-11 12:39:43,006 [shard 2:main] osd - calc_head_subsets: 3:414dc9d4:::smithi11232367-65:head clone_overlap {} 
DEBUG 2024-04-11 12:39:43,006 [shard 2:main] osd - calc_head_subsets 3:414dc9d4:::smithi11232367-65:head data_subset [(2580373, 495275)]
DEBUG 2024-04-11 12:39:43,006 [shard 2:main] osd - calc_head_subsets 3:414dc9d4:::smithi11232367-65:head -- no clones
DEBUG 2024-04-11 12:39:43,006 [shard 2:main] osd - prep_push: 3:414dc9d4:::smithi11232367-65:head, 18'12
DEBUG 2024-04-11 12:39:43,006 [shard 2:main] osd - prep_push: 3:414dc9d4:::smithi11232367-65:head, 18'12
DEBUG 2024-04-11 12:39:43,006 [shard 2:main] osd - build_push_op 3:414dc9d4:::smithi11232367-65:head @18'13
DEBUG 2024-04-11 12:39:43,006 [shard 2:main] osd - build_push_op 3:414dc9d4:::smithi11232367-65:head @18'13
DEBUG 2024-04-11 12:39:43,006 [shard 2:main] osd - read_metadata_for_push_op, 3:414dc9d4:::smithi11232367-65:head
DEBUG 2024-04-11 12:39:43,006 [shard 2:main] osd - read_metadata_for_push_op, 3:414dc9d4:::smithi11232367-65:head

DEBUG 2024-04-11 12:39:43,100 [shard 0:main] osd - background_recovery_sub(id=461, detail=MOSDPGPushReply(3.2 21/20 {PushReplyOp(3:414dc9d4:::smithi11232367-65:head)})): starting start_pg_operation
DEBUG 2024-04-11 12:39:43,102 [shard 2:main] osd - background_recovery_sub(id=461, detail=MOSDPGPushReply(3.2 21/20 {PushReplyOp(3:414dc9d4:::smithi11232367-65:head)})): have_pg
DEBUG 2024-04-11 12:39:43,102 [shard 2:main] osd - 0x6030002fbdf0 RecoverySubRequest::with_pg: RecoverySubRequest::with_pg: background_recovery_sub(id=461, detail=MOSDPGPushReply(3.2 21/20 {PushReplyOp(3:414dc9d4:::smithi11232367-65:head)}))
DEBUG 2024-04-11 12:39:43,102 [shard 2:main] osd - _handle_push_reply, soid 3:414dc9d4:::smithi11232367-65:head, from 3
DEBUG 2024-04-11 12:39:43,102 [shard 2:main] osd - on_peer_recover: 3:414dc9d4:::smithi11232367-65:head, 18'13 on 3
DEBUG 2024-04-11 12:39:43,103 [shard 2:main] osd - 0x6030002fbdf0 RecoverySubRequest::with_pg: background_recovery_sub(id=461, detail=MOSDPGPushReply(3.2 21/20 {PushReplyOp(3:414dc9d4:::smithi11232367-65:head)})): complete
INFO  2024-04-11 12:39:43,103 [shard 2:main] osd - on_global_recover 3:414dc9d4:::smithi11232367-65:head
DEBUG 2024-04-11 12:39:43,103 [shard 2:main] osd - background_recovery_sub(id=461, detail=MOSDPGPushReply(3.2 21/20 {PushReplyOp(3:414dc9d4:::smithi11232367-65:head)})): exit

DEBUG 2024-04-11 12:39:43,105 [shard 2:main] osd - do_recover_missing check for recovery, 3:414dc9d4:::smithi11232367-65:head
DEBUG 2024-04-11 12:39:43,107 [shard 2:main] osd - do_osd_ops_execute: object 3:414dc9d4:::smithi11232367-65:head - handling op read
DEBUG 2024-04-11 12:39:43,107 [shard 2:main] osd - handling op read on object 3:414dc9d4:::smithi11232367-65:head
DEBUG 2024-04-11 12:39:43,111 [shard 2:main] osd - do_osd_ops_execute: object 3:414dc9d4:::smithi11232367-65:head all operations successful

  • osd.1 calls do_recover_missing check for recovery, 3:414dc9d4:::smithi11232367-65:head and schedules UrgentRecovery with version 12 (recover_missing 3:414dc9d4:::smithi11232367-65:head v 18'12).

  • osd.1 serves the unexpected read (handling op read on object 3:414dc9d4:::smithi11232367-65:head).

  • This may be caused due to incrementing first_non_divergent:

    from proc_replica_log for osd.3: 3.2( v 18'11 (0'0,18'11] which precedes by merge_log point (usually last shared) is 18'11 (18'11) modify:

    auto first_non_divergent = log.log.rbegin();
    while (1) {
     if (first_non_divergent == log.log.rend())
       break;
     if (first_non_divergent->version <= olog.head) {
       dout(20) << "merge_log point (usually last shared) is "
            << *first_non_divergent << dendl;
       break;
     }
     ++first_non_divergent;
    }
    
  • osd.3 also extends to version 12 merge_log extending head to 18'12

@Matan-B
Copy link
Author

Matan-B commented Apr 14, 2024

Reproduced few times, another similar example:
https://pulpito.ceph.com/matan-2024-04-11_11:23:27-crimson-rados-wip-matanb-crimson-replicated-testing-distro-crimson-smithi/7652230

Error racing read got wrong version on smithi09032091-55:

2024-04-11T12:36:57.142 INFO:tasks.rados.rados.0.smithi090.stdout:55: write initial oid 55
2024-04-11T12:36:57.142 INFO:tasks.rados.rados.0.smithi090.stdout:55:  seq_num 54 ranges {507578=477917,1543557=550004,2833665=73913}
2024-04-11T12:36:57.147 INFO:tasks.rados.rados.0.smithi090.stdout:55:  writing smithi09032091-55 from 507578 to 985495 tid 1
2024-04-11T12:36:57.156 INFO:tasks.rados.rados.0.smithi090.stdout:55:  writing smithi09032091-55 from 1543557 to 2093561 tid 2
2024-04-11T12:36:57.164 INFO:tasks.rados.rados.0.smithi090.stdout:55:  writing smithi09032091-55 from 2833665 to 2907578 tid 3

2024-04-11T12:36:58.102 INFO:tasks.rados.rados.0.smithi090.stdout:55:  finishing write tid 1 to smithi09032091-55
2024-04-11T12:36:58.104 INFO:tasks.rados.rados.0.smithi090.stdout:55:  finishing write tid 2 to smithi09032091-55
2024-04-11T12:36:58.106 INFO:tasks.rados.rados.0.smithi090.stdout:55:  finishing write tid 3 to smithi09032091-55


2024-04-11T12:36:57.430 INFO:tasks.thrashosds.thrasher:Killing osd 2, live_osds are [0, 1, 2, 3]
2024-04-11T12:37:02.524 INFO:tasks.thrashosds.thrasher:Reviving osd 2

2024-04-11T12:37:08.228 INFO:tasks.rados.rados.0.smithi090.stdout:55:  finishing write tid 4 to smithi09032091-55
2024-04-11T12:37:08.233 INFO:tasks.rados.rados.0.smithi090.stdout:55:  finishing write tid 5 to smithi09032091-55
2024-04-11T12:37:08.233 INFO:tasks.rados.rados.0.smithi090.stdout:55:  oid 55 updating version 0 to 13
2024-04-11T12:37:08.233 INFO:tasks.rados.rados.0.smithi090.stdout:55:  oid 55 updating version 13 to 14
2024-04-11T12:37:08.233 INFO:tasks.rados.rados.0.smithi090.stdout:55:  oid 55 version 14 is already newer than 0
2024-04-11T12:37:08.233 INFO:tasks.rados.rados.0.smithi090.stdout:55:  oid 55 updating version 14 to 15
2024-04-11T12:37:08.234 INFO:tasks.rados.rados.0.smithi090.stdout:update_object_version oid 55 v 15 (ObjNum 54 snap 0 seq_num 54) dirty exists
2024-04-11T12:37:08.234 INFO:tasks.rados.rados.0.smithi090.stderr:Error: racing read on 55 returned version 16 rather than version 15
2024-04-11T12:37:08.234 INFO:tasks.rados.rados.0.smithi090.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-2467-g885c5fc4/rpm/el9/BUILD/ceph-19.0.0-2467-g885c5fc4/src/test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7fb1b6b99640 time 2024-04-11T12:37:08.236495+0000
2024-04-11T12:37:08.234 INFO:tasks.rados.rados.0.smithi090.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-2467-g885c5fc4/rpm/el9/BUILD/ceph-19.0.0-2467-g885c5fc4/src/test/osd/RadosModel.h: 1076: ceph_abort_msg("racing read got wrong version")

From osd.2 (was killed and revived):


DEBUG 2024-04-11 12:37:05,697 [shard 1:main] osd - read_log_and_missing 19'13 (0'0) modify   3:3c38bf7a:::smithi09032091-55:head by client.4258.0:219 2024-04-11T12:36:57.150963+0000 0 ObjectCleanRegions clean_offsets: [(0, 507578), (985495, 18446744073708566120)], clean_omap: true, new_object: false
DEBUG 2024-04-11 12:37:05,698 [shard 1:main] osd - read_log_and_missing 19'14 (19'14) modify   3:3c38bf7a:::smithi09032091-55:head by client.4258.0:220 2024-04-11T12:36:57.159351+0000 0 ObjectCleanRegions clean_offsets: [(0, 1543557), (2093561, 18446744073707458054)], clean_omap: true, new_object: false
DEBUG 2024-04-11 12:37:05,698 [shard 1:main] osd - read_log_and_missing 19'15 (19'15) modify   3:3c38bf7a:::smithi09032091-55:head by client.4258.0:221 2024-04-11T12:36:57.165338+0000 0 ObjectCleanRegions clean_offsets: [(0, 2833665), (2907578, 18446744073706644037)], clean_omap: true, new_object: false

DEBUG 2024-04-11 12:37:06,981 [shard 0:main] osd - OSD::handle_peering_op: 3.c from 0

DEBUG 2024-04-11 12:37:07,990 [shard 1:main] osd -   ? 19'16 (19'16) modify   3:3c38bf7a:::smithi09032091-55:head by client.4258.0:222 2024-04-11T12:36:57.165438+0000 0 ObjectCleanRegions clean_offsets: [(0, 18446744073709551615)], clean_omap: true, new_object: false
DEBUG 2024-04-11 12:37:07,990 [shard 1:main] osd - update missing, append 19'16 (19'16) modify   3:3c38bf7a:::smithi09032091-55:head by client.4258.0:222 2024-04-11T12:36:57.165438+0000 0 ObjectCleanRegions clean_offsets: [(0, 18446744073709551615)], clean_omap: true, new_object: false

DEBUG 2024-04-11 12:37:08,208 [shard 1:main] osd -  pg_epoch 22 pg[3.c( v 19'16 lc 19'15 (0'0,19'16] local-lis/les=21/22 n=5 ec=16/16 lis/c=21/16 les/c/f=22/17/0 sis=21) [0,3,2] r=2 lpr=21 pi=[16,21)/1 luod=0'0 crt=19'17 lcod 0'0 mlcod 0'0 active PeeringState::recover_got got missing 3:3c38bf7a:::smithi09032091-55:head v 19'17

From osd.0 (serving the read):

DEBUG 2024-04-11 12:37:07,048 [shard 1:main] osd - merge_log point (usually last shared) is 19'15 (19'15) modify   3:3c38bf7a:::smithi09032091-55:head by client.4258.0:221 2024-04-11T12:36:57.165338+0000 0 ObjectCleanRegions clean_offsets: [(0, 2833665), (2907578, 18446744073706644037)], clean_omap: true, new_object: false

DEBUG 2024-04-11 12:37:08,009 [shard 1:main] osd - do_recover_missing check for recovery, 3:3c38bf7a:::smithi09032091-55:head
DEBUG 2024-04-11 12:37:08,009 [shard 1:main] osd - do_recover_missing check for recovery, 3:3c38bf7a:::smithi09032091-55:head
DEBUG 2024-04-11 12:37:08,009 [shard 1:main] osd - do_recover_missing need to wait for recovery, 3:3c38bf7a:::smithi09032091-55:head
INFO  2024-04-11 12:37:08,009 [shard 1:main] osd - PerShardState::start_operation, background_recovery(id=16777501, detail=UrgentRecovery(3.c, 3:3c38bf7a:::smithi09032091-55:head, v19'16, epoch_started: 22))
INFO  2024-04-11 12:37:08,009 [shard 1:main] osd - recover_missing 3:3c38bf7a:::smithi09032091-55:head v 19'16
INFO  2024-04-11 12:37:08,009 [shard 1:main] osd - recover_missing 3:3c38bf7a:::smithi09032091-55:head v 19'16, new recovery
INFO  2024-04-11 12:37:08,009 [shard 1:main] osd - recover_missing 3:3c38bf7a:::smithi09032091-55:head v 19'16, new recovery
DEBUG 2024-04-11 12:37:08,009 [shard 1:main] osd - recover_object: 3:3c38bf7a:::smithi09032091-55:head, 19'16
DEBUG 2024-04-11 12:37:08,009 [shard 1:main] osd - maybe_pull_missing_obj: 3:3c38bf7a:::smithi09032091-55:head, 19'16
DEBUG 2024-04-11 12:37:08,009 [shard 1:main] osd - maybe_pull_missing_obj: 3:3c38bf7a:::smithi09032091-55:head, 19'16

DEBUG 2024-04-11 12:37:08,230 [shard 1:main] osd - do_osd_ops_execute: object 3:3c38bf7a:::smithi09032091-55:head - handling op read

Change from 15 to 16:

DEBUG 2024-04-11 12:36:57,442 [shard 1:main] osd -  pg_epoch 19 pg[3.c( v 19'15 (0'0,19'15] local-lis/les=16/17 n=5 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [0,3,2] r=0 lpr=16 luod=19'16 lua=0'0 crt=19'16 mlcod 19'15 active+clean  ObjectContextLoader::with_head_obc: object 3:3c38bf7a:::smithi09032091-55:head
DEBUG 2024-04-11 12:36:57,442 [shard 1:main] osd -  pg_epoch 19 pg[3.c( v 19'15 (0'0,19'15] local-lis/les=16/17 n=5 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [0,3,2] r=0 lpr=16 luod=19'16 lua=0'0 crt=19'16 mlcod 19'15 active+clean PeeringState::calc_trim_to_aggressive limit = 19'15
DEBUG 2024-04-11 12:36:57,443 [shard 1:main] osd -  pg_epoch 19 pg[3.c( v 19'15 (0'0,19'15] local-lis/les=16/17 n=5 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [0,3,2] r=0 lpr=16 luod=19'16 lua=0'0 crt=19'16 mlcod 19'15 active+clean PeeringState::append_log append_log log((0'0,19'15], crt=19'16) {19'16 (19'16) modify   3:3c38bf7a:::smithi09032091-55:head by client.4258.0:222 2024-04-11T12:36:57.165438+0000 0 ObjectCleanRegions clean_offsets: [(0, 18446744073709551615)], clean_omap: true, new_object: false}
DEBUG 2024-04-11 12:36:57,443 [shard 1:main] osd -  pg_epoch 19 pg[3.c( v 19'16 (0'0,19'16] local-lis/les=16/17 n=5 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [0,3,2] r=0 lpr=16 lua=0'0 crt=19'16 lcod 19'15 mlcod 19'15 active+clean PeeringState::add_log_entry add_log_entry 19'16 (19'16) modify   3:3c38bf7a:::smithi09032091-55:head by client.4258.0:222 2024-04-11T12:36:57.165438+0000 0 ObjectCleanRegions clean_offsets: [(0, 18446744073709551615)], clean_omap: true, new_object: false
DEBUG 2024-04-11 12:36:57,443 [shard 1:main] osd -  pg_epoch 19 pg[3.c( v 19'16 (0'0,19'16] local-lis/les=16/17 n=5 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [0,3,2] r=0 lpr=16 lua=0'0 crt=19'17 lcod 19'15 mlcod 19'15 active+clean PeeringState::append_log approx pg log length =  16
DEBUG 2024-04-11 12:36:57,443 [shard 1:main] osd -  pg_epoch 19 pg[3.c( v 19'16 (0'0,19'16] local-lis/les=16/17 n=5 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [0,3,2] r=0 lpr=16 lua=0'0 crt=19'17 lcod 19'15 mlcod 19'15 active+clean PeeringState::append_log dups pg log length =  0
DEBUG 2024-04-11 12:36:57,443 [shard 1:main] osd -  pg_epoch 19 pg[3.c( v 19'16 (0'0,19'16] local-lis/les=16/17 n=5 ec=16/16 lis/c=16/16 les/c/f=17/17/0 sis=16) [0,3,2] r=0 lpr=16 lua=0'0 crt=19'17 lcod 19'15 mlcod 19'15 active+clean PeeringState::append_log transaction_applied = 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment