ceph_test_rados --balance-reads --max-ops 400000 --objects 1024 --max-in-flight 64 --size 4000000 --min-stride-size 400000 --max-stride-size 800000 --max-seconds 600 --op read 100 --op write 50 --op delete 50 --op snap_create 50 --op snap_remove 0 --op rollback 0 --op setattr 25 --op rmattr 25 --op copy_from 0 --op write_excl 50 --pool unique_pool_0
update_object_version oid 785 v 593 (ObjNum 2048 snap 305 seq_num 2048) dirty exists
3327: left oid 785 (ObjNum 2048 snap 305 seq_num 2048)
3330: finishing write tid 2 to folio031017823-831
3329: oid 103 version is 35 and expected 381
../src/test/osd/RadosModel.h: In function 'virtual void ReadOp::_finish(TestOp::CallbackInfo*)' thread 7efd96ad4700 time 2023-02-13T15:26:02.548504+0000
../src/test/osd/RadosModel.h: 1544: FAILED ceph_assert(version == old_value.version)
ceph version 18.0.0-2159-g0da3dc0d192 (0da3dc0d19225d3efce468dbb73110213e2bbad5) reef (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x11c) [0x7efda4764deb]
2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x7efda4764ff2]
3: (ReadOp::_finish(TestOp::CallbackInfo*)+0x426) [0x440f0e]
4: (TestOp::finish(TestOp::CallbackInfo*)+0xa) [0x457052]
5: (read_callback(void*, void*)+0x11) [0x457068]
103: write initial oid 103
103: seq_num 102 ranges {514314=787987,1768928=405354,2768204=509510}
103: writing folio031017823-103 from 514314 to 1302301 tid 1
103: writing folio031017823-103 from 1768928 to 2174282 tid 2
103: writing folio031017823-103 from 2768204 to 3277714 tid 3
103: finishing write tid 1 to folio031017823-103
103: finishing write tid 2 to folio031017823-103
103: finishing write tid 3 to folio031017823-103
103: finishing write tid 4 to folio031017823-103
103: finishing write tid 5 to folio031017823-103
103: oid 103 updating version 0 to 35
103: oid 103 version 35 is already newer than 35
103: oid 103 version 35 is already newer than 35
103: oid 103 version 35 is already newer than 32
update_object_version oid 103 v 35 (ObjNum 102 snap 0 seq_num 102) dirty exists
103: left oid 103 (ObjNum 102 snap 0 seq_num 102)
1525: read oid 103 snap 4
1525: expect (ObjNum 102 snap 0 seq_num 102)
2532: setattr oid 103 current snap is 202
update_object_version oid 103 v 381 (ObjNum 102 snap 0 seq_num 102) dirty exists
3329: read oid 103 snap -1
3329: expect (ObjNum 102 snap 0 seq_num 102)
3329: oid 103 version is 35 and expected 381
2:abb5cdf3:::folio031017823-103:head
is still referenced and not being removed.
DEBUG 2023-02-13 15:24:13,326 [shard 0] osd - handle_rep_op: osd_repop(client.4151.0:8507 2.5 e221/15 2:abb5cdf3:::folio031017823-103:head v 221'382, mlcod=221'382) v3
DEBUG 2023-02-13 15:24:13,326 [shard 0] osd - log_operation
DEBUG 2023-02-13 15:24:13,326 [shard 0] osd - replica_clear_repop_obc clearing 2 entries
DEBUG 2023-02-13 15:24:13,326 [shard 0] osd - 2:abb5cdf3:::folio031017823-103:head get_object_boundary(from): 2:abb5cdf3:::folio031017823-103:0 head version(to): 2:abb5cdf3:::folio031017823-103:head
DEBUG 2023-02-13 15:24:13,326 [shard 0] osd - removed: 0. skipped (aka refrenced): 1
- Later on, we have a cache hit on 2:abb5cdf3:::folio031017823-103:head serving the (failing) sparse-read:
DEBUG 2023-02-13 15:26:02,414 [shard 0] osd - client_request(id=11366, detail=m=[osd_op(client.4151.0:10575 2.5 2:abb5cdf3:::folio031017823-103:head {sparse-read 0~3714314, omap-get-vals-by keys in=371b, omap-get-keys in=12b, omap-get-vals in=16b, omap-get-header, getxattrs} snapc 0={} ondisk+read+balance_reads+known_if_redirected+supports_pool_eio e324) v8]): in get_obc stage
TRACE 2023-02-13 15:26:02,414 [shard 0] osd - pg_epoch 324 pg[2.5( v 318'455 (0'0,318'455] local-lis/les=15/16 n=0 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=15) [3,0,1] r=2 lpr=15 luod=0'0 ua=0'0 crt=318'455 mlcod 318'455 active ObjectContextLoader::with_head_obc: object 2:abb5cdf3:::folio031017823-103:head
TRACE 2023-02-13 15:26:02,414 [shard 0] osd - pg_epoch 324 pg[2.5( v 318'455 (0'0,318'455] local-lis/les=15/16 n=0 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=15) [3,0,1] r=2 lpr=15 luod=0'0 ua=0'0 crt=318'455 mlcod 318'455 active ObjectContextLoader::get_or_load_obc: cache hit on 2:abb5cdf3:::folio031017823-103:head
DEBUG 2023-02-13 15:26:02,417 [shard 0] osd - do_process: serving replica read on oid 2:abb5cdf3:::folio031017823-103:head
DEBUG 2023-02-13 15:26:02,417 [shard 0] osd - do_osd_ops_execute: object 2:abb5cdf3:::folio031017823-103:head - handling op sparse-read
DEBUG 2023-02-13 15:26:02,417 [shard 0] osd - handling op sparse-read on object 2:abb5cdf3:::folio031017823-103:head
When osd.1 should have removed the obc entry in replica_clear_repop_obc, it was still referenced in osd.3 (at 15:24:13,307):
TRACE 2023-02-13 15:24:13,307 [shard 0] osd - pg_epoch 221 pg[2.5( v 217'380 (0'0,217'380] local-lis/les=15/16 n=106 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=15) [3,0,1] r=0 lpr=15 lua=0'0 crt=217'380 mlcod 217'380 active+clean ObjectContextLoader::with_head_obc: object 2:abb5cdf3:::folio031017823-103:head
It was only released at 15:24:13,356:
TRACE 2023-02-13 15:24:13,356 [shard 0] osd - pg_epoch 221 pg[2.5( v 221'382 (0'0,221'382] local-lis/les=15/16 n=107 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=15) [3,0,1] r=0 lpr=15 lua=0'0 crt=221'382 mlcod 221'382 active+clean ObjectContextLoader::with_head_obc: released object 2:abb5cdf3:::folio031017823-103:head
osd.1's attempt to remove it was at 15:24:13,326 and was therefore skipped!
(New run)
Both osd.1 and osd.3 are in get_obc on the same pg (2.a) and the same object (2:5954f7d4:::folio031142590-146:head)?
osd.1:
osd.3: