Skip to content

Instantly share code, notes, and snippets.

@ideepika

ideepika/lrc.md Secret

Last active February 10, 2021 15:49
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save ideepika/16e5255932d2c4dc113f57cc9eadd6a5 to your computer and use it in GitHub Desktop.
Save ideepika/16e5255932d2c4dc113f57cc9eadd6a5 to your computer and use it in GitHub Desktop.

/ceph/teuthology-archive/teuthology-2021-02-03_03:31:04-rados-pacific-distro-basic-smithi/5852263/remote/smithi191/log/ceph-osd.11.log

first observation of state: active+recovery_unfound+undersized+degraded+remapped
2021-02-05T17:07:20.059+0000 7f05d5a29700 10 osd.11 pg_epoch: 14974 pg[3.1as0( v 854'670 (0'0,854'670] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) async=[1(1),10(2)] r=0 lpr=542 pi=[418,542)/2 crt=854'670 lcod 854'669 mlcod 463'631 active+recovery_unfound+undersized+degraded+remapped mbc={0={(1+0)=2},1={(0+0)=2},2={(0+0)=2},3={(1+0)=2},4={(0+0)=2},5={(1+0)=2}} trimq=209] handle_advance_map: 14975

no prior crt reported; but since this is the epoch where state went wrong and crt still got updated, this should confirm crt handling needs update.

op_wq happened: 2 epoch acting set changes had, 
2021-02-05T17:07:20.059+0000 7f05d5a29700 20 osd.11 op_wq(2) _process OpSchedulerItem(3ed: 14975 MNotifyRec 3.1as0 from 8(2) notify: (query:14975 sent:14975 3.1as2( v 453'629 (0'0,453'629] local-lis/les=418/419 n=10 ec=248/27 lis/c=537/418 les/c/f=538/419/0 sis=542) 2->0 ([418,541] all_participants=0(4),1(1),5(3),7(1),7(3),8(2),9(5),10(2),11(0) intervals=([456,460] acting 0(4),5(3),7(1),9(5),11(0)),([468,522] acting 1(1),5(3),9(5),10(2),11(0)))) features: 0x3f01cfb9fffdffff +create_info) prio 255 cost 10 e14975) pg 0x5590faa9d000.1as0 PGPeeringEvent(epoch_sent: 14975 epoch_request

roll_forward_to=823'668 penaltimate to the dirty crt

<pre>
./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:51.204+0000 7f9bbf1d3700 10 osd.5 854 dequeue_op 0x56294738caa0 prio 127 cost 0 latency 0.000166 MOSDECSubOpWrite(3.1as3 854/542 ECSubWrite(tid=5606, reqid=client.4548.0:9098, at_version=854'670, trim_to=0'0, roll_forward_to=823'668)) v2 pg pg[3.1as3( v 854'669 (0'0,854'669] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 crt=823'668 lcod 823'668 mlcod 823'668 active+remapped mbc={}]
./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:51.204+0000 7f9bbf1d3700 10 osd.5 pg_epoch: 854 pg[3.1as3( v 854'669 (0'0,854'669] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 crt=823'668 lcod 823'668 mlcod 823'668 active+remapped mbc={}] _handle_message: MOSDECSubOpWrite(3.1as3 854/542 ECSubWrite(tid=5606, reqid=client.4548.0:9098, at_version=854'670, trim_to=0'0, roll_forward_to=823'668)) v2
./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:51.204+0000 7f9bbf1d3700 10 osd.5 pg_epoch: 854 pg[3.1as3( v 854'669 (0'0,854'669] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 crt=823'668 lcod 823'668 mlcod 823'668 active+remapped mbc={}] append_log log((0'0,854'669], crt=823'668) [854'670 (854'669) modify   3:5a96ee3e:::smithi05121792-2:head by client.4548.0:9098 2021-02-05T04:53:51.202789+0000 0 ObjectCleanRegions clean_offsets: [0~3358720,3358720~18446744073706192895], clean_omap: 1, new_object: 0]

log((0'0,854'669] crt=823'668

./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:51.204+0000 7f9bbf1d3700 10 osd.5 pg_epoch: 854 pg[3.1as3( v 854'670 (0'0,854'670] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 lua=854'669 crt=823'668 lcod 823'668 mlcod 823'668 active+remapped mbc={}] add_log_entry 854'670 (854'669) modify   3:5a96ee3e:::smithi05121792-2:head by client.4548.0:9098 2021-02-05T04:53:51.202789+0000 0 ObjectCleanRegions clean_offsets: [0~3358720,3358720~18446744073706192895], clean_omap: 1, new_object: 0

in append_log >> add_log_entry >> didn't check transaction_applied which skips roll_forward from docs ObjectCleanRegions plays with partial recovery

add_log_entry 854'670 means we added pglog entry tells that we are not going in PGLog append_missing instead log add happens from :

void PeeringState::add_log_entry(const pg_log_entry_t& e, bool applied)
{
  // raise last_complete only if we were previously up to date
  if (info.last_complete == info.last_update)
    info.last_complete = e.version;

  // raise last_update.
  ceph_assert(e.version > info.last_update);
  info.last_update = e.version;

  // raise user_version, if it increased (it may have not get bumped
  // by all logged updates)
  if (e.user_version > info.last_user_version)
    info.last_user_version = e.user_version;

  // log mutation
  pg_log.add(e, applied);
  psdout(10) << "add_log_entry " << e << dendl;
}

we should add should_rollforward here as well. let's see what went wrong...

./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:51.204+0000 7f9bbf1d3700 10 osd.5 pg_epoch: 854 pg[3.1as3( v 854'670 (0'0,854'670] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 lua=854'669 crt=823'668 lcod 823'668 mlcod 823'668 active+remapped mbc={}] append_log approx pg log length =  670
./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:51.204+0000 7f9bbf1d3700 10 osd.5 pg_epoch: 854 pg[3.1as3( v 854'670 (0'0,854'670] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 lua=854'669 crt=823'668 lcod 823'668 mlcod 823'668 active+remapped mbc={}] append_log transaction_applied = 1
./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:51.204+0000 7f9bbf1d3700 10 osd.5 pg_epoch: 854 pg[3.1as3( v 854'670 (0'0,854'670] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 lua=854'669 crt=823'668 lcod 823'668 mlcod 823'668 active+remapped mbc={}] op_applied version 854'670 
./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:51.236+0000 7f9bbf1d3700 10 osd.5 854 dequeue_op 0x56294738d440 prio 127 cost 0 latency 0.000095 MOSDECSubOpWrite(3.1as3 854/542 ECSubWrite(tid=5607, reqid=client.4548.0:9098, at_version=0'0, trim_to=0'0, roll_forward_to=854'670)) v2 pg pg[3.1as3( v 854'670 (0'0,854'670] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 crt=823'668 mlcod 823'668 active+remapped mbc={}]

**updated: roll_forward_to=854'670 crt=823'668

./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:51.236+0000 7f9bbf1d3700 10 osd.5 pg_epoch: 854 pg[3.1as3( v 854'670 (0'0,854'670] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 crt=823'668 mlcod 823'668 active+remapped mbc={}] _handle_message: MOSDECSubOpWrite(3.1as3 854/542 ECSubWrite(tid=5607, reqid=client.4548.0:9098, at_version=0'0, trim_to=0'0, roll_forward_to=854'670)) v2
./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:51.236+0000 7f9bbf1d3700 10 osd.5 pg_epoch: 854 pg[3.1as3( v 854'670 (0'0,854'670] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 crt=823'668 mlcod 823'668 active+remapped mbc={}]  scrubber pg(3.1as3) unreg_next_scrub existing-0.000000. was registered? 0
./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:51.236+0000 7f9bbf1d3700 10 osd.5 pg_epoch: 854 pg[3.1as3( v 854'670 (0'0,854'670] local-lis/les=542/543 n=22 ec=24
8/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 crt=823'668 mlcod 823'668 active+remapped mbc={}] append_log log((0'0,854'670], crt=823'668) []
./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:51.236+0000 7f9bbf1d3700 20 osd.5 pg_epoch: 854 pg[3.1as3( v 854'670 (0'0,854'670] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 crt=854'670 mlcod 823'668 active+remapped mbc={}] rollforward: entry=854'669 (823'668) modify   3:5a96ee3e:::smithi05121792-2:head by client.4548.0:9097 2021-02-05T04:53:51.202769+0000 0 ObjectCleanRegions clean_offsets: [3358720~18446744073706192895], clean_omap: 0, new_object: 0

this point we updated crt value

./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:51.236+0000 7f9bbf1d3700 20 osd.5 pg_epoch: 854 pg[3.1as3( v 854'670 (0'0,854'670] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 crt=854'670 mlcod 823'668 active+remapped mbc={}] rollforward: entry=854'670 (854'669) modify   3:5a96ee3e:::smithi05121792-2:head by client.4548.0:9098 2021-02-05T04:53:51.202789+0000 0 ObjectCleanRegions clean_offsets: [0~3358720,3358720~18446744073706192895], clean_omap: 1, new_object: 0
./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:51.236+0000 7f9bbf1d3700 10 osd.5 pg_epoch: 854 pg[3.1as3( v 854'670 (0'0,854'670] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 crt=854'670 mlcod 823'668 active+remapped mbc={}] append_log approx pg log length =  670
./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:51.236+0000 7f9bbf1d3700 10 osd.5 pg_epoch: 854 pg[3.1as3( v 854'670 (0'0,854'670] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 crt=854'670 mlcod 823'668 active+remapped mbc={}] append_log transaction_applied = 1
./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:52.532+0000 7f9bbf1d3700 10 osd.5 pg_epoch: 854 pg[3.1as3( v 854'670 (0'0,854'670] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 crt=854'670 mlcod 854'670 active+remapped mbc={}] handle_advance_map: 855
./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:52.532+0000 7f9bbf1d3700 10 osd.5 pg_epoch: 854 pg[3.1as3( v 854'670 (0'0,854'670] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 crt=854'670 mlcod 854'670 active+remapped mbc={}] handle_advance_map [11,1,10,5,0,9]/[11,2147483647,2147483647,5,0,9] -- 11/11
./remote/smithi134/log/ceph-osd.5.log.gz:2021-02-05T04:53:52.532+0000 7f9bbf1d3700 10 osd.5 pg_epoch: 855 pg[3.1as3( v 854'670 (0'0,854'670] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) r=3 lpr=542 pi=[418,542)/2 luod=0'0 crt=854'670 mlcod 854'670 active+remapped mbc={}
@ideepika
Copy link
Author

ideepika commented Feb 9, 2021

see teuthology: /home/ideepika/crt.log

./remote/smithi191/log/ceph-osd.11.log.gz:2021-02-05T04:53:51.207+0000 7f05d5a29700 10 osd.11 pg_epoch: 854 pg[3.1as0( v 854'669 (0'0,854'669] local-lis/les=542/543 n=22 ec=248/27 lis/c=542/418 les/c/f=543/419/0 sis=542) [11,1,10,5,0,9]/[11,NONE,NONE,5,0,9]p11(0) async=[1(1),10(2)] r=0 lpr=542 pi=[418,542)/2 luod=823'668 crt=823'668 lcod 811'666 mlcod 463'631 active+recovery_unfound+undersized+degraded+remapped mbc={0={(1+0)=2},1={(0+0)=2},2={(0+0)=2},3={(1+0)=2},4={(0+0)=2},5={(1+0)=2}} trimq=193] execute_ctx 3:5a96ee3e:::smithi05121792-2:head [append 0~0] ov 854'669 av 854'670 snapc 304=[304,2ff,2fc,2fb,2fa,2f6,2f2,2f0,2ef,2ee,2eb,2e9,2e8,2e7,2e3,2e1,2e0,2dc,2d8,2d4,2d1,2cb,2c7,2c1,2bd,2bb,2b2,2af,2ab,2a0,298,290,255,216,1f0,183] snapset 304=[]:{125=[116,115,10f,10b,106,f6],142=[13f,13b,133,12d],154=[153,151,150],158=[158,157],16c=[16b,166,165,161],189=[189,188,187,186,185,183,182,180,17d,17c,17b,17a,178,174,173,172,171,170],21c=[21c,21a,218,216,210,20a,203,200,1fe,1f8,1f1,1f0,1ea,1dc,1ca,1c0,1bf,1bc,1b8,1b6,1b0,1ae,1a8,1a5,19d,196],230=[230,22d,22c,22b,22a,226,224,221,21f,21e],23e=[23e,23c,23b,239,237,236,235,234,232,231],255=[255,254,251,24f,24d],263=[263,262,261,25f,25e,25d,25a,257],278=[278,275,274,273,270,269,268,267,266],2a7=[2a7,2a5,2a3,2a2,2a0,29f,29e,29c,29b,298,295,290,28d,28a,288,287,282,27e,27c],2bb=[2bb,2b8,2b7],2bd=[2bd],2e3=[2e3,2e1,2e0,2dc]}

lowest epoch where the state was registered

@ideepika
Copy link
Author

@ideepika
Copy link
Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment