Analysis of dead jobs in rzarzynski_bug43903_more_pgnum_changes_osdmapfix_reverted that were scheduled on 2020-02-11.

teuthology labelled these jobs as dead even though ceph-osd coredumps were generated on the test nodes.

rzarzynski@teuthology:~$ ssh smithi172.front.sepia.ceph.com
Warning: Permanently added 'smithi172.front.sepia.ceph.com,172.21.15.172' (ECDSA) to the list of known hosts.
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Wed Feb 12 07:42:48 2020 from 172.21.0.51
[rzarzynski@smithi172 ~]$ sudo ls -al /home/ubuntu/cephtest/archive/coredump/
total 461728
drwxr-xr-x. 2 ubuntu ubuntu       4096 Feb 11 22:26 .
drwxr-xr-x. 4 ubuntu ubuntu       4096 Feb 11 22:07 ..
-rw-------. 1 root   root   1469583360 Feb 11 22:26 1581459975.48882.core
[rzarzynski@smithi172 ~]$ sudo gdb /usr/bin/ceph-osd /home/ubuntu/cephtest/archive/coredump/1581459975.48882.core
...
Core was generated by `ceph-osd -f --cluster ceph -i 2'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fc5e3d84c5f in raise () from /lib64/libpthread.so.0
[Current thread is 1 (Thread 0x7fc5c3574700 (LWP 49187))]
Missing separate debuginfos, use: yum debuginfo-install ceph-osd-15.0.0-10073.g28eebb0.el8.x86_64
(gdb) bt
#0  0x00007fc5e3d84c5f in raise () from /lib64/libpthread.so.0
#1  0x0000563221529153 in handle_fatal_signal(int) ()
#2  <signal handler called>
#3  0x0000563239c1fb00 in ?? ()
#4  0x0000563220f79b27 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() ()
#5  0x0000563220ef38aa in OSD::create_context() ()
#6  0x0000563220f59571 in OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) ()
#7  0x000056322118adb6 in ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&) ()
#8  0x0000563220f4c62f in OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) ()
#9  0x000056322157a094 in ShardedThreadPool::shardedthreadpool_worker(unsigned int) ()
#10 0x000056322157ccf4 in ShardedThreadPool::WorkThreadSharded::entry() ()
#11 0x00007fc5e3d7a2de in start_thread () from /lib64/libpthread.so.0
#12 0x00007fc5e2b24133 in clone () from /lib64/libc.so.6
(gdb) 
rzarzynski@teuthology:~$ ssh smithi145.front.sepia.ceph.com
Warning: Permanently added 'smithi145.front.sepia.ceph.com,172.21.15.145' (ECDSA) to the list of known hosts.
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Wed Feb 12 07:51:59 2020 from 172.21.0.51
[rzarzynski@smithi145 ~]$ sudo ls -al /home/ubuntu/cephtest/archive/coredump/
total 133448
drwxr-xr-x. 2 ubuntu ubuntu      4096 Feb 11 23:59 .
drwxr-xr-x. 4 ubuntu ubuntu      4096 Feb 11 23:00 ..
-rw-------. 1 root   root   614842368 Feb 11 23:59 1581465560.267884.core
[rzarzynski@smithi145 ~]$ sudo gdb /usr/bin/ceph-osd /home/ubuntu/cephtest/archive/coredump/1581465560.267884.core
...
Core was generated by `ceph-osd -f --cluster ceph -i 3'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f9fad26ac5f in raise () from /lib64/libpthread.so.0
[Current thread is 1 (Thread 0x7f9faf2a7ec0 (LWP 267884))]
Missing separate debuginfos, use: yum debuginfo-install ceph-osd-15.0.0-10073.g28eebb0.el8.x86_64
(gdb) bt
#0  0x00007f9fad26ac5f in raise () from /lib64/libpthread.so.0
#1  0x0000557714a8f153 in handle_fatal_signal(int) ()
#2  <signal handler called>
#3  0x00007f9fabf458df in raise () from /lib64/libc.so.6
#4  0x00007f9fabf2fcf5 in abort () from /lib64/libc.so.6
#5  0x00005577143b1013 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) ()
#6  0x00005577143b11dc in ceph::__ceph_assert_fail(ceph::assert_data const&) ()
#7  0x0000557714563a06 in void PGLog::read_log_and_missing<pg_missing_set<true> >(ObjectStore*, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t, pg_info_t const&, PGLog::IndexedLog&, pg_missing_set<true>&, std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, bool, bool*, DoutPrefixProvider const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, bool) ()
#8  0x000055771454ad36 in PG::read_state(ObjectStore*) ()
#9  0x00005577144976f5 in OSD::load_pgs() ()
#10 0x00005577144c2fe7 in OSD::init() ()
#11 0x0000557714413856 in main ()
(gdb) 
rzarzynski@teuthology:~$ ssh smithi136.front.sepia.ceph.com
Warning: Permanently added 'smithi136.front.sepia.ceph.com,172.21.15.136' (ECDSA) to the list of known hosts.
Activate the web console with: systemctl enable --now cockpit.socket

[rzarzynski@smithi136 ~]$ sudo ls -al /home/ubuntu/cephtest/archive/coredump/
total 330868
drwxr-xr-x. 2 ubuntu ubuntu       4096 Feb 11 23:21 .
drwxr-xr-x. 4 ubuntu ubuntu       4096 Feb 11 22:59 ..
-rw-------. 1 root   root   1402257408 Feb 11 23:21 1581463267.57697.core
[rzarzynski@smithi136 ~]$ sudo gdb /usr/bin/ceph-osd /home/ubuntu/cephtest/archive/coredump/1581463267.57697.core
...
Core was generated by `ceph-osd -f --cluster ceph -i 0'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005605dea1a841 in PG::put(char const*) ()
[Current thread is 1 (Thread 0x7f880fa34700 (LWP 57981))]
Missing separate debuginfos, use: yum debuginfo-install ceph-osd-15.0.0-10073.g28eebb0.el8.x86_64
(gdb) bt
#0  0x00005605dea1a841 in PG::put(char const*) ()
#1  0x00005605de9cb068 in std::vector<boost::intrusive_ptr<PG>, std::allocator<boost::intrusive_ptr<PG> > >::~vector() ()
#2  0x00005605de9ac60c in OSD::consume_map() ()
#3  0x00005605de9b1a3c in OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) ()
#4  0x00005605dea052cb in C_OnMapCommit::finish(int) ()
#5  0x00005605de9ba06d in Context::complete(int) ()
#6  0x00005605def97f15 in Finisher::finisher_thread_entry() ()
#7  0x00007f8827a292de in start_thread () from /lib64/libpthread.so.0
#8  0x00007f88267d3133 in clone () from /lib64/libc.so.6
(gdb) 
rzarzynski@teuthology:~$ ssh smithi202.front.sepia.ceph.com
Warning: Permanently added 'smithi202.front.sepia.ceph.com,172.21.15.202' (ECDSA) to the list of known hosts.
Activate the web console with: systemctl enable --now cockpit.socket

[rzarzynski@smithi202 ~]$ sudo ls -al /home/ubuntu/cephtest/archive/coredump/
total 1158624
drwxr-xr-x. 2 ubuntu ubuntu       4096 Feb 12 00:00 .
drwxr-xr-x. 4 ubuntu ubuntu       4096 Feb 11 23:05 ..
-rw-------. 1 root   root   2270064640 Feb 12 00:00 1581465605.160450.core
[rzarzynski@smithi202 ~]$ sudo gdb /usr/bin/ceph-osd /home/ubuntu/cephtest/archive/coredump/1581465605.160450.core
...
Core was generated by `ceph-osd -f --cluster ceph -i 3'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000555f8570d7e0 in ?? ()
[Current thread is 1 (Thread 0x7f77061d0700 (LWP 160760))]
Missing separate debuginfos, use: yum debuginfo-install ceph-osd-15.0.0-10073.g28eebb0.el8.x86_64
(gdb) bt
#0  0x0000555f8570d7e0 in ?? ()
#1  0x0000555f5b65bb27 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() ()
#2  0x0000555f5b5d58aa in OSD::create_context() ()
#3  0x0000555f5b63b571 in OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) ()
#4  0x0000555f5b86cdb6 in ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&) ()
#5  0x0000555f5b62e62f in OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) ()
#6  0x0000555f5bc5c094 in ShardedThreadPool::shardedthreadpool_worker(unsigned int) ()
#7  0x0000555f5bc5ecf4 in ShardedThreadPool::WorkThreadSharded::entry() ()
#8  0x00007f77261d52de in start_thread () from /lib64/libpthread.so.0
#9  0x00007f7724f7f133 in clone () from /lib64/libc.so.6
(gdb) 
rzarzynski@teuthology:~$ ssh smithi068.front.sepia.ceph.com
Warning: Permanently added 'smithi068.front.sepia.ceph.com,172.21.15.68' (ECDSA) to the list of known hosts.
Activate the web console with: systemctl enable --now cockpit.socket

[rzarzynski@smithi068 ~]$ sudo ls -al /home/ubuntu/cephtest/archive/coredump/
total 348492
drwxr-xr-x. 2 ubuntu ubuntu       4096 Feb 11 23:41 .
drwxr-xr-x. 4 ubuntu ubuntu       4096 Feb 11 23:12 ..
-rw-------. 1 root   root   1557655552 Feb 11 23:41 1581464505.102930.core
[rzarzynski@smithi068 ~]$ sudo gdb /usr/bin/ceph-osd /home/ubuntu/cephtest/archive/coredump/1581464505.102930.core
...
Core was generated by `ceph-osd -f --cluster ceph -i 0'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f25888e1c5f in raise () from /lib64/libpthread.so.0
[Current thread is 1 (Thread 0x7f25650cb700 (LWP 103298))]
Missing separate debuginfos, use: yum debuginfo-install ceph-osd-15.0.0-10073.g28eebb0.el8.x86_64
(gdb) bt
#0  0x00007f25888e1c5f in raise () from /lib64/libpthread.so.0
#1  0x000056292dc20153 in handle_fatal_signal(int) ()
#2  <signal handler called>
#3  0x0000562958da5620 in ?? ()
#4  0x000056292d670b27 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() ()
#5  0x000056292d5ea8aa in OSD::create_context() ()
#6  0x000056292d650571 in OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) ()
#7  0x000056292d881db6 in ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&) ()
#8  0x000056292d64362f in OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) ()
#9  0x000056292dc71094 in ShardedThreadPool::shardedthreadpool_worker(unsigned int) ()
#10 0x000056292dc73cf4 in ShardedThreadPool::WorkThreadSharded::entry() ()
#11 0x00007f25888d72de in start_thread () from /lib64/libpthread.so.0
#12 0x00007f2587681133 in clone () from /lib64/libc.so.6
(gdb) 
rzarzynski@teuthology:~$ ssh smithi112.front.sepia.ceph.com
Warning: Permanently added 'smithi112.front.sepia.ceph.com,172.21.15.112' (ECDSA) to the list of known hosts.
Activate the web console with: systemctl enable --now cockpit.socket

[rzarzynski@smithi112 ~]$ sudo ls -al /home/ubuntu/cephtest/archive/coredump/
total 311536
drwxr-xr-x. 2 ubuntu ubuntu       4096 Feb 12 00:02 .
drwxr-xr-x. 4 ubuntu ubuntu       4096 Feb 11 23:21 ..
-rw-------. 1 root   root   1350377472 Feb 12 00:02 1581465760.165783.core
[rzarzynski@smithi112 ~]$ sudo gdb /usr/bin/ceph-osd /home/ubuntu/cephtest/archive/coredump/1581465760.165783.core
...
Core was generated by `ceph-osd -f --cluster ceph -i 3'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fefeb830733 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () from /lib64/libtcmalloc.so.4
[Current thread is 1 (Thread 0x7fefd6d07700 (LWP 166048))]
Missing separate debuginfos, use: yum debuginfo-install ceph-osd-15.0.0-10073.g28eebb0.el8.x86_64
(gdb) bt
#0  0x00007fefeb830733 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () from /lib64/libtcmalloc.so.4
#1  0x00007fefeb830ac0 in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned int) () from /lib64/libtcmalloc.so.4
#2  0x00005615e9eda8d6 in ceph::BackTrace::print(std::ostream&) const ()
#3  0x00005615e9ecf0b3 in handle_fatal_signal(int) ()
#4  <signal handler called>
#5  0x00007fefeb830733 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () from /lib64/libtcmalloc.so.4
#6  0x00007fefeb830ac0 in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned int) () from /lib64/libtcmalloc.so.4
#7  0x00005615ea19d9b4 in int& std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> >::emplace_back<int>(int&&) ()
#8  0x00005615ea19ff89 in std::enable_if<(!denc_traits<pg_t, void>::supported)||(!denc_traits<std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> >, void>::supported), void>::type ceph::decode<pg_t, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> >, std::less<pg_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > > >, denc_traits<pg_t, void>, denc_traits<std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> >, void> >(std::map<pg_t, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> >, std::less<pg_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<pg_t const, std::vector<int, mempool::pool_allocator<(mempool::pool_index_t)15, int> > > > >&, ceph::buffer::v14_2_0::list::iterator_impl<true>&) ()
#9  0x00005615ea17507a in OSDMap::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&) ()
#10 0x00005615ea177e65 in OSDMap::decode(ceph::buffer::v14_2_0::list&) ()
#11 0x00005615e98ac013 in OSDService::try_get_map(unsigned int) ()
#12 0x00005615e99053be in OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) ()
#13 0x00005615e99592cb in C_OnMapCommit::finish(int) ()
#14 0x00005615e990e06d in Context::complete(int) ()
#15 0x00005615e9eebf15 in Finisher::finisher_thread_entry() ()
#16 0x00007fefeacf42de in start_thread () from /lib64/libpthread.so.0
#17 0x00007fefe9a9e133 in clone () from /lib64/libc.so.6
(gdb) 
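
Three of the backtraces above (smithi172, smithi202, smithi068) go through the same frames: std::_Sp_counted_base::_M_release() called from OSD::create_context() in OSD::dequeue_peering_evt(). A minimal sketch of how such pasted `bt` output could be compared automatically follows. The helper names `frame_symbols` and `crash_signature` are hypothetical and are not part of gdb, teuthology, or Ceph; the parsing assumes the symbol-only frame format shown above (no debuginfo installed).

```python
import re

# Matches a gdb frame line such as
#   "#5  0x0000563220ef38aa in OSD::create_context() ()"
# and frames without an address, e.g. "#2  <signal handler called>".
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?(.*)$")

SIGNAL_MARKER = "<signal handler called>"


def frame_symbols(backtrace):
    """Return the symbol of every frame in a pasted gdb backtrace."""
    symbols = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if not match:
            continue  # not a frame line (prompt, blank line, ...)
        symbol = match.group(1)
        # Drop the " from /lib64/..." shared-library suffix.
        symbol = re.sub(r"\s+from \S+$", "", symbol)
        # Drop the trailing "()" gdb prints when argument values are unknown.
        symbol = re.sub(r"\s*\(\)$", "", symbol)
        symbols.append(symbol)
    return symbols


def crash_signature(backtrace, depth=3):
    """Top `depth` resolved frames below the signal handler, if any."""
    symbols = frame_symbols(backtrace)
    if SIGNAL_MARKER in symbols:
        # Frames above the marker belong to the signal handler itself.
        symbols = symbols[symbols.index(SIGNAL_MARKER) + 1:]
    # Unresolved frames ("??") carry no information for comparison.
    symbols = [s for s in symbols if s != "??"]
    return tuple(symbols[:depth])
```

Two cores whose `crash_signature()` values are equal crashed along the same call path at the chosen depth. That is only a hint that they share a root cause, not proof.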