@rzarzynski · Last active May 19, 2022
auth: the key rotation failure

It looks like the rotation of keys works fine for all OSDs; it fails at the first mgr instance (mgr.y).

2022-04-02T15:34:17.008 INFO:teuthology.orchestra.run.smithi146.stderr:++ ceph auth get-key osd.7
2022-04-02T15:34:17.365 INFO:teuthology.orchestra.run.smithi146.stdout:rotating key for mgr.y
2022-04-02T15:34:17.366 INFO:teuthology.orchestra.run.smithi146.stderr:+ NK=AQDjbEhir5HtORAAHbC3qXbxxwbs1GwYyInm2g==
2022-04-02T15:34:17.366 INFO:teuthology.orchestra.run.smithi146.stderr:+ '[' AQD1akhi07GlGhAAibYZDWhJKuspZgRDt//5vg== == AQDjbEhir5HtORAAHbC3qXbxxwbs1GwYyInm2g== ']'
2022-04-02T15:34:17.366 INFO:teuthology.orchestra.run.smithi146.stderr:+ for f in osd.0 osd.1 osd.2 osd.3 osd.4 osd.5 osd.6 osd.7 mgr.y mgr.x
2022-04-02T15:34:17.367 INFO:teuthology.orchestra.run.smithi146.stderr:+ echo 'rotating key for mgr.y'
2022-04-02T15:34:17.367 INFO:teuthology.orchestra.run.smithi146.stderr:++ ceph auth get-key mgr.y
2022-04-02T15:34:17.715 INFO:teuthology.orchestra.run.smithi146.stderr:+ K=AQD4aUhiCSlkCxAANVQf8PPD9eg8Cri81MR4eA==
2022-04-02T15:34:17.715 INFO:teuthology.orchestra.run.smithi146.stderr:+ NK=AQD4aUhiCSlkCxAANVQf8PPD9eg8Cri81MR4eA==
2022-04-02T15:34:17.716 INFO:teuthology.orchestra.run.smithi146.stderr:+ ceph orch daemon rotate-key mgr.y

The test got stuck on mgr.y for many hours; the loop polls ceph auth get-key every 5 seconds and never sees the key change (see the sketch below the trace):

2022-04-02T21:32:49.555 INFO:teuthology.orchestra.run.smithi146.stderr:++ ceph auth get-key mgr.y
2022-04-02T21:32:49.910 INFO:teuthology.orchestra.run.smithi146.stderr:+ NK=AQD4aUhiCSlkCxAANVQf8PPD9eg8Cri81MR4eA==
2022-04-02T21:32:49.910 INFO:teuthology.orchestra.run.smithi146.stderr:+ '[' AQD4aUhiCSlkCxAANVQf8PPD9eg8Cri81MR4eA== == AQD4aUhiCSlkCxAANVQf8PPD9eg8Cri81MR4eA== ']'
2022-04-02T21:32:49.910 INFO:teuthology.orchestra.run.smithi146.stderr:+ sleep 5
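Piecing the xtrace output together, the test loop's logic boils down to the following (a sketch in Python for illustration; the real test is the bash loop from rotate-keys.yaml, partially quoted in the diff below):

import subprocess
import time

def get_key(entity: str) -> str:
    # equivalent of: ceph auth get-key $f
    return subprocess.check_output(['ceph', 'auth', 'get-key', entity]).decode()

for f in ['osd.0', 'osd.1', 'osd.2', 'osd.3', 'osd.4',
          'osd.5', 'osd.6', 'osd.7', 'mgr.y', 'mgr.x']:
    print(f'rotating key for {f}')
    old_key = get_key(f)
    new_key = old_key
    subprocess.check_call(['ceph', 'orch', 'daemon', 'rotate-key', f])
    while new_key == old_key:        # for mgr.y this never became false
        time.sleep(5)
        new_key = get_key(f)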

The issue left a trace in the manager's log:

2022-04-02T15:34:18.153+0000 7f7666b9b700  0 [cephadm ERROR root] Non-zero return from ['ceph', '-k', '/var/lib/ceph/mgr/ceph-y/keyring', '-n', 'mgr.y', 'tell', 'mgr.y', 'rotate-key', '-i', '-']: 2022-04-02T15:34:18.146+0000 7fdf9f7fe700  1 -- 172.21.15.146:0/952779436 <== mon.1 v2:172.21.15.146:3301/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 194+0+0 (secure 0 0 0) 0x7fdf900037b0 con 0x7fdfa0105fc0

Making the test green

This is just for the sake of verifying that the issue is about rotations at ceph-mgr instances only:

--- a/qa/suites/orch/cephadm/with-work/tasks/rotate-keys.yaml
+++ b/qa/suites/orch/cephadm/with-work/tasks/rotate-keys.yaml
@@ -3,7 +3,7 @@ tasks:
     mon.a:
     - |
       set -ex
-      for f in osd.0 osd.1 osd.2 osd.3 osd.4 osd.5 osd.6 osd.7 mgr.y mgr.x
+      for f in osd.0 osd.1 osd.2 osd.3 osd.4 osd.5 osd.6 osd.7
       do
           echo "rotating key for $f"
           K=$(ceph auth get-key $f)

Root cause: a surprise at _daemon_action(daemon_spec, 'restart')?

The first visible trace of the failure is that mgr.y refuses a ceph tell CLI client, spawned by its own cephadm module, due to a lack of credentials to contact the manager.

rzarzynski@teuthology:/a/yuriw-2022-04-02_14:54:16-orch-wip-yuri11-testing-2022-04-01-0842-distro-default-smithi/6773812$ less ./remote/smithi146/log/71f5957e-b298-11ec-8c36-001a4aab830c/ceph-mgr.y.log.gz
...
2022-04-02T15:34:18.037+0000 7f7666b9b700  0 [cephadm INFO root] Rotating authentication key for mgr.y
2022-04-02T15:34:18.037+0000 7f7666b9b700  0 log_channel(cephadm) log [INF] : Rotating authentication key for mgr.y
...
2022-04-02T15:34:18.044+0000 7f7666b9b700  0 [cephadm DEBUG root] exec: ceph -k /var/lib/ceph/mgr/ceph-y/keyring -n mgr.y tell mgr.y rotate-key -i -
2022-04-02T15:34:18.074+0000 7f76d0b11700  1 -- 172.21.15.146:0/958567540 --> [v2:172.21.15.146:3301/0,v1:172.21.15.146:6790/0] -- log(3 entries from seq 332 at 2022-04-02T15:34:17.995313+0000) v1 -- 0x55646b2b3180 con 0x5564617a9400
...
2022-04-02T15:34:18.150+0000 7f76d631c700 10 mgr.server ms_handle_authentication ms_handle_authentication new session 0x55646b2eaa20 con 0x55646b97b000 entity mgr.y addr 
2022-04-02T15:34:18.151+0000 7f76d631c700  1 --2- v2:172.21.15.146:6800/907340359 >> 172.21.15.146:0/4073159835 conn(0x55646b97b000 0x55646b2f6c00 secure :-1 s=READY pgs=4 cs=0 l=1 rev1=1 crypto rx=0x55646b02890
0 tx=0x55646a7bbbc0 comp rx=0 tx=0).ready entity=client.15396 client_cookie=0 server_cookie=0 in_seq=0 out_seq=0
2022-04-02T15:34:18.151+0000 7f769b60a700  1 -- v2:172.21.15.146:6800/907340359 <== client.15396 172.21.15.146:0/4073159835 1 ==== command(tid 0: {"prefix": "get_command_descriptions"}) v1 ==== 62+0+0 (secure 0
0 0) 0x55646af709a0 con 0x55646b97b000
2022-04-02T15:34:18.151+0000 7f769b60a700 10 mgr.server _handle_command decoded-size=1 prefix=get_command_descriptions
2022-04-02T15:34:18.151+0000 7f769b60a700  1 mgr.server log_access_denied  access denied
2022-04-02T15:34:18.151+0000 7f769b60a700  0 log_channel(audit) log [INF] : from='client.15396 -' entity='mgr.y' cmd=[{"prefix": "get_command_descriptions"}]:  access denied
2022-04-02T15:34:18.151+0000 7f769b60a700 -1 mgr.server reply reply (13) Permission denied access denied: does your client key have mgr caps? See http://docs.ceph.com/en/latest/mgr/administrator/#client-authentication
...
2022-04-02T15:34:18.153+0000 7f7666b9b700  0 [cephadm ERROR root] Non-zero return from ['ceph', '-k', '/var/lib/ceph/mgr/ceph-y/keyring', '-n', 'mgr.y', 'tell', 'mgr.y', 'rotate-key', '-i', '-']: 2022-04-02T15:34:18.146+0000 7fdf9f7fe700  1 -- 172.21.15.146:0/952779436 <== mon.1 v2:172.21.15.146:3301/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 194+0+0 (secure 0 0 0) 0x7fdf900037b0 con 0x7fdfa0105fc0

This particular client (ceph tell rotate-key) is called from CephadmOrchestrator._rotate_daemon_key():

    def _rotate_daemon_key(self, daemon_spec: CephadmDaemonDeploySpec) -> str:
        self.log.info(f'Rotating authentication key for {daemon_spec.name()}')
        rc, out, err = self.mon_command({
            'prefix': 'auth get-or-create-pending',
            'entity': daemon_spec.entity_name(),
            'format': 'json',
        })
        j = json.loads(out)
        pending_key = j[0]['pending_key']

        # deploy a new keyring file
        if daemon_spec.daemon_type != 'osd':
            daemon_spec = self.cephadm_services[daemon_type_to_service(
                daemon_spec.daemon_type)].prepare_create(daemon_spec)
        CephadmServe(self)._create_daemon(daemon_spec, reconfig=True)

        # try to be clever, or fall back to restarting the daemon
        rc = -1
        if daemon_spec.daemon_type == 'osd':
            rc, out, err = self.tool_exec(
                args=['ceph', 'tell', daemon_spec.name(), 'rotate-stored-key', '-i', '-'],
                stdin=pending_key.encode()
            )
            if not rc:
                rc, out, err = self.tool_exec(
                    args=['ceph', 'tell', daemon_spec.name(), 'rotate-key', '-i', '-'],
                    stdin=pending_key.encode()
                )
        elif daemon_spec.daemon_type == 'mds':
            rc, out, err = self.tool_exec(
                args=['ceph', 'tell', daemon_spec.name(), 'rotate-key', '-i', '-'],
                stdin=pending_key.encode()
            )
        elif (
                daemon_spec.daemon_type == 'mgr'
                and daemon_spec.daemon_id == self.get_mgr_id()
        ):
            rc, out, err = self.tool_exec(
                args=['ceph', 'tell', daemon_spec.name(), 'rotate-key', '-i', '-'],
                stdin=pending_key.encode()
            )
        if rc:
            self._daemon_action(daemon_spec, 'restart')

        return f'Rotated key for {daemon_spec.name()}'

There are two interesting paths here:

  1. the daemon_spec.daemon_id == self.get_mgr_id() branch, which fails because of the caps issue;
  2. the daemon restart, which finally leads to the mgr fail mon command; it's the fallback for #1.

mon.a received the mgr fail from mgr.y and promoted the standby (mgr.x) to active:

rzarzynski@teuthology:/a/yuriw-2022-04-02_14:54:16-orch-wip-yuri11-testing-2022-04-01-0842-distro-default-smithi/6773812$ less ./remote/smithi146/log/71f5957e-b298-11ec-8c36-001a4aab830c/ceph-mon.a.log.gz
...
2022-04-02T15:34:19.049+0000 7f70841e2700  2 mon.a@0(leader) e3 send_reply 0x55bddced7e00 0x55bddc92fa00 mon_command_ack([{"prefix": "mgr fail", "who": "y"}]=0  v21) v1
2022-04-02T15:34:19.049+0000 7f70841e2700 15 mon.a@0(leader) e3 send_reply routing reply to 172.21.15.146:0/958567540 via v2:172.21.15.146:3301/0 for request mon_command({"prefix": "mgr fail", "who": "y"} v 0) v1
2022-04-02T15:34:19.049+0000 7f70849e3700  1 -- [v2:172.21.15.146:3300/0,v1:172.21.15.146:6789/0] reap_dead start
2022-04-02T15:34:19.049+0000 7f70841e2700  1 -- [v2:172.21.15.146:3300/0,v1:172.21.15.146:6789/0] --> [v2:172.21.15.146:3301/0,v1:172.21.15.146:6790/0] -- route(mon_command_ack([{"prefix": "mgr fail", "who": "y"}]=0  v21) v1 tid 1258) v3 -- 0x55bddc6914a0 con 0x55bddc6fe400
2022-04-02T15:34:19.049+0000 7f70841e2700  0 log_channel(cluster) log [DBG] : mgrmap e21: x(active, starting, since 0.891945s), standbys: y

The mgr.y instance has actually restarted:

2022-04-02T15:34:19.092+0000 7f4a904141c0  0 ceph version 17.0.0-11449-g76cd7b38 (76cd7b384b2b46da30d83ba18f236e3ddc3bdb8a) quincy (dev), process ceph-mgr, pid 7
2022-04-02T15:34:19.092+0000 7f4a904141c0  0 pidfile_write: ignore empty --pid-file
2022-04-02T15:34:19.093+0000 7f4a904141c0  1  Processor -- start
2022-04-02T15:34:19.093+0000 7f4a904141c0  1 --  start start
2022-04-02T15:34:19.093+0000 7f4a904141c0  4 mgr init Registered monc callback
2022-04-02T15:34:19.093+0000 7f4a904141c0  1 --2-  >> [v2:172.21.15.171:3300/0,v1:17

However, since then there has been no log entry from CephadmOrchestrator._rotate_daemon_key(), while the test still continues.

Hypothesis #0

The current hypothesis is:

  1. ceph-mgr has a concept of a standby mode that the mgr fail command triggers (calling self._daemon_action(daemon_spec, 'restart') doesn't just restart the instance!),
  2. to handle operations in that mode, a module must explicitly support it,
  3. neither the cephadm nor the orchestrator module apparently does (see the sketch below the log lines):
2022-04-02T15:34:20.130+0000 7f4a904141c0  4 mgr[py] Standby mode not provided by module 'orchestrator'
...
2022-04-02T15:34:25.252+0000 7f4a904141c0  4 mgr[py] Standby mode not provided by module 'cephadm'
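For context, a mgr module opts into standby mode by shipping a StandbyModule class derived from MgrStandbyModule alongside its regular Module class; the "Standby mode not provided" message above is logged when no such class exists. A minimal sketch of the convention (modelled loosely on modules like the dashboard, not the actual cephadm code):

from mgr_module import MgrModule, MgrStandbyModule

class Module(MgrModule):
    """Runs on the active mgr instance."""
    def serve(self) -> None:
        # the module's main loop while this mgr is active
        pass

class StandbyModule(MgrStandbyModule):
    """Runs while this mgr is a standby; without a class like this,
    the daemon logs the 'Standby mode not provided' message."""
    def serve(self) -> None:
        pass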

Verification #0

While analyzing the problem, it was found that:

  • the missing pending_key rotation happens randomly, and
  • it is associated with the lack of the KeyRing reload that should follow the mgr fail command.

In other words, mgr fail is not a reliable replacement for a daemon restart: it does not make the mgr respawn in every case.

A successful KeyRing reload sequence happens as a result of MgrStandby::respawn() called from MgrStandby::handle_mgr_map().

2022-05-16T20:25:42.541+0200 7f1e847fb700 -1 mgr handle_mgr_map I was active but no longer am
2022-05-16T20:25:42.541+0200 7f1e847fb700  1 mgr respawn  e: '/home/rzarz/dev/ceph-2/build/bin/ceph-mgr'
2022-05-16T20:25:42.541+0200 7f1e847fb700  1 mgr respawn  0: '/home/rzarz/dev/ceph-2/build/bin/ceph-mgr'
2022-05-16T20:25:42.541+0200 7f1e847fb700  1 mgr respawn  1: '-i'
2022-05-16T20:25:42.541+0200 7f1e847fb700  1 mgr respawn  2: 'x'
2022-05-16T20:25:42.541+0200 7f1e847fb700  1 mgr respawn  3: '-c'
2022-05-16T20:25:42.541+0200 7f1e847fb700  1 mgr respawn  4: '/home/rzarz/dev/ceph-2/build/ceph.conf'
2022-05-16T20:25:42.541+0200 7f1e847fb700  1 mgr respawn respawning with exe /home/rzarz/dev/ceph-2/build/bin/ceph-mgr
2022-05-16T20:25:42.541+0200 7f1e847fb700  1 mgr respawn  exe_path /proc/self/exe
...
2022-05-16T20:25:42.609+0200 7f9022a84000  2 auth: KeyRing::load: loaded key file /home/rzarz/dev/ceph-2/build/dev/mgr.x/keyring this=0x7fff97afa940
2022-05-16T20:25:42.609+0200 7f9022a84000  2 auth: KeyRing::load: loaded key file /home/rzarz/dev/ceph-2/build/dev/mgr.x/keyring
2022-05-16T20:25:42.609+0200 7f9022a84000  2 auth: KeyRing this=0x7fff97afa940secret=AQDhloJifhkjCBAA5wPoYBrjskFYPK1IvXR4OQ==

However, this path is conditional and, in some situations, the respawn can be skipped:

void MgrStandby::handle_mgr_map(ref_t<MMgrMap> mmap)
{
  // ..
  bool need_respawn = py_module_registry.handle_mgr_map(map);
  if (need_respawn) {
    dout(1) << "respawning because set of enabled modules changed!" << dendl;
    respawn();
  }
  
  if (active_in_map) {
    if (!active_mgr) {
      dout(1) << "Activating!" << dendl;
      active_mgr.reset(new Mgr(&monc, map, &py_module_registry,
                               client_messenger.get(), &objecter,
                               &client, clog, audit_clog));
      active_mgr->background_init(new LambdaContext(
            [this](int r){
              // Advertise our active-ness ASAP instead of waiting for
              // next tick.
              std::lock_guard l(lock);
              send_beacon();
            })); 
      dout(1) << "I am now activating" << dendl;
    } else {
      dout(10) << "I was already active" << dendl;
      bool need_respawn = active_mgr->got_mgr_map(map);
      if (need_respawn) {
        respawn();
      }
    } 
    // ...
  } else if (active_mgr != nullptr) {
    derr << "I was active but no longer am" << dendl;
    respawn();
  } else {
    // ...
  }
}

This path has indeed been observed, correlated with the key rotation stall:

2022-05-17T00:10:11.197+0200 7fa3d1e0a700 10 mgr handle_mgr_map I was already active

The race condition

There is a race condition between the mgr fail handling and the mgr beacon handling: when mgr fail finds no standby to promote, the pending map is left with active_gid == 0, and a beacon from the just-failed daemon, processed before that map is committed, makes prepare_beacon() re-select the very same daemon as active within the same epoch. This was confirmed with extra instrumentation in MgrMonitor::prepare_command():

diff --git a/src/mon/MgrMonitor.cc b/src/mon/MgrMonitor.cc
index 8ada44e2628..9000b2e0687 100644
--- a/src/mon/MgrMonitor.cc
+++ b/src/mon/MgrMonitor.cc
@@ -1203,7 +1203,9 @@ bool MgrMonitor::prepare_command(MonOpRequestRef op)
     }
 
     if (changed && pending_map.active_gid == 0) {
+      dout(5) << "========== changed and active_state == 0" << dendl;
       promote_standby();
+      dout(5) << "========== after promote_standby: " << pending_map.active_gid << dendl;
     }
   } else if (prefix == "mgr module enable") {
     string module;
With this instrumentation in place, the mon log shows:

2022-05-17T00:11:19.602+0200 7f6bd5769700  0 mon.a@0(leader) e1 handle_command mon_command({"prefix": "mgr fail", "who": "x"} v 0) v1
...
2022-05-17T00:11:19.614+0200 7f6bd5769700  5 mon.a@0(leader).mgr e25 ========== changed and active_state == 0
2022-05-17T00:11:19.614+0200 7f6bd5769700  5 mon.a@0(leader).mgr e25 ========== after promote_standby: 0
2022-05-17T00:11:19.614+0200 7f6bd5769700  4 mon.a@0(leader).mgr e25 prepare_command done, r=0
...
2022-05-17T00:11:19.630+0200 7f6bd5769700  4 mon.a@0(leader).mgr e25 selecting new active 4210 x (was 0 )
The re-selection happens in MgrMonitor::prepare_beacon():

bool MgrMonitor::prepare_beacon(MonOpRequestRef op)
{
  // ...
  if (pending_map.active_gid == m->get_gid()) {
    // ...
  } else if (pending_map.active_gid == 0) {
    // There is no currently active daemon, select this one.
    if (pending_map.standbys.count(m->get_gid())) {
      drop_standby(m->get_gid(), false);
    }
    dout(4) << "selecting new active " << m->get_gid()
            << " " << m->get_name()
            << " (was " << pending_map.active_gid << " "
            << pending_map.active_name << ")" << dendl;
    pending_map.active_gid = m->get_gid();
    pending_map.active_name = m->get_name();
    pending_map.active_change = ceph_clock_now();
    // ...
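
To make the interleaving explicit, here is a toy model of the two monitor code paths above (illustration only, not Ceph code): both handlers mutate the same pending map, and a single commit then publishes one new epoch.

# Toy model of the race (illustration only, not Ceph code).

class PendingMap:
    def __init__(self) -> None:
        self.epoch = 24
        self.active_gid = 4210        # mgr.x is active
        self.standbys = set()         # no standby available

def on_mgr_fail(pm: PendingMap) -> None:
    # cf. MgrMonitor::prepare_command, "mgr fail":
    pm.active_gid = 0
    # promote_standby() finds nothing to promote; active_gid stays 0

def on_beacon(pm: PendingMap, gid: int) -> None:
    # cf. MgrMonitor::prepare_beacon: "There is no currently active
    # daemon, select this one."
    if pm.active_gid == 0:
        pm.active_gid = gid

pm = PendingMap()
on_mgr_fail(pm)        # active_gid: 4210 -> 0
on_beacon(pm, 4210)    # the failed mgr's own beacon arrives first
pm.epoch += 1          # epoch 25 is committed with x still active

# mgr.x receives e25, finds itself active, takes the "I was already
# active" path in handle_mgr_map(), and never respawns, so the new
# keyring is never loaded.
assert pm.active_gid == 4210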

Version 25 of the MgrMap, when handled at mgr.x, doesn't trigger the respawn() path:

2022-05-17T00:10:11.197+0200 7fa3d1e0a700 10 mgr ms_dispatch2 active mgrmap(e 25) v1
2022-05-17T00:10:11.197+0200 7fa3d1e0a700  4 mgr handle_mgr_map received map epoch 25
2022-05-17T00:10:11.197+0200 7fa3d1e0a700  4 mgr handle_mgr_map active in map: 1 active is 4210
2022-05-17T00:10:11.197+0200 7fa3d6613700 10 --2- 127.0.0.1:0/743576734 >> [v2:127.0.0.1:40929/0,v1:127.0.0.1:40930/0] conn(0x5592635ef400 0x5592635f6580 secure :-1 s=THROTTLE_DONE pgs=130 cs=0 l=1 rev1=1 crypto rx=0x55926362e810 tx=0x559263563b60 comp rx=0 tx=0).handle_read_frame_dispatch tag=17
2022-05-17T00:10:11.197+0200 7fa3d6613700  5 --2- 127.0.0.1:0/743576734 >> [v2:127.0.0.1:40929/0,v1:127.0.0.1:40930/0] conn(0x5592635ef400 0x5592635f6580 secure :-1 s=THROTTLE_DONE pgs=130 cs=0 l=1 rev1=1 crypto rx=0x55926362e810 tx=0x559263563b60 comp rx=0 tx=0).handle_message got 43089 + 0 + 0 byte message. envelope type=1796 src mon.0 off 0
2022-05-17T00:10:11.197+0200 7fa3d1e0a700 10 mgr handle_mgr_map I was already active
Local replicator
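
The loop below requests a fresh pending key for mgr.x, installs it into the local keyring file, fails the mgr, and then polls ceph auth get once a second until the pending key disappears (i.e., until the rotation is committed). When the bug reproduces, the inner loop prints "no change" forever.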

while true; do
    bin/ceph auth get-or-create-pending mgr.x --debug_ms=0 --debug_auth=0 -f json \
        | jq '.[0] | "[\(.entity)]\n\tkey=\(.pending_key)"' -r > dev/mgr.x/keyring
    bin/ceph mgr fail x
    date
    while bin/ceph auth get mgr.x --debug_ms=0 --debug_auth=0 | grep "pending key"; do
        echo no change
        sleep 1
    done
    sleep 70
done
