@rzarzynski · Last active May 19, 2022
auth: the key rotation failure

It looks like the rotation of keys works fine for all OSDs; it fails at the first mgr instance (mgr.y).

2022-04-02T15:34:17.008 INFO:teuthology.orchestra.run.smithi146.stderr:++ ceph auth get-key osd.7
2022-04-02T15:34:17.365 INFO:teuthology.orchestra.run.smithi146.stdout:rotating key for mgr.y
2022-04-02T15:34:17.366 INFO:teuthology.orchestra.run.smithi146.stderr:+ NK=AQDjbEhir5HtORAAHbC3qXbxxwbs1GwYyInm2g==
2022-04-02T15:34:17.366 INFO:teuthology.orchestra.run.smithi146.stderr:+ '[' AQD1akhi07GlGhAAibYZDWhJKuspZgRDt//5vg== == AQDjbEhir5HtORAAHbC3qXbxxwbs1GwYyInm2g== ']'
2022-04-02T15:34:17.366 INFO:teuthology.orchestra.run.smithi146.stderr:+ for f in osd.0 osd.1 osd.2 osd.3 osd.4 osd.5 osd.6 osd.7 mgr.y mgr.x
2022-04-02T15:34:17.367 INFO:teuthology.orchestra.run.smithi146.stderr:+ echo 'rotating key for mgr.y'
2022-04-02T15:34:17.367 INFO:teuthology.orchestra.run.smithi146.stderr:++ ceph auth get-key mgr.y
2022-04-02T15:34:17.715 INFO:teuthology.orchestra.run.smithi146.stderr:+ K=AQD4aUhiCSlkCxAANVQf8PPD9eg8Cri81MR4eA==
2022-04-02T15:34:17.715 INFO:teuthology.orchestra.run.smithi146.stderr:+ NK=AQD4aUhiCSlkCxAANVQf8PPD9eg8Cri81MR4eA==
2022-04-02T15:34:17.716 INFO:teuthology.orchestra.run.smithi146.stderr:+ ceph orch daemon rotate-key mgr.y

The test got stuck on mgr.y for many hours; the loop polls ceph auth get-key every 5 seconds and never sees the key change (see the sketch below the trace):

2022-04-02T21:32:49.555 INFO:teuthology.orchestra.run.smithi146.stderr:++ ceph auth get-key mgr.y
2022-04-02T21:32:49.910 INFO:teuthology.orchestra.run.smithi146.stderr:+ NK=AQD4aUhiCSlkCxAANVQf8PPD9eg8Cri81MR4eA==
2022-04-02T21:32:49.910 INFO:teuthology.orchestra.run.smithi146.stderr:+ '[' AQD4aUhiCSlkCxAANVQf8PPD9eg8Cri81MR4eA== == AQD4aUhiCSlkCxAANVQf8PPD9eg8Cri81MR4eA== ']'
2022-04-02T21:32:49.910 INFO:teuthology.orchestra.run.smithi146.stderr:+ sleep 5
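Piecing the xtrace output together, the test loop's logic boils down to the following (a sketch in Python for illustration; the real test is the bash loop from rotate-keys.yaml, partially quoted in the diff below):

import subprocess
import time

def get_key(entity: str) -> str:
    # equivalent of: ceph auth get-key $f
    return subprocess.check_output(['ceph', 'auth', 'get-key', entity]).decode()

for f in ['osd.0', 'osd.1', 'osd.2', 'osd.3', 'osd.4',
          'osd.5', 'osd.6', 'osd.7', 'mgr.y', 'mgr.x']:
    print(f'rotating key for {f}')
    old_key = get_key(f)
    new_key = old_key
    subprocess.check_call(['ceph', 'orch', 'daemon', 'rotate-key', f])
    while new_key == old_key:        # for mgr.y this never became false
        time.sleep(5)
        new_key = get_key(f)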

The issue left a trace in the manager's log:

2022-04-02T15:34:18.153+0000 7f7666b9b700  0 [cephadm ERROR root] Non-zero return from ['ceph', '-k', '/var/lib/ceph/mgr/ceph-y/keyring', '-n', 'mgr.y', 'tell', 'mgr.y', 'rotate-key', '-i', '-']: 2022-04-02T15:34:18.146+0000 7fdf9f7fe700  1 -- 172.21.15.146:0/952779436 <== mon.1 v2:172.21.15.146:3301/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 194+0+0 (secure 0 0 0) 0x7fdf900037b0 con 0x7fdfa0105fc0

Making the test green

This is just for the sake of verifying that the issue is about rotations at ceph-mgr instances only:

--- a/qa/suites/orch/cephadm/with-work/tasks/rotate-keys.yaml
+++ b/qa/suites/orch/cephadm/with-work/tasks/rotate-keys.yaml
@@ -3,7 +3,7 @@ tasks:
     mon.a:
     - |
       set -ex
-      for f in osd.0 osd.1 osd.2 osd.3 osd.4 osd.5 osd.6 osd.7 mgr.y mgr.x
+      for f in osd.0 osd.1 osd.2 osd.3 osd.4 osd.5 osd.6 osd.7
       do
           echo "rotating key for $f"
           K=$(ceph auth get-key $f)

Root cause: a surprise at _daemon_action(daemon_spec, 'restart')?

The first visible trace of the failure is that mgr.y refuses a ceph tell CLI client, spawned by its own cephadm module, due to a lack of credentials to contact the manager.

rzarzynski@teuthology:/a/yuriw-2022-04-02_14:54:16-orch-wip-yuri11-testing-2022-04-01-0842-distro-default-smithi/6773812$ less ./remote/smithi146/log/71f5957e-b298-11ec-8c36-001a4aab830c/ceph-mgr.y.log.gz
...
2022-04-02T15:34:18.037+0000 7f7666b9b700  0 [cephadm INFO root] Rotating authentication key for mgr.y
2022-04-02T15:34:18.037+0000 7f7666b9b700  0 log_channel(cephadm) log [INF] : Rotating authentication key for mgr.y
...
2022-04-02T15:34:18.044+0000 7f7666b9b700  0 [cephadm DEBUG root] exec: ceph -k /var/lib/ceph/mgr/ceph-y/keyring -n mgr.y tell mgr.y rotate-key -i -
2022-04-02T15:34:18.074+0000 7f76d0b11700  1 -- 172.21.15.146:0/958567540 --> [v2:172.21.15.146:3301/0,v1:172.21.15.146:6790/0] -- log(3 entries from seq 332 at 2022-04-02T15:34:17.995313+0000) v1 -- 0x55646b2b3180 con 0x5564617a9400
...
2022-04-02T15:34:18.150+0000 7f76d631c700 10 mgr.server ms_handle_authentication ms_handle_authentication new session 0x55646b2eaa20 con 0x55646b97b000 entity mgr.y addr 
2022-04-02T15:34:18.151+0000 7f76d631c700  1 --2- v2:172.21.15.146:6800/907340359 >> 172.21.15.146:0/4073159835 conn(0x55646b97b000 0x55646b2f6c00 secure :-1 s=READY pgs=4 cs=0 l=1 rev1=1 crypto rx=0x55646b02890
0 tx=0x55646a7bbbc0 comp rx=0 tx=0).ready entity=client.15396 client_cookie=0 server_cookie=0 in_seq=0 out_seq=0
2022-04-02T15:34:18.151+0000 7f769b60a700  1 -- v2:172.21.15.146:6800/907340359 <== client.15396 172.21.15.146:0/4073159835 1 ==== command(tid 0: {"prefix": "get_command_descriptions"}) v1 ==== 62+0+0 (secure 0
0 0) 0x55646af709a0 con 0x55646b97b000
2022-04-02T15:34:18.151+0000 7f769b60a700 10 mgr.server _handle_command decoded-size=1 prefix=get_command_descriptions
2022-04-02T15:34:18.151+0000 7f769b60a700  1 mgr.server log_access_denied  access denied
2022-04-02T15:34:18.151+0000 7f769b60a700  0 log_channel(audit) log [INF] : from='client.15396 -' entity='mgr.y' cmd=[{"prefix": "get_command_descriptions"}]:  access denied
2022-04-02T15:34:18.151+0000 7f769b60a700 -1 mgr.server reply reply (13) Permission denied access denied: does your client key have mgr caps? See http://docs.ceph.com/en/latest/mgr/administrator/#client-authentication
...
2022-04-02T15:34:18.153+0000 7f7666b9b700  0 [cephadm ERROR root] Non-zero return from ['ceph', '-k', '/var/lib/ceph/mgr/ceph-y/keyring', '-n', 'mgr.y', 'tell', 'mgr.y', 'rotate-key', '-i', '-']: 2022-04-02T15:34:18.146+0000 7fdf9f7fe700  1 -- 172.21.15.146:0/952779436 <== mon.1 v2:172.21.15.146:3301/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 194+0+0 (secure 0 0 0) 0x7fdf900037b0 con 0x7fdfa0105fc0

This particular client (ceph tell rotate-key) is called from CephadmOrchestrator._rotate_daemon_key():

    def _rotate_daemon_key(self, daemon_spec: CephadmDaemonDeploySpec) -> str:
        self.log.info(f'Rotating authentication key for {daemon_spec.name()}')
        rc, out, err = self.mon_command({
            'prefix': 'auth get-or-create-pending',
            'entity': daemon_spec.entity_name(),
            'format': 'json',
        })
        j = json.loads(out)
        pending_key = j[0]['pending_key']

        # deploy a new keyring file
        if daemon_spec.daemon_type != 'osd':
            daemon_spec = self.cephadm_services[daemon_type_to_service(
                daemon_spec.daemon_type)].prepare_create(daemon_spec)
        CephadmServe(self)._create_daemon(daemon_spec, reconfig=True)

        # try to be clever, or fall back to restarting the daemon
        rc = -1
        if daemon_spec.daemon_type == 'osd':
            rc, out, err = self.tool_exec(
                args=['ceph', 'tell', daemon_spec.name(), 'rotate-stored-key', '-i', '-'],
                stdin=pending_key.encode()
            )
            if not rc:
                rc, out, err = self.tool_exec(
                    args=['ceph', 'tell', daemon_spec.name(), 'rotate-key', '-i', '-'],
                    stdin=pending_key.encode()
                )
        elif daemon_spec.daemon_type == 'mds':
            rc, out, err = self.tool_exec(
                args=['ceph', 'tell', daemon_spec.name(), 'rotate-key', '-i', '-'],
                stdin=pending_key.encode()
            )
        elif (
                daemon_spec.daemon_type == 'mgr'
                and daemon_spec.daemon_id == self.get_mgr_id()
        ):
            rc, out, err = self.tool_exec(
                args=['ceph', 'tell', daemon_spec.name(), 'rotate-key', '-i', '-'],
                stdin=pending_key.encode()
            )
        if rc:
            self._daemon_action(daemon_spec, 'restart')

        return f'Rotated key for {daemon_spec.name()}'

There are two interesting paths here:

  1. the daemon_spec.daemon_id == self.get_mgr_id() branch, which fails because of the caps issue;
  2. the daemon restart, which finally leads to the mgr fail mon command; it's the fallback for #1.

mon.a received the mgr fail from mgr.y and promoted the standby (mgr.x) to active:

rzarzynski@teuthology:/a/yuriw-2022-04-02_14:54:16-orch-wip-yuri11-testing-2022-04-01-0842-distro-default-smithi/6773812$ less ./remote/smithi146/log/71f5957e-b298-11ec-8c36-001a4aab830c/ceph-mon.a.log.gz
...
2022-04-02T15:34:19.049+0000 7f70841e2700  2 mon.a@0(leader) e3 send_reply 0x55bddced7e00 0x55bddc92fa00 mon_command_ack([{"prefix": "mgr fail", "who": "y"}]=0  v21) v1
2022-04-02T15:34:19.049+0000 7f70841e2700 15 mon.a@0(leader) e3 send_reply routing reply to 172.21.15.146:0/958567540 via v2:172.21.15.146:3301/0 for request mon_command({"prefix": "mgr fail", "who": "y"} v 0) v1
2022-04-02T15:34:19.049+0000 7f70849e3700  1 -- [v2:172.21.15.146:3300/0,v1:172.21.15.146:6789/0] reap_dead start
2022-04-02T15:34:19.049+0000 7f70841e2700  1 -- [v2:172.21.15.146:3300/0,v1:172.21.15.146:6789/0] --> [v2:172.21.15.146:3301/0,v1:172.21.15.146:6790/0] -- route(mon_command_ack([{"prefix": "mgr fail", "who": "y"}]=0  v21) v1 tid 1258) v3 -- 0x55bddc6914a0 con 0x55bddc6fe400
2022-04-02T15:34:19.049+0000 7f70841e2700  0 log_channel(cluster) log [DBG] : mgrmap e21: x(active, starting, since 0.891945s), standbys: y

The mgr.y instance has actually restarted:

2022-04-02T15:34:19.092+0000 7f4a904141c0  0 ceph version 17.0.0-11449-g76cd7b38 (76cd7b384b2b46da30d83ba18f236e3ddc3bdb8a) quincy (dev), process ceph-mgr, pid 7
2022-04-02T15:34:19.092+0000 7f4a904141c0  0 pidfile_write: ignore empty --pid-file
2022-04-02T15:34:19.093+0000 7f4a904141c0  1  Processor -- start
2022-04-02T15:34:19.093+0000 7f4a904141c0  1 --  start start
2022-04-02T15:34:19.093+0000 7f4a904141c0  4 mgr init Registered monc callback
2022-04-02T15:34:19.093+0000 7f4a904141c0  1 --2-  >> [v2:172.21.15.171:3300/0,v1:17

However, since then there has been no log entry from CephadmOrchestrator._rotate_daemon_key(), while the test still continues.

Hypothesis #0

The current hypothesis is:

  1. ceph-mgr has a concept of a standby mode that the mgr fail command triggers (calling self._daemon_action(daemon_spec, 'restart') doesn't just restart the instance!),
  2. to handle operations in that mode, a module must explicitly support it,
  3. neither the cephadm nor the orchestrator module apparently does (see the sketch below the log lines):
2022-04-02T15:34:20.130+0000 7f4a904141c0  4 mgr[py] Standby mode not provided by module 'orchestrator'
...
2022-04-02T15:34:25.252+0000 7f4a904141c0  4 mgr[py] Standby mode not provided by module 'cephadm'
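For context, a mgr module opts into standby mode by shipping a StandbyModule class derived from MgrStandbyModule alongside its regular Module class; the "Standby mode not provided" message above is logged when no such class exists. A minimal sketch of the convention (modelled loosely on modules like the dashboard, not the actual cephadm code):

from mgr_module import MgrModule, MgrStandbyModule

class Module(MgrModule):
    """Runs on the active mgr instance."""
    def serve(self) -> None:
        # the module's main loop while this mgr is active
        pass

class StandbyModule(MgrStandbyModule):
    """Runs while this mgr is a standby; without a class like this,
    the daemon logs the 'Standby mode not provided' message."""
    def serve(self) -> None:
        pass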

Verification #0

While analyzing the problem, it was found that:

  • the missing pending_key rotation happens randomly, and
  • it is associated with the lack of the KeyRing reload that should follow the mgr fail command.

In other words, mgr fail is not a reliable replacement for a daemon restart: it does not make the mgr respawn in every case.

A successful KeyRing reload sequence happens as a result of MgrStandby::respawn() called from MgrStandby::handle_mgr_map().

2022-05-16T20:25:42.541+0200 7f1e847fb700 -1 mgr handle_mgr_map I was active but no longer am
2022-05-16T20:25:42.541+0200 7f1e847fb700  1 mgr respawn  e: '/home/rzarz/dev/ceph-2/build/bin/ceph-mgr'
2022-05-16T20:25:42.541+0200 7f1e847fb700  1 mgr respawn  0: '/home/rzarz/dev/ceph-2/build/bin/ceph-mgr'
2022-05-16T20:25:42.541+0200 7f1e847fb700  1 mgr respawn  1: '-i'
2022-05-16T20:25:42.541+0200 7f1e847fb700  1 mgr respawn  2: 'x'
2022-05-16T20:25:42.541+0200 7f1e847fb700  1 mgr respawn  3: '-c'
2022-05-16T20:25:42.541+0200 7f1e847fb700  1 mgr respawn  4: '/home/rzarz/dev/ceph-2/build/ceph.conf'
2022-05-16T20:25:42.541+0200 7f1e847fb700  1 mgr respawn respawning with exe /home/rzarz/dev/ceph-2/build/bin/ceph-mgr
2022-05-16T20:25:42.541+0200 7f1e847fb700  1 mgr respawn  exe_path /proc/self/exe
...
2022-05-16T20:25:42.609+0200 7f9022a84000  2 auth: KeyRing::load: loaded key file /home/rzarz/dev/ceph-2/build/dev/mgr.x/keyring this=0x7fff97afa940
2022-05-16T20:25:42.609+0200 7f9022a84000  2 auth: KeyRing::load: loaded key file /home/rzarz/dev/ceph-2/build/dev/mgr.x/keyring
2022-05-16T20:25:42.609+0200 7f9022a84000  2 auth: KeyRing this=0x7fff97afa940secret=AQDhloJifhkjCBAA5wPoYBrjskFYPK1IvXR4OQ==

However, this path is conditional and, in some situations, the respawn can be skipped:

void MgrStandby::handle_mgr_map(ref_t<MMgrMap> mmap)
{
  // ..
  bool need_respawn = py_module_registry.handle_mgr_map(map);
  if (need_respawn) {
    dout(1) << "respawning because set of enabled modules changed!" << dendl;
    respawn();
  }
  
  if (active_in_map) {
    if (!active_mgr) {
      dout(1) << "Activating!" << dendl;
      active_mgr.reset(new Mgr(&monc, map, &py_module_registry,
                               client_messenger.get(), &objecter,
                               &client, clog, audit_clog));
      active_mgr->background_init(new LambdaContext(
            [this](int r){
              // Advertise our active-ness ASAP instead of waiting for
              // next tick.
              std::lock_guard l(lock);
              send_beacon();
            })); 
      dout(1) << "I am now activating" << dendl;
    } else {
      dout(10) << "I was already active" << dendl;
      bool need_respawn = active_mgr->got_mgr_map(map);
      if (need_respawn) {
        respawn();
      }
    } 
    // ...
  } else if (active_mgr != nullptr) {
    derr << "I was active but no longer am" << dendl;
    respawn();
  } else {
    // ...
  }
}

This path has indeed been observed, correlated with the key rotation stall:

2022-05-17T00:10:11.197+0200 7fa3d1e0a700 10 mgr handle_mgr_map I was already active

The race condition

There is a race condition between the mgr fail handling and the mgr beacon handling: when mgr fail finds no standby to promote, the pending map is left with active_gid == 0, and a beacon from the just-failed daemon, processed before that map is committed, makes prepare_beacon() re-select the very same daemon as active within the same epoch. This was confirmed with extra instrumentation in MgrMonitor::prepare_command():

diff --git a/src/mon/MgrMonitor.cc b/src/mon/MgrMonitor.cc
index 8ada44e2628..9000b2e0687 100644
--- a/src/mon/MgrMonitor.cc
+++ b/src/mon/MgrMonitor.cc
@@ -1203,7 +1203,9 @@ bool MgrMonitor::prepare_command(MonOpRequestRef op)
     }
 
     if (changed && pending_map.active_gid == 0) {
+      dout(5) << "========== changed and active_state == 0" << dendl;
       promote_standby();
+      dout(5) << "========== after promote_standby: " << pending_map.active_gid << dendl;
     }
   } else if (prefix == "mgr module enable") {
     string module;
With this instrumentation in place, the mon log shows:

2022-05-17T00:11:19.602+0200 7f6bd5769700  0 mon.a@0(leader) e1 handle_command mon_command({"prefix": "mgr fail", "who": "x"} v 0) v1
...
2022-05-17T00:11:19.614+0200 7f6bd5769700  5 mon.a@0(leader).mgr e25 ========== changed and active_state == 0
2022-05-17T00:11:19.614+0200 7f6bd5769700  5 mon.a@0(leader).mgr e25 ========== after promote_standby: 0
2022-05-17T00:11:19.614+0200 7f6bd5769700  4 mon.a@0(leader).mgr e25 prepare_command done, r=0
...
2022-05-17T00:11:19.630+0200 7f6bd5769700  4 mon.a@0(leader).mgr e25 selecting new active 4210 x (was 0 )
The re-selection happens in MgrMonitor::prepare_beacon():

bool MgrMonitor::prepare_beacon(MonOpRequestRef op)
{
  // ...
  if (pending_map.active_gid == m->get_gid()) {
    // ...
  } else if (pending_map.active_gid == 0) {
    // There is no currently active daemon, select this one.
    if (pending_map.standbys.count(m->get_gid())) {
      drop_standby(m->get_gid(), false);
    }
    dout(4) << "selecting new active " << m->get_gid()
            << " " << m->get_name()
            << " (was " << pending_map.active_gid << " "
            << pending_map.active_name << ")" << dendl;
    pending_map.active_gid = m->get_gid();
    pending_map.active_name = m->get_name();
    pending_map.active_change = ceph_clock_now();
    // ...
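
To make the interleaving explicit, here is a toy model of the two monitor code paths above (illustration only, not Ceph code): both handlers mutate the same pending map, and a single commit then publishes one new epoch.

# Toy model of the race (illustration only, not Ceph code).

class PendingMap:
    def __init__(self) -> None:
        self.epoch = 24
        self.active_gid = 4210        # mgr.x is active
        self.standbys = set()         # no standby available

def on_mgr_fail(pm: PendingMap) -> None:
    # cf. MgrMonitor::prepare_command, "mgr fail":
    pm.active_gid = 0
    # promote_standby() finds nothing to promote; active_gid stays 0

def on_beacon(pm: PendingMap, gid: int) -> None:
    # cf. MgrMonitor::prepare_beacon: "There is no currently active
    # daemon, select this one."
    if pm.active_gid == 0:
        pm.active_gid = gid

pm = PendingMap()
on_mgr_fail(pm)        # active_gid: 4210 -> 0
on_beacon(pm, 4210)    # the failed mgr's own beacon arrives first
pm.epoch += 1          # epoch 25 is committed with x still active

# mgr.x receives e25, finds itself active, takes the "I was already
# active" path in handle_mgr_map(), and never respawns, so the new
# keyring is never loaded.
assert pm.active_gid == 4210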

Version 25 of the MgrMap, when handled at mgr.x, doesn't trigger the respawn() path:

2022-05-17T00:10:11.197+0200 7fa3d1e0a700 10 mgr ms_dispatch2 active mgrmap(e 25) v1
2022-05-17T00:10:11.197+0200 7fa3d1e0a700  4 mgr handle_mgr_map received map epoch 25
2022-05-17T00:10:11.197+0200 7fa3d1e0a700  4 mgr handle_mgr_map active in map: 1 active is 4210
2022-05-17T00:10:11.197+0200 7fa3d6613700 10 --2- 127.0.0.1:0/743576734 >> [v2:127.0.0.1:40929/0,v1:127.0.0.1:40930/0] conn(0x5592635ef400 0x5592635f6580 secure :-1 s=THROTTLE_DONE pgs=130 cs=0 l=1 rev1=1 crypto rx=0x55926362e810 tx=0x559263563b60 comp rx=0 tx=0).handle_read_frame_dispatch tag=17
2022-05-17T00:10:11.197+0200 7fa3d6613700  5 --2- 127.0.0.1:0/743576734 >> [v2:127.0.0.1:40929/0,v1:127.0.0.1:40930/0] conn(0x5592635ef400 0x5592635f6580 secure :-1 s=THROTTLE_DONE pgs=130 cs=0 l=1 rev1=1 crypto rx=0x55926362e810 tx=0x559263563b60 comp rx=0 tx=0).handle_message got 43089 + 0 + 0 byte message. envelope type=1796 src mon.0 off 0
2022-05-17T00:10:11.197+0200 7fa3d1e0a700 10 mgr handle_mgr_map I was already active
Local replicator
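
The loop below requests a fresh pending key for mgr.x, installs it into the local keyring file, fails the mgr, and then polls ceph auth get once a second until the pending key disappears (i.e., until the rotation is committed). When the bug reproduces, the inner loop prints "no change" forever.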

while true; do
    bin/ceph auth get-or-create-pending mgr.x --debug_ms=0 --debug_auth=0 -f json \
        | jq '.[0] | "[\(.entity)]\n\tkey=\(.pending_key)"' -r > dev/mgr.x/keyring
    bin/ceph mgr fail x
    date
    while bin/ceph auth get mgr.x --debug_ms=0 --debug_auth=0 | grep "pending key"; do
        echo no change
        sleep 1
    done
    sleep 70
done
