None of the 16 Gimlet sleds showed up in pilot sp ls in either switch zone. SPs show up when packets emitted by udpbroadcast arrive in the switch zone (passing through the net task in transit, then into the management network). We see both PSC and Sidecar SPs show up just fine.
Running pilot sp exec -e 'component-details monorail' BRM50230003 provides port states and counters for the VSC7448 (i.e. the main management network switch). We see 16 ports are Down, which (presumably) corresponds to the 16 unpopulated cubbies. All other ports are up.
Running this command twice with a short time gap, we see the rx port counters do not go up for any of the Gimlet SPs. They do go up for the Sidecar and PSC SPs. This indicates that the VSC7448 is not seeing packets arrive from the Gimlet SPs. This agrees with the fact that pilot sp ls does not hear any broadcast packets.
Running /usr/platform/oxide/bin/ipcc ident on one of the Gimlet sleds hangs, which indicates that the host-sp-comms task on the Gimlet SP is not replying. This also agrees with the ::ipcc_dbgmsg logs, which show that the last successful IPCC message was weeks ago.
By running pilot host ls in a switch zone, we can see host IP addresses for the Gimlet sleds. We know that the corresponding SP address is 0x10 lower, so we tried to ping those addresses. All SPs respond to ping (!!), from both switch zones. I lose my bet that the Hubris kernel is hung. This tells us that the SP kernel is running and that the management network links are functioning.
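The address arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the host address in the example is an assumed value (picked so the result matches the SP address that appears in the ping transcript), and it assumes the 0x10 offset applies to the address treated as a 128-bit integer.

```python
import ipaddress

# From the text: the SP address is 0x10 lower than the host address.
SP_OFFSET = 0x10

def sp_address_from_host(host_addr: str) -> str:
    """Derive the SP link-local address from a host address (hypothetical helper)."""
    # Split off an optional zone (e.g. "%gimlet8") so it can be reattached.
    addr, sep, zone = host_addr.partition("%")
    # Assumption: the offset is applied to the address as a 128-bit integer.
    sp = ipaddress.IPv6Address(int(ipaddress.IPv6Address(addr)) - SP_OFFSET)
    return f"{sp}{sep}{zone}"

# Illustrative host address, not taken from real pilot host ls output.
print(sp_address_from_host("fe80::aa40:25ff:fe04:ad8%gimlet8"))
# prints fe80::aa40:25ff:fe04:ac8%gimlet8
```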
Checking port counters around a ping, we see the unicast rx counter go up by 2 packets after a ping (it's presumably 2 because we do neighbor discovery then the ICMPv6 echo):
support@oxz_switch1:~$ pilot sp exec -e 'component-details monorail' BRM50230011 | grep "port: 10"
Nov 20 22:20:45.616 INFO creating SP handle on interface sidecar0, component: faux-mgs
Nov 20 22:20:45.617 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe05:3c00%78]:11111, interface: sidecar0, component: faux-mgs
PortStatus(Ok(PortStatus { port: 10, cfg: PortConfig { mode: Sgmii(Speed100M), dev: (Dev2g5, 2), serdes: (Serdes6g, 2) }, link_status: Up, phy_status: None, counters: PortCounters { rx: PacketCount { multicast: 1743839, unicast: 111955941, broadcast: 0 }, tx: PacketCount { multicast: 113012097, unicast: 45986, broadcast: 0 }, link_down_sticky: true, phy_link_down_sticky: false } }))
support@oxz_switch1:~$ ping fe80::aa40:25ff:fe04:ac8%gimlet8
fe80::aa40:25ff:fe04:ac8%gimlet8 is alive
support@oxz_switch1:~$ pilot sp exec -e 'component-details monorail' BRM50230011 | grep "port: 10"
Nov 20 22:21:01.960 INFO creating SP handle on interface sidecar0, component: faux-mgs
Nov 20 22:21:01.970 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe05:3c00%78]:11111, interface: sidecar0, component: faux-mgs
PortStatus(Ok(PortStatus { port: 10, cfg: PortConfig { mode: Sgmii(Speed100M), dev: (Dev2g5, 2), serdes: (Serdes6g, 2) }, link_status: Up, phy_status: None, counters: PortCounters { rx: PacketCount { multicast: 1743839, unicast: 111955943, broadcast: 0 }, tx: PacketCount { multicast: 113012105, unicast: 45988, broadcast: 0 }, link_down_sticky: true, phy_link_down_sticky: false } }))
This is further confirmation that packets are leaving the SP and making it through the management network.
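Comparing the counters by eye is error-prone, so here is a small sketch that diffs two PortStatus lines. It is not part of pilot; the regex and function names are hypothetical and only assume the PacketCount layout seen in the transcript above. The sample strings are trimmed excerpts of the two captured lines.

```python
import re

# Matches the rx/tx PacketCount groups as they appear in the transcript.
COUNTER_RE = re.compile(
    r"(rx|tx): PacketCount \{ multicast: (\d+), unicast: (\d+), broadcast: (\d+) \}"
)

def parse_counters(line):
    """Return {"rx": {...}, "tx": {...}} parsed from one PortStatus line."""
    return {
        direction: {"multicast": int(mc), "unicast": int(uc), "broadcast": int(bc)}
        for direction, mc, uc, bc in COUNTER_RE.findall(line)
    }

def counter_deltas(before, after):
    """Per-counter difference (after - before) for each direction."""
    b, a = parse_counters(before), parse_counters(after)
    return {
        direction: {kind: a[direction][kind] - b[direction][kind] for kind in a[direction]}
        for direction in a
    }

# Trimmed excerpts of the two port 10 lines captured around the ping.
BEFORE = ("counters: PortCounters { rx: PacketCount { multicast: 1743839, "
          "unicast: 111955941, broadcast: 0 }, tx: PacketCount { multicast: "
          "113012097, unicast: 45986, broadcast: 0 } }")
AFTER = ("counters: PortCounters { rx: PacketCount { multicast: 1743839, "
         "unicast: 111955943, broadcast: 0 }, tx: PacketCount { multicast: "
         "113012105, unicast: 45988, broadcast: 0 } }")

print(counter_deltas(BEFORE, AFTER))
```

For the two captures above this reports a unicast delta of 2 in each direction, no change in rx multicast, and a tx multicast delta of 8.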
Trying to talk to the SP's udpecho task does not work; the only thing that seems to work is ping.
ICMPv6 is handled within the net task itself and seems functional, but it looks like other tasks are unable to use the net task to send packets.
Here is the list of task priorities:
It's surprising that the net task took out host-sp-comms. The usual IPCC inventory messages performed by the host should not depend on net (moderate confidence).
net could take out host-sp-comms if it was monopolizing CPU time, because it has a higher priority. If the net task was always runnable (for Mysterious Reasons), then it would always be selected over lower-priority tasks (those with higher numerical priority values). It's unclear why this would occur.
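The starvation hypothesis can be illustrated with a toy model. This is not the Hubris scheduler, only a minimal sketch of strict priority selection where the lowest numerical value wins; the priority numbers 5 and 8 are made up for the example and are not the real task priorities.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    priority: int          # lower number = more important
    runnable: bool = True
    runs: int = 0

def pick_next(tasks):
    """Strict priority: choose the runnable task with the lowest priority value."""
    runnable = [t for t in tasks if t.runnable]
    return min(runnable, key=lambda t: t.priority) if runnable else None

def simulate(tasks, ticks):
    """Run the selection `ticks` times and count how often each task is chosen."""
    for _ in range(ticks):
        task = pick_next(tasks)
        if task is not None:
            task.runs += 1
    return {t.name: t.runs for t in tasks}

# Hypothetical priorities; net never blocks, so it is runnable on every tick.
stuck = [Task("net", 5), Task("host-sp-comms", 8)]
print(simulate(stuck, 1000))   # {'net': 1000, 'host-sp-comms': 0}

# If net blocks waiting for work, the lower-priority task gets to run.
healthy = [Task("net", 5, runnable=False), Task("host-sp-comms", 8)]
print(simulate(healthy, 1000))  # {'net': 0, 'host-sp-comms': 1000}
```

In this model a permanently runnable net task receives every tick, which would match host-sp-comms never replying, but it says nothing about why net would be stuck in that state.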
In conclusion, I sure hope resetting sled 23 brings the SP back online!