CRAYSAT-1865: Recreate SDU collection link
# With cray-sdu-rda not running, ensure that booting does not start cray-sdu-rda
Status of cray-sdu-rda before booting:
ncn-m001:/mnt/developer/sann # systemctl status cray-sdu-rda | |
○ cray-sdu-rda.service - Cray SDU/RDA Container Service | |
Loaded: loaded (/usr/lib/systemd/system/cray-sdu-rda.service; disabled; vendor preset: disabled) | |
Active: inactive (dead) | |
Jul 22 20:42:47 ncn-m001 cray-sdu-rda[1217520]: Sending SIGKILL to remaining processes... | |
Jul 22 20:42:47 ncn-m001 cray-sdu-rda[1217520]: All filesystems, swaps, loop devices, MD devices and DM devices detached. | |
Jul 22 20:42:47 ncn-m001 cray-sdu-rda[1217520]: Halting system. | |
Jul 22 20:42:47 ncn-m001 cray-sdu-rda[1217975]: All filesystems, swaps, loop devices, MD devices and DM devices detached. | |
Jul 22 20:42:47 ncn-m001 cray-sdu-rda[1217975]: Halting system. | |
Jul 22 20:42:47 ncn-m001 podman[1217520]: 2024-07-22 20:42:47.748319518 +0000 UTC m=+182.398509359 container died 14413efb0a5> | |
Jul 22 20:42:48 ncn-m001 podman[1221317]: 2024-07-22 20:42:48.083903084 +0000 UTC m=+0.324355875 container remove 14413efb0a5> | |
Jul 22 20:42:48 ncn-m001 cray-sdu-rda[1221227]: flock: getting lock took 0.755862 seconds | |
Jul 22 20:42:48 ncn-m001 systemd[1]: cray-sdu-rda.service: Deactivated successfully. | |
Jul 22 20:42:48 ncn-m001 systemd[1]: Stopped Cray SDU/RDA Container Service. | |
ncn-m001:/mnt/developer/sann # cd /var/opt/cray/sdu/ | |
ncn-m001:/var/opt/cray/sdu # ls | |
collection collection-local collection-mount lock | |
ncn-m001:/var/opt/cray/sdu # ls -l | |
total 1 | |
lrwxrwxrwx 1 root root 16 Jul 22 20:39 collection -> collection-mount | |
drwxr-xr-x 3 root root 20 Jul 22 13:46 collection-local | |
drwxrwx--- 1 2370 2370 0 Jan 1 1970 collection-mount | |
drwxr-xr-x 2 root root 28 Jul 22 13:46 lock | |
ncn-m001:/mnt/developer/sann/26-7-24-sdu # sat bootsys boot --stage ncn-power --ncn-boot-timeout 900 | |
IPMI username: root | |
IPMI password: | |
The following Non-compute Nodes (NCNs) will be included in this operation: | |
managers: | |
- ncn-m002 | |
- ncn-m003 | |
storage: | |
- ncn-s001 | |
- ncn-s002 | |
- ncn-s003 | |
workers: | |
- ncn-w001 | |
- ncn-w002 | |
- ncn-w003 | |
- ncn-w004 | |
The following Non-compute Nodes (NCNs) will be excluded from this operation: | |
managers: | |
- ncn-m001 | |
storage: [] | |
workers: [] | |
Are the above NCN groupings and exclusions correct? [yes,no] yes | |
INFO: Starting console logging on ncn-s003,ncn-s002,ncn-w001,ncn-w004,ncn-s001,ncn-w003,ncn-m003,ncn-m002,ncn-w002. | |
INFO: Powering on NCNs and waiting up to 900 seconds for them to be reachable via SSH: ncn-s001, ncn-s002, ncn-s003 | |
INFO: Sending IPMI power on command to host ncn-s003 | |
INFO: Sending IPMI power on command to host ncn-s001 | |
INFO: Sending IPMI power on command to host ncn-s002 | |
WARNING: /sat/venv/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ncn-s003: b'2836ef39c532a6c83a1dd530b3ae4a61' | |
WARNING: warnings.warn( | |
WARNING: /sat/venv/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ncn-s001: b'f32a79b747ea84f6e7c4de1b63e9268b' | |
WARNING: warnings.warn( | |
WARNING: /sat/venv/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ncn-s002: b'ce886c50d372500ed502bf81aab43261' | |
WARNING: warnings.warn( | |
INFO: Powered on NCNs: ncn-s001, ncn-s002, ncn-s003 | |
INFO: Unfreezing Ceph | |
INFO: Running command: ceph osd unset noout | |
INFO: Command output: noout is unset | |
INFO: Running command: ceph osd unset norecover | |
INFO: Command output: norecover is unset | |
INFO: Running command: ceph osd unset nobackfill | |
INFO: Command output: nobackfill is unset | |
INFO: Waiting up to 60 seconds for Ceph to become healthy after unfreeze | |
INFO: Checking Ceph health | |
INFO: Ceph is healthy. | |
INFO: Ceph unfreeze completed successfully on storage NCNs. | |
INFO: Checking whether ceph filesystem is mounted on /etc/cray/upgrade/csm. | |
INFO: ceph filesystem is already mounted on /etc/cray/upgrade/csm. | |
INFO: Checking whether fuse.s3fs filesystem is mounted on /var/opt/cray/sdu/collection-mount. | |
INFO: fuse.s3fs filesystem is already mounted on /var/opt/cray/sdu/collection-mount. | |
INFO: Checking whether fuse.s3fs filesystem is mounted on /var/opt/cray/config-data. | |
INFO: fuse.s3fs filesystem is already mounted on /var/opt/cray/config-data. | |
INFO: Successfully restarted 'cray-sdu-rda' service on ncn-m001 | |
INFO: Powering on NCNs and waiting up to 900 seconds for them to be reachable via SSH: ncn-m002, ncn-m003 | |
INFO: Sending IPMI power on command to host ncn-m002 | |
INFO: Sending IPMI power on command to host ncn-m003 | |
INFO: Powered on NCNs: ncn-m002, ncn-m003 | |
INFO: Powering on NCNs and waiting up to 900 seconds for them to be reachable via SSH: ncn-w001, ncn-w002, ncn-w003, ncn-w004 | |
INFO: Sending IPMI power on command to host ncn-w004 | |
INFO: Sending IPMI power on command to host ncn-w001 | |
INFO: Sending IPMI power on command to host ncn-w003 | |
INFO: Sending IPMI power on command to host ncn-w002 | |
INFO: Powered on NCNs: ncn-w001, ncn-w002, ncn-w003, ncn-w004 | |
INFO: Stopping console logging on ncn-s003,ncn-s002,ncn-w001,ncn-w004,ncn-s001,ncn-w003,ncn-m003,ncn-m002,ncn-w002. | |
INFO: Succeeded with boot of other management NCNs. | |
Status of cray-sdu-rda after booting:
ncn-m001:/var/opt/cray/sdu # systemctl status cray-sdu-rda | |
○ cray-sdu-rda.service - Cray SDU/RDA Container Service | |
Loaded: loaded (/usr/lib/systemd/system/cray-sdu-rda.service; disabled; vendor preset: disabled) | |
Active: inactive (dead) | |
Jul 22 20:42:47 ncn-m001 cray-sdu-rda[1217520]: Sending SIGKILL to remaining processes... | |
Jul 22 20:42:47 ncn-m001 cray-sdu-rda[1217520]: All filesystems, swaps, loop devices, MD devices and DM devices detached. | |
Jul 22 20:42:47 ncn-m001 cray-sdu-rda[1217520]: Halting system. | |
Jul 22 20:42:47 ncn-m001 cray-sdu-rda[1217975]: All filesystems, swaps, loop devices, MD devices and DM devices detached. | |
Jul 22 20:42:47 ncn-m001 cray-sdu-rda[1217975]: Halting system. | |
Jul 22 20:42:47 ncn-m001 podman[1217520]: 2024-07-22 20:42:47.748319518 +0000 UTC m=+182.398509359 container died 14413efb0a5> | |
Jul 22 20:42:48 ncn-m001 podman[1221317]: 2024-07-22 20:42:48.083903084 +0000 UTC m=+0.324355875 container remove 14413efb0a5> | |
Jul 22 20:42:48 ncn-m001 cray-sdu-rda[1221227]: flock: getting lock took 0.755862 seconds | |
Jul 22 20:42:48 ncn-m001 systemd[1]: cray-sdu-rda.service: Deactivated successfully. | |
Jul 22 20:42:48 ncn-m001 systemd[1]: Stopped Cray SDU/RDA Container Service. | |
ncn-m001:/var/opt/cray/sdu # ls -l | |
total 1 | |
lrwxrwxrwx 1 root root 16 Jul 22 20:39 collection -> collection-mount | |
drwxr-xr-x 3 root root 20 Jul 22 13:46 collection-local | |
drwxrwx--- 1 2370 2370 0 Jan 1 1970 collection-mount | |
drwxr-xr-x 2 root root 28 Jul 22 13:46 lock |
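The expectation in this scenario (a stopped service stays stopped) can be sketched as a guard around the restart. This is a minimal illustration, not the SAT implementation: the function name `restart_if_active` and the injectable `run` parameter are hypothetical, and only the standard `systemctl is-active` and `systemctl restart` subcommands are assumed.

```python
import subprocess


def restart_if_active(service, run=subprocess.run):
    """Restart `service` only if systemd reports it as active.

    Hypothetical helper: returns True if a restart was issued,
    False if the service was not running and was left alone.
    """
    # `systemctl is-active --quiet` exits 0 only when the unit is active.
    check = run(["systemctl", "is-active", "--quiet", service])
    if check.returncode != 0:
        return False  # service was not running; do not start it
    run(["systemctl", "restart", service], check=True)
    return True
```

Passing `run` as a parameter keeps the sketch testable without touching systemd; on a real node it would be called as `restart_if_active("cray-sdu-rda")`.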
# With cray-sdu-rda running, /var/opt/cray/sdu/collection-mount mounted via s3fs, and the /var/opt/cray/sdu/collection link already pointing at collection-mount.
In this case, the link should remain pointing at collection-mount.
Before booting:
ncn-m001:/var/opt/cray/sdu # ls -l | |
total 1 | |
lrwxrwxrwx 1 root root 16 Jul 26 06:33 collection -> collection-mount | |
drwxr-xr-x 3 root root 20 Jul 22 13:46 collection-local | |
drwxrwx--- 1 2370 2370 0 Jan 1 1970 collection-mount | |
drwxr-xr-x 2 root root 28 Jul 22 13:46 lock | |
ncn-m001:/mnt/developer/sann/26-7-24-sdu # sat bootsys boot --stage ncn-power --ncn-boot-timeout 900 | |
IPMI username: root | |
IPMI password: | |
The following Non-compute Nodes (NCNs) will be included in this operation: | |
managers: | |
- ncn-m002 | |
- ncn-m003 | |
storage: | |
- ncn-s001 | |
- ncn-s002 | |
- ncn-s003 | |
workers: | |
- ncn-w001 | |
- ncn-w002 | |
- ncn-w003 | |
- ncn-w004 | |
The following Non-compute Nodes (NCNs) will be excluded from this operation: | |
managers: | |
- ncn-m001 | |
storage: [] | |
workers: [] | |
Are the above NCN groupings and exclusions correct? [yes,no] yes | |
INFO: Starting console logging on ncn-w004,ncn-s002,ncn-w001,ncn-m002,ncn-m003,ncn-w003,ncn-s001,ncn-w002,ncn-s003. | |
INFO: Powering on NCNs and waiting up to 900 seconds for them to be reachable via SSH: ncn-s001, ncn-s002, ncn-s003 | |
INFO: Sending IPMI power on command to host ncn-s002 | |
INFO: Sending IPMI power on command to host ncn-s001 | |
INFO: Sending IPMI power on command to host ncn-s003 | |
WARNING: /sat/venv/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ncn-s001: b'f32a79b747ea84f6e7c4de1b63e9268b' | |
WARNING: warnings.warn( | |
WARNING: /sat/venv/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ncn-s002: b'ce886c50d372500ed502bf81aab43261' | |
WARNING: warnings.warn( | |
WARNING: /sat/venv/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ncn-s003: b'2836ef39c532a6c83a1dd530b3ae4a61' | |
WARNING: warnings.warn( | |
INFO: Powered on NCNs: ncn-s001, ncn-s002, ncn-s003 | |
INFO: Unfreezing Ceph | |
INFO: Running command: ceph osd unset noout | |
INFO: Command output: noout is unset | |
INFO: Running command: ceph osd unset norecover | |
INFO: Command output: norecover is unset | |
INFO: Running command: ceph osd unset nobackfill | |
INFO: Command output: nobackfill is unset | |
INFO: Waiting up to 60 seconds for Ceph to become healthy after unfreeze | |
INFO: Checking Ceph health | |
INFO: Ceph is healthy. | |
INFO: Ceph unfreeze completed successfully on storage NCNs. | |
INFO: Checking whether ceph filesystem is mounted on /etc/cray/upgrade/csm. | |
INFO: ceph filesystem is already mounted on /etc/cray/upgrade/csm. | |
INFO: Checking whether fuse.s3fs filesystem is mounted on /var/opt/cray/sdu/collection-mount. | |
INFO: fuse.s3fs filesystem is already mounted on /var/opt/cray/sdu/collection-mount. | |
INFO: Checking whether fuse.s3fs filesystem is mounted on /var/opt/cray/config-data. | |
INFO: fuse.s3fs filesystem is already mounted on /var/opt/cray/config-data. | |
INFO: Successfully restarted 'cray-sdu-rda' service on ncn-m001 | |
INFO: Powering on NCNs and waiting up to 900 seconds for them to be reachable via SSH: ncn-m002, ncn-m003 | |
INFO: Sending IPMI power on command to host ncn-m003 | |
INFO: Sending IPMI power on command to host ncn-m002 | |
INFO: Powered on NCNs: ncn-m002, ncn-m003 | |
INFO: Powering on NCNs and waiting up to 900 seconds for them to be reachable via SSH: ncn-w001, ncn-w002, ncn-w003, ncn-w004 | |
INFO: Sending IPMI power on command to host ncn-w004 | |
INFO: Sending IPMI power on command to host ncn-w003 | |
INFO: Sending IPMI power on command to host ncn-w001 | |
INFO: Sending IPMI power on command to host ncn-w002 | |
INFO: Powered on NCNs: ncn-w001, ncn-w002, ncn-w003, ncn-w004 | |
INFO: Stopping console logging on ncn-w004,ncn-s002,ncn-w001,ncn-m002,ncn-m003,ncn-w003,ncn-s001,ncn-w002,ncn-s003. | |
INFO: Succeeded with boot of other management NCNs. | |
After booting:
ncn-m001:/var/opt/cray/sdu # ls -l | |
total 1 | |
lrwxrwxrwx 1 root root 16 Jul 26 06:33 collection -> collection-mount | |
drwxr-xr-x 3 root root 20 Jul 22 13:46 collection-local | |
drwxrwx--- 1 2370 2370 0 Jan 1 1970 collection-mount | |
drwxr-xr-x 2 root root 28 Jul 22 13:46 lock | |
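The link handling exercised here (leave a correct link untouched, otherwise create or repoint it) can be sketched as an idempotent helper. This is an illustration under assumptions, not the code under test: `ensure_symlink` is a hypothetical name, and the relative target mirrors the `collection -> collection-mount` entry in the listings above.

```python
import os


def ensure_symlink(link_path, target):
    """Make `link_path` a symlink to `target`.

    Hypothetical helper: returns True if the link was created or
    repointed, False if it already pointed at `target`.
    """
    if os.path.islink(link_path):
        if os.readlink(link_path) == target:
            return False  # already correct; leave it untouched
        os.unlink(link_path)  # stale link pointing elsewhere
    elif os.path.lexists(link_path):
        # Refuse to replace a real file or directory.
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    os.symlink(target, link_path)
    return True
```

Because an already-correct link is not rewritten, its timestamp stays unchanged, which matches the unchanged `Jul 26 06:33` entry in the before and after listings for this scenario.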
# With cray-sdu-rda running and /var/opt/cray/sdu/collection-mount NOT mounted.
ncn-m001:/var/opt/cray/sdu # findmnt --type fuse.s3fs | |
TARGET SOURCE FSTYPE OPTIONS | |
/var/opt/cray/config-data s3fs fuse.s3fs rw,relatime,user_id=0,group_id=0,allow_other | |
ncn-m001:/mnt/developer/sann/26-7-24-sdu # sat bootsys boot --stage ncn-power --ncn-boot-timeout 900 | |
IPMI username: root | |
IPMI password: | |
The following Non-compute Nodes (NCNs) will be included in this operation: | |
managers: | |
- ncn-m002 | |
- ncn-m003 | |
storage: | |
- ncn-s001 | |
- ncn-s002 | |
- ncn-s003 | |
workers: | |
- ncn-w001 | |
- ncn-w002 | |
- ncn-w003 | |
- ncn-w004 | |
The following Non-compute Nodes (NCNs) will be excluded from this operation: | |
managers: | |
- ncn-m001 | |
storage: [] | |
workers: [] | |
Are the above NCN groupings and exclusions correct? [yes,no] yes | |
INFO: Starting console logging on ncn-s002,ncn-w002,ncn-w001,ncn-w003,ncn-s003,ncn-m003,ncn-m002,ncn-s001,ncn-w004. | |
INFO: Powering on NCNs and waiting up to 900 seconds for them to be reachable via SSH: ncn-s001, ncn-s002, ncn-s003 | |
INFO: Sending IPMI power on command to host ncn-s001 | |
INFO: Sending IPMI power on command to host ncn-s002 | |
INFO: Sending IPMI power on command to host ncn-s003 | |
WARNING: /sat/venv/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ncn-s001: b'f32a79b747ea84f6e7c4de1b63e9268b' | |
WARNING: warnings.warn( | |
WARNING: /sat/venv/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ncn-s002: b'ce886c50d372500ed502bf81aab43261' | |
WARNING: warnings.warn( | |
WARNING: /sat/venv/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ncn-s003: b'2836ef39c532a6c83a1dd530b3ae4a61' | |
WARNING: warnings.warn( | |
INFO: Powered on NCNs: ncn-s001, ncn-s002, ncn-s003 | |
INFO: Unfreezing Ceph | |
INFO: Running command: ceph osd unset noout | |
INFO: Command output: noout is unset | |
INFO: Running command: ceph osd unset norecover | |
INFO: Command output: norecover is unset | |
INFO: Running command: ceph osd unset nobackfill | |
INFO: Command output: nobackfill is unset | |
INFO: Waiting up to 60 seconds for Ceph to become healthy after unfreeze | |
INFO: Checking Ceph health | |
INFO: Ceph is healthy. | |
INFO: Ceph unfreeze completed successfully on storage NCNs. | |
INFO: Checking whether ceph filesystem is mounted on /etc/cray/upgrade/csm. | |
INFO: ceph filesystem is already mounted on /etc/cray/upgrade/csm. | |
INFO: Checking whether fuse.s3fs filesystem is mounted on /var/opt/cray/sdu/collection-mount. | |
INFO: fuse.s3fs filesystem is already mounted on /var/opt/cray/sdu/collection-mount. | |
INFO: Checking whether fuse.s3fs filesystem is mounted on /var/opt/cray/config-data. | |
INFO: fuse.s3fs filesystem is already mounted on /var/opt/cray/config-data. | |
INFO: Successfully restarted 'cray-sdu-rda' service on ncn-m001 | |
INFO: Powering on NCNs and waiting up to 900 seconds for them to be reachable via SSH: ncn-m002, ncn-m003 | |
INFO: Sending IPMI power on command to host ncn-m003 | |
INFO: Sending IPMI power on command to host ncn-m002 | |
INFO: Powered on NCNs: ncn-m002, ncn-m003 | |
INFO: Powering on NCNs and waiting up to 900 seconds for them to be reachable via SSH: ncn-w001, ncn-w002, ncn-w003, ncn-w004 | |
INFO: Sending IPMI power on command to host ncn-w002 | |
INFO: Sending IPMI power on command to host ncn-w004 | |
INFO: Sending IPMI power on command to host ncn-w001 | |
INFO: Sending IPMI power on command to host ncn-w003 | |
INFO: Powered on NCNs: ncn-w001, ncn-w002, ncn-w003, ncn-w004 | |
INFO: Stopping console logging on ncn-s002,ncn-w002,ncn-w001,ncn-w003,ncn-s003,ncn-m003,ncn-m002,ncn-s001,ncn-w004. | |
INFO: Succeeded with boot of other management NCNs. | |
ncn-m001:/var/opt/cray/sdu # findmnt --type fuse.s3fs | |
TARGET SOURCE FSTYPE OPTIONS | |
/var/opt/cray/config-data s3fs fuse.s3fs rw,relatime,user_id=0,group_id=0,allow_other | |
/var/opt/cray/sdu/collection-mount s3fs fuse.s3fs rw,relatime,user_id=0,group_id=0,allow_other | |
ncn-m001:/var/opt/cray/sdu # ls -l | |
total 1 | |
lrwxrwxrwx 1 root root 11 Jul 26 07:03 adhoc -> adhoc-local | |
drwxr-xr-x 2 root root 20 Jul 26 06:58 adhoc-local | |
lrwxrwxrwx 1 root root 16 Jul 26 07:03 collection -> collection-mount | |
drwxr-xr-x 3 root root 20 Jul 26 06:58 collection-local | |
drwxrwx--- 1 2370 2370 0 Jan 1 1970 collection-mount | |
drwxr-xr-x 2 root root 28 Jul 26 06:57 lock |
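The before and after `findmnt --type fuse.s3fs` checks can be automated by parsing the command's output. The sketch below assumes the flat list layout shown in this log (a header row, then one mount per line with the target in the first column); `mounted_targets` is a hypothetical name, and the sample text is copied from the pre-boot output above.

```python
def mounted_targets(findmnt_output):
    """Return the set of TARGET paths from `findmnt` list-style output."""
    targets = set()
    for line in findmnt_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "TARGET":
            continue
        targets.add(fields[0])
    return targets


SAMPLE = """\
TARGET SOURCE FSTYPE OPTIONS
/var/opt/cray/config-data s3fs fuse.s3fs rw,relatime,user_id=0,group_id=0,allow_other
"""

# Before booting, collection-mount is absent from the fuse.s3fs mounts.
print("/var/opt/cray/sdu/collection-mount" in mounted_targets(SAMPLE))  # prints False
```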
# With the /var/opt/cray/sdu/collection link not existing at all. In this case, it should be created and should point at collection-mount.
ncn-m001:/var/opt/cray/sdu # ls | |
adhoc adhoc-local collection-local collection-mount lock | |
ncn-m001:/var/opt/cray/sdu # ls -l | |
total 1 | |
lrwxrwxrwx 1 root root 11 Jul 26 08:54 adhoc -> adhoc-local | |
drwxr-xr-x 2 root root 20 Jul 26 06:58 adhoc-local | |
drwxr-xr-x 3 root root 20 Jul 26 06:58 collection-local | |
drwxrwx--- 1 2370 2370 0 Jan 1 1970 collection-mount | |
drwxr-xr-x 2 root root 28 Jul 26 06:57 lock | |
ncn-m001:/mnt/developer/sann/26-7-24-sdu # sat bootsys boot --stage ncn-power --ncn-boot-timeout 900 | |
IPMI username: root | |
IPMI password: | |
The following Non-compute Nodes (NCNs) will be included in this operation: | |
managers: | |
- ncn-m002 | |
- ncn-m003 | |
storage: | |
- ncn-s001 | |
- ncn-s002 | |
- ncn-s003 | |
workers: | |
- ncn-w001 | |
- ncn-w002 | |
- ncn-w003 | |
- ncn-w004 | |
The following Non-compute Nodes (NCNs) will be excluded from this operation: | |
managers: | |
- ncn-m001 | |
storage: [] | |
workers: [] | |
Are the above NCN groupings and exclusions correct? [yes,no] yes | |
INFO: Starting console logging on ncn-w004,ncn-m003,ncn-w002,ncn-s002,ncn-m002,ncn-s003,ncn-w003,ncn-s001,ncn-w001. | |
INFO: Powering on NCNs and waiting up to 900 seconds for them to be reachable via SSH: ncn-s001, ncn-s002, ncn-s003 | |
INFO: Sending IPMI power on command to host ncn-s002 | |
INFO: Sending IPMI power on command to host ncn-s001 | |
INFO: Sending IPMI power on command to host ncn-s003 | |
WARNING: /sat/venv/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ncn-s002: b'ce886c50d372500ed502bf81aab43261' | |
WARNING: warnings.warn( | |
WARNING: /sat/venv/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ncn-s001: b'f32a79b747ea84f6e7c4de1b63e9268b' | |
WARNING: warnings.warn( | |
WARNING: /sat/venv/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ncn-s003: b'2836ef39c532a6c83a1dd530b3ae4a61' | |
WARNING: warnings.warn( | |
INFO: Powered on NCNs: ncn-s001, ncn-s002, ncn-s003 | |
INFO: Unfreezing Ceph | |
INFO: Running command: ceph osd unset noout | |
INFO: Command output: noout is unset | |
INFO: Running command: ceph osd unset norecover | |
INFO: Command output: norecover is unset | |
INFO: Running command: ceph osd unset nobackfill | |
INFO: Command output: nobackfill is unset | |
INFO: Waiting up to 60 seconds for Ceph to become healthy after unfreeze | |
INFO: Checking Ceph health | |
INFO: Ceph is healthy. | |
INFO: Ceph unfreeze completed successfully on storage NCNs. | |
INFO: Checking whether ceph filesystem is mounted on /etc/cray/upgrade/csm. | |
INFO: ceph filesystem is already mounted on /etc/cray/upgrade/csm. | |
INFO: Checking whether fuse.s3fs filesystem is mounted on /var/opt/cray/sdu/collection-mount. | |
INFO: fuse.s3fs filesystem is already mounted on /var/opt/cray/sdu/collection-mount. | |
INFO: Checking whether fuse.s3fs filesystem is mounted on /var/opt/cray/config-data. | |
INFO: fuse.s3fs filesystem is already mounted on /var/opt/cray/config-data. | |
INFO: Successfully restarted 'cray-sdu-rda' service on ncn-m001 | |
INFO: Powering on NCNs and waiting up to 900 seconds for them to be reachable via SSH: ncn-m002, ncn-m003 | |
INFO: Sending IPMI power on command to host ncn-m002 | |
INFO: Sending IPMI power on command to host ncn-m003 | |
INFO: Powered on NCNs: ncn-m002, ncn-m003 | |
INFO: Powering on NCNs and waiting up to 900 seconds for them to be reachable via SSH: ncn-w001, ncn-w002, ncn-w003, ncn-w004 | |
INFO: Sending IPMI power on command to host ncn-w004 | |
INFO: Sending IPMI power on command to host ncn-w002 | |
INFO: Sending IPMI power on command to host ncn-w001 | |
INFO: Sending IPMI power on command to host ncn-w003 | |
INFO: Powered on NCNs: ncn-w001, ncn-w002, ncn-w003, ncn-w004 | |
INFO: Stopping console logging on ncn-w004,ncn-m003,ncn-w002,ncn-s002,ncn-m002,ncn-s003,ncn-w003,ncn-s001,ncn-w001. | |
INFO: Succeeded with boot of other management NCNs. | |
ncn-m001:/var/opt/cray/sdu # ls | |
adhoc adhoc-local collection collection-local collection-mount lock | |
ncn-m001:/var/opt/cray/sdu # ls -l | |
total 1 | |
lrwxrwxrwx 1 root root 11 Jul 26 08:55 adhoc -> adhoc-local | |
drwxr-xr-x 2 root root 20 Jul 26 06:58 adhoc-local | |
lrwxrwxrwx 1 root root 16 Jul 26 08:55 collection -> collection-mount | |
drwxr-xr-x 3 root root 20 Jul 26 06:58 collection-local | |
drwxrwx--- 1 2370 2370 0 Jan 1 1970 collection-mount | |
drwxr-xr-x 2 root root 28 Jul 26 06:57 lock |
# With /var/opt/cray/sdu/collection-mount unmounted and the /var/opt/cray/sdu/collection link removed. In this case, collection-mount should be mounted again, and the link should be created and should point at collection-mount.
ncn-m001:/var/opt/cray/sdu # findmnt --type fuse.s3fs | |
TARGET SOURCE FSTYPE OPTIONS | |
/var/opt/cray/config-data s3fs fuse.s3fs rw,relatime,user_id=0,group_id=0,allow_other | |
/var/opt/cray/sdu/collection-mount s3fs fuse.s3fs rw,relatime,user_id=0,group_id=0,allow_other | |
ncn-m001:/var/opt/cray/sdu # umount /var/opt/cray/sdu/collection-mount | |
ncn-m001:/var/opt/cray/sdu # ls | |
adhoc adhoc-local collection collection-local collection-mount lock | |
ncn-m001:/var/opt/cray/sdu # rm collection | |
ncn-m001:/var/opt/cray/sdu # ls -l | |
total 0 | |
lrwxrwxrwx 1 root root 11 Jul 26 08:55 adhoc -> adhoc-local | |
drwxr-xr-x 2 root root 20 Jul 26 06:58 adhoc-local | |
drwxr-xr-x 3 root root 20 Jul 26 06:58 collection-local | |
drwxr-x--- 2 root root 6 Jul 26 06:54 collection-mount | |
drwxr-xr-x 2 root root 28 Jul 26 06:57 lock | |
ncn-m001:/mnt/developer/sann/26-7-24-sdu # sat bootsys boot --stage ncn-power --ncn-boot-timeout 900 | |
IPMI username: root | |
IPMI password: | |
The following Non-compute Nodes (NCNs) will be included in this operation: | |
managers: | |
- ncn-m002 | |
- ncn-m003 | |
storage: | |
- ncn-s001 | |
- ncn-s002 | |
- ncn-s003 | |
workers: | |
- ncn-w001 | |
- ncn-w002 | |
- ncn-w003 | |
- ncn-w004 | |
The following Non-compute Nodes (NCNs) will be excluded from this operation: | |
managers: | |
- ncn-m001 | |
storage: [] | |
workers: [] | |
Are the above NCN groupings and exclusions correct? [yes,no] yes | |
INFO: Starting console logging on ncn-s003,ncn-m003,ncn-s001,ncn-w002,ncn-w001,ncn-w004,ncn-m002,ncn-s002,ncn-w003. | |
INFO: Powering on NCNs and waiting up to 900 seconds for them to be reachable via SSH: ncn-s001, ncn-s002, ncn-s003 | |
INFO: Sending IPMI power on command to host ncn-s002 | |
INFO: Sending IPMI power on command to host ncn-s001 | |
INFO: Sending IPMI power on command to host ncn-s003 | |
WARNING: /sat/venv/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ncn-s002: b'ce886c50d372500ed502bf81aab43261' | |
WARNING: warnings.warn( | |
WARNING: /sat/venv/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ncn-s001: b'f32a79b747ea84f6e7c4de1b63e9268b' | |
WARNING: warnings.warn( | |
WARNING: /sat/venv/lib/python3.9/site-packages/paramiko/client.py:889: UserWarning: Unknown ssh-ed25519 host key for ncn-s003: b'2836ef39c532a6c83a1dd530b3ae4a61' | |
WARNING: warnings.warn( | |
INFO: Powered on NCNs: ncn-s001, ncn-s002, ncn-s003 | |
INFO: Unfreezing Ceph | |
INFO: Running command: ceph osd unset noout | |
INFO: Command output: noout is unset | |
INFO: Running command: ceph osd unset norecover | |
INFO: Command output: norecover is unset | |
INFO: Running command: ceph osd unset nobackfill | |
INFO: Command output: nobackfill is unset | |
INFO: Waiting up to 60 seconds for Ceph to become healthy after unfreeze | |
INFO: Checking Ceph health | |
INFO: Ceph is healthy. | |
INFO: Ceph unfreeze completed successfully on storage NCNs. | |
INFO: Checking whether ceph filesystem is mounted on /etc/cray/upgrade/csm. | |
INFO: ceph filesystem is already mounted on /etc/cray/upgrade/csm. | |
INFO: Checking whether fuse.s3fs filesystem is mounted on /var/opt/cray/sdu/collection-mount. | |
INFO: fuse.s3fs filesystem is already mounted on /var/opt/cray/sdu/collection-mount. | |
INFO: Checking whether fuse.s3fs filesystem is mounted on /var/opt/cray/config-data. | |
INFO: fuse.s3fs filesystem is already mounted on /var/opt/cray/config-data. | |
INFO: Successfully restarted 'cray-sdu-rda' service on ncn-m001 | |
INFO: Powering on NCNs and waiting up to 900 seconds for them to be reachable via SSH: ncn-m002, ncn-m003 | |
INFO: Sending IPMI power on command to host ncn-m002 | |
INFO: Sending IPMI power on command to host ncn-m003 | |
INFO: Powered on NCNs: ncn-m002, ncn-m003 | |
INFO: Powering on NCNs and waiting up to 900 seconds for them to be reachable via SSH: ncn-w001, ncn-w002, ncn-w003, ncn-w004 | |
INFO: Sending IPMI power on command to host ncn-w004 | |
INFO: Sending IPMI power on command to host ncn-w001 | |
INFO: Sending IPMI power on command to host ncn-w003 | |
INFO: Sending IPMI power on command to host ncn-w002 | |
INFO: Powered on NCNs: ncn-w001, ncn-w002, ncn-w003, ncn-w004 | |
INFO: Stopping console logging on ncn-s003,ncn-m003,ncn-s001,ncn-w002,ncn-w001,ncn-w004,ncn-m002,ncn-s002,ncn-w003. | |
INFO: Succeeded with boot of other management NCNs. | |
ncn-m001:/var/opt/cray/sdu # findmnt --type fuse.s3fs | |
TARGET SOURCE FSTYPE OPTIONS | |
/var/opt/cray/config-data s3fs fuse.s3fs rw,relatime,user_id=0,group_id=0,allow_other | |
/var/opt/cray/sdu/collection-mount s3fs fuse.s3fs rw,relatime,user_id=0,group_id=0,allow_other | |
ncn-m001:/var/opt/cray/sdu # ls -l | |
total 1 | |
lrwxrwxrwx 1 root root 11 Jul 26 09:01 adhoc -> adhoc-local | |
drwxr-xr-x 2 root root 20 Jul 26 06:58 adhoc-local | |
lrwxrwxrwx 1 root root 16 Jul 26 09:01 collection -> collection-mount | |
drwxr-xr-x 3 root root 20 Jul 26 06:58 collection-local | |
drwxrwx--- 1 2370 2370 0 Jan 1 1970 collection-mount | |
drwxr-xr-x 2 root root 28 Jul 26 06:57 lock |