I had to reset a Ceph host after its OS broke.

Resetting a Ceph host after an OS crash or failed update

Preparing cluster

First you need to set the noout and norebalance flags on the cluster to ensure that no data is moved around during the process. This is not strictly required, but you could lose a lot of time moving data back and forth, so it's good practice.

ceph osd set noout
ceph osd set norebalance
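
If you want to double-check that the flags took effect, both the cluster status and the OSD map list them. This is just a sanity check and not strictly part of the procedure.

ceph -s
ceph osd dump | grep flags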

Reinstall

Reinstalling was quite easy but required me to move the computer, open it up, detach the OSD drive, plug in a USB drive with the operating system, and install a fresh system.

Update and upgrade

After a new install I usually upgrade all packages to the latest version. If I have installed an older release, I upgrade to the appropriate version first, so I'm in a good state before adding any other software.

apt update
apt upgrade

Adding Ceph software

Adding the Ceph software is as easy as adding the Ceph release key and the Pacific package repository for Buster, then updating and installing the ceph and ceph-common packages.

wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
echo deb https://download.ceph.com/debian-pacific/ buster main | sudo tee /etc/apt/sources.list.d/ceph.list
apt update
apt install ceph ceph-common
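
To make sure the packages installed correctly, you can check the installed version, which should report a Pacific (16.x) release.

ceph --version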

Adding Smartmontools

We also want the latest smartmontools, which is backported from bullseye, so we add the buster-backports repository.

echo deb http://deb.debian.org/debian buster-backports main | sudo tee -a /etc/apt/sources.list
vi /etc/apt/sources.list
apt update
apt install smartmontools/buster-backports
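
To confirm the backported version is in use, you can check the version and run a quick health check against one of the drives; /dev/sda here is just an example device name.

smartctl --version
smartctl -H /dev/sda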

Reboot before configuration

This is a good spot to reboot the machine to ensure that all packages and libraries are loaded correctly for kernel access.

shutdown -r now

Configuring MON

Next I configured the cluster by opening ceph.conf and ceph.client.admin.keyring and adding the same information I have on the other cluster members. There are a couple of values in the configuration file that are specific to each host, so I updated those, but that was a minor effort.

cd /etc/ceph/
vi ceph.conf
vi ceph.client.admin.keyring
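
For reference, a minimal ceph.conf for a setup like this could look something like the following. The fsid, host names and addresses below are only placeholders and need to match what the other cluster members use.

[global]
fsid = 00000000-0000-0000-0000-000000000000
mon_initial_members = node1, node2, node5
mon_host = 192.168.1.11,192.168.1.12,192.168.1.15
public_network = 192.168.1.0/24
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx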

Next we set up the monitor by fetching the keyring and monitor map from the cluster and then creating the local resources needed to run it.

mkdir /var/lib/ceph/mon/ceph-node5
ceph auth get mon. -o /tmp/monkey
ceph mon getmap -o /tmp/monmap
ceph-mon -i node5 --mkfs --monmap /tmp/monmap --keyring /tmp/monkey 
chown ceph:ceph -R /var/lib/ceph/mon/

Starting the service is also pretty straightforward; we just need to remember to enable it so it will start on the next reboot.

systemctl status ceph-mon@node5.service
systemctl start ceph-mon@node5.service
systemctl status ceph-mon@node5.service
systemctl enable ceph-mon@node5.service
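
To verify that the monitor rejoined the cluster, you can check that it shows up in the quorum.

ceph mon stat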

Configuring MGR

We also want to set up the manager daemon, which among other things provides the graphical dashboard. Creating its directory and adding a keyring is enough.

mkdir /var/lib/ceph/mgr/ceph-node5
ceph auth get-or-create mgr.node5 mon 'allow profile mgr' osd 'allow *' mds 'allow *' > /var/lib/ceph/mgr/ceph-node5/keyring
chown ceph:ceph -R /var/lib/ceph/mgr/

Again we need to start the service and ensure to enable it in order to start when the computer is rebooted.

systemctl status ceph-mgr@node5.service
systemctl start ceph-mgr@node5.service
systemctl status ceph-mgr@node5.service
systemctl enable ceph-mgr@node5.service
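
To check that the manager came up, you can look at the manager status, which should show an active manager and any standbys.

ceph mgr stat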

Configuring MDS

Last but not least we have the MDS service. It works only in memory and uses the cluster to store its information, so it also just requires a key to get up and running, and then we can start it as usual.

mkdir /var/lib/ceph/mds/ceph-node5
ceph auth get-or-create mds.node5 mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-node5/keyring
chown ceph:ceph -R /var/lib/ceph/mds

Again we need to start the service and ensure to enable it in order to start when the computer is rebooted.

systemctl status ceph-mds@node5.service
systemctl start ceph-mds@node5.service
systemctl status ceph-mds@node5.service
systemctl enable ceph-mds@node5.service
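
To confirm the MDS registered with the cluster, you can check the MDS status; depending on the filesystem layout it should show up as active or standby.

ceph mds stat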

Configuring OSD

The last part, setting up an OSD, is a little more involved if you do it from scratch, but getting an old host up and running again was fairly easy. Creating the config directory and adding the keyring is standard practice at this point.

mkdir /var/lib/ceph/osd/ceph-3
ceph auth get osd.3 > /var/lib/ceph/osd/ceph-3/keyring
chown -R ceph:ceph /var/lib/ceph/osd/

Next I listed all the LVM devices on the host and realized that I already had the correct drive configured and ready, so I only had to activate it and then enable the service to ensure it starts again after a reboot.

ceph-volume lvm list
ceph-volume lvm activate --all
systemctl status ceph-osd@3.service
systemctl enable ceph-osd@3.service
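
To verify that the OSD is back in the cluster, you can look at the OSD tree, where osd.3 should be reported as up.

ceph osd tree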

Last step

When all is said and done it worked, and I could clear the status flags on the cluster so it could return to normal operation.

ceph osd unset noout
ceph osd unset norebalance
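
Finally, a quick look at the overall cluster health should confirm that the flags are gone and everything is back to normal.

ceph -s
ceph health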