@mdbooth
Last active February 9, 2024 16:24
OpenShift on OpenStack with etcd on local ephemeral disk
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-var-lib-etcd
spec:
  config:
    ignition:
      version: 3.4.0
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Mount local-etcd to /var/lib/etcd
          [Mount]
          # This must be mounted by device, not label, to ensure systemd generates
          # the device dependency we use below to trigger filesystem creation.
          What=/dev/disk/by-label/local-etcd
          Where=/var/lib/etcd
          Type=xfs
          Options=defaults,prjquota
          [Install]
          WantedBy=local-fs.target
        enabled: true
        name: var-lib-etcd.mount
      - contents: |
          [Unit]
          Description=Create local-etcd filesystem
          DefaultDependencies=no
          After=local-fs-pre.target
          # Don't run if the filesystem already exists
          ConditionPathIsSymbolicLink=!/dev/disk/by-label/local-etcd
          [Service]
          Type=oneshot
          RemainAfterExit=yes
          # Fail with an obvious message if /dev/disk/by-label/ephemeral0
          # doesn't exist.
          # This is important so we can fail the device unit and therefore the
          # mount immediately without a timeout.
          ExecStart=/bin/bash -c "[ -L /dev/disk/by-label/ephemeral0 ] || ( >&2 echo Ephemeral disk does not exist; /usr/bin/false )"
          ExecStart=/usr/sbin/mkfs.xfs -f -L local-etcd /dev/disk/by-label/ephemeral0
          [Install]
          # The mount unit has an implicit dependency on its device. We run as
          # a dependency of the device unit. This allows us to create the device
          # if required, or to fail fast if it cannot be created.
          RequiredBy=dev-disk-by\x2dlabel-local\x2detcd.device
        enabled: true
        name: create-local-etcd.service
      - contents: |
          [Unit]
          Description=Migrate existing data to local etcd
          # Run after /var/lib/etcd is mounted, but before crio starts so etcd
          # isn't running yet.
          After=var-lib-etcd.mount
          Before=crio.service
          # Only migrate etcd data if /var/lib/etcd is mounted, doesn't contain
          # a member directory, and the ostree does
          Requisite=var-lib-etcd.mount
          ConditionPathExists=!/var/lib/etcd/member
          ConditionPathIsDirectory=/sysroot/ostree/deploy/rhcos/var/lib/etcd/member
          [Service]
          Type=oneshot
          RemainAfterExit=yes
          # Clean up any previous migration state
          ExecStart=/bin/bash -c "if [ -d /var/lib/etcd/member.migrate ]; then rm -rf /var/lib/etcd/member.migrate; fi"
          # Copy and move in separate steps to ensure atomic creation of a
          # complete member directory
          ExecStart=/usr/bin/cp -aZ /sysroot/ostree/deploy/rhcos/var/lib/etcd/member/ /var/lib/etcd/member.migrate
          ExecStart=/usr/bin/mv /var/lib/etcd/member.migrate /var/lib/etcd/member
          [Install]
          RequiredBy=var-lib-etcd.mount
        enabled: true
        name: migrate-to-local-etcd.service
      - contents: |
          [Unit]
          Description=Relabel /var/lib/etcd
          # Run after we've migrated any existing content, but before crio so
          # etcd isn't running yet.
          After=migrate-to-local-etcd.service
          Before=crio.service
          # Only if /var/lib/etcd is mounted
          Requisite=var-lib-etcd.mount
          [Service]
          Type=oneshot
          RemainAfterExit=yes
          # Do a quick check of the mountpoint directory before doing a full recursive relabel
          # If restorecon /var/lib/etcd would not relabel the directory, don't
          # run the recursive relabel
          ExecCondition=/bin/bash -c "[ -n \"$(restorecon -nv /var/lib/etcd)\" ]"
          ExecStart=/usr/sbin/restorecon -R /var/lib/etcd
          [Install]
          RequiredBy=var-lib-etcd.mount
        enabled: true
        name: relabel-var-lib-etcd.service
@mdbooth
Author

mdbooth commented Feb 8, 2024

Test plan:

Prep

  • Install a cluster whose control plane nodes use root volumes
  • Update the CPMS (ControlPlaneMachineSet) to add local etcd
    • Update flavor to use quicklab.amd.ocp3.master
    • Add additionalBlockDevices stanza
  • Wait for rollout to complete
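
For illustration, the CPMS change described in the prep steps might look roughly like the fragment below. This is a sketch rather than the exact manifest used for the test: the flavor name comes from the test plan, while the block device name `etcd` and the 10 GiB size are assumptions chosen for the example, and all other provider spec fields are omitted. The MachineConfig above expects the resulting disk to be visible as /dev/disk/by-label/ephemeral0.

```yaml
# Partial ControlPlaneMachineSet showing only the fields touched in Prep.
# Device name and size are illustrative assumptions, not taken from the gist.
apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  name: cluster
  namespace: openshift-machine-api
spec:
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      spec:
        providerSpec:
          value:
            # Flavor from the test plan; it must provide local ephemeral storage.
            flavor: quicklab.amd.ocp3.master
            additionalBlockDevices:
            - name: etcd        # assumed name
              sizeGiB: 10       # assumed size
              storage:
                type: Local     # local ephemeral disk instead of a volume
```

Removing the additionalBlockDevices entry again corresponds to the "remove local etcd" step used later in Test 2.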

Test 1

This test applies the MachineConfig in place to nodes that already have an ephemeral disk. To work correctly, it must migrate the etcd data from the root disk to the new etcd mount.

  • Apply MachineConfig
  • Wait for rollout to complete

Verify:

  • The cluster is operating correctly
  • /var/lib/etcd is a new mount
  • /var/lib/etcd has the correct SELinux labels

Test 2

This tests that the MachineConfig operates correctly when configured on a machine with no ephemeral disk.

  • Update CPMS to remove local etcd
  • Wait for rollout to complete

Verify:

  • The cluster is operating correctly
  • There is one additional failed systemd service, create-local-etcd.service, which fails because the ephemeral disk does not exist

Test 3

This tests that it is possible to remove the local etcd configuration when it is not in use.

  • Remove MachineConfig

Verify:

  • The cluster is operating correctly
  • There are no additional failed systemd services

Test 4

This tests that the MachineConfig works correctly when deployed on a new machine without any existing etcd state to migrate.

  • Apply MachineConfig again
  • Wait for rollout to complete
  • Update CPMS to add local etcd
  • Wait for rollout to complete

Verify:

  • The cluster is operating correctly
  • /var/lib/etcd is a new mount
  • /var/lib/etcd has the correct SELinux labels

@EmilienM

EmilienM commented Feb 8, 2024

This should be covered by openshift/release#48616
