ruo91/ocp3-etcd-backup-restore.md

OpenShift v3.11 - ETCD Backup and Restore

This document describes how to back up etcd on OpenShift v3.11, and how to restore it if the cluster becomes unhealthy or a user damages object resources.

Note that OpenShift v3.11 has reached End of Life (EoL) and End of Service (EoS) at Red Hat, so Red Hat no longer supports it.
1. ETCD Backup

etcd is the key/value database that stores all of the state used by Kubernetes.

There are two main ways to back up etcd.
1.1. Backup with a Command

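On OpenShift v3.11, a snapshot backup can be taken with the etcdctl command on the Control Plane (master nodes).

The etcdctl command is available once the etcd package is installed.

To use it, run the command on each master node and specify the path and file name for the backup file.
- Install the ETCD package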

[root@master ~]# yum install etcd

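- Check the backup directory location

Because the snapshot path must also exist inside the container that runs etcd as a static pod on the master, it is best to save backups under /var/lib/etcd/. The etcd.yaml manifest mounts this host directory into the container as a hostPath volume (see volumeMounts and volumes in the output below).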
[root@master ~]# cat /etc/origin/node/pods/etcd.yaml | sed -n '41,65p'
    name: etcd
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /etc/etcd/
      name: master-config
      readOnly: true
    - mountPath: /var/lib/etcd/
      name: master-data
    - mountPath: /etc/localtime
      name: host-localtime
    workingDir: /var/lib/etcd
  hostNetwork: true
  priorityClassName: system-node-critical
  restartPolicy: Always
  volumes:
  - hostPath:
      path: /etc/etcd/
    name: master-config
  - hostPath:
      path: /var/lib/etcd
    name: master-data
  - hostPath:
      path: /etc/localtime
    name: host-localtime

- Create the backup directory and set permissions

[root@master ~]# mkdir -p /var/lib/etcd/backup/
[root@master ~]# chown etcd:etcd -R /var/lib/etcd/backup/
[root@master ~]# restorecon -Rv /var/lib/etcd/backup/

- Take an ETCD snapshot backup

[root@master ~]# etcdctl3 snapshot save /var/lib/etcd/backup/snapshot-$(date +%Y-%m-%d).db
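In a script, the `etcdctl3` shortcut used above may not be available, since it is set up for interactive shells on the masters (an assumption about your environment). The sketch below is a hypothetical `etcd_snapshot_save` function that calls `etcdctl` with explicit TLS flags and then runs `snapshot status`, so an empty or corrupt file is noticed immediately. The certificate paths are copied from the CronJob in section 1.2; the `https://${HOSTNAME}:2379` endpoint is an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: dated etcd v3 snapshot on one master, then a status check.
# Assumptions: TLS paths mirror the CronJob in section 1.2 and the local
# member answers at https://${HOSTNAME}:2379. Names are hypothetical.
BACKUP_DIR="${BACKUP_DIR:-/var/lib/etcd/backup}"

etcd_snapshot_save() {
  local endpoint="${ETCD_ENDPOINT:-https://${HOSTNAME}:2379}"
  local file="${BACKUP_DIR}/snapshot-$(date +%Y-%m-%d).db"

  mkdir -p "${BACKUP_DIR}" || return 1

  # Call etcdctl directly with explicit TLS flags instead of the etcdctl3 shortcut.
  ETCDCTL_API=3 etcdctl \
    --cert /etc/etcd/peer.crt \
    --key /etc/etcd/peer.key \
    --cacert /etc/etcd/ca.crt \
    --endpoints "${endpoint}" \
    snapshot save "${file}" >&2 || return 1

  # Prints hash, revision, key count and size; fails on an unreadable file.
  ETCDCTL_API=3 etcdctl snapshot status "${file}" --write-out=table >&2 || return 1

  # Emit the path on stdout so callers can copy or log it.
  echo "${file}"
}
```

Source the file on a master and run `etcd_snapshot_save`; the snapshot path is printed on stdout and the etcdctl output on stderr.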

- Keep a copy of the ETCD snapshot file

Copy the backed-up snapshot file to the bastion host.
(Copy the backup from only one of the master01, master02, and master03 nodes.)
[root@bastion ~]# scp root@master01:/var/lib/etcd/backup/snapshot-*.db /opt/
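The glob above copies every snapshot on the master, and a truncated transfer would go unnoticed. A sketch, assuming root SSH from the bastion and the `snapshot-YYYY-MM-DD.db` naming used above, fetches only the newest snapshot from one master and compares SHA-256 checksums on both ends. `fetch_latest_snapshot` and `REMOTE_BACKUP_DIR` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: copy only the newest snapshot from one master and verify the copy.
# Assumptions: root SSH/scp from the bastion, sha256sum on both hosts, and the
# snapshot-YYYY-MM-DD.db naming from section 1.1. Names are hypothetical.
REMOTE_BACKUP_DIR="${REMOTE_BACKUP_DIR:-/var/lib/etcd/backup}"

fetch_latest_snapshot() {
  local master="$1" dest="${2:-/opt}"
  local remote="" remote_sum="" local_sum="" copy=""

  # ISO dates in the file names sort chronologically, so the last one is newest.
  remote="$(ssh "root@${master}" \
    "ls -1 ${REMOTE_BACKUP_DIR}/snapshot-*.db | sort | tail -n 1")"
  if [ -z "${remote}" ]; then
    echo "no snapshot found on ${master}" >&2
    return 1
  fi

  scp "root@${master}:${remote}" "${dest}/" || return 1
  copy="${dest}/$(basename "${remote}")"

  # A checksum mismatch means the transfer was truncated or corrupted.
  remote_sum="$(ssh "root@${master}" "sha256sum '${remote}'" | awk '{print $1}')"
  local_sum="$(sha256sum "${copy}" | awk '{print $1}')"
  if [ -z "${remote_sum}" ] || [ "${remote_sum}" != "${local_sum}" ]; then
    echo "checksum mismatch for ${copy}" >&2
    return 1
  fi

  echo "${copy}"
}
```

For example, `fetch_latest_snapshot master01 /opt` prints the path of the verified copy.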

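1.2. Backup with a CronJob

This method uses the Kubernetes CronJob feature to run the steps from "1.1. Backup with a Command" automatically.

It is set up by creating four object resources in the kube-system namespace.
Before setting it up, create the directory for etcd snapshots on every master node.
- Create the backup directory and set permissions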

[root@master ~]# mkdir -p /var/lib/etcd/backup/
[root@master ~]# chown etcd:etcd -R /var/lib/etcd/backup/
[root@master ~]# restorecon -Rv /var/lib/etcd/backup/
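Because the directory has to exist on every master, the three commands above can also be run from the bastion in one loop, in the same way as the SSH loops in section 2. This sketch assumes root SSH access; the host names and the `prepare_backup_dirs` function name are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: prepare /var/lib/etcd/backup/ on every master from the bastion.
# Assumptions: root SSH access and the example host names below.
MASTERS=(master01.ocp3.local master02.ocp3.local master03.ocp3.local)

prepare_backup_dirs() {
  local m rc=0
  for m in "${MASTERS[@]}"; do
    # Keep going on failure so every master is attempted; report at the end.
    if ! ssh "root@${m}" 'mkdir -p /var/lib/etcd/backup/ &&
        chown -R etcd:etcd /var/lib/etcd/backup/ &&
        restorecon -Rv /var/lib/etcd/backup/'; then
      echo "failed on ${m}" >&2
      rc=1
    fi
  done
  return "${rc}"
}
```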

- Create the Service Account

Create the Service Account that will run the CronJob.
[root@bastion ~]# vi 00_service-account.yaml
kind: ServiceAccount
apiVersion: v1
metadata:
  name: cluster-backup
  namespace: kube-system
  labels:
    cluster-backup: "true"
[root@bastion ~]# oc create -f 00_service-account.yaml

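- Create the Cluster Role

Define the cluster permissions that the Service Account will receive. This ClusterRole allows every verb on every resource in every API group, as well as all non-resource URLs, so it effectively grants full access to the cluster.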
[root@bastion ~]# vi 01_cluster-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-backup
rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'
- nonResourceURLs:
  - '*'
  verbs:
  - '*'
[root@bastion ~]# oc create -f 01_cluster-role.yaml

- Create the Cluster Role Binding

Create a ClusterRoleBinding that binds the ClusterRole to the Service Account.
[root@bastion ~]# vi 02_cluster-role-binding.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-backup
  labels:
    cluster-backup: "true"
subjects:
  - kind: ServiceAccount
    name: cluster-backup
    namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-backup
[root@bastion ~]# oc create -f 02_cluster-role-binding.yaml
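The binding can be checked before the CronJob exists by impersonating the Service Account. The sketch asks the API server whether `cluster-backup` may perform roughly the operations the CronJob command uses: listing pods and nodes, and running `oc exec` in kube-system. It assumes your `oc` client provides `oc auth can-i` with the `--subresource` flag and that the command exits non-zero when access is denied; `check_backup_sa` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: check, by impersonation, that the cluster-backup ServiceAccount may
# do what the CronJob command does. Assumes `oc auth can-i` exits non-zero
# when access is denied. Names are hypothetical.
SA_USER="system:serviceaccount:kube-system:cluster-backup"

check_backup_sa() {
  local failed=0 check verb resource sub
  # Each entry is: verb resource [subresource]
  for check in "list pods" "get pods" "create pods exec" "list nodes"; do
    read -r verb resource sub <<< "${check}"
    if oc auth can-i "${verb}" "${resource}" ${sub:+--subresource="${sub}"} \
        --as="${SA_USER}" -n kube-system > /dev/null; then
      echo "ok   ${check}"
    else
      echo "FAIL ${check}"
      failed=1
    fi
  done
  return "${failed}"
}
```

The same four checks also outline what a narrower ClusterRole would have to allow if the wildcard role is replaced later.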

- Create the CronJob

Every Sunday at 00:30, the job creates a directory named after the run date, saves an etcd snapshot into it, and deletes backup directories older than 7 days.
[root@bastion ~]# vi 03_cronjobs-etcd-backup.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  # Sunday, 00:30
  schedule: "30 0 * * 0"
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  concurrencyPolicy: Forbid
  suspend: false
  jobTemplate:
    metadata:
      creationTimestamp: null
      labels:
       etcd-backup: "true"
    spec:
      backoffLimit: 0
      template:
        metadata:
          creationTimestamp: null
          labels:
            etcd-backup: "true"
        spec:
          containers:
          - name: etcd-backup
            args:
            - "-c"
            - oc get pod -n kube-system -o name | cut -d '/' -f '2' | grep 'master-etcd' | xargs -I {} -- oc exec {} -n kube-system -c etcd -- bash -c "mkdir -p /var/lib/etcd/backup/$(date +%Y-%m-%d)/ && ETCDCTL_API=3 etcdctl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --endpoints https://$(oc get node -l node-role.kubernetes.io/master --no-headers -o name | cut -d '/' -f '2' | sed -n 1p):2379,https://$(oc get node -l node-role.kubernetes.io/master --no-headers -o name | cut -d '/' -f '2' | sed -n 2p):2379,https://$(oc get node -l node-role.kubernetes.io/master --no-headers -o name | cut -d '/' -f '2' | sed -n 3p):2379 snapshot save /var/lib/etcd/backup/$(date +%Y-%m-%d)/snapshot.db && find /var/lib/etcd/backup/ -mindepth 1 -maxdepth 1 -type d -ctime +7 -print0 | xargs -0 -r rm -rf"
            command:
            - "/bin/bash"
            image: "registry.ocp3.local:5000/openshift3/ose-cli:v3.11"
            imagePullPolicy: IfNotPresent
            resources:
              requests:
                cpu: 100m
                memory: 256Mi
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: FallbackToLogsOnError
          securityContext:
            privileged: true
            runAsUser: 0
          tolerations:
            - operator: Exists
          nodeSelector:
            node-role.kubernetes.io/master: 'true'
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          serviceAccount: cluster-backup
          serviceAccountName: cluster-backup
          terminationGracePeriodSeconds: 30
          activeDeadlineSeconds: 500
[root@bastion ~]# oc create -f 03_cronjobs-etcd-backup.yaml

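The CronJob command is one long line. Two of its parts are easier to reason about as functions, shown here as a sketch with hypothetical names: building the endpoint list from however many masters exist (instead of three hard-coded `sed -n Np` lookups), and pruning old dated directories. `find -delete` can only remove empty directories, so pruning uses `rm -rf`. This sketch prunes by `-mtime`, which for a directory written once per run tracks the run date; `-ctime` also changes when `chown` or `restorecon` touches the directory.

```bash
#!/usr/bin/env bash
# Sketch: two parts of the CronJob one-liner as functions (hypothetical names).
# Assumption: `oc` is logged in with rights to list nodes.

# Print "https://<master>:2379,..." for every node labelled as a master.
etcd_endpoints() {
  oc get node -l node-role.kubernetes.io/master --no-headers -o name \
    | cut -d '/' -f 2 \
    | sed 's|.*|https://&:2379|' \
    | paste -sd ',' -
}

# Remove direct subdirectories whose mtime is more than N full days ago.
# -mindepth/-maxdepth 1 keep the backup root itself and deeper paths untouched;
# rm -rf is needed because `find -delete` cannot remove non-empty directories.
prune_backups() {
  local dir="$1" days="${2:-7}"
  find "${dir}" -mindepth 1 -maxdepth 1 -type d -mtime +"${days}" -exec rm -rf {} +
}
```

Inside the job, `--endpoints "$(etcd_endpoints)"` could stand in for the three lookups, and `prune_backups /var/lib/etcd/backup 7` for the trailing `find`.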
- Check the CronJob

[root@bastion ~]# oc get cronjobs -n kube-system
NAME          SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
etcd-backup   30 0 * * 0   False     1         2s              25s

- Check the Jobs that ran the backup

[root@bastion ~]# oc get jobs -n kube-system
NAME                     DESIRED   SUCCESSFUL   AGE
etcd-backup-1675047600   1         1            22s

- Check the Pod that ran the backup

[root@bastion ~]# oc get pod -l etcd-backup -n kube-system
NAME                           READY     STATUS      RESTARTS   AGE
etcd-backup-1675047600-6nq9z   0/1       Completed   0          30s

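- Check the backup Pod logs

The ose-cli pod runs oc exec into the etcd static pod on each master node to take the backup, so one "Snapshot saved" line is printed per master.
[root@bastion ~]# oc logs -f etcd-backup-1675047600-6nq9z -n kube-system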
Snapshot saved at /var/lib/etcd/backup/2023-01-30/snapshot.db
Snapshot saved at /var/lib/etcd/backup/2023-01-30/snapshot.db
Snapshot saved at /var/lib/etcd/backup/2023-01-30/snapshot.db
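The log only shows that etcdctl reported success. To confirm that each master actually holds a non-empty snapshot on its host filesystem, a check like the following can be run from the bastion. It assumes root SSH access and the `/var/lib/etcd/backup/YYYY-MM-DD/snapshot.db` layout created by the CronJob; `verify_cron_snapshots` and `BACKUP_ROOT` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: check that every master holds a non-empty snapshot for a given day.
# Assumptions: root SSH from the bastion and the CronJob's
# <root>/YYYY-MM-DD/snapshot.db layout. Names are hypothetical.
MASTERS=(master01.ocp3.local master02.ocp3.local master03.ocp3.local)
BACKUP_ROOT="${BACKUP_ROOT:-/var/lib/etcd/backup}"

verify_cron_snapshots() {
  local day="${1:-$(date +%Y-%m-%d)}" m missing=0
  for m in "${MASTERS[@]}"; do
    # `test -s` is true only for an existing file larger than zero bytes.
    if ssh "root@${m}" "test -s '${BACKUP_ROOT}/${day}/snapshot.db'"; then
      echo "${m}: ok"
    else
      echo "${m}: missing or empty"
      missing=1
    fi
  done
  return "${missing}"
}
```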

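2. ETCD Restore

The restore is performed on each master node, using the snapshot file created during the backup.
2.1. Stop the Static Pods

On all master nodes, temporarily move the static pod manifests (etcd.yaml, apiserver.yaml, and controller.yaml) to another directory.
- Create a temporary directory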

[root@bastion ~]# for masters in {master01,master02,master03}; do
  ssh root@$masters.ocp3.local "mkdir -p /etc/origin/node/pods-stopped";
done

- Move the YAML files

[root@bastion ~]# for masters in {master01,master02,master03}; do
  ssh root@$masters.ocp3.local "mv /etc/origin/node/pods/* /etc/origin/node/pods-stopped";
done
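The kubelet stops the static pods some time after their manifests disappear, so it is safer to wait until etcd is really down before deleting its data. The sketch below polls each master over SSH. It assumes the Docker runtime, where the kubelet names containers `k8s_<container>_<pod>_...`; on CRI-O, `STOP_CHECK` would need a different command, such as `crictl ps -q --name etcd`. The function and variable names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: wait until no etcd container is running on any master.
# Assumptions: root SSH from the bastion and the Docker runtime, where the
# kubelet names containers k8s_<container>_<pod>_...; on CRI-O, set
# STOP_CHECK to something like "crictl ps -q --name etcd". Names are hypothetical.
MASTERS=(master01.ocp3.local master02.ocp3.local master03.ocp3.local)
STOP_CHECK="${STOP_CHECK:-docker ps -q --filter name=k8s_etcd_}"

wait_for_etcd_stopped() {
  local tries="${1:-30}" interval="${2:-10}" i m running=""
  for ((i = 1; i <= tries; i++)); do
    running=""
    for m in "${MASTERS[@]}"; do
      # Any container ID in the output means etcd is still up on that master.
      if [ -n "$(ssh "root@${m}" "${STOP_CHECK}")" ]; then
        running="${running} ${m}"
      fi
    done
    if [ -z "${running}" ]; then
      echo "etcd stopped on all masters"
      return 0
    fi
    echo "attempt ${i}/${tries}: etcd still running on${running}" >&2
    sleep "${interval}"
  done
  return 1
}
```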

2.2. Copy the ETCD Snapshot Backup File

Designate one of the master nodes as the recovery host, and copy its etcd backup file to each of the remaining master nodes.

In this document, master01 is the recovery host.
- Recovery host: copy the ETCD snapshot file

[root@master01 ~]# cp /var/lib/etcd/backup/2023-01-30/snapshot.db /opt/snapshot-2023-01-30.db

- Recovery host: copy the ETCD snapshot file to master02 and master03

[root@master01 ~]# scp /opt/snapshot-2023-01-30.db root@master02.ocp3.local:/opt/snapshot-2023-01-30.db
[root@master01 ~]# scp /opt/snapshot-2023-01-30.db root@master03.ocp3.local:/opt/snapshot-2023-01-30.db

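2.3. Delete the ETCD Member Directory

To restore every member from the snapshot of the recovery host (master01), delete the existing etcd data on all master nodes.

(If the existing data is left in place, it will not match the restored data and the restore will fail; etcdctl snapshot restore also refuses to write into a data directory that already exists. Because this also deletes /var/lib/etcd/backup/, the snapshot must already have been copied to /opt in step 2.2.)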
[root@bastion ~]# for masters in {master01,master02,master03}; do
  ssh root@$masters.ocp3.local "rm -rf /var/lib/etcd";
done
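Since `rm -rf /var/lib/etcd` is irreversible and also removes the on-host backups, a guarded variant can refuse to delete anything unless the snapshot copied to /opt in step 2.2 is present on that host. This is a sketch assuming root SSH from the bastion; `wipe_etcd_data`, `SNAPSHOT`, and `ETCD_DATA_DIR` are hypothetical names, and the paths are variables so the function can be tried against a scratch directory first.

```bash
#!/usr/bin/env bash
# Sketch: delete the etcd data directory on one master only if the snapshot
# copied in step 2.2 exists there and is non-empty.
# Assumptions: root SSH from the bastion; names and defaults are hypothetical.
SNAPSHOT="${SNAPSHOT:-/opt/snapshot-2023-01-30.db}"
ETCD_DATA_DIR="${ETCD_DATA_DIR:-/var/lib/etcd}"

wipe_etcd_data() {
  local m="$1"
  # `test -s` fails for a missing or zero-byte file, so rm never runs then.
  if ! ssh "root@${m}" "test -s '${SNAPSHOT}' && rm -rf '${ETCD_DATA_DIR}'"; then
    echo "${m}: ${SNAPSHOT} missing or empty, or removal failed" >&2
    return 1
  fi
  echo "${m}: ${ETCD_DATA_DIR} removed"
}
```

For example: `for m in master01.ocp3.local master02.ocp3.local master03.ocp3.local; do wipe_etcd_data "$m" || break; done`.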

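2.4. Restore ETCD

Run the following commands on every master node to restore ETCD. Each node reads its own member name and peer URLs from /etc/etcd/etcd.conf.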
[root@master ~]# source /etc/etcd/etcd.conf
[root@master ~]# export ETCDCTL_API=3
[root@master ~]# etcdctl snapshot restore /opt/snapshot-2023-01-30.db \
--name $ETCD_NAME \
--initial-cluster $ETCD_INITIAL_CLUSTER \
--initial-cluster-token $ETCD_INITIAL_CLUSTER_TOKEN \
--initial-advertise-peer-urls $ETCD_INITIAL_ADVERTISE_PEER_URLS \
--data-dir /var/lib/etcd
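If /etc/etcd/etcd.conf lacks one of these values, the unquoted variable above expands to nothing and etcdctl can end up reading the next flag as that option's value. A sketch of the same restore as a hypothetical `restore_member` function checks that the four variables are non-empty and quotes every value. It assumes etcd.conf contains plain `KEY=value` lines, which the `source` in the original step already relies on.

```bash
#!/usr/bin/env bash
# Sketch of step 2.4: load etcd.conf, refuse to continue if a required value is
# empty, and pass every value to etcdctl quoted. Names are hypothetical.
# Assumption: etcd.conf holds plain KEY=value lines (the original step sources it too).
restore_member() {
  local conf="${1:-/etc/etcd/etcd.conf}" snapshot="$2" data_dir="${3:-/var/lib/etcd}"
  (
    # Run in a subshell so the sourced ETCD_* variables do not leak to the caller.
    source "${conf}" || exit 1
    for v in ETCD_NAME ETCD_INITIAL_CLUSTER ETCD_INITIAL_CLUSTER_TOKEN \
             ETCD_INITIAL_ADVERTISE_PEER_URLS; do
      if [ -z "${!v:-}" ]; then
        echo "${v} is empty in ${conf}" >&2
        exit 1
      fi
    done
    ETCDCTL_API=3 etcdctl snapshot restore "${snapshot}" \
      --name "${ETCD_NAME}" \
      --initial-cluster "${ETCD_INITIAL_CLUSTER}" \
      --initial-cluster-token "${ETCD_INITIAL_CLUSTER_TOKEN}" \
      --initial-advertise-peer-urls "${ETCD_INITIAL_ADVERTISE_PEER_URLS}" \
      --data-dir "${data_dir}"
  )
}
```

For example, `restore_member /etc/etcd/etcd.conf /opt/snapshot-2023-01-30.db /var/lib/etcd`, followed by the chown and restorecon step below.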

- Set directory ownership and restore the SELinux context

[root@master ~]# chown etcd:etcd -R /var/lib/etcd
[root@master ~]# restorecon -Rv /var/lib/etcd

2.5. Start the Static Pods

On all master nodes, move the static pod manifests (etcd.yaml, apiserver.yaml, and controller.yaml) back to their original directory.
[root@bastion ~]# for masters in {master01,master02,master03}; do
  ssh root@$masters.ocp3.local "mv /etc/origin/node/pods-stopped/* /etc/origin/node/pods/";
done

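2.6. Check the Cluster Status

After the static pods are started, the cluster usually finishes recovering within about 10 minutes, depending on the environment.
- Check the ETCD pod status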

[root@bastion ~]# oc get pod -o wide -n kube-system | grep etcd
master-etcd-master01.ocp3.local          1/1       Running   0          13h       172.16.45.20   master01.ocp3.local   <none>
master-etcd-master02.ocp3.local          1/1       Running   0          13h       172.16.45.21   master02.ocp3.local   <none>
master-etcd-master03.ocp3.local          1/1       Running   0          13h       172.16.45.22   master03.ocp3.local   <none>
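Rather than re-running the command above by hand, the same check can be polled until all etcd pods are Running or about 10 minutes have passed. A minimal sketch, assuming `oc` is logged in on the bastion and three masters are expected; `wait_for_etcd_pods` is a hypothetical name, and its arguments are the expected pod count, the number of attempts, and the seconds between attempts.

```bash
#!/usr/bin/env bash
# Sketch: poll until the expected number of master-etcd pods are Running.
# Assumption: `oc` on the bastion is logged in with access to kube-system.
# Arguments (hypothetical): expected pods, attempts, seconds between attempts.
wait_for_etcd_pods() {
  local expected="${1:-3}" tries="${2:-60}" interval="${3:-10}" i running=0
  for ((i = 1; i <= tries; i++)); do
    # Errors are ignored while the API server is still coming back.
    running="$(oc get pod -n kube-system --no-headers 2>/dev/null \
      | awk '$1 ~ /^master-etcd-/ && $3 == "Running"' | wc -l)"
    if [ "${running}" -ge "${expected}" ]; then
      echo "${running}/${expected} etcd pods Running"
      return 0
    fi
    sleep "${interval}"
  done
  echo "timed out: ${running}/${expected} etcd pods Running" >&2
  return 1
}
```

With the defaults, 60 attempts 10 seconds apart give the 10-minute window mentioned above.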

- Check the ETCD member list and endpoint status

[root@master01 ~]# ETCD_ALL_ENDPOINTS=` etcdctl3 --write-out=fields   member list | awk '/ClientURL/{printf "%s%s",sep,$3; sep=","}'`
[root@master01 ~]# etcdctl3 --endpoints=$ETCD_ALL_ENDPOINTS  endpoint status  --write-out=table
+----------------------------------+------------------+---------+---------+-----------+-----------+------------+
|             ENDPOINT             |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://master01.ocp3.local:2379 | 57b1603f55394850 |  3.2.32 |   30 MB |     false |         2 |     174176 |
|        https://172.16.45.20:2379 | 57b1603f55394850 |  3.2.32 |   30 MB |     false |         2 |     174176 |
|        https://172.16.45.21:2379 | 6b0db1f119b9991e |  3.2.32 |   30 MB |      true |         2 |     174176 |
|        https://172.16.45.22:2379 | 7899562af965edcd |  3.2.32 |   30 MB |     false |         2 |     174176 |
+----------------------------------+------------------+---------+---------+-----------+-----------+------------+

[root@master02 ~]# ETCD_ALL_ENDPOINTS=` etcdctl3 --write-out=fields   member list | awk '/ClientURL/{printf "%s%s",sep,$3; sep=","}'`
[root@master02 ~]# etcdctl3 --endpoints=$ETCD_ALL_ENDPOINTS  endpoint status  --write-out=table
+----------------------------------+------------------+---------+---------+-----------+-----------+------------+
|             ENDPOINT             |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://master02.ocp3.local:2379 | 6b0db1f119b9991e |  3.2.32 |   30 MB |      true |         2 |     174494 |
|        https://172.16.45.20:2379 | 57b1603f55394850 |  3.2.32 |   30 MB |     false |         2 |     174494 |
|        https://172.16.45.21:2379 | 6b0db1f119b9991e |  3.2.32 |   30 MB |      true |         2 |     174494 |
|        https://172.16.45.22:2379 | 7899562af965edcd |  3.2.32 |   30 MB |     false |         2 |     174494 |
+----------------------------------+------------------+---------+---------+-----------+-----------+------------+

[root@master03 ~]# ETCD_ALL_ENDPOINTS=` etcdctl3 --write-out=fields   member list | awk '/ClientURL/{printf "%s%s",sep,$3; sep=","}'`
[root@master03 ~]# etcdctl3 --endpoints=$ETCD_ALL_ENDPOINTS  endpoint status  --write-out=table
+----------------------------------+------------------+---------+---------+-----------+-----------+------------+
|             ENDPOINT             |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://master03.ocp3.local:2379 | 7899562af965edcd |  3.2.32 |   30 MB |     false |         2 |     174499 |
|        https://172.16.45.20:2379 | 57b1603f55394850 |  3.2.32 |   30 MB |     false |         2 |     174499 |
|        https://172.16.45.21:2379 | 6b0db1f119b9991e |  3.2.32 |   30 MB |      true |         2 |     174499 |
|        https://172.16.45.22:2379 | 7899562af965edcd |  3.2.32 |   30 MB |     false |         2 |     174499 |
+----------------------------------+------------------+---------+---------+-----------+-----------+------------+
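The tables above show a healthy result: every row reports the same RAFT TERM, and only one member ID is leader (on master02 the leader appears twice because it is listed under both its host name and its IP). That reading can be automated with a small awk filter over the table output. This is a sketch; `check_endpoint_table` is a hypothetical name, and it assumes the column order shown above.

```bash
#!/usr/bin/env bash
# Sketch: read `endpoint status --write-out=table` output on stdin and check
# that exactly one member ID is leader and all rows share one RAFT TERM.
# Rows are keyed by member ID, since one member can appear under two URLs.
check_endpoint_table() {
  awk -F '|' '
    NF >= 8 && $3 !~ /ID/ {
      # Trim padding from ENDPOINT .. RAFT INDEX columns.
      for (i = 2; i <= 8; i++) gsub(/^ +| +$/, "", $i)
      if ($6 == "true") leaders[$3] = 1
      terms[$7] = 1
    }
    END {
      nl = 0; nt = 0
      for (id in leaders) nl++
      for (t in terms) nt++
      if (nl == 1 && nt == 1) { print "OK"; exit 0 }
      printf "leaders=%d distinct_terms=%d\n", nl, nt
      exit 1
    }'
}
```

For example: `etcdctl3 --endpoints=$ETCD_ALL_ENDPOINTS endpoint status --write-out=table | check_endpoint_table`.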

- Check the cluster nodes

[root@bastion ~]# oc get node
NAME                   STATUS    ROLES     AGE       VERSION
infra01.ocp3.local     Ready     infra     13h       v1.11.0+d4cacc0
infra02.ocp3.local     Ready     infra     13h       v1.11.0+d4cacc0
infra03.ocp3.local     Ready     infra     13h       v1.11.0+d4cacc0
logging01.ocp3.local   Ready     logging   13h       v1.11.0+d4cacc0
logging02.ocp3.local   Ready     logging   13h       v1.11.0+d4cacc0
logging03.ocp3.local   Ready     logging   13h       v1.11.0+d4cacc0
master01.ocp3.local    Ready     master    13h       v1.11.0+d4cacc0
master02.ocp3.local    Ready     master    13h       v1.11.0+d4cacc0
master03.ocp3.local    Ready     master    13h       v1.11.0+d4cacc0
router01.ocp3.local    Ready     router    13h       v1.11.0+d4cacc0
router02.ocp3.local    Ready     router    13h       v1.11.0+d4cacc0
worker01.ocp3.local    Ready     worker    13h       v1.11.0+d4cacc0
worker02.ocp3.local    Ready     worker    13h       v1.11.0+d4cacc0
worker03.ocp3.local    Ready     worker    13h       v1.11.0+d4cacc0

- Check the Pods

[root@bastion ~]# oc get pod -o wide --all-namespaces
NAMESPACE                  NAME                                           READY     STATUS      RESTARTS   AGE       IP             NODE                   NOMINATED NODE
default                    docker-registry-2-22qzg                        1/1       Running     0          13h       10.128.5.11    infra02.ocp3.local     <none>
default                    docker-registry-2-cn5xr                        1/1       Running     0          13h       10.128.7.13    infra03.ocp3.local     <none>
default                    docker-registry-2-kn2p4                        1/1       Running     0          13h       10.128.6.9     infra01.ocp3.local     <none>
default                    logging-eventrouter-1-2wjz5                    1/1       Running     0          11h       10.128.8.20    logging01.ocp3.local   <none>
default                    logging-eventrouter-1-v7z5f                    1/1       Running     0          11h       10.128.9.18    logging02.ocp3.local   <none>
default                    logging-eventrouter-1-xrv7v                    1/1       Running     0          11h       10.128.10.17   logging03.ocp3.local   <none>
default                    router-1-bbvs4                                 1/1       Running     0          13h       172.16.45.31   router02.ocp3.local    <none>
default                    router-1-tjtqp                                 1/1       Running     0          13h       172.16.45.30   router01.ocp3.local    <none>
kube-system                master-api-master01.ocp3.local                 1/1       Running     0          13h       172.16.45.20   master01.ocp3.local    <none>
kube-system                master-api-master02.ocp3.local                 1/1       Running     0          13h       172.16.45.21   master02.ocp3.local    <none>
kube-system                master-api-master03.ocp3.local                 1/1       Running     0          13h       172.16.45.22   master03.ocp3.local    <none>
kube-system                master-controllers-master01.ocp3.local         1/1       Running     0          13h       172.16.45.20   master01.ocp3.local    <none>
kube-system                master-controllers-master02.ocp3.local         1/1       Running     0          13h       172.16.45.21   master02.ocp3.local    <none>
kube-system                master-controllers-master03.ocp3.local         1/1       Running     0          13h       172.16.45.22   master03.ocp3.local    <none>
kube-system                master-etcd-master01.ocp3.local                1/1       Running     0          13h       172.16.45.20   master01.ocp3.local    <none>
kube-system                master-etcd-master02.ocp3.local                1/1       Running     0          13h       172.16.45.21   master02.ocp3.local    <none>
kube-system                master-etcd-master03.ocp3.local                1/1       Running     0          13h       172.16.45.22   master03.ocp3.local    <none>
openshift-console          console-5f5bb7f877-842hc                       1/1       Running     0          13h       10.128.2.7     master03.ocp3.local    <none>
openshift-console          console-5f5bb7f877-nbwcz                       1/1       Running     0          13h       10.128.1.9     master02.ocp3.local    <none>
openshift-console          console-5f5bb7f877-pzld7                       1/1       Running     0          13h       10.128.0.10    master01.ocp3.local    <none>
openshift-infra            bootstrap-autoapprover-0                       1/1       Running     0          13h       10.128.0.9     master01.ocp3.local    <none>
openshift-infra            hawkular-cassandra-1-xqrtq                     1/1       Running     0          12h       10.128.5.12    infra02.ocp3.local     <none>
openshift-infra            hawkular-metrics-75vsp                         1/1       Running     0          12h       10.128.5.13    infra02.ocp3.local     <none>
openshift-infra            hawkular-metrics-schema-zsxxx                  0/1       Completed   0          12h       10.128.12.2    worker02.ocp3.local    <none>
openshift-infra            heapster-x5tql                                 1/1       Running     0          12h       10.128.7.10    infra03.ocp3.local     <none>
openshift-logging          logging-curator-1675017000-8klvn               0/1       Completed   0          9h        10.128.9.21    logging02.ocp3.local   <none>
openshift-logging          logging-curator-ops-1675017000-rrhfj           0/1       Completed   0          9h        10.128.8.23    logging01.ocp3.local   <none>
openshift-logging          logging-es-data-master-47t2x0lo-1-5fhsv        2/2       Running     0          11h       10.128.10.18   logging03.ocp3.local   <none>
openshift-logging          logging-es-data-master-mp39sg6z-1-l5cr9        2/2       Running     0          11h       10.128.9.16    logging02.ocp3.local   <none>
openshift-logging          logging-es-data-master-npezejqj-1-dhn5q        2/2       Running     0          11h       10.128.8.18    logging01.ocp3.local   <none>
openshift-logging          logging-es-ops-data-master-pei9e4d3-1-g5dfp    2/2       Running     0          11h       10.128.10.19   logging03.ocp3.local   <none>
openshift-logging          logging-fluentd-5wjt2                          1/1       Running     0          10h       10.128.3.5     router01.ocp3.local    <none>
openshift-logging          logging-fluentd-8drkd                          1/1       Running     0          10h       10.128.13.10   worker03.ocp3.local    <none>
openshift-logging          logging-fluentd-bvxb2                          1/1       Running     0          10h       10.128.10.22   logging03.ocp3.local   <none>
openshift-logging          logging-fluentd-cczv2                          1/1       Running     0          10h       10.128.2.9     master03.ocp3.local    <none>
openshift-logging          logging-fluentd-gzd56                          1/1       Running     0          10h       10.128.12.12   worker02.ocp3.local    <none>
openshift-logging          logging-fluentd-hvl25                          1/1       Running     0          10h       10.128.7.11    infra03.ocp3.local     <none>
openshift-logging          logging-fluentd-lmd9v                          1/1       Running     0          10h       10.128.1.8     master02.ocp3.local    <none>
openshift-logging          logging-fluentd-m6flf                          1/1       Running     0          10h       10.128.9.19    logging02.ocp3.local   <none>
openshift-logging          logging-fluentd-sz958                          1/1       Running     0          10h       10.128.4.6     router02.ocp3.local    <none>
openshift-logging          logging-fluentd-t8kxr                          1/1       Running     0          10h       10.128.5.10    infra02.ocp3.local     <none>
openshift-logging          logging-fluentd-tnpgr                          1/1       Running     0          10h       10.128.6.10    infra01.ocp3.local     <none>
openshift-logging          logging-fluentd-v55kk                          1/1       Running     0          10h       10.128.8.21    logging01.ocp3.local   <none>
openshift-logging          logging-fluentd-vg2qd                          1/1       Running     0          10h       10.128.11.10   worker01.ocp3.local    <none>
openshift-logging          logging-fluentd-z89m9                          1/1       Running     0          10h       10.128.0.11    master01.ocp3.local    <none>
openshift-logging          logging-kibana-1-58l5l                         2/2       Running     0          11h       10.128.10.20   logging03.ocp3.local   <none>
openshift-logging          logging-kibana-1-vdlpt                         2/2       Running     0          11h       10.128.9.20    logging02.ocp3.local   <none>
openshift-logging          logging-kibana-1-vl2fx                         2/2       Running     0          11h       10.128.8.19    logging01.ocp3.local   <none>
openshift-logging          logging-kibana-ops-1-kdzlw                     2/2       Running     0          11h       10.128.10.21   logging03.ocp3.local   <none>
openshift-logging          logging-kibana-ops-1-n8tnx                     2/2       Running     0          11h       10.128.9.17    logging02.ocp3.local   <none>
openshift-logging          logging-kibana-ops-1-vfpvk                     2/2       Running     0          11h       10.128.8.22    logging01.ocp3.local   <none>
openshift-metrics-server   metrics-server-7cb48555f7-jhr89                1/1       Running     0          12h       10.128.7.12    infra03.ocp3.local     <none>
openshift-monitoring       alertmanager-main-0                            3/3       Running     0          11h       10.128.12.11   worker02.ocp3.local    <none>
openshift-monitoring       alertmanager-main-1                            3/3       Running     0          11h       10.128.11.11   worker01.ocp3.local    <none>
openshift-monitoring       alertmanager-main-2                            3/3       Running     0          11h       10.128.13.11   worker03.ocp3.local    <none>
openshift-monitoring       cluster-monitoring-operator-79c559d786-z7ghz   1/1       Running     0          11h       10.128.11.8    worker01.ocp3.local    <none>
openshift-monitoring       grafana-784cbccb8f-zl66b                       2/2       Running     0          11h       10.128.12.10   worker02.ocp3.local    <none>
openshift-monitoring       kube-state-metrics-6cf558b5f4-dsmlk            3/3       Running     0          11h       10.128.12.9    worker02.ocp3.local    <none>
openshift-monitoring       node-exporter-2lp9b                            2/2       Running     0          11h       172.16.45.52   logging03.ocp3.local   <none>
openshift-monitoring       node-exporter-62xh8                            2/2       Running     0          11h       172.16.45.30   router01.ocp3.local    <none>
openshift-monitoring       node-exporter-7v5lr                            2/2       Running     0          11h       172.16.45.42   infra03.ocp3.local     <none>
openshift-monitoring       node-exporter-99h4f                            2/2       Running     0          11h       172.16.45.41   infra02.ocp3.local     <none>
openshift-monitoring       node-exporter-9jprb                            2/2       Running     0          11h       172.16.45.21   master02.ocp3.local    <none>
openshift-monitoring       node-exporter-cwb4p                            2/2       Running     0          11h       172.16.45.51   logging02.ocp3.local   <none>
openshift-monitoring       node-exporter-ffmqx                            2/2       Running     0          11h       172.16.45.20   master01.ocp3.local    <none>
openshift-monitoring       node-exporter-jj8hw                            2/2       Running     0          11h       172.16.45.62   worker03.ocp3.local    <none>
openshift-monitoring       node-exporter-kftw5                            2/2       Running     0          11h       172.16.45.22   master03.ocp3.local    <none>
openshift-monitoring       node-exporter-mp8kc                            2/2       Running     0          11h       172.16.45.31   router02.ocp3.local    <none>
openshift-monitoring       node-exporter-ql62l                            2/2       Running     0          11h       172.16.45.61   worker02.ocp3.local    <none>
openshift-monitoring       node-exporter-spglb                            2/2       Running     0          11h       172.16.45.40   infra01.ocp3.local     <none>
openshift-monitoring       node-exporter-vqnjh                            2/2       Running     0          11h       172.16.45.50   logging01.ocp3.local   <none>
openshift-monitoring       node-exporter-zs82p                            2/2       Running     0          11h       172.16.45.60   worker01.ocp3.local    <none>
openshift-monitoring       prometheus-k8s-0                               4/4       Running     1          2h        10.128.11.12   worker01.ocp3.local    <none>
openshift-monitoring       prometheus-k8s-1                               4/4       Running     1          2h        10.128.12.13   worker02.ocp3.local    <none>
openshift-monitoring       prometheus-operator-57548d4b75-422r7           1/1       Running     0          11h       10.128.13.8    worker03.ocp3.local    <none>
openshift-node             sync-2nkt7                                     1/1       Running     0          13h       172.16.45.20   master01.ocp3.local    <none>
openshift-node             sync-4vwcm                                     1/1       Running     0          13h       172.16.45.52   logging03.ocp3.local   <none>
openshift-node             sync-6279f                                     1/1       Running     0          13h       172.16.45.42   infra03.ocp3.local     <none>
openshift-node             sync-6vbn2                                     1/1       Running     0          13h       172.16.45.50   logging01.ocp3.local   <none>
openshift-node             sync-7zpj8                                     1/1       Running     0          13h       172.16.45.60   worker01.ocp3.local    <none>
openshift-node             sync-bm47j                                     1/1       Running     0          13h       172.16.45.62   worker03.ocp3.local    <none>
openshift-node             sync-gnlz8                                     1/1       Running     0          13h       172.16.45.31   router02.ocp3.local    <none>
openshift-node             sync-hrmxz                                     1/1       Running     0          13h       172.16.45.22   master03.ocp3.local    <none>
openshift-node             sync-k8nq9                                     1/1       Running     0          13h       172.16.45.21   master02.ocp3.local    <none>
openshift-node             sync-m768f                                     1/1       Running     0          13h       172.16.45.61   worker02.ocp3.local    <none>
openshift-node             sync-mstzd                                     1/1       Running     0          13h       172.16.45.41   infra02.ocp3.local     <none>
openshift-node             sync-n9wzg                                     1/1       Running     0          13h       172.16.45.51   logging02.ocp3.local   <none>
openshift-node             sync-sfgns                                     1/1       Running     0          13h       172.16.45.40   infra01.ocp3.local     <none>
openshift-node             sync-xz92j                                     1/1       Running     0          13h       172.16.45.30   router01.ocp3.local    <none>
openshift-sdn              ovs-24n89                                      1/1       Running     0          13h       172.16.45.21   master02.ocp3.local    <none>
openshift-sdn              ovs-4qjt8                                      1/1       Running     0          13h       172.16.45.31   router02.ocp3.local    <none>
openshift-sdn              ovs-62pqj                                      1/1       Running     0          13h       172.16.45.62   worker03.ocp3.local    <none>
openshift-sdn              ovs-62ssg                                      1/1       Running     0          13h       172.16.45.41   infra02.ocp3.local     <none>
openshift-sdn              ovs-6fqwt                                      1/1       Running     0          13h       172.16.45.42   infra03.ocp3.local     <none>
openshift-sdn              ovs-7f55g                                      1/1       Running     0          13h       172.16.45.40   infra01.ocp3.local     <none>
openshift-sdn              ovs-7jcjs                                      1/1       Running     0          13h       172.16.45.61   worker02.ocp3.local    <none>
openshift-sdn              ovs-7m4z5                                      1/1       Running     0          13h       172.16.45.30   router01.ocp3.local    <none>
openshift-sdn              ovs-d9hs5                                      1/1       Running     0          13h       172.16.45.51   logging02.ocp3.local   <none>
openshift-sdn              ovs-k9lhn                                      1/1       Running     0          13h       172.16.45.60   worker01.ocp3.local    <none>
openshift-sdn              ovs-nrd5r                                      1/1       Running     0          13h       172.16.45.50   logging01.ocp3.local   <none>
openshift-sdn              ovs-t8527                                      1/1       Running     0          13h       172.16.45.20   master01.ocp3.local    <none>
openshift-sdn              ovs-wlk2k                                      1/1       Running     0          13h       172.16.45.22   master03.ocp3.local    <none>
openshift-sdn              ovs-x5ftt                                      1/1       Running     0          13h       172.16.45.52   logging03.ocp3.local   <none>
openshift-sdn              sdn-2dkcv                                      1/1       Running     0          13h       172.16.45.52   logging03.ocp3.local   <none>
openshift-sdn              sdn-45fhn                                      1/1       Running     0          13h       172.16.45.20   master01.ocp3.local    <none>
openshift-sdn              sdn-78xwx                                      1/1       Running     0          13h       172.16.45.40   infra01.ocp3.local     <none>
openshift-sdn              sdn-7f26d                                      1/1       Running     0          13h       172.16.45.21   master02.ocp3.local    <none>
openshift-sdn              sdn-j2zmx                                      1/1       Running     0          13h       172.16.45.41   infra02.ocp3.local     <none>
openshift-sdn              sdn-k6mhw                                      1/1       Running     0          13h       172.16.45.22   master03.ocp3.local    <none>
openshift-sdn              sdn-k826k                                      1/1       Running     0          13h       172.16.45.62   worker03.ocp3.local    <none>
openshift-sdn              sdn-lhv8t                                      1/1       Running     0          13h       172.16.45.30   router01.ocp3.local    <none>
openshift-sdn              sdn-m89v8                                      1/1       Running     0          13h       172.16.45.31   router02.ocp3.local    <none>
openshift-sdn              sdn-mffs2                                      1/1       Running     0          13h       172.16.45.50   logging01.ocp3.local   <none>
openshift-sdn              sdn-rhs7f                                      1/1       Running     0          13h       172.16.45.60   worker01.ocp3.local    <none>
openshift-sdn              sdn-sb4fc                                      1/1       Running     0          13h       172.16.45.61   worker02.ocp3.local    <none>
openshift-sdn              sdn-szvb5                                      1/1       Running     0          13h       172.16.45.51   logging02.ocp3.local   <none>
openshift-sdn              sdn-vtvsd                                      1/1       Running     0          13h       172.16.45.42   infra03.ocp3.local     <none>
openshift-web-console      webconsole-55ccd559bb-27gtn                    1/1       Running     0          13h       10.128.0.8     master01.ocp3.local    <none>
openshift-web-console      webconsole-55ccd559bb-2jbng                    1/1       Running     0          13h       10.128.2.8     master03.ocp3.local    <none>
openshift-web-console      webconsole-55ccd559bb-4d97f                    1/1       Running     0          13h       10.128.1.7     master02.ocp3.local    <none>

2.7. Reboot

Reboot all OpenShift cluster nodes to complete the procedure.
[root@bastion ~]# for node in $(oc get node -o name | cut -d '/' -f '2'); do
  ssh root@$node "systemctl reboot";
done
