@superseb
Last active April 13, 2023 07:04
Rancher v2.6.3 and up single install etcd maintenance

Rancher v2.6.3 and up single install embedded etcd maintenance

This is not official documentation. Make sure you have backups, and use at your own risk.

v2.6.3 and up only

When the etcd database size exceeds its space quota, etcd raises a NOSPACE alarm and rejects writes with the error mvcc: database space exceeded.
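
To confirm that an instance is affected, the container logs can be searched for the error string. The following is a minimal sketch, not part of the original procedure: the function name `count_space_exceeded` is hypothetical, and it only counts matching lines on standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log lines on stdin that contain the etcd quota error.
count_space_exceeded() {
  # grep -c prints the number of matching lines. It exits 1 when there are none,
  # so "|| true" keeps the function from reporting a failure in that case.
  grep -c -F 'mvcc: database space exceeded' || true
}

# Example usage (assumes exactly one running rancher/rancher container):
#   docker logs "$(docker ps | grep -E 'rancher/rancher:|rancher/rancher ' | awk '{ print $1 }')" 2>&1 | count_space_exceeded
```

A result of 0 means the error does not appear in the logs that were read.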

To manually trigger this situation:

docker exec -ti $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "while [ 1 ]; do dd if=/dev/urandom bs=1024 count=1024  | ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl put key  || break; done"

Rancher container is running

You can get the current status of etcd by running:

docker exec -ti $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl endpoint status --write-out=table"
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | 8e9e05c52164694d | 3.4.13  | 1.1 GB  | true      |         2 |       3409 |
+----------------+------------------+---------+---------+-----------+-----------+------------+
docker exec -ti $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl alarm list"
memberID:10276657743932975437 alarm:NOSPACE
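
For scripting, the output of `etcdctl alarm list` can be checked for the NOSPACE alarm shown above. This is a minimal sketch: `has_nospace_alarm` is a hypothetical helper name, and it reads the command's output on standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if "etcdctl alarm list" output on stdin
# reports a NOSPACE alarm, fail (exit 1) otherwise.
has_nospace_alarm() {
  grep -q 'alarm:NOSPACE'
}

# Example with the output shown above:
if printf 'memberID:10276657743932975437 alarm:NOSPACE\n' | has_nospace_alarm; then
  echo "NOSPACE alarm is active: compact, defrag, then disarm"
fi
```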

Compact and defrag:

# Capture the current revision (no -ti here: a TTY would append a carriage return to the captured value)
rev=$(docker exec $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl endpoint status --write-out fields | grep Revision | cut -d: -f2")
echo $rev
4161
docker exec -ti $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl compact $rev"
compacted revision 4161
docker exec -ti $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl defrag"
Finished defragmenting etcd member[127.0.0.1:2379]
docker exec -ti $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl alarm disarm"
memberID:10276657743932975437 alarm:NOSPACE
docker exec -ti $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl alarm list"
<empty>
docker exec -ti $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl endpoint status --write-out=table"
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | 8e9e05c52164694d | 3.4.13  | 3.2 MB  | true      |         3 |       5014 |
+----------------+------------------+---------+---------+-----------+-----------+------------+

At this point, the rancher/rancher container should stop logging mvcc: database space exceeded.
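
The commands in this section all repeat the same `docker exec` prefix. A minimal sketch of a wrapper that factors it out is shown below. The function names `rancher_container` and `rancher_etcdctl` are hypothetical, the sketch assumes exactly one running rancher/rancher container, and it passes the variables with `docker exec -e` rather than the inline `bash -c` string used above.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are hypothetical, not part of Rancher or etcd.

# Print the ID of the running Rancher container (same filter as the commands above).
rancher_container() {
  docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }'
}

# Run etcdctl inside the Rancher container with the embedded etcd's client certificates.
# No TTY is requested, so captured output carries no trailing carriage return.
rancher_etcdctl() {
  docker exec \
    -e ETCDCTL_API=3 \
    -e ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' \
    -e ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' \
    -e ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' \
    -e ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' \
    "$(rancher_container)" etcdctl "$@"
}
```

With these defined, `rancher_etcdctl alarm list` and `rancher_etcdctl defrag` correspond to the long commands above.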

Rancher container keeps crashing/restarting

If the rancher/rancher container won't stay running, we cannot use it to perform maintenance, so etcd has to be maintained from a separate container.

# Stop Rancher container (and block restarting)
docker stop $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }')

# Run etcd container with data dir from Rancher's embedded etcd
docker run -d -e ETCDCTL_API=3 --name etcd-maintenance --volumes-from=$(docker ps -a | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') quay.io/coreos/etcd:v3.4.13 /usr/local/bin/etcd --data-dir=/var/lib/rancher/k3s/server/db/etcd

# Check etcd status
docker exec etcd-maintenance etcdctl endpoint status --write-out=table
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX |             ERRORS             |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| 127.0.0.1:2379 | 8e9e05c52164694d |  3.4.15 |  2.2 GB |      true |      false |         4 |       6180 |               6180 |  memberID:10276657743932975437 |
|                |                  |         |         |           |            |           |            |                    |                 alarm:NOSPACE  |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
docker exec etcd-maintenance etcdctl alarm list
memberID:10276657743932975437 alarm:NOSPACE

# Run compact/defrag
rev=$(docker exec etcd-maintenance etcdctl endpoint status --write-out json | grep -Eo '"revision":[0-9]*' | grep -Eo '[0-9]*')
echo $rev
4161
docker exec etcd-maintenance etcdctl compact "$rev"
compacted revision 4161
docker exec etcd-maintenance etcdctl defrag
Finished defragmenting etcd member[127.0.0.1:2379]
docker exec etcd-maintenance etcdctl alarm disarm
memberID:10276657743932975437 alarm:NOSPACE
docker exec etcd-maintenance etcdctl alarm list
<empty>
docker exec etcd-maintenance etcdctl endpoint status --write-out=table
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 127.0.0.1:2379 | 8e9e05c52164694d |  3.4.15 |  8.4 MB |      true |      false |         4 |       6185 |               6185 |        |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

# Stop etcd-maintenance container
docker stop etcd-maintenance

# Start Rancher
docker start $(docker ps -a | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }')
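
The revision lookup in the compact/defrag step yields an empty string when etcdctl prints nothing, for example if etcd in the etcd-maintenance container is not reachable yet. A minimal sketch of a guard is shown below. `parse_revision` is a hypothetical helper name, and the JSON in the example is a shortened, illustrative stand-in for real `etcdctl endpoint status --write-out json` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "etcdctl endpoint status --write-out json" on stdin,
# print the revision, and fail if no revision is found.
parse_revision() {
  _rev=$(grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+' | head -n 1)
  if [ -z "$_rev" ]; then
    echo "could not determine etcd revision" >&2
    return 1
  fi
  echo "$_rev"
}

# Example with shortened, illustrative JSON (not real command output):
sample='[{"Endpoint":"127.0.0.1:2379","Status":{"header":{"revision":4161,"raft_term":4}}}]'
printf '%s\n' "$sample" | parse_revision
```

In the procedure above, this could be used as `rev=$(docker exec etcd-maintenance etcdctl endpoint status --write-out json | parse_revision) || exit 1` before running `etcdctl compact "$rev"`.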