@superseb
Last active December 22, 2021 15:31
Rancher v2.6.x single install etcd maintenance

Rancher v2.6.x single install embedded etcd maintenance

This is not official documentation. Make backups first, and use at your own risk.

Applies to v2.6.0, v2.6.1, and v2.6.2 only. For v2.6.3 and up, see https://gist.github.com/superseb/bcfeb07931b70b8722b77f1fbd791e99

When the etcd database size exceeds its space quota, etcd raises a NOSPACE alarm and rejects writes with the error mvcc: database space exceeded.

To manually trigger this situation:

docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -ti -e ETCDCTL_API=3 rancher/rke-tools:v0.1.78 bash -c "while [ 1 ]; do dd if=/dev/urandom bs=1024 count=1024 | ETCDCTL_API=3 etcdctl put key || break; done"

Rancher container is running

You can get the current status of etcd by running:

docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -ti -e ETCDCTL_API=3 rancher/rke-tools:v0.1.78 bash -c "etcdctl endpoint status --write-out=table"
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | 8e9e05c52164694d | 3.4.15  | 1.1 GB  | true      |         2 |       3409 |
+----------------+------------------+---------+---------+-----------+-----------+------------+
docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -ti -e ETCDCTL_API=3 rancher/rke-tools:v0.1.78 bash -c "etcdctl alarm list"
memberID:10276657743932975437 alarm:NOSPACE
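
If this check is scripted, the alarm can be detected by matching the `alarm:NOSPACE` token shown in the output above. The following is a minimal sketch under that assumption; the function name `has_nospace_alarm` is hypothetical and not part of etcdctl or Rancher.

```shell
# Hypothetical helper: reads `etcdctl alarm list` output on stdin and
# succeeds (exit 0) only if a NOSPACE alarm is present.
has_nospace_alarm() {
  grep -q 'alarm:NOSPACE'
}

# Example use (commented out, requires a running Rancher container):
# docker run --rm --net=container:<rancher container id> -e ETCDCTL_API=3 \
#   rancher/rke-tools:v0.1.78 bash -c "etcdctl alarm list" \
#   | has_nospace_alarm && echo "NOSPACE alarm is active"
```

The helper only inspects text, so it can be tried against saved output before it is pointed at a live etcd.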

Compact and defrag:

rev=$(docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -e ETCDCTL_API=3 rancher/rke-tools:v0.1.78 bash -c "etcdctl endpoint status --write-out json | egrep -o '\"revision\":[0-9]*' | egrep -o '[0-9]*'")
echo $rev
4161
docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -e ETCDCTL_API=3 rancher/rke-tools:v0.1.78 bash -c "etcdctl compact $rev"
compacted revision 4161
docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -e ETCDCTL_API=3 rancher/rke-tools:v0.1.78 bash -c "etcdctl defrag"
Finished defragmenting etcd member[127.0.0.1:2379]
docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -e ETCDCTL_API=3 rancher/rke-tools:v0.1.78 bash -c "etcdctl alarm disarm"
memberID:10276657743932975437 alarm:NOSPACE
docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -e ETCDCTL_API=3 rancher/rke-tools:v0.1.78 bash -c "etcdctl alarm list"
<empty>
docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -e ETCDCTL_API=3 rancher/rke-tools:v0.1.78 bash -c "etcdctl endpoint status --write-out=table"
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | 8e9e05c52164694d | 3.4.15  | 3.2 MB  | true      |         3 |       5014 |
+----------------+------------------+---------+---------+-----------+-----------+------------+

At this point, the rancher/rancher container should stop logging mvcc: database space exceeded.
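
Every command in this section repeats the same `docker run` prefix. As a convenience, that prefix could be wrapped in a small function. This is a sketch only: the names `rancher_container_id` and `rancher_etcdctl` are hypothetical, and the arguments are joined into one string for `bash -c`, so it suits simple invocations like the ones above rather than arguments containing spaces or quotes.

```shell
# Hypothetical helper: print the ID of the running rancher/rancher container,
# using the same filter as the commands above. head -n 1 guards against
# more than one match.
rancher_container_id() {
  docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }' | head -n 1
}

# Hypothetical helper: run etcdctl inside the Rancher container's network
# namespace, e.g. `rancher_etcdctl alarm list`.
rancher_etcdctl() {
  rancher_cid=$(rancher_container_id)
  if [ -z "$rancher_cid" ]; then
    echo "no running rancher/rancher container found" >&2
    return 1
  fi
  docker run --rm --net="container:${rancher_cid}" -e ETCDCTL_API=3 \
    rancher/rke-tools:v0.1.78 bash -c "etcdctl $*"
}
```

With these defined, `rancher_etcdctl endpoint status --write-out=table` would be equivalent to the longer status command shown earlier.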

Rancher container keeps crashing/restarting

If the rancher/rancher container won't stay running, etcd has to be maintained externally, because the commands above depend on attaching to a running rancher/rancher container.

# Stop Rancher container (and block restarting)
docker stop $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }')

# Run etcd container with data dir from Rancher's embedded etcd
docker run -d -e ETCDCTL_API=3 --name etcd-maintenance --volumes-from=$(docker ps -a| grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') quay.io/coreos/etcd:v3.4.15 /usr/local/bin/etcd   --data-dir=/var/lib/rancher/etcd
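
The etcd process may need a moment before it answers requests, so a scripted version of this procedure might wait before continuing. The following generic retry loop is a sketch; the function name `retry` and its argument order (maximum attempts, delay in seconds, then the command) are assumptions made for illustration.

```shell
# Hypothetical helper: retry <max_attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or the attempt limit is reached.
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_n=1
  while ! "$@"; do
    if [ "$retry_n" -ge "$retry_max" ]; then
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
  return 0
}

# Example use (commented out, requires the etcd-maintenance container):
# retry 10 1 docker exec etcd-maintenance etcdctl endpoint health
```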

# Check etcd status
docker exec etcd-maintenance etcdctl endpoint status --write-out=table
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX |             ERRORS             |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| 127.0.0.1:2379 | 8e9e05c52164694d |  3.4.15 |  2.2 GB |      true |      false |         4 |       6180 |               6180 |  memberID:10276657743932975437 |
|                |                  |         |         |           |            |           |            |                    |                 alarm:NOSPACE  |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
docker exec etcd-maintenance etcdctl alarm list
memberID:10276657743932975437 alarm:NOSPACE

# Run compact/defrag
rev=$(docker exec etcd-maintenance etcdctl endpoint status --write-out json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*')
echo $rev
4161
docker exec etcd-maintenance etcdctl compact "$rev"
compacted revision 4161
docker exec etcd-maintenance etcdctl defrag
Finished defragmenting etcd member[127.0.0.1:2379]
docker exec etcd-maintenance etcdctl alarm disarm
memberID:10276657743932975437 alarm:NOSPACE
docker exec etcd-maintenance etcdctl alarm list
<empty>
docker exec etcd-maintenance etcdctl endpoint status --write-out=table
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 127.0.0.1:2379 | 8e9e05c52164694d |  3.4.15 |  8.4 MB |      true |      false |         4 |       6185 |               6185 |        |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

# Stop etcd-maintenance container
docker stop etcd-maintenance

# Start Rancher
docker start $(docker ps -a| grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }')
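
Both procedures read the current revision out of the JSON status with a pair of `egrep` calls. If that lookup is scripted, it may be worth confirming that the result is a number before passing it to `etcdctl compact`. The sketch below shows one way to do that; the function names `extract_revision` and `is_valid_revision` are hypothetical.

```shell
# Hypothetical helper: read `etcdctl endpoint status --write-out json` output
# on stdin and print the first "revision" value found.
extract_revision() {
  grep -E -o '"revision":[0-9]+' | grep -E -o '[0-9]+' | head -n 1
}

# Hypothetical helper: succeed only if the argument is a non-empty string of digits.
is_valid_revision() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Example use (commented out, requires the etcd-maintenance container):
# rev=$(docker exec etcd-maintenance etcdctl endpoint status --write-out json | extract_revision)
# is_valid_revision "$rev" && docker exec etcd-maintenance etcdctl compact "$rev"
```

Checking the value first avoids running `compact` with an empty argument when the status call fails.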