Rancher v2.2.0 single install etcd maintenance

Rancher v2.2.0 single install embedded etcd maintenance

This is not official documentation. Make backups before you start, and use at your own risk.

v2.2.0 only (for v2.1.x, see https://gist.github.com/superseb/48037c0323147e603bc0197bd5ecb9b5)

When the etcd database size exceeds its space quota, etcd raises a NOSPACE alarm and rejects writes with the error mvcc: database space exceeded.

To manually trigger this situation:

docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -ti -e ETCDCTL_API=3 rancher/rke-tools:v0.1.27 bash -c "while true; do dd if=/dev/urandom bs=1024 count=1024 | etcdctl put key || break; done"

Rancher container is running

You can get the current status of etcd by running:

docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -ti -e ETCDCTL_API=3 rancher/rke-tools:v0.1.27 bash -c "etcdctl endpoint status --write-out=table"
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | 8e9e05c52164694d | 3.2.13  | 1.1 GB  | true      |         2 |       3409 |
+----------------+------------------+---------+---------+-----------+-----------+------------+
docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -ti -e ETCDCTL_API=3 rancher/rke-tools:v0.1.27 bash -c "etcdctl alarm list"
memberID:10276657743932975437 alarm:NOSPACE
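
For scripting, the alarm check can be reduced to an exit status. The following is a minimal sketch; has_nospace_alarm is a hypothetical helper name (not part of etcdctl), and it assumes the alarm list output has the same format as shown above.

```shell
# Hypothetical helper (not part of etcdctl): succeeds if the text on stdin,
# expected to be the output of `etcdctl alarm list`, reports a NOSPACE alarm.
has_nospace_alarm() {
  grep -q 'alarm:NOSPACE'
}
```

Pipe the output of the etcdctl alarm list command above into has_nospace_alarm and branch on its exit status, for example to decide whether the compact and defrag steps are needed.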

Compact and defrag:

rev=$(docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -e ETCDCTL_API=3 rancher/rke-tools:v0.1.27 bash -c "etcdctl endpoint status --write-out json | egrep -o '\"revision\":[0-9]*' | egrep -o '[0-9]*'")
echo $rev
4161
docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -e ETCDCTL_API=3 rancher/rke-tools:v0.1.27 bash -c "etcdctl compact $rev"
compacted revision 4161
docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -e ETCDCTL_API=3 rancher/rke-tools:v0.1.27 bash -c "etcdctl defrag"
Finished defragmenting etcd member[127.0.0.1:2379]
docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -e ETCDCTL_API=3 rancher/rke-tools:v0.1.27 bash -c "etcdctl alarm disarm"
memberID:10276657743932975437 alarm:NOSPACE
docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -e ETCDCTL_API=3 rancher/rke-tools:v0.1.27 bash -c "etcdctl alarm list"
<empty>
docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -e ETCDCTL_API=3 rancher/rke-tools:v0.1.27 bash -c "etcdctl endpoint status --write-out=table"
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | 8e9e05c52164694d | 3.2.13  | 3.2 MB  | true      |         3 |       5014 |
+----------------+------------------+---------+---------+-----------+-----------+------------+

At this point, the rancher/rancher container should stop logging mvcc: database space exceeded.
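
Since every step above repeats the same docker run invocation, it may be convenient to wrap it in shell functions. This is only a sketch: rancher_container, rancher_etcdctl and extract_revision are hypothetical names introduced here, and the image tag and grep pattern are copied from the commands above.

```shell
# Hypothetical helpers; the names are illustrative and not part of Rancher or etcd.

# Print the ID of the running Rancher container (same pipeline as above).
rancher_container() {
  docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }'
}

# Run etcdctl (v3 API) in the network namespace of the Rancher container.
# All arguments are passed through to etcdctl unchanged.
rancher_etcdctl() {
  docker run --rm --net=container:"$(rancher_container)" \
    -e ETCDCTL_API=3 rancher/rke-tools:v0.1.27 \
    bash -c 'etcdctl "$@"' _ "$@"
}

# Read `etcdctl endpoint status --write-out json` on stdin and print the revision.
extract_revision() {
  grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+'
}

# Possible usage, mirroring the steps above:
#   rev=$(rancher_etcdctl endpoint status --write-out json | extract_revision)
#   rancher_etcdctl compact "$rev"
#   rancher_etcdctl defrag
#   rancher_etcdctl alarm disarm
```

The revision is extracted with grep rather than a JSON parser, as in the original commands, so it assumes a single endpoint in the status output.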

Rancher container keeps crashing/restarting

If the rancher/rancher container won't stay running, its network namespace can't be used to reach the embedded etcd, so the maintenance has to be done from a separate etcd container that mounts Rancher's data directory.

# Stop Rancher container (and block restarting)
docker stop $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }')

# Run etcd container with data dir from Rancher's embedded etcd
docker run -d -e ETCDCTL_API=3 --name etcd-maintenance --volumes-from=$(docker ps -a | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') quay.io/coreos/etcd:v3.2.13 /usr/local/bin/etcd --data-dir=/var/lib/rancher/etcd

# Check etcd status
docker exec etcd-maintenance etcdctl endpoint status --write-out=table
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | 8e9e05c52164694d | 3.2.13  | 1.1 GB  | true      |         2 |       3409 |
+----------------+------------------+---------+---------+-----------+-----------+------------+
docker exec etcd-maintenance etcdctl alarm list
memberID:10276657743932975437 alarm:NOSPACE

# Run compact/defrag
rev=$(docker exec etcd-maintenance etcdctl endpoint status --write-out json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*')
echo $rev
4161
docker exec etcd-maintenance etcdctl compact "$rev"
compacted revision 4161
docker exec etcd-maintenance etcdctl defrag
Finished defragmenting etcd member[127.0.0.1:2379]
docker exec etcd-maintenance etcdctl alarm disarm
memberID:10276657743932975437 alarm:NOSPACE
docker exec etcd-maintenance etcdctl alarm list
<empty>
docker exec etcd-maintenance etcdctl endpoint status --write-out=table
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | 8e9e05c52164694d | 3.2.13  | 3.2 MB  | true      |         3 |       5014 |
+----------------+------------------+---------+---------+-----------+-----------+------------+

# Stop etcd-maintenance container
docker stop etcd-maintenance

# Start Rancher
docker start $(docker ps -a| grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }')
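
The commands above locate the Rancher container by grepping docker ps output. If no container or more than one container matches, the command substitution expands to an empty or multi-line value and the surrounding docker command fails in a confusing way. A guard along these lines can catch that early; find_rancher_container is a hypothetical helper name, and the pattern is the one used throughout this document.

```shell
# Hypothetical helper: print the ID of the single Rancher container,
# or fail if zero or more than one container matches.
# Extra arguments (for example -a) are passed through to `docker ps`.
find_rancher_container() {
  ids=$(docker ps "$@" | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }')
  # Count non-empty lines; `|| true` keeps a zero count from being treated as an error.
  count=$(printf '%s\n' "$ids" | grep -c . || true)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly 1 Rancher container, found $count" >&2
    return 1
  fi
  printf '%s\n' "$ids"
}
```

For example, the final step could then be written as id=$(find_rancher_container -a) && docker start "$id".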