@embano1
Last active July 7, 2021 10:11
etcd Deep Dive at Golang Leipzig Meetup

Setup

Prepare the etcd server:

docker pull bitnami/etcd:latest
docker network create app-tier --driver bridge
docker run -d --name etcd-server \
    --network app-tier \
    --publish 2379:2379 \
    --publish 2380:2380 \
    --env ALLOW_NONE_AUTHENTICATION=yes \
    --env ETCD_ADVERTISE_CLIENT_URLS=http://etcd-server:2379 \
    bitnami/etcd:latest

Exec into etcd container instance and retrieve status:

docker exec -it etcd-server bash

# inside container
alias e=etcdctl
e endpoint status -w json | jq
[
  {
    "Endpoint": "127.0.0.1:2379",
    "Status": {
      "header": {
        "cluster_id": 14841639068965180000,
        "member_id": 10276657743932975000,
        "revision": 1,
        "raft_term": 2
      },
      "version": "3.5.0",
      "dbSize": 20480,
      "leader": 10276657743932975000,
      "raftIndex": 4,
      "raftTerm": 2,
      "raftAppliedIndex": 4,
      "dbSizeInUse": 16384
    }
  }
]

Update jq to v1.6 (adds support for the @base64d filter):

URL=https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64
curl -L -o /tmp/jq $URL && cd /tmp/ && chmod +x jq
export PATH=$PWD:$PATH

💡 TIP: Use CTRL-L to clear console in etcd container

Interacting with etcd through etcdctl

Create some data:

for i in {1..5}; do e put /keys/${i} value-${i}; done
OK
OK
OK
OK
OK

Show and explain get without --prefix:

# no data returned: without --prefix, get looks up the exact key "." (which does not exist)
e get .

# use JSON output
e get . -w json | jq .
{
  "header": {
    "cluster_id": 14841639068965180000,
    "member_id": 10276657743932975000,
    "revision": 6,
    "raft_term": 2
  }
}

Show and explain get with --prefix:

e get / --prefix -w json | jq .
{
  "header": {
    "cluster_id": 14841639068965180000,
    "member_id": 10276657743932975000,
    "revision": 6,
    "raft_term": 2
  },
  "kvs": [
    {
      "key": "L2tleXMvMQ==",
      "create_revision": 2,
      "mod_revision": 2,
      "version": 1,
      "value": "dmFsdWUtMQ=="
    },
    {
      "key": "L2tleXMvMg==",
      "create_revision": 3,
      "mod_revision": 3,
      "version": 1,
      "value": "dmFsdWUtMg=="
    },
    {
      "key": "L2tleXMvMw==",
      "create_revision": 4,
      "mod_revision": 4,
      "version": 1,
      "value": "dmFsdWUtMw=="
    },
    {
      "key": "L2tleXMvNA==",
      "create_revision": 5,
      "mod_revision": 5,
      "version": 1,
      "value": "dmFsdWUtNA=="
    },
    {
      "key": "L2tleXMvNQ==",
      "create_revision": 6,
      "mod_revision": 6,
      "version": 1,
      "value": "dmFsdWUtNQ=="
    }
  ],
  "count": 5
}
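Why `/` with --prefix matches everything under /keys while a plain get needs an exact key: etcd's API expresses a prefix query as a half-open key range [key, range_end). The following is a minimal sketch of that idea using only the Go standard library. The helper names `prefixRangeEnd` and `inRange` are made up for illustration and are not the etcd client's API.

```go
package main

import "fmt"

// prefixRangeEnd returns the smallest key that sorts after every key starting
// with prefix: the last byte below 0xff is incremented and the rest dropped.
// (Illustrative helper, not part of the etcd client.)
func prefixRangeEnd(prefix string) string {
	end := []byte(prefix)
	for i := len(end) - 1; i >= 0; i-- {
		if end[i] < 0xff {
			end[i]++
			return string(end[:i+1])
		}
	}
	// Empty prefix or only 0xff bytes: there is no upper bound. etcd encodes
	// this "rest of the keyspace" case as the single byte 0x00.
	return "\x00"
}

// inRange reports whether key falls into the half-open range [start, end).
func inRange(key, start, end string) bool {
	return key >= start && (end == "\x00" || key < end)
}

func main() {
	end := prefixRangeEnd("/")
	fmt.Printf("prefix %q covers [%q, %q)\n", "/", "/", end)
	for _, k := range []string{"/keys/1", "/keys/5", "."} {
		fmt.Printf("%-8s in range: %v\n", k, inRange(k, "/", end))
	}
}
```

With this model, "/" becomes the range ["/", "0"), which contains /keys/1 to /keys/5 but not ".".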

To decode base64-encoded keys and values use the following syntax (requires jq v1.6 or later):

e get / --prefix -w json | jq '.kvs[].key|=@base64d|.kvs[].value|=@base64d'
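For a Go audience, the same decoding can be sketched without jq. This is a minimal example, assuming the JSON shape shown in the output above; the struct and function names (`kv`, `rangeResponse`, `decodeField`, `decodeResponse`) are illustrative and not taken from the etcd code base. It relies on encoding/json decoding base64 strings into []byte fields.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// kv mirrors one entry of the "kvs" array printed by `etcdctl get -w json`.
// encoding/json decodes base64 strings into []byte fields automatically.
type kv struct {
	Key            []byte `json:"key"`
	CreateRevision int64  `json:"create_revision"`
	ModRevision    int64  `json:"mod_revision"`
	Version        int64  `json:"version"`
	Value          []byte `json:"value"`
}

type rangeResponse struct {
	Kvs   []kv  `json:"kvs"`
	Count int64 `json:"count"`
}

// decodeField decodes a single base64 string by hand.
func decodeField(s string) (string, error) {
	b, err := base64.StdEncoding.DecodeString(s)
	return string(b), err
}

// decodeResponse parses the JSON output of an etcdctl get.
func decodeResponse(raw string) (rangeResponse, error) {
	var resp rangeResponse
	err := json.Unmarshal([]byte(raw), &resp)
	return resp, err
}

func main() {
	// First entry of the output above.
	raw := `{"kvs":[{"key":"L2tleXMvMQ==","create_revision":2,"mod_revision":2,"version":1,"value":"dmFsdWUtMQ=="}],"count":1}`
	resp, err := decodeResponse(raw)
	if err != nil {
		panic(err)
	}
	for _, e := range resp.Kvs {
		fmt.Printf("%s = %s (mod_revision %d)\n", e.Key, e.Value, e.ModRevision)
	}
}
```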

Update an existing key and explain behavior:

# returns new rev
e put /keys/1 value-1a -w json
{"header":{"cluster_id":14841639068965180000,"member_id":10276657743932975000,"revision":7,"raft_term":2}}

# inspect key
e get /keys/1 -w json | jq '.kvs[].value|=@base64d'
{
  "header": {
    "cluster_id": 14841639068965180000,
    "member_id": 10276657743932975000,
    "revision": 7,
    "raft_term": 2
  },
  "kvs": [
    {
      "key": "L2tleXMvMQ==",
      "create_revision": 2,
      "mod_revision": 7,
      "version": 2,
      "value": "value-1a"
    }
  ],
  "count": 1
}

Delete a key:

e del /keys/1 -w json
{"header":{"cluster_id":14841639068965180000,"member_id":10276657743932975000,"revision":8,"raft_term":2},"deleted":1}

# show that /keys/1 is gone
e get / --prefix --keys-only
/keys/2
/keys/3
/keys/4
/keys/5

Show and explain time-travel with revisions for one (previously deleted) key:

# get at current rev (here rev=8): no kvs and no count, the key was deleted at rev 8
e get /keys/1 --rev 8 -w json | jq .
{"header":{"cluster_id":14841639068965180000,"member_id":10276657743932975000,"revision":8,"raft_term":2}}

# use rev from last modification
e get /keys/1 --rev 7 -w json | jq '.kvs[].key|=@base64d|.kvs[].value|=@base64d'
{
  "header": {
    "cluster_id": 14841639068965180000,
    "member_id": 10276657743932975000,
    "revision": 8,
    "raft_term": 2
  },
  "kvs": [
    {
      "key": "/keys/1",
      "create_revision": 2,
      "mod_revision": 7,
      "version": 2,
      "value": "value-1a"
    }
  ],
  "count": 1
}


# use earlier revision (workaround: mod_revision - 1)
e get /keys/1 --rev 6 -w json | jq '.kvs[].key|=@base64d|.kvs[].value|=@base64d'
{
  "header": {
    "cluster_id": 14841639068965180000,
    "member_id": 10276657743932975000,
    "revision": 8,
    "raft_term": 2
  },
  "kvs": [
    {
      "key": "/keys/1",
      "create_revision": 2,
      "mod_revision": 2,
      "version": 1,
      "value": "value-1"
    }
  ],
  "count": 1
}

💡 Note: There is no command for queries like "give me all revisions of this key". To walk a key's history backwards, repeatedly get it at --rev (mod_revision - 1) for as long as mod_revision - 1 >= create_revision.
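The backwards walk can be sketched as a small loop. This is a toy example: `getFunc` stands in for "etcdctl get KEY --rev REV", and `fakeGet` simply replays the /keys/1 history from this walkthrough. Both names and the `keyValue` type are assumptions made for illustration, and no etcd server is contacted.

```go
package main

import "fmt"

// keyValue mimics the metadata etcd returns for a key (illustrative type).
type keyValue struct {
	Key            string
	Value          string
	CreateRevision int64
	ModRevision    int64
	Version        int64
}

// getFunc stands in for `etcdctl get <key> --rev <rev>` (hypothetical).
type getFunc func(key string, rev int64) (keyValue, bool)

// history collects the versions of key visible at startRev and earlier,
// newest first, by repeatedly jumping to mod_revision - 1.
func history(get getFunc, key string, startRev int64) []keyValue {
	var out []keyValue
	for rev := startRev; rev > 0; {
		kv, ok := get(key, rev)
		if !ok {
			break
		}
		out = append(out, kv)
		if kv.ModRevision-1 < kv.CreateRevision {
			break // reached the revision that created the key
		}
		rev = kv.ModRevision - 1
	}
	return out
}

// fakeGet replays the /keys/1 history shown above:
// created at rev 2, updated at rev 7, deleted at rev 8.
func fakeGet(key string, rev int64) (keyValue, bool) {
	if key != "/keys/1" {
		return keyValue{}, false
	}
	switch {
	case rev >= 8:
		return keyValue{}, false
	case rev == 7:
		return keyValue{key, "value-1a", 2, 7, 2}, true
	case rev >= 2:
		return keyValue{key, "value-1", 2, 2, 1}, true
	}
	return keyValue{}, false
}

func main() {
	for _, kv := range history(fakeGet, "/keys/1", 7) {
		fmt.Printf("mod_revision=%d version=%d value=%s\n", kv.ModRevision, kv.Version, kv.Value)
	}
}
```

Starting at rev 8 yields nothing in this model because the key is deleted there, which is why the walkthrough starts from the last modification (rev 7).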

Recreate a previously deleted key:

e put /keys/1 value-1b -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":9,"raft_term":2}}
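The cluster_id and member_id in this raw output end in different digits than in the jq-formatted outputs earlier. A likely explanation (an assumption, the notes do not state it) is that jq parses JSON numbers as 64-bit floats, which cannot represent every 64-bit integer exactly. The sketch below shows the same effect with Go's encoding/json; the function names are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeLossy parses cluster_id the way many JSON tools do: as a float64.
func decodeLossy(raw string) (uint64, error) {
	var m map[string]float64
	if err := json.Unmarshal([]byte(raw), &m); err != nil {
		return 0, err
	}
	return uint64(m["cluster_id"]), nil
}

// decodeExact keeps the number as json.Number, i.e. its original text.
func decodeExact(raw string) (string, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.UseNumber()
	var m map[string]interface{}
	if err := dec.Decode(&m); err != nil {
		return "", err
	}
	n, ok := m["cluster_id"].(json.Number)
	if !ok {
		return "", fmt.Errorf("cluster_id is not a number")
	}
	return n.String(), nil
}

func main() {
	raw := `{"cluster_id":14841639068965178418}`
	lossy, err := decodeLossy(raw)
	if err != nil {
		panic(err)
	}
	exact, err := decodeExact(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("via float64:    ", lossy)
	fmt.Println("via json.Number:", exact)
}
```

When the exact IDs matter, decoding into uint64 fields or using json.Number avoids the rounding.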

Inspect the raw database content:

docker cp etcd-server:/bitnami/etcd/data/member/snap/db .

./etcd-dump-db iterate-bucket db key --decode
rev={main:9 sub:0}, value=[key "/keys/1" | val "value-1b" | created 9 | mod 9 | ver 1]
rev={main:8 sub:0}, value=[key "/keys/1" | val "" | created 0 | mod 0 | ver 0]
rev={main:7 sub:0}, value=[key "/keys/1" | val "value-1a" | created 2 | mod 7 | ver 2]
rev={main:6 sub:0}, value=[key "/keys/5" | val "value-5" | created 6 | mod 6 | ver 1]
rev={main:5 sub:0}, value=[key "/keys/4" | val "value-4" | created 5 | mod 5 | ver 1]
rev={main:4 sub:0}, value=[key "/keys/3" | val "value-3" | created 4 | mod 4 | ver 1]
rev={main:3 sub:0}, value=[key "/keys/2" | val "value-2" | created 3 | mod 3 | ver 1]
rev={main:2 sub:0}, value=[key "/keys/1" | val "value-1" | created 2 | mod 2 | ver 1]

Run compaction at current rev:

e compact 9 --physical
compacted revision 9

Inspect and explain the database content:

docker cp etcd-server:/bitnami/etcd/data/member/snap/db .

./etcd-dump-db iterate-bucket db key --decode
rev={main:9 sub:0}, value=[key "/keys/1" | val "value-1b" | created 9 | mod 9 | ver 1]
rev={main:6 sub:0}, value=[key "/keys/5" | val "value-5" | created 6 | mod 6 | ver 1]
rev={main:5 sub:0}, value=[key "/keys/4" | val "value-4" | created 5 | mod 5 | ver 1]
rev={main:4 sub:0}, value=[key "/keys/3" | val "value-3" | created 4 | mod 4 | ver 1]
rev={main:3 sub:0}, value=[key "/keys/2" | val "value-2" | created 3 | mod 3 | ver 1]

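💡 Note: Compaction drops superseded revisions up to the given revision and keeps only the latest one per key (here revs 2, 7 and 8 of /keys/1 are superseded by rev 9). A key is purged from the underlying database entirely only if its latest revision within the compacted range is a tombstone.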
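The before/after dumps can be reproduced with a toy model of compaction. This is a simplified sketch, not etcd's implementation: it ignores sub-revisions and the in-memory index, and the `record` type and `compact` function are invented for illustration. The rule it models: per key, drop every revision at or below the compaction revision except the largest one, and drop that one too if it is a tombstone.

```go
package main

import (
	"fmt"
	"sort"
)

// record is one entry of the revision log (illustrative type).
type record struct {
	Rev       int64
	Key       string
	Value     string
	Tombstone bool
}

// compact returns the records that survive a compaction at atRev.
func compact(log []record, atRev int64) []record {
	// latest maps each key to its largest revision <= atRev.
	latest := map[string]record{}
	for _, r := range log {
		if r.Rev <= atRev && r.Rev >= latest[r.Key].Rev {
			latest[r.Key] = r
		}
	}
	var kept []record
	for _, r := range log {
		if r.Rev > atRev {
			kept = append(kept, r) // newer than the compaction point
			continue
		}
		if l := latest[r.Key]; r.Rev == l.Rev && !l.Tombstone {
			kept = append(kept, r) // latest live revision of this key
		}
	}
	sort.Slice(kept, func(i, j int) bool { return kept[i].Rev > kept[j].Rev })
	return kept
}

// sampleLog mirrors the database dump shown before the compaction.
func sampleLog() []record {
	return []record{
		{Rev: 2, Key: "/keys/1", Value: "value-1"},
		{Rev: 3, Key: "/keys/2", Value: "value-2"},
		{Rev: 4, Key: "/keys/3", Value: "value-3"},
		{Rev: 5, Key: "/keys/4", Value: "value-4"},
		{Rev: 6, Key: "/keys/5", Value: "value-5"},
		{Rev: 7, Key: "/keys/1", Value: "value-1a"},
		{Rev: 8, Key: "/keys/1", Tombstone: true},
		{Rev: 9, Key: "/keys/1", Value: "value-1b"},
	}
}

func main() {
	for _, r := range compact(sampleLog(), 9) {
		fmt.Printf("rev=%d key=%s val=%q\n", r.Rev, r.Key, r.Value)
	}
}
```

Compacting this log at rev 9 leaves revisions 9, 6, 5, 4 and 3, matching the second dump.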

Watch a key and update/delete it in parallel from a second terminal (two put operations followed by a del):

e watch /keys/1 -w json | jq '.Events[].kv.key|=@base64d|.Events[].kv.value|=@base64d'
{
  "Header": {
    "cluster_id": 14841639068965180000,
    "member_id": 10276657743932975000,
    "revision": 10,
    "raft_term": 2
  },
  "Events": [
    {
      "kv": {
        "key": "/keys/1",
        "create_revision": 9,
        "mod_revision": 10,
        "version": 2,
        "value": "value-1c"
      }
    }
  ],
  "CompactRevision": 0,
  "Canceled": false,
  "Created": false
}
{
  "Header": {
    "cluster_id": 14841639068965180000,
    "member_id": 10276657743932975000,
    "revision": 11,
    "raft_term": 2
  },
  "Events": [
    {
      "kv": {
        "key": "/keys/1",
        "create_revision": 9,
        "mod_revision": 11,
        "version": 3,
        "value": "value-1d"
      }
    }
  ],
  "CompactRevision": 0,
  "Canceled": false,
  "Created": false
}

# if the key gets deleted
{
  "Header": {
    "cluster_id": 14841639068965180000,
    "member_id": 10276657743932975000,
    "revision": 12,
    "raft_term": 2
  },
  "Events": [
    {
      "type": 1,
      "kv": {
        "key": "/keys/1",
        "mod_revision": 12,
        "value": "��"
      }
    }
  ],
  "CompactRevision": 0,
  "Canceled": false,
  "Created": false
}

💡 Note: Without JSON output you would directly see one of the two event types (PUT or DELETE):

PUT
/keys/1
value-1d
DELETE
/keys/1

Show and explain watch on a range with an explicit rev (i.e. not "now"):

e watch /keys --prefix --rev 9
PUT
/keys/1
value-1b
PUT
/keys/1
value-1c
PUT
/keys/1
value-1d
DELETE
/keys/1

💡 Note: watch accepts future revisions (e.g. current + n) and then simply waits. It fails when the requested revision has already been compacted: etcdserver: mvcc: required revision has been compacted
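The replay behavior can be modeled in a few lines. This is a toy sketch, assuming an in-memory event log and a known compaction revision; `event`, `replay` and `sampleEvents` are invented names and the real watch implementation is considerably more involved.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errCompacted = errors.New("mvcc: required revision has been compacted")

// event is one change in the toy event log (illustrative type).
type event struct {
	Rev   int64
	Type  string // "PUT" or "DELETE"
	Key   string
	Value string
}

// replay returns all logged events for keys with the given prefix starting at
// startRev. Revisions below compactRev are no longer available; revisions in
// the future yield nothing yet (a real watch would keep waiting).
func replay(log []event, compactRev, startRev int64, prefix string) ([]event, error) {
	if startRev < compactRev {
		return nil, errCompacted
	}
	var out []event
	for _, e := range log {
		if e.Rev >= startRev && strings.HasPrefix(e.Key, prefix) {
			out = append(out, e)
		}
	}
	return out, nil
}

// sampleEvents mirrors the changes to /keys/1 made after the compaction.
func sampleEvents() []event {
	return []event{
		{Rev: 9, Type: "PUT", Key: "/keys/1", Value: "value-1b"},
		{Rev: 10, Type: "PUT", Key: "/keys/1", Value: "value-1c"},
		{Rev: 11, Type: "PUT", Key: "/keys/1", Value: "value-1d"},
		{Rev: 12, Type: "DELETE", Key: "/keys/1"},
	}
}

func main() {
	events, err := replay(sampleEvents(), 9, 9, "/keys")
	if err != nil {
		panic(err)
	}
	for _, e := range events {
		fmt.Println(e.Type, e.Key, e.Value)
	}
	_, err = replay(sampleEvents(), 9, 8, "/keys")
	fmt.Println("watch from rev 8:", err)
}
```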

Kubernetes

Setup

Create cluster:

kind create cluster --name tt-etcd

Compile etcdctl and copy it into the kind cluster:

# clone github.com/etcd-io/etcd then:
cd etcd/etcdctl
GOOS=linux go build .
docker cp etcdctl tt-etcd-control-plane:/usr/local/bin

Install a CRD (so we can see JSON data structures in the registry):

kubectl create -f https://raw.githubusercontent.com/embano1/codeconnect-vm-operator/main/config/crd/bases/vm.codeconnect.vmworld.com_vmgroups.yaml

# create one example CR
kubectl create -f https://raw.githubusercontent.com/embano1/codeconnect-vm-operator/main/config/samples/vg-1.yaml

# exec into the kind control-plane node, install jq and define an etcdctl alias using the etcd certificates
docker exec -it tt-etcd-control-plane bash
apt-get update && apt-get install jq -y
alias e="etcdctl --endpoints 127.0.0.1:2379   --cert=/etc/kubernetes/pki/etcd/server.crt   --key=/etc/kubernetes/pki/etcd/server.key   --cacert=/etc/kubernetes/pki/etcd/ca.crt"

ListWatch Pattern

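// Excerpt from the Kubernetes apiserver etcd3 watcher
// (k8s.io/apiserver/pkg/storage/etcd3/watcher.go).
func (wc *watchChan) startWatching(watchClosedCh chan struct{}) {
	// No start revision given: LIST the current state first. sync() also
	// records the revision of that snapshot in wc.initialRev.
	if wc.initialRev == 0 {
		if err := wc.sync(); err != nil {
			klog.Errorf("failed to sync with latest state: %v", err)
			wc.sendError(err)
			return
		}
	}
	// WATCH from the revision right after the snapshot so that no event is
	// missed or delivered twice.
	opts := []clientv3.OpOption{clientv3.WithRev(wc.initialRev + 1), clientv3.WithPrevKV()}
	if wc.recursive {
		opts = append(opts, clientv3.WithPrefix())
	}
	if wc.progressNotify {
		opts = append(opts, clientv3.WithProgressNotify())
	}
	wch := wc.watcher.client.Watch(wc.ctx, wc.key, opts...)
	// ... (rest of the function omitted)
}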
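The key idea in this excerpt is "list at revision N, then watch from N + 1". The sketch below models that with a toy in-memory store so it runs without etcd. The `store` type and the `listWatch` function are invented for illustration and do not reflect the apiserver's actual types.

```go
package main

import "fmt"

// event is one change recorded by the toy store.
type event struct {
	Rev     int64
	Key     string
	Value   string
	Deleted bool
}

// store is a tiny stand-in for etcd: every write bumps a global revision.
type store struct {
	rev  int64
	data map[string]string
	log  []event
}

func newStore() *store { return &store{data: map[string]string{}} }

func (s *store) put(key, value string) {
	s.rev++
	s.data[key] = value
	s.log = append(s.log, event{Rev: s.rev, Key: key, Value: value})
}

func (s *store) del(key string) {
	s.rev++
	delete(s.data, key)
	s.log = append(s.log, event{Rev: s.rev, Key: key, Deleted: true})
}

// list returns a snapshot copy plus the revision it was taken at.
func (s *store) list() (map[string]string, int64) {
	snap := make(map[string]string, len(s.data))
	for k, v := range s.data {
		snap[k] = v
	}
	return snap, s.rev
}

// watch returns all events with revision >= fromRev.
func (s *store) watch(fromRev int64) []event {
	var out []event
	for _, e := range s.log {
		if e.Rev >= fromRev {
			out = append(out, e)
		}
	}
	return out
}

// listWatch builds a cache from a snapshot and then applies every event that
// happened after the snapshot revision. concurrentWrites simulates changes
// that arrive between the list and the watch.
func listWatch(s *store, concurrentWrites func()) map[string]string {
	cache, rev := s.list()
	concurrentWrites()
	for _, e := range s.watch(rev + 1) {
		if e.Deleted {
			delete(cache, e.Key)
		} else {
			cache[e.Key] = e.Value
		}
	}
	return cache
}

func main() {
	s := newStore()
	s.put("/demo/a", "1")
	cache := listWatch(s, func() {
		s.put("/demo/a", "2")
		s.put("/demo/b", "3")
		s.del("/demo/a")
	})
	fmt.Println("cache:", cache, "store:", s.data)
}
```

Watching from rev instead of rev + 1 would replay the last change already contained in the snapshot; watching from a later revision would lose events.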

ListWatch via kubectl

# inspect the request flow and parameters (resourceVersion=5820&watch=true)
kubectl get vg -A -w -v 9 -o json --output-watch-events

# filtered by event type/label/resourceVersion
kubectl get vg -A -w -v9 -o json --output-watch-events | jq '.|.type,.object.metadata.labels,.object.metadata.resourceVersion'

ListWatch via etcdctl

# note the current revision returned in header
e get /registry/vm.codeconnect.vmworld.com/vmgroups --prefix -w json --keys-only | jq '.kvs[].key|=@base64d'

# watch from revision+1
nextRev=$(expr $(e get /registry/vm.codeconnect.vmworld.com/vmgroups/default/vg-1 -w json | jq '.header.revision') + 1)
e watch /registry/vm.codeconnect.vmworld.com/vmgroups --prefix -w json --rev ${nextRev} | jq '.Events[].kv.key|=@base64d|.Events[].kv.value|=@base64d'

Now modify the watched custom resource (e.g. add a label to vg-1) and observe the output of kubectl and etcdctl. Compare the resourceVersion with the (mod)revision.

Dump Kubernetes etcd Registry

# build dump tool
cd etcd/tools/etcd-dump-db
go build .

# copy DB
docker cp tt-etcd-control-plane:/var/lib/etcd/member/snap/db db

# show contents and pipe to VS code for better readability
./etcd-dump-db iterate-bucket db key --decode | code -

Alternative: use auger

auger extract -f db

Show the value for a given key:

auger extract -f db -k /registry/secrets/kube-system/kube-proxy-token-zghms


Explain why resourceVersion is not persisted in etcd

Resources

- https://www.mgasch.com/2021/01/listwatch-part-1/
- https://github.com/etcd-io/etcd
- https://github.com/etcd-io/bbolt
- https://pkg.go.dev/go.etcd.io/etcd/clientv3#Client
- https://jepsen.io/analyses/etcd-3.4.3
- https://github.com/jpbetz/auger
- https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s11-sun.pdf