
@fgksgf
Last active October 22, 2022 07:38

https://github.com/fgksgf/eticd

ETiCD RFC

Project description

ETiCD is an etcd shim that translates the etcd API into TiKV operations, aiming to overcome etcd’s system limits.

Background & motivation

etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. Most notably, it manages the configuration data, state data, and metadata for Kubernetes, which is the most popular container orchestration platform.

However, etcd has two system limits: the request size limit and the storage size limit. First, etcd is designed to handle the small key-value pairs typical of metadata. Larger requests will work but may increase the latency of other requests. By default, the maximum size of any request is 1.5 MiB.

Second, the default storage size limit of etcd is 2 GB. 8 GB is the suggested maximum size for normal environments, and exceeding the limit makes the etcd deployment read-only. In practice, even the suggested maximum eventually fails to meet demand as the amount of data grows.

To overcome these two limits, we turn to TiKV, a highly scalable, low-latency, and easy-to-use key-value database. More importantly, TiKV excels at working with large-scale data by supporting petabyte-scale deployments spanning trillions of rows.

Therefore, we propose ETiCD, an intermediate layer that translates the etcd API to TiKV.

Project design

Architecture

[Figure: ETiCD architecture diagram]

Translation Layer

As the architecture shows, the translation layer is the most important part of ETiCD. This section introduces how we convert etcd operations into TiKV operations.

To begin with, let’s take a look at etcd’s API:

  • GET: gets a key or a range of keys; supports setting a prefix and specifying the kv revision.
  • PUT: puts the given key into the store.
  • WATCH: watches the event stream on keys or prefixes; this is the core feature of etcd.

As for TiKV, it provides both raw and ACID-compliant transactional (Txn) key-value APIs. We will only use the Txn API in this design.

Get

Applications can get a single key or a range of keys with a prefix or a specific revision from ETiCD.

ETiCD converts an etcd get operation into TiKV operations according to the scenario (the tcli commands below wrap some basic TiKV operations):

  • Get a key.
tcli get [key]
  • Get a key at a specific revision.
# In the key `REV_foo_4`, `REV` marks a revision entry, `foo` is the key’s name, and `4` is the revision number of the key `foo`.
tcli get REV_[key]_[revision number]
  • Get a range of keys.
# Scan keys starting from [key], returning at most [size] keys.
tcli scan [key] --limit=[size]
  • Get a range of keys with a prefix.
# Scan keys starting with [key prefix], returning at most [size] keys.
tcli scan [key prefix] strict-prefix=true --limit=[size]
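
The four get scenarios above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `SortedStore` is a hypothetical in-memory stand-in for TiKV's ordered key space, not the tcli or TiKV client API, and `revision_key` simply follows the `REV_[key]_[revision]` naming used in the examples.

```python
from bisect import bisect_left


def revision_key(key, revision):
    """Build the key that holds `key` at `revision`, e.g. REV_foo_4."""
    return f"REV_{key}_{revision}"


class SortedStore:
    """Hypothetical in-memory stand-in for TiKV's ordered key space."""

    def __init__(self, items):
        self._items = dict(items)
        self._keys = sorted(self._items)

    def get(self, key):
        """Point lookup; returns None when the key is absent."""
        return self._items.get(key)

    def scan(self, start, limit, strict_prefix=False):
        """Return up to `limit` (key, value) pairs with key >= start, in key order.

        With strict_prefix=True, only keys that begin with `start` are returned.
        """
        out = []
        for k in self._keys[bisect_left(self._keys, start):]:
            if len(out) >= limit:
                break
            if strict_prefix and not k.startswith(start):
                break  # keys are sorted, so no later key can match the prefix
            out.append((k, self._items[k]))
        return out


store = SortedStore({
    "foo": "bar",
    "foo2": "baz",
    "fop": "qux",
    revision_key("foo", 4): "boo",
})
current = store.get("foo")                             # get a key
old = store.get(revision_key("foo", 4))                # get a key at revision 4
first_two = store.scan("foo", limit=2)                 # range scan
prefixed = store.scan("foo", limit=10, strict_prefix=True)  # prefix scan
```

Because the revision entries share the `REV_` prefix, they sort apart from the live keys, so a prefix scan on `foo` does not return `REV_foo_4`.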

Put

For the put operation of etcd, like putting <foo, bar>, the translation layer first tries to get the value of foo in TiKV.

If it does not exist, ETiCD will:

  1. Put <REV_foo_0, bar> to TiKV
  2. Put <foo, [value: bar, created_event: 0, deleted_event: -1, prev: -1]> to TiKV

If foo already exists, ETiCD will:

  1. Check whether deleted_event is -1, which means the key has not been deleted. If so:
  2. Get the older value [value: boo, created_event: 5, deleted_event: -1, prev: 3]
  3. Put <REV_foo_4, boo>
  4. Put (update) <foo, [value: bar, created_event: 5, deleted_event: -1, prev: 4]>

In the above example:

  • created_event and deleted_event store a global event id; the event message can be fetched with the key EVENT_<id>.
  • prev points to the revision at which the kv was last modified.
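
The put path described above can be sketched as follows. This is a minimal illustration, not the ETiCD implementation: a plain dict stands in for a TiKV transaction, the `Translator` class and the event message format are hypothetical, and the rule that the archived revision number is `prev + 1` is inferred from the single example above. The RFC does not say what happens when a deleted key is put again, so the sketch leaves that case unimplemented.

```python
class Translator:
    """Hypothetical sketch of the put translation; a dict stands in for TiKV."""

    def __init__(self):
        self.kv = {}         # stand-in for the TiKV key space
        self.next_event = 0  # global event id counter (assumed monotonic)

    def put(self, key, value):
        meta = self.kv.get(key)
        if meta is None:
            # Key does not exist: record a creation event and revision 0.
            event_id = self.next_event
            self.next_event += 1
            # The event message format is illustrative only.
            self.kv[f"EVENT_{event_id}"] = {"type": "create", "key": key}
            self.kv[f"REV_{key}_0"] = value
            self.kv[key] = {
                "value": value,
                "created_event": event_id,
                "deleted_event": -1,
                "prev": -1,
            }
            return
        if meta["deleted_event"] != -1:
            # Not covered by the RFC text, so left open here.
            raise NotImplementedError("put on a deleted key is unspecified")
        # Key exists and is live: archive the old value, then update in place.
        revision = meta["prev"] + 1  # assumption inferred from the example
        self.kv[f"REV_{key}_{revision}"] = meta["value"]
        self.kv[key] = {
            "value": value,
            "created_event": meta["created_event"],
            "deleted_event": -1,
            "prev": revision,
        }


t = Translator()
t.put("foo", "bar")  # creates REV_foo_0 and the live record for foo
t.put("foo", "baz")  # archives "bar" under REV_foo_0, prev becomes 0
t.put("foo", "qux")  # archives "baz" under REV_foo_1, prev becomes 1
```

In a real deployment the read of the current record and the two writes would need to run inside one TiKV transaction so that concurrent puts to the same key cannot interleave.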

Watch

The main process of the watch operation in ETiCD is as follows:

  1. Get all keys with a revision number greater than a certain revision.
  2. Do scatter and filter operations on these keys.
  3. Return all keys that meet the conditions to the watcher.
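
The three watch steps above can be sketched as below. The RFC does not define the record layout or what "scatter" means precisely, so this is a sketch under assumptions: each change is a hypothetical `(key, revision, value)` tuple, each watcher is a `(prefix, start_revision)` pair, and "scatter" is read as routing each candidate change to every watcher it matches.

```python
def keys_after(changes, revision):
    """Step 1: keep only changes with a revision greater than `revision`."""
    return [c for c in changes if c[1] > revision]


def dispatch(changes, watchers):
    """Steps 2 and 3: scatter candidate changes to watchers and filter them.

    changes:  list of (key, revision, value) tuples (hypothetical layout)
    watchers: dict of watcher id -> (prefix, start_revision)
    Returns a dict of watcher id -> list of matching changes.
    """
    if not watchers:
        return {}
    # Fetch once from the lowest start revision any watcher asks for.
    floor = min(start for _, start in watchers.values())
    candidates = keys_after(changes, floor)
    delivered = {wid: [] for wid in watchers}
    for key, revision, value in candidates:
        for wid, (prefix, start) in watchers.items():
            if revision > start and key.startswith(prefix):
                delivered[wid].append((key, revision, value))
    return delivered


changes = [("foo", 3, "a"), ("foo", 4, "b"), ("bar", 5, "c"), ("foobar", 6, "d")]
watchers = {"w1": ("foo", 3), "w2": ("", 5)}
result = dispatch(changes, watchers)
```

Here `w1` watches the prefix `foo` from revision 3 and `w2` watches every key from revision 5, so both receive the `foobar` change while only `w1` receives the second `foo` change.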

Verification

There are two ways to verify ETiCD: benchmarking it against etcd and launching a k8s cluster backed by ETiCD.

Benchmark

etcd provides an official benchmarking tool, which we can run against ETiCD to compare its performance with etcd’s.

Using with kubeadm

To verify that ETiCD is usable, we can also launch a k8s cluster with kubeadm and configure the cluster to use ETiCD as its external etcd, then perform some operations to check whether the cluster works as expected.

Here is a sample kubeadm-master.cfg:

apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 0.0.0.0
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: kubeadm
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.17.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
controlPlaneEndpoint: "0.0.0.0:6443"
etcd:
  external:
    endpoints:
    - http://ETiCD:2379
    caFile: ./ca.crt
    certFile: ./server.crt
    keyFile: ./server.key

The cluster can then be launched with:

kubeadm init --config kubeadm-master.cfg --ignore-preflight-errors ExternalEtcdVersion