
KEP-1880: Multiple Service CIDRs

Introduction

KEP-1880 allows configuring, via API objects, the ServiceCIDRs assigned to a Kubernetes cluster.

This demo uses kind and is based on the existing PR kubernetes/kubernetes#116516

A live demo was presented at the SIG-Network meeting on Oct 12th 2023: https://youtube.com/playlist?list=PL69nYSiGNLP2E8vmnqo5MwPOY25sDWIxb&si=vTNuT7EBFujoQOce

You can check out the PR locally and build your own image

kind build node-image --image kindest:servicecidr

or use my own image. To create the cluster, just specify the image and the configuration (the kind-config.yaml shown at the end of this gist) that enables the alpha runtime config and feature gates.

kind create cluster --image aojea/kindest:servicecidr --config kind-config.yaml -v9 --name servicecidr 

You can observe that several objects are created at bootstrap:

  1. Default ServiceCIDR, named kubernetes, created from the apiserver flag values
kubectl get servicecidrs
NAME         IPV4           IPV6     AGE
kubernetes   10.96.0.0/28   <none>   17m
  2. Default Kubernetes Service, named kubernetes, which takes the first IP from the default ServiceCIDR
kubectl get service
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   17m

The default Service has the ClusterIP 10.96.0.1, which must have a corresponding IPAddress object:

kubectl get ipaddress
NAME         PARENTREF
10.96.0.1    services/default/kubernetes
10.96.0.10   services/kube-system/kube-dns

The relation between these objects is as follows: Services get their ClusterIPs allocated from the ServiceCIDRs, and, to guarantee ClusterIP uniqueness across the cluster, each ClusterIP has an associated IPAddress object.
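
For illustration, this is roughly what the IPAddress object backing the default Service looks like; the manifest below is a sketch based on the v1alpha1 API in the PR, so the exact field layout may differ slightly.

 kubectl get ipaddress 10.96.0.1 -o yaml
apiVersion: networking.k8s.io/v1alpha1
kind: IPAddress
metadata:
  name: 10.96.0.1
spec:
  parentRef:
    group: ""          # core API group
    resource: services
    namespace: default
    name: kubernetes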

The ServiceCIDRs are protected with finalizers to avoid leaving Service ClusterIPs orphaned: the finalizer is only removed if another ServiceCIDR contains the existing IPAddresses, or if there are no IPAddresses belonging to the subnet.
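
As a quick check, the finalizer can be inspected directly. The finalizer name below is the one that appears later in this demo when deleting newcidr1; whether it is already present on the default ServiceCIDR depends on the controller having processed the object.

 kubectl get servicecidr kubernetes -o jsonpath='{.metadata.finalizers}'
["networking.k8s.io/service-cidr-finalizer"]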

Use cases

IP Exhaustion

There are cases where the ServiceCIDR range gets exhausted. Previously, increasing the Service range was a disruptive operation that could also cause data loss. With this new feature, users just need to add a new ServiceCIDR.

Create Services so we exhaust the existing ServiceCIDR

for i in $(seq 1 13); do kubectl create service clusterip "test-$i" --tcp 80 -o json | jq -r .spec.clusterIP; done
10.96.0.11
10.96.0.5
10.96.0.12
10.96.0.13
10.96.0.14
10.96.0.2
10.96.0.3
10.96.0.4
10.96.0.6
10.96.0.7
10.96.0.8
10.96.0.9
error: failed to create ClusterIP service: Internal error occurred: failed to allocate a serviceIP: range is full

We can see that the last Service fails to be created because the range is full, so we just create a new ServiceCIDR

$ cat cidr.yaml
apiVersion: networking.k8s.io/v1alpha1
kind: ServiceCIDR
metadata:
  name: newcidr1
spec:
  ipv4: 192.96.0.0/24
$ kubectl apply -f cidr.yaml
servicecidr.networking.k8s.io/newcidr1 created

and we can see that new Services can be created

for i in $(seq 13 16); do kubectl create service clusterip "test-$i" --tcp 80 -o json | jq -r .spec.clusterIP; done
192.96.0.48
192.96.0.200
192.96.0.121
192.96.0.144

and that they get IPs from the new ServiceCIDR

kubectl get service/test-13
NAME      TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
test-13   ClusterIP   192.96.0.83   <none>        80/TCP    25s

Deleting

A ServiceCIDR cannot be deleted while there are still IPAddresses referencing it

kubectl delete servicecidr newcidr1
servicecidr.networking.k8s.io "newcidr1" deleted

However, a finalizer keeps it around:

 kubectl get servicecidr newcidr1 -o yaml
apiVersion: networking.k8s.io/v1alpha1
kind: ServiceCIDR
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.k8s.io/v1alpha1","kind":"ServiceCIDR","metadata":{"annotations":{},"name":"newcidr1"},"spec":{"ipv4":"192.96.0.0/24"}}
  creationTimestamp: "2023-10-12T15:11:07Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2023-10-12T15:12:45Z"
  finalizers:
  - networking.k8s.io/service-cidr-finalizer
  name: newcidr1
  resourceVersion: "1133"
  uid: 5ffd8afe-c78f-4e60-ae76-cec448a8af40
spec:
  ipv4: 192.96.0.0/24
status:
  conditions:
  - lastTransitionTime: "2023-10-12T15:12:45Z"
    message: There are still IPAddresses referencing the ServiceCIDR, please remove
      them or create a new ServiceCIDR
    reason: OrphanIPAddress
    status: "False"
    type: Ready

until all the referenced IPAddresses are deleted

for i in $(seq 13 16); do kubectl delete service "test-$i" ; done
service "test-13" deleted
service "test-14" deleted
service "test-15" deleted
service "test-16" deleted

so it can be completely removed

 kubectl get servicecidr newcidr1
Error from server (NotFound): servicecidrs.networking.k8s.io "newcidr1" not found

Renumbering

Another common use case is when users want to move the existing Service range to a new range; imagine we want to move our 10.96.0.0/28 to 192.168.7.0/24.

We can follow these steps (a command sketch for the first steps is shown after the list):

  1. Create a new ServiceCIDR with 192.168.7.0/24.
  2. Delete the default ServiceCIDR; its Ready condition turns to False, so no new IPAddresses will be allocated from it.
  3. At this point only the kubernetes.default Service should remain in the default ServiceCIDR subnet.
  4. Recreate all the existing Services (delete and create) so they get IPs from the new ServiceCIDR.
  5. At this point we can start a new apiserver with the flags matching the new ServiceCIDR range.
  6. When the new apiserver is running and ready, we can shut down the old apiserver.
  7. Then delete the kubernetes.default Service; this unblocks the deletion of the default ServiceCIDR, which will be recreated by the new apiserver, and the new kubernetes.default Service will also be created in the new range.
  8. At this point we can delete the temporary ServiceCIDR, since it overlaps with the newly created default ServiceCIDR.
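
A minimal sketch of steps 1, 2 and 4, assuming the same v1alpha1 fields used earlier; the names tmpcidr and test-1 are illustrative.

# step 1: create the temporary ServiceCIDR with the new range
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1alpha1
kind: ServiceCIDR
metadata:
  name: tmpcidr
spec:
  ipv4: 192.168.7.0/24
EOF
# step 2: delete the default ServiceCIDR; the finalizer keeps it around with Ready=False
kubectl delete servicecidr kubernetes --wait=false
# step 4: recreate an existing Service so it gets an IP from the new range
kubectl delete service test-1
kubectl create service clusterip test-1 --tcp 80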

IP family migration

This is the same as the previous case, just using a different IP family for the new subnet.
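
For example, adding an IPv6 range could look like the manifest below. This is a sketch: it assumes the v1alpha1 API exposes an ipv6 field analogous to the ipv4 field used earlier (the IPV6 column in kubectl get servicecidrs suggests so), and it requires a dual-stack capable cluster. The name and prefix are illustrative.

apiVersion: networking.k8s.io/v1alpha1
kind: ServiceCIDR
metadata:
  name: newcidr-ipv6
spec:
  ipv6: fd00:10:96::/112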

# config for 1 control plane node and 2 workers (necessary for conformance)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: ipv4
  kubeProxyMode: iptables
  serviceSubnet: "10.96.0.0/28"
  # don't pass through host search paths
  dnsSearch: []
nodes:
- role: control-plane
- role: worker
- role: worker
featureGates: {"AllAlpha":true}
runtimeConfig: {"api/alpha":"true"}
kubeadmConfigPatches:
- |
  kind: ClusterConfiguration
  metadata:
    name: config
  apiServer:
    extraArgs:
      "v": "4"
  controllerManager:
    extraArgs:
      "v": "4"
  scheduler:
    extraArgs:
      "v": "4"
  ---
  kind: InitConfiguration
  nodeRegistration:
    kubeletExtraArgs:
      "v": "4"
  ---
  kind: JoinConfiguration
  nodeRegistration:
    kubeletExtraArgs:
      "v": "4"