Check out the cilium repo and run it in kind

git clone https://github.com/cilium/cilium.git
cd cilium
REPO_ROOT=$PWD
KUBEPROXY_MODE="none" make kind
make kind-image
make kind-install-cilium
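Once the install finishes we can check that the agent is up. A minimal check, assuming the cilium CLI is installed and the default k8s-app=cilium label used by the chart:

cilium status --wait
kubectl -n kube-system get pods -l k8s-app=cilium -o wide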

If we want to change some options (kube-proxy replacement and pprof enabled), we can set the values during the installation:

cilium install \
    --chart-directory=${REPO_ROOT}/install/kubernetes/cilium \
    --helm-values=${REPO_ROOT}/contrib/testing/kind-values.yaml \
    --version= \
    >/dev/null 2>&1 &
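For example, the kind-values.yaml referenced above (or an extra values file) could carry the two options mentioned; the chart keys below are an assumption and should be verified against install/kubernetes/cilium/values.yaml:

# assumed Cilium Helm chart keys for the two options above
kubeProxyReplacement: strict
pprof:
  enabled: true   # exposes the agent's pprof HTTP server (port 6060 by default)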

Or after the installation:

# https://docs.cilium.io/en/v1.13/configuration/
kubectl edit configmap cilium-config -n kube-system
# set pprof: "true"
# set kube-proxy-replacement: strict

cilium config also allows setting values; it is important to know that those are not automatically applied, and the pods need to be restarted:

kubectl rollout restart daemonset cilium -n kube-system
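The same change can also be made non-interactively with a merge patch on the ConfigMap (same keys as in the comments above), followed by the rollout restart:

kubectl -n kube-system patch configmap cilium-config --type merge \
  -p '{"data":{"pprof":"true","kube-proxy-replacement":"strict"}}'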

We may want to see metrics for our tests; install the monitoring.yaml (at the bottom of this gist), which scrapes the agents' metrics ports and exposes the Prometheus console via a NodePort service.

kubectl get nodes -o wide
# get the NodePort to open the Prometheus console
kubectl get services -n monitoring
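Applying the manifest and extracting the node IP and port can also be scripted, for example:

kubectl apply -f monitoring.yaml
NODE_IP=$(kubectl get node kind-worker -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')
NODE_PORT=$(kubectl -n monitoring get service prometheus-service -o jsonpath='{.spec.ports[0].nodePort}')
echo "Prometheus console: http://${NODE_IP}:${NODE_PORT}"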

If we want to obtain a pprof profile, after enabling the option in the config, forward port 6060 on the pod and fetch the dump:

kubectl -n kube-system get pods -o wide | grep worker
cilium-f78xr                                 1/1     Running   0          135m   192.168.10.2   kind-worker          <none>           <none>
cilium-operator-5946647599-src9x             1/1     Running   0          43h    192.168.10.2   kind-worker          <none>           <none>

kubectl -n kube-system port-forward pod/cilium-f78xr 6060:6060
Forwarding from 127.0.0.1:6060 -> 6060
Forwarding from [::1]:6060 -> 6060

curl "http://127.0.0.1:6060/debug/pprof/profile?seconds=15" > cpu.pprof

Micro-benchmarking

With this setup we can start micro-benchmarking the agent, in the sense that we can create API objects and observe the agent's behavior.

For example, we can stress the agent by creating 200 pods with a parallelism of 50 (using the stress-pod.yaml Job at the bottom of this gist) and obtain pprof profiles of CPU and memory:

kubectl apply -f stress-pod.yaml

curl "http://127.0.0.1:6060/debug/pprof/heap?seconds=15" > mem.pprof

curl "http://127.0.0.1:6060/debug/pprof/profile?seconds=15" > cpu.pprof

(image: cpu pprof graph)

(image: mem pprof graph)

From the graphs we can see that a lot of CPU time and memory is spent processing the conversion to cmaps, and a considerable amount of time is also spent in the GC.
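Between runs, the Job (named busybox2 in stress-pod.yaml) can be waited on and removed:

kubectl wait --for=condition=complete job/busybox2 --timeout=15m
kubectl delete job busybox2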

Resilience

Another useful test is running repetitive operations for a long time to detect memory leaks.

This is as simple as creating a loop that does the same operation for a long time and observing the graph of the memory consumed by the agent:

while true; do kubectl run test --image busybox -- sleep 1; sleep 1; kubectl delete pod test; done

(image: agent memory consumption graph)
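In the Prometheus console the agent memory can be graphed with, for example, the standard Go process metric (assuming the agent's Prometheus endpoint is enabled and scraped by the cilium job in monitoring.yaml below):

# PromQL: agent resident memory per pod
process_resident_memory_bytes{job="cilium"}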

monitoring.yaml

---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-service
  namespace: monitoring
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '9090'
spec:
  selector:
    app: prometheus-server
  type: NodePort
  ports:
  - port: 8080
    targetPort: 9090
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: default
  namespace: monitoring
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  labels:
    name: prometheus-server-conf
  namespace: monitoring
data:
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    scrape_configs:
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-controller-manager'
      honor_labels: true
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      static_configs:
      - targets:
        - 127.0.0.1:10257
    - job_name: 'kubernetes-nodes'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: localhost:6443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
    - job_name: 'kubernetes-cadvisor'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: localhost:6443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: cilium
      honor_labels: true
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_pod_name
        separator: '/'
        regex: 'kube-system/cilium.+'
      - source_labels:
        - __address__
        action: replace
        target_label: __address__
        regex: (.+?)(\\:\\d+)?
        replacement: $1:9962
---
apiVersion: v1
kind: Pod
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus-server
spec:
  hostNetwork: true
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
  containers:
  - name: prometheus
    image: prom/prometheus:v2.43.1
    args:
    - "--config.file=/etc/prometheus/prometheus.yml"
    - "--storage.tsdb.path=/prometheus/"
    - "--web.enable-admin-api"
    ports:
    - containerPort: 9090
    volumeMounts:
    - name: prometheus-config-volume
      mountPath: /etc/prometheus/
    - name: prometheus-storage-volume
      mountPath: /prometheus/
  volumes:
  - name: prometheus-config-volume
    configMap:
      defaultMode: 420
      name: prometheus-server-conf
  - name: prometheus-storage-volume
    emptyDir: {}
stress-pod.yaml

---
apiVersion: batch/v1
kind: Job
metadata:
  name: busybox2
spec:
  completions: 200
  parallelism: 50
  template:
    spec:
      nodeName: kind-worker
      containers:
      - name: busybox
        image: busybox
        resources:
          requests:
            cpu: "1000m"
        command:
        - /bin/sh
        - -c
        - |
          echo "starting!"
          sleep 5s
          _term() {
            echo "trapped sigterm"
            sleep 99999s
          }
          trap _term TERM
      restartPolicy: Never
  backoffLimit: 4