When debugging a tracee process (e.g. an Nginx worker), a tracer process (e.g. gcore or gdb) must attach to it.
This post explains how to build a custom Kong image and generate core dump of Nginx workers with gcore.
Create a Dockerfile and save it as wp-ubuntu-3501.Dockerfile as follows.
# wp-ubuntu-3501.Dockerfile
FROM kong/kong-gateway:3.5.0.1
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
USER root
LABEL com.example.description="Kong Gateway Development" \
com.example.department="Team Sustaining" \
com.example.builder="Zachary Hu (zachary.hu@konghq.com)"
RUN <<-EOF
set -ex
apt-get update
apt-get install -y \
gdb \
jq \
ncat \
iproute2 \
less
EOF
USER kong
We add the essential package gdb, which provides the gdb and gcore commands; gcore is the one that generates core dumps.
The other packages are optional and only serve as debugging aids.
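Once a container from this image is running, it is worth confirming that gcore actually landed on the PATH before relying on it. A minimal sketch (runnable in any POSIX shell; in practice you would run it inside the Kong container):

```shell
# Check whether gcore (installed with the gdb package) is available.
# Both branches mention gcore, so the message is always informative.
msg=$(command -v gcore || echo "gcore not found: install the gdb package")
echo "$msg"
```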
Run the docker build command to build a custom image.
~ $ docker buildx build --no-cache --load -t 'kong-wp:3.5.0.1' -f wp-ubuntu-3501.Dockerfile <dir-of-the-wp-ubuntu-3501.Dockerfile>
Verify that the image was built successfully.
~ $ docker images 'kong-wp'
Once the image has been built, upload it to the registry that the k8s cluster uses. This can be achieved with docker image save/load or docker image push/pull. For example, we can dump the image to a tar file for network transmission as below, and later load it into the Docker daemon.
~ $ docker image save -o kong-wp-3501.tar kong-wp:3.5.0.1
~ $ docker image load -i kong-wp-3501.tar
In order for the Kong container to generate core dumps, we need a few adjustments to the Helm deployment YAML file kong-values.yaml (attached) prior to Kong startup.
We should use the newly built image.
# kong-values.yaml
image:
repository: kong-wp
tag: "3.5.0.1"
Kernel parameters in a containerized environment (e.g. K8s) can be categorized into safe and unsafe.
- Safe parameters refer to those that are enabled by default. Only a limited set of parameters are safe.
- Unsafe parameters refer to those that are disabled by default; they must be enabled before their values can be set. Most parameters are unsafe.
Kernel parameters in the containerized environment (e.g. K8s) can also be categorized into namespaced and non-namespaced.
- Namespaced parameters refer to those that are set on a per-pod basis (pod-level isolation). For instance, a value set for the containers of one pod does not interfere with the containers of other pods. Such parameters can also be set on a per-node basis, but for security reasons we do not do this. Namespaced parameters can be safe or unsafe. Namespaced unsafe parameters must be enabled via the kubelet option allowed-unsafe-sysctls before their values can be set. We set namespaced parameters via the securityContext.sysctls field of a pod.
- Non-namespaced parameters are intuitively unsafe (they are global and not restricted to a namespace). Therefore, they can only be set on a per-node basis (node-level parameters, via the node's OS), and they are disabled by default. We use a privileged initContainer to enable and set non-namespaced parameters.
Note that gcore depends on the ptrace() system call and is hence subject to the kernel.yama.ptrace_scope kernel parameter. This parameter happens to be non-namespaced (and hence unsafe). We must enable it and set it to 0 so that ptrace() is permitted between any two processes run by the same uid, such as the worker process and the gcore process. The default value 1 only allows ptrace() from a parent process to its descendants.
# kong-values.yaml
deployment:
initContainers:
- name: sysctl
image: busybox
securityContext:
privileged: true
command: ["sh", "-c", "sysctl -w kernel.yama.ptrace_scope=0"]
Refer to Using sysctls in a Kubernetes Cluster.
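Once the pod is running, a quick way to confirm that the init container took effect is to read the value back, either with sysctl or directly from procfs. A small sketch (the procfs path may be absent on kernels without the Yama LSM, so it falls back gracefully):

```shell
# Read the effective Yama ptrace scope: 0 permits same-uid ptrace attach,
# 1 (the usual default) restricts ptrace to direct descendants.
scope=$(cat /proc/sys/kernel/yama/ptrace_scope 2>/dev/null || echo "unavailable")
echo "kernel.yama.ptrace_scope=$scope"
```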
The core dump is a binary file capturing the process's virtual memory space. We must prepare, at a minimum, disk space of the same size as the virtual memory to hold the file.
We check the virtual memory VIRT as follows.
~ $ top -o +RES -p $(pgrep -d',' nginx)
top - 15:23:09 up 46 days, 8:36, 0 users, load average: 0.12, 0.10, 0.09
Tasks: 5 total, 0 running, 5 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7636.7 total, 2460.2 free, 1697.9 used, 3478.6 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 5609.5 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
75557 kong 20 0 4729348 417644 70784 S 0.0 5.3 0:16.11 nginx
75556 kong 20 0 4717280 403284 68736 S 0.0 5.2 0:15.28 nginx
75559 kong 20 0 4708540 394960 69120 S 0.0 5.1 0:13.74 nginx
75558 kong 20 0 4707876 393268 68224 S 0.0 5.0 0:14.03 nginx
75555 kong 20 0 2368944 87192 640 S 0.0 1.1 0:00.01 nginx
~ $ ps auxww | grep '[n]ginx\|[U]SER'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
kong 75555 0.0 1.1 2368944 87192 ? Ss 14:53 0:00 nginx: master process /usr/local/bin/nginx -p /usr/local/kong -c nginx.conf
kong 75556 0.8 5.1 4717280 403284 ? S 14:53 0:15 nginx: worker process
kong 75557 0.8 5.3 4729348 417644 ? S 14:53 0:16 nginx: worker process
kong 75558 0.7 5.0 4707876 393268 ? S 14:53 0:14 nginx: worker process
kong 75559 0.7 5.0 4708540 394960 ? S 14:53 0:13 nginx: worker process
Pay attention to the RES column of top and the RSS column of ps. They refer to the same thing, as reflected by the fact that both report the same %MEM.
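The distinction matters for sizing: gcore writes out the virtual address space (VIRT/VSZ), not just the resident pages (RES/RSS). A small illustration against the current shell process (any PID works, and VSZ is always at least as large as RSS):

```shell
# VSZ (virtual size) bounds the core dump size; RSS (resident set) is what
# actually occupies RAM. ps reports both in KiB.
pid=$$
vsz_kb=$(ps -o vsz= -p "$pid" | tr -d ' ')
rss_kb=$(ps -o rss= -p "$pid" | tr -d ' ')
echo "PID $pid: VSZ=${vsz_kb} KiB, RSS=${rss_kb} KiB"
```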
We will generate the core dump file under the kong user account and save it to the kong_prefix directory, as kong has write permission to this directory. If the worker process has several gigabytes of memory allocated, then the same sized space is needed in that directory.
We must spare enough disk space in advance. If the kong_prefix directory is a mount point, make the following adjustment to the Helm YAML.
# kong-values.yaml
deployment:
prefixDir:
sizeLimit: 4Gi
A core dump is usually a sparse file, though this is not guaranteed. The actual disk space required is probably smaller than expected, but we must provision the full size in advance.
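To see why a sparse core file can cost less disk than its apparent size, compare du with and without --apparent-size on a sparse file. A sketch using a throwaway file (truncate and GNU du assumed):

```shell
# Create a 100 MiB sparse file: the apparent size is 100 MiB, but no data
# blocks are allocated, so its on-disk usage is near zero.
tmp=$(mktemp)
truncate -s 100M "$tmp"
apparent_kb=$(du --apparent-size -k "$tmp" | cut -f1)
ondisk_kb=$(du -k "$tmp" | cut -f1)
echo "apparent=${apparent_kb} KiB, on-disk=${ondisk_kb} KiB"
rm -f "$tmp"
```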
All-in-one modification to Helm YAML.
# kong-values.yaml
deployment:
prefixDir:
sizeLimit: 4Gi
initContainers:
- name: sysctl
image: busybox
securityContext:
privileged: true
command: ["sh", "-c", "sysctl -w kernel.yama.ptrace_scope=0"]
image:
repository: kong-wp
tag: "3.5.0.1"
Usually, we wait for memory consumption to grow large enough before generating the core dump. A small memory footprint does not reveal memory issues.
~ $ kubectl exec -n kong kong-enterprise-dbless-kong-795d58d9c6-2sqhn -c proxy -it -- bash
The default prefix directory is /usr/local/kong; change it according to the Helm YAML (/kong_prefix/ in this case). This directory allows the core dump file to be written as the user kong.
~ $ cd /kong_prefix/
Ensure the value of kernel.yama.ptrace_scope is 0 in the container.
~ $ sysctl kernel.yama.ptrace_scope
Before generating the core dump file, ensure enough disk space is prepared for the core dump. We check the different partitions as follows.
~ $ df -h
Find the worker that consumes the most memory.
kong@kong-enterprise-dbless-kong-795d58d9c6-2sqhn:/kong_prefix$ ps aux | grep [n]ginx
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
kong 1 0.0 1.2 530080 98128 ? Ss 11:05 0:00 nginx: master process /usr/local/openresty/nginx/sbin/nginx -p /kong_prefix/ -c nginx.conf
kong 2383 0.2 7.0 1104492 573772 ? S 11:05 0:03 nginx: worker process
kong 2384 0.1 7.0 1102904 572560 ? S 11:05 0:02 nginx: worker process
We will use PID 2383 for this example.
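Rather than eyeballing the ps output, the heaviest worker can be picked programmatically by sorting on resident memory. A sketch with GNU procps (inside the container you could add -C nginx to restrict the listing to Nginx processes):

```shell
# List the largest processes by resident memory, heaviest first.
# --sort=-rss is a GNU procps extension; BusyBox ps does not support it.
ps -e -o pid,rss,comm --sort=-rss | head -n 6
```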
The gcore command will generate the core dump file.
kong@kong-enterprise-dbless-kong-795d58d9c6-2sqhn:/kong_prefix$ gcore -o /kong_prefix/wp-core 2383
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000ffff946d6028 in epoll_pwait () from /lib/aarch64-linux-gnu/libc.so.6
Saved corefile core.2383
[Inferior 1 (process 2383) detached]
kong@kong-enterprise-dbless-kong-795d58d9c6-2sqhn:/kong_prefix$ ls -lrta wp-core*
-rw-r--r-- 1 kong kong 565M Jun 4 11:33 wp-core.2383
The core dump file is saved to the kong_prefix directory. Let's compress it to a smaller size.
kong@kong-enterprise-dbless-kong-795d58d9c6-2sqhn:/kong_prefix$ gzip wp-core.2383
kong@kong-enterprise-dbless-kong-795d58d9c6-2sqhn:/kong_prefix$ ls -lrta wp-core*
-rw-r--r-- 1 kong kong 56M Jun 4 11:33 wp-core.2383.gz
Copy the core dump file from the container to the node. The first two commands require that tar exist in the target container.
kubectl cp -n kong kong-enterprise-dbless-kong-795d58d9c6-2sqhn:/kong_prefix/wp-core.2383.gz -c proxy /tmp/wp-core.2383.gz
# -or-
kubectl exec -n kong kong-enterprise-dbless-kong-795d58d9c6-2sqhn -c proxy -- tar cf - -C /kong_prefix wp-core.2383.gz | tar xf - -C /tmp
# -or-
kubectl exec -n kong kong-enterprise-dbless-kong-795d58d9c6-2sqhn -c proxy -- cat /kong_prefix/wp-core.2383.gz > /tmp/wp-core.2383.gz
Verify that the file format is right.
ubuntu@ip-172-31-9-194:~/misc$ file wp-core.2383
wp-core.2383: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from 'nginx: worker prnginx: worker process', real uid: 1000, effective uid: 1000, real gid: 1000, effective gid: 1000, execfn: '/usr/local/openresty/nginx/sbin/nginx', platform: 'x86_64'
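Beyond checking the file type, it is prudent to verify the transfer itself by comparing checksums inside the container (via kubectl exec) and on the node. The idea, sketched here on a local throwaway file:

```shell
# Simulate the check: checksum a file, copy it, checksum the copy, compare.
# In practice the two sha256sum runs happen in the container and on the node.
tmp=$(mktemp)
printf 'core dump payload\n' > "$tmp"
src_sum=$(sha256sum "$tmp" | cut -d' ' -f1)
cp "$tmp" "$tmp.copy"
dst_sum=$(sha256sum "$tmp.copy" | cut -d' ' -f1)
[ "$src_sum" = "$dst_sum" ] && echo "checksums match"
rm -f "$tmp" "$tmp.copy"
```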
Reference: https://gist.github.com/ashman1984/975b984ae0994781306e346f09d43af7