@outsinre
Last active June 3, 2025 02:59
Generate core dump within K8s Environment

Process Trace

When debugging a tracee process (e.g. an Nginx worker), a tracer process (e.g. gcore or gdb) must attach to it.

This post explains how to build a custom Kong image and generate core dumps of Nginx workers with gcore.

Procedures

1. Create Dockerfile

Create a Dockerfile and save it as wp-ubuntu-3501.Dockerfile as follows.

# wp-ubuntu-3501.Dockerfile

FROM kong/kong-gateway:3.5.0.1

SHELL ["/bin/bash", "-o", "pipefail", "-c"]
USER root

LABEL com.example.description="Kong Gateway Development" \
      com.example.department="Team Sustaining" \
      com.example.builder="Zachary Hu (zachary.hu@konghq.com)"

RUN <<-EOF
set -ex
apt-get update
apt-get install -y \
  gdb \
  jq \
  ncat \
  iproute2 \
  less
EOF

USER kong

We add the essential package gdb, which provides the gdb and gcore commands. The gcore command is the one that generates core dumps.

The other packages are optional and only serve as debugging aids.

2. Build Image

Run the docker build command to build a custom image.

~ $ docker buildx build --no-cache --load -t 'kong-wp:3.5.0.1' -f wp-ubuntu-3501.Dockerfile <dir-of-the-wp-ubuntu-3501.Dockerfile>

Verify that the image was built successfully.

~ $ docker images 'kong-wp'
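
We can also sanity-check that gdb and gcore are present in the image. This is an optional sketch; the tag kong-wp:3.5.0.1 and the sh entrypoint override are assumptions based on the build above.

```shell
# Confirm gdb and gcore are on PATH inside the custom image.
docker run --rm --entrypoint sh kong-wp:3.5.0.1 -c 'command -v gdb gcore'
```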

Once the image has been built, it must be uploaded to the registry that the K8s cluster pulls from. This can be achieved with docker image save/load and/or docker image push/pull. For example, we can dump the image to a tar file for network transfer as below, and later load it into the Docker daemon.

~ $ docker image save -o kong-wp-3501.tar kong-wp:3.5.0.1

~ $ docker image load -i kong-wp-3501.tar

3. Update Helm Deployment

In order for the Kong container to generate core dumps, we need a few adjustments to the Helm deployment YAML file kong-values.yaml (attached) prior to Kong startup.

3.1 Reference the Custom Image

We should use the newly built image.

# kong-values.yaml

image:
  repository: kong-wp
  tag: "3.5.0.1"

3.2 Update kernel.yama.ptrace_scope

Kernel parameters in the containerized environment (e.g. K8s) can be categorized into safe and unsafe.

  • Safe parameters refer to those that are enabled by default.

    Only a limited set of parameters are safe.

  • Unsafe parameters refer to those that are disabled by default. Unsafe parameters must be enabled before their values can be set.

    Most parameters are unsafe.

Kernel parameters in the containerized environment (e.g. K8s) can also be categorized into namespaced and non-namespaced.

  • Namespaced parameters refer to those that are set on a per-pod basis – pod-level isolation. For instance, a value set for containers of one pod does not interfere with containers of other pods. Such parameters can also be set on a per-node basis, but for security reasons we do not do this.

    Namespaced parameters can be safe or unsafe. Namespaced unsafe parameters must be enabled via the kubelet option allowed-unsafe-sysctls before their values can be set.

    We set namespaced parameters via the securityContext.sysctls of a pod.

  • Non-namespaced parameters are intuitively unsafe (global and not restricted to a namespace). Therefore, they can only be set on a per-node basis – node-level parameters (via the node's OS). They are hence disabled by default.

    We utilize a privileged initContainer to enable and set non-namespaced parameters.

Note that gcore depends on the system call ptrace() and is hence subject to the kernel.yama.ptrace_scope kernel parameter. This parameter happens to be non-namespaced (and hence unsafe). We must set it to 0, which allows ptrace() as long as the worker process and the gcore process are run by the same uid. The default value 1 only allows ptrace() between parent and child processes.

# kong-values.yaml

deployment:
  initContainers:
    - name: sysctl
      image: busybox
      securityContext:
        privileged: true
      command: ["sh", "-c", "sysctl -w kernel.yama.ptrace_scope=0"]

Refer to Using sysctls in a Kubernetes Cluster.

3.3 Update Disk Space

The core dump is a binary file representing the memory footprint of the process's virtual memory space. We must prepare, at a minimum, disk space of the same size as the virtual memory to hold the file.

We check the virtual memory VIRT as follows.

~ $ top -o +RES -p $(pgrep -d',' nginx)

top - 15:23:09 up 46 days,  8:36,  0 users,  load average: 0.12, 0.10, 0.09
Tasks:   5 total,   0 running,   5 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   7636.7 total,   2460.2 free,   1697.9 used,   3478.6 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   5609.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  75557 kong      20   0 4729348 417644  70784 S   0.0   5.3   0:16.11 nginx
  75556 kong      20   0 4717280 403284  68736 S   0.0   5.2   0:15.28 nginx
  75559 kong      20   0 4708540 394960  69120 S   0.0   5.1   0:13.74 nginx
  75558 kong      20   0 4707876 393268  68224 S   0.0   5.0   0:14.03 nginx
  75555 kong      20   0 2368944  87192    640 S   0.0   1.1   0:00.01 nginx
     
~ $ ps auxww | grep '[n]ginx\|[U]SER'

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
kong       75555  0.0  1.1 2368944 87192 ?       Ss   14:53   0:00 nginx: master process /usr/local/bin/nginx -p /usr/local/kong -c nginx.conf
kong       75556  0.8  5.1 4717280 403284 ?      S    14:53   0:15 nginx: worker process
kong       75557  0.8  5.3 4729348 417644 ?      S    14:53   0:16 nginx: worker process
kong       75558  0.7  5.0 4707876 393268 ?      S    14:53   0:14 nginx: worker process
kong       75559  0.7  5.0 4708540 394960 ?      S    14:53   0:13 nginx: worker process

Pay attention to the RES column of top and the RSS column of ps. They refer to the same thing, as reflected by the fact that both report the same %MEM.

We will generate the core dump file under the kong user account and save it to the kong_prefix directory, as kong has write permission to this directory. If the worker process has several gigabytes of memory allocated, the same amount of space is needed in that directory.
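
As a rough sizing aid, the largest VSZ among the Nginx processes gives an upper bound for one core dump. This is a sketch assuming procps ps is available in the container.

```shell
# Largest Nginx VSZ (reported in KiB) -> rough upper bound for one core dump, in GiB.
ps -o vsz= -C nginx | sort -rn | head -1 \
  | awk '{ printf "%.1f GiB\n", $1/1024/1024 }'
```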

We must spare enough disk space in advance. If the kong_prefix directory is a mount point, make the following adjustment to the Helm YAML.

# kong-values.yaml

deployment:
  prefixDir:
    sizeLimit: 4Gi

The core dump is usually a sparse file, though this is not guaranteed. The actual disk size required is probably smaller than expected, but we must still provision in advance.
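
Sparseness can be checked by comparing the apparent size with the blocks actually allocated. A sketch assuming GNU coreutils du; a sparse core shows a smaller second number.

```shell
# Apparent size vs allocated blocks of the dump file.
du -h --apparent-size wp-core.2383
du -h wp-core.2383
```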

3.4 Helm Deployment Overview

All-in-one modification to Helm YAML.

# kong-values.yaml

deployment:
  prefixDir:
    sizeLimit: 4Gi
  initContainers:
    - name: sysctl
      image: busybox
      securityContext:
        privileged: true
      command: ["sh", "-c", "sysctl -w kernel.yama.ptrace_scope=0"]

image:
  repository: kong-wp
  tag: "3.5.0.1"

4. Generate Core Dump

Usually, we wait for the memory consumption to grow large enough before generating the core dump. A small memory footprint does not reveal memory issues.

4.1 Exec Into Kong Container

~ $ kubectl exec -n kong kong-enterprise-dbless-kong-795d58d9c6-2sqhn -c proxy -it -- bash

4.2 Change to kong_prefix Directory

The default prefix directory is /usr/local/kong; change it according to the Helm YAML (/kong_prefix/ in this case). Writing the core dump file there works because the user kong has write permission.

~ $ cd /kong_prefix/

4.3 Verify kernel.yama.ptrace_scope

Ensure its value is 0 in the container.

~ $ sysctl kernel.yama.ptrace_scope
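
If the sysctl binary is not available in the container, the same value can be read from procfs; this is an equivalent check, not an extra step.

```shell
# Reads 0 once the privileged initContainer has run on this node.
cat /proc/sys/kernel/yama/ptrace_scope
```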

4.4 Check Disk Space

Before generating the core dump file, ensure enough disk space is prepared for the core dump. We check different partitions as follows.

~ $ df -h

4.5 Identify the Worker PID

Find the worker that consumes the largest memory.

kong@kong-enterprise-dbless-kong-795d58d9c6-2sqhn:/kong_prefix$ ps aux | grep [n]ginx
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
kong           1  0.0  1.2 530080 98128 ?        Ss   11:05   0:00 nginx: master process /usr/local/openresty/nginx/sbin/nginx -p /kong_prefix/ -c nginx.conf
kong        2383  0.2  7.0 1104492 573772 ?      S    11:05   0:03 nginx: worker process
kong        2384  0.1  7.0 1102904 572560 ?      S    11:05   0:02 nginx: worker process

We will use 2383 for this example.
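
The highest-RSS worker can also be picked automatically. A sketch assuming procps ps; sort and awk do the selection.

```shell
# Print the PID of the Nginx process with the largest RSS.
ps -o pid=,rss= -C nginx | sort -k2 -rn | head -1 | awk '{ print $1 }'
```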

4.6 Run gcore Command

The gcore command will generate the core dump file.

kong@kong-enterprise-dbless-kong-795d58d9c6-2sqhn:/kong_prefix$ gcore -o /kong_prefix/wp-core 2383

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000ffff946d6028 in epoll_pwait () from /lib/aarch64-linux-gnu/libc.so.6
Saved corefile core.2383
[Inferior 1 (process 2383) detached]

kong@kong-enterprise-dbless-kong-795d58d9c6-2sqhn:/kong_prefix$ ls -lrta wp-core*
-rw-r--r-- 1 kong kong 565M Jun  4 11:33 wp-core.2383

4.7 Copy Core Dump File

The core dump file is saved to the kong_prefix directory. Let's compress it to a smaller size.

kong@kong-enterprise-dbless-kong-795d58d9c6-2sqhn:/kong_prefix$ gzip wp-core.2383

kong@kong-enterprise-dbless-kong-795d58d9c6-2sqhn:/kong_prefix$ ls -lrta wp-core*
-rw-r--r-- 1 kong kong 56M Jun  4 11:33 wp-core.2383.gz

Copy the core dump file from the container to the node. The first two commands require that tar exists in the target container.

kubectl cp -n kong kong-enterprise-dbless-kong-795d58d9c6-2sqhn:/kong_prefix/wp-core.2383.gz -c proxy /tmp/wp-core.2383.gz

# -or-
kubectl exec -n kong kong-enterprise-dbless-kong-795d58d9c6-2sqhn -c proxy -- tar cf - /kong_prefix/wp-core.2383.gz | tar xf - -C /tmp

# -or-
kubectl exec -n kong kong-enterprise-dbless-kong-795d58d9c6-2sqhn -c proxy -- cat /kong_prefix/wp-core.2383.gz > /tmp/wp-core.2383.gz

After decompressing the copy (gunzip wp-core.2383.gz), verify that the file format is right.

ubuntu@ip-172-31-9-194:~/misc$ file wp-core.2383
wp-core.2383: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from 'nginx: worker prnginx: worker process', real uid: 1000, effective uid: 1000, real gid: 1000, effective gid: 1000, execfn: '/usr/local/openresty/nginx/sbin/nginx', platform: 'x86_64'
Appendix: kong-values.yaml

The attached kong-values.yaml, as reported by Helm:

USER-SUPPLIED VALUES:
admin:
  annotations: {}
  enabled: false
  http:
    containerPort: 8001
    enabled: false
    parameters: []
    servicePort: 8001
  ingress:
    annotations: {}
    enabled: false
    hostname: null
    ingressClassName: null
    path: /
    pathType: ImplementationSpecific
  labels: {}
  loadBalancerClass: null
  tls:
    client:
      caBundle: ""
      secretName: ""
    containerPort: 8444
    enabled: true
    parameters: []
    servicePort: 8444
  type: NodePort
autoscaling:
  behavior: {}
  enabled: false
  maxReplicas: 24
  metrics:
  - resource:
      name: cpu
      target:
        averageUtilization: 80
        type: Utilization
    type: Resource
  minReplicas: 6
  targetCPUUtilizationPercentage: null
certificates:
  admin:
    clusterIssuer: ""
    commonName: kong.example
    dnsNames: []
    enabled: false
    issuer: ""
  cluster:
    clusterIssuer: ""
    commonName: kong_clustering
    dnsNames: []
    enabled: false
    issuer: ""
  clusterIssuer: ""
  enabled: false
  issuer: ""
  portal:
    clusterIssuer: ""
    commonName: developer.example
    dnsNames: []
    enabled: false
    issuer: ""
  proxy:
    clusterIssuer: ""
    commonName: app.example
    dnsNames: []
    enabled: false
    issuer: ""
cluster:
  annotations: null
  enabled: false
  ingress:
    annotations: {}
    enabled: false
    hostname: null
    ingressClassName: null
    path: /
    pathType: ImplementationSpecific
  labels: {}
  loadBalancerClass: null
  tls:
    containerPort: 8005
    enabled: false
    parameters: []
    servicePort: 8005
  type: ClusterIP
clusterCaSecretName: ""
clustertelemetry:
  annotations: {}
  enabled: false
  ingress:
    annotations: {}
    enabled: false
    hostname: null
    ingressClassName: null
    path: /
    pathType: ImplementationSpecific
  labels: {}
  loadBalancerClass: null
  tls:
    containerPort: 8006
    enabled: false
    parameters: []
    servicePort: 8006
  type: ClusterIP
containerSecurityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - NET_RAW
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 1000
  seccompProfile:
    type: RuntimeDefault
dblessConfig:
  config: |
    _format_version: "1.1"
    services:
      # Example configuration
      # - name: example.com
      #   url: http://example.com
      #   routes:
      #   - name: example
      #     paths:
      #     - "/example"
  configMap: ""
  secret: ""
deployment:
  daemonset: false
  hostNetwork: false
  hostname: ""
  kong:
    enabled: true
  prefixDir:
    sizeLimit: 5Gi
  serviceAccount:
    automountServiceAccountToken: false
    create: true
    name: kong-kong-proxy
  test:
    enabled: false
  tmpDir:
    sizeLimit: 1Gi
  userDefinedVolumeMounts:
  - mountPath: /etc/ssl/private
    name: lbcert
    readOnly: true
  userDefinedVolumes:
  - name: lbcert
    secret:
      secretName: lbcert
deploymentAnnotations: {}
enterprise:
  enabled: true
  license_secret: kong-enterprise-license
  portal:
    enabled: false
  rbac:
    admin_gui_auth: basic-auth
    admin_gui_auth_conf_secret: CHANGEME-admin-gui-auth-conf-secret
    enabled: false
    session_conf_secret: kong-session-config
  smtp:
    admin_emails_from: none@example.com
    admin_emails_reply_to: none@example.com
    auth:
      smtp_password_secret: CHANGEME-smtp-password
      smtp_username: ""
    enabled: false
    portal_emails_from: none@example.com
    portal_emails_reply_to: none@example.com
    smtp_admin_emails: none@example.com
    smtp_auth_type: ""
    smtp_host: smtp.example.com
    smtp_port: 587
    smtp_ssl: nil
    smtp_starttls: true
  vitals:
    enabled: true
env:
  admin_access_log: "off"
  admin_error_log: /dev/stderr
  admin_gui_access_log: "off"
  admin_gui_error_log: /dev/stderr
  anonymous_reports: "off"
  database: "off"
  dns_valid_ttl: "180"
  headers: "off"
  lmdb_map_size: 4096m
  log_level: notice
  mem_cache_size: 1024m
  nginx_admin_client_body_buffer_size: 250m
  nginx_admin_client_max_body_size: "0"
  nginx_http_client_header_buffer_size: 32k
  nginx_http_large_client_header_buffers: 8 64k
  nginx_http_lua_regex_cache_max_entries: "32768"
  nginx_http_proxy_buffer_size: 32k
  nginx_http_proxy_buffers: 32 8k
  nginx_http_proxy_busy_buffers_size: 64k
  nginx_main_worker_rlimit_nofile: "300000"
  nginx_worker_processes: "4"
  portal_api_access_log: "off"
  portal_api_error_log: /dev/stderr
  prefix: /kong_prefix/
  proxy_access_log: "off"
  proxy_error_log: /dev/stderr
  router_flavor: traditional
  ssl_cert: /etc/ssl/private/lbcert
  ssl_cert_key: /etc/ssl/private/lbcert
extraConfigMaps: []
extraLabels:
  consul.hashicorp.com/service-ignore: "true"
extraObjects: []
extraSecrets: []
image:
  effectiveSemver: null
  pullPolicy: IfNotPresent
  repository: docker-hub.artifactory.srv.westpac.com.au/kong/kong-gateway
  tag: 3.5.0.1
ingressController:
  adminApi:
    tls:
      client:
        caSecretName: ""
        certProvided: false
        enabled: false
        secretName: ""
  admissionWebhook:
    certificate:
      provided: false
    enabled: true
    failurePolicy: Fail
    namespaceSelector: {}
    port: 8080
    service:
      labels: {}
    timeoutSeconds: 30
  args: []
  enabled: true
  env:
    dump_config: true
    enable_reverse_sync: "true"
    kong_admin_tls_skip_verify: true
    proxy_sync_seconds: "180"
    proxy_timeout_seconds: "2400"
    update_status: "false"
  gatewayDiscovery:
    adminApiService:
      name: ""
      namespace: ""
    enabled: false
    generateAdminApiService: false
  image:
    effectiveSemver: null
    repository: docker-hub.artifactory.srv.westpac.com.au/kong/kubernetes-ingress-controller
    tag: 3.0.1
  ingressClass: kong
  ingressClassAnnotations: {}
  konnect:
    apiHostname: us.kic.api.konghq.com
    enabled: false
    license:
      enabled: false
    runtimeGroupID: ""
    tlsClientCertSecretName: konnect-client-tls
  livenessProbe:
    failureThreshold: 3
    httpGet:
      path: /healthz
      port: 10254
      scheme: HTTP
    initialDelaySeconds: 5
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 5
  rbac:
    create: true
  readinessProbe:
    failureThreshold: 3
    httpGet:
      path: /readyz
      port: 10254
      scheme: HTTP
    initialDelaySeconds: 5
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 5
  resources:
    limits:
      cpu: 2000m
      memory: 8Gi
    requests:
      cpu: 2000m
      memory: 8Gi
  watchNamespaces: []
lifecycle:
  preStop:
    exec:
      command:
      - kong
      - quit
      - --wait=100
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /status
    port: status
    scheme: HTTP
  initialDelaySeconds: 5
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
manager:
  annotations: {}
  enabled: false
  http:
    containerPort: 8002
    enabled: false
    parameters: []
    servicePort: 8002
  ingress:
    annotations: {}
    enabled: false
    hostname: null
    ingressClassName: null
    path: /
    pathType: ImplementationSpecific
  labels: {}
  loadBalancerClass: null
  tls:
    containerPort: 8445
    enabled: false
    parameters:
    - http2
    servicePort: 8445
  type: NodePort
migrations:
  annotations:
    sidecar.istio.io/inject: false
  backoffLimit: null
  jobAnnotations: {}
  postUpgrade: true
  preUpgrade: true
  resources: {}
nodeSelector: {}
plugins: {}
podAnnotations:
  consul.hashicorp.com/connect-inject: "true"
  consul.hashicorp.com/proxy-config-map: '{ "xds_fetch_timeout_ms": 300000 }'
  consul.hashicorp.com/sidecar-proxy-cpu-limit: 4000m
  consul.hashicorp.com/sidecar-proxy-cpu-request: 2000m
  consul.hashicorp.com/sidecar-proxy-memory-limit: 2048Mi
  consul.hashicorp.com/sidecar-proxy-memory-request: 2048Mi
  consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: 8443,8080
  consul.hashicorp.com/transparent-proxy-exclude-outbound-cidrs: 10.41.248.10,10.102.248.12,10.100.192.235/32,10.104.192.239/32,172.16.0.1/32,10.38.74.14/32,10.39.74.14/32,10.38.74.44/32,10.39.74.44/32,10.100.192.57/32,10.104.192.57/32,10.100.192.157/32,10.104.192.157/32,10.117.36.0/22
  consul.hashicorp.com/transparent-proxy-exclude-outbound-ports: 443,9443
  kuma.io/gateway: enabled
podDisruptionBudget:
  enabled: true
  maxUnavailable: 49%
podLabels: {}
podSecurityPolicy:
  annotations: {}
  enabled: false
  labels: {}
  spec:
    allowPrivilegeEscalation: false
    fsGroup:
      rule: RunAsAny
    hostIPC: false
    hostNetwork: false
    hostPID: false
    privileged: false
    readOnlyRootFilesystem: true
    runAsGroup:
      rule: RunAsAny
    runAsUser:
      rule: RunAsAny
    seLinux:
      rule: RunAsAny
    supplementalGroups:
      rule: RunAsAny
    volumes:
    - configMap
    - secret
    - emptyDir
    - projected
portal:
  annotations: {}
  enabled: false
  http:
    containerPort: 8003
    enabled: false
    parameters: []
    servicePort: 8003
  ingress:
    annotations: {}
    enabled: false
    hostname: null
    ingressClassName: null
    path: /
    pathType: ImplementationSpecific
  labels: {}
  loadBalancerClass: null
  tls:
    containerPort: 8446
    enabled: false
    parameters:
    - http2
    servicePort: 8446
  type: NodePort
portalapi:
  annotations: {}
  enabled: false
  http:
    containerPort: 8004
    enabled: false
    parameters: []
    servicePort: 8004
  ingress:
    annotations: {}
    enabled: false
    hostname: null
    ingressClassName: null
    path: /
    pathType: ImplementationSpecific
  labels: {}
  loadBalancerClass: null
  tls:
    containerPort: 8447
    enabled: false
    parameters:
    - http2
    servicePort: 8447
  type: NodePort
postgresql:
  auth:
    database: kong
    username: kong
  enabled: false
  image:
    tag: 13.11.0-debian-11-r20
  service:
    ports:
      postgresql: "5432"
priorityClassName: ""
proxy:
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthcheck
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    service.beta.kubernetes.io/azure-load-balancer-internal-subnet: t2-a008e0-prd01-nrg01-vnet01-snet10-aks
  enabled: true
  http:
    containerPort: 8000
    enabled: false
    parameters: []
    servicePort: 80
  ingress:
    annotations: {}
    enabled: false
    hostname: null
    hosts: []
    ingressClassName: null
    labels: {}
    path: /
    pathType: ImplementationSpecific
  labels:
    consul.hashicorp.com/service-ignore: "false"
    enable-metrics: "true"
  loadBalancerClass: null
  loadBalancerIP: 10.120.11.116
  nameOverride: ""
  stream: []
  tls:
    containerPort: 8443
    enabled: true
    parameters:
    - backlog=1024
    servicePort: 443
  type: LoadBalancer
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /status/ready
    port: status
    scheme: HTTP
  initialDelaySeconds: 300
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
replicaCount: 12
resources:
  limits:
    cpu: 4000m
    memory: 6Gi
  requests:
    cpu: 4000m
    memory: 6Gi
secretVolumes: []
securityContext: {}
serviceMonitor:
  enabled: false
status:
  enabled: true
  http:
    containerPort: 8100
    enabled: true
    parameters: []
  tls:
    containerPort: 8543
    enabled: false
    parameters: []
terminationGracePeriodSeconds: 120
tolerations: []
topologySpreadConstraints:
- labelSelector:
    matchLabels:
      app.kubernetes.io/instance: kong
  maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
- labelSelector:
    matchLabels:
      app.kubernetes.io/instance: kong
  maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: ScheduleAnyway
udpProxy:
  annotations: {}
  enabled: false
  labels: {}
  loadBalancerClass: null
  stream: []
  type: LoadBalancer
updateStrategy:
  rollingUpdate:
    maxSurge: 49%
    maxUnavailable: 1
  type: RollingUpdate
waitImage:
  enabled: true
  pullPolicy: IfNotPresent