How to change sysctl settings on GKE Nodes: resolving Kubernetes network issues caused by ARP overflow

Optimizing Sysctl Settings for GKE Cluster Scalability

The ARP cache limits control how many IP-to-MAC address pairs each node keeps in its neighbor lookup table. When the table is full, the kernel cannot add new entries, which can cause severe incidents involving connection resets and timeouts. If the node's kernel log contains messages like 'neighbour: arp_cache: neighbor table overflow!', the ARP entries have exceeded the limit.
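
To confirm the symptom, it can help to count how often the overflow message appears. The following is a minimal sketch, assuming the kernel log text is piped in on standard input (for example from `dmesg` on the node); the function name `count_arp_overflows` is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Count kernel log lines that report a neighbor (ARP) table overflow.
# Reads log text from stdin and prints the number of matching lines.
# (Hypothetical helper name, chosen for illustration only.)
count_arp_overflows() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status at 0.
    grep -c 'neighbor table overflow' || true
}

# Example usage on a node (needs permission to read the kernel log):
#   dmesg | count_arp_overflows
```

A result greater than zero suggests the node has hit the hard limit at least once since the log buffer was last cleared.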

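To solve this, raise gc_thresh3, the hard maximum number of entries kept in the ARP cache, above its default value of 1024 in GKE. The related gc_thresh1 and gc_thresh2 settings (the minimum number of entries to keep and the soft maximum, respectively) are usually raised along with it.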
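
Before picking a new limit, it may be useful to see how close a node is to the current one. This sketch computes the fill level as a whole-number percentage from two inputs: the current entry count and the gc_thresh3 value. The function name `arp_usage_percent` is hypothetical, and the commands in the usage comment assume a Linux node with the `ip` tool installed; the count they give is an approximation.

```shell
#!/bin/sh
# Print how full the ARP cache is, as a whole-number percentage.
#   $1 = current number of neighbor entries
#   $2 = gc_thresh3 (the hard limit)
# (Hypothetical helper name, chosen for illustration only.)
arp_usage_percent() {
    entries=$1
    limit=$2
    # Integer arithmetic: the result is rounded down.
    echo $(( entries * 100 / limit ))
}

# Example usage on a Linux node:
#   entries=$(ip -4 neigh show | wc -l)
#   limit=$(cat /proc/sys/net/ipv4/neigh/default/gc_thresh3)
#   arp_usage_percent "$entries" "$limit"
```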

Unfortunately, in GKE, most of the system settings on the cluster nodes cannot be configured directly. To work around this limitation, you can use a DaemonSet. A DaemonSet runs one pod on every node, so a privileged pod can apply the same system settings consistently across the cluster.

To configure system settings using DaemonSets:

  1. Create a DaemonSet definition.
  2. Specify the desired system settings, using the DaemonSet manifest at the end of this document as a template.
  3. Apply the DaemonSet to the GKE cluster.
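
The last step can be sketched as a small shell function. This is only an illustration: the manifest file name `node-sysctl-config.yaml` and the function name are assumptions, the DaemonSet is assumed to live in the current namespace, and the `KUBECTL` variable is a hypothetical override that allows a dry run without a cluster.

```shell
#!/bin/sh
# Apply the DaemonSet manifest, then wait until it has rolled out.
#   $1 = path to the manifest file (assumed name: node-sysctl-config.yaml)
# KUBECTL may be overridden, e.g. KUBECTL=echo, to only print the commands.
apply_sysctl_daemonset() {
    manifest=$1
    "${KUBECTL:-kubectl}" apply -f "$manifest" &&
        "${KUBECTL:-kubectl}" rollout status daemonset/node-sysctl-config
}

# Example usage against the cluster in the current kubectl context:
#   apply_sysctl_daemonset node-sysctl-config.yaml
```

Running it with `KUBECTL=echo` prints the two kubectl invocations instead of executing them, which is a cheap way to review what would be run.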

While this is just one setting, there are many others that can further improve the availability of your cluster under peak load:

# Increase the available connection range
net.ipv4.ip_local_port_range=1024 65000

# Improve socket reuse for closed sockets
net.ipv4.tcp_tw_reuse=1  # GKE default: 0
net.ipv4.tcp_fin_timeout=15  # GKE default: 60

# The maximum number of "backlogged sockets/packets"
net.core.somaxconn=4096  # GKE default: 1024
net.core.netdev_max_backlog=4096  # GKE default: 1000

# 16 MB per socket. This sounds like a lot, but a socket will virtually never consume that much.
net.core.rmem_max=16777216  # GKE default: 212992
net.core.wmem_max=16777216  # GKE default: 212992

# Various network tunables
net.ipv4.tcp_max_syn_backlog=20480  # GKE default: 1024
net.ipv4.tcp_max_tw_buckets=400000  # GKE default: 65536
net.ipv4.tcp_no_metrics_save=1  # GKE default: 0
net.ipv4.tcp_syn_retries=2  # GKE default: 6
net.ipv4.tcp_synack_retries=2  # GKE default: 5

Feel free to modify these settings based on your specific requirements and apply them to your GKE cluster for optimal scalability.
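
One possible way to carry such a list into the DaemonSet's startup script is to convert it into `sysctl -w` commands. The sketch below assumes the settings arrive on standard input in the `key=value  # comment` form used above; it strips comments and blank lines, because the trailing comments are annotations and not part of the values. The function name `settings_to_commands` is hypothetical.

```shell
#!/bin/sh
# Read "key=value  # comment" lines on stdin and print one
# `sysctl -w` command per setting. Comments and blank lines are dropped.
# (Hypothetical helper name, chosen for illustration only.)
settings_to_commands() {
    # First expression removes "#" comments (full-line or trailing),
    # second expression deletes lines that are now empty.
    sed -e 's/[[:space:]]*#.*$//' -e '/^[[:space:]]*$/d' |
    while IFS='=' read -r key value; do
        # Quote the value so multi-word values stay a single argument.
        printf 'sysctl -w %s="%s"\n' "$key" "$value"
    done
}

# Example usage (settings.conf is an assumed file name):
#   settings_to_commands < settings.conf
```

The printed commands can then be reviewed and pasted into the container's command, chained with `&&` like the gc_thresh lines in the manifest.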

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-sysctl-config
spec:
  selector:
    matchLabels:
      name: node-sysctl-config
  template:
    metadata:
      labels:
        name: node-sysctl-config
    spec:
      hostNetwork: true
      containers:
        - name: node-sysctl-config
          image: ubuntu
          securityContext:
            privileged: true
          command:
            - /bin/sh
            - -c
            - |
              sysctl net/ipv4/neigh/default/gc_thresh1=256 \
              && sysctl net/ipv4/neigh/default/gc_thresh2=1024 \
              && sysctl net/ipv4/neigh/default/gc_thresh3=2048 \
              && sleep 31536000000
              # the sleep is basically forever, to prevent DaemonSet termination