skoenig/_optimizing-sysctl-settings-for-gke-cluster-scalability.md

## _optimizing-sysctl-settings-for-gke-cluster-scalability.md

      
Optimizing Sysctl Settings for GKE Cluster Scalability

The ARP cache limits control how many IP-to-MAC address pairs each node keeps in its neighbour lookup table. When the table is full, new entries cannot be added, which can cause severe incidents with connection resets and timeouts. An error message like 'neighbour: arp_cache: neighbor table overflow!' in the node's kernel logs indicates that the ARP entries have hit the limit.
To solve this, raise gc_thresh3 (net.ipv4.neigh.default.gc_thresh3), the hard maximum number of entries kept in the ARP cache, above its default value of 1024 in GKE.
Unfortunately, GKE does not let you configure most system settings on the cluster nodes directly. A DaemonSet works around this limitation: it runs a privileged pod on each node, which gives you a convenient way to apply the same system settings across the cluster.
To configure system settings using a DaemonSet:

1. Create a DaemonSet definition.
2. Specify the desired system settings, following the example in the node-sysctl-config.yaml definition file.
3. Apply the DaemonSet to the GKE cluster.
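
One detail the steps above leave open is whether the DaemonSet actually reaches every node. A DaemonSet pod is not scheduled onto a node whose taints it does not tolerate, so if your node pools carry taints, a toleration along these lines may be needed. This is a minimal sketch of a fragment to merge into the DaemonSet's pod spec, not part of the original definition file:

```yaml
# Fragment of a DaemonSet manifest; merge into the existing spec.template.spec.
spec:
  template:
    spec:
      tolerations:
      # A toleration with no key and operator Exists matches every taint,
      # so the pod can also be scheduled on tainted nodes.
      - operator: Exists
```

Tolerating every taint is a broad choice; narrowing it to the specific taints used by your node pools is the more conservative option.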

While gc_thresh3 is just one setting, several others can further improve the availability of your cluster under peak load:

    # Increase the available connection range
    net.ipv4.ip_local_port_range=1024 65000

    # Improve socket reuse for closed sockets
    net.ipv4.tcp_tw_reuse=1  # GKE default: 0
    net.ipv4.tcp_fin_timeout=15  # GKE default: 60

    # The maximum number of "backlogged sockets/packets"
    net.core.somaxconn=4096  # GKE default: 1024
    net.core.netdev_max_backlog=4096  # GKE default: 1000

    # 16MB per socket - which sounds like a lot, but will virtually never consume that much.
    net.core.rmem_max=16777216  # GKE default: 212992
    net.core.wmem_max=16777216  # GKE default: 212992

    # Various network tunables
    net.ipv4.tcp_max_syn_backlog=20480  # GKE default: 1024
    net.ipv4.tcp_max_tw_buckets=400000  # GKE default: 65536
    net.ipv4.tcp_no_metrics_save=1  # GKE default: 0
    net.ipv4.tcp_syn_retries=2  # GKE default: 6
    net.ipv4.tcp_synack_retries=2  # GKE default: 5

Treat these values as a starting point: adjust them to your specific requirements and test them under load before applying them to your GKE cluster.
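
To apply some of these additional settings the same way as the ARP thresholds, the DaemonSet command can be extended. The following is a minimal sketch, assuming the same privileged, host-network approach as node-sysctl-config.yaml; the name node-sysctl-tuning is hypothetical, and the selection of settings is only an illustration:

```yaml
# Hypothetical variant of node-sysctl-config.yaml; the name node-sysctl-tuning is illustrative.
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-sysctl-tuning
spec:
  selector:
    matchLabels:
      name: node-sysctl-tuning
  template:
    metadata:
      labels:
        name: node-sysctl-tuning
    spec:
      hostNetwork: true
      containers:
      - name: node-sysctl-tuning
        image: ubuntu
        securityContext:
          privileged: true
        command:
        - /bin/sh
        - -c
        - |
          # set -e aborts at the first sysctl that fails, so the container
          # exits and the failure shows up as a restarting pod
          set -e
          # the port range takes two space-separated values, hence the quotes
          sysctl -w net.ipv4.ip_local_port_range="1024 65000"
          sysctl -w net.ipv4.tcp_tw_reuse=1
          sysctl -w net.ipv4.tcp_fin_timeout=15
          sysctl -w net.core.somaxconn=4096
          sysctl -w net.core.netdev_max_backlog=4096
          # keep the container running so the pod is not restarted
          sleep 31536000000
```

As far as I'm aware, several of these keys (for example net.core.somaxconn and net.ipv4.ip_local_port_range) are scoped to a network namespace on current kernels, so values written from a host-network pod may not carry over to pods that have their own network namespace. It is worth checking the effective value from inside a workload pod before relying on it.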

  
## node-sysctl-config.yaml
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-sysctl-config
spec:
  selector:
    matchLabels:
      name: node-sysctl-config
  template:
    metadata:
      labels:
        name: node-sysctl-config
    spec:
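      # run in the node's network namespace so the net.* sysctls apply to the node itself
      hostNetwork: true
      containers:
      - name: node-sysctl-config
        image: ubuntu
        securityContext:
          # privileged mode is required to write node-level sysctls
          privileged: true
        command:
        - /bin/sh
        - -c
        - |
          # gc_thresh1 < gc_thresh2 < gc_thresh3: minimum, soft maximum, hard maximum
          sysctl net/ipv4/neigh/default/gc_thresh1=256 \
          && sysctl net/ipv4/neigh/default/gc_thresh2=1024 \
          && sysctl net/ipv4/neigh/default/gc_thresh3=2048 \
          && sleep 31536000000
          # the sleep is basically forever; it keeps the container running so the DaemonSet pod is not restarted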