@skoenig
skoenig / _optimizing-sysctl-settings-for-gke-cluster-scalability.md
Last active February 15, 2024 15:17
How to change sysctl settings on GKE Nodes: resolving Kubernetes network issues caused by ARP overflow

Optimizing Sysctl Settings for GKE Cluster Scalability

The ARP cache limits control how many IP-to-MAC address pairs each system keeps in its lookup table. When the table is full, the kernel cannot add new entries, which can lead to [severe incidents involving connection resets and timeouts][1]. An error message such as 'neighbour: arp_cache: neighbor table overflow!' in the logs indicates that the ARP entries have reached the limit.

To solve this, raise gc_thresh3 (full sysctl key: net.ipv4.neigh.default.gc_thresh3), the hard maximum number of entries kept in the ARP cache, above its default value of 1024 on GKE nodes.
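Before raising the limit, it can help to see how close a node is to it. The following is a minimal sketch, not part of the original setup: it assumes a Linux node where `/proc/net/arp` and `/proc/sys/net/ipv4/neigh/default/` are readable, and the helper names (`count_arp_entries`, `read_gc_thresholds`, `usage_ratio`) are made up for illustration.

```python
from pathlib import Path

# Standard Linux location of the IPv4 neighbour-table sysctls.
NEIGH_DIR = Path("/proc/sys/net/ipv4/neigh/default")


def count_arp_entries(arp_table_text: str) -> int:
    """Count entries in text formatted like /proc/net/arp.

    The file has one header row followed by one row per ARP entry.
    """
    rows = [line for line in arp_table_text.splitlines() if line.strip()]
    return max(len(rows) - 1, 0)  # subtract the header row


def read_gc_thresholds(neigh_dir: Path = NEIGH_DIR) -> dict:
    """Read gc_thresh1..3 as integers from the sysctl directory."""
    names = ("gc_thresh1", "gc_thresh2", "gc_thresh3")
    return {name: int((neigh_dir / name).read_text()) for name in names}


def usage_ratio(entries: int, gc_thresh3: int) -> float:
    """Fraction of the hard limit (gc_thresh3) currently in use."""
    return entries / gc_thresh3
```

On a node, `usage_ratio(count_arp_entries(Path("/proc/net/arp").read_text()), read_gc_thresholds()["gc_thresh3"])` gives a rough fill level; a value close to 1.0 suggests the overflow message is likely to appear. Note that `/proc/net/arp` lists only IPv4 entries visible in the current network namespace.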

Unfortunately, in GKE, [most of the system settings on the cluster nodes cannot be directly configured][2]. To overcome this limitation, you can [utilize DaemonSets][3]. They provide a convenient way to apply consistent system settings to all nodes in the cluster.
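As a rough illustration of the DaemonSet approach, the sketch below builds a manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f -` accepts. It is not the author's manifest: the name `sysctl-arp-tuning`, the `kube-system` namespace, the `busybox:1.36` and `registry.k8s.io/pause:3.9` images, and the value 8192 are all assumptions chosen for the example. It relies on a privileged init container running in the host network namespace to write the sysctl, then leaves a pause container running so the pod stays scheduled on every node.

```python
import json


def build_sysctl_daemonset(gc_thresh3: int = 8192,
                           name: str = "sysctl-arp-tuning") -> dict:
    """Build a DaemonSet manifest that raises gc_thresh3 on every node.

    All names, images, and the default value are illustrative assumptions.
    """
    labels = {"app": name}
    command = f"sysctl -w net.ipv4.neigh.default.gc_thresh3={gc_thresh3}"
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": "kube-system"},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Run in the node's network namespace so the write
                    # affects the node's own neighbour table.
                    "hostNetwork": True,
                    "initContainers": [{
                        "name": "apply-sysctl",
                        "image": "busybox:1.36",
                        "command": ["sh", "-c", command],
                        # Writing kernel parameters needs elevated privileges.
                        "securityContext": {"privileged": True},
                    }],
                    # Keeps the pod alive after the init container finishes.
                    "containers": [{
                        "name": "pause",
                        "image": "registry.k8s.io/pause:3.9",
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_sysctl_daemonset(), indent=2))
```

Assuming the script is saved as `gen_daemonset.py` (a hypothetical filename), it could be applied with `python3 gen_daemonset.py | kubectl apply -f -`. If gc_thresh1 and gc_thresh2 are also changed, keep them no larger than gc_thresh3.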

To configure system settings using DaemonSets:

#!/usr/bin/python3
# (c) Martin von Gagern 2014
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.