@raggi
Last active May 10, 2024 17:11

systemd-networkd: Could not set DHCPv4 address: Connection timed out

I have reproduced this bug on systemd 245 (245.4-4ubuntu3.20). Reading the source code, the bug appears to still be present (https://github.com/systemd/systemd/blob/45a6a2aace8315137b648193a8265997b3c267fb/src/network/networkd-dhcp4.c#L781): a timeout during the netlink reconfiguration stage of a DHCPv4 lease refresh does not appear to be handled yet.
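To make the gap concrete, here is a simplified Python model of the reply handling described above. It is not a transcription of networkd-dhcp4.c: the function name `handle_address_reply` is hypothetical, and treating `-EEXIST` as success is an assumption about the surrounding code. Replies use the kernel's convention of 0 on success and a negative errno value on failure.

```python
import errno


# Hypothetical, simplified model of the netlink reply handling for an
# "add DHCPv4 address" request. Not the actual systemd code.
def handle_address_reply(err: int) -> str:
    """Classify a reply: 0 on success, a negative errno value on failure."""
    if err == 0 or err == -errno.EEXIST:
        # Success, or (assumed) the address is already present.
        return "configured"
    # Every other error, including -ETIMEDOUT, takes the same failure
    # path (link_enter_failed in the real code) with no retry.
    return "failed"


for err in (0, -errno.EEXIST, -errno.ETIMEDOUT):
    print(errno.errorcode.get(-err, "OK"), "->", handle_address_reply(err))
```

The model shows that a transient timeout is indistinguishable from a hard error on this path.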

Steps to reproduce

  1. Configure a machine to obtain a DHCPv4 lease from a DHCPv4 server on its network.
  2. Place the machine under enough load to cause netlink requests to time out.
  3. Observe the interface fail with the following logs:
       systemd-networkd[139370]: eth0: Could not set DHCPv4 address: Connection timed out
       systemd-networkd[139370]: eth0: Failed
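For step 3, a small log scanner can flag interfaces that hit this failure. This is a sketch assuming the journal lines look like the two quoted above (for example, lines read from `journalctl -u systemd-networkd`); the helper name `find_dhcp4_timeout_failures` and its regular expressions are illustrative, not part of systemd.

```python
import re
from typing import Iterable, List

# Match the two log lines quoted in step 3; `search` tolerates any
# timestamp/hostname prefix that journalctl adds in front of them.
TIMEOUT_RE = re.compile(
    r"systemd-networkd\[\d+\]: (?P<ifname>[^:\s]+): "
    r"Could not set DHCPv4 address: Connection timed out"
)
FAILED_RE = re.compile(r"systemd-networkd\[\d+\]: (?P<ifname>[^:\s]+): Failed$")


def find_dhcp4_timeout_failures(lines: Iterable[str]) -> List[str]:
    """Return interfaces that logged the DHCPv4 address timeout and then failed."""
    timed_out = set()
    failed: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        m = TIMEOUT_RE.search(line)
        if m:
            timed_out.add(m.group("ifname"))
            continue
        m = FAILED_RE.search(line)
        if m:
            ifname = m.group("ifname")
            # Only report a failure that follows a timeout on the same link.
            if ifname in timed_out and ifname not in failed:
                failed.append(ifname)
    return failed
```

A "Failed" line without a preceding timeout on the same interface is ignored, so unrelated link failures are not reported.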

This situation appears to be much easier to produce, and more common, under unusually high load in a credit-based virtualized compute environment. Other users have discussed instances of this problem on both AWS and GCP:

Many of these reports coincide with OOM events, full storage, and similar problems. Those coincidences are distracting, but they may also show that the systems in question are under load and may well be running out of compute credits. The timeout in question should not be affected by storage pressure, nor by OOM conditions; if the OOM killer terminated systemd-networkd itself, the symptoms would be different.

Expected behavior

When the failure is a timeout, the DHCPv4 client should retry the lease refresh and eventually succeed in these scenarios.
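One way to get the expected behavior is to retry the address request with capped exponential backoff when the error is a timeout. The sketch below assumes a hypothetical `send_request` callable that returns 0 or a negative errno value; the attempt count, the delays, and the choice of ETIMEDOUT as the only transient error are illustrative assumptions, not systemd settings.

```python
import errno
import time
from typing import Callable

# Errors treated as transient (assumption for illustration only).
TRANSIENT_ERRNOS = {errno.ETIMEDOUT}


def set_address_with_retry(
    send_request: Callable[[], int],
    max_attempts: int = 5,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry a hypothetical 'set DHCPv4 address' request on transient errors.

    Returns the last result: 0 on success, or a negative errno value.
    """
    delay = initial_delay
    result = send_request()
    for _ in range(max_attempts - 1):
        if result == 0 or -result not in TRANSIENT_ERRNOS:
            break  # success, or a hard error that retrying will not fix
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
        result = send_request()
    return result
```

systemd-networkd is event-driven, so a real fix would more likely schedule the retry on a timer than block in a loop; the `sleep` parameter exists only so the sketch can be exercised without waiting.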

Actual behavior

Permanent loss of connectivity on the affected interface.

Additional information

In the reproduction environment, the DHCPv4 configuration is also set to preserve addresses on other failure modes. However, the code path for the address-set timeout passes through link_enter_failed, which unconditionally passes may_keep_dhcp:false. For a potentially transient condition like this one, it is likely desirable in most cases to retain the addresses throughout the retry process.
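The suggested policy can be stated as a small decision function. This is a hypothetical sketch: `choose_may_keep_dhcp` and its `keep_configured` flag stand in for whatever networkd would actually consult, and ETIMEDOUT is assumed to be the only transient error. Under the current code path described above, the result is effectively always False.

```python
import errno

# Errors treated as transient (assumption for illustration only).
TRANSIENT_ERRNOS = {errno.ETIMEDOUT}


def choose_may_keep_dhcp(err: int, keep_configured: bool) -> bool:
    """Decide whether a failing link should keep its DHCPv4 addresses.

    `err` is a negative errno value. Transient errors keep the addresses
    so they survive the retry; otherwise the configured preference wins.
    """
    if -err in TRANSIENT_ERRNOS:
        return True
    return keep_configured
```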