Gist by @cfarquhar (last active July 15, 2020)
Summary

Race condition while processing security_groups_member_updated events (ipset)

Overview

We have a customer who uses Heat templates to deploy large environments (e.g. 21 instances) with a significant number of security groups (e.g. 60) that use bi-directional remote group references for both ingress and egress filtering. These Heat stacks are deployed from a CI pipeline and intermittently suffer application-layer failures due to broken network connectivity. We found that this was caused by the ipsets used to implement remote_group memberships missing IPs from their member lists. Troubleshooting suggests this is caused by a race condition, which I've attempted to describe in detail below.

Version: 54e1a6b1bc378c0745afc03987d0fea241b826ae (HEAD of stable/rocky as of Jan 26, 2020), though I suspect this issue persists through master.

I'm working on getting some multi-node environments deployed (I don't think it's possible to reproduce this with a single hypervisor) and hope to provide reproduction steps on Rocky and master soon. I wanted to submit this report as-is in the hope that an experienced Neutron dev might spot possible solutions or suggest diagnostics that I am not yet able to produce.

I suspect this report may be easier to read with some markdown, so please feel free to read it in a gist: https://gist.github.com/cfarquhar/20fddf2000a83216021bd15b512f772b

Also, this diagram is probably critical to following along: https://user-images.githubusercontent.com/1253665/87317744-0a75b180-c4ed-11ea-9bad-085019c0f954.png

Race condition symptoms

Given the following security groups/rules:

| secgroup name | secgroup id                          | direction | remote group                         | dest port |
|---------------|--------------------------------------|-----------|--------------------------------------|-----------|
| server        | fcd6cf12-2ac9-4704-9208-7c6cb83d1a71 | ingress   | b52c8c54-b97a-477d-8b68-f4075e7595d9 | 9092      |
| client        | b52c8c54-b97a-477d-8b68-f4075e7595d9 | egress    | fcd6cf12-2ac9-4704-9208-7c6cb83d1a71 | 9092      |

And the following instances:

| instance name | hypervisor | ip          | secgroup assignment |
|---------------|------------|-------------|---------------------|
| server01      | compute01  | 192.168.0.1 | server              |
| server02      | compute02  | 192.168.0.2 | server              |
| server03      | compute03  | 192.168.0.3 | server              |
| client01      | compute04  | 192.168.0.4 | client              |

We would expect to find the following ipset representing the server security group members on compute04:

# ipset list NIPv4fcd6cf12-2ac9-4704-9208-
Name: NIPv4fcd6cf12-2ac9-4704-9208-
Type: hash:net
Revision: 6
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 536
References: 4
Number of entries: 3
Members:
192.168.0.1
192.168.0.2
192.168.0.3

What we actually get when the race condition is triggered is an incomplete member list in the ipset, containing anywhere from zero to two of the three expected IPs.
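The broken state is easy to check for mechanically: diff the ipset's member list against the IPs of the ports in the remote security group. A minimal sketch (the helper name is mine, not Neutron code):

```python
# Hypothetical helper (not part of Neutron): report which expected member
# IPs are absent from an ipset's member list.
def missing_members(expected_ips, actual_members):
    """Return the expected IPs that are missing, in sorted order."""
    return sorted(set(expected_ips) - set(actual_members))

expected = ["192.168.0.1", "192.168.0.2", "192.168.0.3"]
# A racy ipset might hold only a subset of the members:
actual = ["192.168.0.1"]
print(missing_members(expected, actual))  # → ['192.168.0.2', '192.168.0.3']
```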

Triggering the race condition

The problem occurs when security_group_member_updated events arrive between port_update steps 12 and 22 (see diagram and process details below).

  • port_update step 12 retrieves the remote security groups' member lists, which are not necessarily complete yet.
  • port_update step 22 adds the port to IptablesFirewallDriver.ports().

As a result, security_group_member_updated step 3 looks for the port to apply the updated member list to (in IptablesFirewallDriver.ports()) BEFORE port_update step 22 has added it, so the membership update event is effectively discarded. We are then left with whatever the remote security group's member list was when the port_update process retrieved it at step 12. This state persists until something re-adds the port to the updated_ports list (e.g. an agent restart, another remote group membership change, or a local security group addition/removal).

Race condition details

The race condition occurs in the linuxbridge agent between the two following operations:

  1. Processing a port_update event when an instance is first created
  2. Processing security_group_member_updated events for the instance's remote security groups.

Either of these operations can result in creating or mutating an ipset from IpsetManager.set_members(). The relevant control flow sequence for each operation is listed below. I've left out any branches that did not seem to be relevant to the race condition.

Processing a port_update event:

  1. We receive an RPC port_update event via LinuxBridgeRpcCallbacks.port_update(), which adds the tap device to the LinuxBridgeRpcCallbacks.updated_devices list
  2. Sleep until the next CommonAgentLoop.daemon_loop() iteration
  3. CommonAgentLoop.daemon_loop() calls CommonAgentLoop.scan_devices()
  4. CommonAgentLoop.scan_devices() calls LinuxBridgeRpcCallbacks.get_and_clear_updated_devices() to retrieve and clear the updated_devices list from step 1.
  5. CommonAgentLoop.scan_devices() performs some calculations and returns control to CommonAgentLoop.daemon_loop() along with a list of added or updated devices.
  6. CommonAgentLoop.daemon_loop() calls CommonAgentLoop.process_network_devices() when there are added or updated devices from step 5.
  7. CommonAgentLoop.process_network_devices() calls SecurityGroupAgentRpc.setup_port_filters() with the added or updated devices from step 5.
  8. SecurityGroupAgentRpc.setup_port_filters() checks for devices that were added to the SecurityGroupAgentRpc.devices_to_refilter list (which happens in step 4 of the security_group_member_updated process) and appends them to the list of updated devices. This is where devices from the security_group_member_updated process would have been processed if they weren't lost due to the race condition.
  9. SecurityGroupAgentRpc.setup_port_filters() calls SecurityGroupAgentRpc.prepare_devices_filter() with the updated port IDs
  10. SecurityGroupAgentRpc.prepare_devices_filter() calls SecurityGroupAgentRpc._apply_port_filter() with the updated port IDs
  11. SecurityGroupAgentRpc._apply_port_filter() calls SecurityGroupServerRpcApi.security_group_info_for_devices() with the tap device id
  12. SecurityGroupServerRpcApi.security_group_info_for_devices() retrieves detailed information about the port via RPC. The important detail here is the sg_member_ids dict, which contains a list of member IPs for each remote security group applicable to the port.
  13. SecurityGroupServerRpcApi.security_group_info_for_devices() returns control to SecurityGroupAgentRpc._apply_port_filter() along with the port and security group details.
  14. SecurityGroupAgentRpc._apply_port_filter() calls SecurityGroupAgentRpc._update_security_group_info() with the remote security groups and their member IP list.
  15. SecurityGroupAgentRpc._update_security_group_info() iterates over the remote security groups and calls IptablesFirewallDriver.update_security_group_members() for each one
  16. IptablesFirewallDriver.update_security_group_members() calls IptablesFirewallDriver._update_ipset_members()
  17. IptablesFirewallDriver._update_ipset_members() calls IpsetManager.set_members()
  18. IpsetManager.set_members() calls a number of methods to create or mutate the ipset using linux's ipset commands.
  19. The stack unwinds back up to SecurityGroupAgentRpc._apply_port_filter() (last seen in step 11)
  20. SecurityGroupAgentRpc._apply_port_filter() calls IptablesFirewallDriver.prepare_port_filter() with the port
  21. IptablesFirewallDriver.prepare_port_filter() calls IptablesFirewallDriver._set_ports() with the port details retrieved in step 12
  22. IptablesFirewallDriver._set_ports() adds the port to IptablesFirewallDriver.filtered_ports, which indicates the port is known to exist on the hypervisor. It will now be returned by IptablesFirewallDriver.ports().
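Condensed into code, the critical ordering in steps 12-22 looks roughly like the following. This is a hedged sketch with stand-in classes (FakeServerRpc and FakeFirewall are mine, not Neutron's); only the ordering is meant to be faithful: the member snapshot fetched at step 12 is written into the ipsets (step 18) before the port becomes visible via ports() (step 22).

```python
# Illustrative stand-ins (not Neutron classes) for the server-side RPC API
# and the firewall driver; only the ordering of operations is faithful.
class FakeServerRpc:
    def __init__(self, snapshot):
        self.snapshot = snapshot

    def security_group_info_for_devices(self, device_ids):
        # Step 12: returns whatever the server knows *right now*; the
        # remote groups' member lists may still be incomplete.
        return self.snapshot


class FakeFirewall:
    def __init__(self):
        self.ipsets = {}           # sg_id -> member IPs
        self.filtered_ports = {}   # ports() is backed by this dict

    def set_members(self, sg_id, members):
        # Steps 16-18: create/mutate the ipset from the snapshot.
        self.ipsets[sg_id] = sorted(members)


def apply_port_filter(server_rpc, firewall, port_id):
    info = server_rpc.security_group_info_for_devices([port_id])  # step 12
    for sg_id, members in info["sg_member_ips"].items():          # steps 14-17
        firewall.set_members(sg_id, members)                      # step 18
    # Step 22: only now does the port appear in filtered_ports/ports().
    firewall.filtered_ports[port_id] = info["ports"][port_id]
```

Any member update delivered between the snapshot at step 12 and the final assignment at step 22 targets a port that ports() cannot yet see.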

Processing a security_group_member_updated event:

  1. We receive an RPC security_group_member_updated event via SecurityGroupAgentRpc.security_groups_member_updated()
  2. SecurityGroupAgentRpc.security_groups_member_updated() calls SecurityGroupAgentRpc._security_group_updated() with the list of security groups whose membership was updated.
  3. SecurityGroupAgentRpc._security_group_updated() consults IptablesFirewallDriver.ports() to find ports affected by the membership updates. In the race condition, port_update has not yet reached step 22 to add the port we need to find, so we stop here and the following step does not occur.
  4. SecurityGroupAgentRpc._security_group_updated() adds each affected port from the previous step to the SecurityGroupAgentRpc.devices_to_refilter list. This list will be processed the next time we reach port_update step 8, but in the race condition we never get here.
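Putting the two flows together, here is a toy model of the lost update (all names are simplified stand-ins, not the actual Neutron classes; the guard in steps 3-4 is the essential part):

```python
# Toy model of the race. The member-update handler only schedules a
# refilter for ports it can already see in ports(); an update arriving
# before port_update step 22 registers the port is silently dropped.
class FakeDriver:
    def __init__(self):
        self.filtered_ports = {}   # populated at port_update step 22

    def ports(self):
        return self.filtered_ports


class FakeAgent:
    def __init__(self, driver):
        self.driver = driver
        self.devices_to_refilter = set()

    def security_groups_member_updated(self, port_ids_using_group):
        # Steps 3-4: only ports already visible in ports() get queued.
        for port_id in port_ids_using_group:
            if port_id in self.driver.ports():
                self.devices_to_refilter.add(port_id)


driver = FakeDriver()
agent = FakeAgent(driver)

# Member update arrives inside the window (before port_update step 22):
agent.security_groups_member_updated(["client01-port"])
assert agent.devices_to_refilter == set()   # update silently discarded

# port_update step 22 then registers the port, but nothing replays the
# earlier update, so the stale snapshot from step 12 persists:
driver.filtered_ports["client01-port"] = {"device": "tap1234"}
```

Had the update arrived after the registration on the last line, the port would have landed in devices_to_refilter and been picked up at port_update step 8 on the next daemon_loop iteration.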