Race condition while processing security_groups_member_updated events (ipset)
We have a customer that uses heat templates to deploy large environments (e.g. 21 instances) with a significant number of security groups (e.g. 60) that use bi-directional remote group references for both ingress and egress filtering. These heat stacks are deployed using a CI pipeline and intermittently suffer from application layer failures due to broken network connectivity. We found that this was caused by the ipsets used to implement remote_group memberships missing IPs from their member lists. Troubleshooting suggests this is caused by a race condition, which I've attempted to describe in detail below.
Version: 54e1a6b1bc378c0745afc03987d0fea241b826ae (HEAD of stable/rocky as of Jan 26, 2020), though I suspect this issue persists on master.
I'm working on deploying some multi-node environments (I don't think it's possible to reproduce this with a single hypervisor) and hope to provide reproduction steps for Rocky and master soon. I wanted to submit this report as-is in the hope that an experienced Neutron dev might be able to spot possible solutions or provide diagnostic insight that I am not yet able to produce.
I suspect this report may be easier to read with some markdown, so please feel free to read it in a gist: https://gist.github.com/cfarquhar/20fddf2000a83216021bd15b512f772b
Also, this diagram is probably critical to following along: https://user-images.githubusercontent.com/1253665/87317744-0a75b180-c4ed-11ea-9bad-085019c0f954.png
Given the following security groups/rules:
| secgroup name | secgroup id | direction | remote group | dest port |
|---------------|--------------------------------------|-----------|--------------------------------------|-----------|
| server | fcd6cf12-2ac9-4704-9208-7c6cb83d1a71 | ingress | b52c8c54-b97a-477d-8b68-f4075e7595d9 | 9092 |
| client | b52c8c54-b97a-477d-8b68-f4075e7595d9 | egress | fcd6cf12-2ac9-4704-9208-7c6cb83d1a71 | 9092 |
And the following instances:
| instance name | hypervisor | ip | secgroup assignment |
|---------------|------------|-------------|---------------------|
| server01 | compute01 | 192.168.0.1 | server |
| server02 | compute02 | 192.168.0.2 | server |
| server03 | compute03 | 192.168.0.3 | server |
| client01 | compute04 | 192.168.0.4 | client |
We would expect to find the following ipset, representing the `server` security group's members, on compute04:
```
# ipset list NIPv4fcd6cf12-2ac9-4704-9208-
Name: NIPv4fcd6cf12-2ac9-4704-9208-
Type: hash:net
Revision: 6
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 536
References: 4
Number of entries: 3
Members:
192.168.0.1
192.168.0.2
192.168.0.3
```
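To confirm the broken state on an affected hypervisor, the ipset's member list can be diffed against the IPs the `server` group should contain. Here is a minimal sketch of that check; the `ipset list` output is inlined as sample text, and `SAMPLE` and `ipset_members` are names I made up for illustration (they are not part of Neutron):

```python
# Hypothetical helper: parse the "Members:" section of `ipset list` output
# and diff it against the IPs we expect the remote group to contain.

SAMPLE = """\
Name: NIPv4fcd6cf12-2ac9-4704-9208-
Type: hash:net
Number of entries: 2
Members:
192.168.0.1
192.168.0.3
"""

def ipset_members(listing):
    """Return the set of member entries from `ipset list` text output."""
    lines = listing.splitlines()
    start = lines.index("Members:") + 1
    return {line.strip() for line in lines[start:] if line.strip()}

expected = {"192.168.0.1", "192.168.0.2", "192.168.0.3"}
missing = expected - ipset_members(SAMPLE)
print(sorted(missing))  # a non-empty result means the race was hit
```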
What we actually get when the race condition is triggered is an incomplete member list: the ipset contains anywhere from zero to two of the three expected IPs.
The problem occurs when `security_group_member_updated` events arrive between `port_update` steps 12 and 22 (see the diagram and process details below). `port_update` step 12 retrieves the remote security groups' member lists, which are not necessarily complete yet. `port_update` step 22 adds the port to `IptablesFirewallDriver.ports()`.

As a result, `security_group_member_updated` step 3 looks for the port to apply the updated member list to (in `IptablesFirewallDriver.ports()`) BEFORE it has been added by `port_update` step 22. This causes the membership update event to effectively be discarded. We are then left with whatever the remote security group's member list was when the `port_update` process retrieved it at step 12. This state persists until something triggers the port being re-added to the `updated_ports` list (e.g. an agent restart, another remote group membership change, or a local security group addition/removal).
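The interleaving can be sketched in plain Python. All names here are illustrative stand-ins for the agent's state, not the actual Neutron code:

```python
# Illustrative stand-ins for the agent state involved in the race.
filtered_ports = {}        # what IptablesFirewallDriver.ports() would return
ipsets = {}                # remote group id -> member IPs
devices_to_refilter = set()

# port_update step 12: snapshot the remote group's members (still incomplete).
ipsets["server-sg"] = {"192.168.0.1"}

# security_group_member_updated arrives in the window before step 22: the
# port is not in ports() yet, so no device is queued for refiltering and
# the fresh member list is effectively discarded.
affected = [dev for dev, sgs in filtered_ports.items() if "server-sg" in sgs]
devices_to_refilter.update(affected)        # affected is empty

# port_update step 22: the port is finally registered.
filtered_ports["tap-client01"] = {"server-sg"}

# Nothing is left to trigger a refresh, so the ipset stays incomplete.
assert ipsets["server-sg"] == {"192.168.0.1"}
assert not devices_to_refilter
```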
The race condition occurs in the linuxbridge agent between the following two operations:

1. Processing a `port_update` event when an instance is first created
2. Processing `security_group_member_updated` events for the instance's remote security groups

Either operation can result in creating or mutating an ipset via `IpsetManager.set_members()`. The relevant control flow sequence for each operation is listed below. I've left out any branches that did not seem relevant to the race condition.
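For context, `IpsetManager.set_members()` converges the kernel ipset on whatever member list the caller passes in; it has no notion of "newer" or "older" data, so whichever caller runs last wins even if its member list is a stale snapshot. A hedged sketch of the diffing it amounts to (the real method shells out to `ipset add`/`ipset del`; `set_members_diff` is my own simplified name, not the Neutron API):

```python
def set_members_diff(current, target):
    """Roughly what IpsetManager.set_members() amounts to: compute the
    adds and removes needed so the ipset matches `target` exactly.
    The caller's view of the membership fully replaces the old one."""
    to_add = sorted(set(target) - set(current))
    to_del = sorted(set(current) - set(target))
    return to_add, to_del
```

For example, applying a stale one-member snapshot to an ipset that already held all three members would delete the other two, which is why a late write from an outdated snapshot is enough to break connectivity.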
`port_update` process:

1. We receive an RPC `port_update` event via `LinuxBridgeRpcCallbacks.port_update()`, which adds the tap device to the `LinuxBridgeRpcCallbacks.updated_devices` list.
2. Sleep until the next `CommonAgentLoop.daemon_loop()` iteration.
3. `CommonAgentLoop.daemon_loop()` calls `CommonAgentLoop.scan_devices()`.
4. `CommonAgentLoop.scan_devices()` calls `LinuxBridgeRpcCallbacks.get_and_clear_updated_devices()` to retrieve and clear the `updated_devices` list from step 1.
5. `CommonAgentLoop.scan_devices()` performs some calculations and returns control to `CommonAgentLoop.daemon_loop()` along with a list of added or updated devices.
6. `CommonAgentLoop.daemon_loop()` calls `CommonAgentLoop.process_network_devices()` when there are added or updated devices from step 5.
7. `CommonAgentLoop.process_network_devices()` calls `SecurityGroupAgentRpc.setup_port_filters()` with the added or updated devices from step 5.
8. `SecurityGroupAgentRpc.setup_port_filters()` checks for devices that were added to the `SecurityGroupAgentRpc.devices_to_refilter` list (which happens in step 4 of the `security_group_member_updated` process) and appends them to the list of updated devices. This is where devices from the `security_group_member_updated` process would have been processed if they weren't lost due to the race condition.
9. `SecurityGroupAgentRpc.setup_port_filters()` calls `SecurityGroupAgentRpc.prepare_devices_filter()` with the updated port IDs.
10. `SecurityGroupAgentRpc.prepare_devices_filter()` calls `SecurityGroupAgentRpc._apply_port_filter()` with the updated port IDs.
11. `SecurityGroupAgentRpc._apply_port_filter()` calls `SecurityGroupServerRpcApi.security_group_info_for_devices()` with the tap device id.
12. `SecurityGroupServerRpcApi.security_group_info_for_devices()` retrieves detailed information about the port via RPC. The important detail here is the `sg_member_ids` dict, which contains a list of member IPs for each remote security group applicable to the port.
13. `SecurityGroupServerRpcApi.security_group_info_for_devices()` returns control to `SecurityGroupAgentRpc._apply_port_filter()` along with the port and security group details.
14. `SecurityGroupAgentRpc._apply_port_filter()` calls `SecurityGroupAgentRpc._update_security_group_info()` with the remote security groups and their member IP lists.
15. `SecurityGroupAgentRpc._update_security_group_info()` iterates over the remote security groups and calls `IptablesFirewallDriver.update_security_group_members()` for each one.
16. `IptablesFirewallDriver.update_security_group_members()` calls `IptablesFirewallDriver._update_ipset_members()`.
17. `IptablesFirewallDriver._update_ipset_members()` calls `IpsetManager.set_members()`.
18. `IpsetManager.set_members()` calls a number of methods to create or mutate the ipset using Linux's `ipset` commands.
19. The stack unwinds back up to `SecurityGroupAgentRpc._apply_port_filter()` (last seen in step 11).
20. `SecurityGroupAgentRpc._apply_port_filter()` calls `IptablesFirewallDriver.prepare_port_filter()` with the port.
21. `IptablesFirewallDriver.prepare_port_filter()` calls `IptablesFirewallDriver._set_ports()` with the port details retrieved in step 12.
22. `IptablesFirewallDriver._set_ports()` adds the port to `IptablesFirewallDriver.filtered_ports`, which indicates the port is known to exist on the hypervisor. It will now be returned by `IptablesFirewallDriver.ports()`.
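The important property of the sequence above is the window between steps 12 and 22: the member snapshot is taken early, but the port only becomes visible to `ports()` at the very end. A runnable stub of just that ordering (`FirewallStub` and its attributes are simplified stand-ins for `IptablesFirewallDriver`'s bookkeeping, not the real class):

```python
class FirewallStub:
    """Simplified stand-in for IptablesFirewallDriver's bookkeeping."""
    def __init__(self):
        self.filtered_ports = {}   # populated by _set_ports() at step 22
        self.ipsets = {}           # written from the step-12 snapshot

    def ports(self):
        return self.filtered_ports

fw = FirewallStub()

# Steps 12-18: the ipset is built from the (possibly incomplete) snapshot...
fw.ipsets["server-sg"] = {"192.168.0.1"}
# ...yet the port is still invisible to ports() until step 22 runs.
assert "tap-client01" not in fw.ports()

# Step 22: only now does the port become discoverable by other handlers.
fw.filtered_ports["tap-client01"] = {"server-sg"}
assert "tap-client01" in fw.ports()
```

Any `security_group_member_updated` event handled between those two assertions sees an empty `ports()` and is lost.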
`security_group_member_updated` process:

1. We receive an RPC `security_group_member_updated` event via `SecurityGroupAgentRpc.security_groups_member_updated()`.
2. `SecurityGroupAgentRpc.security_groups_member_updated()` calls `SecurityGroupAgentRpc._security_group_updated()` with the list of security groups whose membership was updated.
3. `SecurityGroupAgentRpc._security_group_updated()` consults `IptablesFirewallDriver.ports()` to find ports affected by the membership updates. In the race condition, `port_update` has not yet reached step 22 to add the port we need to find, so we stop here and the following step does not occur.
4. `SecurityGroupAgentRpc._security_group_updated()` adds each affected port from the previous step to the `SecurityGroupAgentRpc.devices_to_refilter` list. This list will be processed the next time we reach `port_update` step 8, but in the race condition we never get there.
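For contrast, here is the same sequence without the race: step 3 finds the port, step 4 queues it, and the next pass through `port_update` step 8 refreshes the ipset from the full member list. Again, all names are illustrative stand-ins rather than the actual agent code:

```python
# Port already registered: port_update step 22 completed before the event.
filtered_ports = {"tap-client01": {"server-sg"}}
devices_to_refilter = set()
ipsets = {"server-sg": {"192.168.0.1"}}     # stale snapshot from step 12

# security_group_member_updated steps 3-4: the port is found and queued.
new_members = {"192.168.0.1", "192.168.0.2", "192.168.0.3"}
affected = [dev for dev, sgs in filtered_ports.items() if "server-sg" in sgs]
devices_to_refilter.update(affected)

# Next daemon_loop iteration (port_update step 8): the queued device is
# refiltered, converging the ipset on the full member list.
if devices_to_refilter:
    ipsets["server-sg"] = set(new_members)
    devices_to_refilter.clear()

assert ipsets["server-sg"] == new_members
```

This recovery path is exactly what the race defeats: when `affected` comes back empty, nothing is queued and the stale snapshot persists indefinitely.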