Overview

This document contains the steps to reproduce Neutron bug 1887405 using devstack on Victoria.

Devstack setup

I used two Rackspace public cloud VMs. The VMs had a private network (192.168.23.0/24) in addition to their public IPs, but that was probably unnecessary.

Common

Perform these steps on both nodes:

useradd -s /bin/bash -d /opt/stack -m stack
echo "stack ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers

su - stack
git clone https://opendev.org/openstack/devstack
cd devstack
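
Since this reproduction targets Victoria, you may want to check out the matching stable branch (the clone above defaults to master):

git checkout stable/victoria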

Node 1: Combined controller+compute node

Create local.conf in /opt/stack/devstack. Be sure to update HOST_IP with the public IP.

[[local|localrc]]
HOST_IP=<PUBLIC_IP>
LOGFILE=/opt/stack/logs/stack.sh.log
ADMIN_PASSWORD=redacted
DATABASE_PASSWORD=redacted
RABBIT_PASSWORD=redacted
SERVICE_PASSWORD=redacted

# Neutron
FIXED_RANGE=10.4.128.0/20
FLOATING_RANGE=192.168.23.128/25
Q_USE_SECGROUP=True
Q_AGENT=linuxbridge
IP_VERSION=4

# This was required to avoid errors when creating instances.  It may not be
# necessary in other environments.
[[post-config|$NOVA_CONF]]
[libvirt]
cpu_mode=host-model

Run stack.sh
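
As the stack user (devstack was cloned to /opt/stack/devstack above):

cd /opt/stack/devstack
./stack.sh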

Node 2: Additional compute node

Create local.conf in /opt/stack/devstack. Be sure to update HOST_IP with the public IP of this node and SERVICE_HOST with the public IP of node 1 (controller+compute).

local.conf:

[[local|localrc]]
HOST_IP=<PUBLIC_IP>
LOGFILE=/opt/stack/logs/stack.sh.log
ADMIN_PASSWORD=redacted
DATABASE_PASSWORD=redacted
RABBIT_PASSWORD=redacted
SERVICE_PASSWORD=redacted
DATABASE_TYPE=mysql
SERVICE_HOST=<CONTROLLER+COMPUTE_PUBLIC_IP>
MYSQL_HOST=$SERVICE_HOST
RABBIT_HOST=$SERVICE_HOST
GLANCE_HOSTPORT=$SERVICE_HOST:9292
ENABLED_SERVICES=n-cpu,q-agt,c-vol,placement-client
NOVA_VNC_ENABLED=True
NOVNCPROXY_URL="http://$SERVICE_HOST:6080/vnc_lite.html"
VNCSERVER_LISTEN=$HOST_IP
VNCSERVER_PROXYCLIENT_ADDRESS=$VNCSERVER_LISTEN

# Neutron
FIXED_RANGE=10.4.128.0/20
FLOATING_RANGE=192.168.23.128/25
Q_USE_SECGROUP=True
Q_AGENT=linuxbridge
IP_VERSION=4

Wait until node 1 has finished building and then run stack.sh.

Registering node 2 (additional compute node)

Wait until node 2 has finished building.

On node 1 (controller+compute)

/opt/stack/devstack/tools/discover_hosts.sh
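
To confirm discovery picked up the second compute node, the following should now list both hypervisors (hostnames will differ in your environment):

source ~/devstack/openrc admin
openstack compute service list --service nova-compute
openstack hypervisor list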

Create security groups and rules

# Source credentials
source ~/devstack/openrc admin

# Create secgroups
openstack security group create server
openstack security group create client

# Create server security group rules
openstack security group rule create --remote-group client  --dst-port 9092 --protocol tcp server

# Create client security group rules
openstack security group rule create --egress --remote-group server --dst-port 9092 --protocol tcp client

# Save the client secgroup's ID
CLIENT_SG_ID=$(openstack security group show client -c id -f value)
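
Optionally, sanity-check the rules and the saved ID:

echo $CLIENT_SG_ID
openstack security group rule list server
openstack security group rule list client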

Patch neutron-server and neutron-linuxbridge-agent

These patches just make it easier to trigger the race condition. The neutron-server patch slows down port creation for the "client" VM, and the neutron-linuxbridge-agent patch widens the window in which the condition can occur.

These steps should be performed on node 1 (the combined controller+compute node).

Patch neutron-server

cat << EOF > ~/plugin.py.patch
--- /opt/stack/neutron/neutron/plugins/ml2/plugin.py    2020-07-17 18:19:15.821827538 +0000
+++ /opt/stack/neutron/neutron/plugins/ml2/plugin.py-patched    2020-07-17 18:19:05.142035026 +0000
@@ -1400,6 +1400,16 @@
     @db_api.retry_if_session_inactive()
     def create_port(self, context, port):
         self._before_create_port(context, port)
+        client_sg_id = 'CLIENT_SG_ID'
+        sleeptime = 8
+        try:
+            if client_sg_id in port['port']['security_groups']:
+                import time
+                LOG.info("neutron_diag: sleeping {}s before creating port".format(sleeptime))
+                time.sleep(sleeptime)
+                LOG.info("neutron_diag: done sleeping {}s".format(sleeptime))
+        except TypeError:
+            pass
         result, mech_context = self._create_port_db(context, port)
         return self._after_create_port(context, result, mech_context)
EOF
sed -i "s/CLIENT_SG_ID/${CLIENT_SG_ID}/" ~/plugin.py.patch
patch -b -i ~/plugin.py.patch /opt/stack/neutron/neutron/plugins/ml2/plugin.py
sudo systemctl restart devstack@q-svc.service

Patch neutron-linuxbridge-agent

cat << EOF > ~/securitygroups_rpc.py.patch
--- /opt/stack/neutron/neutron/agent/securitygroups_rpc.py-orig 2020-07-16 15:33:43.055891510 +0000
+++ /opt/stack/neutron/neutron/agent/securitygroups_rpc.py      2020-07-16 15:34:24.979023281 +0000
@@ -148,6 +148,11 @@
                 devices.update(devices_info['devices'])
                 security_groups.update(devices_info['security_groups'])
                 security_group_member_ips.update(devices_info['sg_member_ips'])
+            import time
+            sleeptime = 10
+            LOG.info("neutron_diag: sleeping {}s after security_group_info_for_devices".format(sleeptime))
+            time.sleep(sleeptime)
+            LOG.info("neutron_diag: done sleeping {}s".format(sleeptime))
         else:
             devices = self.plugin_rpc.security_group_rules_for_devices(
                 self.context, list(device_ids))
EOF
patch -b -i ~/securitygroups_rpc.py.patch /opt/stack/neutron/neutron/agent/securitygroups_rpc.py
sudo systemctl restart devstack@q-agt.service
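
To confirm both patches are active, check the service logs on node 1 for the neutron_diag messages added above (the neutron-server sleep only fires when a port using the client secgroup is created, so it will not appear until the instances below exist):

sudo journalctl -u devstack@q-svc.service --since "30 minutes ago" | grep neutron_diag
sudo journalctl -u devstack@q-agt.service --since "30 minutes ago" | grep neutron_diag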

Observe race condition

Create instances

Be sure to change the hypervisor names in the --availability-zone arguments to match your environment's hostnames.

openstack server create --flavor m1.nano --image cirros-0.5.1-x86_64-disk --network private --security-group client --availability-zone nova:cfarquhar-devstack-master02 client01 > /dev/null & \
openstack server create --flavor m1.nano --image cirros-0.5.1-x86_64-disk --network private --security-group server --availability-zone nova:cfarquhar-devstack-master01 server01 > /dev/null &
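
Both creates run in the background; wait for the instances to reach ACTIVE before inspecting the firewall state:

openstack server list -c Name -c Status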

Note empty ipset

sudo ipset list
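
While the race is in effect, the set corresponding to the client secgroup on node 1 (the remote group referenced by the server secgroup's rule, typically named NIPv4 plus a truncated secgroup ID) has no members, even though the client VM has a fixed IP that should be listed there. For comparison:

openstack server show client01 -f value -c addresses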