@abagshaw
Created August 21, 2020 17:18
udev rule infiniband nic rename hack
# This udev rule is a hack specifically for Azure instances with RDMA support
#
# Azure instances with RDMA support will expose a NIC to the VM with kernel name eth1. This NIC
# is useless to us: it is used for some sort of bonding with the ib0 interface (but we aren't using IPoIB)
# so we don't want it in the first place. Unfortunately this means that our actual secondary NIC
# used for the private network will end up being eth2. Our code has some dependencies on the secondary
# NIC being named eth1 so we need to get rid of this painful useless infiniband NIC and move eth2
# back to eth1. We "get rid" of the useless infiniband NIC by renaming it to a garbage value like
# vfib0. https://www.youtube.com/watch?v=fWweqP_ZWbg
#
# These rules incorporate two heuristics that in testing have proven reliable to identify the
# infiniband NIC while *not* triggering on non-RDMA instances where eth1 is already the expected
# secondary NIC we are looking for.
#
# 1. The infiniband NIC appears to *always* have a Hyper-V auto-generated MAC. Hyper-V auto-generated
# MACs have the first three octets set to 00:15:5d, while the real NICs will have a different MAC prefix.
# https://support.microsoft.com/en-us/help/2804678/windows-hyper-v-server-has-a-default-limit-of-256-dynamic-mac-addresse
#
# 2. The infiniband NIC appears to *always* have a different speed than real NICs. On NC24rs_v3 instances
# this speed is 5600Mb/s; on NC24rs_v2 it is 5400Mb/s. Real Azure NICs _without_ accelerated networking
# appear to have speeds of 40000Mb/s and with accelerated networking 50000Mb/s. Thus it seems logical
# that infiniband NICs will consistently have different speeds than the actual NICs and so we
# select for this condition as well.
#
# If *either* the first or the second condition is not met, we do not perform the fix.
#
# NOTE: In testing, *either* one of the above conditions has been sufficient to distinguish between
# real NICs and infiniband NICs; however, to be extra careful that we don't break non-RDMA instances
# we require *both* conditions to be met.
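#
# To check both heuristics by hand on a given instance, something like the following should work.
# This is only a sketch and not part of the rules; it assumes ethtool is installed and that the
# suspect interface is currently named eth1:
#
#   cat /sys/class/net/eth1/address     # Hyper-V auto-generated MACs start with 00:15:5d
#   ethtool eth0 | grep Speed           # speed reported by the primary NIC
#   ethtool eth1 | grep Speed           # expected to differ from eth0 on RDMA instances
#   udevadm test /sys/class/net/eth1    # simulates the event and prints the rules that match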
#
# --------
#
# So...what's the alternative here? One other idea is to piggyback on the /var/lib/waagent/SharedConfig.xml
# file that is populated by walinuxagent. This file includes a rdmaMacAddress property that can be used
# to reliably identify the infiniband NIC, and then you can rename it from there. The problem is that
# this file is only populated by the walinuxagent _after_ the network is up (it needs to fetch the file
# from some Azure service)...and if you attempt to do the renaming after the network is up you will hit
# the problem where the network service will attempt to bring up eth1 and configure it with DHCP
# _before_ you can rename it, resulting in a boot stall. You might be able to reduce the timeout
# for "A start job is running for wait for network to be configured" to get around this, but things
# get hacky again. Good luck!
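#
# For reference, a rough sketch of pulling that MAC out of the file. It assumes rdmaMacAddress is
# stored as an XML attribute (rdmaMacAddress="..."), which is an assumption; the value may also be
# formatted differently (case, separators) from what /sys/class/net/*/address reports:
#
#   sed -n 's/.*rdmaMacAddress="\([^"]*\)".*/\1/p' /var/lib/waagent/SharedConfig.xml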
#
# Rule 1: eth1 with a Hyper-V auto-generated MAC and a Speed line that differs from eth0's is
# treated as the infiniband NIC and renamed to vfib0. The PROGRAM check exits 0 (match) only when
# the two Speed lines differ.
SUBSYSTEM=="net", ACTION=="add", KERNEL=="eth1", ATTRS{address}=="00:15:5d*", \
PROGRAM=="/bin/bash -c 'if [ \"$(ethtool eth0 | grep Speed)\" = \"$(ethtool eth1 | grep Speed)\" ]; then exit 1; else exit 0; fi'", \
NAME="vfib0"
# Rule 2: eth2 without a Hyper-V auto-generated MAC is treated as the real secondary NIC and renamed
# to eth1, again only when the eth0 and eth1 Speed lines differ. The check also passes if eth1 has
# already been renamed, since ethtool then prints no Speed line for it.
SUBSYSTEM=="net", ACTION=="add", KERNEL=="eth2", ATTRS{address}!="00:15:5d*", \
PROGRAM=="/bin/bash -c 'if [ \"$(ethtool eth0 | grep Speed)\" = \"$(ethtool eth1 | grep Speed)\" ]; then exit 1; else exit 0; fi'", \
NAME="eth1"