udev rule infiniband nic rename hack
# This udev rule is a hack specifically for Azure instances with RDMA support.
#
# Azure instances with RDMA support expose a NIC to the VM with kernel name eth1. This NIC
# is useless to us: it is used for some sort of bonding with the ib0 interface (but we aren't using IPoIB),
# so we don't want it in the first place. Unfortunately this means that our actual secondary NIC,
# used for the private network, ends up being eth2. Our code has some dependencies on the secondary
# NIC being named eth1, so we need to get rid of this painful, useless infiniband NIC and move eth2
# back to eth1. We "get rid" of the useless infiniband NIC by renaming it to a garbage value like
# vfib0. https://www.youtube.com/watch?v=fWweqP_ZWbg
#
# These rules incorporate two heuristics that in testing have proven reliable at identifying the
# infiniband NIC while *not* triggering on non-RDMA instances where eth1 is already the expected
# secondary NIC we are looking for.
#
# 1. The infiniband NIC appears to *always* have a Hyper-V auto-generated MAC. Hyper-V auto-generated
# MACs have the first three octets set to 00:15:5d, while the real NICs have a different MAC prefix.
# https://support.microsoft.com/en-us/help/2804678/windows-hyper-v-server-has-a-default-limit-of-256-dynamic-mac-addresse
#
# 2. The infiniband NIC appears to *always* have a different speed than real NICs. On NC24rs_v3 instances
# this speed is 5600Mb/s, on NC24rs_v2 it is 5400Mb/s. Real Azure NICs _without_ accelerated networking
# appear to have speeds of 40000Mb/s, and with accelerated networking 50000Mb/s. Thus it seems logical
# that infiniband NICs will consistently have different speeds than the actual NICs, so we
# select for this condition as well. The rules check it by comparing the "Speed" line that ethtool
# reports for eth0 against the one it reports for eth1.
#
# If *either* condition is not met, we do not perform the fix.
#
# NOTE: In testing, *either* one of the above conditions has been sufficient to distinguish between
# real NICs and infiniband NICs. However, to be extra careful that we don't break non-RDMA instances,
# we require *both* conditions to be met.
#
# --------
#
# So...what's the alternative here? One other idea is to piggyback on the /var/lib/waagent/SharedConfig.xml
# file that is populated by walinuxagent. This file includes a rdmaMacAddress property that can be used
# to reliably identify the infiniband NIC, and then you can rename it from there. The problem is that
# this file is only populated by walinuxagent _after_ the network is up (it needs to fetch the file
# from some Azure service)...and if you attempt to do the renaming after the network is up, you will hit
# the problem where the network service attempts to bring up eth1 and configure it with DHCP
# _before_ you can rename it, resulting in a boot stall. You might be able to reduce the timeout
# for "A start job is running for wait for network to be configured" to get around this, but things
# get hacky again. Good luck!
#
SUBSYSTEM=="net", ACTION=="add", KERNEL=="eth1", ATTRS{address}=="00:15:5d*", \
PROGRAM=="/bin/bash -c 'if [ \"$(ethtool eth0 | grep Speed)\" = \"$(ethtool eth1 | grep Speed)\" ]; then exit 1; else exit 0; fi'", \
NAME="vfib0"
SUBSYSTEM=="net", ACTION=="add", KERNEL=="eth2", ATTRS{address}!="00:15:5d*", \
PROGRAM=="/bin/bash -c 'if [ \"$(ethtool eth0 | grep Speed)\" = \"$(ethtool eth1 | grep Speed)\" ]; then exit 1; else exit 0; fi'", \
NAME="eth1"
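
To sanity-check the two heuristics on a given instance before installing the rule, the same tests can be run by hand. The following is a minimal sketch and not part of the rule itself: the function names are hypothetical, and it assumes the speed strings are passed in exactly as `ethtool <iface> | grep Speed` prints them.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the two checks the udev rule performs.
# They only compare strings, so they can be tried without an RDMA instance.

# Succeeds if the MAC starts with the Hyper-V auto-generated prefix 00:15:5d.
has_hyperv_mac() {
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')   # normalise hex digits to lower case
    case "$mac" in
        00:15:5d*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Succeeds if the two speed strings are different.
speed_differs() {
    [ "$1" != "$2" ]
}

# Succeeds only if BOTH heuristics hold: Hyper-V MAC and a differing speed.
# Arguments: <candidate MAC> <speed of eth0> <speed of candidate NIC>
looks_like_infiniband_nic() {
    has_hyperv_mac "$1" && speed_differs "$2" "$3"
}

# Example against live interfaces (run manually, needs ethtool installed):
#   looks_like_infiniband_nic "$(cat /sys/class/net/eth1/address)" \
#       "$(ethtool eth0 | grep Speed)" "$(ethtool eth1 | grep Speed)" \
#       && echo "eth1 looks like the infiniband NIC"
```

Requiring both functions to succeed matches the conservative choice described in the comments above: either check alone seemed sufficient in testing, but both are required so that non-RDMA instances are left alone.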