Hi! If you've ever faced this unholy setup, you may have run into weird issues that prevent access to the eGPU's display and compute capability. Here is how I solved them (roughly 5-8 hours of debugging in total, spread across several days).
$ sudo apt-get update
$ sudo apt-get dist-upgrade
Search for 'Thunderbolt' in the settings menu and enable Thunderbolt access for the eGPU. This step is easy to overlook.
Check whether you can see the eGPU:
$ lspci | grep -i "nvidia"
$ lsmod | grep -i "nvidia"
If lspci does not list the eGPU, this is an issue with the hardware or cables. (lsmod only shows whether the Nvidia kernel module is loaded, so it may print nothing until the driver is installed.)
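Here is a rough sketch of how the lspci check could be scripted. It assumes, as in my case, that the laptop's internal GPU is also an Nvidia card, so a working setup shows at least two Nvidia display devices. The function names are my own and purely illustrative.

```shell
# Keep only the NVIDIA display devices from `lspci -nn`-style input on stdin.
# Audio functions of the card (e.g. "Audio device") are filtered out.
list_nvidia_gpus() {
    grep -i 'nvidia' | grep -Ei 'VGA compatible controller|3D controller'
}

# Succeed when at least two NVIDIA GPUs are listed on stdin
# (assumption: internal Nvidia GPU plus the eGPU).
egpu_visible() (
    count=$(list_nvidia_gpus | wc -l | tr -d ' ')
    [ "$count" -ge 2 ]
)

# Usage on a real machine:
#   lspci -nn | list_nvidia_gpus
#   lspci -nn | egpu_visible && echo "eGPU enumerated" || echo "eGPU not found"
```

If your internal GPU is not an Nvidia card, lower the threshold in egpu_visible to 1.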
Nouveau is the open-source driver for Nvidia cards, with limited functionality. It will interfere with the proprietary Nvidia GPU drivers (and CUDA). Check whether it is loaded:
$ lsmod | grep nouveau
If it is, blacklist it and rebuild the initramfs:
$ cat <<EOF | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF
$ sudo update-initramfs -u
Reference: https://docs.nvidia.com/ai-enterprise/deployment-guide-vmware/0.1.0/nouveau.html
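After a reboot you may want to confirm that the blacklist actually took effect. The sketch below is one way to do that. The helper names are hypothetical, and the default path is simply the file written in the step above.

```shell
# Succeed only if the modprobe config file contains both nouveau lines.
# The path argument is optional and defaults to the file created above.
nouveau_blacklisted() (
    conf=${1:-/etc/modprobe.d/blacklist-nouveau.conf}
    grep -qx 'blacklist nouveau' "$conf" 2>/dev/null &&
        grep -qx 'options nouveau modeset=0' "$conf" 2>/dev/null
)

# Succeed if `lsmod`-style input on stdin shows the nouveau module loaded.
nouveau_loaded() {
    grep -q '^nouveau[[:space:]]'
}

# Usage on a real machine:
#   nouveau_blacklisted && echo "blacklist file looks right"
#   lsmod | nouveau_loaded && echo "nouveau is still loaded, reboot or recheck"
```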
Once the package lists are updated, open the 'Software & Updates' tool from the menu and install the Nvidia driver for your laptop's graphics card from the 'Additional Drivers' tab. Don't worry: the same driver will work for the eGPU as well, as long as both GPUs are Nvidia GPUs and the older GPU does not hold you back to a driver version that lacks support for the newer one. Alternatively, you can install the driver from the Nvidia website. I used nvidia-driver-535 for my 1070.
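If the driver package did not already install nvidia-smi, you can install it yourself. On Ubuntu it ships in the nvidia-utils package that matches your driver version (nvidia-utils-535 in my case):
$ sudo apt-get install nvidia-utils-535
$ nvidia-smi
This will tell you whether the driver is able to access the eGPU.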
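If you would rather not read the full nvidia-smi table, its CSV query mode can be filtered. Below is a minimal sketch. The helper name is made up, and "1070" is only my card, so substitute part of your own eGPU's name.

```shell
# Print the PCI bus ID of every GPU whose name contains the given text.
# Expects the output of
#   nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv,noheader
# on stdin, i.e. lines of the form "index, name, bus_id".
gpu_bus_id_by_name() {
    awk -F', ' -v pat="$1" 'index(tolower($2), tolower(pat)) > 0 { print $3 }'
}

# Usage on a real machine:
#   nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv,noheader \
#       | gpu_bus_id_by_name 1070
```

An empty result means the driver does not currently see a GPU with that name.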
You may now install nvidia-settings by using the following command:
$ sudo apt-get install nvidia-settings
nvidia-settings lets us tweak our hardware and gives us useful information about which devices are actively being used.
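Follow the instructions in this GitHub repository to install egpu-switcher: https://github.com/hertg/egpu-switcher
You can run either of these commands to grab the vendor ID of the eGPU (the [vendor:device] pair in square brackets; the first column is the PCI bus address):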
$ lspci -nn
or
$ lspci -vnn | grep VGA -A 12
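One thing that tripped me up when reading this output: lspci prints the bus address in hexadecimal, while an Xorg BusID entry uses decimal numbers. egpu-switcher should take care of this for you, so the sketch below is only for checking things by hand. The function name is hypothetical.

```shell
# Convert an lspci bus address (hex, e.g. "0b:00.0" or "0000:0b:00.0")
# into the decimal "PCI:bus:device:function" form used in xorg.conf.
lspci_to_xorg_busid() (
    addr=$1
    # Drop an optional 4-digit PCI domain prefix such as "0000:".
    case $addr in
        ????:??:??.?) addr=${addr#?????} ;;
    esac
    # Reject anything that is not of the form bb:dd.f
    case $addr in
        [0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F].[0-7]) ;;
        *) echo "unrecognized PCI address: $1" >&2; exit 1 ;;
    esac
    bus=${addr%%:*}
    rest=${addr#*:}
    dev=${rest%%.*}
    fn=${rest#*.}
    # printf reads the 0x-prefixed values as hex and prints them as decimal.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "$fn"
)

# Usage:
#   lspci_to_xorg_busid 0b:00.0    # prints PCI:11:0:0
```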
The UEFI BIOS on my laptop is accessed through the F12 key. From the Security tab, the Thunderbolt security option can be set to:
- Unrestricted
- User Authorized
- Secure Boot
- Display Port Only
Pick Unrestricted to enable unrestricted GPU direct access in the UEFI.
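Once you are back in Linux, the kernel exposes the active Thunderbolt security level in sysfs, which is a handy way to confirm the UEFI setting was applied. Here is a minimal sketch. The level names (none, user, secure, dponly) come from the kernel's Thunderbolt documentation, but how they map onto my BIOS's labels is my own guess, and domain0 is assumed to be the only Thunderbolt domain.

```shell
# Print the Thunderbolt security level reported by the kernel.
# The path argument is optional; domain0 is assumed by default.
tb_security_level() (
    file=${1:-/sys/bus/thunderbolt/devices/domain0/security}
    [ -r "$file" ] || { echo "unknown"; exit 1; }
    cat "$file"
)

# Translate a kernel security level into a short description.
# The BIOS labels in parentheses are assumptions, not an official mapping.
describe_tb_security() {
    case $1 in
        none)   echo "no authorization needed (probably 'Unrestricted')" ;;
        user)   echo "user must authorize devices (probably 'User Authorized')" ;;
        secure) echo "authorization with a stored key (probably 'Secure Boot')" ;;
        dponly) echo "DisplayPort tunneling only (probably 'Display Port Only')" ;;
        *)      echo "unrecognized level: $1" ;;
    esac
}

# Usage on a real machine:
#   describe_tb_security "$(tb_security_level)"
```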
To me, this seems to be a communication issue between the kernel-mode driver (KMD) and the UEFI. This means that:
- If UEFI Thunderbolt access is unrestricted and you still see this issue, the KMD settings are not configured correctly.
- If the Xorg (KMD) settings are configured correctly using egpu-switcher and you still see this issue, the UEFI is not permitting unrestricted access.
If you are stuck in this situation, the safest option is to pick Display Port Only in the UEFI menu and undo the last setting you configured.
I hope this saved you a few hours of your precious time. If you have any questions or want me to repro, the best way to reach me is through LinkedIn or Twitter (@tanmayyb).