Minor revision for manual NVIDIA Driver installation

How to install Nvidia driver alongside real-time linux kernel

This method used these other methods as references:

(1) https://github.com/cacao-org/cacao/wiki/OS-install-RTC

(2) https://qiita.com/cielavenir/items/c4afeb7af8f2ba510a59

(3) https://www.if-not-true-then-false.com/2021/debian-ubuntu-linux-mint-nvidia-guide/

(4) https://gist.github.com/pantor/9786c41c03a97bca7a52aa0a72fa9387

The RT kernel tested here is 5.15.148-rt74 (Feb 2024) on Ubuntu 20.04. Except for CUDA, I aim for an installation that stays close to a "stock up-to-date" Ubuntu 20.04.

The caveat of this installation is that VMware and VirtualBox will not work. According to (2), forcing their installation causes a kernel panic.

Install CUDA

Run the following to install CUDA 12.1. I chose 12.1 because it was the latest CUDA version supported by PyTorch at the time of writing (PyTorch is part of my use case).

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-ubuntu2004-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit
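
After the toolkit is installed, nvcc may still not be found in a new shell because its directory is not on PATH. The following is a minimal sketch, assuming the toolkit is reachable under /usr/local/cuda (check this on your machine). The helper name path_prepend is hypothetical and not part of CUDA.

```shell
# Prepend a directory to PATH only if it is not already listed.
# path_prepend is a hypothetical helper name, not part of CUDA.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, do nothing
        *) PATH="$1:$PATH" ;;
    esac
}

# Assumption: the toolkit is installed under /usr/local/cuda.
path_prepend /usr/local/cuda/bin
export PATH
```

Adding these lines to ~/.bashrc should make the setting persistent for interactive shells.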

Install NVIDIA Driver

There are 2 options available for installing the NVIDIA driver.

Option 1: apt-get

The latest driver for Ubuntu 20.04 is 535 at the time of writing this (Mar 2024).

sudo apt-get install nvidia-driver-535 nvidia-dkms-535

Edit dkms.conf for RT Kernel

Referring to (2), we need to edit dkms.conf so that the NVIDIA driver builds and runs on the RT kernel. The same link also explains the incompatibility with VMware (and, in my personal experience, VirtualBox).

# Change to Nvidia driver source directory /usr/src/nvidia-535.161.07
cd "$(dpkg -L nvidia-kernel-source-535 | grep -m 1 "nvidia-drm" | xargs dirname)"

Open dkms.conf with elevated privileges (for example, sudo nano dkms.conf). Find the line starting with MAKE[0]=; on that line you should find:

env NV_VERBOSE=1

Change it to:

env IGNORE_PREEMPT_RT_PRESENCE=1 NV_VERBOSE=1
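
The same edit can be scripted instead of done by hand. This is a sketch assuming GNU sed and a MAKE[0]= line that contains env NV_VERBOSE=1 as described above. The function name patch_dkms_conf is hypothetical.

```shell
# Add IGNORE_PREEMPT_RT_PRESENCE=1 to the MAKE[0] line of a dkms.conf.
# patch_dkms_conf is a hypothetical helper; pass the path to dkms.conf.
patch_dkms_conf() {
    conf="$1"
    # Only touch the MAKE[0]= line, and skip it if the flag is already there,
    # so running the function twice does not duplicate the variable.
    sed -i '/^MAKE\[0\]=/{
        /IGNORE_PREEMPT_RT_PRESENCE=1/!s/env NV_VERBOSE=1/env IGNORE_PREEMPT_RT_PRESENCE=1 NV_VERBOSE=1/
    }' "$conf"
}
```

Because dkms.conf is owned by root, call the function from a root shell, for example patch_dkms_conf dkms.conf inside the driver source directory. Keeping a backup copy of the original file first is a sensible precaution.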

Option 2: Manual Install

Referring to (3) and (4), this is how to install the NVIDIA driver manually:

Install dependencies:

sudo apt-get update -y
sudo apt-get install -y libglvnd-dev
# 1. Download NVIDIA driver as a .run file

# 2. Stop X-Server
sudo service lightdm stop # If using LightDM
sudo service gdm3 stop # If using GDM3

# 3. Blacklist Nouveau driver
sudo nano /etc/modprobe.d/blacklist-nouveau.conf

# Insert into file:
#  blacklist nouveau
#  options nouveau modeset=0

# 4. Update kernel initramfs
sudo update-initramfs -u
sudo reboot  # Needed if Nouveau is currently loaded; after rebooting, repeat step 2

# 5. Install driver!
chmod +x NVIDIA-Linux-*.run
sudo IGNORE_PREEMPT_RT_PRESENCE=1 bash NVIDIA-Linux-*.run  # Insert downloaded .run file

# 6. Reboot
sudo reboot
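
Step 3 can also be done without an editor, and before step 5 it can help to confirm that Nouveau is no longer loaded. This is a sketch with two hypothetical helper names, write_nouveau_blacklist and module_loaded. The second one assumes the usual /proc/modules layout, where each line starts with the module name followed by a space.

```shell
# Write the two Nouveau blacklist lines from step 3 to the given file.
# write_nouveau_blacklist is a hypothetical helper name.
write_nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$1"
}

# Return success if the named module is listed in a /proc/modules-style file.
# The second argument is optional and defaults to /proc/modules.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}
```

Writing the real file needs root, for example from a root shell: write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf. After the reboot in step 4, module_loaded nouveau is expected to fail (return non-zero).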

If the NVIDIA driver installer creates modprobe files to disable the Nouveau driver and you later want to re-enable Nouveau, you will need to delete these files:

# These may or may not be created by NVIDIA installer
/usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf

# This is what you created when installing NVIDIA driver from .run file above
/etc/modprobe.d/blacklist-nouveau.conf
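
To see which of these files actually exist on a given system before deleting anything, a small check like the following can be used. It is only a sketch: list_nouveau_blockers is a hypothetical name, and the optional argument is a root prefix that exists purely so the function can be tried against a scratch directory.

```shell
# Print which of the Nouveau-disabling files exist under a root directory.
# list_nouveau_blockers is a hypothetical helper; the optional argument is a
# root prefix (empty by default, meaning the real filesystem).
list_nouveau_blockers() {
    root="${1:-}"
    for f in \
        /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf \
        /etc/modprobe.d/nvidia-installer-disable-nouveau.conf \
        /etc/modprobe.d/blacklist-nouveau.conf
    do
        if [ -e "$root$f" ]; then
            printf '%s\n' "$root$f"
        fi
    done
}
```

After deleting the listed files, you will likely need to rerun sudo update-initramfs -u and reboot, mirroring step 4 above.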

Run nvidia-smi to check the driver, and nvcc --version to check CUDA.

Install real-time linux kernel

Follow the community-contributed tutorial to install the latest stable PREEMPT_RT version: https://docs.ros.org/en/foxy/Tutorials/Building-Realtime-rt_preempt-kernel-for-ROS-2.html

Open Software & Updates. In the Ubuntu Software menu, tick the ‘Source code’ box. This enables the source repositories that apt-get build-dep needs.

Install dependencies, create a working directory, and download the kernel source and RT patch:

sudo apt-get update
sudo apt-get build-dep linux
sudo apt-get install build-essential bc curl ca-certificates gnupg2 lsb-release libncurses-dev flex bison openssl libssl-dev dkms libelf-dev libudev-dev libpci-dev libiberty-dev autoconf fakeroot
sudo apt update
sudo apt install zstd
cd ~
mkdir rt_kernel
cd rt_kernel

wget https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/linux-5.15.148.tar.gz
wget https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/linux-5.15.148.tar.sign
wget https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.15/patch-5.15.148-rt74.patch.gz
wget https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.15/patch-5.15.148-rt74.patch.sign

gunzip linux-5.15.148.tar.gz
gunzip patch-5.15.148-rt74.patch.gz
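
The four download URLs above follow a pattern, so they can be derived from a single version string when trying a different release. This is a sketch that only reproduces the URL layout used in this guide. The variable names are my own, and other patch releases may live in a different directory on the server, so check each URL before relying on it.

```shell
# Derive kernel and patch URLs from one RT version string.
RT_VERSION="5.15.148-rt74"            # the version tested in this guide

KERNEL_VERSION="${RT_VERSION%%-rt*}"  # strip the -rtNN suffix  -> 5.15.148
KERNEL_SERIES="${KERNEL_VERSION%.*}"  # drop the last component -> 5.15
KERNEL_MAJOR="${KERNEL_VERSION%%.*}"  # keep the first component -> 5

KERNEL_URL="https://mirrors.edge.kernel.org/pub/linux/kernel/v${KERNEL_MAJOR}.x/linux-${KERNEL_VERSION}.tar.gz"
PATCH_URL="https://cdn.kernel.org/pub/linux/kernel/projects/rt/${KERNEL_SERIES}/patch-${RT_VERSION}.patch.gz"

# The signature files use the same names with .gz replaced by .sign.
KERNEL_SIGN_URL="${KERNEL_URL%.gz}.sign"
PATCH_SIGN_URL="${PATCH_URL%.gz}.sign"

printf '%s\n' "$KERNEL_URL" "$KERNEL_SIGN_URL" "$PATCH_URL" "$PATCH_SIGN_URL"
```

Each printed URL can then be passed to wget as in the commands above.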

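Verify the integrity of the kernel files (if gpg2 reports a missing public key, import the signer's key first), then extract the source and apply the patch: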

gpg2 --verify linux-*.tar.sign
gpg2 --verify patch-*.patch.sign
tar xf linux-*.tar

cd linux-*/
patch -p1 < ../patch-*.patch

cp /boot/config-$(uname -r) .config

yes '' | make oldconfig
make menuconfig

Apply these modifications:

  • general setup / preemption model
    • Fully preemptible Kernel (Real-time)
  • general setup / Timers subsystem
    • High Resolution Timer Support
  • general setup / Timers subsystem / Timer tick handling
    • Full dynticks system (tickless)
  • Processor type and features / Timer frequency
    • 1000 Hz
  • Power management and ACPI options
    • CPU Frequency scaling > CPU Frequency scaling (CPU_FREQ [=y]) > Default CPUFreq governor > (X) performance
  • Cryptographic API* > Certificates for signature checking (at the very bottom of the list) > Provide system-wide ring of trusted keys > Additional X.509 keys for default system keyring
    • Remove the “debian/canonical-certs.pem” from the prompt and press Ok.
  • Cryptographic API* > Certificates for signature checking (at the very bottom of the list) > Provide system-wide ring of trusted keys > X.509 certificates to be preloaded into system blacklist keyring
    • Remove the “debian/canonical-certs-revoked.pem” from the prompt and press Ok.

Save and exit.
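
Before starting the long build, it can be worth checking that the saved .config really contains the choices listed above. This is a sketch: check_rt_config is a hypothetical helper, and the CONFIG_ symbol names are my mapping of the menu entries for a 5.15 kernel, so treat them as assumptions and adjust them if your kernel version names them differently.

```shell
# Report which of the expected options are missing from a kernel .config.
# check_rt_config is a hypothetical helper; it returns non-zero if any
# expected line is absent. The symbol names are assumptions for 5.15.
check_rt_config() {
    cfg="${1:-.config}"
    missing=0
    for opt in \
        'CONFIG_PREEMPT_RT=y' \
        'CONFIG_HIGH_RES_TIMERS=y' \
        'CONFIG_NO_HZ_FULL=y' \
        'CONFIG_HZ_1000=y' \
        'CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y' \
        'CONFIG_SYSTEM_TRUSTED_KEYS=""' \
        'CONFIG_SYSTEM_REVOCATION_KEYS=""'
    do
        # -x matches the whole line, -F treats the option as a fixed string.
        if ! grep -qxF -- "$opt" "$cfg"; then
            printf 'missing: %s\n' "$opt"
            missing=1
        fi
    done
    return "$missing"
}
```

Run check_rt_config from inside the kernel source directory. If it prints nothing, every expected line is present.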

sudo make -j $(nproc)
sudo make bzImage
sudo make INSTALL_MOD_STRIP=1 modules_install -j $(nproc)
sudo make install

According to the post by cacao:

the INSTALL_MOD_STRIP is important - it shrinks the initial ramdisk size by ~90%. In some cases, when not used, the initial ramdisk would not load and the patched kernel would not boot.

If you see these errors:

  • CONFIG_X86_X32 enabled but no binutils support

    • Open .config in a text editor and disable CONFIG_X86_X32 (set it to n)

    OR

    • Run make menuconfig > Binary Emulations and make sure that x32 ABI for 64-bit mode is unchecked
  • sed: can't read modules.order: No such file or directory

    • set CONFIG_SYSTEM_TRUSTED_KEYS=""
    • set CONFIG_SYSTEM_REVOCATION_KEYS=""
  • Missing file: arch/x86/boot/bzImage

    • Run sudo make bzImage before modules_install
  • /bin/sh: 1: zstd: not found

    • Install Zstandard with sudo apt install zstd
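
The .config fixes for the first two errors can be applied in one pass. This is a sketch assuming GNU sed, a 5.15 .config, and the plural spellings CONFIG_SYSTEM_TRUSTED_KEYS and CONFIG_SYSTEM_REVOCATION_KEYS that the kernel's .config uses. The function name fix_build_config is hypothetical.

```shell
# Clear the certificate key settings and disable the x32 ABI in a .config.
# fix_build_config is a hypothetical helper; it edits the file in place.
fix_build_config() {
    cfg="${1:-.config}"
    sed -i \
        -e 's/^CONFIG_SYSTEM_TRUSTED_KEYS=.*/CONFIG_SYSTEM_TRUSTED_KEYS=""/' \
        -e 's/^CONFIG_SYSTEM_REVOCATION_KEYS=.*/CONFIG_SYSTEM_REVOCATION_KEYS=""/' \
        -e 's/^CONFIG_X86_X32=y$/# CONFIG_X86_X32 is not set/' \
        "$cfg"
}
```

Run it inside the kernel source directory, then rerun yes '' | make oldconfig as above so that dependent options are refreshed before rebuilding.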

Make sure realtime kernel is set to default

Edit /etc/default/grub: change GRUB_DEFAULT=0 to GRUB_DEFAULT=saved and add GRUB_SAVEDEFAULT=true. Then run sudo update-grub to apply the change.
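
The GRUB edit can be scripted as well. This is a sketch assuming GNU sed and an existing GRUB_DEFAULT= line in the file. The function name set_grub_saved_default is hypothetical.

```shell
# Switch a grub defaults file to "remember the last booted entry".
# set_grub_saved_default is a hypothetical helper; pass the file to edit.
set_grub_saved_default() {
    grub_file="$1"
    # Replace the existing GRUB_DEFAULT line with GRUB_DEFAULT=saved.
    sed -i 's/^GRUB_DEFAULT=.*/GRUB_DEFAULT=saved/' "$grub_file"
    # Append GRUB_SAVEDEFAULT=true unless the file already sets it.
    if ! grep -q '^GRUB_SAVEDEFAULT=' "$grub_file"; then
        printf '%s\n' 'GRUB_SAVEDEFAULT=true' >> "$grub_file"
    fi
}
```

A cautious way to use it is to run it on a copy of /etc/default/grub, inspect the result, copy it back with sudo, and then regenerate the GRUB configuration with sudo update-grub.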

Allow a user to set real-time permissions for its processes

According to https://frankaemika.github.io/docs/installation_linux.html#setting-up-the-real-time-kernel:

sudo addgroup realtime
sudo usermod -a -G realtime $(whoami)

Afterwards, add the following limits to the realtime group in /etc/security/limits.conf:

@realtime soft rtprio 99
@realtime soft priority 99
@realtime soft memlock 102400
@realtime hard rtprio 99
@realtime hard priority 99
@realtime hard memlock 102400
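
If you prefer to script this step, the lines can be appended only when they are not already present, so that rerunning the script does not duplicate them. This is a sketch: add_realtime_limits is a hypothetical helper name, and the file to edit is passed as an argument.

```shell
# Append the realtime group limits to a limits file, skipping lines that
# are already there. add_realtime_limits is a hypothetical helper name.
add_realtime_limits() {
    limits_file="$1"
    for line in \
        '@realtime soft rtprio 99' \
        '@realtime soft priority 99' \
        '@realtime soft memlock 102400' \
        '@realtime hard rtprio 99' \
        '@realtime hard priority 99' \
        '@realtime hard memlock 102400'
    do
        if ! grep -qxF -- "$line" "$limits_file"; then
            printf '%s\n' "$line" >> "$limits_file"
        fi
    done
}
```

Since /etc/security/limits.conf is owned by root, run the function from a root shell with that path as its argument.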

Restart and boot into the RT kernel (if it does not boot automatically, select it in the GRUB menu first). Then run nvidia-smi to check the driver and nvcc --version to check CUDA.
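
To confirm that the running kernel really is the RT one, the version string from uname -v can be inspected. This is a sketch assuming that a PREEMPT_RT kernel reports the text PREEMPT_RT in uname -v, which is what I would expect for 5.15-rt builds. The function name is_rt_kernel is hypothetical.

```shell
# Return success if a `uname -v` style string reports a PREEMPT_RT kernel.
# is_rt_kernel is a hypothetical helper; with no argument it inspects the
# running kernel.
is_rt_kernel() {
    case "${1:-$(uname -v)}" in
        *PREEMPT_RT*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_rt_kernel; then
    echo "RT kernel is running: $(uname -r)"
else
    echo "Not an RT kernel: $(uname -r)"
fi
```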
