@morgangiraud
Created May 2, 2024 11:56
tinygrad-p2p-patched-driver.sh
# This gist is a follow-up to a previous gist: https://gist.github.com/morgangiraud/ffa45e76b6891cd4e37e90d75b8be37b
# See the accompanying article: https://morgangiraud.medium.com/multi-gpu-nvidia-p2p-capabilities-and-debugging-tips-fb7597b4e2b5
# It collects tips for installing the tinygrad-patched Nvidia open kernel modules, which give P2P capabilities
# to the 40** series.
### The steps below are intrusive, so the aim is to minimize potential issues.
### Important: verify that the version reported by nvidia-smi exactly matches the version you intend to build.
### At the time of writing, the reported version is 550.78.
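# The version check above can be scripted. The helper below is a minimal sketch, not part of
# the original procedure: 'driver_version_matches' is a hypothetical name, and it only compares
# two strings, so it can be tried without a GPU.

```shell
# Hypothetical helper: succeed only if the reported driver version equals the expected one.
# usage: driver_version_matches <reported> <expected>
driver_version_matches() {
    if [ "$1" = "$2" ]; then
        echo "OK: driver version $1"
    else
        echo "MISMATCH: reported '$1', expected '$2'" >&2
        return 1
    fi
}
```

# Possible usage, assuming nvidia-smi is installed and at least one GPU is visible:
# driver_version_matches "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" 550.78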
# To ensure consistency between the driver versions, it's crucial to start with a clean slate.
# We will remove the currently installed Nvidia driver and replace it with a custom patched version.
# It's imperative to match the version precisely with the one from the tinygrad repository to avoid dependency conflicts.
# Let's go!
# Remove the currently installed driver.
sudo apt remove nvidia-driver-550-open
# Obtain the patched driver source code.
git clone git@github.com:tinygrad/open-gpu-kernel-modules.git
cd open-gpu-kernel-modules
# Note: the tinygrad team's modifications are based on version 550.54 of the Nvidia driver.
# You need to integrate Nvidia's changes up to the version of the driver you previously installed.
# To do so, add the official Nvidia repository as an upstream remote so that updates can be fetched directly from Nvidia.
git remote add upstream git@github.com:NVIDIA/open-gpu-kernel-modules.git
# Fetch updates from all configured remotes, including upstream, to ensure we have the latest changes.
git remote update
# Before merging, review the commit history from the Nvidia repository to identify relevant updates.
# Pay special attention to tags and commit messages that correlate with driver versions.
git log upstream/main
# Merge changes from the Nvidia upstream. Specifically, merge up to the version that matches your previously installed driver.
# In this scenario, we merge up to the most recent update provided by Nvidia.
git merge upstream/main
# Merging may result in conflicts.
# If conflicts arise, check which files are affected.
# For non-critical files like the README, you can safely resolve the conflict by
# keeping the tinygrad version.
# Clean up previous build artifacts to prepare for a fresh build.
make clean
# Ensure none of the Nvidia kernel modules are currently loaded to avoid conflicts.
sudo rmmod nvidia_drm nvidia_modeset nvidia_uvm nvidia_fs nvidia
# If the 'nvidia_drm' module cannot be removed because the GUI is using it, switch to a text-only target and run rmmod again.
# This is common if you have a graphical interface like GNOME or KDE running on Ubuntu.
sudo systemctl isolate multi-user.target
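# Before building, it can help to confirm that no Nvidia module is still loaded. The helper
# below is a sketch under assumptions: 'loaded_nvidia_modules' is a hypothetical name, and it
# relies on the module name being the first field of each line in /proc/modules.

```shell
# Hypothetical helper: print the names of loaded modules that start with "nvidia".
# Reads /proc/modules by default; another file can be passed as $1 for testing.
loaded_nvidia_modules() {
    awk '$1 ~ /^nvidia/ { print $1 }' "${1:-/proc/modules}"
}
```

# Possible usage: an empty result means the rmmod step succeeded.
# [ -z "$(loaded_nvidia_modules)" ] && echo "no nvidia modules loaded"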
# Compile the Nvidia modules with specific compiler settings for compatibility.
# The compiler should be the one that was used to compile your Linux kernel.
CC=gcc-12 CXX=g++ make modules -j$(nproc)
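# To find out which compiler built the running kernel, /proc/version can be inspected.
# The helper below is a sketch: 'kernel_compiler' is a hypothetical name, and the pattern
# assumes an Ubuntu-style string containing something like 'x86_64-linux-gnu-gcc-12'.
# Other distributions may format this string differently, in which case nothing is printed.

```shell
# Hypothetical helper: extract the first 'gcc-<major>' token from a kernel version string.
# $1: contents of /proc/version (defaults to the running kernel's).
kernel_compiler() {
    version_string="${1:-$(cat /proc/version)}"
    printf '%s\n' "$version_string" | grep -o 'gcc-[0-9][0-9]*' | head -n 1
}
```

# Possible usage: compare the result with the CC value passed to make.
# kernel_compiler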
# Package the compiled modules into a Debian package for easy installation and management.
# Use 'checkinstall' to create a Debian package instead of installing the modules directly.
# The variables are passed through 'env' because sudo resets the environment by default.
sudo env CC=gcc-12 CXX=g++ checkinstall make modules_install -j$(nproc)
# When prompted by 'checkinstall', give the package a meaningful name,
# e.g. 'nvidia-driver-550-open-patch-tinygrad',
# and set the version to match the current branch or your custom version,
# such as '550.78-p2p'.
# The installation process will add the new modules to the kernel directory (shown here for kernel 6.5.0-28-generic):
# - /lib/modules/6.5.0-28-generic/kernel/drivers/video/nvidia.ko
# - /lib/modules/6.5.0-28-generic/kernel/drivers/video/nvidia-uvm.ko
# - /lib/modules/6.5.0-28-generic/kernel/drivers/video/nvidia-modeset.ko
# - /lib/modules/6.5.0-28-generic/kernel/drivers/video/nvidia-drm.ko
# - /lib/modules/6.5.0-28-generic/kernel/drivers/video/nvidia-peermem.ko
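# The list above can be checked mechanically. The helper below is a sketch:
# 'missing_nvidia_modules' is a hypothetical name, and the default directory assumes the
# modules land under kernel/drivers/video for the running kernel, as in the list above.

```shell
# Hypothetical helper: print the expected .ko files that are absent from a directory.
# $1: directory to inspect (defaults to the video driver directory of the running kernel).
missing_nvidia_modules() {
    module_dir="${1:-/lib/modules/$(uname -r)/kernel/drivers/video}"
    for name in nvidia nvidia-uvm nvidia-modeset nvidia-drm nvidia-peermem; do
        [ -f "$module_dir/$name.ko" ] || echo "$name.ko"
    done
}
```

# Possible usage: no output means all five modules were installed.
# missing_nvidia_modules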
# Update the module dependency files to ensure the system recognizes the new modules.
sudo depmod
# Verify the installation by checking the functionality of the Nvidia System Management Interface.
nvidia-smi
# Reboot the system to finalize the installation and start using the new driver modules.
sudo reboot
###
# Troubleshooting
###
# Ensure the driver file in use is one of those listed above and that its version is the expected one
modinfo -F filename nvidia
modinfo -F version nvidia
# If you see a discrepancy, you may need to remove other copies of the driver that are still present.
# Unload the Nvidia modules again
sudo rmmod nvidia_drm nvidia_modeset nvidia_uvm nvidia
# Remove any duplicate Nvidia module files if found.
# Do that for the files installed above (nvidia.ko, nvidia-uvm.ko, etc.)
find /lib/modules/$(uname -r) -type f -name "nvidia.ko"
rm ...
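# To spot duplicates for each module in turn, the search can be wrapped in a small function.
# This is a sketch: 'list_module_copies' is a hypothetical name, and the trailing wildcard is
# an assumption meant to also catch compressed copies such as 'nvidia.ko.zst'.
# It only lists files; deciding which copy to delete remains a manual step.

```shell
# Hypothetical helper: list every copy of a given module under a module tree.
# $1: module name without extension, e.g. nvidia or nvidia-uvm.
# $2: tree to search (defaults to the modules of the running kernel).
list_module_copies() {
    find "${2:-/lib/modules/$(uname -r)}" -type f -name "$1.ko*" | sort
}
```

# Possible usage: more than one line per module suggests a leftover copy.
# for m in nvidia nvidia-uvm nvidia-modeset nvidia-drm nvidia-peermem; do list_module_copies "$m"; done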
# Update the initial RAM filesystem to ensure the system uses the correct driver version at boot
sudo update-initramfs -u
# Rebuild the module dependency map
sudo depmod
# Re-check the module information:
modinfo -F filename nvidia
# If it looks good, reboot one last time
sudo reboot
# After the reboot, check that you can query the driver
nvidia-smi