Install Nvidia Driver CUDA and cuDNN on Ubuntu

Jan 2nd, 2022

System Specification Check

  • Check your system architecture to select correct installers for your platform
    $ uname -m
    $ dpkg --print-architecture
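
The two commands above report the same architecture under different names, which can be confusing when picking an installer. The sketch below maps the kernel name to the Debian/Ubuntu package name for three common cases. The helper name `deb_arch_for` is hypothetical and the mapping is deliberately incomplete.

```shell
#!/usr/bin/env bash
# Map a kernel machine name (uname -m) to the Debian/Ubuntu architecture name
# (dpkg --print-architecture). Only three common cases are covered.
deb_arch_for() {
    case "$1" in
        x86_64)  echo "amd64" ;;
        aarch64) echo "arm64" ;;
        ppc64le) echo "ppc64el" ;;
        *)       echo "unknown" ;;
    esac
}

machine="$(uname -m)"
echo "machine: ${machine}, expected deb arch: $(deb_arch_for "${machine}")"

# Cross-check against dpkg when it is available (Debian/Ubuntu only).
if command -v dpkg >/dev/null 2>&1; then
    echo "dpkg reports: $(dpkg --print-architecture)"
fi
```

On the machine described under My Installation, both names refer to the same platform: `x86_64` from `uname -m` and `amd64` from `dpkg`.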

NVIDIA Driver Installation

  1. Remove old installation
    $ sudo apt-get purge 'nvidia-*' # quoted so the shell passes the pattern to apt-get unexpanded
    $ sudo apt-get update 
    $ sudo apt-get autoremove # DO NOT skip this line
  2. Search for latest version of Nvidia driver
    $ apt search nvidia-driver
  3. Install Nvidia libraries
    $ sudo apt install libnvidia-common-<version>
    $ sudo apt install libnvidia-gl-<version>
  4. Install Nvidia driver
    $ sudo apt install nvidia-driver-<version>
  5. Reboot and check for the installation
    $ nvidia-smi
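
To fill in `<version>` in steps 3 and 4, the package list can be filtered for the highest driver branch number. This is a minimal sketch: the helper name `latest_driver_version` is hypothetical, and the highest number is not necessarily the best choice for a given GPU, so treat the result as a starting point.

```shell
#!/usr/bin/env bash
# Read package-list lines on stdin and print the highest N among packages
# named exactly "nvidia-driver-N" (variants such as -server are ignored).
latest_driver_version() {
    sed -nE 's/^nvidia-driver-([0-9]+)( .*)?$/\1/p' | sort -n | tail -n 1
}

# Query the local package index only when apt-cache exists on this system.
if command -v apt-cache >/dev/null 2>&1; then
    version="$(apt-cache search --names-only '^nvidia-driver-[0-9]+$' | latest_driver_version)"
    echo "newest packaged driver branch: ${version:-none found}"
fi
```

The printed number can then be substituted for `<version>` in the `apt install` commands above.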

CUDA Toolkit Installation

  1. Install kernel headers and development packages for your currently running kernel
    $ sudo apt-get install linux-headers-$(uname -r)
  2. Download and install CUDA Toolkit
  • CUDA Toolkit from Nvidia Developer
    • Select target platform
    • Recommendation: pick deb [network] option of Installer Type
    • Follow the installation instruction on the download page to install CUDA Toolkit
  • To include GDS package with CUDA Toolkit
    $ sudo apt-get install nvidia-gds 
  3. Set up environment
  • Configure the environment variables with the following script:
    CUDA_HOME=/usr/local/cuda
    PATH=${CUDA_HOME}/bin${PATH:+:${PATH}}
    LD_LIBRARY_PATH=${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    export LD_LIBRARY_PATH
    export CUDA_HOME
    export PATH
  • Add the script to either:
    • ~/.bashrc for user session usage
    • /etc/profile for system wide usage
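
The `${VAR:+:${VAR}}` expansion in the script adds the `:` separator only when the variable is already set and non-empty. Because `~/.bashrc` can be sourced more than once per session, an alternative is to skip directories that are already listed. The following is a sketch of that idea, not part of the original script, and the helper name `prepend_unique` is hypothetical.

```shell
#!/usr/bin/env bash
# Prepend a directory to a colon-separated list unless it is already present.
# $1 = directory to add, $2 = current list value (may be empty).
prepend_unique() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;           # already listed: leave unchanged
        *)        printf '%s\n' "$1${2:+:$2}" ;;  # add ":" only if the list is non-empty
    esac
}

CUDA_HOME=/usr/local/cuda
PATH="$(prepend_unique "${CUDA_HOME}/bin" "${PATH}")"
LD_LIBRARY_PATH="$(prepend_unique "${CUDA_HOME}/lib64" "${LD_LIBRARY_PATH:-}")"
export CUDA_HOME PATH LD_LIBRARY_PATH
```

Sourcing this repeatedly leaves `PATH` and `LD_LIBRARY_PATH` with a single CUDA entry each.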
  4. Set up POWER9
  • Check NVIDIA Persistence Daemon
    $ systemctl status nvidia-persistenced
  • If it is not loaded
    $ sudo systemctl enable nvidia-persistenced
  5. Reboot and check for installation
    $ nvcc --version
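
For scripted checks, the release number can be pulled out of the `nvcc --version` output. This sketch assumes the output contains a line of the form `Cuda compilation tools, release X.Y, ...`. The helper name `cuda_release` is hypothetical.

```shell
#!/usr/bin/env bash
# Extract "X.Y" from the first line on stdin that contains "release X.Y".
cuda_release() {
    sed -nE 's/.*release ([0-9]+\.[0-9]+).*/\1/p' | head -n 1
}

if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA Toolkit release: $(nvcc --version | cuda_release)"
else
    # nvcc lives in ${CUDA_HOME}/bin, so a missing nvcc usually means PATH is not set.
    echo "nvcc not found on PATH; re-check the environment setup step" >&2
fi
```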

cuDNN Installation

  1. Download cuDNN:
  • Nvidia cuDNN from Nvidia Developer (local installer)
    • NVIDIA Developer Program Membership is required to download
    • Select CUDA matching version and target platform
  2. Install cuDNN
  • Import CUDA GPG key
    $ sudo dpkg -i <downloaded-file>
    $ sudo apt-key add /var/cudnn-local-repo-*/7fa2af80.pub
    $ sudo apt-get update
  • To auto-match version of cuDNN v8 with version of CUDA when installing:
    $ sudo apt-get install libcudnn8
    $ sudo apt-get install libcudnn8-dev 
    $ sudo apt-get install libcudnn8-samples 
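
After installation, it can be worth confirming which CUDA version the installed cuDNN packages were built for. The sketch below assumes the package version string carries a `+cudaX.Y` suffix (for example `8.3.1.22-1+cuda11.5`, used here purely as an illustration). The helper name `cudnn_pkg_cuda_version` is hypothetical.

```shell
#!/usr/bin/env bash
# Extract the CUDA version suffix ("X.Y") from a cuDNN package version string.
# Prints nothing if the string has no "+cudaX.Y" suffix.
cudnn_pkg_cuda_version() {
    printf '%s\n' "$1" | sed -nE 's/.*\+cuda([0-9]+\.[0-9]+).*/\1/p'
}

# Report each cuDNN package from the step above, if dpkg-query is available.
if command -v dpkg-query >/dev/null 2>&1; then
    for pkg in libcudnn8 libcudnn8-dev libcudnn8-samples; do
        ver="$(dpkg-query -W -f='${Version}' "${pkg}" 2>/dev/null)"
        if [ -n "${ver}" ]; then
            echo "${pkg}: ${ver} (built for CUDA $(cudnn_pkg_cuda_version "${ver}"))"
        else
            echo "${pkg}: not installed"
        fi
    done
fi
```

The reported CUDA version should match the release printed by `nvcc --version`.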

Nvidia Documentation

My Installation

  • Operating System: Ubuntu 20.04 x86_64 (64-bit)
  • Architecture: amd64
  • GPU: Nvidia GeForce GTX 1050
  • Installation with success on: Jan 2nd, 2022
@minhhieutruong0705

Success on Feb 10th, 2022
Operating System: Ubuntu 20.04 x86_64 (64-bit) - amd64
GPU: Nvidia GeForce RTX 3090

@minhhieutruong0705

Success on March 30th, 2022
Operating System: Ubuntu 20.04 x86_64 (64-bit) - amd64
GPU: Nvidia GeForce RTX 2080 Ti

@UTKRISHTPATESARIA

UTKRISHTPATESARIA commented Apr 10, 2023

Hello @minhhieutruong0705,

Just wanted to confirm the following:
Was GDS working on all the above RTX models(1050, 3090, 2080) that you specified?

@minhhieutruong0705

minhhieutruong0705 commented Apr 10, 2023

Hi @UTKRISHTPATESARIA
GDS worked for the RTX 3090 model. It should work for the 1050 and 2080 also (I did not play with GDS on 1050 and 2080).

@UTKRISHTPATESARIA

Thanks for your quick response, will try setting up on 3090
Also I was reading, on 3090 it supports "compatibility" mode, are you aware of how different is it from "pure" GDS?

@minhhieutruong0705

> Thanks for your quick response, will try setting up on 3090 Also I was reading, on 3090 it supports "compatibility" mode, are you aware of how different is it from "pure" GDS?

@UTKRISHTPATESARIA
They are different in some cases. Maybe you want to have a look at the official docs for GDS from Nvidia:

@UTKRISHTPATESARIA

Thanks for sharing!

@UTKRISHTPATESARIA

Btw, did you need to enable "compatibility" mode on 3090, or was it working without it too?
As per the docs, with "compatibility" mode enabled, GDS's IO path will fall back to the traditional CPU path.

@minhhieutruong0705

> Btw, did you need to enable "compatibility" mode on 3090, or was it working without it too? As per docs, with "compatibility" mode enabled GDS's IO path will fall back to the traditional CPU path.

@UTKRISHTPATESARIA
I did not enable "compatibility" mode on 3090 for my project. Maybe, you can try first without compatibility mode and enable it later if it is truly needed.

@UTKRISHTPATESARIA

UTKRISHTPATESARIA commented Apr 25, 2023

Hey @minhhieutruong0705 ,

I have set up GDS, but I am hitting the error below while running experiments. By any chance, did you see the same errors in dmesg?

nvidia-fs:nvfs_pin_gpu_pages:1292 Error ret -22 invoking nvidia_p2p_get_pages
                va_start=0x7f6792900000/va_end=0x7f67929fffff/rounded_size=0x100000/gpu_buf_length=0x100000

Running test benchmarks:

/gdsio -f /media/nvme/write-test -d 0 -w 4 -s 10G -i 1M -I 1 -x 0
warn: error opening log file: Permission denied, logging will be disabled
cuFile buffer deg-register failed :device pointer lookup failure
cuFile buffer deg-register failed :device pointer lookup failure
cuFile buffer deg-register failed :device pointer lookup failure
cuFile buffer deg-register failed :device pointer lookup failure
IoType: WRITE XferType: GPUD Threads: 4 DataSetSize: 10481664/10485760(KiB) IOSize: 1024(KiB) Throughput: 2.412582 GiB/sec, Avg_Latency: 1615.521672 usecs ops: 10236 total_time 4.143318 secs

I found a few articles saying that GPU Direct RDMA is not supported on GeForce, but since you confirmed that GDS was working on the RTX 3090, I wanted to double-check.

https://www.reddit.com/r/nvidia/comments/irvk1n/does_rtx_30_series_offer_gpu_direct_storage/

I'm also getting these BAR1-map errors in the nvidia-fs stats:

cat /proc/driver/nvidia-fs/stats                                                                                           chisel-t: Tue Apr 25 11:05:24 2023

GDS Version: 1.6.1.12
NVFS statistics(ver: 4.0)
NVFS Driver(version: 2.15.3)
Mellanox PeerDirect Supported: True
IO stats: Enabled, peer IO stats: Enabled
Logging level: debug

Active Shadow-Buffer (MiB): 0
Active Process: 0
Batches                         : n=0 ok=0 err=0 Avg-Submit-Latency(usec)=0
Reads                           : n=0 ok=0 err=0 readMiB=0 io_state_err=0
Reads                           : Bandwidth(MiB/s)=0 Avg-Latency(usec)=0
Sparse Reads                    : n=0 io=0 holes=0 pages=0
Writes                          : n=0 ok=0 err=0 writeMiB=0 io_state_err=0 pg-cache=0 pg-cache-fail=0 pg-cache-eio=0
Writes                          : Bandwidth(MiB/s)=0 Avg-Latency(usec)=0
Mmap                            : n=72 ok=72 err=0 munmap=72
Bar1-map                        : n=72 ok=0 err=72 free=0 callbacks=0 active=0 delay-frees=0
Error                           : cpu-gpu-pages=0 sg-ext=0 dma-map=0 dma-ref=0
Ops                             : Read=0 Write=0 BatchIO=0

GPU - RTX 3090
CUDA 12.1
CUDA driver 530.xx

@minhhieutruong0705

Hi @UTKRISHTPATESARIA ,

You are correct! We can have GDS on the RTX 3090 in compatibility mode, but GDS cannot do anything with DMA on the RTX 3090. I was not aware of this because I did not use my RTX 3090 much. What we get with sudo apt-get install nvidia-gds are only the GDS packages. Sorry for my previous incorrect information.

https://forums.developer.nvidia.com/t/gpudirect-available-on-ubuntu-18-04/192420/5
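
For anyone checking the same thing on their own card: in the stats output quoted earlier, the `Bar1-map` line shows `ok=0 err=72`, which is consistent with the GPU memory never being mapped for direct DMA. Below is a minimal sketch that reads that counter. It assumes the line layout shown in that output, and the helper name `bar1_map_errors` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the err= counter from the "Bar1-map" line of nvidia-fs stats read on stdin.
bar1_map_errors() {
    sed -nE 's/^Bar1-map[[:space:]]*:.*[[:space:]]err=([0-9]+).*/\1/p' | head -n 1
}

stats=/proc/driver/nvidia-fs/stats
if [ -r "${stats}" ]; then
    errs="$(bar1_map_errors < "${stats}")"
    if [ "${errs:-0}" -gt 0 ]; then
        echo "Bar1-map errors: ${errs} (direct DMA path is probably not in use)"
    else
        echo "no Bar1-map errors recorded"
    fi
else
    echo "${stats} not readable; is the nvidia-fs module loaded?" >&2
fi
```

A non-zero count after a `gdsio` run suggests the transfers did not take the direct path, as in the case above.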

@UTKRISHTPATESARIA

Thanks for sharing.

Let me see if I can set up GPU Direct RDMA using some custom build or stuff. That's the last hope for me.