spk921/Install NVIDIA Driver and CUDA.md

## Install NVIDIA Driver and CUDA.md

      
Here's my experience of installing the NVIDIA CUDA Toolkit 8.0 on a fresh install of Ubuntu Desktop 16.04.3 LTS.
Table of Contents


Install NVIDIA Graphics Driver via apt-get
Install CUDA
Install cuDNN

1. Install NVIDIA Graphics Driver via apt-get

Do not use the CUDA runfile to install your driver. Use apt-get instead. This way you do not need to worry about the Nouveau driver issues you read about on Stack Overflow.
As of 09/30/2017, the latest NVIDIA driver version for Ubuntu 16.04.3 LTS is 375. To install the driver, execute
sudo apt-get install nvidia-375 nvidia-modprobe
and then reboot the machine.
The nvidia-modprobe utility loads the NVIDIA kernel modules and creates the NVIDIA character device files automatically every time your machine boots.
New users are advised to install the driver this way because it is simple. However, it has some drawbacks:

The driver included in the official Ubuntu repository is usually not the latest.
Naming conflicts may occur when other repositories (e.g. the ones from CUDA) are added to the system.
The driver has to be reinstalled after the Linux kernel is updated.

Check the Installation

After a successful installation, the nvidia-smi command reports all CUDA-capable devices in the system.
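The check above can also be scripted. The following is a minimal sketch, not part of the original setup: check_nvidia_driver is a hypothetical helper name, and it relies only on nvidia-smi -L, which prints one line per detected GPU.

```shell
#!/bin/sh
# Report the GPUs the NVIDIA driver can see, or say why it cannot.
# check_nvidia_driver is a hypothetical helper name used for illustration.
check_nvidia_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: the driver is probably not installed"
        return 1
    fi
    # -L lists one line per GPU that the driver detects.
    if ! nvidia-smi -L; then
        echo "nvidia-smi failed: the kernel module may not be loaded"
        return 1
    fi
}

check_nvidia_driver || true
```

If the function reports a failure right after installation, a reboot is the first thing to try, since the kernel module is loaded at boot.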
Common Errors and Solutions


ERROR: Unable to load the 'nvidia-drm' kernel module.


One probable reason is that the system boots via UEFI with the Secure Boot option turned on in the firmware (BIOS) settings. Turning Secure Boot off should solve the problem.
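Before rebooting into the firmware settings, the boot mode and Secure Boot state can usually be checked from a running system. This is a rough sketch that assumes the mokutil package is available; report_secure_boot and its optional directory argument are hypothetical and exist only to keep the example self-contained.

```shell
#!/bin/sh
# Report whether the system booted via UEFI and, if so, the Secure Boot state.
# report_secure_boot is a hypothetical helper; the optional argument overrides
# the directory whose presence indicates a UEFI boot.
report_secure_boot() {
    efi_dir="${1:-/sys/firmware/efi}"
    if [ ! -d "$efi_dir" ]; then
        echo "Legacy BIOS boot: Secure Boot does not apply"
        return 0
    fi
    echo "UEFI boot detected"
    if command -v mokutil >/dev/null 2>&1; then
        # mokutil --sb-state reports whether Secure Boot is enabled or disabled.
        mokutil --sb-state || echo "mokutil could not read the Secure Boot state"
    else
        echo "mokutil not installed: check the firmware setup screen instead"
    fi
}

report_secure_boot
```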

Additional Notes

nvidia-smi -pm 1 enables persistence mode, which keeps the driver loaded and saves the time otherwise spent loading it. The effect is significant on machines with more than 4 GPUs.
nvidia-smi -e 0 disables ECC on Tesla products, which frees up about 1/15 more video memory. A reboot is required for the change to take effect. nvidia-smi -e 1 enables ECC again.
nvidia-smi -pl <some power value> raises or lowers the power (TDP) limit of the GPU. Raising it encourages higher GPU Boost frequencies, but can be DANGEROUS and HARMFUL to the GPU. Lowering it saves power, which is useful for machines whose power supply is too weak and that shut down unexpectedly when all GPUs are pushed to their maximum load.
-i <GPUID> can be added after the above commands to target an individual GPU.
These commands can be added to /etc/rc.local so that they are executed at system boot.
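As an illustration, the body of /etc/rc.local could look roughly like the sketch below. NVSMI, POWER_LIMIT_WATTS, and apply_gpu_settings are hypothetical names, and the 200 W value is only a placeholder; the allowed range depends on the card.

```shell
#!/bin/sh
# Sketch of GPU settings to apply at boot (rc.local runs as root).
# NVSMI and POWER_LIMIT_WATTS are hypothetical names used only in this example.
NVSMI="${NVSMI:-nvidia-smi}"
POWER_LIMIT_WATTS="${POWER_LIMIT_WATTS:-200}"   # placeholder value, not a recommendation

apply_gpu_settings() {
    # Enable persistence mode on all GPUs.
    "$NVSMI" -pm 1 || return 1
    # Set the power limit on GPU 0 only.
    "$NVSMI" -pl "$POWER_LIMIT_WATTS" -i 0 || return 1
}

# Only touch the GPUs when the driver tools are actually present.
if command -v "$NVSMI" >/dev/null 2>&1; then
    apply_gpu_settings || echo "could not apply GPU settings" >&2
fi
```

On Ubuntu 16.04 the stock /etc/rc.local ends with exit 0, so lines like these belong above that final line.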
2. Install CUDA 8.0

Installing CUDA from the runfile is much simpler and smoother than installing the NVIDIA driver. It only copies files to system directories and has nothing to do with the system kernel or compilation at install time. Removing CUDA is simply a matter of removing the installation directory. So I personally do not recommend adding NVIDIA's repositories and installing CUDA via apt-get or other package managers: it does not reduce the complexity of installation or uninstallation, and it increases the risk of messing up your repository configuration.
The CUDA runfile installer can be downloaded from NVIDIA's website, or fetched with wget in case you can't find it easily there:
cd
wget -O cuda_8.0.61_375.26_linux.run https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda_8.0.61_375.26_linux-run

What you download is a package containing the following three components:

an NVIDIA driver installer, usually an outdated version;
the actual CUDA installer;
the CUDA samples installer.

I suggest extracting the above three components and executing the second and third separately (remember that we already installed the driver ourselves). To extract them, execute the runfile installer with the --extract option:
cd
chmod +x cuda_8.0.61_375.26_linux.run
./cuda_8.0.61_375.26_linux.run --extract=$HOME

You should have unpacked three components:
NVIDIA-Linux-x86_64-375.26.run (1. NVIDIA driver that we ignore),
cuda-linux64-rel-8.0.61-21551265.run (2. CUDA 8 installer), and
cuda-samples-linux-8.0.61-21551265.run (3. CUDA 8 Samples).
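Before running the installers, it can help to confirm that all three files were actually unpacked. A minimal sketch, assuming the files were extracted into $HOME as above; check_extracted is a hypothetical helper name.

```shell
#!/bin/sh
# Check that the three components named above exist in the given directory.
# check_extracted is a hypothetical helper; it returns non-zero if any file is missing.
check_extracted() {
    dir="$1"
    missing=0
    for f in NVIDIA-Linux-x86_64-375.26.run \
             cuda-linux64-rel-8.0.61-21551265.run \
             cuda-samples-linux-8.0.61-21551265.run; do
        if [ -f "$dir/$f" ]; then
            echo "found: $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

check_extracted "$HOME" || true
```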
Execute the second one to install the CUDA Toolkit 8.0:
sudo ./cuda-linux64-rel-8.0.61-21551265.run

The default install location is /usr/local/cuda-8.0/.
To verify our CUDA installation, also install the Samples, which include two useful tools, deviceQuery and p2pBandwidthLatencyTest:
sudo ./cuda-samples-linux-8.0.61-21551265.run

After the installation finishes, configure the runtime library.
sudo bash -c "echo /usr/local/cuda/lib64/ > /etc/ld.so.conf.d/cuda.conf"
sudo ldconfig
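To confirm that the linker cache picked up the CUDA libraries, one option is to search the output of ldconfig -p for the CUDA runtime library, libcudart. This is a sketch; has_cudart is a hypothetical helper name.

```shell
#!/bin/sh
# Succeeds if the text on stdin mentions the CUDA runtime library.
# has_cudart is a hypothetical helper name used for illustration.
has_cudart() {
    grep -q 'libcudart\.so'
}

# ldconfig -p prints the current contents of the shared library cache.
if ldconfig -p 2>/dev/null | has_cudart; then
    echo "CUDA runtime library is in the linker cache"
else
    echo "libcudart not found in the linker cache"
fi
```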

Ubuntu users should also append /usr/local/cuda/bin to the PATH entry in the system file /etc/environment so that nvcc is included in $PATH. This takes effect after a reboot.
After the reboot, test the installation by building and running the samples:
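The PATH entry in that file is a single colon-separated line, so the new directory goes at the end of the existing value. The sketch below only prints what such a line could look like and writes nothing; append_to_path is a hypothetical helper, and the current shell's $PATH stands in for the value stored in the file.

```shell
#!/bin/sh
# Print a PATH value with the directory appended, unless it is already present.
# append_to_path is a hypothetical helper name used for illustration.
append_to_path() {
    current="$1"
    dir="$2"
    case ":$current:" in
        *":$dir:"*) printf '%s\n' "$current" ;;
        *)          printf '%s\n' "$current:$dir" ;;
    esac
}

# Show the resulting line; copy it into the system file by hand if it looks right.
printf 'PATH="%s"\n' "$(append_to_path "$PATH" /usr/local/cuda/bin)"
```

After the next login, command -v nvcc should print the full path of the compiler if the entry was picked up.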
cd /usr/local/cuda-8.0/samples
sudo make

It's a long process with many irrelevant warnings about deprecated architectures (sm_20 and such ancient GPUs). After it completes, run deviceQuery and p2pBandwidthLatencyTest:
cd /usr/local/cuda/samples/bin/x86_64/linux/release
./deviceQuery
./p2pBandwidthLatencyTest

The output of deviceQuery should list each CUDA-capable GPU with its properties and end with the line Result = PASS.
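To check this from a script rather than by eye, one option is to search the output for the Result = PASS line that deviceQuery prints on success. A minimal sketch, assuming the samples were built in the default location; device_query_passed and BIN_DIR are hypothetical names.

```shell
#!/bin/sh
# Succeeds if the deviceQuery output on stdin contains the success marker.
# device_query_passed and BIN_DIR are hypothetical names used for illustration.
device_query_passed() {
    grep -q 'Result = PASS'
}

BIN_DIR=/usr/local/cuda/samples/bin/x86_64/linux/release
if [ -x "$BIN_DIR/deviceQuery" ]; then
    if "$BIN_DIR/deviceQuery" | device_query_passed; then
        echo "deviceQuery: PASS"
    else
        echo "deviceQuery: FAIL"
    fi
else
    echo "deviceQuery has not been built in $BIN_DIR"
fi
```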
3. Install cuDNN

The recommended way to install cuDNN is to copy the tgz file to /usr/local, extract it there, and then remove the tgz file if desired. This method preserves symbolic links.
Finally, execute sudo ldconfig to update the shared library cache.
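The steps above can be sketched as a small function. This is an illustration only: install_cudnn is a hypothetical helper, the archive name is whatever file you downloaded from NVIDIA, and the sketch assumes the archive unpacks into a cuda/ directory so that its contents land under /usr/local/cuda.

```shell
#!/bin/sh
# Copy a cuDNN archive to a prefix, extract it there, and remove the copy.
# install_cudnn is a hypothetical helper; tar keeps symbolic links by default.
install_cudnn() {
    archive="$1"
    prefix="$2"
    name="$(basename "$archive")"
    cp "$archive" "$prefix/" || return 1
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$prefix" && tar -xzf "$name" && rm -f "$name" )
}

# Example call from a root shell (the archive path is a placeholder):
#   install_cudnn /path/to/your-cudnn-archive.tgz /usr/local && ldconfig
```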
libEGL

Solution for libEGL