hemenkapadia/Ubuntu 18.04 with Nvidia+CUDA on Optimus Laptop.md

## Ubuntu 18.04 with Nvidia+CUDA on Optimus Laptop.md

      
Kernel settings for installation

Use a live USB to try Ubuntu before installing. Boot from the live USB, highlight the "Try Ubuntu ...." option on the GRUB screen, and press e. Add the following options before quiet splash in the kernel parameters so that the line reads as below:
nogpumanager nomodeset i915.modeset=1 quiet splash
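
Once the session has booted, one way to confirm that the edited parameters took effect is to inspect /proc/cmdline. The following is a minimal sketch; the helper name has_kernel_param is made up for illustration and is not a standard tool.

```shell
# Succeed if the kernel command line in $1 contains the exact
# space-separated parameter $2.
# has_kernel_param is a hypothetical helper name.
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on the running kernel when /proc/cmdline is readable.
if [ -r /proc/cmdline ]; then
    for p in nogpumanager nomodeset i915.modeset=1; do
        if has_kernel_param "$(cat /proc/cmdline)" "$p"; then
            echo "present: $p"
        else
            echo "missing: $p"
        fi
    done
fi
```

Matching on whole words avoids a false positive where, for example, modeset=1 would otherwise match inside i915.modeset=1.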

Note: On HighDPI screen machines there is a known issue whereby Ubiquity (the Ubuntu installer) crashes at the copying files step, i.e. immediately after the user setup screen. The cause and workaround are discussed in another gist post.
Persist GRUB configuration

Post installation, to avoid re-entering the above kernel options each time the system boots, edit the GRUB configuration file with sudo vi /etc/default/grub and make the following changes.
Persist Kernel boot parameters
GRUB_CMDLINE_LINUX_DEFAULT="nomodeset nogpumanager i915.modeset=1 quiet splash"

Set Default Boot OS to Windows
GRUB_DEFAULT="Windows Boot Manager (on /dev/sda1)"

High DPI configuration - may not apply to everyone
On 4K or HighDPI displays, the GRUB menu fonts are extremely small. Update the following lines to fix the font size.
GRUB_GFXMODE=1600x1200
GRUB_GFXPAYLOAD_LINUX=keep 

Note: This setting is specific to the GRUB video modes supported on my laptop, a Dell 7559.
After making the above configuration changes, persist them by regenerating the GRUB configuration with sudo update-grub2
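
For readers who prefer to script the edit rather than use vi, the kernel parameter line can be rewritten with sed. This is only a sketch: set_grub_cmdline is a hypothetical helper name, and the example runs against a temporary stand-in file rather than the real /etc/default/grub.

```shell
# Rewrite the GRUB_CMDLINE_LINUX_DEFAULT line of a GRUB defaults file.
# $1 is the file, $2 the new value. set_grub_cmdline is a hypothetical
# helper; it assumes the value contains no '|', '&' or backslash.
set_grub_cmdline() {
    file=$1
    value=$2
    cp "$file" "$file.bak"    # keep a backup next to the original
    sed "s|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=\"$value\"|" \
        "$file.bak" > "$file"
}

# Demonstration on a temporary stand-in for /etc/default/grub.
demo=$(mktemp)
printf 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n' > "$demo"
set_grub_cmdline "$demo" "nomodeset nogpumanager i915.modeset=1 quiet splash"
cat "$demo"
```

Pointed at the real file (as root), the change would still need sudo update-grub2 afterwards to take effect.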
Scaling of Virtual Console fonts

On High DPI or Retina displays, fonts on the virtual console are very small, making them difficult to read. To fix this issue, first create the configuration
sudo dpkg-reconfigure console-setup
Select the following options
Encoding - UTF-8
Character Set - . Combined - Latin; Slavic Cyrillic; Greek
Font for console - Terminus 
Font Size - 16x32 

Next, edit /lib/systemd/system/console-setup.service and add ExecStart=/bin/setupcon at the bottom of the [Service] section.
If at any time the virtual console shows tiny fonts (e.g. in recovery mode), run /bin/setupcon to fix it.
Restart

Restart your system and verify that above points are working as expected.
Update + Upgrade

sudo apt-get update
sudo apt-get upgrade

Install base development packages

sudo apt install build-essential libelf-dev

Remove nouveau and disable gpumanager


Restart the machine in init mode 3. This is done by adding 3 after quiet splash in the GRUB kernel configuration line as explained in the Kernel Settings for Installation point above

OR

Let Ubuntu boot in the GUI mode normally.
Switch to a virtual console (Ctrl + Alt + F3 to F6). Log in and enter sudo telinit 3 (to stop the X server).
Remove any existing installs of nvidia, nouveau etc.

sudo apt-get remove --purge 'nvidia*'
sudo apt-get remove --purge 'bumblebee*'
sudo apt-get --purge remove 'xserver-xorg-video-nouveau*'


Blacklist the nouveau driver. Open the file with sudo vi /etc/modprobe.d/blacklist.conf and add the following lines to it

# Blacklisting nouveau 
blacklist nouveau
blacklist lbm-nouveau
alias nouveau off
alias lbm-nouveau off
options nouveau modeset=0
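
If this step is repeated (for example after a reinstall), appending the same lines again would leave duplicates in the file. A small sketch of an idempotent append follows; append_once is a hypothetical helper name, and the example writes to a temporary file standing in for /etc/modprobe.d/blacklist.conf.

```shell
# Append line $2 to file $1 only if that exact line is not already there.
# append_once is a hypothetical helper name.
append_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

conf=$(mktemp)   # stand-in for /etc/modprobe.d/blacklist.conf
for l in "blacklist nouveau" "blacklist lbm-nouveau" \
         "alias nouveau off" "alias lbm-nouveau off" \
         "options nouveau modeset=0"; do
    append_once "$conf" "$l"
done
cat "$conf"
```

Running the loop a second time leaves the file unchanged, because each line is matched as a whole fixed string before it is appended.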


Disable gpu-manager.service using systemctl

sudo systemctl mask gpu-manager.service


Regenerate the initramfs with sudo update-initramfs -u -k all
Reboot the system and verify that nouveau is successfully blacklisted. The command below should not return any lines containing the word nouveau.

lsmod | grep nouveau


Check if gpu-manager.service is disabled

sudo systemctl status gpu-manager.service
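
The two checks above can also be scripted. The sketch below is one possible way to do it; module_listed is a hypothetical helper name. It relies on lsmod printing the module name in the first column, and on systemctl is-enabled printing masked for a masked unit.

```shell
# Succeed if the module named in $1 appears in the first column of
# lsmod-style text read from stdin. module_listed is a hypothetical helper.
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

if command -v lsmod >/dev/null 2>&1; then
    if lsmod | module_listed nouveau; then
        echo "nouveau is still loaded"
    else
        echo "nouveau is not loaded"
    fi
fi

# For a masked unit this prints "masked" (and returns non-zero, hence || true).
if command -v systemctl >/dev/null 2>&1; then
    systemctl is-enabled gpu-manager.service 2>/dev/null || true
fi
```

Comparing the first column exactly, rather than using a plain grep, avoids matching a module that merely lists nouveau in its "Used by" column.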

Install nvidia drivers and CUDA

install Nvidia driver

I want to use the Nvidia GPU only for CUDA processing (number crunching, data science, deep learning) and the Intel GPU for all display purposes. Hence this time I fall back on the runfile installation process instead of deb (package manager), since the runfile process allows us to install the Nvidia drivers without the OpenGL libraries and DRM support. Such flexibility is not available in the deb/apt based installation.
DRM is mainly needed for rendering, but since we are using nvidia only for CUDA processing we do not need DRM. Additionally, there is a known issue whereby DRM blocks the nvidia driver from unloading from the kernel. This also impacts bbswitch's ability to turn the GPU off.

Install DKMS with sudo apt install dkms. See the DKMS note below.
Download the runfile for the Nvidia CUDA 9.2 toolkit.
Extract the three files from this runfile with sudo sh cuda_version_linux.run --extract=/home/hemen/Downloads/CUDA. The extract path must be absolute.
The above command generates 3 files in the extract directory: CUDA Samples, CUDA toolkit and Nvidia driver.
First install the Nvidia driver without the OpenGL libraries and DRM support using sudo ./NVIDIA-Linux...version.run --dkms --no-opengl-files --no-drm. At the time of writing, the CUDA version was 9.2 and the nvidia driver was 396.26. See the DKMS note below.
Restart the machine once the installation of the Nvidia driver completes successfully. Verify that the X server works as expected and is driven by i915.
Verify that the nvidia driver is loaded in the kernel with lsmod | grep nvidia. It should list only the nvidia driver. It should not list other drivers such as nvidia_drm.

DKMS Note
Without the --dkms option the Nvidia kernel modules will not be rebuilt when the kernel is upgraded. I found this out the hard way when the kernel was upgraded as part of a normal system update. To get it working again, re-install the NVIDIA driver as described in the driver installation step above, and then run update-initramfs.
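
After a kernel upgrade, the output of dkms status can show whether the module was rebuilt for the new kernel. The sketch below is an assumption-laden example: dkms_installed_for is a hypothetical helper, the module is assumed to be registered under the name nvidia, and the line layout of dkms status differs between dkms releases, so the patterns may need adjusting.

```shell
# Succeed if dkms-status-style text on stdin reports module $1 as
# installed for kernel release $2.
# Assumed line layouts: "name, version, kernel, arch: installed"
# or "name/version, kernel, arch: installed".
# dkms_installed_for is a hypothetical helper name.
dkms_installed_for() {
    grep -E "^$1[,/]" | grep -F -- "$2" | grep -q 'installed'
}

if command -v dkms >/dev/null 2>&1; then
    if dkms status | dkms_installed_for nvidia "$(uname -r)"; then
        echo "nvidia module is built for $(uname -r)"
    else
        echo "nvidia module is NOT built for $(uname -r)"
    fi
fi
```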
install CUDA toolkit


Restart the machine back into init 3 using kernel parameters as explained above
Install the CUDA toolkit using the runfile sudo ./cuda-linux.9.2.88-23920284.run
Accept license. Use default install path, yes for desktop icons and symbolic link.
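
The runfile does not put the toolkit on the shell's search path by itself. A minimal sketch of doing so follows, assuming the symbolic link chosen above was created at /usr/local/cuda (the installer's usual default, not something stated in this write-up). path_prepend is a hypothetical helper name.

```shell
# Print the colon-separated list $1 with directory $2 prepended,
# unless $2 is already in the list. path_prepend is a hypothetical helper.
path_prepend() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)
            if [ -n "$1" ]; then
                printf '%s\n' "$2:$1"
            else
                printf '%s\n' "$2"
            fi
            ;;
    esac
}

# Assumption: /usr/local/cuda is the symbolic link created by the installer.
CUDA_HOME=/usr/local/cuda
PATH=$(path_prepend "$PATH" "$CUDA_HOME/bin")
LD_LIBRARY_PATH=$(path_prepend "${LD_LIBRARY_PATH:-}" "$CUDA_HOME/lib64")
export PATH LD_LIBRARY_PATH

# Print the compiler version if nvcc is now reachable.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
fi
```

Placing these lines in ~/.profile or ~/.bashrc would make the setting persistent; the duplicate check keeps the variables from growing each time the file is sourced.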

install CUDA samples


Install CUDA samples using the run file ./cuda-samples.9.2.88-23920284-linux.run. I selected the installation directory as ~/CUDASamples.

install bbswitch


Install bbswitch sudo apt install bbswitch-dkms
Add the following two lines to /etc/modules, to load both these modules on startup

bbswitch
i915


Update initramfs sudo update-initramfs -u -k all
Reboot. Ensure the X server starts as expected. I noticed that the nvidia drivers are loaded in the kernel. At this point I was able to remove the nvidia driver with modprobe -r as well as add it with modprobe. I was also able to manually switch the GPU off after removing the nvidia driver from the kernel, using sudo tee /proc/acpi/bbswitch <<< OFF

GPU in off state on machine startup

Blacklist nvidia

To prevent the nvidia drivers from loading in the kernel at startup, add the following lines to /etc/modprobe.d/blacklist.conf
# Blacklist Nvidia
blacklist nvidia
blacklist nvidia-uvm
blacklist nvidia-drm
blacklist nvidia-modeset

bbswitch initial state


Create file /etc/modprobe.d/bbswitch.conf with the below contents to set Nvidia GPU to OFF state when the machine starts up.

options bbswitch load_state=0


Update initramfs sudo update-initramfs -u -k all
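
After the next reboot, the effect of load_state=0 can be checked by reading /proc/acpi/bbswitch, which holds a single line such as 0000:02:00.0 OFF. The sketch below extracts just the state; bbswitch_state is a hypothetical helper name.

```shell
# Print the power state (ON or OFF) from a bbswitch status file.
# The state is the second field of the first line. With no argument the
# real /proc/acpi/bbswitch is read. bbswitch_state is a hypothetical helper.
bbswitch_state() {
    awk '{ print $2; exit }' "${1:-/proc/acpi/bbswitch}"
}

if [ -r /proc/acpi/bbswitch ]; then
    echo "GPU is $(bbswitch_state)"
fi
```

Taking the file as an optional argument keeps the helper usable on a saved copy of the status line as well as on the live file.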

Test if cuda samples are running

hemen@hemen-Inspiron-7559:~/bin$ cat /proc/acpi/bbswitch
0000:02:00.0 OFF
hemen@hemen-Inspiron-7559:~/bin$ sudo tee /proc/acpi/bbswitch <<< ON
ON
hemen@hemen-Inspiron-7559:~/bin$ lsmod | grep nvidia
hemen@hemen-Inspiron-7559:~/bin$ sudo modprobe nvidia
hemen@hemen-Inspiron-7559:~/bin$ lsmod | grep nvidia
nvidia              14016512  0
ipmi_msghandler        53248  2 nvidia,ipmi_devintf
hemen@hemen-Inspiron-7559:~/bin$ nvidia-smi
Thu May 31 23:00:51 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    Off  | 00000000:02:00.0 Off |                  N/A |
| N/A   49C    P0    N/A /  N/A |      0MiB /  4046MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
hemen@hemen-Inspiron-7559:~/bin$ cat /proc/acpi/bbswitch 
0000:02:00.0 ON
hemen@hemen-Inspiron-7559:~/bin$ lsmod | grep nvidia
nvidia              14016512  0
ipmi_msghandler        53248  2 nvidia,ipmi_devintf
hemen@hemen-Inspiron-7559:~/Workspace$ cd CUDASamples/
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples$ ls
0_Simple  1_Utilities  2_Graphics  3_Imaging  4_Finance  5_Simulations  6_Advanced  7_CUDALibraries  bin  common  EULA.txt  Makefile
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples$ cd 1_Utilities/
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities$ ls
bandwidthTest  deviceQuery  deviceQueryDrv  p2pBandwidthLatencyTest  topologyQuery
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities$ cd deviceQuery
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ ls
deviceQuery  deviceQuery.cpp  deviceQuery.o  Makefile  NsightEclipse.xml  readme.txt
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 960M"
  CUDA Driver Version / Runtime Version          9.2 / 9.2
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 4046 MBytes (4242604032 bytes)
  ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
  GPU Max Clock rate:                            1176 MHz (1.18 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.2, NumDevs = 1
Result = PASS
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ lsmod | grep nvidia
nvidia_uvm            782336  0
nvidia              14016512  1 nvidia_uvm
ipmi_msghandler        53248  2 nvidia,ipmi_devintf
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ sudo modprobe -r nvidia_uvm
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ sudo modprobe -r nvidia
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ lsmod | grep nvidia
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ cat /proc/acpi/bbswitch 
0000:02:00.0 ON
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ sudo tee /proc/acpi/bbswitch <<< OFF
OFF
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ cat /proc/acpi/bbswitch 
0000:02:00.0 OFF
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ 

Install cuDNN Library

Follow the cuDNN install instructions to install the cuDNN library. We followed the tar.gz install steps. Verified as below
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/8_cudnn_samples_v7/mnistCUDNN$ cudarun ./mnistCUDNN
[sudo] password for hemen: 
ON
cudnnGetVersion() : 7104 , CUDNN_VERSION from cudnn.h : 7104 (7.1.4)
Host compiler version : GCC 7.3.0
There are 1 CUDA capable devices on your machine :
device 0 : sms  5  Capabilities 5.0, SmClock 1176.0 Mhz, MemSize (Mb) 4046, MemClock 2505.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.055968 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.056000 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.066528 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.162464 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.257472 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.051936 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.060768 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.061952 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.189792 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.287136 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!
OFF
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/8_cudnn_samples_v7/mnistCUDNN$
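
The cudarun command used above is not defined in this write-up. Judging from the sudo prompt and the ON and OFF lines in its output, it plausibly wraps the manual steps from the earlier deviceQuery session. The following is a hypothetical reconstruction under that assumption, not the original script; the CUDARUN_DRY_RUN variable and the priv helper are also made up for illustration.

```shell
# Hypothetical reconstruction of a cudarun-style wrapper; the original
# script is not shown in this document.
# With CUDARUN_DRY_RUN=1 the privileged commands are printed, not run.
priv() {
    if [ "${CUDARUN_DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

cudarun() {
    echo ON | priv sudo tee /proc/acpi/bbswitch    # power the GPU on
    priv sudo modprobe nvidia                      # load the driver
    "$@"                                           # run the CUDA program
    status=$?
    priv sudo modprobe -r nvidia_uvm               # unload dependent module first
    priv sudo modprobe -r nvidia
    echo OFF | priv sudo tee /proc/acpi/bbswitch   # power the GPU off
    return $status
}

# Example (dry run, so nothing privileged is executed):
# CUDARUN_DRY_RUN=1 cudarun ./mnistCUDNN
```

The wrapper returns the exit status of the wrapped program, so a failing CUDA job is still reported as a failure after the GPU has been switched off again.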