[Ubuntu 18.04 with Nvidia+CUDA on Optimus Laptop] Setting up Ubuntu 18.04 with nvidia drivers and CUDA for data science on Dell 7559 Optimus laptop #Ubuntu #Nvidia #CUDA #setup

Kernel settings for installation

Use a live USB to try Ubuntu before installing. Boot from the live USB. On the GRUB screen, highlight the "Try Ubuntu ...." option and press e. Add the kernel parameters below before quiet splash, so that the line reads as follows:

nogpumanager nomodeset i915.modeset=1 quiet splash

Note: On HighDPI screen machines there is a known issue whereby Ubiquity (the Ubuntu installer) crashes at the copying files step, i.e. immediately after the user setup screen. The cause and workaround are discussed in another gist post.

Persist GRUB configuration

After installation, to avoid re-entering the kernel options above each time the system is booted, edit the GRUB configuration file (sudo vi /etc/default/grub) and make the following changes.

Persist Kernel boot parameters

GRUB_CMDLINE_LINUX_DEFAULT="nomodeset nogpumanager i915.modeset=1 quiet splash"

Set Default Boot OS to Windows

GRUB_DEFAULT="Windows Boot Manager (on /dev/sda1)"

High DPI configuration - may not apply to everyone

On 4K or HighDPI displays, the GRUB menu fonts are extremely small. Update the following lines to fix the font size.

GRUB_GFXMODE=1600x1200
GRUB_GFXPAYLOAD_LINUX=keep 

Note: This setting is specific to the GRUB video modes supported on my laptop Dell 7559.

After making the above configuration changes, regenerate the GRUB configuration so that they persist: sudo update-grub2

Scaling of Virtual Console fonts

On High DPI or Retina displays, fonts on the virtual console are very small, making them difficult to read. To fix this issue, first create the configuration:

sudo dpkg-reconfigure console-setup

select the following options

Encoding - UTF-8 
Character Set - . Combined - Latin; Slavic Cyrillic; Greek 
Font for console - Terminus 
Font Size - 16x32 

Next, edit /lib/systemd/system/console-setup.service and add ExecStart=/bin/setupcon at the bottom of the [Service] section.

If at any time the virtual console shows tiny fonts (e.g. in recovery mode), run /bin/setupcon and it will fix the console.
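
To confirm what the dialog stored, the sketch below reads the font settings back. It assumes the choices are saved as the FONTFACE and FONTSIZE variables in /etc/default/console-setup, which is where console-setup keeps them on Ubuntu; console_font is a hypothetical helper name.

```shell
#!/bin/sh
# Print the console font face and size recorded by console-setup.
# The file argument is for testing; it defaults to /etc/default/console-setup,
# which is a plain list of shell variable assignments.
console_font() {
    # Source the file in a subshell so its variables do not leak into this shell.
    ( . "${1:-/etc/default/console-setup}" && printf '%s %s\n' "$FONTFACE" "$FONTSIZE" )
}
```

Calling console_font with no arguments after the reconfigure step should print the face and size chosen above.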

Restart

Restart your system and verify that the above points are working as expected.
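
One way to check the kernel parameters after the restart is to look at the command line the running kernel was booted with. This is a minimal sketch; check_kernel_params is a hypothetical helper, and the optional file argument exists only so the function can be tried against a sample file.

```shell
#!/bin/sh
# Report which of the expected kernel parameters are present on a command line.
# The file argument is for testing; it defaults to /proc/cmdline.
check_kernel_params() {
    cmdline_file="${1:-/proc/cmdline}"
    missing=0
    for param in nogpumanager nomodeset i915.modeset=1; do
        # Pad with spaces so only whole, space-separated parameters match.
        case " $(tr '\n' ' ' < "$cmdline_file")" in
            *" $param "*) echo "ok:      $param" ;;
            *)            echo "missing: $param"; missing=1 ;;
        esac
    done
    return "$missing"
}
```

Running check_kernel_params with no arguments prints one line per parameter and returns non-zero if any of them is missing, in which case /etc/default/grub is worth re-checking.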

Update + Upgrade

sudo apt-get update
sudo apt-get upgrade

Install base development packages

sudo apt install build-essential libelf-dev

Remove nouveau and disable gpumanager

  1. Restart the machine in init mode 3. This is done by adding 3 after quiet splash in the GRUB kernel configuration line as explained in the Kernel Settings for Installation point above

OR

  1. Let Ubuntu boot in the GUI mode normally.
  2. Switch to a virtual console (Ctrl + Alt + F3 through F6). Log in and enter sudo telinit 3 (to stop the X server).
  3. Remove any existing installs of nvidia, nouveau etc.
sudo apt-get remove --purge 'nvidia*'
sudo apt-get remove --purge 'bumblebee*'
sudo apt-get --purge remove 'xserver-xorg-video-nouveau*'
  4. Blacklist nouveau drivers: sudo vi /etc/modprobe.d/blacklist.conf. Add the following lines to the file.
# Blacklisting nouveau 
blacklist nouveau
blacklist lbm-nouveau
alias nouveau off
alias lbm-nouveau off
options nouveau modeset=0
  5. Disable gpu-manager.service with systemctl.
sudo systemctl mask gpu-manager.service
  6. Regenerate the initramfs: sudo update-initramfs -u -k all
  7. Reboot the system and verify that nouveau is successfully blacklisted. The command below should not return any lines containing the word nouveau.
lsmod | grep nouveau
  8. Check that gpu-manager.service is disabled (its status should show it as masked).
sudo systemctl status gpu-manager.service
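
If the blacklist step has to be repeated (for example after a reinstall), appending the lines by hand can leave duplicates. The sketch below adds each line only when it is missing. append_once is a hypothetical helper, and the example writes to a scratch file rather than to /etc/modprobe.d/blacklist.conf.

```shell
#!/bin/sh
# Append each line to a config file only if it is not already there,
# so rerunning the script does not create duplicate entries.
# Usage: append_once FILE LINE...
append_once() {
    conf_file="$1"
    shift
    touch "$conf_file"
    for line in "$@"; do
        # -x matches the whole line, -F treats the text as a fixed string.
        grep -qxF -- "$line" "$conf_file" || printf '%s\n' "$line" >> "$conf_file"
    done
}

# Example against a scratch file; point it at the real blacklist file
# (as root) only after reviewing the result.
scratch="$(mktemp)"
append_once "$scratch" \
    "blacklist nouveau" \
    "blacklist lbm-nouveau" \
    "alias nouveau off" \
    "alias lbm-nouveau off" \
    "options nouveau modeset=0"
append_once "$scratch" "blacklist nouveau"   # second call adds nothing
cat "$scratch"
```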

Install nvidia drivers and CUDA

Install Nvidia driver

I want to use the Nvidia GPU only for CUDA processing (number crunching, data science, deep learning) and the Intel GPU for all display purposes. Hence this time I fall back on the runfile installation process instead of deb (package manager), since the runfile process allows us to install the Nvidia drivers without the OpenGL libraries and DRM support. Such flexibility is not available in the deb/apt based installation.

DRM is mainly needed for rendering, but since we are using nvidia only for CUDA processing we do not need DRM. Additionally there is a known issue whereby DRM blocks the nvidia driver from unloading from the kernel. This also impacts bbswitch's ability to turn the GPU off.

  1. Download the runfile for Nvidia CUDA 9.2 toolkit
  2. Extract the three files from this runfile: sudo sh cuda_version_linux.run --extract=/home/hemen/Downloads/CUDA. The extract path must be absolute.
  3. The above command generates 3 files in the extract directory: CUDA Samples, CUDA toolkit and Nvidia driver.
  4. First install the Nvidia driver without the OpenGL libraries and DRM: sudo ./NVIDIA-Linux...version.run --no-opengl-files --no-drm. At the time of writing, the CUDA version was 9.2 and the nvidia driver was 396.26.
  5. Restart the machine once the installation of the Nvidia driver completes successfully. Verify that the X server works as expected and is driven by i915.
  6. Verify that the nvidia driver is loaded in the kernel: lsmod | grep nvidia. It should list only the nvidia module, not other modules such as nvidia_drm.
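
The check in the last step can be scripted. This sketch reads /proc/modules (the data that lsmod formats) and succeeds only when nvidia is the sole nvidia-prefixed module loaded. Both function names are hypothetical, and the optional file argument is there for trying the functions on sample data.

```shell
#!/bin/sh
# List loaded kernel modules whose name starts with "nvidia".
# The file argument is for testing; it defaults to /proc/modules.
nvidia_modules() {
    awk '$1 ~ /^nvidia/ { print $1 }' "${1:-/proc/modules}"
}

# Succeed only if "nvidia" is loaded and no other nvidia_* module is.
only_base_nvidia_loaded() {
    [ "$(nvidia_modules "$@")" = "nvidia" ]
}
```

For example, only_base_nvidia_loaded || nvidia_modules prints the loaded nvidia modules whenever the expectation does not hold.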

Install CUDA toolkit

  1. Restart the machine back into init 3 using kernel parameters as explained above.
  2. Install the CUDA toolkit using the runfile: sudo ./cuda-linux.9.2.88-23920284.run
  3. Accept license. Use default install path, yes for desktop icons and symbolic link.
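
The steps above do not cover environment variables. NVIDIA's installation guide lists adding the toolkit's bin and lib64 directories to PATH and LD_LIBRARY_PATH as a post-install action, so a sketch of that is given here. It assumes the default install prefix /usr/local/cuda-9.2; prepend_once is a hypothetical helper that avoids adding the same directory twice.

```shell
#!/bin/sh
# Prepend a directory to a colon-separated path list unless it is already in it.
# Usage: prepend_once DIR CURRENT_VALUE  -> prints the new value
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;      # already present, unchanged
        "::")     printf '%s\n' "$1" ;;      # list was empty
        *)        printf '%s\n' "$1:$2" ;;
    esac
}

# Assumed default install prefix for CUDA 9.2; adjust if you chose another path.
CUDA_HOME="/usr/local/cuda-9.2"
PATH="$(prepend_once "$CUDA_HOME/bin" "$PATH")"
LD_LIBRARY_PATH="$(prepend_once "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")"
export PATH LD_LIBRARY_PATH
```

Placed in a shell startup file such as ~/.profile, this makes nvcc available in new sessions without growing PATH on every login.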

Install CUDA samples

  1. Install CUDA samples using the run file ./cuda-samples.9.2.88-23920284-linux.run. I selected the installation directory as ~/CUDASamples.

Install bbswitch

  1. Install bbswitch: sudo apt install bbswitch-dkms
  2. Add the following two lines to /etc/modules to load both modules on startup.
bbswitch
i915
  3. Update initramfs: sudo update-initramfs -u -k all
  4. Reboot. Ensure the X server starts as expected. I noticed that the nvidia drivers are loaded in the kernel. At this point I was able to modprobe -r (remove) as well as modprobe (add) the nvidia driver. I was also able to manually switch the GPU off after removing the nvidia driver from the kernel: sudo tee /proc/acpi/bbswitch <<< OFF

GPU in off state on machine startup

Blacklist nvidia

To prevent the nvidia drivers from loading in the kernel at startup, add the following lines to /etc/modprobe.d/blacklist.conf

# Blacklist Nvidia
blacklist nvidia
blacklist nvidia-uvm
blacklist nvidia-drm
blacklist nvidia-modeset

bbswitch initial state

  1. Create file /etc/modprobe.d/bbswitch.conf with the below contents to set the Nvidia GPU to the OFF state when the machine starts up.
options bbswitch load_state=0
  2. Update initramfs: sudo update-initramfs -u -k all
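
After the next reboot, the initial state can be confirmed by reading /proc/acpi/bbswitch, whose single line looks like "0000:02:00.0 OFF". A minimal sketch follows. The function names are hypothetical, and the optional file argument is only for trying them on sample input.

```shell
#!/bin/sh
# Print the power state (ON or OFF) reported by bbswitch.
# The file argument is for testing; it defaults to /proc/acpi/bbswitch.
bbswitch_state() {
    awk '{ print $2 }' "${1:-/proc/acpi/bbswitch}"
}

# Succeed if the discrete GPU is reported as powered off.
gpu_is_off() {
    [ "$(bbswitch_state "$@")" = "OFF" ]
}
```

With the configuration above in place, gpu_is_off should succeed right after startup.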

Test if cuda samples are running

hemen@hemen-Inspiron-7559:~/bin$ cat /proc/acpi/bbswitch
0000:02:00.0 OFF
hemen@hemen-Inspiron-7559:~/bin$ sudo tee /proc/acpi/bbswitch <<< ON
ON
hemen@hemen-Inspiron-7559:~/bin$ lsmod | grep nvidia
hemen@hemen-Inspiron-7559:~/bin$ sudo modprobe nvidia
hemen@hemen-Inspiron-7559:~/bin$ lsmod | grep nvidia
nvidia              14016512  0
ipmi_msghandler        53248  2 nvidia,ipmi_devintf
hemen@hemen-Inspiron-7559:~/bin$ nvidia-smi
Thu May 31 23:00:51 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    Off  | 00000000:02:00.0 Off |                  N/A |
| N/A   49C    P0    N/A /  N/A |      0MiB /  4046MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
hemen@hemen-Inspiron-7559:~/bin$ cat /proc/acpi/bbswitch 
0000:02:00.0 ON
hemen@hemen-Inspiron-7559:~/bin$ lsmod | grep nvidia
nvidia              14016512  0
ipmi_msghandler        53248  2 nvidia,ipmi_devintf
hemen@hemen-Inspiron-7559:~/Workspace$ cd CUDASamples/
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples$ ls
0_Simple  1_Utilities  2_Graphics  3_Imaging  4_Finance  5_Simulations  6_Advanced  7_CUDALibraries  bin  common  EULA.txt  Makefile
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples$ cd 1_Utilities/
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities$ ls
bandwidthTest  deviceQuery  deviceQueryDrv  p2pBandwidthLatencyTest  topologyQuery
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities$ cd deviceQuery
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ ls
deviceQuery  deviceQuery.cpp  deviceQuery.o  Makefile  NsightEclipse.xml  readme.txt
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 960M"
  CUDA Driver Version / Runtime Version          9.2 / 9.2
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 4046 MBytes (4242604032 bytes)
  ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
  GPU Max Clock rate:                            1176 MHz (1.18 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.2, NumDevs = 1
Result = PASS
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ lsmod | grep nvidia
nvidia_uvm            782336  0
nvidia              14016512  1 nvidia_uvm
ipmi_msghandler        53248  2 nvidia,ipmi_devintf
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ sudo modprobe -r nvidia_uvm
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ sudo modprobe -r nvidia
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ lsmod | grep nvidia
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ cat /proc/acpi/bbswitch 
0000:02:00.0 ON
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ sudo tee /proc/acpi/bbswitch <<< OFF
OFF
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ cat /proc/acpi/bbswitch 
0000:02:00.0 OFF
hemen@hemen-Inspiron-7559:~/Workspace/CUDASamples/1_Utilities/deviceQuery$ 
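
The on/off sequence in the transcript above can be wrapped in two small functions. This is a sketch, not something from the original setup: gpu_on, gpu_off and the DRY_RUN switch are hypothetical names, and the write to /proc/acpi/bbswitch uses sh -c with a redirect in place of the tee form shown in the session.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1 (useful for a first check).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Power the GPU on, then load the driver.
gpu_on() {
    run sudo sh -c 'echo ON > /proc/acpi/bbswitch' &&
    run sudo modprobe nvidia
}

# Unload the driver modules (nvidia_uvm first, since it depends on nvidia),
# then power the GPU off.
gpu_off() {
    run sudo modprobe -r nvidia_uvm &&
    run sudo modprobe -r nvidia &&
    run sudo sh -c 'echo OFF > /proc/acpi/bbswitch'
}
```

Setting DRY_RUN=1 before calling gpu_off prints the three commands without running them, which is a cheap way to check the order before touching the hardware.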