@disulfidebond
Last active January 7, 2021 11:15
CUDA install on Ubuntu 16.04

Overview

This gist provides the steps to take when setting up CUDA on Ubuntu 16.04, along with comments and advice. Before starting, it is critical to both read through this guide, and to research what NVIDIA drivers are required for your GPU. Do not rely on the installer to do this for you, and in the event of a conflict between the driver that you looked up and the driver that is suggested by the installer, always go with what you looked up.
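To look up the right driver you first need to know exactly which GPU is in the machine and whether a driver is already loaded. Here is a minimal sketch of that check; the function names are my own (hypothetical), and it assumes `lspci` from the pciutils package is available.

```shell
#!/usr/bin/env bash
# List NVIDIA devices on the PCI bus. The model name printed here is what
# you look up on NVIDIA's driver download page.
list_nvidia_gpus() {
    # lspci ships in the pciutils package on Ubuntu.
    lspci | grep -i 'nvidia'
}

# Report the NVIDIA kernel driver version that is currently loaded, if any.
current_nvidia_driver() {
    if [ -r /proc/driver/nvidia/version ]; then
        cat /proc/driver/nvidia/version
    else
        echo "no NVIDIA kernel module loaded"
    fi
}
```

Source the file, then run `list_nvidia_gpus` and `current_nvidia_driver`, and write down the results before changing anything.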

Type of installation

There are two main ways to install CUDA on Ubuntu.

Easy Way

  • Install using package manager.
  • The NVIDIA instructions have detailed information for the curious. On the download page, this is the 'deb(network)' option.
  • To install using the package manager:
    • Completely uninstall any NVIDIA drivers and any CUDA installation (see Other steps below).
    • Navigate to the download page, then follow the instructions.

Difficult Way, or the Installation in Epic Advanced Mode.

  • Use this option at your own risk.
  • REALLY IMPORTANT: The Order of steps is critical.

Configuration/Installation Steps:

  1. If any CUDA drivers are installed, uninstall them using the same method that was used to install them
  • If they were installed using a package manager like apt-get, use apt-get purge --auto-remove
  • If they were installed using a run file, use the uninstall script
  • If you are uncertain, or if either/both of the above steps fail, then as a last resort, remove the directories at /usr/local/cuda and /opt/cuda
  2. Uninstall any/all NVIDIA drivers.
  3. Uninstall nouveau drivers. NOTE that after doing so, you will only have a minimal text-only installation. Make all preparations, including setting up 'undo' steps, before doing this step.
  • Guides online offer many different suggestions for completing this step. I'd advise taking a minimalist approach, even though it may be more time-consuming. For example, use the Software Update GUI first; then, if you get caught in the failed login window loop, backtrack and try blacklisting nouveau via the command line instead.
  4. Install the latest NVIDIA drivers
  • Find the correct NVIDIA driver, download it, and install it.
    • You should complete this before disabling nouveau drivers.
  • You can also use the CUDA run installer to install NVIDIA drivers, but YMWV with this approach, so this option should not be your first choice, and you should not use this option if you are using a package manager for installation.
  5. Install the latest CUDA
  • If you are using the run installer:
    • follow the steps/prompts
  • If you are using a package manager:
    • check for the NVIDIA driver that was installed in step 4.
    • create a script or manually change the values as appropriate from nvidia-340 to nvidia-xxx, where xxx is the number of the driver (you DID properly look up what driver you needed and didn't rely on the installer to figure it out, right?)
      • Example: nvidia-352
    • Add CUDA to your path
      • Example: add /usr/local/cuda-7.0/bin: to $PATH and /usr/local/cuda-7.0/lib64: to $LD_LIBRARY_PATH
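Two of the steps above are easy to get subtly wrong, so here is a rough sketch of both: blacklisting nouveau from the command line, and adding CUDA to your path without piling up duplicate entries. The function names and the `CUDA_HOME` variable are my own (hypothetical), and the `/usr/local/cuda-7.0` default just mirrors the example above; substitute your installed version.

```shell
#!/usr/bin/env bash
# Command-line route for nouveau: write a modprobe config that blacklists it.
# The target path is a parameter so you can try it on a scratch file first.
# On a real system the file would live in /etc/modprobe.d/ (for example
# blacklist-nouveau.conf; the file name itself is an assumption).
write_nouveau_blacklist() {
    local target="$1"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$target"
}

# Prepend a directory to a colon-separated path variable only if it is not
# already present, so re-sourcing this file does not duplicate entries.
prepend_path() {
    local var="$1" dir="$2"
    local current="${!var:-}"
    case ":${current}:" in
        *":${dir}:"*) ;;   # already present, nothing to do
        *) export "$var=${dir}${current:+:${current}}" ;;
    esac
}

CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-7.0}"   # assumed install prefix
prepend_path PATH "${CUDA_HOME}/bin"
prepend_path LD_LIBRARY_PATH "${CUDA_HOME}/lib64"
```

To apply the blacklist for real, write the file as root into /etc/modprobe.d/, run `sudo update-initramfs -u`, and only then reboot. The path lines can go in your ~/.bashrc so they survive a new login.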

Other Steps

Install cuDNN

  • download libraries from https://developer.nvidia.com/rdp/cudnn-download
  • Note that a (free) developer registration is required
  • Other Note: Fundamentals of Deep Learning advises installing an archived version, but this is a terrible idea. Install version 5.0 or the most recent version.
  • The installation is fairly straightforward; download, untar, copy the headers to /usr/local/cuda-XX/include and the libraries to /usr/local/cuda-XX/lib64/ where 'XX' is the installed CUDA version
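The copy step described above can be sketched as a small function. This assumes the unpacked archive has an include/ directory holding the cudnn headers and a lib64/ directory holding the libcudnn files; check what your download actually contains. The function name `install_cudnn` is hypothetical.

```shell
#!/usr/bin/env bash
# Copy cuDNN headers and libraries from an unpacked archive into a CUDA tree.
#   $1: directory created by untarring the cuDNN archive
#       (assumed layout: include/cudnn*.h and lib64/libcudnn*)
#   $2: CUDA install prefix, e.g. /usr/local/cuda-XX
install_cudnn() {
    local src="$1" cuda_prefix="$2"
    # -P keeps the libcudnn.so symlinks as symlinks instead of copying the
    # same library several times.
    cp -P "$src"/include/cudnn*.h "$cuda_prefix/include/" &&
    cp -P "$src"/lib64/libcudnn* "$cuda_prefix/lib64/" &&
    # Make the copied files readable by all users.
    chmod a+r "$cuda_prefix"/include/cudnn*.h "$cuda_prefix"/lib64/libcudnn*
}
```

Writing into /usr/local normally needs root, so run this from a root shell or put sudo in front of the cp and chmod commands. Older cuDNN archives unpack into a directory named cuda, in which case the call would look like `install_cudnn ./cuda /usr/local/cuda-XX`.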

Verify installation

  • This step is technically optional, but realistically required.
  • Create a new directory, then run cuda-install-samples-XX.sh
    • XX is the CUDA version, which should auto-complete if you properly added CUDA to your $PATH
    • You may need to rename the created directories
    • If this does not work, make sure that you added the CUDA binaries to your path.
  • Now run make in the top level of the created sample directory.
  • If you did everything correctly, you'll see a lot of output, and maybe some pretty renderings. If you did not, then check the error messages for instructions on how to proceed. If all else fails, start over at Step 1 and try again.
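Before building the samples, it saves time to confirm that the tools are actually on your $PATH. A minimal sketch follows; the function name `check_cuda_tools` is hypothetical, and the 10.1 version number used in the usage line is only an example.

```shell
#!/usr/bin/env bash
# Report whether each tool the verification step relies on is on $PATH.
# Returns non-zero if any of them is missing.
check_cuda_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_cuda_tools nvcc nvidia-smi cuda-install-samples-10.1.sh`. If everything is found, run the samples script with your new directory as its argument, change into the samples directory it creates, and run make there.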

Uninstall CUDA and NVIDIA

  1. If any CUDA drivers are installed, uninstall them using the same method that was used to install them
  • If they were installed using a package manager like apt-get, use apt-get purge --auto-remove
  • If they were installed using a run file, use the uninstall script
  • If you are uncertain, or if either/both of the above steps fail, then as a last resort, remove the directories at /usr/local/cuda and /opt/cuda
  2. Uninstall any/all NVIDIA drivers using a similar approach.
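For the package manager route, the two uninstall steps might look like the sketch below. It defaults to a dry run that only prints the commands, since purging the wrong packages is painful to undo. The `run` helper, the function name, and the 'cuda*' and 'nvidia-*' package patterns are all assumptions on my part; list what is actually installed with `dpkg -l | grep -i -e cuda -e nvidia` and adjust.

```shell
#!/usr/bin/env bash
# Print each command instead of running it unless DRY_RUN=0 is set, so the
# sequence can be reviewed before anything is removed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Package-manager route: purge CUDA first, then the NVIDIA driver packages.
# The package patterns are assumptions; check them against your system.
uninstall_with_apt() {
    run sudo apt-get purge --auto-remove 'cuda*'
    run sudo apt-get purge --auto-remove 'nvidia-*'
}
```

Run `uninstall_with_apt` to see the plan, then `DRY_RUN=0 uninstall_with_apt` to execute it. For run file installations, the uninstall script usually sits in the bin directory of the CUDA install, and its exact name depends on the CUDA version, so look there rather than guessing.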

Help! Something went terribly wrong!

  • I didn't see any errors, but it looks like CUDA wasn't installed, because CUDA can't be found, and the command nvidia-smi throws the error 'unknown command'

    • Add CUDA to your path

    • Example:

        export "PATH=$PATH:/usr/local/cuda-10.1/bin"
        export "LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64"
      
  • When you login and type your password (correctly), you are suddenly taken to a login screen again.

    • This is a login loop from NVIDIA unique to Ubuntu 16.04, and has been frequently documented, with frequently different solutions. Here is what worked for me:
      • Turn off secure boot, then reboot the computer.
      • Reboot the computer (again), then switch to a text-only login by pressing Ctrl-Alt-F2. You can return to the GUI login by pressing Ctrl-Alt-F7.
      • From this point on, do NOT reboot the computer unless the installer does it for you, or until you reach the bullet point below that says 'Otherwise, reboot the computer'.
      • Stop Xserver with the command sudo service lightdm stop
      • Delete all NVIDIA drivers and any CUDA installations
      • Reinstall CUDA using the package manager.
      • If there are any errors, follow one of the options below under 'The computer booted to a black screen'.
      • Otherwise, reboot the computer.
  • The tests from Verify installation failed.

    • If you're feeling lucky, ignore this. Otherwise, start over from the beginning and pick the other installation method.
  • The computer booted to a black screen!

    • You have a few options:
      • Completely erase and reinstall Ubuntu.
      • Try booting into Single User mode from the recovery screen, mount the file system, and then restore the nouveau drivers, similar to what is described here. Then start over.
      • Boot into recovery mode, and then attempt to fix the NVIDIA drivers, using the link in the previous bullet point as a guide.
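For the login loop case above, it helps to have the command sequence written down before you drop to the text console, since you won't have a browser there. This sketch only prints the plan and runs nothing. The function name is hypothetical, the package patterns are assumptions, and the `cuda` package assumes NVIDIA's apt repository is already configured from the package manager installation.

```shell
#!/usr/bin/env bash
# Print the login-loop recovery commands in order, for review from the
# text console (Ctrl-Alt-F2). Nothing is executed here.
login_loop_recovery_plan() {
    cat <<'EOF'
sudo service lightdm stop
sudo apt-get purge --auto-remove 'cuda*' 'nvidia-*'
sudo apt-get update
sudo apt-get install cuda
sudo reboot
EOF
}
```

Run the commands one at a time and skip the final reboot if any of the earlier ones reports an error.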