JureCerar/oproda.md

## oproda.md

      
          
    Nvidia GPU installation on an OLD computer running CentOS/RHEL

Prepare your CentOS/RHEL machine by updating your system, installing EPEL, installing kernel headers, and installing CUDA drivers.
Update and Reboot

Update the entire system and reboot to activate the latest kernel.
sudo yum update
sudo reboot
EPEL & DKMS

Install dkms, which is needed to register the NVIDIA kernel module with DKMS:
sudo yum install epel-release
sudo yum install --enablerepo=epel dkms
sudo yum install libglvnd-devel
Install Kernel Headers

Install the kernel headers and development packages that match the running kernel, then reboot to make sure the kernel is up to date:
sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
sudo reboot
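The NVIDIA module can only be built against a kernel-devel tree for the kernel that is actually running. The following is a minimal check, assuming kernel-devel unpacks its tree under /usr/src/kernels/<release> (the CentOS/RHEL default). The function name have_kernel_devel is made up for this sketch.

```shell
#!/bin/sh
# have_kernel_devel [RELEASE] [SRC_ROOT]
# Succeeds if a kernel-devel tree exists for RELEASE (default: running kernel).
# SRC_ROOT defaults to /usr/src/kernels, where kernel-devel unpacks its trees.
have_kernel_devel() {
    hk_release="${1:-$(uname -r)}"
    hk_root="${2:-/usr/src/kernels}"
    [ -d "$hk_root/$hk_release" ]
}

if have_kernel_devel; then
    echo "kernel-devel found for $(uname -r)"
else
    echo "no kernel-devel tree for $(uname -r); update and reboot first"
fi
```

If the check fails right after a system update, the usual cause is that the machine has not yet been rebooted into the newly installed kernel.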
GCC

The gcc compiler is required for development using the CUDA Toolkit. It is not required for running CUDA applications. It is generally installed as part of the Linux installation, and in most cases the version of gcc installed with a supported version of Linux will work correctly.
sudo yum group install "Development Tools"
gcc --version
Disable nouveau

CentOS 7 loads the open-source nouveau kernel driver by default. This driver is incompatible with the NVIDIA driver, so the NVIDIA installation will not work until nouveau is disabled. Edit the grub file as follows:
sudo nano /etc/default/grub
Append modprobe.blacklist=nouveau to the GRUB_CMDLINE_LINUX line, then save and close the file. Rebuild the grub config with the command that matches your firmware (BIOS or UEFI) and restart the system.
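The manual edit can also be scripted. This is only a sketch, assuming GRUB_CMDLINE_LINUX is a single double-quoted line as in the stock CentOS 7 file. The function name add_nouveau_blacklist is hypothetical, and the function leaves a backup next to the file it edits.

```shell
#!/bin/sh
# add_nouveau_blacklist FILE
# Appends modprobe.blacklist=nouveau inside the quoted GRUB_CMDLINE_LINUX value.
# Does nothing if the option is already present; keeps a backup in FILE.bak.
add_nouveau_blacklist() {
    if grep -q 'modprobe\.blacklist=nouveau' "$1"; then
        return 0
    fi
    sed -i.bak \
        's/^\(GRUB_CMDLINE_LINUX="[^"]*\)"/\1 modprobe.blacklist=nouveau"/' "$1"
}

# Intended use on the real system (run as root):
#   add_nouveau_blacklist /etc/default/grub
```

Running the function twice is harmless, because the second call finds the option and returns without touching the file.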

For BIOS systems:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
For UEFI systems (on CentOS the directory is centos rather than redhat):
sudo grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
Reboot the Linux box now:
sudo reboot
NOTE: Once nouveau is disabled, use Ctrl+Alt+F2 to switch to a text-only console.
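Whether the BIOS or the UEFI path applies can be detected from the running system, and after the reboot it is worth confirming that nouveau is really gone. This is a sketch with two hypothetical helpers, grub_cfg_path and nouveau_loaded. It assumes a UEFI boot exposes /sys/firmware/efi and that the EFI vendor directory is redhat on RHEL and centos on CentOS.

```shell
#!/bin/sh
# grub_cfg_path [SYSFS_EFI_DIR] [EFI_VENDOR_DIR]
# Prints the grub.cfg location to pass to grub2-mkconfig -o.
# A system booted through UEFI exposes /sys/firmware/efi; BIOS boots do not.
# EFI_VENDOR_DIR is "redhat" on RHEL and "centos" on CentOS.
grub_cfg_path() {
    if [ -d "${1:-/sys/firmware/efi}" ]; then
        echo "/boot/efi/EFI/${2:-redhat}/grub.cfg"
    else
        echo "/boot/grub2/grub.cfg"
    fi
}

# nouveau_loaded: reads lsmod-style output on stdin and succeeds
# if a module named exactly "nouveau" is listed.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Intended use on the real system:
#   sudo grub2-mkconfig -o "$(grub_cfg_path)"
#   sudo reboot
#   lsmod | nouveau_loaded && echo "nouveau is still loaded"
```

Both helpers take their inputs as arguments or on stdin, so they can be tried on sample data before being used on the machine itself.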
Install Nvidia drivers

SOURCE: https://centos.org/forums/viewtopic.php?t=70510

The recommended way to do this on CentOS/RHEL is to use the ELRepo third party yum repository. They take the nvidia .run binary blob and package it as an rpm for installation with yum and they do that in such a way that it survives most kernel updates. If you use the .run file then you will need to reinstall the driver after every kernel update.
Import the public key and install ELRepo for RHEL-7, SL-7 or CentOS-7:
sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
sudo yum install https://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm
First remove all previous Nvidia drivers:
sudo yum autoremove "nvidia*"
Detect the required Nvidia driver package:
sudo yum install nvidia-detect
nvidia-detect
Install the ELRepo driver package that nvidia-detect reports (for the Tesla C2075 this is kmod-nvidia-390xx):
sudo yum install kmod-nvidia-390xx
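The detect-then-install step can be chained, as long as the detected name is checked before it is handed to yum. This is a sketch under the assumption that nvidia-detect prints the package name as its last line of output. The helper is_kmod_nvidia is hypothetical and only accepts kmod-nvidia or a legacy branch name such as kmod-nvidia-390xx.

```shell
#!/bin/sh
# is_kmod_nvidia NAME
# Succeeds if NAME looks like an ELRepo NVIDIA kmod package:
# "kmod-nvidia" or a legacy branch such as "kmod-nvidia-390xx".
is_kmod_nvidia() {
    case "$1" in
        kmod-nvidia) return 0 ;;
        kmod-nvidia-*xx)
            kn_branch=${1#kmod-nvidia-}
            kn_branch=${kn_branch%xx}
            case "$kn_branch" in
                ''|*[!0-9]*) return 1 ;;   # branch must be digits only
                *) return 0 ;;
            esac ;;
        *) return 1 ;;
    esac
}

# Intended use on the real system (assumes nvidia-detect prints the
# package name as its last line of output):
#   pkg=$(nvidia-detect | tail -n 1)
#   is_kmod_nvidia "$pkg" && sudo yum install "$pkg"
```

If nvidia-detect prints an error message instead of a package name, the check fails and nothing is installed.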
Checkpoint

Run nvidia-smi to verify that your drivers are installed correctly and recognize the GPUs in your environment. Depending on your environment, you should see something like this to verify that your NVIDIA GPUs and drivers are present:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla C2075         Off  | 00000000:01:00.0 Off |                    0 |
| 30%   58C    P0    82W /  N/A |      0MiB /  5301MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
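For a scripted checkpoint, the driver version can be pulled out of the banner line shown above. This is a minimal sketch that assumes the banner keeps the "Driver Version: X.Y" wording. The function name smi_driver_version is made up for illustration.

```shell
#!/bin/sh
# smi_driver_version: reads nvidia-smi output on stdin and prints the
# driver version found in the banner line (nothing if there is no banner).
smi_driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Intended use on the real system:
#   nvidia-smi | smi_driver_version
```

An empty result means nvidia-smi did not print its usual banner, which typically indicates that the kernel module is not loaded.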

Download the Nvidia CUDA Toolkit 8.0 installer from the NVIDIA CUDA Toolkit archive, choosing the following release and options:
CUDA Toolkit 8.0 - Feb 2017

Operating System:  Linux
Architecture :     x86_64
Distribution:      CentOS
Version:           7
Installer Type:    run (local)

To install use:
sudo sh ./cuda_8.0.61_375.26_linux.run
# Follow instructions
sudo reboot
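NOTE: Install only the CUDA toolkit 8.0. Decline the driver bundled with the installer (375.26), since the ELRepo driver is already installed.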
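The runfile can also be driven without the interactive prompts. The sketch below only prints the command rather than running it. It assumes the --silent and --toolkit options described in the CUDA 8.0 Linux installation guide, and the helper name cuda_toolkit_cmd is hypothetical.

```shell
#!/bin/sh
# cuda_toolkit_cmd RUNFILE
# Prints (does not run) a non-interactive install command that selects
# only the toolkit, leaving the already installed driver untouched.
cuda_toolkit_cmd() {
    printf 'sudo sh %s --silent --toolkit\n' "$1"
}

cuda_toolkit_cmd ./cuda_8.0.61_375.26_linux.run
```

Printing the command first makes it easy to review before pasting it into a root shell.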
Post-installation actions

Add the CUDA directories to the PATH and LD_LIBRARY_PATH variables:
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
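If these exports run in every login shell, the same directory gets appended over and over. An unset LD_LIBRARY_PATH also produces a leading ":", an empty entry that the dynamic loader treats as the current directory. The sketch below avoids both problems. The helper name append_path is made up for illustration.

```shell
#!/bin/sh
# append_path LIST DIR
# Prints the colon-separated LIST with DIR appended, unless DIR is already
# in it. An empty LIST yields just DIR, avoiding a stray leading ":".
append_path() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        "::")     printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$1:$2" ;;
    esac
}

PATH=$(append_path "$PATH" /usr/local/cuda/bin)
LD_LIBRARY_PATH=$(append_path "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib64)
export PATH LD_LIBRARY_PATH
```

Placing these lines in a file under /etc/profile.d/ (for example a hypothetical cuda.sh) would apply them to every login shell on CentOS/RHEL.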

Oproda System MOTD

--------------------------------------------------------------------
Welcome to Oproda(tm) system

"The poor man's version of Vitez(tm) workstation..."
                            -- unknown --

--------------------------------------------------------------------
The devtoolset-7 build tools (gcc-7, gfortran-7, etc.) are available:
$ scl enable devtoolset-7 bash
$ gcc --version

An Nvidia GPU accelerator (Tesla C2075) is available for computation. Nvidia CUDA toolkit 8.0 is installed.
To use Nvidia's C/C++ compiler (nvcc-8.0.61), run the following commands:
$ export PATH=$PATH:/usr/local/cuda/bin
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
$ nvcc --version

The latest (still working) drivers are installed for the GPU accelerator: v390.116
Check that the GPU accelerator is working with the command:
$ nvidia-smi

GROMACS 2018.6-dev is installed:
$ gmx --version

LAMMPS (7 Aug 2019) is installed:
$ lmp -h

--------------------------------------------------------------------
If you have questions, contact the administrator:
 Jure Cerar (xxx@yyy.zz)
 Jurij Rescic (xxx@yyy.zz)

--------------------------------------------------------------------

COMMENT

nvcc only works with specific gcc versions (gcc 4.4 in this comment's case). nvcc can be made to use the correct gcc version without any extra compiler parameters by adding symbolic links to the bin directory created by the CUDA installation.
The default CUDA binary directory is /usr/local/cuda/bin; adding a symbolic link to the correct gcc version in this directory is sufficient:
sudo ln -s /usr/bin/gcc-4.4 /usr/local/cuda/bin/gcc
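Before creating such a link, it helps to check whether the host gcc is within the range the installed nvcc accepts. This is a sketch with two hypothetical helpers, gcc_major and gcc_ok_for_nvcc. The limit of major version 5 for CUDA 8.0 is an assumption here and should be confirmed against the toolkit's release notes.

```shell
#!/bin/sh
# gcc_major VERSION: prints the major component of a gcc version string,
# e.g. the output of "gcc -dumpversion".
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# gcc_ok_for_nvcc VERSION MAX_MAJOR
# Succeeds if VERSION's major number does not exceed MAX_MAJOR.
gcc_ok_for_nvcc() {
    [ "$(gcc_major "$1")" -le "$2" ]
}

# Intended use on the real system (MAX_MAJOR=5 is an assumption for
# CUDA 8.0; check the release notes of your toolkit):
#   gcc_ok_for_nvcc "$(gcc -dumpversion)" 5 || echo "host gcc too new for nvcc"
```

As an alternative to the symbolic link, nvcc also accepts the -ccbin option to point it at a specific host compiler for a single invocation.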