
Pramod Ramarao dualvtable

Overview

The purpose of this document is to summarize the steps for building and installing precompiled NVIDIA driver modules on RHEL7-based distributions.

Flavors

For RHEL7 and its derivatives, there are three flavors of packages, each with different package dependencies:

  1. latest-dkms - always updates to the highest-versioned driver. These packages are not precompiled against a specific kernel.
  2. latest - always updates to the highest-versioned driver. These packages are precompiled against a specific kernel.
  3. branch-XXX - locks driver updates to the specified driver branch.
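As a rough illustration of how the three flavors map to packages, the sketch below turns a flavor name into a package name. It assumes, without confirmation from this document, that the packages are named `nvidia-driver-` followed by the flavor (for example `nvidia-driver-latest-dkms` or `nvidia-driver-branch-470`); `flavor_to_package` is a hypothetical helper, so check the real package names in the repository first.

```shell
#!/bin/sh
# Sketch: map a driver flavor to a yum package name.
# ASSUMPTION: package names are "nvidia-driver-" + flavor, and "branch-XXX"
# takes a purely numeric branch such as 470. flavor_to_package is hypothetical.
flavor_to_package() {
  flavor=$1
  case $flavor in
    latest-dkms|latest)
      echo "nvidia-driver-$flavor"
      ;;
    branch-*)
      branch=${flavor#branch-}
      # Reject an empty branch or one containing non-digits.
      case $branch in
        ''|*[!0-9]*)
          echo "invalid branch: $branch" >&2
          return 1
          ;;
      esac
      echo "nvidia-driver-branch-$branch"
      ;;
    *)
      echo "unknown flavor: $flavor" >&2
      return 1
      ;;
  esac
}
```

If the assumed naming holds, `sudo yum install -y "$(flavor_to_package branch-470)"` would pin the driver to the 470 branch, while `latest` and `latest-dkms` keep following the newest driver.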
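
#!/bin/sh
set -x
# Download the NVIDIA data center driver runfile.
# curl: -f fails on HTTP errors, -sS is quiet but shows errors, -L follows redirects.
export BASE_URL=https://us.download.nvidia.com/tesla \
&& export DRIVER_VERSION=470.103.01 \
&& curl -fsSL -O "$BASE_URL/$DRIVER_VERSION/NVIDIA-Linux-x86_64-$DRIVER_VERSION.run"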
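The runfile URL in the download script follows a fixed layout of base URL, version, and file name. A minimal sketch that factors this into a helper, so other versions can be fetched the same way, might look like the following; `runfile_url` is hypothetical, and the optional architecture argument is an assumption (the original only uses `x86_64`).

```shell
#!/bin/sh
# Sketch: build the driver runfile URL using the layout from the script above.
# runfile_url is a hypothetical helper; ARCH defaults to x86_64 as in the
# original, and other architectures are an untested assumption.
BASE_URL=https://us.download.nvidia.com/tesla

runfile_url() {
  version=$1
  arch=${2:-x86_64}
  echo "$BASE_URL/$version/NVIDIA-Linux-$arch-$version.run"
}
```

Usage: `curl -fsSL -O "$(runfile_url 470.103.01)"` downloads the same file as the script above, which can then be installed non-interactively with `sudo sh NVIDIA-Linux-x86_64-470.103.01.run --silent`.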
dualvtable / k8s-device-plugin-issue-200.md
Last active March 3, 2021 00:25
Using the GPU ID instead of the UUID in the NVIDIA device plugin

Introduction

This is a short write-up explaining how to verify the fix for Issue 200 reported against the NVIDIA device plugin: NVIDIA/k8s-device-plugin#200

This issue occurs when the NVIDIA device plugin is configured so that only privileged containers can access all GPUs, which prevents unprivileged containers from gaining access to GPUs they did not request. A detailed write-up on this aspect is available here.

Issue #200 is specifically observed on IaaS clouds, where VMs can be stopped and then restarted: any pods that had GPUs assigned can fail because, in a cloud environment, different physical GPUs may be attached to a VM when it restarts. The problem was that the device plugin only supported enumerating GPUs to containers by UUID; UUIDs are unique, but they can change when a VM is restarted. The fix was to add a new option called deviceIDStrategy to the plugin to allow …
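To see why the ID strategy matters after a VM restart, it helps to compare the two identifiers a GPU has. The sketch below assumes the two strategies are `uuid` and `index` (as listed for the plugin's `--device-id-strategy` flag; check the plugin version you deploy) and that the input comes from `nvidia-smi --query-gpu=index,uuid --format=csv,noheader`. `device_ids` is a hypothetical helper that only prints what each strategy would hand to the container runtime; it is not part of the plugin.

```shell
#!/bin/sh
# Sketch: print the device ID each strategy would pass to the container runtime.
# ASSUMPTION: input lines look like "0, GPU-xxxxxxxx", as printed by
#   nvidia-smi --query-gpu=index,uuid --format=csv,noheader
# device_ids is a hypothetical helper, not part of the device plugin.
device_ids() {
  case $1 in
    # UUID: tied to the physical GPU, so it changes if the VM gets a different GPU.
    uuid)  awk -F', *' '{ print $2 }' ;;
    # Index: position of the GPU on the VM, unchanged when a different GPU is attached.
    index) awk -F', *' '{ print $1 }' ;;
    *)     echo "unknown strategy: $1" >&2; return 1 ;;
  esac
}
```

Running `nvidia-smi --query-gpu=index,uuid --format=csv,noheader | device_ids uuid` before and after stopping and starting a VM would show whether the UUIDs changed, while the `index` output stays the same as long as the GPU count does.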

dualvtable / setup_nvidia_drivers_rhel7.sh
Created January 7, 2021 23:00
Setting up drivers and fabric-manager on RHEL7
#!/bin/sh
# Usage: setup_nvidia_drivers_rhel7.sh <username> <password>
USERNAME=$1
PASSWORD=$2
# Register before enabling xtrace so the password is not echoed to the terminal.
sudo subscription-manager register --username "${USERNAME}" --password "${PASSWORD}" --auto-attach || exit 1
set -x
sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm \
&& sudo subscription-manager repos --enable="rhel-*-optional-rpms" --enable="rhel-*-extras-rpms" --enable="rhel-ha-for-rhel-*-server-rpms"
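The description mentions fabric-manager, but the preview ends before that step. Fabric Manager has to match the installed driver version exactly, so a minimal sketch of deriving its package name from a driver version could look like this. It assumes the package is published as `nvidia-fabric-manager-<driver version>`; `fm_package` is a hypothetical helper, and the name should be checked against the repository in use.

```shell
#!/bin/sh
# Sketch: derive the Fabric Manager package spec from a driver version.
# ASSUMPTION: the package is named nvidia-fabric-manager-<version>.
# fm_package is a hypothetical helper. The version is validated first because
# Fabric Manager must match the installed driver version exactly.
fm_package() {
  version=$1
  case $version in
    # Reject empty strings, non-numeric characters, and malformed dot placement.
    ''|*[!0-9.]*|.*|*.|*..*)
      echo "invalid driver version: $version" >&2
      return 1
      ;;
  esac
  echo "nvidia-fabric-manager-$version"
}
```

On a system running driver 470.103.01, this could be used as `sudo yum install -y "$(fm_package 470.103.01)"` followed by `sudo systemctl enable --now nvidia-fabricmanager` to start the service.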
dualvtable / setup_vmi_nvidia.sh
Last active February 22, 2022 23:51
Simple script for setting up NVIDIA software on bare-metal
#!/usr/bin/env bash
# author: com/github/dualvtable
set -eo pipefail
check_root()
{
    if [[ $EUID -ne 0 ]]; then
        echo "This installer must be run as root." >&2
        exit 1
    fi
}
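A bare-metal setup script can check for hardware in the same spirit as `check_root` before installing anything. The sketch below is an assumption, not part of the original script: it counts NVIDIA display or 3D controllers in `lspci -nn` output, relying on the PCI class tags `[0300]`/`[0302]` and the NVIDIA vendor ID `10de`. `count_nvidia_gpus` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: count NVIDIA GPUs in "lspci -nn" output read from stdin.
# ASSUMPTION: GPU lines carry class [0300] (VGA) or [0302] (3D controller)
# and a vendor:device tag [10de:xxxx]. NVIDIA audio or bridge functions are
# not counted because their class codes differ. count_nvidia_gpus is hypothetical.
count_nvidia_gpus() {
  # grep -c prints 0 and exits 1 when nothing matches; keep the exit status 0.
  grep -Ec '\[030[02]\]: .*\[10de:[0-9a-f]{4}\]' || true
}
```

For example, `[ "$(lspci -nn | count_nvidia_gpus)" -gt 0 ] || { echo "No NVIDIA GPU found." >&2; exit 1; }` would stop the installer early on a machine without NVIDIA GPUs.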
dualvtable / setup_nvidia_centos8.sh
Last active October 21, 2020 06:02
Simple script for setting up NVIDIA drivers and Docker on CentOS 8
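#!/usr/bin/env bash
set -euo pipefail
ARCH=$( /bin/arch )
distribution=rhel8
setup_prereq()
{
    # Tools and libraries the driver installation depends on
    sudo dnf install -y tar bzip2 make automake gcc gcc-c++ pciutils elfutils-libelf-devel libglvnd-devel
    # (the remaining prerequisite steps are cut off in this preview)
}
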
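The script sets `distribution` and `ARCH` but the preview does not show how they are used. Assuming the driver is installed from NVIDIA's CUDA repository, those two variables are enough to build the repository URL; `cuda_repo_url` is a hypothetical helper, and the URL layout below is an assumption to verify against the repository.

```shell
#!/bin/sh
# Sketch: build the CUDA repository URL from a distribution and architecture.
# ASSUMPTION: the repo file lives at
#   https://developer.download.nvidia.com/compute/cuda/repos/<distro>/<arch>/cuda-<distro>.repo
# cuda_repo_url is a hypothetical helper, not part of the original script.
cuda_repo_url() {
  distro=$1
  repo_arch=$2
  echo "https://developer.download.nvidia.com/compute/cuda/repos/$distro/$repo_arch/cuda-$distro.repo"
}
```

Within the script this could be used as `sudo dnf config-manager --add-repo "$(cuda_repo_url "$distribution" "$ARCH")"`, which needs the `dnf-plugins-core` package for the `config-manager` subcommand.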
set nocompatible
source $VIMRUNTIME/vimrc_example.vim
set incsearch ic hlsearch
set backspace=eol,indent,start
set hidden
set tabstop=4 expandtab shiftwidth=4 smarttab
set textwidth=75
set autowrite
set wildmenu wildmode=longest,list,list:full