s41m0n/linuxNvidia_guide.md

## linuxNvidia_guide.md

      
Linux - Nvidia switchable setup guide

The aim of this guide is to provide a working strategy to make your dedicated graphics card turn on and off correctly in a Linux environment (with Xorg).
The following scripts were created by tyrells, and this guide is a remake of Graff's.
Required Packages

The following two packages are strictly required:

nvidia
bumblebee (to use optirun)

In addition, this guide also covers the case in which the following packages are installed:

tlp
powertop (mostly used for verification)

Configuration

First of all, if you have tlp installed, you need to tell it not to manage the power consumption of the Nvidia card; otherwise we would not be able to turn the card on and off ourselves. To find the PCI address of your Nvidia graphics card:
➜  ~ lspci | grep NVIDIA
01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050 Ti Mobile] (rev a1)
Let's insert the value 01:00.0 in the blacklist of tlp:
/etc/default/tlp

RUNTIME_PM_BLACKLIST="01:00.0"
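
If you prefer not to copy the address by hand, the lookup above can be sketched as a small shell function. This is only a sketch: the name tlp_blacklist_line is made up for illustration, the function reads lspci-style output from standard input, and it assumes that every line mentioning NVIDIA is a device you want tlp to leave alone.

```shell
#!/bin/sh
# Hypothetical helper (not part of tlp or of the original scripts).
# Reads lspci-style output on stdin and prints a RUNTIME_PM_BLACKLIST line
# listing the PCI address (first field) of every line that mentions NVIDIA.
# Several addresses are separated by spaces.
tlp_blacklist_line() {
    awk '
        /NVIDIA/ { addrs = (addrs == "") ? $1 : (addrs " " $1) }
        END      { printf "RUNTIME_PM_BLACKLIST=\"%s\"\n", addrs }
    '
}
```

On a real system you would run lspci | tlp_blacklist_line and paste the printed line into /etc/default/tlp.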

Once done, we need to modify the bumblebee configuration file to select the nvidia driver and to stop bumblebee from managing the card's power itself (we will switch it with our own scripts):
/etc/bumblebee/bumblebee.conf

...
Driver=nvidia
...

And in the [driver-nvidia] section:
...
PMMethod=none
...

Then, we need to create the following file so that the GPU is allowed to power off at boot. First, retrieve the full device name to insert by typing:
➜  ~ ls /sys/bus/pci/devices | grep 01:00.0        
0000:01:00.0
And now create the file:
/etc/tmpfiles.d/nvidia_pm.conf

w /sys/bus/pci/devices/0000:01:00.0/power/control - - - - auto
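
The two steps above (look up the full device name, then write the rule) can be combined in a short function. This is a sketch under assumptions: nvidia_pm_rule is a hypothetical name, and the second argument exists only so the function can be tried on a scratch directory instead of the real /sys/bus/pci/devices.

```shell
#!/bin/sh
# Hypothetical helper: print the tmpfiles.d rule for every device whose
# sysfs name ends in the short PCI address $1 (e.g. 01:00.0).
# $2 is the directory to search; it defaults to the real sysfs location.
nvidia_pm_rule() {
    short_addr=$1
    devices_dir=${2:-/sys/bus/pci/devices}
    for dev in "$devices_dir"/*:"$short_addr"; do
        # If nothing matched, the pattern is left as-is and does not exist.
        [ -e "$dev" ] || continue
        printf 'w /sys/bus/pci/devices/%s/power/control - - - - auto\n' "${dev##*/}"
    done
}
```

For example, nvidia_pm_rule 01:00.0 prints the line to put in /etc/tmpfiles.d/nvidia_pm.conf, and prints nothing if no such device exists.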

The following two files configure Xorg not to add a GPU automatically when one is detected. You also have to specify the driver of your integrated card (Intel in my case):
/etc/X11/xorg.conf.d/01-noautogpu.conf

Section "ServerFlags"
	Option "AutoAddGPU" "off"
EndSection

/etc/X11/xorg.conf.d/20-intel.conf

Section "Device"
 Identifier  "Intel Graphics"
 Driver      "intel"
EndSection

Blacklist files

Now that the general configuration is in place, let's focus on blacklisting some modules. Some modules must be prevented from loading automatically, since we want to turn the GPU on and off manually. To do so, add this file:
/etc/modprobe.d/blacklist.conf

blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
blacklist nv
blacklist nvidia
blacklist nvidia-drm
blacklist nvidia-modeset
blacklist nvidia-uvm
blacklist ipmi_msghandler
blacklist ipmi_devintf 

Moreover, some modules are loaded automatically together with nvidia and block its unloading. Since we do not want to end up in this situation, we disable them by creating:
/etc/modprobe.d/disable-ipmi.conf

install ipmi_msghandler /usr/bin/false
install ipmi_devintf /usr/bin/false

And the same thing for the nvidia module:
/etc/modprobe.d/disable-nvidia.conf

install nvidia /bin/false
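
modprobe only reads files in /etc/modprobe.d whose names end in .conf, which is why the scripts later in this guide can unlock the module simply by renaming disable-nvidia.conf. A rough way to check whether a module is currently locked is sketched below; module_is_locked is a hypothetical name, and the check only looks for the install ... false lines used in this guide.

```shell
#!/bin/sh
# Hypothetical helper: succeed if some *.conf file in the directory $2
# (default /etc/modprobe.d) redirects loading of module $1 to false,
# i.e. contains a line like "install <module> /bin/false" or
# "install <module> /usr/bin/false".
module_is_locked() {
    module=$1
    conf_dir=${2:-/etc/modprobe.d}
    grep -Eqs "^[[:space:]]*install[[:space:]]+${module}[[:space:]]+(/usr)?/bin/false" \
        "$conf_dir"/*.conf
}
```

For example: module_is_locked nvidia && echo "nvidia cannot be loaded".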

GPU management scripts

The following two scripts switch the GPU on and off when called from a terminal. They are responsible not only for switching the power state of the correct PCI devices, but also for unloading and reloading all the needed modules.
/bin/enableGpu.sh

#!/bin/sh
# allow the nvidia module to be loaded
mv /etc/modprobe.d/disable-nvidia.conf /etc/modprobe.d/disable-nvidia.conf.disable

# remove NVIDIA card (currently in power/control = auto)
echo -n 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove
sleep 1
# change PCIe power control
echo -n on > /sys/bus/pci/devices/0000\:00\:01.0/power/control
sleep 1
# rescan for NVIDIA card (defaults to power/control = on)
echo -n 1 > /sys/bus/pci/rescan

/bin/disableGpu.sh

#!/bin/sh
# unload the nvidia modules
modprobe -r nvidia_drm
modprobe -r nvidia_uvm
modprobe -r nvidia_modeset
modprobe -r nvidia

# change NVIDIA card power control
echo -n auto > /sys/bus/pci/devices/0000\:01\:00.0/power/control
sleep 1
# change PCIe power control
echo -n auto > /sys/bus/pci/devices/0000\:00\:01.0/power/control
sleep 1

# prevent the system from loading the nvidia module
mv /etc/modprobe.d/disable-nvidia.conf.disable /etc/modprobe.d/disable-nvidia.conf
Please note: for the scripts to work correctly, make them executable:
chmod +x /bin/enableGpu.sh /bin/disableGpu.sh

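
To see what the two scripts changed, you can read back the sysfs attributes they write. The function below is a minimal sketch: gpu_power_state is a made-up name, and the optional second argument (the sysfs root) is there only so it can be tried on a scratch directory. It reads power/control, which the scripts set, and power/runtime_status, which the kernel reports.

```shell
#!/bin/sh
# Hypothetical helper: print the runtime power management state of the PCI
# device $1 (full name, e.g. 0000:01:00.0). $2 is the sysfs root, /sys by
# default.
gpu_power_state() {
    addr=$1
    sysfs_root=${2:-/sys}
    power_dir=$sysfs_root/bus/pci/devices/$addr/power
    if [ ! -d "$power_dir" ]; then
        # The device is missing, e.g. removed and not rescanned yet.
        echo "device $addr not present"
        return 1
    fi
    printf 'control=%s status=%s\n' \
        "$(cat "$power_dir/control")" "$(cat "$power_dir/runtime_status")"
}
```

After disableGpu.sh you would expect gpu_power_state 0000:01:00.0 to report control=auto, and after enableGpu.sh control=on.
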
Service to lock GPU on shutdown

The unit we are going to create is a service that locks the GPU on shutdown or restart in case it has not been disabled yet. This is necessary because otherwise, on the next boot, both the nvidia and ipmi modules would be loaded and it would no longer be possible to unload them, even though we have created the blacklist file.
/etc/systemd/system/disable-nvidia-on-shutdown.service

[Unit]
Description=Disables Nvidia GPU on OS shutdown

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/bin/bash -c "mv /etc/modprobe.d/disable-nvidia.conf.disable /etc/modprobe.d/disable-nvidia.conf || true"

[Install]
WantedBy=multi-user.target

Reload systemd daemons and enable the new service:
systemctl daemon-reload 
systemctl enable disable-nvidia-on-shutdown.service
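
If you would rather keep the shutdown action in a script than inline in the unit file, the rename can be sketched as a function. This is an assumption-laden sketch: lock_nvidia is a hypothetical name, it uses the file that enableGpu.sh and disableGpu.sh toggle (disable-nvidia.conf), and the directory argument exists only to make it easy to try outside /etc/modprobe.d.

```shell
#!/bin/sh
# Hypothetical stand-alone version of the lock performed at shutdown.
# $1 is the modprobe configuration directory (default /etc/modprobe.d).
lock_nvidia() {
    conf_dir=${1:-/etc/modprobe.d}
    # Only rename when the GPU was left enabled, i.e. the lock file is
    # still parked under its .disable name.
    if [ -e "$conf_dir/disable-nvidia.conf.disable" ]; then
        mv "$conf_dir/disable-nvidia.conf.disable" "$conf_dir/disable-nvidia.conf"
    fi
    # Always report success so shutdown is never blocked.
    return 0
}
```

Calling it twice is harmless: the second call finds nothing to rename.
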
Final remarks

To make the system work:

Reboot and verify that the nvidia module is not loaded: lsmod | grep nvidia
Verify with powertop, under Device stats, that the Nvidia card shows 0% power usage
Enable your GPU with the script enableGpu.sh
Verify again that the power usage is now 100%
Check that the GPU is available by running nvidia-smi
Try to run any program with optirun, for example: optirun glxspheres64
Disable the Nvidia card with disableGpu.sh
Check the power consumption once again to be sure
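
The first check in the list can be made a little stricter than a plain grep, which would also match other text on a line. The sketch below uses a hypothetical function name, loaded_nvidia_modules, and reads lsmod-style output from standard input. It prints only the modules whose name starts with nvidia.

```shell
#!/bin/sh
# Hypothetical helper: read lsmod-style output on stdin, skip the header
# line, and print the name of every loaded module starting with "nvidia".
loaded_nvidia_modules() {
    awk 'NR > 1 && $1 ~ /^nvidia/ { print $1 }'
}
```

Run it as lsmod | loaded_nvidia_modules: empty output right after a reboot means the lock worked, while after enableGpu.sh and an optirun session you should see nvidia listed.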

IMPORTANT: if you dual boot with a Windows installation that uses the Nvidia card (and of course manages it differently), I noticed that the Nvidia module may be loaded on the next boot of Linux. In this case we cannot unload the nvidia module manually, so we have to undo the action performed by the disable-nvidia-on-shutdown.service service, which is:
mv /etc/modprobe.d/disable-nvidia.conf.disable /etc/modprobe.d/disable-nvidia.conf
So all we have to do is open a terminal and perform the opposite renaming:
sudo mv /etc/modprobe.d/disable-nvidia.conf /etc/modprobe.d/disable-nvidia.conf.disable
Restart the computer and check whether the Nvidia card is actually loaded, then try to disable it with the script; hopefully it will work! I did not dig deeper into this scenario, so any tips are welcome :D