- Laptop: MSI GL66 Pulse with Nvidia RTX 3060
- System: 6.5.0-15-generic 22.04.1-Ubuntu x86_64 GNU/Linux
- Previous drivers:
xserver-xorg-video-nouveau
- Cause: External monitor not detected
Ubuntu didn't recognize my external monitor, although it was correctly connected to the laptop with an HDMI cable.
I had installed Nvidia drivers before on this same operating system and laptop, and it never ended well, so I always had to revert. Every time, I had installed the driver provided by Ubuntu. Installing a driver provided directly by the vendor was a different approach, so I went for it.
The installer warned me that the drivers provided by Ubuntu could be more stable, as they were tested by Ubuntu maintainers, and recommended installing those instead. I aborted the installation and installed those. The installation was apparently successful, so I rebooted the system to double-check, but it didn't boot back. It didn't surprise me, as I had had this problem in the past when installing this same driver this same way.
The boot process got stuck on a black screen with a blinking cursor. The first thing I did was check the journal for whatever was preventing the boot from continuing.
sudo journalctl --reverse --since=today --grep=error
I quickly noticed a yellow block saying that gdm had a few fatal errors. Nice, ain't it?
/usr/libexec/gdm-x-session[2117]: (EE) Server terminated with error (1). Closing log file.
/usr/libexec/gdm-x-session[2117]: Fatal server error:
/usr/libexec/gdm-x-session[2117]: (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
/usr/libexec/gdm-x-session[2057]: (EE) Server terminated with error (1). Closing log file.
/usr/libexec/gdm-x-session[2057]: Fatal server error:
/usr/libexec/gdm-x-session[2057]: (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
/usr/libexec/gdm-x-session[1997]: (EE) Server terminated with error (1). Closing log file.
/usr/libexec/gdm-x-session[1997]: Fatal server error:
/usr/libexec/gdm-x-session[1997]: (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
/usr/libexec/gdm-x-session[1937]: (EE) Server terminated with error (1). Closing log file.
/usr/libexec/gdm-x-session[1937]: Fatal server error:
/usr/libexec/gdm-x-session[1937]: (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
/usr/libexec/gdm-x-session[1877]: (EE) Server terminated with error (1). Closing log file.
/usr/libexec/gdm-x-session[1877]: Fatal server error:
/usr/libexec/gdm-x-session[1877]: (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
/usr/libexec/gdm-x-session[1699]: (EE) Server terminated with error (1). Closing log file.
/usr/libexec/gdm-x-session[1699]: Fatal server error:
/usr/libexec/gdm-x-session[1699]: (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
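The same three lines repeat for every GDM restart attempt and differ only in the process ID. A minimal sketch for collapsing such repeats, assuming the journal lines keep the `name[PID]: message` shape shown above (the function name `collapse_repeats` is made up for this example):

```shell
# Collapse journal lines that differ only by PID and count the repeats.
# Reads journal text on stdin; prints "<count> <line without PID>",
# most frequent first.
collapse_repeats() {
    sed -E 's/\[[0-9]+\]:/:/' | sort | uniq -c | sort -rn | sed -E 's/^ +//'
}

# Example usage (needs access to the system journal):
#   sudo journalctl --reverse --since=today --grep=error | collapse_repeats
```

This turns eighteen nearly identical lines into three, which makes it easier to spot a message that is actually different.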
Googling gdm fatal server error took me to an Arch Linux forum that included instructions to enable DRM kernel mode setting (whatever that is). I ran
cat /sys/module/nvidia_drm/parameters/modeset
and verified that it was enabled (Y). Afterwards, it asked for an initramfs update and a reboot for the changes to take effect.
sudo update-initramfs -u
reboot
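The modeset check can be wrapped in a small helper so that a missing file (for example, when the nvidia_drm module is not loaded at all) is reported separately from a disabled setting. This is only a sketch; the function name `modeset_status` and its return codes are my own choices, not part of any Nvidia tool:

```shell
# Report whether NVIDIA DRM kernel mode setting is on, given the path to the
# module parameter file (defaults to the path used above).
modeset_status() {
    local param_file="${1:-/sys/module/nvidia_drm/parameters/modeset}"
    if [ ! -r "$param_file" ]; then
        echo "unavailable"   # module not loaded or parameter not exposed
        return 2
    fi
    case "$(cat "$param_file")" in
        Y|1) echo "enabled" ;;
        *)   echo "disabled"; return 1 ;;
    esac
}

# Example usage:
#   modeset_status
```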
The problem persisted, so I continued searching and found a thread in the Nvidia developer forums with a title that summed up my problem quite well.
There was not only a clear guide about how to revert the installed drivers, but also about how to install them.
- Change your boot mode in grub to not boot into the GUI but console mode only
- Purge all existing NVIDIA drivers
- Run
sudo update-initramfs -u
- Reboot
- Install the new drivers
- Run
sudo update-initramfs -u
- Reboot
- Run
nvidia-smi
to check if the GPU is recognized correctly
- Only then revert the changes in grub to get your GUI back.
Once you have a shell/terminal/console open, start by making sure no Nvidia modules are loaded
sudo lsmod | grep nvidia
If there are still modules loaded, unload them with
sudo modprobe -r
and the name of the module. You might need to do it in a certain order. Then uninstall the Nvidia driver, for example with
sudo apt-get purge "nvidia*"
sudo apt-get autoremove
sudo update-initramfs -u
sudo reboot
Reboot and get back into a shell. Now install the NVIDIA driver you want to install. Make sure to follow the instructions of the installer exactly! If you have secure boot enabled you MUST follow the correct authentication process, otherwise the kernel module will not be loaded.
After the installation you can re-enable the boot to GUI.
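The guide says the modules may need to be unloaded "in a certain order" without spelling it out. The fourth column of lsmod lists the modules that depend on each module, and those dependents have to go first. A small sketch that prints only that information for the Nvidia modules (the function name is hypothetical, and it reads lsmod-style text from stdin so it can be tried on saved output):

```shell
# For each loaded module whose name starts with "nvidia", print the modules
# that depend on it ("-" if none). Dependents must be unloaded first.
nvidia_module_dependents() {
    awk '$1 ~ /^nvidia/ { printf "%s: %s\n", $1, ($4 == "" ? "-" : $4) }'
}

# Example usage:
#   lsmod | nvidia_module_dependents
```

A line such as `nvidia_modeset: nvidia_drm` would mean that nvidia_drm has to be removed before nvidia_modeset.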
My laptop booted up again and took me to the login screen. Soon after that, I noticed that the operating system ran quite slowly, and that Bluetooth and Wi-Fi were gone. At this point, not only could I not connect an external monitor, but I also could not use essential features that had been working before. I had also had this problem before, caused by the installation of the Nvidia drivers, so I knew exactly what to do: reboot with a previous kernel version.
For some reason, the latest kernel image (as of the time of this post, 6.5.0-17) is automatically set as the default at boot. I had to select a previous kernel version in GRUB. 6.5.0-14 worked fine, so I kept that one. To prevent this from happening again, I looked for instructions on how to disable "automatic kernel updates", which returned useful results.
Open Terminal:
Open a terminal on your Ubuntu system. You can use the keyboard shortcut Ctrl+Alt+T to open the terminal.
Hold the Current Kernel Package:
Run the following command to hold the current kernel package to prevent it from being automatically updated:
sudo apt-mark hold linux-image-generic linux-headers-generic
Check Held Packages:
You can verify that the packages are held by running:
dpkg --get-selections | grep hold
You should see output similar to:
linux-headers-generic hold
linux-image-generic hold
Reverting the Hold:
If you want to revert this and allow automatic updates for the kernel again, you can un-hold the packages using:
sudo apt-mark unhold linux-image-generic linux-headers-generic
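Grepping for "hold" shows every held package, so it is easy to overlook that one of the two is missing. A small sketch that checks specific package names against `dpkg --get-selections` output instead (the function name `missing_holds` is made up for this example):

```shell
# Reads `dpkg --get-selections` text on stdin and prints every package given
# as an argument that is NOT marked "hold". Returns 0 only if all are held.
missing_holds() {
    local selections pkg status=0
    selections="$(cat)"
    for pkg in "$@"; do
        if ! printf '%s\n' "$selections" |
            awk -v p="$pkg" '$1 == p && $2 == "hold" { found = 1 } END { exit !found }'
        then
            echo "$pkg"
            status=1
        fi
    done
    return "$status"
}

# Example usage:
#   dpkg --get-selections | missing_holds linux-image-generic linux-headers-generic
```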
The next step in the guide was to install the drivers with the GUI disabled. It didn't explain how to do that, arguing that there were loads of tutorials on the net, so I found one and followed it.
sudo systemctl disable gdm
reboot
The next rock in the path was the installer failing miserably, which took me to another thread in the Nvidia developer forums.
sudo sh NVIDIA-Linux-x86_64-535.154.05.run
...
cc: error: unrecognized command-line option ‘-ftrivial-auto-var-init=zero’
I followed the thread, applied the instructions to use GCC-12 by default and tried again.
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 11
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 12
sudo update-alternatives --config gcc
sudo sh NVIDIA-Linux-x86_64-535.154.05.run
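The `-ftrivial-auto-var-init=zero` option was introduced in GCC 12, which is why an older default compiler rejects it. A minimal sketch for checking the default compiler before launching the installer; the helper names are hypothetical and the version string is passed in as an argument so the check can be tried without touching the system:

```shell
# Extract the major version from `gcc -dumpversion`-style output
# (for example "11" or "12.3.0").
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# Succeeds if the given compiler version is at least the given major release
# (12 for -ftrivial-auto-var-init).
gcc_at_least() {
    [ "$(gcc_major "$1")" -ge "$2" ]
}

# Example usage on a real system:
#   gcc_at_least "$(gcc -dumpversion)" 12 || echo "default gcc is too old"
```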
This time the installation completed successfully. I kept all the default values offered by the installer and declined its offer to automatically configure the window manager (I could always do that later if the installation worked).
After reboot, I checked the system worked as expected, including wireless technologies. My GPU was also detected and Nvidia drivers were in use.
reboot
nvidia-smi
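Instead of reading the full nvidia-smi table, the driver version can be queried in CSV form and compared with the one that was just installed. A sketch under the assumption that the CSV output has one `name, driver_version` line per GPU (the function name `driver_is` is my own):

```shell
# Reads "name, driver_version" CSV lines on stdin (one per GPU) and succeeds
# only if at least one GPU is listed and all report the expected version.
driver_is() {
    awk -F', ' -v want="$1" '
        { seen = 1; if ($2 != want) bad = 1 }
        END { exit (seen && !bad) ? 0 : 1 }
    '
}

# Example usage on a machine with the driver loaded:
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | driver_is 535.154.05
```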
It was time for clean-up, which meant enabling GDM again. This turned into another puzzle, as surprisingly systemctl wouldn't enable it.
sudo systemctl enable gdm
sudo systemctl -f enable gdm
sudo systemctl -f enable gdm.service
None of these worked.
I found an AskUbuntu thread about this, although it was for KDM. Interestingly, the fix was not in the Answers section but in a comment on the question, which pointed to an edit by the original author describing what had worked for him. It did not work completely in my case. To sum up, the service had to be started first and only then reconfigured.
sudo systemctl start gdm3
sudo dpkg-reconfigure gdm3
sudo systemctl status gdm
nvidia-smi
Nvidia's drivers were in use and GDM was back on. As a last touch, I wanted to make sure that the kernel 6.5.0-17 could never come back.
Once again, Google and developer forums had the solution. I edited GRUB's config file to add these 2 lines, applied the changes and rebooted.
GRUB_SAVEDEFAULT=true
GRUB_DEFAULT=saved
sudo update-grub
reboot
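With these two options, GRUB stores the entry selected at the boot menu and uses it as the default on the next boot. A small sketch to confirm that both lines actually made it into the config file before running update-grub; it assumes the file is /etc/default/grub, and the function name is made up:

```shell
# Succeeds if the GRUB defaults file sets both options used above
# (optionally quoted, nothing else on the line).
grub_remembers_choice() {
    local cfg="${1:-/etc/default/grub}"
    grep -Eq '^GRUB_DEFAULT="?saved"?[[:space:]]*$' "$cfg" &&
        grep -Eq '^GRUB_SAVEDEFAULT="?true"?[[:space:]]*$' "$cfg"
}

# Example usage:
#   grub_remembers_choice && echo "GRUB will remember the selected entry"
```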
Finally, I listed the installed kernel images and removed those I didn't want.
dpkg --list | grep linux-image | grep ii
sudo apt-get remove linux-image-6.5.0-17-generic
sudo rm -rf /lib/modules/6.5.0-17-generic
reboot
nvidia-smi
Wed Feb 7 23:07:08 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 ... Off | 00000000:01:00.0 On | N/A |
| N/A 40C P8 11W / 80W | 59MiB / 6144MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 2595 G /usr/lib/xorg/Xorg 55MiB |
+---------------------------------------------------------------------------------------+
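For the kernel clean-up step, it helps to see which installed images are not the one currently running before removing anything. A hedged sketch that filters `dpkg --list` output; the function name is hypothetical, and the result is only a list of candidates to review, not a list that is safe to remove blindly:

```shell
# Reads `dpkg --list` text on stdin and prints installed (status "ii")
# versioned kernel image packages other than the one for the given kernel
# release (as printed by `uname -r`).
other_kernel_images() {
    awk -v running="linux-image-$1" '
        $1 == "ii" && $2 ~ /^linux-image-[0-9]/ && $2 != running { print $2 }
    '
}

# Example usage:
#   dpkg --list | other_kernel_images "$(uname -r)"
```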
And this is how I managed to install the nvidia-535 driver in Ubuntu 22.04 after 2 hours.