Set up the Nvidia GeForce GT 710 on Raspberry Pi Compute Module 4
#!/bin/bash
# Attempt to set up the Nvidia GeForce GT 710 on a Pi CM4.
#
# I have tried both armv7l and aarch64 versions of the proprietary driver, in
# addition to the nouveau open source driver (which needs to be compiled into
# a custom Raspberry Pi kernel).
#
# tl;dr - None of the drivers worked :P
# First, expand the BAR space, following the directions in this gist:
# https://gist.github.com/geerlingguy/9d78ea34cab8e18d71ee5954417429df
#####
# Option A - Proprietary Driver
#####
# Install kernel-headers so kernel module can be built.
sudo apt-get update
sudo apt upgrade -y # if necessary
sudo reboot # if necessary
sudo apt install -y raspberrypi-kernel-headers
# Download driver from Nvidia's website.
# 32-bit: https://www.nvidia.com/en-us/drivers/unix/linux-arm-display-archive/
# wget https://us.download.nvidia.com/XFree86/Linux-x86-ARM/390.138/NVIDIA-Linux-armv7l-gnueabihf-390.138.run
# 64-bit: https://www.nvidia.com/en-us/drivers/unix/linux-aarch64-archive/
wget https://us.download.nvidia.com/XFree86/aarch64/455.28/NVIDIA-Linux-aarch64-455.28.run
# TODO: Any way to get the latest version and get the download URL in a script? Manual download is sooo annoying.
# (If running) stop X server.
sudo systemctl stop lightdm
# Run the driver .run file we just downloaded.
chmod +x ./NVIDIA-Linux-aarch64-455.28.run
sudo ./NVIDIA-Linux-aarch64-455.28.run
# For 32-bit: sudo ./NVIDIA-Linux-armv7l-gnueabihf-390.138.run --kernel-source-path /usr/src/linux-headers-5.4.51-v7l+
# Reboot and (sadly) see the card fail to initialize.
sudo reboot
#####
# Option B - compile nouveau module into custom Pi Kernel
#####
# Install dependencies
sudo apt install -y git bc bison flex libssl-dev make
# Clone source
git clone --depth=1 https://github.com/raspberrypi/linux
# Apply default configuration
cd linux
export KERNEL=kernel7l # use kernel8 for 64-bit, or kernel7l for 32-bit
make bcm2711_defconfig
# Customize the .config further with menuconfig
sudo apt install -y libncurses5-dev
make menuconfig
# (search for /nouveau, enable in the proper section, save, then exit)
nano .config
# (edit CONFIG_LOCALVERSION and add a suffix that helps you identify your build)
# Build the kernel and copy everything into place (32-bit paths shown; 64-bit builds use arch/arm64/boot/, with the .dtb files under dts/broadcom/)
make -j4 zImage modules dtbs # 'Image' on 64-bit
sudo make modules_install
sudo cp arch/arm/boot/dts/*.dtb /boot/
sudo cp arch/arm/boot/dts/overlays/*.dtb* /boot/overlays/
sudo cp arch/arm/boot/dts/overlays/README /boot/overlays/
sudo cp arch/arm/boot/zImage /boot/$KERNEL.img
# Reboot, but it locks up if you have the card in :(
sudo reboot
Author

geerlingguy commented Oct 22, 2020

@borancar - In this case, I'm getting no output whatsoever on the HDMI port from the Zotac card, nor on the VGA :(

It would be interesting to see if that works on the Pi (I don't have time currently to watch that whole video, but is there some more information or a blog post I could look at separately?).

Author

geerlingguy commented Oct 22, 2020

After seeing a few notes here and there about PCIe ASPM being enabled on the Pi and causing issues, I added pcie_aspm=off to /boot/cmdline.txt to see if that would make any difference... (I confirmed the setting was picked up in the kernel command line, and sudo lspci -vv | grep ASPM showed it was disabled on the GPU).
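
A minimal sketch of making that edit from a script, assuming cmdline.txt is the usual single line of space-separated parameters. The helper name add_cmdline_param is hypothetical; it appends the parameter only when it is not already present, so it is safe to run more than once:

```shell
# Append a kernel parameter to a single-line cmdline.txt, only if it is missing.
# add_cmdline_param is a hypothetical helper name, not part of Pi OS.
add_cmdline_param() {
    file="$1"
    param="$2"
    # Split the line into one parameter per line and look for an exact match.
    if tr ' ' '\n' < "$file" | grep -qxF -- "$param"; then
        return 0
    fi
    # Read the existing line first, then rewrite the file with the new parameter.
    line=$(head -n 1 "$file")
    printf '%s %s\n' "$line" "$param" > "$file"
}
```

Run it as root against /boot/cmdline.txt (after backing the file up), then reboot and check /proc/cmdline to confirm the parameter was picked up.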


borancar commented Oct 22, 2020

tl;dr If my suspicion is right, there is some code on that GPU written in x86 assembly that does some initial configuration, and that's the GPU's BIOS. Since the RPi can't execute x86 assembly, it can't init the card fully, so you need QEMU to execute that code and init the card.

I managed to track down the slides on SlideShare - https://www.slideshare.net/linaroorg/hkg18505-qemu-in-uefi. Specifically, one of the examples from the presentation was a GPU on an ARM server. https://github.com/ardbiesheuvel/X86EmulatorPkg is the GitHub repo referenced there, but that deals with the full integration into ARM UEFI, so you might need more modifications, or, alternatively, there is this to marry U-Boot, UEFI and grub - https://web.archive.org/web/20180404183425if_/http://schd.ws/hosted_files/openiotelcna2017/c4/Marrying%20U-Boot%2C%20UEFI%20and%20grub.pdf.
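
One way to probe that suspicion is to look at the header of the card's option ROM. The sketch below assumes you already have a ROM dump in a file; it reads the standard PCI expansion ROM header (0x55 0xAA signature, pointer to the "PCIR" structure at offset 0x18, code type at byte 0x14 of that structure) and reports whether the first image is x86 or EFI code. The function name rom_code_type is made up for this example, and it only inspects the first image, while a ROM may carry several:

```shell
# Print the code type of the first image in a PCI option ROM dump.
# rom_code_type is a hypothetical helper name.
rom_code_type() {
    rom="$1"
    # A valid option ROM image starts with the bytes 0x55 0xAA.
    sig=$(od -An -tx1 -N2 "$rom" | tr -d ' \n')
    if [ "$sig" != "55aa" ]; then
        echo "not an option ROM"
        return 1
    fi
    # Offset 0x18 holds a little-endian 16-bit pointer to the "PCIR" structure.
    set -- $(od -An -tu1 -j24 -N2 "$rom")
    pcir=$(( $1 + $2 * 256 ))
    # Byte 0x14 of that structure is the code type (0 = x86, 3 = EFI).
    code=$(od -An -tu1 -j$(( pcir + 20 )) -N1 "$rom" | tr -d ' \n')
    case "$code" in
        0) echo "x86" ;;
        3) echo "efi" ;;
        *) echo "other ($code)" ;;
    esac
}
```

On Linux the ROM can usually be dumped as root by writing 1 to /sys/bus/pci/devices/0000:01:00.0/rom and then reading that file (the address is the one shown in the logs in this thread); whether that read succeeds on the CM4 is untested here.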

Author

geerlingguy commented Oct 22, 2020

Ah... when the X server tries starting, it runs into an error:

[    11.959] (II) LoadModule: "nvidia"
[    11.959] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
[    11.960] (II) Module nvidia: vendor="NVIDIA Corporation"
[    11.960] 	compiled for 1.6.99.901, module version = 1.0.0
[    11.960] 	Module class: X.Org Video Driver
[    11.960] (II) NVIDIA dlloader X Driver  455.28  Wed Sep 30 00:57:48 UTC 2020
[    11.960] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[    11.961] (II) Loading sub module "fb"
[    11.961] (II) LoadModule: "fb"
[    11.961] (II) Loading /usr/lib/xorg/modules/libfb.so
[    11.961] (II) Module fb: vendor="X.Org Foundation"
[    11.961] 	compiled for 1.20.4, module version = 1.0.0
[    11.961] 	ABI class: X.Org ANSI C Emulation, version 0.4
[    11.961] (II) Loading sub module "wfb"
[    11.961] (II) LoadModule: "wfb"
[    11.961] (II) Loading /usr/lib/xorg/modules/libwfb.so
[    11.962] (II) Module wfb: vendor="X.Org Foundation"
[    11.962] 	compiled for 1.20.4, module version = 1.0.0
[    11.962] 	ABI class: X.Org ANSI C Emulation, version 0.4
[    11.962] (II) Loading sub module "ramdac"
[    11.962] (II) LoadModule: "ramdac"
[    11.962] (II) Module "ramdac" already built-in
[    11.963] (II) NVIDIA(0): nvCommonPlatformProbe: Device is NULL
[    11.963] (II) NVIDIA(0): nvCommonPlatformProbe: Device is NULL
[    11.963] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
[    11.963] (==) NVIDIA(0): RGB weight 888
[    11.963] (==) NVIDIA(0): Default visual is TrueColor
[    11.963] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[    11.963] (**) NVIDIA(0): Enabling 2D acceleration
[    11.963] (II) Loading sub module "glxserver_nvidia"
[    11.963] (II) LoadModule: "glxserver_nvidia"
[    11.963] (II) Loading /usr/lib/xorg/modules/extensions/libglxserver_nvidia.so
[    11.974] (II) Module glxserver_nvidia: vendor="NVIDIA Corporation"
[    11.974] 	compiled for 1.6.99.901, module version = 1.0.0
[    11.974] 	Module class: X.Org Server Extension
[    11.974] (II) NVIDIA GLX Module  455.28  Wed Sep 30 01:00:55 UTC 2020
[    11.974] (II) NVIDIA: The X server does not support PRIME Render Offload.
[    12.139] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA GPU at PCI:1:0:0.  Please
[    12.139] (EE) NVIDIA(GPU-0):     check your system's kernel log for additional error
[    12.139] (EE) NVIDIA(GPU-0):     messages and refer to Chapter 8: Common Problems in the
[    12.139] (EE) NVIDIA(GPU-0):     README for additional information.
[    12.139] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!
[    12.139] (EE) NVIDIA(0): Failing initialization of X screen
[    12.139] (II) UnloadModule: "nvidia"
[    12.140] (II) UnloadSubModule: "glxserver_nvidia"
[    12.140] (II) Unloading glxserver_nvidia
[    12.140] (II) UnloadSubModule: "wfb"
[    12.140] (II) UnloadSubModule: "fb"
[    12.140] (EE) Screen(s) found, but none have a usable configuration.
[    12.140] (EE) 
Fatal server error:
[    12.140] (EE) no screens found(EE) 
[    12.140] (EE) 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help. 
[    12.140] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[    12.140] (EE) 
[    12.190] (EE) Server terminated with error (1). Closing log file.

Author

geerlingguy commented Oct 22, 2020

Output from running sudo nvidia-bug-report.sh: https://gist.github.com/geerlingguy/895c56d2a2e3788ff932b9959c128b5c


elFarto commented Oct 22, 2020

Looks like that error has been reported before[1], on an aarch64 system too. So it might be an actual bug in the driver.

[1] https://forums.developer.nvidia.com/t/gtx-1080-drivers-fail-to-load-with-nvrm-gpu-000400-0-rminitadapter-failed-0x251211/156902/1

Author

geerlingguy commented Oct 22, 2020

@elFarto (lol nice name) - I posted in that issue on the Nvidia forums... we'll see if it gets anywhere. At this point, I'm kinda tempted to go to Micro Center and try an inexpensive AMD GPU and see if the experience is better (everyone online is saying their drivers included in the kernel should work so much better... but I wonder if they have good ARM support?).

Author

geerlingguy commented Oct 22, 2020

Testing CUDA support (install took a while, and the .run file is like 2.5 GB!):

$ wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda_11.1.0_455.23.05_linux_sbsa.run
$ chmod +x cuda_11.1.0_455.23.05_linux_sbsa.run 
$ sudo ./cuda_11.1.0_455.23.05_linux_sbsa.run 
===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-11.1/
Samples:  Installed in /home/pi/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-11.1/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-11.1/lib64, or, add /usr/local/cuda-11.1/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.1/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log

$ export PATH=$PATH:/usr/local/cuda-11.1/bin
$ export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64

Author

geerlingguy commented Oct 22, 2020

And running some samples:

$ cd ~/NVIDIA_CUDA-11.1_Samples/1_Utilities/deviceQuery
$ make
$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 100
-> no CUDA-capable device is detected
Result = FAIL

Dangit. At the same time, over in dmesg:

[ 9195.790222] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x54:1211)
[ 9195.790310] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

But I'm wondering if @borancar's idea might help, if it truly is an initialization bug with the board. It would be nice, though, if the AARCH64 driver would work without having some sort of extra emulation layer on top. I wonder what ARM devices they use for testing at Nvidia HQ?
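
For comparing notes with other reports of the same failure, a small filter like the following (a sketch; the function name nvrm_init_failures is invented here) pulls the PCI address and error code out of kernel log lines in the format shown above:

```shell
# Read kernel log text on stdin and print "<PCI address> <error code>"
# for each "RmInitAdapter failed!" line. Hypothetical helper name.
nvrm_init_failures() {
    sed -n 's/.*NVRM: GPU \([0-9a-fA-F:.]*\): RmInitAdapter failed! (\(.*\)).*/\1 \2/p'
}
```

Typical use would be `dmesg | nvrm_init_failures` (dmesg may need sudo on some systems).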

Author

geerlingguy commented Oct 22, 2020

Well there's also this:

According to the internet, there seem to have been multiple GPU models sold under that name: one had compute capability 2.x and the other had compute capability 3.0. Neither are supported by CUDA 11 which requires compute capability >= 3.5.

So apparently even without that failure, the GT 710 may not work with CUDA 11, le sigh. It isn't included in the current version of the CUDA GPU list either. But it does have '192 CUDA cores' and some databases say it's running 3.5, so ¯\_(ツ)_/¯
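
The version comparison in that quote can be written down as a tiny check. This is only a sketch of the arithmetic: the 3.5 minimum is taken from the quoted text, and the function name cc_at_least is hypothetical:

```shell
# Succeed if compute capability $1 (e.g. 3.0) is at least $2 (default 3.5,
# the CUDA 11 minimum according to the quote above). Hypothetical helper.
cc_at_least() {
    have="$1"
    need="${2:-3.5}"
    have_major=${have%%.*}; have_minor=${have#*.}
    need_major=${need%%.*}; need_minor=${need#*.}
    # A higher major version is always enough.
    [ "$have_major" -gt "$need_major" ] && return 0
    # Same major version: compare the minor part.
    [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]
}
```

So `cc_at_least 3.0` fails and `cc_at_least 3.5` succeeds, which is why it matters which variant of the GT 710 is actually on the card.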

For now, I'm going to switch gears and test out a cheap (but x16, so not drop-in compatible) Radeon 5450! Separate thread for that: geerlingguy/raspberry-pi-pcie-devices#4

Author

geerlingguy commented Oct 23, 2020

Since I was closer to success with the Zotac card than I got with the Radeon (see linked thread above), I'm going to try it again, this time with the https://nouveau.freedesktop.org driver.

Author

geerlingguy commented Oct 23, 2020

Hmm... looking at the install instructions it seems like everyone assumes the nouveau driver is already present on Debian—but I don't see it loaded on Raspberry Pi (lsmod doesn't list it, and I don't see it available in the output of find /lib/modules/$(uname -r) -type f -name '*.ko').
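
The same check can be wrapped in a reusable function. A sketch, with module_available as a made-up name; it also matches compressed module files, which some distributions ship instead of plain .ko:

```shell
# Succeed if a kernel module named $1 exists under module tree $2
# (defaults to the running kernel's tree). Hypothetical helper name.
module_available() {
    name="$1"
    moddir="${2:-/lib/modules/$(uname -r)}"
    # Modules may be stored uncompressed (.ko) or compressed (.ko.xz, .ko.gz, ...).
    [ -n "$(find "$moddir" -type f -name "${name}.ko*" 2>/dev/null | head -n 1)" ]
}
```

For example, `module_available nouveau || echo "nouveau is not shipped with this kernel"`. A driver built directly into the kernel would not show up as a .ko file; those are listed in the modules.builtin file in the same directory.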

Author

geerlingguy commented Oct 23, 2020

I was going to look into rebuilding the kernel but now I'm hitting Could not connect to raspbian.raspberrypi.org:80 (93.93.128.193) and I think that's a sign it's time to stop... for at least a few minutes ;)

Author

geerlingguy commented Oct 23, 2020

I'm going to try a few of the older 32-bit versions... maybe the compilation bug was introduced recently. Trying the following:

  • 390.77: lots more errors
  • 390.48: lots more errors

Seeing errors like:

error: redefinition of ‘list_is_first’
error: "NV_BUILD_MODULE_INSTANCES" is not defined, evaluates to 0 [-Werror=undef]

I opened a new issue in the Nvidia forums here: Can’t install ARM (32-bit) driver on Debian 10 / Raspberry Pi OS

Author

geerlingguy commented Oct 23, 2020

At this point, I'm going to try to build a custom kernel with the Nouveau driver compiled in. For some silly reason, the Pi group decided they didn't need that in the default distribution... why would anyone want to try using an external GPU with the Pi!?

Anyways, sarcasm aside, I'm running:

# Install dependencies
sudo apt install -y git bc bison flex libssl-dev make libncurses5-dev

# Clone source
git clone --depth=1 https://github.com/raspberrypi/linux

# Apply default configuration
cd linux
export KERNEL=kernel7l # use kernel8 for 64-bit, or kernel7l for 32-bit
make bcm2711_defconfig

# Customize the .config further with menuconfig
make menuconfig
# (search for /nouveau, enable in the proper section, save, then exit)
nano .config
# (edit CONFIG_LOCALVERSION and add a suffix that helps you identify your build)

# Build the kernel and copy everything into place (32-bit paths shown; 64-bit builds use arch/arm64/boot/, with the .dtb files under dts/broadcom/)
make -j4 zImage modules dtbs # 'Image' on 64-bit
sudo make modules_install
sudo cp arch/arm/boot/dts/*.dtb /boot/
sudo cp arch/arm/boot/dts/overlays/*.dtb* /boot/overlays/
sudo cp arch/arm/boot/dts/overlays/README /boot/overlays/
sudo cp arch/arm/boot/zImage /boot/$KERNEL.img
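
The manual nano step for CONFIG_LOCALVERSION can also be scripted, which helps when rebuilding repeatedly. A sketch, assuming the .config already has a quoted CONFIG_LOCALVERSION line (as bcm2711_defconfig produces) and that the suffix contains no characters special to sed such as / or &. The name append_localversion is hypothetical:

```shell
# Append a suffix to CONFIG_LOCALVERSION in a kernel .config file.
# append_localversion is a hypothetical helper name.
append_localversion() {
    config="$1"
    suffix="$2"
    tmp=$(mktemp)
    # Insert the suffix just before the closing quote of the existing value.
    sed "s/^\(CONFIG_LOCALVERSION=\"[^\"]*\)\"/\1${suffix}\"/" "$config" > "$tmp"
    # Write back in place so the file keeps its original permissions.
    cat "$tmp" > "$config"
    rm -f "$tmp"
}
```

For example, `append_localversion .config -nouveau`. The kernel tree also ships a scripts/config helper, so something like `scripts/config --module DRM_NOUVEAU` followed by `make olddefconfig` should be able to replace the menuconfig step; that combination is not tested here.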

Author

geerlingguy commented Oct 23, 2020

One other thing I'm going to try (after two unsuccessful kernel builds, ha!) is Ubuntu 20.04... maybe it has Nouveau installed by default.

Some notes:

  • Ubuntu's /boot/firmware doesn't have a CM4 dtb, just bcm2711-rpi-4-b.dtb.
  • Ubuntu does an unattended upgrade on first boot, and that takes a looooong time.
  • Aaaand looks like it doesn't have nouveau or nvidia driver in 5.4.0-1015-raspi kernel modules either. Drat.


elFarto commented Oct 23, 2020

Just a note, the nvidia and nouveau drivers don't get along well together. If you're using the same install, you'll need to remove the modprobe blacklist the nvidia drivers install to stop the nouveau driver snatching up the hardware. You'll likely want to blacklist the nvidia ones when using the nouveau ones, or use a separate install.
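
To see which blacklist entries are currently in effect, a search over the modprobe configuration directory is usually enough. A sketch with a hypothetical function name, find_blacklist; it only looks at plain "blacklist <module>" lines:

```shell
# List modprobe config lines that blacklist module $1 under directory $2
# (defaults to /etc/modprobe.d). Hypothetical helper name.
find_blacklist() {
    module="$1"
    dir="${2:-/etc/modprobe.d}"
    # Print matches as "file:line"; exit status is non-zero when none are found.
    grep -rHE "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*$" "$dir" 2>/dev/null
}
```

Running `find_blacklist nouveau` shows which file to remove or edit before trying the nouveau driver. modprobe also reads /lib/modprobe.d, /usr/lib/modprobe.d and /run/modprobe.d, so pass those as the second argument if nothing turns up in /etc/modprobe.d.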

Author

geerlingguy commented Oct 23, 2020

@elFarto - I've read that elsewhere; don't worry, I am currently bouncing between four microSD cards, one for nouveau, one for radeon/amdgpu (just found out in geerlingguy/raspberry-pi-pcie-devices#4 that my Radeon is so old it needs the radeon driver), and one with the nvidia driver installed. I've also re-flashed Pi OS to these things probably 30 times this week.

Author

geerlingguy commented Oct 23, 2020

I'm going to try one more time, on a fresh new OS install, to compile the kernel with nouveau on 32-bit Pi OS, and see if I can get it to boot. After that, I think I have to give up. Over in the issue linked in the comment above, I found the Radeon driver definitively looks for the IO BAR for BIOS support to initialize the card, and without it, it fatals and doesn't initialize the card :(

Author

geerlingguy commented Oct 24, 2020

Didn't work. Same thing as last time: if I rebuild the kernel with the nouveau driver, the PCI bus just goes to 'link down' on boot, and then if I try booting with the Zotac card in the slot, it locks up after the first few seconds of boot.

So... going to go out on a limb and say at least for 32-bit Pi OS, the nouveau driver is a bust :(

Author

geerlingguy commented Oct 24, 2020

I moved the instructions for increasing the BAR space out to its own gist, since it's also necessary for other GPUs, and even the Marvell SATA adapter I'm testing now: Increase the BAR memory address space for PCIe devices on the Raspberry Pi Compute Module 4.

Author

geerlingguy commented Oct 26, 2020

In a strange turn of events, today I tried doing this again on Pi OS 64-bit beta and am running into an error from the Nvidia AARCH64 installer:

  LD [M]  /tmp/selfgz618/NVIDIA-Linux-aarch64-455.28/kernel/nvidia-drm.o
  Building modules, stage 2.
  MODPOST 4 modules
ERROR: "__stack_chk_guard" [/tmp/selfgz618/NVIDIA-Linux-aarch64-455.28/kernel/nvidia-drm.ko] undefined!
ERROR: "__stack_chk_guard" [/tmp/selfgz618/NVIDIA-Linux-aarch64-455.28/kernel/nvidia-modeset.ko] undefined!
ERROR: "__stack_chk_guard" [/tmp/selfgz618/NVIDIA-Linux-aarch64-455.28/kernel/nvidia-uvm.ko] undefined!
ERROR: "__stack_chk_guard" [/tmp/selfgz618/NVIDIA-Linux-aarch64-455.28/kernel/nvidia.ko] undefined!
make[2]: *** [scripts/Makefile.modpost:94: __modpost] Error 1
make[1]: *** [Makefile:1645: modules] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.4.72-v8+'
make: *** [Makefile:81: modules] Error 2

How annoying.


borancar commented Oct 26, 2020

Think that should be easy with -fno-stack-protector
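
Before changing compiler flags, it may help to see how the kernel the module is being built against was configured, since an undefined __stack_chk_guard usually points at a mismatch between the module's stack-protector code and the kernel's setup. This is a diagnostic sketch only, with a made-up function name (stackprotector_opts), and it assumes a kernel config file is available:

```shell
# Print the stack-protector related options from a kernel config file.
# stackprotector_opts is a hypothetical helper name.
stackprotector_opts() {
    config="$1"
    # Exit status is non-zero when the config mentions no such option.
    grep 'STACKPROTECTOR' "$config"
}
```

Possible inputs are the .config in the headers directory from the build log (/usr/src/linux-headers-5.4.72-v8+/), if the headers package ships one, or a copy extracted from /proc/config.gz where the kernel exposes it.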


kilograham commented Oct 27, 2020

Seeing errors like:

error: "NV_BUILD_MODULE_INSTANCES" is not defined, evaluates to 0 [-Werror=undef]

Maybe try adding NV_BUILD_MODULE_INSTANCES=1 to the end of your make ... modules ...


deltabeard commented Oct 28, 2020

This is exactly the sort of project that I've been thinking about, and your work is great!

At this point, I'm going to try to build a custom kernel with the Nouveau driver compiled in.

Consider cross-compiling the kernel and operating system with something like Buildroot as it may be faster than compiling the kernel on the Pi itself. Also, consider using the latest 5.10 Linux kernel at https://github.com/raspberrypi/linux/tree/rpi-5.10.y as it may have fixes that have not been backported to 5.4.
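
As a lighter-weight alternative to a full Buildroot setup, the kernel alone can be cross-compiled on a PC with a plain cross toolchain. The sketch below is not Buildroot; it just collects the per-architecture settings that the build commands earlier in this thread vary on. The function name kernel_build_vars is hypothetical, and the toolchain prefixes assume the Debian/Ubuntu cross compilers (for example from crossbuild-essential-armhf or crossbuild-essential-arm64) are installed:

```shell
# Print the settings for a 32-bit or 64-bit Pi 4 / CM4 kernel cross-build.
# kernel_build_vars is a hypothetical helper name.
kernel_build_vars() {
    case "$1" in
        32) echo "KERNEL=kernel7l ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- IMAGE=zImage" ;;
        64) echo "KERNEL=kernel8 ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- IMAGE=Image" ;;
        *)  echo "usage: kernel_build_vars 32|64" >&2; return 1 ;;
    esac
}

# Example use inside the kernel source tree (commented out so that sourcing
# this file does not start a build):
# eval "$(kernel_build_vars 64)"
# make ARCH="$ARCH" CROSS_COMPILE="$CROSS_COMPILE" bcm2711_defconfig
# make -j"$(nproc)" ARCH="$ARCH" CROSS_COMPILE="$CROSS_COMPILE" "$IMAGE" modules dtbs
```

The resulting image, modules, and device tree files still have to be copied onto the Pi's boot and root partitions afterwards.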

If you haven't used Buildroot before, I can create a custom image with Linux 5.10 + nouveau for you to try, if you would like.

Keep up the good work. 😄


pgwipeout commented May 2, 2021

So I'm doing early bringup of a rk3566 development board on linux-next.
It has a PCIe controller with sufficient memory space to address the entire BAR region for a GTX645.

Nouveau probes the card successfully but locks up:

[   25.407754] nouveau 0000:01:00.0: DRM: core caps notifier timeout
[   25.751412] nouveau 0000:01:00.0: DRM: allocated 1920x1080 fb: 0xa0000, bo 00000000b6416e4c
[   25.972262] nouveau 0000:01:00.0: disp: chid 0 stat 00004780 reason 4 [INVALID_ARG] mthd 0780 data 04380780 code 00000080
[   25.973397] nouveau 0000:01:00.0: disp: chid 0 stat 1000778c reason 7 [UNRESOLVABLE_HANDLE] mthd 078c data 04650898 code 00000000
[   25.973682] nouveau 0000:01:00.0: disp: chid 0 stat 10007790 reason 7 [UNRESOLVABLE_HANDLE] mthd 0790 data 0004002b code 00000000
[   25.973727] nouveau 0000:01:00.0: disp: chid 0 stat 10004794 reason 4 [INVALID_ARG] mthd 0794 data 002800bf code 00000003
[   25.973773] nouveau 0000:01:00.0: disp: chid 0 stat 10004798 reason 4 [INVALID_ARG] mthd 0798 data 0460083f code 00000001
[   25.973816] nouveau 0000:01:00.0: disp: chid 0 stat 1000379c reason 3 [RESERVED_METHOD] mthd 079c data 00000001 code 00000000
[   25.973857] nouveau 0000:01:00.0: disp: chid 0 stat 100047a0 reason 4 [INVALID_ARG] mthd 07a0 data 0004042c code 00000004
[   25.973898] nouveau 0000:01:00.0: disp: chid 0 stat 100037a4 reason 3 [RESERVED_METHOD] mthd 07a4 data 00000000 code 00000000
[   25.973942] nouveau 0000:01:00.0: disp: chid 0 stat 100037a8 reason 3 [RESERVED_METHOD] mthd 07a8 data 000c0450 code 00000000
[   25.973983] nouveau 0000:01:00.0: disp: chid 0 stat 100037ac reason 3 [RESERVED_METHOD] mthd 07ac data 08d9ee20 code 00000000
[   25.974024] nouveau 0000:01:00.0: disp: chid 0 stat 100037b4 reason 3 [RESERVED_METHOD] mthd 07b4 data 08d9ee20 code 00000000
[   25.974198] nouveau 0000:01:00.0: disp: chid 0 stat 100037cc reason 3 [RESERVED_METHOD] mthd 07cc data 0000cf00 code 00000000
[   25.974249] nouveau 0000:01:00.0: disp: chid 0 stat 100047d0 reason 4 [INVALID_ARG] mthd 07d0 data f0000001 code 00000000
[   25.974293] nouveau 0000:01:00.0: disp: chid 0 stat 100047d4 reason 4 [INVALID_ARG] mthd 07d4 data 000404b0 code 0000000a
[   25.974335] nouveau 0000:01:00.0: disp: chid 0 stat 100037d8 reason 3 [RESERVED_METHOD] mthd 07d8 data 00000000 code 00000000
[   25.974375] nouveau 0000:01:00.0: disp: chid 0 stat 100037dc reason 3 [RESERVED_METHOD] mthd 07dc data 000404d0 code 00000000
[   25.974417] nouveau 0000:01:00.0: disp: chid 0 stat 100047e0 reason 4 [INVALID_ARG] mthd 07e0 data 00020301 code 00000000
[   25.974458] nouveau 0000:01:00.0: disp: chid 0 stat 100047f4 reason 4 [INVALID_ARG] mthd 07f4 data 00080404 code 00000000
[   25.974499] nouveau 0000:01:00.0: disp: chid 0 stat 100047fc reason 4 [INVALID_ARG] mthd 07fc data 31ec6000 code 00000000
[   25.974540] nouveau 0000:01:00.0: disp: chid 0 stat 10004804 reason 4 [INVALID_ARG] mthd 0804 data 80000000 code 00000000
[   25.974582] nouveau 0000:01:00.0: disp: chid 0 stat 10004808 reason 4 [INVALID_ARG] mthd 0808 data 00080080 code 00000000
[   27.972958] nouveau 0000:01:00.0: DRM: core notifier timeout

The 465 nvidia downstream driver won't build as is against Ubuntu 20.04 and linux-next.
After fixing the build issues, the module probes the card but once again locks up.
It also can't handle ACPI not existing, and eats itself in a hilarious way.
Enabling the modeset module leads to a soft lock, without it we get a null pointer error when X starts.

[  448.403536] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000001
[  448.403553] Mem abort info:
[  448.403555]   ESR = 0x96000044
[  448.403558]   EC = 0x25: DABT (current EL), IL = 32 bits
[  448.403562]   SET = 0, FnV = 0
[  448.403564]   EA = 0, S1PTW = 0
[  448.403566] Data abort info:
[  448.403568]   ISV = 0, ISS = 0x00000044
[  448.403570]   CM = 0, WnR = 1
[  448.403572] user pgtable: 4k pages, 48-bit VAs, pgdp=000000011464e000
[  448.403575] [0000000000000001] pgd=0000000000000000, p4d=0000000000000000
[  448.403585] Internal error: Oops: 96000044 [#1] SMP
[  448.403590] Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables libcrc32c nfnetlink bridge stp llc nls_iso8859_1 nvidia_drm(POE) nvidia_modeset(POE) snd_hda_codec_hdmi snd_hda_intel snd_soc_simple_card snd_intel_dspcfg snd_soc_simple_card_utils snd_hda_codec snd_soc_core snd_hwdep snd_hda_core ac97_bus snd_pcm_dmaengine snd_pcm nvidia(POE) snd_seq_midi rockchipdrm snd_seq_midi_event aes_ce_blk crypto_simd snd_rawmidi cryptd snd_seq analogix_dp xhci_plat_hcd realtek aes_ce_cipher dwc3 dwmac_rk dw_mipi_dsi snd_seq_device crct10dif_ce snd_timer stmmac_platform ulpi ghash_ce dw_hdmi snd stmmac udc_core sha2_ce drm_kms_helper sha256_arm64 soundcore sha1_ce optee pwrseq_simple pcs_xpcs syscopyarea dwc3_of_simple phylink tee sysfillrect sysimgblt rtc_rk808 syscon_reboot_mode clk_rk808 fb_sys_fops reboot_mode cec uio_pdrv_genirq ohci_platform ehci_platform
[  448.403742]  uio sch_fq_codel drm bpf_preload ip_tables x_tables autofs4
[  448.403758] CPU: 2 PID: 1246 Comm: Xorg Tainted: P           OE     5.12.0-rc6-next-20210409+ #10
[  448.403764] Hardware name: Pine64 RK3566 Quartz64-A Board (DT)
[  448.403767] pstate: 40400009 (nZcv daif +PAN -UAO -TCO BTYPE=--)
[  448.403772] pc : _nv012396rm+0x18/0x50 [nvidia]
[  448.406580] lr : _nv034665rm+0x30/0xc0 [nvidia]
[  448.409378] sp : ffff80001414b4d0
[  448.409382] x29: ffff80001414b4d0 x28: ffff00010fea7000
[  448.409391] x27: ffff000113296008 x26: ffff0001091f1408
[  448.409397] x25: ffff00010fea6808 x24: ffff000106d12008
[  448.409402] x23: 0000000000000000 x22: ffff8000097978e0
[  448.409408] x21: ffff00010fea6808 x20: 0000000000000001
[  448.409413] x19: 0000000000000001 x18: 0000000010000000
[  448.409418] x17: ffff800009654850 x16: ffff8000096549b8
[  448.409423] x15: ffff8000096547b8 x14: ffff800009653fd0
[  448.409428] x13: ffff00010fea7000 x12: 0000000000000020
[  448.409434] x11: ffff000101ddc080 x10: 0000000000000000
[  448.409439] x9 : ffff8000091df054 x8 : 0000000000000004
[  448.409444] x7 : 0000000001000000 x6 : 0000000000000000
[  448.409449] x5 : ffff800013f00000 x4 : 0000000000000000
[  448.409454] x3 : 0000000000000010 x2 : 0000000000000001
[  448.409459] x1 : 00000000000000fd x0 : 0000000000000000
[  448.409465] Call trace:
[  448.409469]  _nv012396rm+0x18/0x50 [nvidia]
[  448.412265]  _nv034665rm+0x30/0xc0 [nvidia]
[  448.415079]  _nv010214rm+0x1c/0xc0 [nvidia]
[  448.417888]  _nv031560rm+0x124/0x918 [nvidia]
[  448.420699]  _nv031593rm+0xe4/0x2b0 [nvidia]
[  448.423515]  _nv031559rm+0x60/0x258 [nvidia]
[  448.426325]  _nv032133rm+0x8c/0xb8 [nvidia]
[  448.429120]  _nv012315rm+0x50/0x6d8 [nvidia]
[  448.431912]  _nv022379rm+0xd0/0x1d0 [nvidia]
[  448.434704]  _nv022613rm+0x30/0x80 [nvidia]
[  448.437505]  _nv000664rm+0xec4/0x1ad8 [nvidia]
[  448.440300]  rm_init_adapter+0xb0/0xc0 [nvidia]
[  448.443100]  nv_open_device+0x434/0x700 [nvidia]
[  448.445899]  nvidia_open+0x100/0x3d4 [nvidia]
[  448.448700]  nvidia_frontend_open+0x74/0xc0 [nvidia]
[  448.451510]  chrdev_open+0xe8/0x2e4
[  448.451521]  do_dentry_open+0x134/0x3a0
[  448.451529]  vfs_open+0x34/0x40
[  448.451534]  path_openat+0x490/0xf80
[  448.451539]  do_filp_open+0x80/0x130
[  448.451544]  do_sys_openat2+0xbc/0x164
[  448.451550]  __arm64_sys_openat+0x6c/0xb4
[  448.451556]  el0_svc_common+0x74/0x1b0
[  448.451563]  do_el0_svc+0x30/0xa0
[  448.451568]  el0_svc+0x2c/0x70
[  448.451574]  el0_sync_handler+0xb0/0xb4
[  448.451578]  el0_sync+0x174/0x180
[  448.451588] Code: b4000102 b40001e1 39400021 52800000 (39000041)
[  448.451594] ---[ end trace a7c9eaaf0a58eff6 ]---

I don't have an IOMMU guarding the PCIe controller, but with vfio_pci in unsafe mode I attached it to QEMU running Tianocore with a qemu module loaded to execute option roms.
I got a partial (corrupt) display this way.
Loading linux from this stage in qemu leads to a panic in efi-stub due to the framebuffer being incorrect.
Dropping qemu and attempting to load nouveau it locks up at the same point as before.
Loading nvidia after dropping qemu leads to the same null pointer error.
Interestingly enough, if you let the nvidia driver load first, unload the nvidia driver, load vfio_pci, load qemu, the display is corrupt in a different way.
Without nvidia's interference, the display is dark grey with horizontal blue lines with red dots.
Text typed into the console was visible, but text displayed from the console is not.
With nvidia's interference, the display is bright white, with black boxes around the text typed into the console and text returned is white.

Author

geerlingguy commented May 2, 2021

@pgwipeout - That's great debugging info — even though it's not explicitly Pi-related, do you think you could open an issue on the Pi PCI DB site I have and others could hopefully assist with the debugging? Sounds like similar issues: https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues


mi-hol commented Jul 19, 2021

Do today's Nvidia announcements (summary copied below) improve the situation, at least for new graphics cards?
"NVIDIA also used this first day of GDC week to release their 470.57.02 stable Linux driver as well as official DLSS SDK support for Linux.
The NVIDIA 470.57.02 Linux driver is out today as the first stable version in the NVIDIA 470 driver series. This carries forward the earlier beta changes around XWayland acceleration, new Vulkan extensions, and numerous other improvements."

"The RTXDI, NRD and RTXMU SDKs for Arm with Linux and Chromium are available now. RTXGI and DLSS will be coming soon. For more information, contact NVIDIA’s developer relations team or visit https://developer.nvidia.com."


mi-hol commented Jul 19, 2021


FUIT1985 commented Sep 6, 2021

Hi, unfortunately I have never tested a Raspberry Pi, but I gladly follow your tests. I don't know if the link below can help you, since the Raspberry Pi doesn't use GRUB. Do you remember the old MacBook Pro with an Nvidia video card and its problems with Linux? In order to use Nvidia's proprietary driver, it was necessary to disable the integrated video card of the Intel processor through the xorg.conf file and GRUB. I hope that even if this advice doesn't help, it encourages you in your research.

https://askubuntu.com/questions/716565/macbook-pro-efi-and-nvidia-setpci-ids
