Set up the Nvidia GeForce GT 710 on Raspberry Pi Compute Module 4
#!/bin/bash
# Attempt to set up the Nvidia GeForce GT 710 on a Pi CM4.
#
# I have tried both armv7l and aarch64 versions of the proprietary driver, in
# addition to the nouveau open source driver (which needs to be compiled into
# a custom Raspberry Pi kernel).
#
# tl;dr - None of the drivers worked :P
# First, expand the BAR space, following the directions in this gist:
# https://gist.github.com/geerlingguy/9d78ea34cab8e18d71ee5954417429df
#####
# Option A - Proprietary Driver
#####
# Install kernel-headers so kernel module can be built.
sudo apt-get update
sudo apt upgrade -y # if necessary
sudo reboot # if necessary
sudo apt install -y raspberrypi-kernel-headers
# Download driver from Nvidia's website.
# 32-bit: https://www.nvidia.com/en-us/drivers/unix/linux-arm-display-archive/
# wget https://us.download.nvidia.com/XFree86/Linux-x86-ARM/390.138/NVIDIA-Linux-armv7l-gnueabihf-390.138.run
# 64-bit: https://www.nvidia.com/en-us/drivers/unix/linux-aarch64-archive/
wget https://us.download.nvidia.com/XFree86/aarch64/455.28/NVIDIA-Linux-aarch64-455.28.run
# TODO: Any way to get the latest version and get the download URL in a script? Manual download is sooo annoying.
# (If running) stop X server.
sudo systemctl stop lightdm
# Run the driver .run file we just downloaded.
chmod +x ./NVIDIA-Linux-aarch64-455.28.run
sudo ./NVIDIA-Linux-aarch64-455.28.run
# For 32-bit (header path must match your running kernel): sudo ./NVIDIA-Linux-armv7l-gnueabihf-390.138.run --kernel-source-path /usr/src/linux-headers-$(uname -r)
# Reboot and (sadly) see the card fail to initialize.
sudo reboot
#####
# Option B - compile nouveau module into custom Pi Kernel
#####
# Install dependencies
sudo apt install -y git bc bison flex libssl-dev make
# Clone source
git clone --depth=1 https://github.com/raspberrypi/linux
# Apply default configuration
cd linux
export KERNEL=kernel7l # use kernel8 for 64-bit, or kernel7l for 32-bit
make bcm2711_defconfig
# Customize the .config further with menuconfig
sudo apt install -y libncurses5-dev
make menuconfig
# (press / and search for nouveau; enable 'Nouveau (NVIDIA) cards' (DRM_NOUVEAU) under Device Drivers > Graphics support, save, then exit)
nano .config
# (edit CONFIG_LOCALVERSION and add a suffix that helps you identify your build)
# Build the kernel and copy everything into place
make -j4 zImage modules dtbs # 'Image' on 64-bit
sudo make modules_install
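# (Paths below are for 32-bit; on 64-bit, copy arch/arm64/boot/dts/broadcom/*.dtb,
# arch/arm64/boot/dts/overlays/*.dtb*, and arch/arm64/boot/Image instead.)
sudo cp arch/arm/boot/dts/*.dtb /boot/
sudo cp arch/arm/boot/dts/overlays/*.dtb* /boot/overlays/
sudo cp arch/arm/boot/dts/overlays/README /boot/overlays/
sudo cp arch/arm/boot/zImage /boot/$KERNEL.img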
# Reboot, but it locks up if you have the card in :(
sudo reboot
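
Before kicking off a long build in Option B, it can be worth confirming that the menuconfig step actually enabled nouveau. A minimal sketch, assuming you run it from the top of the kernel source tree; `kconfig_value` is a hypothetical helper name, while `DRM_NOUVEAU` and `LOCALVERSION` are the real Kconfig symbols involved:

```bash
# Print the value of a Kconfig symbol from a kernel .config file:
# "y" (built in), "m" (module), a quoted string, or nothing if unset.
kconfig_value() {
  local config_file=$1 symbol=$2
  # Set options look like CONFIG_FOO=value; unset ones appear as
  # "# CONFIG_FOO is not set" and therefore produce no output here.
  # The "=" right after the symbol keeps CONFIG_FOO_BAR from matching FOO.
  sed -n "s/^CONFIG_${symbol}=//p" "$config_file"
}

# Example, from the kernel source directory:
#   kconfig_value .config DRM_NOUVEAU    # expect y or m
#   kconfig_value .config LOCALVERSION   # your custom suffix, in quotes
```

If you'd rather skip the interactive menu entirely, the kernel tree also ships `scripts/config`, so something like `./scripts/config --module DRM_NOUVEAU` followed by `make olddefconfig` should have the same effect.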

@geerlingguy geerlingguy commented Oct 21, 2020

This is an offshoot of the initial Raspberry Pi Compute Module 4 Review I posted Monday. By far the number one question was, "can you get an external GPU to work through PCIe?", so I did some research and found the Zotac GeForce GT 710 might be one of the best bets, since it's not too new, only needs a single PCIe lane (x1), yadda yadda.
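
For reference, here is the arithmetic behind the bandwidth line the kernel prints in the boot log further down (`4.000 Gb/s available PCIe bandwidth, limited by 5 GT/s x1 link`). The CM4's link runs at PCIe Gen 2 speed (5 GT/s), and Gen 2 uses 8b/10b line encoding, so only 8 of every 10 bits on the wire carry data:

```latex
\[
\begin{aligned}
\text{x1 link:}\quad & 5\ \text{GT/s} \times \tfrac{8}{10} = 4\ \text{Gb/s} \approx 500\ \text{MB/s} \\
\text{x8 link (what the GPU reports it could use):}\quad & 8 \times 4\ \text{Gb/s} = 32\ \text{Gb/s}
\end{aligned}
\]
```

So the Pi's single lane caps any card at roughly 500 MB/s, regardless of how many lanes the GPU itself supports.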

Here it is plugged into the CM4 IO Board:

[Photo: the GT 710 plugged into the CM4 IO Board (IMG_2501)]

And here's what I get with lspci -v after the Pi boots:

$ sudo lspci -v
00:00.0 PCI bridge: Broadcom Limited Device 2711 (rev 20) (prog-if 00 [Normal decode])
	Flags: fast devsel
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 00000000-00000fff
	Memory behind bridge: f8000000-f97fffff
	Capabilities: [48] Power Management version 3
	Capabilities: [ac] Express Root Port (Slot-), MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [180] Vendor Specific Information: ID=0000 Rev=0 Len=028 <?>
	Capabilities: [240] L1 PM Substates

01:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 710B] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: ZOTAC International (MCO) Ltd. GK208B [GeForce GT 710]
	Flags: fast devsel
	Memory at 600000000 (32-bit, non-prefetchable) [disabled] [size=16M]
	Memory at <unassigned> (64-bit, prefetchable) [disabled]
	Memory at <unassigned> (64-bit, prefetchable) [disabled]
	I/O ports at <unassigned> [disabled]
	[virtual] Expansion ROM at 601000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>

01:00.1 Audio device: NVIDIA Corporation GK208 HDMI/DP Audio Controller (rev a1)
	Subsystem: ZOTAC International (MCO) Ltd. GK208 HDMI/DP Audio Controller
	Flags: fast devsel
	Memory at 601080000 (32-bit, non-prefetchable) [disabled] [size=16K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00

The kernel module builds, but it seems it's not happy with the PCIe slot/bus on the Pi currently:

[  590.824049] nvidia: loading out-of-tree module taints kernel.
[  590.824072] nvidia: module license 'NVIDIA' taints kernel.
[  590.824076] Disabling lock debugging due to kernel taint
[  591.080381] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[  591.082088] pci 0000:00:00.0: enabling device (0000 -> 0002)
[  591.082105] nvidia 0000:01:00.0: enabling device (0000 -> 0002)
[  591.082117] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR1 is 0M @ 0x0 (PCI:0000:01:00.0)
[  591.082121] NVRM: The system BIOS may have misconfigured your GPU.
[  591.082134] nvidia: probe of 0000:01:00.0 failed with error -1
[  591.082191] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  591.082195] NVRM: None of the NVIDIA devices were initialized.
[  591.083314] nvidia-nvlink: Unregistered the Nvlink Core, major device number 234

And earlier in the boot:

[    1.009621] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
[    1.009658] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
[    1.009734] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x0603ffffff -> 0x00f8000000
[    1.009806] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x00ffffffff -> 0x0100000000
[    1.059209] brcm-pcie fd500000.pcie: link up, 5 GT/s x1 (SSC)
[    1.059516] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
[    1.059546] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.059575] pci_bus 0000:00: root bus resource [mem 0x600000000-0x603ffffff] (bus address [0xf8000000-0xfbffffff])
[    1.059647] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
[    1.059878] pci 0000:00:00.0: PME# supported from D0 D3hot
[    1.063487] pci 0000:00:00.0: bridge configuration invalid ([bus ff-ff]), reconfiguring
[    1.063697] pci 0000:01:00.0: [10de:128b] type 00 class 0x030000
[    1.063785] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
[    1.063838] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff 64bit pref]
[    1.063892] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff 64bit pref]
[    1.063933] pci 0000:01:00.0: reg 0x24: [io  0x0000-0x007f]
[    1.063972] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[    1.064203] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5 GT/s x1 link at 0000:00:00.0 (capable of 32.000 Gb/s with 5 GT/s x8 link)
[    1.064368] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    1.064461] pci 0000:01:00.1: [10de:0e0f] type 00 class 0x040300
[    1.064534] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
[    1.068210] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    1.068267] pci 0000:00:00.0: BAR 9: no space for [mem size 0x0c000000 64bit pref]
[    1.068297] pci 0000:00:00.0: BAR 9: failed to assign [mem size 0x0c000000 64bit pref]
[    1.068328] pci 0000:00:00.0: BAR 8: assigned [mem 0x600000000-0x6017fffff]
[    1.068363] pci 0000:01:00.0: BAR 1: no space for [mem size 0x08000000 64bit pref]
[    1.068391] pci 0000:01:00.0: BAR 1: failed to assign [mem size 0x08000000 64bit pref]
[    1.068422] pci 0000:01:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
[    1.068450] pci 0000:01:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
[    1.068479] pci 0000:01:00.0: BAR 0: assigned [mem 0x600000000-0x600ffffff]
[    1.068514] pci 0000:01:00.0: BAR 6: assigned [mem 0x601000000-0x60107ffff pref]
[    1.068544] pci 0000:01:00.1: BAR 0: assigned [mem 0x601080000-0x601083fff]
[    1.068576] pci 0000:01:00.0: BAR 5: no space for [io  size 0x0080]
[    1.068600] pci 0000:01:00.0: BAR 5: failed to assign [io  size 0x0080]
[    1.068627] pci 0000:00:00.0: PCI bridge to [bus 01]
[    1.068658] pci 0000:00:00.0:   bridge window [mem 0x600000000-0x6017fffff]
[    1.068807] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
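
To see why those assignments fail, one can add up what the GPU asks for in the `reg 0x..: [mem ...]` probe lines above and compare it with the 64 MiB outbound window (`MEM 0x0600000000..0x0603ffffff`). A small sketch that sums the memory BAR sizes from kernel log text; `bar_demand` is a made-up name, and it ignores I/O BARs and alignment padding, so the real bridge-window requirement can be larger (the bridge's BAR 9 above asks for 0x0c000000):

```bash
# Sum the sizes (in bytes) of the memory BARs a device asked for, using the
# "reg 0x..: [mem START-END ...]" lines the kernel prints while probing.
# Reads kernel log text on stdin; I/O BARs ("[io ...]") are ignored.
bar_demand() {
  grep -oE 'reg 0x[0-9a-f]+: \[mem 0x[0-9a-f]+-0x[0-9a-f]+' |
    sed -E 's/.*\[mem 0x([0-9a-f]+)-0x([0-9a-f]+)$/\1 \2/' |
    {
      total=0
      while read -r start end; do
        total=$(( total + 16#$end - 16#$start + 1 ))   # BAR size = end - start + 1
      done
      echo "$total"
    }
}

# Example: sudo dmesg | grep 'pci 0000:01:00' | bar_demand
```

For the log above this comes to 185090048 bytes (about 176.5 MiB) of requested memory BARs, against a 64 MiB window.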

@geerlingguy geerlingguy commented Oct 21, 2020

Tried doing a rescan after trying to install the driver, but still no luck:

# echo 1 > /sys/bus/pci/devices/0000\:00\:00.0/remove
# echo 1 > /sys/bus/pci/rescan

(Then dmesg shows same progression of BAR 9, 1, 3, and 5 not having space.)
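
Since each remove/rescan repeats the whole block of assignment messages, a filter that pulls out just the failures makes it easier to compare runs. A sketch (the function name `list_failed_bars` is hypothetical) that reads kernel log text on stdin:

```bash
# Print "<device> BAR <n>" for every BAR the kernel failed to assign,
# reading kernel log text (e.g. from dmesg) on stdin.
list_failed_bars() {
  grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]: BAR [0-9]+: failed to assign' |
    sed -E 's/^([^ ]+): BAR ([0-9]+): .*/\1 BAR \2/'
}

# Example: sudo dmesg | list_failed_bars
```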

@geerlingguy geerlingguy commented Oct 22, 2020

Also leaving a note here: when building the driver on armv7l (32-bit Raspberry Pi OS), I ran into this error: https://pastebin.com/3X8CN4mU

   In file included from /tmp/selfgz1694/NVIDIA-Linux-armv7l-gnueabihf-390.138/kernel/nvidia/os-interface.c:15:
   /tmp/selfgz1694/NVIDIA-Linux-armv7l-gnueabihf-390.138/kernel/nvidia/os-interface.c: In function 'os_flush_cpu_write_combine_buffer':
   /tmp/selfgz1694/NVIDIA-Linux-armv7l-gnueabihf-390.138/kernel/common/inc/nv-linux.h:464:43: error: implicit declaration of function 'outer_sync'; did you mean 'outer_resume'? [-Werror=implicit-function-declaration]
    #define WRITE_COMBINE_FLUSH()    { dsb(); outer_sync(); }
                                              ^~~~~~~~~~
   /tmp/selfgz1694/NVIDIA-Linux-armv7l-gnueabihf-390.138/kernel/nvidia/os-interface.c:946:5: note: in expansion of macro 'WRITE_COMBINE_FLUSH'
        WRITE_COMBINE_FLUSH();
        ^~~~~~~~~~~~~~~~~~~
     CC [M]  /tmp/selfgz1694/NVIDIA-Linux-armv7l-gnueabihf-390.138/kernel/nvidia/os-registry.o
     CC [M]  /tmp/selfgz1694/NVIDIA-Linux-armv7l-gnueabihf-390.138/kernel/nvidia/os-usermap.o
   cc1: some warnings being treated as errors

@geerlingguy geerlingguy commented Oct 22, 2020

I might look into the nouveau driver instead...

Or also see if the drivers in Debian's non-free repo work: https://wiki.debian.org/NvidiaGraphicsDrivers

@geerlingguy geerlingguy commented Oct 22, 2020

But before I go too deep down that rabbit hole... it seems like the aarch64 driver builds fine and should work, if it weren't for the Pi's limited PCIe support for the graphics card.

I was searching around for anyone having experience with GPUs on ARM boards, and found good comments from floatboth and cesarb on HN:

UPD: yeah, someone in the thread mentioned BAR space issues wrt NXP i.MX SoCs, that's probably what's happening on Rockchip. Would be amazing if the Broadcom chip in the Pi turns out to be the one with enough BAR space! :D

In PCI, BAR is Base Address Register, which is a register in the PCI device's configuration space which defines where in the machine's physical memory address space that particular window of memory and/or I/O will be mapped (a single device can have several BARs, for instance a simple graphics card could have one for its control registers and one for the framebuffer). So the "BAR space" would be a shorthand for "the region of the physical memory address space which can be used to map the PCI devices memory through their Base Address Registers". The size of this region is limited, and graphics cards in particular tend to have somewhat large BARs.
(See for yourself in your machine: run "lspci -v", the lines starting with "Memory at ..." or "I/O ports at ..." are the BARs.)
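
For the curious, here is how a device advertises how big each BAR is, per the standard PCI sizing procedure: the OS writes all 1s to the BAR, reads it back, masks off the low flag bits (bits 3:0 for a memory BAR), inverts, and adds 1. The sketch below does only that arithmetic for a 32-bit memory BAR (64-bit BARs span two registers and aren't handled); the readback values are illustrative, chosen to match the GT 710's 16 MB BAR 0 and the audio function's 16 KB BAR 0. Don't poke config space by hand on a live device; the kernel already did this, and `lspci -v` shows the results as `[size=...]`.

```bash
# Decode the size of a 32-bit memory BAR from the value read back after
# writing 0xFFFFFFFF to it (the standard PCI BAR sizing procedure).
bar_size_from_readback() {
  local readback=$(( $1 ))
  # Clear the 4 low flag bits, invert, add 1, and keep the result to 32 bits
  # (bash arithmetic is 64-bit, hence the final mask).
  echo $(( (~(readback & ~0xF) + 1) & 0xFFFFFFFF ))
}

bar_size_from_readback 0xFF000000   # 16 MiB BAR -> 16777216
bar_size_from_readback 0xFFFFC000   # 16 KiB BAR -> 16384
```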

And also from mntmn:

You might run into address space issues. I haven’t checked Broadcom PCIe documentation for RPi4 (is there any?), but I tried a very similar hack with i.MX6 and older AMD and nVidia cards. They get recognized fine, but BARs cannot be mapped because they don’t fit in i.MX6’s tiny 16MB PCIe space.

I need to figure out if/how to fix the issue of the lack of BAR space on the Pi. Can it be done?

Gartral on Hackaday says no way:

That will never ever happen, the cpu in the Pi4 doesn’t have enough BAR space. Maybe they’ll fix that with a refresh or the Pi5!

@geerlingguy geerlingguy commented Oct 22, 2020

Similar error on the NXP LS1012A-RDB development board which has a Cortex A53: ARM64 Unable to load module amdgpu on LS1012A-RDB board.

A shot in the dark, but I'm going to do a dist-upgrade and reboot and see if anything's different. I realized I'm running the image straight from 2020-08-20, which is a bit behind. (Edit: No difference.)

@geerlingguy geerlingguy commented Oct 22, 2020

Could be reading this all wrong, but a quick translation:

BAR 9 size 0x0c000000 ≈ 192 MB (fails)
BAR 1 size 0x08000000 ≈ 128 MB (fails)
BAR 3 size 0x02000000 ≈ 32 MB (fails)
BAR 5 size 0x0080 ≈ 128 bytes (fails with `no space for [io  size 0x0080]`)
BAR 8 ≈ 24 MB (succeeds)
BAR 0 ≈ 16 MB (succeeds)
BAR 6 ≈ 512 KB (succeeds)
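
A quick way to do this translation without mental hex math; a minimal sketch (the helper name `human_size` is made up) that uses binary units and truncates to whole units, which is exact for every size in this log:

```bash
# Convert a hex (or decimal) byte count, like the sizes in the dmesg BAR
# messages, into a binary-unit string (MiB / KiB / bytes).
human_size() {
  local bytes=$(( $1 ))
  if (( bytes >= 1 << 20 )); then
    echo "$(( bytes >> 20 )) MiB"
  elif (( bytes >= 1 << 10 )); then
    echo "$(( bytes >> 10 )) KiB"
  else
    echo "$bytes bytes"
  fi
}

# BAR 9, 1, 3, 8, 0, 6 and 5 from the boot log:
for size in 0x0c000000 0x08000000 0x02000000 0x01800000 0x01000000 0x00080000 0x0080; do
  printf '%-12s %s\n' "$size" "$(human_size "$size")"
done
```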

Also, nothing is output over VGA either (I have VGA and HDMI displays, but no DVI displays or adapters that can take DVI and put it into anything useful).

@geerlingguy geerlingguy commented Oct 22, 2020

Also: https://twitter.com/domipheus/status/1168448167832117248?ref_src=twsrc%5Etfw

Quick update: default config doesn't have enough pcie BAR space to map a GPU; however, RaspberryPi engineers have said it may be possible with some custom configuration. I'll give those experiments a try next. No promises of course! Many thanks for the input, @ebenupton!

So maybe there is hope, after all?

Elsewhere in that thread, I found a reference to increasing the MEM size in the Pi's [Device Tree](https://www.raspberrypi.org/documentation/configuration/device-tree.

Edit: also found where the Broadcom PCIe device tree patch seems to have been discussed: https://lore.kernel.org/linux-arm-kernel/20200115234112.30746-1-f.fainelli@gmail.com/T/ — and the docs for it in the kernel: https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
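
The `ranges` property described in that binding is what sets the size of the PCIe memory window. Each PCI `ranges` entry is seven 32-bit cells: three for the PCI side (a flags cell, then the 64-bit bus address), two for the CPU-side address, and two for the size. A small decoder sketch; the function name is hypothetical, and the example cells are not copied from the Pi's Device Tree but reconstructed from the boot log line `MEM 0x0600000000..0x0603ffffff -> 0x00f8000000`, so check them against your own decompiled .dts:

```bash
# Decode one PCI "ranges" entry (7 cells: PCI flags, PCI addr hi/lo,
# CPU addr hi/lo, size hi/lo) into the form the brcm-pcie driver logs.
# The flags cell is not decoded (0x02000000 = 32-bit memory space,
# 0x03000000 = 64-bit memory space).
decode_pcie_range() {
  if [ "$#" -ne 7 ]; then
    echo "usage: decode_pcie_range FLAGS PCI_HI PCI_LO CPU_HI CPU_LO SIZE_HI SIZE_LO" >&2
    return 1
  fi
  local pci=$(( ($2 << 32) | $3 ))
  local cpu=$(( ($4 << 32) | $5 ))
  local size=$(( ($6 << 32) | $7 ))
  printf 'CPU 0x%010x..0x%010x -> PCI 0x%010x (%d MiB)\n' \
    "$cpu" "$(( cpu + size - 1 ))" "$pci" "$(( size >> 20 ))"
}

# Cells matching the stock 64 MiB window seen in the boot log:
decode_pcie_range 0x02000000 0x0 0xf8000000 0x6 0x00000000 0x0 0x04000000
```

After editing `ranges` and rebooting, the `MEM ...` line near the top of dmesg should reflect whatever window you configured.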

@geerlingguy geerlingguy commented Oct 22, 2020

Also posted about this on the Pi forums here: https://www.raspberrypi.org/forums/viewtopic.php?f=98&t=288902

@borancar borancar commented Oct 22, 2020

Might need some early OptionROM setup. I first heard about it at a KVM Forum talk: https://www.youtube.com/watch?v=uxvAH1Q4Mx0

@geerlingguy geerlingguy commented Oct 22, 2020

Thanks to PhilE's help on the Pi Forums, it seems I can recompile the device tree to get enough BAR space:

# Go into the /boot directory.
cd /boot

# Back up current Device Tree for the CM4.
sudo cp bcm2711-rpi-cm4.dtb bcm2711-rpi-cm4.dtb.bak

# Decompile the current Device Tree to a dts (source) file.
dtc -I dtb -O dts ./bcm2711-rpi-cm4.dtb -o ~/test.dts

# Edit the file and change the PCIe bus range as mentioned in:
# https://www.raspberrypi.org/forums/viewtopic.php?p=1746665#p1746665
nano ~/test.dts

# Recompile the Device Tree from the dts (source) file.
dtc -I dts -O dtb ~/test.dts -o ~/test.dtb

# Copy the new Device Tree into place.
# (Make sure it is owned by root and executable after the mv).
sudo rm ./bcm2711-rpi-cm4.dtb
sudo mv ~/test.dtb ./bcm2711-rpi-cm4.dtb

# Reboot.
sudo reboot

And now, when the card is loaded in during boot:

[    1.068031] pci 0000:00:00.0: BAR 9: assigned [mem 0x600000000-0x60bffffff 64bit pref]
[    1.068063] pci 0000:00:00.0: BAR 8: assigned [mem 0x60c000000-0x60d7fffff]
[    1.068097] pci 0000:01:00.0: BAR 1: assigned [mem 0x600000000-0x607ffffff 64bit pref]
[    1.068151] pci 0000:01:00.0: BAR 3: assigned [mem 0x608000000-0x609ffffff 64bit pref]
[    1.068201] pci 0000:01:00.0: BAR 0: assigned [mem 0x60c000000-0x60cffffff]
[    1.068235] pci 0000:01:00.0: BAR 6: assigned [mem 0x60d000000-0x60d07ffff pref]
[    1.068265] pci 0000:01:00.1: BAR 0: assigned [mem 0x60d080000-0x60d083fff]
[    1.068297] pci 0000:01:00.0: BAR 5: no space for [io  size 0x0080]
[    1.068322] pci 0000:01:00.0: BAR 5: failed to assign [io  size 0x0080]

Hopefully BAR 5 doesn't store anything that useful :D

When re-installing the driver again, now, I do get this error (related to X window system, so not too pertinent to my current needs):

WARNING: nvidia-installer was forced to guess the X library path '/usr/lib' and X module path '/usr/lib/xorg/modules';       
           these paths were not queryable from the system.  If X fails to find the NVIDIA X driver module, please install the  
           `pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver.

But... it installed correctly, supposedly, so I'm going to reboot and see where that leads!

[Screenshot: Screen Shot 2020-10-22 at 9 56 34 AM]

@geerlingguy geerlingguy commented Oct 22, 2020

Promising:

[    4.635500] nvidia: loading out-of-tree module taints kernel.
[    4.635551] nvidia: module license 'NVIDIA' taints kernel.
[    4.635560] Disabling lock debugging due to kernel taint
[    5.217308] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[    5.220347] pci 0000:00:00.0: enabling device (0000 -> 0002)
[    5.220383] nvidia 0000:01:00.0: enabling device (0000 -> 0002)
[    5.220453] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[    5.340502] NVRM: loading NVIDIA UNIX aarch64 Kernel Module  455.28  Wed Sep 30 01:40:15 UTC 2020
[    5.422630] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  455.28  Wed Sep 30 01:16:42 UTC 2020
[    5.434974] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    5.434992] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 2

@gerhardqux gerhardqux commented Oct 22, 2020

Maybe you'd have some BAR-mapping luck with a UEFI firmware.

@geerlingguy geerlingguy commented Oct 22, 2020

So... I need to go deeper, but I realized I'm using the 64-bit Pi OS 'lite' version, which doesn't include a window system, so some utilities can't work with the video card or do things like identify attached displays:

$ sudo apt install -y mesa-utils
$ glxinfo -B
Error: unable to open display

So I'm going to re-flash the drive with the full 64-bit beta OS and then re-do everything above on it (yay for actually documenting things!) and then see if I can get the HDMI port to output something. It's still blank. Is there a way to get the Pi itself to use an external display connector instead of the HDMI0 or HDMI1 ports?

@geerlingguy geerlingguy commented Oct 22, 2020

Hmm... now when I try to boot into the GUI, I see this in dmesg:

[    7.382976] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x54:1211)
[    7.383054] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[    7.653632] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x54:1211)
[    7.653729] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[    8.320717] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x54:1211)
[    8.320815] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[    8.570981] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x54:1211)
[    8.571066] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[    9.221865] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x54:1211)
[    9.221957] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[    9.471713] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x54:1211)
[    9.471802] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[    9.955252] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x54:1211)
[    9.955333] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   10.206025] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x54:1211)
[   10.206109] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   10.856503] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x54:1211)
[   10.856600] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   11.106188] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x54:1211)
[   11.106280] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

If I try running sudo nvidia-persistenced, I get the following in syslog:

Oct 22 17:33:38 raspberrypi nvidia-persistenced: Started (746)
Oct 22 17:33:39 raspberrypi kernel: [  452.994785] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x54:1211)
Oct 22 17:33:39 raspberrypi kernel: [  452.994904] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Oct 22 17:33:39 raspberrypi nvidia-persistenced: device 0000:01:00.0 - failed to open.

@borancar borancar commented Oct 22, 2020

Do you think it's worthwhile trying the OptionROM initialization? A GPU, when powering on, first displays its own BIOS, then starts displaying messages coming from the system (which on Raspberry Pi wouldn't happen as it does nothing with the x86 VIDEO RAM address 0xA0.../0xB8..., but I think you would still see the GPU posting its BIOS with Option ROM working).

@geerlingguy geerlingguy commented Oct 22, 2020

@borancar - In this case, I'm getting no output whatsoever on the HDMI port from the Zotac card, nor on the VGA :(

It would be interesting to see if that works on the Pi. (I don't have time currently to watch that whole video, but is there some more information or a blog post I could look at separately?)

@geerlingguy geerlingguy commented Oct 22, 2020

After seeing a few notes here and there about PCIe ASPM being enabled on the Pi and causing issues, I added pcie_aspm=off to /boot/cmdline.txt to see if that would make any difference... (I confirmed the setting was picked up in the kernel command, and with sudo lspci -vv | grep ASPM—it was disabled on the GPU).
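
Because `/boot/cmdline.txt` must stay a single line, appending a parameter with a text editor is easy to get wrong. A sketch of an idempotent helper (the name `add_kernel_param` is hypothetical; it relies on GNU `sed -i`, and the parameter is used as a regex, which is fine for simple `key=value` parameters without `/`):

```bash
# Append a kernel parameter to a Raspberry Pi style cmdline.txt, which must
# remain a single line. Does nothing if the parameter is already present.
add_kernel_param() {
  local file=$1 param=$2
  # Already there as a whole space-separated word? Leave the file alone.
  if grep -qE "(^| )${param}( |\$)" "$file"; then
    return 0
  fi
  # Append to the end of line 1 (GNU sed in-place edit).
  sed -i "1 s/\$/ ${param}/" "$file"
}

# Example (as root, after backing up the file):
#   cp /boot/cmdline.txt /boot/cmdline.txt.bak
#   add_kernel_param /boot/cmdline.txt pcie_aspm=off
# After rebooting, confirm it took effect with: cat /proc/cmdline
```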

@borancar borancar commented Oct 22, 2020

tl;dr If my suspicion is right, there is some code on that GPU written in x86 assembly that does some initial configuration, and that's the GPU's BIOS. Since the RPi can't execute x86 code, it can't fully initialize the card, so you'd need QEMU to execute that code and init the card.

I managed to track down the slides on SlideShare: https://www.slideshare.net/linaroorg/hkg18505-qemu-in-uefi. Specifically, one of the examples from the presentation was a GPU on an ARM server. https://github.com/ardbiesheuvel/X86EmulatorPkg is the GitHub repo referenced there, but it deals with full integration into ARM UEFI, so you might need more modifications. Alternatively, there is this talk on marrying U-Boot, UEFI and grub: https://web.archive.org/web/20180404183425if_/http://schd.ws/hosted_files/openiotelcna2017/c4/Marrying%20U-Boot%2C%20UEFI%20and%20grub.pdf.

@geerlingguy geerlingguy commented Oct 22, 2020

Ah... when the X server tries starting, it runs into an error:

[    11.959] (II) LoadModule: "nvidia"
[    11.959] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
[    11.960] (II) Module nvidia: vendor="NVIDIA Corporation"
[    11.960] 	compiled for 1.6.99.901, module version = 1.0.0
[    11.960] 	Module class: X.Org Video Driver
[    11.960] (II) NVIDIA dlloader X Driver  455.28  Wed Sep 30 00:57:48 UTC 2020
[    11.960] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[    11.961] (II) Loading sub module "fb"
[    11.961] (II) LoadModule: "fb"
[    11.961] (II) Loading /usr/lib/xorg/modules/libfb.so
[    11.961] (II) Module fb: vendor="X.Org Foundation"
[    11.961] 	compiled for 1.20.4, module version = 1.0.0
[    11.961] 	ABI class: X.Org ANSI C Emulation, version 0.4
[    11.961] (II) Loading sub module "wfb"
[    11.961] (II) LoadModule: "wfb"
[    11.961] (II) Loading /usr/lib/xorg/modules/libwfb.so
[    11.962] (II) Module wfb: vendor="X.Org Foundation"
[    11.962] 	compiled for 1.20.4, module version = 1.0.0
[    11.962] 	ABI class: X.Org ANSI C Emulation, version 0.4
[    11.962] (II) Loading sub module "ramdac"
[    11.962] (II) LoadModule: "ramdac"
[    11.962] (II) Module "ramdac" already built-in
[    11.963] (II) NVIDIA(0): nvCommonPlatformProbe: Device is NULL
[    11.963] (II) NVIDIA(0): nvCommonPlatformProbe: Device is NULL
[    11.963] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
[    11.963] (==) NVIDIA(0): RGB weight 888
[    11.963] (==) NVIDIA(0): Default visual is TrueColor
[    11.963] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[    11.963] (**) NVIDIA(0): Enabling 2D acceleration
[    11.963] (II) Loading sub module "glxserver_nvidia"
[    11.963] (II) LoadModule: "glxserver_nvidia"
[    11.963] (II) Loading /usr/lib/xorg/modules/extensions/libglxserver_nvidia.so
[    11.974] (II) Module glxserver_nvidia: vendor="NVIDIA Corporation"
[    11.974] 	compiled for 1.6.99.901, module version = 1.0.0
[    11.974] 	Module class: X.Org Server Extension
[    11.974] (II) NVIDIA GLX Module  455.28  Wed Sep 30 01:00:55 UTC 2020
[    11.974] (II) NVIDIA: The X server does not support PRIME Render Offload.
[    12.139] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA GPU at PCI:1:0:0.  Please
[    12.139] (EE) NVIDIA(GPU-0):     check your system's kernel log for additional error
[    12.139] (EE) NVIDIA(GPU-0):     messages and refer to Chapter 8: Common Problems in the
[    12.139] (EE) NVIDIA(GPU-0):     README for additional information.
[    12.139] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!
[    12.139] (EE) NVIDIA(0): Failing initialization of X screen
[    12.139] (II) UnloadModule: "nvidia"
[    12.140] (II) UnloadSubModule: "glxserver_nvidia"
[    12.140] (II) Unloading glxserver_nvidia
[    12.140] (II) UnloadSubModule: "wfb"
[    12.140] (II) UnloadSubModule: "fb"
[    12.140] (EE) Screen(s) found, but none have a usable configuration.
[    12.140] (EE) 
Fatal server error:
[    12.140] (EE) no screens found(EE) 
[    12.140] (EE) 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help. 
[    12.140] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[    12.140] (EE) 
[    12.190] (EE) Server terminated with error (1). Closing log file.

@geerlingguy geerlingguy commented Oct 22, 2020

Output from running sudo nvidia-bug-report.sh: https://gist.github.com/geerlingguy/895c56d2a2e3788ff932b9959c128b5c

@elFarto elFarto commented Oct 22, 2020

Looks like that error has been reported before[1], on an aarch64 system too. So it might be an actual bug in the driver.

[1] https://forums.developer.nvidia.com/t/gtx-1080-drivers-fail-to-load-with-nvrm-gpu-000400-0-rminitadapter-failed-0x251211/156902/1

@geerlingguy geerlingguy commented Oct 22, 2020

@elFarto (lol nice name) - I posted in that issue on the Nvidia forums... we'll see if it gets anywhere. At this point, I'm kinda tempted to go to Micro Center and try an inexpensive AMD GPU and see if the experience is better (everyone online is saying their drivers included in the kernel should work so much better... but I wonder if they have good ARM support?).

@geerlingguy geerlingguy commented Oct 22, 2020

Testing CUDA support (install took a while, and the .run file is like 2.5 GB!):

$ wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda_11.1.0_455.23.05_linux_sbsa.run
$ chmod +x cuda_11.1.0_455.23.05_linux_sbsa.run 
$ sudo ./cuda_11.1.0_455.23.05_linux_sbsa.run 
===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-11.1/
Samples:  Installed in /home/pi/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-11.1/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-11.1/lib64, or, add /usr/local/cuda-11.1/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.1/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log

$ export PATH=$PATH:/usr/local/cuda-11.1/bin
$ export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
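
The installer asks for `PATH` and `LD_LIBRARY_PATH` to include the CUDA directories. A slightly more defensive sketch (the helper name `path_prepend` is made up) that adds a directory only if it isn't already present and doesn't leave a dangling `:` when the variable starts out empty:

```bash
# Usage: path_prepend VAR DIR
# Prepend DIR to the colon-separated variable VAR (e.g. PATH) unless it is
# already present, then export VAR. An empty or unset VAR yields just DIR.
path_prepend() {
  local var=$1 dir=$2 current=${!1}
  case ":$current:" in
    *":$dir:"*) ;;                                       # already present
    *) printf -v "$var" '%s' "$dir${current:+:$current}" ;;
  esac
  export "$var"
}

# Example:
#   path_prepend PATH /usr/local/cuda-11.1/bin
#   path_prepend LD_LIBRARY_PATH /usr/local/cuda-11.1/lib64
```

Prepending (rather than appending, as in the commands above) means the CUDA 11.1 tools win if another `nvcc` is also on the `PATH`; append instead if you prefer the opposite.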

@geerlingguy geerlingguy commented Oct 22, 2020

And running some samples:

$ cd ~/NVIDIA_CUDA-11.1_Samples/1_Utilities/deviceQuery
$ make
$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 100
-> no CUDA-capable device is detected
Result = FAIL

Dangit. At the same time, over in dmesg:

[ 9195.790222] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x54:1211)
[ 9195.790310] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

But I'm wondering if @borancar's idea might help, if it truly is an initialization bug with the board. It would be nice, though, if the AARCH64 driver would work without having some sort of extra emulation layer on top. I wonder what ARM devices they use for testing at Nvidia HQ?

@geerlingguy geerlingguy commented Oct 22, 2020

Well there's also this:

According to the internet, there seem to have been multiple GPU models sold under that name: one had compute capability 2.x and the other had compute capability 3.0. Neither are supported by CUDA 11 which requires compute capability >= 3.5.

So apparently, even without that failure, the GT 710 may not work with CUDA 11, le sigh. It isn't included in the current version of the CUDA GPU list either. But it does have '192 CUDA cores', and some databases list it as compute capability 3.5, so ¯\_(ツ)_/¯
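
The quoted requirement boils down to a major/minor comparison. A sketch that encodes it (the function name is hypothetical, and the 3.5 threshold comes only from the quote above); if `deviceQuery` ever does see the card, its "CUDA Capability Major/Minor version number" line gives the value to check:

```bash
# Return success if a compute capability string "MAJOR.MINOR" meets the
# CUDA 11 minimum of 3.5 quoted above.
cuda11_supports_cc() {
  local major=${1%%.*} minor=${1#*.}
  (( major > 3 || (major == 3 && minor >= 5) ))
}

for cc in 2.1 3.0 3.5; do
  if cuda11_supports_cc "$cc"; then echo "$cc: supported"; else echo "$cc: not supported"; fi
done
```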

For now, I'm going to switch gears and test out a cheap (but x16, so not drop-in compatible) Radeon 5450! Separate thread for that: geerlingguy/raspberry-pi-pcie-devices#4

@geerlingguy geerlingguy commented Oct 23, 2020

Since I was closer to success with the Zotac card than I got with the Radeon (see linked thread above), I'm going to try it again, this time with the https://nouveau.freedesktop.org driver.

@geerlingguy commented Oct 23, 2020

Hmm... looking at the install instructions, it seems like everyone assumes the nouveau driver is already present on Debian, but I don't see it loaded on the Raspberry Pi: lsmod doesn't list it, and it doesn't show up in the output of find /lib/modules/$(uname -r) -type f -name '*.ko'.
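The two checks mentioned above (lsmod and find) can be combined into a quick test to run on each new kernel or OS image. This is a sketch. The helper names are hypothetical, and the 'nouveau.ko*' pattern is an assumption meant to also catch compressed modules such as nouveau.ko.xz.

```bash
#!/bin/bash
# Succeed if the nouveau module is currently loaded (lsmod reads /proc/modules).
nouveau_loaded() {
  lsmod 2>/dev/null | grep -q '^nouveau '
}

# Succeed if a nouveau module file (possibly compressed) ships for the given
# kernel release under the given modules root (defaults: running kernel).
nouveau_shipped() {
  local release="${1:-$(uname -r)}" root="${2:-/lib/modules}"
  [ -n "$(find "$root/$release" -type f -name 'nouveau.ko*' 2>/dev/null)" ]
}

if nouveau_loaded; then
  echo "nouveau: loaded"
elif nouveau_shipped; then
  echo "nouveau: shipped but not loaded (try: sudo modprobe nouveau)"
else
  echo "nouveau: not shipped with kernel $(uname -r)"
fi
```

modinfo nouveau is another quick check. It exits with an error if no such module exists for the running kernel.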

@geerlingguy commented Oct 23, 2020

I was going to look into rebuilding the kernel, but now I'm hitting "Could not connect to raspbian.raspberrypi.org:80 (93.93.128.193)", and I think that's a sign it's time to stop... for at least a few minutes ;)

@geerlingguy commented Oct 23, 2020

I'm going to try a few of the older 32-bit versions... maybe the compilation bug was introduced recently. Trying the following:

  • 390.77: lots more errors
  • 390.48: lots more errors

Seeing errors like:

error: redefinition of ‘list_is_first’
error: "NV_BUILD_MODULE_INSTANCES" is not defined, evaluates to 0 [-Werror=undef]

I opened a new topic on the Nvidia forums: "Can’t install ARM (32-bit) driver on Debian 10 / Raspberry Pi OS"
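To script a version sweep like the one above instead of clicking through the archive pages, one option is to build the download URL from the version number. This is a sketch that assumes every 32-bit ARM release uses the same path pattern as the 390.138 URL at the top of this gist. Nvidia doesn't guarantee that, so a failed download may only mean the pattern differs for that release. The helper name is hypothetical.

```bash
#!/bin/bash
# Build the download URL for a 32-bit ARM (armv7l) driver release.
# ASSUMPTION: all releases follow the 390.138 URL pattern used in this gist.
nvidia_armv7l_url() {
  echo "https://us.download.nvidia.com/XFree86/Linux-x86-ARM/$1/NVIDIA-Linux-armv7l-gnueabihf-$1.run"
}

# Print URLs for the versions tried above; to actually download them,
# pipe this loop into: xargs -n1 wget -nc
for v in 390.77 390.48; do
  nvidia_armv7l_url "$v"
done
```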

@geerlingguy commented Oct 23, 2020

At this point, I'm going to try to build a custom kernel with the Nouveau driver compiled in. For some silly reason, the Pi group decided they didn't need that in the default distribution... why would anyone want to try using an external GPU with the Pi!?

Anyways, sarcasm aside, I'm running:

# Install dependencies
sudo apt install -y git bc bison flex libssl-dev make libncurses5-dev

# Clone source
git clone --depth=1 https://github.com/raspberrypi/linux

# Apply default configuration
cd linux
export KERNEL=kernel7l # use kernel8 for 64-bit, or kernel7l for 32-bit
make bcm2711_defconfig

# Customize the .config further with menuconfig
make menuconfig
# (press / and search for NOUVEAU; enable "Nouveau (NVIDIA) cards" under Device Drivers > Graphics support, save, then exit)
nano .config
# (edit CONFIG_LOCALVERSION and add a suffix that helps you identify your build)

# Build the kernel and copy everything into place
# (32-bit paths shown; on 64-bit, build 'Image' instead of zImage and use the
# arch/arm64/boot/ equivalents, with board .dtb files in arch/arm64/boot/dts/broadcom/)
make -j4 zImage modules dtbs
sudo make modules_install
sudo cp arch/arm/boot/dts/*.dtb /boot/
sudo cp arch/arm/boot/dts/overlays/*.dtb* /boot/overlays/
sudo cp arch/arm/boot/dts/overlays/README /boot/overlays/
sudo cp arch/arm/boot/zImage /boot/$KERNEL.img
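A full kernel build on the Pi takes a long time, so it may be worth confirming that nouveau actually made it into .config before running make. This is a sketch with a hypothetical helper name. It reads the CONFIG_DRM_NOUVEAU symbol, which is the Kconfig option behind the "Nouveau (NVIDIA) cards" menu entry.

```bash
#!/bin/bash
# Print the state of CONFIG_DRM_NOUVEAU in a kernel .config:
# "y" (built in), "m" (module), or "unset" (disabled or absent).
nouveau_config_state() {
  local config="${1:-.config}" value
  value=$(sed -n 's/^CONFIG_DRM_NOUVEAU=\(.*\)$/\1/p' "$config")
  echo "${value:-unset}"
}

# Run from the top of the kernel tree after menuconfig:
if [ -f .config ]; then
  echo "CONFIG_DRM_NOUVEAU: $(nouveau_config_state .config)"
fi
```

As a non-interactive alternative to menuconfig, the kernel tree's scripts/config helper can set the same option: run ./scripts/config --module DRM_NOUVEAU, then make olddefconfig to resolve dependencies. If a dependency is unmet, olddefconfig may drop the option again, which is another reason to re-check the state before building.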

@geerlingguy commented Oct 23, 2020

One other thing I'm going to try (after two unsuccessful kernel builds, ha!) is Ubuntu 20.04... maybe it has Nouveau installed by default.

Some notes:

  • Ubuntu's /boot/firmware doesn't have a CM4 dtb, just bcm2711-rpi-4-b.dtb.
  • Ubuntu does an unattended upgrade on first boot, and that takes a looooong time.
  • Aaaand looks like it doesn't have nouveau or nvidia driver in 5.4.0-1015-raspi kernel modules either. Drat.

@elFarto commented Oct 23, 2020

Just a note: the nvidia and nouveau drivers don't get along well together. If you're using the same install, you'll need to remove the modprobe blacklist that the nvidia driver installs (it's there to stop the nouveau driver from snatching up the hardware). You'll likely want to blacklist the nvidia modules when using nouveau, or use a separate install.
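One way to follow this advice on a shared install is a modprobe.d file that blacklists whichever driver family you're not testing. This is a sketch. The helper and the output file name are hypothetical. The module list assumes the four kernel modules the 455.28 installer builds (nvidia, nvidia-drm, nvidia-modeset, nvidia-uvm, per the MODPOST output further down). modprobe treats dashes and underscores in module names as equivalent.

```bash
#!/bin/bash
# Write modprobe.d blacklist lines for the driver family you are NOT using.
# Usage: write_gpu_blacklist nouveau|nvidia OUTPUT_FILE  (hypothetical helper)
write_gpu_blacklist() {
  case "$1" in
    nouveau) # testing nouveau: keep the proprietary modules from auto-loading
      printf '%s\n' 'blacklist nvidia' 'blacklist nvidia_drm' \
        'blacklist nvidia_modeset' 'blacklist nvidia_uvm' > "$2" ;;
    nvidia)  # testing the Nvidia driver: keep nouveau away from the card
      printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$2" ;;
    *)
      echo "usage: write_gpu_blacklist nouveau|nvidia FILE" >&2
      return 1 ;;
  esac
}

# Example (file name is an assumption):
# write_gpu_blacklist nouveau /tmp/gpu-blacklist.conf
# sudo cp /tmp/gpu-blacklist.conf /etc/modprobe.d/
```

Blacklisting only stops automatic loading, so an explicit modprobe still loads the module. To find the installer's own nouveau blacklist so you can remove it, try grep -rl nouveau /etc/modprobe.d /usr/lib/modprobe.d.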

@geerlingguy commented Oct 23, 2020

@elFarto - I've read that elsewhere; don't worry, I am currently bouncing between four microSD cards, one for nouveau, one for radeon/amdgpu (just found out in geerlingguy/raspberry-pi-pcie-devices#4 that my Radeon is so old it needs the radeon driver), and one with the nvidia driver installed. I've also re-flashed Pi OS to these things probably 30 times this week.

@geerlingguy commented Oct 23, 2020

I'm going to try one more time, on a fresh OS install, to compile the kernel with nouveau on 32-bit Pi OS, and see if I can get it to boot. After that, I think I have to give up. Over in the issue linked in the comment above, I confirmed that the Radeon driver looks for the IO BAR for BIOS support to initialize the card, and without it, the driver hits a fatal error and doesn't initialize the card :(

@geerlingguy commented Oct 24, 2020

Didn't work. Same thing as last time: if I rebuild the kernel with the nouveau driver, the PCI bus just goes to 'link down' on boot, and if I try booting with the Zotac card in the slot, it locks up within the first few seconds of boot.

So... going to go out on a limb and say at least for 32-bit Pi OS, the nouveau driver is a bust :(

@geerlingguy commented Oct 24, 2020

I moved the instructions for increasing the BAR space out to its own gist, since it's also necessary for other GPUs, and even the Marvell SATA adapter I'm testing now: Increase the BAR memory address space for PCIe devices on the Raspberry Pi Compute Module 4.

@geerlingguy commented Oct 26, 2020

In a strange turn of events, today I tried doing this again on Pi OS 64-bit beta and am running into an error from the Nvidia AARCH64 installer:

  LD [M]  /tmp/selfgz618/NVIDIA-Linux-aarch64-455.28/kernel/nvidia-drm.o
  Building modules, stage 2.
  MODPOST 4 modules
ERROR: "__stack_chk_guard" [/tmp/selfgz618/NVIDIA-Linux-aarch64-455.28/kernel/nvidia-drm.ko] undefined!
ERROR: "__stack_chk_guard" [/tmp/selfgz618/NVIDIA-Linux-aarch64-455.28/kernel/nvidia-modeset.ko] undefined!
ERROR: "__stack_chk_guard" [/tmp/selfgz618/NVIDIA-Linux-aarch64-455.28/kernel/nvidia-uvm.ko] undefined!
ERROR: "__stack_chk_guard" [/tmp/selfgz618/NVIDIA-Linux-aarch64-455.28/kernel/nvidia.ko] undefined!
make[2]: *** [scripts/Makefile.modpost:94: __modpost] Error 1
make[1]: *** [Makefile:1645: modules] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.4.72-v8+'
make: *** [Makefile:81: modules] Error 2

How annoying.

@borancar commented Oct 26, 2020

I think that should be easy to work around with -fno-stack-protector.

@kilograham commented Oct 27, 2020

Seeing errors like:

error: "NV_BUILD_MODULE_INSTANCES" is not defined, evaluates to 0 [-Werror=undef]

Maybe try adding NV_BUILD_MODULE_INSTANCES=1 to the end of your make ... modules ...

@deltabeard commented Oct 28, 2020

This is exactly the sort of project that I've been thinking about, and your work is great!

At this point, I'm going to try to build a custom kernel with the Nouveau driver compiled in.

Consider cross-compiling the kernel and operating system with something like Buildroot as it may be faster than compiling the kernel on the Pi itself. Also, consider using the latest 5.10 Linux kernel at https://github.com/raspberrypi/linux/tree/rpi-5.10.y as it may have fixes that have not been backported to 5.4.
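For reference, cross-compiling the Pi kernel on a faster x86 machine mostly comes down to passing ARCH and CROSS_COMPILE to make. This is a sketch. It assumes Debian/Ubuntu cross toolchains (the arm-linux-gnueabihf- and aarch64-linux-gnu- prefixes) and reuses the kernel7l/kernel8 and zImage/Image names from the build steps above. The helper name is hypothetical.

```bash
#!/bin/bash
# Print make variables for cross-compiling a Pi 4 / CM4 kernel (hypothetical
# helper). Assumes Debian/Ubuntu cross toolchains, e.g. from the
# crossbuild-essential-armhf or crossbuild-essential-arm64 packages.
pi_kernel_vars() {
  case "$1" in
    32) echo "ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- KERNEL=kernel7l IMAGE=zImage" ;;
    64) echo "ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- KERNEL=kernel8 IMAGE=Image" ;;
    *)  echo "usage: pi_kernel_vars 32|64" >&2; return 1 ;;
  esac
}

# Example (commented out; cloning and building take a while):
# git clone --depth=1 --branch rpi-5.10.y https://github.com/raspberrypi/linux
# cd linux
# eval "$(pi_kernel_vars 32)"
# make ARCH="$ARCH" CROSS_COMPILE="$CROSS_COMPILE" bcm2711_defconfig
# make -j"$(nproc)" ARCH="$ARCH" CROSS_COMPILE="$CROSS_COMPILE" "$IMAGE" modules dtbs
```

Modules can then be installed onto a mounted SD card by running make modules_install with the same ARCH and CROSS_COMPILE, plus INSTALL_MOD_PATH pointing at the card's root filesystem.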

If you haven't used Buildroot before, I can create a custom image with Linux 5.10 + nouveau for you to try, if you'd like.

Keep up the good work. 😄
