patch -p0 < fix-vega-reset.patch
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 44c4ae1abd00..27840129e4b0 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3433,6 +3433,14 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0034, quirk_no_bus_reset);
*/
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);
+/*
+ * Radeon RX Vega and Navi devices break on bus reset. Oi...
+ * This is *not a real workaround* - disabling bus reset
+ * for your GPU may have unintended consequences.
+ */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, 0x687f, quirk_no_bus_reset);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, 0xaaf8, quirk_no_bus_reset);
+
static void quirk_no_pm_reset(struct pci_dev *dev)
{
/*

@gnif gnif commented Aug 23, 2018

Awesome, but please do not upstream this patch. I am working with AMD to produce a proper reset of the device also as a PCI quirk.

@kiljacken kiljacken commented Aug 23, 2018

@gnif Anywhere to follow the progress on/get updates about that?

Owner Author

@numinit numinit commented Sep 3, 2018

@gnif Excellent. I have been following your work on that, and that was the plan.

@dialed dialed commented Oct 4, 2018

@gnif, thanks for this. I hope it works for my Vega 64 FE, but I'm afraid my question might make me unqualified to use the patch. On r/VFIO you state, "This should work for 1002:687f and 1002:aaf8. If your IDs are different, you'll have to edit the patch." Where in the patch would I edit my device IDs? I don't currently see where it states 1002:687f or 1002:aaf8. If my IDs are different, what would I remove and what would I put in its place? Would you mind spelling it out, something like: "On line number 2, after the word index, swap the numbers 10684b17d0bd for whatever the numbers of your device are"?
Thanks again!

@lemrouch lemrouch commented Oct 20, 2018

> @gnif, thanks for this, I hope it works for my vega 64 fe, but I'm afraid my question might make me unqualified to use the patch. On r/VFIO you state, "This should work for 1002:687f and 1002:aaf8. If your IDs are different, you'll have to edit the patch." Question, where in the patch would I edit my device ID's? I don't see currently where it states 1002:687f or 1002:aaf8. If my IDs are different what would I remove and what would I put in its place? Would you mind telling me in such a way as to say something like," On line number 2, after the word index swap the numbers 10684b17d0bd for whatever the numbers of your device are?
> Thanks again!

1002 is the vendor ID, and it is hidden behind PCI_VENDOR_ID_ATI. That has to stay in place.
Change just the 0x687f and/or 0xaaf8. The 0x prefix means it's a hexadecimal number.
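
To make the substitution concrete, here is a minimal shell sketch that is not part of the original patch. `fixup_line` is a hypothetical helper name. It takes an ID in the `vendor:device` form that `lspci -nn` prints in square brackets and emits the matching quirk line. It assumes the vendor must be 1002, since the patch hard-codes PCI_VENDOR_ID_ATI, and it rejects anything else.

```shell
# fixup_line VENDOR:DEVICE
# Hypothetical helper: print the quirk line for one AMD/ATI PCI ID.
# Only vendor 1002 is accepted, because the patch uses PCI_VENDOR_ID_ATI.
fixup_line() {
    case $1 in
        *:*) ;;
        *) echo "expected VENDOR:DEVICE, got '$1'" >&2; return 1 ;;
    esac
    vendor=${1%%:*}   # text before the colon, e.g. 1002
    device=${1##*:}   # text after the colon, e.g. 687f
    if [ "$vendor" != "1002" ]; then
        echo "vendor $vendor is not 1002 (PCI_VENDOR_ID_ATI)" >&2
        return 1
    fi
    case $device in
        ''|*[!0-9a-fA-F]*) echo "device '$device' is not hex" >&2; return 1 ;;
    esac
    printf 'DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, 0x%s, quirk_no_bus_reset);\n' "$device"
}
```

Calling `fixup_line 1002:687f` prints the first line the patch adds. For a different card, pass the ID that `lspci -nn` shows for it and paste the resulting line in place of the existing one.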

@slade87 slade87 commented Jul 17, 2019

@gnif Why can't we just upstream this while the other patch is being worked on? This fixes a serious issue even if it's just working around the problem. Once the other patch is ready this one can easily be reverted.

@gnif gnif commented Jul 17, 2019

Because this issue exists in all modern Radeon cards and if we just implement a quirk AMD will just keep releasing broken cards.

Owner Author

@numinit numinit commented Jul 18, 2019

Please don't upstream my terrible patch. :-(

Owner Author

@numinit numinit commented Jul 18, 2019

Unless there's a really good reason to work around broken hardware (i.e. it's non-fixable even in the VBIOS), this problem is AMD's to solve.

@slade87 slade87 commented Jul 18, 2019

So the alternative is to wait for something that might never happen?

@gnif gnif commented Jul 18, 2019

To include this is to promote bad behaviour on AMD's part. Not only that, this is not a fix: if your guest VM crashes or fails to shut down, or the guest AMDGPU driver crashes (which happens often), or your physical BIOS posts the AMD GPU before you can post it inside your VM, this patch does nothing.

Owner Author

@numinit numinit commented Jul 18, 2019

Renamed the patch file to hopefully be more clear that this is not a real workaround. Sure, it works for some people, hopefully people get mileage out of it, but bus reset itself being broken is a bad problem that needs to be fixed.

@slade87 slade87 commented Jul 29, 2019

Incredible work thank you. I'll test this out on my Vega 56 on the weekend!

@pdc4444 pdc4444 commented Aug 7, 2019

FYI: I'm new to kernel patching so this might just be my inexperience, but the current patch fails when running the patch command on Ubuntu 18.04 using Kernel 5.2.7.

I ended up grabbing the code from your initial commit which worked.

peter@ElephantBox:~/Downloads/linux-5.2.7$ patch -p1 < ~/Downloads/patch_for_vega/fix-vega-reset.patch 
patching file drivers/pci/quirks.c
patch: **** malformed patch at line 18:  {

peter@ElephantBox:~/Downloads/linux-5.2.7$ nano ~/Downloads/patch_for_vega/fix-vega-reset.patch 
peter@ElephantBox:~/Downloads/linux-5.2.7$ patch -p1 < ~/Downloads/patch_for_vega/fix-vega-reset.patch 
patching file drivers/pci/quirks.c
Hunk #1 succeeded at 3433 with fuzz 1 (offset 60 lines).
peter@ElephantBox:~/Downloads/linux-5.2.7$ 

@stefanleh stefanleh commented Sep 19, 2019

Is it correct to just add

/*
 * Radeon RX Vega and Navi devices break on bus reset. Oi...
 * This is *not a real workaround* - disabling bus reset
 * for your GPU may have unintended consequences.
 */
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, 0x687f, quirk_no_bus_reset);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, 0xaaf8, quirk_no_bus_reset);

after

DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_no_bus_reset);

?

(I've got kernel 5.2.13, so the patch won't apply in its current state.)

@gnif gnif commented Sep 19, 2019

yes

@aiberia aiberia commented Sep 20, 2019

@numinit This patch has invalid syntax since you added two comment lines without updating the line count (12->14 on the @@ line). Here is a fixed version (diffed from 5.3.0): https://gist.github.com/aiberia/dee39e883defbcb430994c2abc7d9fff
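
This kind of mismatch can be caught before running `patch`. In a unified diff, the header `@@ -a,b +c,d @@` promises `b` lines on the old side (context plus `-` lines) and `d` lines on the new side (context plus `+` lines). The sketch below recounts the body of each hunk and compares. `check_hunks` is a hypothetical name, and this is a rough check rather than a full diff parser.

```shell
# check_hunks FILE
# Hypothetical helper: for every hunk in a unified diff, compare the counts
# in the "@@ -a,b +c,d @@" header with the lines that actually follow.
# Limitation: a removed line whose text starts with "-- " looks like a
# "--- " file header and will end the hunk early.
check_hunks() {
    awk '
        function report() {
            if (in_hunk && (got_old != want_old || got_new != want_new)) {
                printf "hunk at line %d: header says -%d +%d, body has -%d +%d\n",
                       start, want_old, want_new, got_old, got_new
                bad = 1
            }
        }
        /^@@ / {
            report()
            # Fields look like "-3433,6" and "+3433,14"; no ",count" means 1.
            n = split($2, o, ","); want_old = (n > 1) ? o[2] + 0 : 1
            n = split($3, p, ","); want_new = (n > 1) ? p[2] + 0 : 1
            got_old = 0; got_new = 0; in_hunk = 1; start = NR
            next
        }
        /^(diff |index |--- |\+\+\+ )/ { report(); in_hunk = 0; next }
        in_hunk && /^ /  { got_old++; got_new++ }   # context line
        in_hunk && /^-/  { got_old++ }              # removed line
        in_hunk && /^\+/ { got_new++ }              # added line
        END { report(); exit bad }
    ' "$1"
}
```

`check_hunks fix-vega-reset.patch` exits non-zero and prints one line per inconsistent hunk. Adding two `+` lines without raising the second count is exactly the case it flags.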

Owner Author

@numinit numinit commented Sep 22, 2019

@aiberia Thank you, fixed it.

@methanoid methanoid commented Dec 12, 2019

> Awesome, but please do not upstream this patch. I am working with AMD to produce a proper reset of the device also as a PCI quirk.

This is unfixed a year or more later. Who should we be chasing at AMD? Are they working with you on a proper fix?

@salcin salcin commented Dec 15, 2019

Hi,

Is this patch still relevant? I compiled kernel 5.4.3 with this patch and my system no longer even detects the graphics card.

On the standard kernel 5.3.0-3 (without the patch), Windows boots, but I frequently get the error "pci header type '127' for device" when I try to reboot my VM.

I have to disconnect my PC's power cable before I can reboot.

@ImreBrassai ImreBrassai commented Dec 26, 2019

This isn't working for me, it keeps asking for a file to patch:

[imre@localhost Desktop]$ patch -p0 < fix-vega-reset.patch
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
|index 44c4ae1abd00..27840129e4b0 100644
|--- a/drivers/pci/quirks.c
|+++ b/drivers/pci/quirks.c
--------------------------
File to patch: 

@gnif gnif commented Dec 26, 2019

@ImreBrassai patch -p0 != patch -p1
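
The difference comes down to how many leading path components `patch` strips from the file names in the diff header. The sketch below imitates that stripping in plain shell so the effect is visible. `strip_components` is a hypothetical helper written for illustration, not something `patch` provides.

```shell
# strip_components N PATH
# Hypothetical helper imitating what "patch -pN" does to a file name:
# drop the first N slash-separated components.
strip_components() {
    n=$1
    path=$2
    while [ "$n" -gt 0 ]; do
        case $path in
            */*) path=${path#*/} ;;   # remove everything up to the first slash
            *) break ;;               # nothing left to strip
        esac
        n=$((n - 1))
    done
    printf '%s\n' "$path"
}
```

With `-p0` the name stays `a/drivers/pci/quirks.c`, which does not exist in a kernel tree. With `-p1` it becomes `drivers/pci/quirks.c`, which exists only when the command is run from the top of the kernel source.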

@ImreBrassai ImreBrassai commented Dec 27, 2019

what

@ImreBrassai ImreBrassai commented Dec 27, 2019

[imre@localhost ~]$ patch -p1 < vega.patch 
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
|index 44c4ae1abd00..27840129e4b0 100644
|--- a/drivers/pci/quirks.c
|+++ b/drivers/pci/quirks.c
--------------------------
File to patch: 

@gnif gnif commented Dec 27, 2019

You need to patch the kernel source and recompile, you don't just run the command provided...

Please stop bolding your text too, just use the insert code button.

@ImreBrassai ImreBrassai commented Dec 28, 2019

OK, sorry about that. It bolds automatically, I don't know why.

@ImreBrassai ImreBrassai commented Dec 29, 2019

So I did what you said: I downloaded the kernel, patched it, and installed it. When I run your script to test whether it worked, it gives me this:

[imre@localhost linux-5.4.1]$ ~/Downloads/reset-test 0000:0a:00.0
============================================================================

AMD Vega 10/12 Reset Application (Version: 1.0)
Copyright (c) 2019 Geoffrey McRae <geoff@hostfission.com>

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

This tool is intended as an interim workaround while I port this into the
kernel driver. If you like my work and want to support it you can contribute
using the following methods:

* Ko-Fi   - https://ko-fi.com/lookingglass
* Patreon - https://www.patreon.com/gnif
* BTC     - 14ZFcYjsKPiVreHqcaekvHGL846u3ZuT13

============================================================================

Unsupported device 1002:731f

@gnif gnif commented Dec 30, 2019

@ImreBrassai ImreBrassai commented Dec 30, 2019

OH! i see now

@NateTheGreatt NateTheGreatt commented Jan 10, 2020

@gnif just wanted to thank you for your hard work, you saved my brand new threadripper build. really can't thank you enough. new patreon incoming.

has there been any progress made on an upstream fix for this?

@gnif gnif commented Jan 11, 2020

Thanks mate.

> any progress made on an upstream fix for this

Not yet, things slowed down across the holiday break, contacts have gone quiet for now ;)
In the interim work is progressing on Looking Glass :)

@Transistor4aCPU Transistor4aCPU commented Jan 11, 2020

Which file should you patch? How do I apply the patch?

@NateTheGreatt NateTheGreatt commented Jan 17, 2020

> Which file should you patch? How do I apply the patch?

First you must download the source code of the Linux kernel. The patch is applied in the root directory of the kernel source, before compiling. Please google how to apply patches to the Linux kernel on your distro of choice. This thread should be for information pertinent to the patch, not generic questions about the Linux kernel itself.
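
As a rough sketch of the "apply in the root directory" step, assuming GNU `patch` (for its `--dry-run` option): both function names below are hypothetical. The first checks that a directory looks like the top of a kernel tree by testing for the one file this patch touches. The second does a dry run and applies the patch for real only if the dry run succeeds.

```shell
# in_kernel_root [DIR]
# Succeeds if DIR (default: current directory) contains the file the patch
# touches, a cheap sign that it is the top of a kernel source tree.
in_kernel_root() {
    [ -f "${1:-.}/drivers/pci/quirks.c" ]
}

# apply_quirk_patch PATCHFILE
# Hypothetical wrapper: refuse to run outside a kernel tree, try a dry run,
# and apply for real only if the dry run reported no problems.
apply_quirk_patch() {
    if ! in_kernel_root; then
        echo "run this from the top of the kernel source tree" >&2
        return 1
    fi
    patch -p1 --dry-run < "$1" && patch -p1 < "$1"
}
```

Usage would be along the lines of `cd linux-5.4.1 && apply_quirk_patch ~/Downloads/fix-vega-reset.patch`, with the paths adjusted to wherever the source and patch actually live. The kernel still has to be rebuilt and installed afterwards in whatever way your distribution expects.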

@c0d3st0rm c0d3st0rm commented Apr 12, 2020

Could this be applied with kpatch/live patching?
