@runlevel5
Created November 8, 2022 08:37
Loading AMDGPU driver (Kernel 6.0.6 ppc64le 64K pagesize) for Radeon RX 6600 XT card on Raptor Blackbird POWER9 computer
$ uname -ar
Linux shrimp-paste 6.0.6-300.fc37.ppc64le #1 SMP Tue Nov 1 19:24:50 UTC 2022 ppc64le ppc64le ppc64le GNU/Linux
$ lspci | grep ATI
0000:01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c1)
0000:02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
0000:03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c1)
0000:03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller
$ sudo modprobe amdgpu
$ dmesg -wH
[Nov 8 19:21] [drm] amdgpu kernel modesetting enabled.
[ +0.001587] amdgpu: CRAT table disabled by module option
[ +0.000002] amdgpu: DSDT table not found for OEM information
[ +0.000001] amdgpu: IO link not available for non x86 platforms
[ +0.000001] amdgpu: Virtual CRAT table created for CPU
[ +0.000015] amdgpu: Topology: Add CPU node
[ +0.000133] amdgpu 0000:03:00.0: enabling device (0140 -> 0142)
[ +0.000012] [drm] initializing kernel modesetting (DIMGREY_CAVEFISH 0x1002:0x73FF 0x148C:0x2412 0xC1).
[ +0.000013] [drm] register mmio base: 0x00000000
[ +0.000002] [drm] register mmio size: 1048576
[ +0.003530] [drm] add ip block number 0 <nv_common>
[ +0.000003] [drm] add ip block number 1 <gmc_v10_0>
[ +0.000002] [drm] add ip block number 2 <navi10_ih>
[ +0.000002] [drm] add ip block number 3 <psp>
[ +0.000002] [drm] add ip block number 4 <smu>
[ +0.000002] [drm] add ip block number 5 <dm>
[ +0.000002] [drm] add ip block number 6 <gfx_v10_0>
[ +0.000002] [drm] add ip block number 7 <sdma_v5_2>
[ +0.000002] [drm] add ip block number 8 <vcn_v3_0>
[ +0.000002] [drm] add ip block number 9 <jpeg_v3_0>
[ +0.029936] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ +0.000006] amdgpu: ATOM BIOS: 113-D53201XT-016
[ +0.000013] [drm] VCN(0) decode is enabled in VM mode
[ +0.000002] [drm] VCN(0) encode is enabled in VM mode
[ +0.000003] [drm] JPEG decode is enabled in VM mode
[ +0.000002] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ +0.000010] amdgpu 0000:03:00.0: amdgpu: PCIE atomic ops is not supported
[ +0.000008] [drm] GPU posting now...
[ +0.000034] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[ +0.000043] amdgpu 0000:03:00.0: BAR 2: releasing [mem 0x6000010000000-0x60000101fffff 64bit pref]
[ +0.000005] amdgpu 0000:03:00.0: BAR 0: releasing [mem 0x6000000000000-0x600000fffffff 64bit pref]
[ +0.000026] pci 0000:02:00.0: BAR 15: releasing [mem 0x6000000000000-0x600001fffffff 64bit pref]
[ +0.000003] pci 0000:01:00.0: BAR 15: releasing [mem 0x6000000000000-0x6003fbff0ffff 64bit pref]
[ +0.000003] pci 0000:00:00.0: BAR 15: releasing [mem 0x6000000000000-0x6003fbff0ffff 64bit pref]
[ +0.000009] pci 0000:00:00.0: BAR 15: assigned [mem 0x6000000000000-0x60002ffffffff 64bit pref]
[ +0.000005] pci 0000:01:00.0: BAR 15: assigned [mem 0x6000000000000-0x60002ffffffff 64bit pref]
[ +0.000003] pci 0000:02:00.0: BAR 15: assigned [mem 0x6000000000000-0x60002ffffffff 64bit pref]
[ +0.000004] amdgpu 0000:03:00.0: BAR 0: assigned [mem 0x6000000000000-0x60001ffffffff 64bit pref]
[ +0.000010] amdgpu 0000:03:00.0: BAR 2: assigned [mem 0x6000200000000-0x60002001fffff 64bit pref]
[ +0.000009] pci 0000:00:00.0: PCI bridge to [bus 01-03]
[ +0.000005] pci 0000:00:00.0: bridge window [mem 0x600c000000000-0x600c07fefffff]
[ +0.000004] pci 0000:00:00.0: bridge window [mem 0x6000000000000-0x6003fbff0ffff 64bit pref]
[ +0.000005] pci 0000:01:00.0: PCI bridge to [bus 02-03]
[ +0.000004] pci 0000:01:00.0: bridge window [mem 0x600c000000000-0x600c07fefffff]
[ +0.000004] pci 0000:01:00.0: bridge window [mem 0x6000000000000-0x6003fbff0ffff 64bit pref]
[ +0.000006] pci 0000:02:00.0: PCI bridge to [bus 03]
[ +0.000004] pci 0000:02:00.0: bridge window [mem 0x600c000000000-0x600c0003fffff]
[ +0.000004] pci 0000:02:00.0: bridge window [mem 0x6000000000000-0x60002ffffffff 64bit pref]
[ +0.000009] amdgpu 0000:03:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
[ +0.000005] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ +0.000003] amdgpu 0000:03:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[ +0.000007] [drm] Detected VRAM RAM=8176M, BAR=8192M
[ +0.000002] [drm] RAM width 128bits GDDR6
[ +0.000003] amdgpu 0000:03:00.0: dma_iommu_get_required_mask: returning bypass mask 0xfffffffffffffff
[ +0.000047] [drm] amdgpu: 8176M of VRAM memory ready
[ +0.000003] [drm] amdgpu: 32475M of GTT memory ready.
[ +0.000026] [drm] GART: num cpu pages 8192, num gpu pages 131072
[ +0.000098] [drm] PCIE GART of 512M enabled (table at 0x00000081FEB00000).
[ +0.014632] amdgpu 0000:03:00.0: amdgpu: PSP runtime database doesn't exist
[ +0.000006] amdgpu 0000:03:00.0: amdgpu: PSP runtime database doesn't exist
[ +12.233422] amdgpu 0000:03:00.0: amdgpu: STB initialized to 2048 entries
[ +0.006888] [drm] Loading DMUB firmware via PSP: version=0x02020013
[ +0.022845] [drm] use_doorbell being set to: [true]
[ +0.000025] [drm] use_doorbell being set to: [true]
[ +0.035346] [drm] Found VCN firmware Version ENC: 1.24 DEC: 2 VEP: 0 Revision: 0
[ +0.000020] amdgpu 0000:03:00.0: amdgpu: Will use PSP to load VCN firmware
[ +0.080318] [drm] reserve 0xa00000 from 0x81fd000000 for PSP TMR
[ +0.115209] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ +0.022328] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ +0.000027] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0, version = 0x003b2900 (59.41.0)
[ +0.000008] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
[ +0.000031] amdgpu 0000:03:00.0: amdgpu: use vbios provided pptable
[ +0.053791] amdgpu 0000:03:00.0: amdgpu: SMU is initialized successfully!
[ +0.000254] [drm] Display Core initialized with v3.2.198!
[ +0.001267] [drm] DMUB hardware initialized: version=0x02020013
[ +0.027608] snd_hda_intel 0000:03:00.1: bound 0000:03:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[ +0.199231] [drm] kiq ring mec 2 pipe 1 q 0
[ +0.004087] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[ +0.000805] [drm] JPEG decode initialized successfully.
[ +0.001002] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ +0.000113] amdgpu: sdma_bitmap: ffff
[ +0.087988] memmap_init_zone_device initialised 130816 pages in 0ms
[ +0.000006] amdgpu: HMM registered 8176MB device memory
[ +0.000241] amdgpu: Virtual CRAT table created for GPU
[ +0.000252] amdgpu: Topology: Add dGPU node [0x73ff:0x1002]
[ +0.000007] kfd kfd: amdgpu: added device 1002:73ff
[ +0.000020] amdgpu 0000:03:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 8, active_cu_number 32
[ +0.000163] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ +0.000004] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ +0.000004] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ +0.000003] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ +0.000003] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ +0.000004] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ +0.000003] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ +0.000003] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ +0.000003] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ +0.000004] amdgpu 0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ +0.000003] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ +0.000003] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ +0.000003] amdgpu 0000:03:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[ +0.000003] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[ +0.000004] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[ +0.000003] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[ +0.009773] EEH: Recovering PHB#0-PE#0
[ +0.000008] EEH: PE location: UOPWR.D100020-Node0-SLOT2 PCIE 4.0 X8, PHB location: N/A
[ +0.000003] EEH: Frozen PHB#0-PE#0 detected
[ +0.000002] EEH: Call Trace:
[ +0.000002] EEH: [000000004506a2b7] __eeh_send_failure_event+0x7c/0x160
[ +0.000010] EEH: [00000000aa35f118] eeh_dev_check_failure+0x2c0/0x690
[ +0.000010] EEH: [00000000cc882e0c] amdgpu_device_rreg.part.0+0x160/0x200 [amdgpu]
[ +0.000325] EEH: [0000000070f02d7f] mmhub_v2_0_set_clockgating+0xdc/0x5c0 [amdgpu]
[ +0.000343] EEH: [000000003d16de9d] gmc_v10_0_set_clockgating_state+0x74/0x170 [amdgpu]
[ +0.000343] EEH: [0000000009796c0b] amdgpu_device_set_cg_state+0xfc/0x1d0 [amdgpu]
[ +0.000315] EEH: [00000000a4d55ea9] amdgpu_device_ip_late_init+0x104/0x400 [amdgpu]
[ +0.000315] EEH: [000000002a9476f8] amdgpu_device_init+0x20c8/0x2440 [amdgpu]
[ +0.000317] EEH: [00000000a85b5b57] amdgpu_driver_load_kms+0x30/0x1e0 [amdgpu]
[ +0.000315] EEH: [00000000f3b898df] amdgpu_pci_probe+0x1c8/0x540 [amdgpu]
[ +0.000314] EEH: [00000000884d50fc] local_pci_probe+0x68/0xe0
[ +0.000005] EEH: [0000000042223d63] work_for_cpu_fn+0x38/0x60
[ +0.000006] EEH: [000000000c94498c] process_one_work+0x2b0/0x560
[ +0.000004] EEH: [0000000016e46df8] worker_thread+0x280/0x620
[ +0.000004] EEH: [000000009a393885] kthread+0x124/0x130
[ +0.000003] EEH: [00000000cc189609] ret_from_kernel_thread+0x5c/0x64
[ +0.000005] EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures.
[ +0.000003] EEH: Notify device drivers to shutdown
[ +0.000003] EEH: Beginning: 'error_detected(IO frozen)'
[ +0.245987] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:13 param:0x00000000 message:GetEnabledSmuFeaturesHigh?
[ +0.000004] amdgpu 0000:03:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ +0.000979] [drm] Initialized amdgpu 3.48.0 20150101 for 0000:03:00.0 on minor 1
[ +0.107409] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:40 param:0x00000000 message:AllowGfxOff?
[ +0.000007] amdgpu 0000:03:00.0: amdgpu: Failed to enable gfxoff!
[ +0.813702] [drm] Register(0) [mmUVD_PGFSM_STATUS] failed to reach value 0x00800000 != 0x00c00000
[ +0.000004] [drm:jpeg_v3_0_set_powergating_state [amdgpu]] *ERROR* amdgpu: JPEG enable power gating failed
[ +0.000226] [drm:amdgpu_device_ip_set_powergating_state [amdgpu]] *ERROR* set_powergating_state of IP block <jpeg_v3_0> failed -110
[ +0.230868] [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000003
[ +0.230588] [drm] Register(0) [mmUVD_RBC_RB_RPTR] failed to reach value 0x7fffffff != 0xffffffff
[ +0.230879] [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000003
[ +0.000007] amdgpu 0000:03:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx_0.0.0 (-110).
[ +0.000191] [drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test failed (-110).
[ +0.005333] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:13 param:0x00000000 message:GetEnabledSmuFeaturesHigh?
[ +0.000004] amdgpu 0000:03:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ +0.000004] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:34 param:0x00000001 message:SetWorkloadMask?
[Nov 8 19:22] amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
[ +0.008863] amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
[ +30.014303] amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
[ +0.000000] amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
[ +0.022291] fbcon: Deferring console take-over
[ +0.000005] amdgpu 0000:03:00.0: [drm] fb1: amdgpudrmfb frame buffer device
[ +0.001205] PCI 0000:03:00.0#0000: EEH: Invoking amdgpu->error_detected(IO frozen)
[ +0.000006] [drm] PCI error: detected callback, state(2)!!
[Nov 8 19:23] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=0, emitted seq=2
[ +0.000323] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
[ +0.000312] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
[ +19.605497] amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
[ +0.015078] PCI 0000:03:00.0#0000: EEH: amdgpu driver reports: 'need reset'
[ +0.000008] PCI 0000:03:00.1#0000: EEH: driver not EEH aware
[ +0.000004] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
[ +0.000006] EEH: Collect temporary log
[ +0.000041] EEH: of node=0000:03:00.0
[ +0.000005] EEH: PCI device/vendor: 73ff1002
[ +0.000003] EEH: PCI cmd/status register: 00100546
[ +0.000003] EEH: PCI-E capabilities and status follow:
[ +0.000011] EEH: PCI-E 00: 0012a010 00008fa1 00002930 00440d04
[ +0.000009] EEH: PCI-E 10: 11040040 00000000 00000000 00000000
[ +0.000002] EEH: PCI-E 20: 00000000
[ +0.000002] EEH: PCI-E AER capability register set follows:
[ +0.000010] EEH: PCI-E AER 00: 20020001 00000000 00000000 00462030
[ +0.000009] EEH: PCI-E AER 10: 00000000 00002000 000001e0 00000000
[ +0.000009] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
[ +0.000004] EEH: PCI-E AER 30: 00000000 00000000
[ +0.000002] EEH: of node=0000:03:00.1
[ +0.000004] EEH: PCI device/vendor: ab281002
[ +0.000003] EEH: PCI cmd/status register: 00100546
[ +0.000002] EEH: PCI-E capabilities and status follow:
[ +0.000011] EEH: PCI-E 00: 0012a010 00008fa1 00002930 00440d04
[ +0.000008] EEH: PCI-E 10: 11040040 00000000 00000000 00000000
[ +0.000003] EEH: PCI-E 20: 00000000
[ +0.000001] EEH: PCI-E AER capability register set follows:
[ +0.000010] EEH: PCI-E AER 00: 2a020001 00000000 00000000 00462030
[ +0.000009] EEH: PCI-E AER 10: 00000000 00002000 000001e0 00000000
[ +0.000009] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
[ +0.000004] EEH: PCI-E AER 30: 00000000 00000000
[ +0.000002] PHB4 PHB#0 Diag-data (Version: 1)
[ +0.000002] brdgCtl: 00000002
[ +0.000002] RootSts: 00000020 00402000 a0840008 00100107 00001000
[ +0.000004] RootErrSts: 00000000 00008000 00000000
[ +0.000002] PhbSts: 0000001c00000000 0000001c00000000
[ +0.000003] Lem: 0000000100280000 0000000000000000 0000000100000000
[ +0.000002] PhbErr: 0000088000000000 0000008000000000 2148000098000240 a008400000000000
[ +0.000004] RxeArbErr: 8000200000000000 0000200000000000 02409fde30000000 0000000000000000
[ +0.000003] PblErr: 0000000008000000 0000000008000000 0000000000000000 0000000000000000
[ +0.000003] PcieDlp: 0000000000000000 0000000000000000 0088000000000000
[ +0.000003] RegbErr: 0000004000000000 0000004000000000 4800003c00000000 0000000000000200
[ +0.000003] PE[000] A/B: a480002a03000000 8000000000000000
[ +0.000004] EEH: Reset without hotplug activity
[ +0.000003] EEH: Removing 0000:03:00.1 without EEH sensitive driver
[ +0.133595] snd_hda_intel 0000:03:00.1: CORB reset timeout#2, CORBRP = 65535
[ +0.460045] snd_hda_intel 0000:03:00.1: CORB reset timeout#2, CORBRP = 65535
[ +0.706076] snd_hda_codec_hdmi hdaudioC0D0: Unable to sync register 0x2f0d00. -5
[ +0.198981] pci 0000:03:00.1: Removing from iommu group 0
[ +2.161239] amdgpu 0000:03:00.0: enabling device (0140 -> 0142)
[ +0.007844] EEH: Sleep 5s ahead of partial hotplug
[ +5.031611] pci 0000:03:00.1: [1002:ab28] type 00 class 0x040300
[ +0.000024] pci 0000:03:00.1: reg 0x10: [mem 0x600c000120000-0x600c000123fff]
[ +0.000059] pci 0000:03:00.1: BAR0 [mem size 0x00004000]: requesting alignment to 0x10000
[ +0.000065] pci 0000:03:00.1: PME# supported from D1 D2 D3hot D3cold
[ +0.000158] pci 0000:03:00.1: can't claim BAR 0 [mem size 0x00004000]: no address assigned
[ +0.000010] pci 0000:03:00.1: BAR 0: assigned [mem 0x600c000120000-0x600c000123fff]
[ +0.000007] pci 0000:02:00.0: PCI bridge to [bus 03]
[ +0.000006] pci 0000:02:00.0: bridge window [mem 0x600c000000000-0x600c0003fffff]
[ +0.000004] pci 0000:02:00.0: bridge window [mem 0x6000000000000-0x60002ffffffff 64bit pref]
[ +0.000014] pci 0000:03:00.1: Added to existing PE#0
[ +0.000006] pci 0000:03:00.1: Adding to iommu group 0
[ +0.000269] pci 0000:03:00.1: D0 power state depends on 0000:03:00.0
[ +0.001048] snd_hda_intel 0000:03:00.1: enabling device (0140 -> 0142)
[ +0.000013] snd_hda_intel 0000:03:00.1: Force to snoop mode by module option
[ +0.000060] EEH: Beginning: 'slot_reset'
[ +0.000004] PCI 0000:03:00.0#0000: EEH: Invoking amdgpu->slot_reset()
[ +0.000005] [drm] PCI error: slot reset callback!!
[ +0.003474] snd_hda_intel 0000:03:00.1: bound 0000:03:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[ +0.002511] input: HDA ATI HDMI HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:00.0/0000:01:00.0/0000:02:00.0/0000:03:00.1/sound/card0/input21
[ +0.000122] input: HDA ATI HDMI HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:00.0/0000:01:00.0/0000:02:00.0/0000:03:00.1/sound/card0/input22
[ +0.000093] input: HDA ATI HDMI HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:00.0/0000:01:00.0/0000:02:00.0/0000:03:00.1/sound/card0/input23
[ +0.000091] input: HDA ATI HDMI HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:00.0/0000:01:00.0/0000:02:00.0/0000:03:00.1/sound/card0/input24
[ +0.000088] input: HDA ATI HDMI HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:00.0/0000:01:00.0/0000:02:00.0/0000:03:00.1/sound/card0/input25
[ +0.761095] [drm] free PSP TMR buffer
[ +0.021031] amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
[ +0.000323] [drm] PCIE GART of 512M enabled (table at 0x00000081FEB00000).
[ +0.000058] [drm] VRAM is lost due to GPU reset!
[ +0.000001] [drm] PSP is resuming...
[ +0.095226] [drm] reserve 0xa00000 from 0x81fd000000 for PSP TMR
[ +0.116426] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ +0.022079] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ +0.000011] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
[ +0.000009] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0, version = 0x003b2900 (59.41.0)
[ +0.000014] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
[ +0.000041] amdgpu 0000:03:00.0: amdgpu: use vbios provided pptable
[ +0.054731] amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
[ +0.001174] [drm] DMUB hardware initialized: version=0x02020013
[ +0.041966] [drm] kiq ring mec 2 pipe 1 q 0
[ +0.002958] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[ +0.000006] [drm] JPEG decode initialized successfully.
[ +0.000012] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ +0.000003] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ +0.000003] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ +0.000002] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ +0.000002] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ +0.000002] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ +0.000003] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ +0.000002] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ +0.000002] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ +0.000002] amdgpu 0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ +0.000002] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ +0.000002] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ +0.000002] amdgpu 0000:03:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[ +0.000002] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[ +0.000002] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[ +0.000003] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[ +0.355874] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:40 param:0x00000000 message:AllowGfxOff?
[ +0.000010] amdgpu 0000:03:00.0: amdgpu: Failed to enable gfxoff!
[ +0.918993] amdgpu 0000:03:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx_0.0.0 (-110).
[ +0.000232] amdgpu 0000:03:00.0: amdgpu: ib ring test failed (-110).
[ +0.000004] [drm:amdgpu_pci_slot_reset [amdgpu]] *ERROR* PCIe error recovery failed, err:-11
[ +0.000406] PCI 0000:03:00.0#0000: EEH: amdgpu driver reports: 'disconnect'
[ +0.000008] PCI 0000:03:00.1#0000: EEH: driver not EEH aware
[ +0.000004] EEH: Finished:'slot_reset' with aggregate recovery state:'disconnect'
[ +0.000004] EEH: Unable to recover from failure from PHB#0-PE#0.
Please try reseating or replacing it
[ +0.000051] EEH: of node=0000:03:00.0
[ +0.000006] EEH: PCI device/vendor: 73ff1002
[ +0.000004] EEH: PCI cmd/status register: 00100546
[ +0.000002] EEH: PCI-E capabilities and status follow:
[ +0.000012] EEH: PCI-E 00: 0012a010 00008fa1 00002930 00440d04
[ +0.000009] EEH: PCI-E 10: 11040040 00000000 00000000 00000000
[ +0.000003] EEH: PCI-E 20: 00000000
[ +0.000002] EEH: PCI-E AER capability register set follows:
[ +0.000010] EEH: PCI-E AER 00: 20020001 00000000 00000000 00462030
[ +0.000008] EEH: PCI-E AER 10: 00000000 00002000 000001e0 00000000
[ +0.000008] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
[ +0.000004] EEH: PCI-E AER 30: 00000000 00000000
[ +0.000003] EEH: of node=0000:03:00.1
[ +0.000004] EEH: PCI device/vendor: ab281002
[ +0.000003] EEH: PCI cmd/status register: 00100546
[ +0.000002] EEH: PCI-E capabilities and status follow:
[ +0.000004] amdgpu 0000:03:00.0: amdgpu: Failed to power gate JPEG!
[ +0.000006] EEH: PCI-E 00: 0012a010 00008fa1 00002930 00440d04
[ +0.000008] EEH: PCI-E 10: 11040000 00000000 00000000 00000000
[ +0.000003] EEH: PCI-E 20: 00000000
[ +0.000002] EEH: PCI-E AER capability register set follows:
[ +0.000009] EEH: PCI-E AER 00: 2a020001 00000000 00000000 00462030
[ +0.000009] EEH: PCI-E AER 10: 00000000 00002000 000001e0 00000000
[ +0.000009] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
[ +0.000004] EEH: PCI-E AER 30: 00000000 00000000
[ +0.000003] PHB4 PHB#0 Diag-data (Version: 1)
[ +0.000002] brdgCtl: 00000002
[ +0.000002] RootSts: 00000020 00402000 a0840008 00100107 00001000
[ +0.000004] RootErrSts: 00000000 00008000 00000000
[ +0.000002] PhbSts: 0000001c00000000 0000001c00000000
[ +0.000003] Lem: 0000000100280000 0000000000000000 0000000100000000
[ +0.000003] PhbErr: 0000088000000000 0000008000000000 2148000098000240 a008400000000000
[ +0.000003] RxeArbErr: 8000200000000000 0000200000000000 02409fde30000000 0000000000000000
[ +0.000004] PblErr: 0000000008000000 0000000008000000 0000000000000000 0000000000000000
[ +0.000003] PcieDlp: 0000000000000000 0000000000000000 0088000000000000
[ +0.000002] RegbErr: 0000004000000000 0000004000000000 61000c4800000000 0000000000000000
[ +0.000003] PE[000] A/B: a480002a03000000 8000000000000000
[ +0.000006] EEH: Beginning: 'error_detected(permanent failure)'
[ +0.000003] PCI 0000:03:00.0#0000: EEH: not actionable (1,1,1)
[ +0.000004] PCI 0000:03:00.1#0000: EEH: not actionable (1,1,1)
[ +0.000004] EEH: Finished:'error_detected(permanent failure)'
[ -0.000098] [drm:amdgpu_dpm_enable_jpeg [amdgpu]] *ERROR* Dpm disable jpeg failed, ret = -5.
[ +2.070504] snd_hda_intel 0000:03:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x001f0500
[ +0.001831] pci 0000:03:00.1: Removing from iommu group 0
[ +0.000135] ------------[ cut here ]------------
[ +0.000002] WARNING: CPU: 13 PID: 2141 at kernel/kthread.c:659 kthread_park+0xe0/0x110
[ +0.000008] Modules linked in: amdgpu mfd_core gpu_sched drm_buddy drm_display_helper cec snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink sunrpc snd_hda_codec_hdmi snd_hda_intel snd_usb_audio snd_intel_dspcfg snd_hda_codec snd_hda_core snd_usbmidi_lib snd_rawmidi snd_hwdep snd_seq joydev snd_seq_device mc snd_pcm at24 regmap_i2c ast ofpart drm_vram_helper snd_timer ipmi_powernv drm_ttm_helper powernv_flash ipmi_devintf ttm snd crct10dif_vpmsum mtd ipmi_msghandler rtc_opal opal_prd i2c_opal soundcore zram hid_logitech_hidpp nvme tg3 nvme_core vmx_crypto crc32c_vpmsum hid_logitech_dj nvme_common ip6_tables ip_tables fuse
[ +0.000064] CPU: 13 PID: 2141 Comm: kworker/u64:4 Not tainted 6.0.6-300.fc37.ppc64le #1
[ +0.000004] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
[ +0.000006] NIP: c00000000018a440 LR: c008000007181f40 CTR: c00000000018a360
[ +0.000002] REGS: c000000023ecf6e0 TRAP: 0700 Not tainted (6.0.6-300.fc37.ppc64le)
[ +0.000002] MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44222222 XER: 200400f8
[ +0.000009] CFAR: c00000000018a3a4 IRQMASK: 0
GPR00: c008000007181f40 c000000023ecf980 c000000002a10b00 c000000028d19500
GPR04: c0000000be85f800 c000000023ecf8a8 c000000023ecf8a0 0000000000000000
GPR08: c0000000ffffdfff 0000000000000004 0000000000000000 c008000007185f98
GPR12: c00000000018a360 c000000fff700980 0000000000000001 c00000001092c280
GPR16: 0000000000000000 c0000000b0e860b8 c0000000b0e90000 000000000000001c
GPR20: 00000000ee6b2800 c0000000b0ea0000 c0000000b0ea0000 0000000000000000
GPR24: c0000000b0e89528 c0000000be85f800 c0000000b0e89690 0000000000000000
GPR28: c0000000b0e89528 c000000023ecfab0 c000000006939680 c000000028d19500
[ +0.000035] NIP [c00000000018a440] kthread_park+0xe0/0x110
[ +0.000003] LR [c008000007181f40] drm_sched_stop+0x48/0x310 [gpu_sched]
[ +0.000005] Call Trace:
[ +0.000001] [c000000023ecf980] [c000000023ecf9d0] 0xc000000023ecf9d0 (unreliable)
[ +0.000003] [c000000023ecf9b0] [c000000023ecfa30] 0xc000000023ecfa30
[ +0.000003] [c000000023ecfa30] [c0080000089b7a34] amdgpu_device_gpu_recover+0x47c/0xbf8 [amdgpu]
[ +0.000157] [c000000023ecfb50] [c008000008c190f4] amdgpu_job_timedout+0x1fc/0x280 [amdgpu]
[ +0.000177] [c000000023ecfc40] [c00800000718367c] drm_sched_job_timedout+0xd4/0x280 [gpu_sched]
[ +0.000005] [c000000023ecfc90] [c00000000017da80] process_one_work+0x2b0/0x560
[ +0.000004] [c000000023ecfd30] [c00000000017ddd8] worker_thread+0xa8/0x620
[ +0.000003] [c000000023ecfdc0] [c00000000018b184] kthread+0x124/0x130
[ +0.000002] [c000000023ecfe10] [c00000000000ceb0] ret_from_kernel_thread+0x5c/0x64
[ +0.000004] Instruction dump:
[ +0.000001] ebe1fff8 4e800020 60000000 60420000 38210030 3860ffda 7c6307b4 ebc1fff0
[ +0.000007] ebe1fff8 4e800020 60000000 60420000 <0fe00000> 7c0802a6 f8010040 60420000
[ +0.000007] ---[ end trace 0000000000000000 ]---
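
As a quick sanity check on the 64K page size, the page counts printed in the log can be reproduced from the reported memory sizes. This is only a sketch: the 4 KiB GPU page size is an assumption inferred from the "num gpu pages" figure, and the `pages` helper is a hypothetical name used for illustration, not anything taken from the driver.

```python
# Reproduce the page counts reported in the dmesg output above.
KIB = 1024
MIB = 1024 * KIB

CPU_PAGE_SIZE = 64 * KIB  # 64K kernel page size, as stated in the title
GPU_PAGE_SIZE = 4 * KIB   # assumption: inferred from "num gpu pages 131072"


def pages(size_bytes: int, page_size: int) -> int:
    """Return how many whole pages of page_size fit in size_bytes."""
    return size_bytes // page_size


# "HMM registered 8176MB device memory" / "initialised 130816 pages"
vram_pages = pages(8176 * MIB, CPU_PAGE_SIZE)

# "PCIE GART of 512M enabled" / "num cpu pages 8192, num gpu pages 131072"
gart_cpu_pages = pages(512 * MIB, CPU_PAGE_SIZE)
gart_gpu_pages = pages(512 * MIB, GPU_PAGE_SIZE)

print(vram_pages, gart_cpu_pages, gart_gpu_pages)
```

The three values match the "memmap_init_zone_device initialised 130816 pages" and "GART: num cpu pages 8192, num gpu pages 131072" lines, which is consistent with the kernel running with 64K pages.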