| Component | Details |
|---|---|
| GPU | AMD Radeon RX 7700 XT (Navi 32, gfx1101; 54 CUs and 12 GB per dmesg) |
| OS | Ubuntu 25.10 |
| Desktop | KDE Plasma (Wayland) |
| Kernel | 6.17.0-6-generic (issue present), 6.14.0-15-generic (testing) |
| CPU | Intel Core i7-12700F |
| RAM | 64 GB |
| VRAM | 12 GB GDDR6 |
- Full system lockups (hard freeze, requires power cycle)
- Random occurrence - not tied to specific GPU load
- Started after upgrading from Ubuntu 25.04 to 25.10 (kernel 6.14 → 6.17)
- No recovery possible once frozen (SysRq keys unresponsive)
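"SysRq keys unresponsive" only means something if the relevant SysRq functions were enabled in the first place. `kernel.sysrq` is a bitmask (bit meanings per the kernel's sysrq documentation), and distributions often restrict it. Ubuntu's default is, I believe, 176, which allows sync, remount read-only and reboot but not the `R`, `E` and `I` steps of the usual REISUB sequence. A minimal sketch that decodes the value; `decode_sysrq` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Decode a kernel.sysrq value into the SysRq function groups it allows.
# 0 = disabled, 1 = everything, >1 = bitmask (see the kernel's sysrq docs).
decode_sysrq() {
  local mask=$1
  if (( mask == 0 )); then echo "sysrq disabled"; return; fi
  if (( mask == 1 )); then echo "all functions enabled"; return; fi
  local -a bits=(2 4 8 16 32 64 128 256)
  local -a desc=(
    "console log level"
    "keyboard control (SAK, unraw)"
    "debugging dumps"
    "sync"
    "remount read-only"
    "signal processes (term, kill, oom-kill)"
    "reboot/poweroff"
    "nice real-time tasks"
  )
  local i
  for i in "${!bits[@]}"; do
    # Print the description of every bit that is set in the mask
    if (( mask & bits[i] )); then echo "${desc[i]}"; fi
  done
}
```

`decode_sysrq "$(cat /proc/sys/kernel/sysrq)"` shows the current policy, and `sudo sysctl kernel.sysrq=1` enables everything until the next reboot. If `Alt+SysRq+B` (reboot) was allowed and still did nothing, that suggests the kernel itself stopped handling input, not just the display.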
```text
[drm:drm_sched_entity_push_job [gpu_sched]] *ERROR* Trying to push to a killed entity
```
These appear frequently in syslog, often in pairs, throughout normal operation.
```text
workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000003d, smu fw if version = 0x00000040
amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
amdgpu 0000:03:00.0: amdgpu: DF poison setting is inconsistent(1:0:0:0)!
amdgpu 0000:03:00.0: amdgpu: Poison setting is inconsistent in DF/UMC(0:1)!
amdgpu: Freeing queue vital buffer 0x..., queue evicted
```
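To see how often these messages show up in the boot that ended in a freeze, here is a minimal sketch that counts a few of the strings quoted above in a log read from stdin. `count_amdgpu_errors` is a hypothetical helper name, and the pattern list covers only the messages in this gist:

```bash
#!/usr/bin/env bash
# Count how often each amdgpu message quoted in this gist appears in a log
# stream read from stdin. Output: "<count><TAB><pattern>" per pattern.
count_amdgpu_errors() {
  local log
  log=$(cat)   # read the stream once so every pattern can be counted
  local patterns=(
    "Trying to push to a killed entity"
    "hogged CPU for"
    "SMU driver if version not matched"
    "setting is inconsistent"
    "queue evicted"
  )
  local p n
  for p in "${patterns[@]}"; do
    # grep -c prints 0 and exits 1 on no match; "|| true" keeps the 0
    n=$(printf '%s\n' "$log" | grep -cF -- "$p" || true)
    printf '%s\t%s\n' "$n" "$p"
  done
}
```

`journalctl -k -b -1 | count_amdgpu_errors` runs it over the kernel log of the previous boot, which is only available if the journal is persistent.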
```text
amdgpu.gpu_recovery=1
amdgpu.ppfeaturemask=0xfffd7fff
amdgpu.dpm=1
amdgpu.runpm=0
amdgpu.dc=1
amdgpu.dcfeaturemask=0x8
amdgpu.tmz=0
amdgpu.sg_display=0
amdgpu.dcdebugmask=0x10
```
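`amdgpu.ppfeaturemask` is a bitmask in which each cleared bit disables a PowerPlay feature. A minimal sketch that lists the cleared bits of a given value. Only four bits are named, using the `PP_FEATURE_MASK` enum in `drivers/gpu/drm/amd/include/amd_shared.h` as I recall it; verify the names against the source of the kernel you run. `decode_ppfeaturemask` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# List the bits a ppfeaturemask value clears (cleared bit = feature disabled).
# Bit names are a partial, from-memory copy of enum PP_FEATURE_MASK.
decode_ppfeaturemask() {
  local mask=$(( $1 ))   # accepts hex such as 0xfffd7fff, or decimal
  local -A names=(
    [0x4000]=PP_OVERDRIVE_MASK
    [0x8000]=PP_GFXOFF_MASK
    [0x20000]=PP_STUTTER_MODE
    [0x80000]=PP_GFX_DCS_MASK
  )
  local i bit key
  for (( i = 0; i < 32; i++ )); do
    bit=$(( 1 << i ))
    if (( (mask & bit) == 0 )); then
      key=$(printf '0x%x' "$bit")
      printf '%s cleared (%s)\n' "$key" "${names[$key]:-not named in this sketch}"
    fi
  done
}
```

`decode_ppfeaturemask 0xfffd7fff` reports `0x8000` (GFXOFF) and `0x20000` (stutter mode) as cleared, while `0xffffffff` clears nothing and so leaves every feature, including overdrive, enabled. The driver's compiled-in default is, I believe, `0xfff7bfff` (overdrive and GFX DCS off). `decode_ppfeaturemask "$(cat /sys/module/amdgpu/parameters/ppfeaturemask)"` checks the value the loaded driver actually uses.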
```text
iommu=soft
amdgpu.dpm=1
amdgpu.gpu_recovery=1
amdgpu.vm_fragment_size=9
amdgpu.lockup_timeout=60000
amdgpu.noretry=0
amdgpu.ppfeaturemask=0xffffffff
```
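The parameter lists above repeat `amdgpu.dpm` and `amdgpu.gpu_recovery` and give two different `amdgpu.ppfeaturemask` values. If such lists are ever merged onto one command line, it is easy to lose track of which value is in effect. A minimal sketch that reports every `amdgpu.*` parameter given more than once, with all of its values in order; `find_duplicate_amdgpu_params` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Report amdgpu.* parameters that occur more than once in a kernel command
# line string, listing every value in the order it appears.
find_duplicate_amdgpu_params() {
  printf '%s\n' "$1" |
    tr -s '[:space:]' '\n' |     # one argument per line
    grep '^amdgpu\.' |
    awk '{
      i = index($0, "=")
      key = i ? substr($0, 1, i - 1) : $0
      val = i ? substr($0, i + 1) : "(no value)"
      count[key]++
      vals[key] = vals[key] " " val
    }
    END { for (k in count) if (count[k] > 1) print k ":" vals[k] }' |
    sort
}
```

`find_duplicate_amdgpu_params "$(cat /proc/cmdline)"` checks the running system. For the value the driver actually ended up with, read `/sys/module/amdgpu/parameters/<name>` rather than guessing which occurrence took effect.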
Two conflicting systemd services were limiting GPU power:

- `amdgpu-power-limit.service` → 220 W
- `gpu-power-limit.service` → 180 W

Resolution: removed both services and reset the power cap to the default 200 W.
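The hwmon `power1_cap` file takes microwatts, so 200 W is written as `200000000`. A minimal sketch that converts watts and refuses values above the limit the driver reports in `power1_cap_max` (a file amdgpu exposes next to `power1_cap`). `watts_to_microwatts` and `set_power_cap` are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Convert watts to the microwatt units used by hwmon power*_cap files.
watts_to_microwatts() {
  echo $(( $1 * 1000000 ))
}

# Write a power cap into an amdgpu hwmon directory, but only if it does not
# exceed the maximum the driver reports for this card.
set_power_cap() {
  local hwmon_dir=$1 watts=$2
  local uw max
  uw=$(watts_to_microwatts "$watts")
  max=$(cat "$hwmon_dir/power1_cap_max") || return 1
  if (( uw > max )); then
    echo "requested ${watts} W exceeds power1_cap_max ($(( max / 1000000 )) W)" >&2
    return 1
  fi
  echo "$uw" | sudo tee "$hwmon_dir/power1_cap" > /dev/null
}
```

With a single hwmon directory, `set_power_cap /sys/class/drm/card1/device/hwmon/hwmon* 200` applies the 200 W cap. Sysfs writes last only until reboot, and `power1_cap_default` in the same directory shows the value the card returns to once no service overrides it.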
```bash
sudo systemctl disable --now amdgpu-power-limit.service gpu-power-limit.service
sudo rm /etc/systemd/system/amdgpu-power-limit.service /etc/systemd/system/gpu-power-limit.service
# Make systemd forget the deleted unit files
sudo systemctl daemon-reload
# power1_cap is in microwatts: 200000000 µW = 200 W
echo 200000000 | sudo tee /sys/class/drm/card1/device/hwmon/hwmon*/power1_cap
```

Set kernel 6.14.0-15-generic as the default boot option to test whether the issue is a 6.17 regression:

```bash
# UUID stands for the filesystem UUID embedded in the menu entry IDs in /boot/grub/grub.cfg
sudo sed -i 's|GRUB_DEFAULT=.*|GRUB_DEFAULT="gnulinux-advanced-UUID>gnulinux-6.14.0-15-generic-advanced-UUID"|' /etc/default/grub
sudo update-grub
```

Status: testing in progress.
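Typing the UUID into `GRUB_DEFAULT` by hand is error-prone. A minimal sketch that reads a `grub.cfg` on stdin and prints the `submenu-id>entry-id` string for one kernel version. It assumes Ubuntu's generated layout, where each per-kernel entry sits inside the "Advanced options" submenu and its ID is the last single-quoted string on the line; `grub_default_for_kernel` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Print the GRUB_DEFAULT value ("submenu-id>entry-id") for a kernel version,
# reading an Ubuntu-style grub.cfg from stdin. Recovery entries are skipped.
grub_default_for_kernel() {
  awk -v ver="$1" '
    # The menu entry ID is the last single-quoted string before the "{".
    function last_id(line,   parts, n) {
      n = split(line, parts, "\047")
      return parts[n - 1]
    }
    /^submenu / { submenu_id = last_id($0) }
    /^[ \t]*menuentry / && index($0, "with Linux " ver "\047") {
      if (submenu_id != "") { print submenu_id ">" last_id($0); exit }
    }
  '
}
```

`sudo cat /boot/grub/grub.cfg | grub_default_for_kernel 6.14.0-15-generic` prints the string to put in `GRUB_DEFAULT`. After `sudo update-grub` and a reboot, `uname -r` should report `6.14.0-15-generic`.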
- RDNA3 TLB Fence Issues Guide
- Arch Forums - System freeze because of amdgpu driver
- Arch Forums - AMD GPU crashing (SOLVED)
- freedesktop.org Bug #111481 - AMD Navi GPU frequent freezes
Community reports (see the links above) describe a kernel memory-management bug affecting RX 7000-series GPUs on kernels 6.14 through 6.17, causing:
- GPU freezes
- System hangs
- "Fence timeout" errors in logs
- Test stability on kernel 6.14
- If stable on 6.14, monitor upstream for 6.17+ fixes
- Consider testing with GFXOFF disabled if issues persist (the `0x8000` bit of `amdgpu.ppfeaturemask`; `0xfffd7fff` clears it, `0xffffffff` does not)
- File a bug report with AMD/the kernel team if a reproducible case is identified
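For the bug report, a minimal sketch that gathers the kernel version, the command line, a few parameter values the driver actually loaded, the AMD PCI devices and the last amdgpu lines from the previous boot. `collect_amdgpu_report` is a hypothetical helper, and the details AMD or the kernel team ask for may differ, so treat it as a starting point:

```bash
#!/usr/bin/env bash
# Gather basic details for an amdgpu bug report. Missing tools or files are
# skipped instead of aborting the whole report.
collect_amdgpu_report() {
  echo "kernel: $(uname -r)"
  echo "cmdline: $(cat /proc/cmdline 2>/dev/null || echo unavailable)"
  if command -v lspci > /dev/null; then
    echo "--- lspci (vendor 1002 = AMD/ATI) ---"
    lspci -nnk -d 1002:
  fi
  local p
  for p in ppfeaturemask gpu_recovery lockup_timeout; do
    # Values the loaded driver is using, not just what was requested
    if [ -r "/sys/module/amdgpu/parameters/$p" ]; then
      echo "amdgpu.$p (active): $(cat "/sys/module/amdgpu/parameters/$p")"
    fi
  done
  if command -v journalctl > /dev/null; then
    echo "--- last amdgpu lines from the previous boot ---"
    journalctl -k -b -1 --no-pager 2>/dev/null | grep -i amdgpu | tail -n 50
  fi
}
```

Running `collect_amdgpu_report > amdgpu-report.txt` right after rebooting from a freeze captures the boot that hung, provided the journal is persistent.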
```text
amdgpu 0000:03:00.0: amdgpu: initializing kernel modesetting (IP DISCOVERY 0x1002:0x747E 0x1DA2:0x475F 0xFF)
VRAM: 12272M (12272M used)
GART: 512M
RAM width 192bits GDDR6
SE 3, SH per SE 2, CU per SH 10, active_cu_number 54
```
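The shader-array line can be cross-checked with simple arithmetic: 3 shader engines × 2 shader arrays per engine × 10 CUs per array gives 60 CU slots on Navi 32, of which 54 are active. Together with 12 GB on a 192-bit bus, that matches the RX 7700 XT rather than the RX 7800 XT (60 CUs, 16 GB, 256-bit). A minimal sketch of both checks, assuming amdgpu's `mem_info_vram_total` sysfs file (reported in bytes); `cu_slots` and `vram_mib` are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Total CU slots = shader engines * shader arrays per SE * CUs per array.
cu_slots() {
  echo $(( $1 * $2 * $3 ))
}

# Convert a byte count (e.g. mem_info_vram_total) to MiB, the "M" dmesg prints.
vram_mib() {
  echo $(( $1 / 1024 / 1024 ))
}
```

`cu_slots 3 2 10` gives 60, and `vram_mib "$(cat /sys/class/drm/card1/device/mem_info_vram_total)"` should print the same 12272 that dmesg reports, assuming `card1` is the Radeon as in the power-cap commands.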
This gist documents ongoing debugging efforts for AMD RDNA3 GPU stability issues on recent Linux kernels. Information may help upstream developers working on compatibility fixes.
Generated with Claude Code - please verify all information independently.