When the error hits, graphics output locks up. The mouse pointer still moves, but keyboard input is dead. libvirt guests keep running and SSH access still works, but the machine can't be shut down cleanly over SSH, so recovery requires a hard power-off by holding the power button.
syslog looks like:
Dec 19 08:10:48 kaim-eeyore kernel: [88492.249393] radeon 0000:03:00.0: ring 0 stalled for more than 10248msec
Dec 19 08:10:48 kaim-eeyore kernel: [88492.249395] radeon 0000:03:00.0: ring 3 stalled for more than 10248msec
Dec 19 08:10:48 kaim-eeyore kernel: [88492.249398] radeon 0000:03:00.0: GPU lockup (current fence id 0x000000000007de00 last fence id 0x000000000007df67 on ring 3)
Dec 19 08:10:48 kaim-eeyore kernel: [88492.249402] radeon 0000:03:00.0: GPU lockup (current fence id 0x0000000000035dda last fence id 0x0000000000035e12 on ring 0)
Dec 19 08:10:49 kaim-eeyore kernel: [88492.761408] radeon 0000:03:00.0: ring 0 stalled for more than 10760msec
Dec 19 08:10:49 kaim-eeyore kernel: [88492.761410] radeon 0000:03:00.0: ring 3 stalled for more than 10760msec
Dec 19 08:10:49 kaim-eeyore kernel: [88492.761412] radeon 0000:03:00.0: GPU lockup (current fence id 0x000000000007de00 last fence id 0x000000000007df67 on ring 3)
Dec 19 08:10:49 kaim-eeyore kernel: [88492.761415] radeon 0000:03:00.0: GPU lockup (current fence id 0x0000000000035dda last fence id 0x0000000000035e12 on ring 0)
Dec 19 08:10:49 kaim-eeyore kernel: [88493.448697] radeon 0000:03:00.0: Saved 4196 dwords of commands on ring 0.
Dec 19 08:10:49 kaim-eeyore kernel: [88493.448865] radeon 0000:03:00.0: GPU softreset: 0x0000004C
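To compare lockups across days it can help to pull the numbers out of these lines. Below is a minimal sketch that extracts the ring number and fence ids from the "GPU lockup" messages. The function name `parse_lockups` and the regex are mine, and treating "last fence id minus current fence id" as a rough measure of how much work was queued behind the stall is my own interpretation, not something the driver states.

```python
import re

# Matches the "GPU lockup" lines as they appear in the syslog excerpt above.
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id 0x(?P<current>[0-9a-fA-F]+) "
    r"last fence id 0x(?P<last>[0-9a-fA-F]+) on ring (?P<ring>\d+)\)"
)


def parse_lockups(lines):
    """Return (ring, current_fence, last_fence, gap) for each lockup line.

    'gap' is last - current; reading it as "fences still outstanding"
    is an assumption on my part.
    """
    results = []
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m is None:
            continue  # stall/softreset lines and unrelated messages are skipped
        current = int(m.group("current"), 16)
        last = int(m.group("last"), 16)
        results.append((int(m.group("ring")), current, last, last - current))
    return results


# Two lines copied from the excerpt above.
sample = [
    "Dec 19 08:10:48 kaim-eeyore kernel: [88492.249398] radeon 0000:03:00.0: "
    "GPU lockup (current fence id 0x000000000007de00 "
    "last fence id 0x000000000007df67 on ring 3)",
    "Dec 19 08:10:48 kaim-eeyore kernel: [88492.249402] radeon 0000:03:00.0: "
    "GPU lockup (current fence id 0x0000000000035dda "
    "last fence id 0x0000000000035e12 on ring 0)",
]

for ring, current, last, gap in parse_lockups(sample):
    print(f"ring {ring}: current={current:#x} last={last:#x} gap={gap}")
```

To run it over a real log, pass an open file instead of `sample`, for example `parse_lockups(open("/var/log/syslog"))`; the path depends on the distribution.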
Happens with both the radeon and amdgpu drivers. It seems to happen sooner when using Chromium, regardless of whether hardware acceleration is enabled or disabled.
My current attempt at a fix is to disable DPM on the kernel command line:
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on radeon.si_support=0 amdgpu.si_support=1 amdgpu.dc=1 amdgpu.dpm=0 isolcpus=10,11,22,23"
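It's easy to edit the GRUB config and then boot a kernel that never received the new parameters, so here is a small sketch for checking what the running kernel was actually given. The helper names `parse_cmdline` and `dpm_disabled` are hypothetical, and the parser is a plain whitespace split that does not handle quoted values containing spaces.

```python
from pathlib import Path


def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def dpm_disabled(params):
    """True only if amdgpu.dpm=0 is present on the command line."""
    return params.get("amdgpu.dpm") == "0"


# The parameters from the GRUB line above, used as a self-check.
expected = (
    "intel_iommu=on radeon.si_support=0 amdgpu.si_support=1 "
    "amdgpu.dc=1 amdgpu.dpm=0 isolcpus=10,11,22,23"
)
print("config line disables DPM:", dpm_disabled(parse_cmdline(expected)))

# On a Linux machine, also check the kernel that is running right now.
proc = Path("/proc/cmdline")
if proc.exists():
    running = parse_cmdline(proc.read_text())
    print("running kernel disables DPM:", dpm_disabled(running))
```

If the second line prints False after a reboot, the GRUB configuration probably wasn't regenerated after the edit.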
Some people complain that disabling DPM results in very high fan speeds and lots of noise. I don't notice any difference on my setup: an AMD FirePro W5000 with dual DisplayPort outputs in a Dell Precision T5810 with a Xeon E5-2687W v4, plus a USB-powered 6-inch fan on top to push hot air out from under my desk.
Current theory is that with DPM (dynamic power management) enabled, there are power state transitions which the Linux drivers don't handle correctly, and these cause a lockup internal to the card. Hopefully this will be fixed with new firmware or with updates to the Linux driver stack.
Pro tip: if you have an initrd which doesn't work and it was built using an automated tool, don't rebuild all of your other initrds using the same tool, as you'll likely end up crying ;)
Regarding graphics things, I'm giving up on my AMD card for now. I've switched back to nouveau on my Quadro NVS 310: it's not perfect, but at least it doesn't crash randomly all the time.