@gnif
Created May 15, 2024 00:03
AMD Acronyms and Reset Information (2018)
At a high level we treat each GPU as an SOC. The SOC is built from a set of IP blocks (intellectual property) that provide various functionality. The driver is designed around the idea that each SOC is a collection of IP blocks. The IP blocks are versioned so that we can write a single driver component for all SOCs that contain that IP version. So, the general list of IPs that you may see on an SOC:
DCE - Display and Compositing Engine. This is the display block.
GFX - Graphics and Compute. This is the graphics and compute (shader) block.
GMC - Graphics Memory Controller. This is the memory controller for the GPU. It provides support for VRAM and virtualized access to VRAM and system memory for GPU clients.
SDMA - System DMA. This is a general purpose DMA engine on the GPU. It's generally used for paging of GPU memory and for things like transfer queues in user mode acceleration drivers.
UVD - Unified Video Decode. This is the video decode and encode block on the GPU. It started out as decode only and later gained encode support for formats other than H.264 as well.
VCE - Video Codec Engine. This is the video encode engine for H.264 video.
PSP - Platform Security Processor. This sets the security policy on the GPU and handles firmware loading for the other IPs. PSP must be functional to use the other IPs on the system.
SMU - System Management Unit. This is the clock and voltage controller on the GPU.
The IPs that are on a specific SOC are enumerated in the soc files in the driver (e.g., vi.c, soc15.c, etc.). There is a high-level IP structure, and each instance of the driver stores an array of all of the IPs on the SOC. Those IP structures have a common API: the driver enumerates the list, and for major operations like init, fini, suspend, resume, etc., it walks the list and calls that API for each IP on the SOC.
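The walk-the-list pattern described above can be sketched roughly as follows. This is a minimal illustration, not the driver's actual code: every name here (`ip_funcs`, `ip_block`, `soc_device`, `soc_add_ip_block`, `soc_init_all`, `soc_fini_all`, the stub callbacks) is hypothetical, only init and fini are shown, and tearing down in reverse order is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_IP_BLOCKS 16 /* arbitrary limit chosen for this sketch */

struct soc_device;

/* Hypothetical common API that every IP block implements. */
struct ip_funcs {
    const char *name;
    int (*init)(struct soc_device *dev);
    int (*fini)(struct soc_device *dev);
};

/* One versioned IP instance on the SOC. */
struct ip_block {
    unsigned int major, minor;
    const struct ip_funcs *funcs;
    int initialized;
};

/* Each device instance stores an array of all IPs on its SOC. */
struct soc_device {
    struct ip_block blocks[MAX_IP_BLOCKS];
    size_t num_blocks;
};

/* Append an IP block to the device's list; returns -1 if the list is full. */
int soc_add_ip_block(struct soc_device *dev, const struct ip_funcs *funcs,
                     unsigned int major, unsigned int minor)
{
    assert(dev && funcs);
    if (dev->num_blocks >= MAX_IP_BLOCKS)
        return -1;
    dev->blocks[dev->num_blocks++] = (struct ip_block){
        .major = major, .minor = minor, .funcs = funcs, .initialized = 0,
    };
    return 0;
}

/* Walk the list in order and call init on each IP; stop at the first error. */
int soc_init_all(struct soc_device *dev)
{
    for (size_t i = 0; i < dev->num_blocks; i++) {
        struct ip_block *b = &dev->blocks[i];
        int r = b->funcs->init(dev);
        if (r) {
            fprintf(stderr, "init of %s v%u.%u failed: %d\n",
                    b->funcs->name, b->major, b->minor, r);
            return r;
        }
        b->initialized = 1;
    }
    return 0;
}

/* Walk the list in reverse (an assumption of this sketch) and call fini
 * on every IP that was successfully initialized. */
void soc_fini_all(struct soc_device *dev)
{
    for (size_t i = dev->num_blocks; i-- > 0;) {
        struct ip_block *b = &dev->blocks[i];
        if (!b->initialized)
            continue;
        b->funcs->fini(dev);
        b->initialized = 0;
    }
}

/* Stub IP implementations, for illustration only. */
static int stub_init(struct soc_device *dev) { (void)dev; return 0; }
static int stub_fini(struct soc_device *dev) { (void)dev; return 0; }

const struct ip_funcs example_gmc_funcs = { "gmc", stub_init, stub_fini };
const struct ip_funcs example_gfx_funcs = { "gfx", stub_init, stub_fini };
```

In this sketch, a per-SOC setup routine would call `soc_add_ip_block()` once for each IP present on that SOC, after which `soc_init_all()` and `soc_fini_all()` handle the whole device. Suspend and resume would follow the same walk with two more function pointers.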
Most of the IPs on the GPU provide a lightweight soft reset mechanism to reset that specific IP. Depending on the type of hang, a soft reset may or may not be able to recover the IP. If it cannot, you have to do a full adapter reset, which resets the entire GPU. PSP and SMU do not support soft reset. They cannot be reset once started without a full adapter reset. On older ASICs, adapter reset was done by writing a special sequence to PCI config space. Internally this reset was handled by the SMU. For vega10 and newer, full adapter reset is handled by the PSP (mode1 reset). Soft reset is not currently implemented for any of the IPs on vega10, but it works similarly to older IPs. That said, you don't need soft reset if you have adapter reset. It should also be noted that in the event of an adapter reset, the contents of VRAM should not be considered reliable.
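The escalation from soft reset to full adapter reset can be sketched as below. This is only an illustration of the decision flow described above, under stated assumptions: the types and functions (`ip_reset_ops`, `gpu`, `gpu_recover`, the stub callbacks) are hypothetical, and the actual reset mechanisms (the PCI config space sequence, PSP mode1 reset) are hidden behind a single callback rather than implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum reset_result { RESET_SOFT_OK, RESET_ADAPTER_OK, RESET_FAILED };

/* Hypothetical per-IP reset hooks. */
struct ip_reset_ops {
    const char *name;
    /* NULL when the IP has no soft reset (e.g., PSP and SMU).
     * Returns true if the IP recovered. */
    bool (*soft_reset)(void);
};

/* Hypothetical device state for this sketch. */
struct gpu {
    /* Full adapter reset; stands in for the PCI config space sequence
     * on older ASICs or the PSP mode1 reset on vega10 and newer. */
    bool (*adapter_reset)(void);
    /* Set after an adapter reset: VRAM contents are not reliable. */
    bool vram_lost;
};

/* Try the lightweight soft reset first; fall back to a full adapter reset
 * when soft reset is unavailable or does not recover the IP. */
enum reset_result gpu_recover(struct gpu *gpu, const struct ip_reset_ops *hung_ip)
{
    assert(gpu && hung_ip && gpu->adapter_reset);

    if (hung_ip->soft_reset && hung_ip->soft_reset())
        return RESET_SOFT_OK;

    if (!gpu->adapter_reset())
        return RESET_FAILED;

    gpu->vram_lost = true; /* callers must restore or re-upload VRAM contents */
    return RESET_ADAPTER_OK;
}

/* Stub callbacks, for illustration only. */
static bool stub_reset_ok(void) { return true; }
static bool stub_reset_fail(void) { return false; }

const struct ip_reset_ops example_gfx_recoverable = { "gfx", stub_reset_ok };
const struct ip_reset_ops example_gfx_hard_hang = { "gfx", stub_reset_fail };
const struct ip_reset_ops example_psp = { "psp", NULL };
```

The `vram_lost` flag is one possible way to carry the "VRAM is not reliable" caveat to the rest of a driver. On a vega10-like device in this model, every IP would have `soft_reset` set to NULL, so recovery would always go through the adapter reset path.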