@geohot
Created March 26, 2024 20:34
MES Page Fault Crash
[55883.721977] amdgpu: map VA 0x702eae9d2000 - 0x702eae9d3000 in entry 0000000072d2b750
[55883.721996] amdgpu: INC mapping count 1
[55883.722133] kfd kfd: amdgpu: ioctl cmd 0xc0184b0c (#0xc), arg 0x7ffe16172bef
[55883.722238] gmc_v11_0_process_interrupt: 6 callbacks suppressed
[55883.722250] amdgpu 0000:c3:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32774, for process python3 pid 356134 thread python3 pid 356134)
[55883.722343] amdgpu 0000:c3:00.0: amdgpu: in page starting at address 0x00000000aabbc000 from client 10
[55883.722391] amdgpu 0000:c3:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00800A30
[55883.722429] amdgpu 0000:c3:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
[55883.722466] amdgpu 0000:c3:00.0: amdgpu: MORE_FAULTS: 0x0
[55883.722497] amdgpu 0000:c3:00.0: amdgpu: WALKER_ERROR: 0x0
[55883.722528] amdgpu 0000:c3:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[55883.722561] amdgpu 0000:c3:00.0: amdgpu: MAPPING_ERROR: 0x0
[55883.722592] amdgpu 0000:c3:00.0: amdgpu: RW: 0x0
[55883.722621] amdgpu: client id 0xa, source id 0, vmid 8, pasid 0x8006. raw data:
[55883.722628] amdgpu: 818000A, F6ED6F02, 4CD, 8006, AABBC, 40, 0, 0.
[55883.722660] amdgpu: Evicting PASID 0x8006 queues
[55883.861108] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[55883.861477] amdgpu: failed to remove hardware queue from MES, doorbell=0x1000
[55883.861495] amdgpu: MES might be in unrecoverable state, issue a GPU reset
[55883.861514] amdgpu: Failed to evict queue 0
[55883.862401] amdgpu 0000:c3:00.0: amdgpu: GPU reset begin!
[55883.862444] amdgpu: Free mem_obj = 00000000300c7743, range_start = 0, range_end = 0
[55884.885195] amdgpu 0000:c3:00.0: amdgpu: IP block:gfx_v11_0 is hung!
[55884.885390] amdgpu 0000:c3:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0, for process pid 0 thread pid 0)
[55884.885469] amdgpu 0000:c3:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[55884.885515] amdgpu 0000:c3:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040A41
[55884.885554] amdgpu 0000:c3:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
[55884.885590] amdgpu 0000:c3:00.0: amdgpu: MORE_FAULTS: 0x1
[55884.885620] amdgpu 0000:c3:00.0: amdgpu: WALKER_ERROR: 0x0
[55884.885651] amdgpu 0000:c3:00.0: amdgpu: PERMISSION_FAULTS: 0x4
[55884.885682] amdgpu 0000:c3:00.0: amdgpu: MAPPING_ERROR: 0x0
[55884.885713] amdgpu 0000:c3:00.0: amdgpu: RW: 0x1
[55884.885746] amdgpu 0000:c3:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0, for process pid 0 thread pid 0)
[55884.885801] amdgpu 0000:c3:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[55884.885846] amdgpu 0000:c3:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040C40
[55884.885883] amdgpu 0000:c3:00.0: amdgpu: Faulty UTCL2 client ID: CPG (0x6)
[55884.885919] amdgpu 0000:c3:00.0: amdgpu: MORE_FAULTS: 0x0
[55884.885949] amdgpu 0000:c3:00.0: amdgpu: WALKER_ERROR: 0x0
[55884.885979] amdgpu 0000:c3:00.0: amdgpu: PERMISSION_FAULTS: 0x4
[55884.886011] amdgpu 0000:c3:00.0: amdgpu: MAPPING_ERROR: 0x0
[55884.886041] amdgpu 0000:c3:00.0: amdgpu: RW: 0x1
[55884.886073] amdgpu 0000:c3:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0, for process pid 0 thread pid 0)
[55884.886128] amdgpu 0000:c3:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[55884.886172] amdgpu 0000:c3:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[55884.886209] amdgpu 0000:c3:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
[55884.886245] amdgpu 0000:c3:00.0: amdgpu: MORE_FAULTS: 0x0
[55884.886275] amdgpu 0000:c3:00.0: amdgpu: WALKER_ERROR: 0x0
[55884.886305] amdgpu 0000:c3:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[55884.886337] amdgpu 0000:c3:00.0: amdgpu: MAPPING_ERROR: 0x0
[55884.886368] amdgpu 0000:c3:00.0: amdgpu: RW: 0x0
[55884.886401] amdgpu 0000:c3:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0, for process pid 0 thread pid 0)
[55884.888229] amdgpu 0000:c3:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[55884.889186] amdgpu 0000:c3:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[55884.890178] amdgpu 0000:c3:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
[55884.891085] amdgpu 0000:c3:00.0: amdgpu: MORE_FAULTS: 0x0
[55884.891970] amdgpu 0000:c3:00.0: amdgpu: WALKER_ERROR: 0x0
[55884.892844] amdgpu 0000:c3:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[55884.893782] amdgpu 0000:c3:00.0: amdgpu: MAPPING_ERROR: 0x0
[55884.894651] amdgpu 0000:c3:00.0: amdgpu: RW: 0x0
[55885.154869] amdgpu: Free mem_obj = 00000000851f6ffb, range_start = 1, range_end = 8
[55885.154880] amdgpu: GFXOFF is enabled
[55885.155291] amdgpu: Unmap VA 0x702e0da49000 - 0x702e0da51000 from vm 00000000baebac5e
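
The decoded fields in the log above (MORE_FAULTS, PERMISSION_FAULTS, RW, the UTCL2 client ID, and the vmid/pasid/address taken from the "raw data" dwords) can be reproduced from the raw hex values. The sketch below is not taken from the driver source; the bit positions are an assumption based on the GFX11 register layout, and they were checked only against the values printed in this log. The function and table names are hypothetical.

```python
# Assumed bit layout of GCVM_L2_PROTECTION_FAULT_STATUS: (shift, width).
# Not copied from the driver; it agrees with every decoded field in the log.
FAULT_STATUS_FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),   # UTCL2 client ID, e.g. 0x5 is printed as CPC in the log
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status: int) -> dict:
    """Split a fault status word into its named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FAULT_STATUS_FIELDS.items()
    }


def decode_iv_entry(dw: list) -> dict:
    """Decode the eight dwords from the 'raw data' line (layout assumed)."""
    return {
        "client_id": dw[0] & 0xFF,
        "src_id": (dw[0] >> 8) & 0xFF,
        "ring_id": (dw[0] >> 16) & 0xFF,
        "vmid": (dw[0] >> 24) & 0xF,
        "pasid": dw[3] & 0xFFFF,
        # The faulting page number sits in dw[4]; the low 4 bits of dw[5]
        # are assumed to extend the address above bit 43.
        "fault_addr": (dw[4] << 12) | ((dw[5] & 0xF) << 44),
    }


if __name__ == "__main__":
    # First fault in the log (vmid 8, the python3 process).
    print(decode_fault_status(0x00800A30))

    raw_line = "818000A, F6ED6F02, 4CD, 8006, AABBC, 40, 0, 0"
    dwords = [int(tok, 16) for tok in raw_line.split(", ")]
    print({key: hex(val) for key, val in decode_iv_entry(dwords).items()})
```

Running it on 0x00800A30 should give PERMISSION_FAULTS 0x3, CID 0x5 and VMID 8, which matches the first fault report. The later faults (0x00040A41, 0x00040C40) decode to VMID 0 with RW set, in line with the vmid:0 entries logged after the GPU reset began.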