Peng sunway513

## ai_vs_human_report.md

      
Created May 2, 2026 19:48

AI agent vs human expert capability study: 38h AI saga vs 12h Lingpeng on same V4 cudagraph problem (DeepSeek-V4 CUDAGraph capture port onto ATOM PR#650)
          
    AI Agent vs Human Expert: V4 Cudagraph Capture Port — A Capability Study

Author: sunway513 (peng.sun@amd.com), with AI agent (Claude Opus 4.7, 1M context)
Date: 2026-05-02
Subject: Same problem (DeepSeek-V4 CUDAGraph capture port onto Lingpeng's PR#650 base in ATOM/ROCm), two solvers (~38 h AI session vs ~12 h human-expert focused work), what we learned about each.

TL;DR


## Dockerfile.roctracer-fix
# syntax=docker/dockerfile:1.6
#
# Dockerfile.roctracer-fix
#
# Rebuild pytorch in a ROCm/vLLM preview image with kineto reverted to the
# pre-rocprofiler-sdk commit, restoring healthy torch.profiler behavior under
# HIP graph replay.
#
# Restores GPU occupancy 73% -> 97%, hipGraphLaunch 324us -> ~50us under
# torch.profiler. See ROCm/AI-Frameworks-Dashboard#73 for context.