Author: sunway513 (peng.sun@amd.com), with AI agent (Claude Opus 4.7, 1M context)
Date: 2026-05-02
Subject: Same problem (DeepSeek-V4 CUDAGraph capture port onto Lingpeng's PR#650 base in ATOM/ROCm), two solvers (~38 h AI session vs ~12 h human-expert focused work), what we learned about each.
# syntax=docker/dockerfile:1.6
#
# Dockerfile.roctracer-fix
#
# Rebuild pytorch in a ROCm/vLLM preview image with kineto reverted to the
# pre-rocprofiler-sdk commit, restoring healthy torch.profiler behavior under
# HIP graph replay.
#
# Restores GPU occupancy 73% -> 97%, hipGraphLaunch 324us -> ~50us under
# torch.profiler. See ROCm/AI-Frameworks-Dashboard#73 for context.