
Peng sunway513

sunway513 / ai_vs_human_report.md
Created May 2, 2026 19:48
AI agent vs human expert capability study: a ~38 h AI session vs ~12 h of work by Lingpeng on the same V4 CUDAGraph problem (DeepSeek-V4 CUDAGraph capture port onto ATOM PR#650)

AI Agent vs Human Expert: V4 Cudagraph Capture Port — A Capability Study

Author: sunway513 (peng.sun@amd.com), with AI agent (Claude Opus 4.7, 1M context)
Date: 2026-05-02
Subject: Same problem (DeepSeek-V4 CUDAGraph capture port onto Lingpeng's PR#650 base in ATOM/ROCm), two solvers (~38 h AI session vs ~12 h human-expert focused work), what we learned about each.


TL;DR

sunway513 / Dockerfile.roctracer-fix
Created April 11, 2026 03:02
Dockerfile + rebuild script for pytorch rocprofiler-sdk kineto regression fix (ROCm/AI-Frameworks-Dashboard#73)
# syntax=docker/dockerfile:1.6
#
# Dockerfile.roctracer-fix
#
# Rebuild pytorch in a ROCm/vLLM preview image with kineto reverted to the
# pre-rocprofiler-sdk commit, restoring healthy torch.profiler behavior under
# HIP graph replay.
#
# Restores GPU occupancy 73% -> 97%, hipGraphLaunch 324us -> ~50us under
# torch.profiler. See ROCm/AI-Frameworks-Dashboard#73 for context.