Created
September 24, 2025 16:16
root@computeinstance-e00ccn3b72xn1s8y6y:/workspace# mkdir -p /workspace/syad && cd /workspace/syad
git clone --depth=1 --branch main --single-branch https://github.com/HazyResearch/ThunderKittens ThunderKittens 2>/dev/null || true
# --- versions ---
. /etc/os-release 2>/dev/null; UBU="${NAME:-Linux} ${VERSION_ID:-$(uname -r)}"
CUDA=$(nvcc --version 2>/dev/null | sed -n 's/.*release \([0-9.]\+\).*/\1/p' | head -1 || echo "")
TORCH=$(python -c "import torch; print(torch.__version__)" 2>/dev/null || echo "not found")
TRT=$(python -c "import tensorrt as trt; print(trt.__version__)" 2>/dev/null || echo "not found")
GPU=$(nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null | paste -sd ' / ' - || echo "not found")
HASH=$(git -C ThunderKittens rev-parse --short=12 HEAD 2>/dev/null || echo "unknown")
echo
echo "Ubuntu: $UBU"
echo "CUDA: ${CUDA:-not found}"
echo "PyTorch: $TORCH"
echo "TensorRT: $TRT"
echo "GPU: $GPU"
echo "TK HEAD (main): $HASH"
echo
set -x
export THUNDERKITTENS_ROOT=/workspace/syad/ThunderKittens
cd /workspace/syad/ThunderKittens/kernels/matmul/educational/
set +x
make clean && make && ./matmul || echo "educational failed"
set -x
cd ../H100/
set +x
make clean && make && ./matmul || echo "H100 failed"
Ubuntu: Ubuntu 24.04
CUDA: 12.8
PyTorch: 2.7.0a0+7c8ec84dab.nv25.03
TensorRT: 10.9.0.34
GPU: NVIDIA H100 80GB HBM3
TK HEAD (main): 2ba96ceedfb1
+ export THUNDERKITTENS_ROOT=/workspace/syad/ThunderKittens
+ THUNDERKITTENS_ROOT=/workspace/syad/ThunderKittens
+ cd /workspace/syad/ThunderKittens/kernels/matmul/educational/
+ set +x
rm -f matmul
nvcc level_08.cu -DNDEBUG -Xcompiler=-fPIE -Xcompiler -fopenmp --expt-extended-lambda --expt-relaxed-constexpr -Xcompiler=-Wno-psabi -Xcompiler=-fno-strict-aliasing --use_fast_math -forward-unknown-to-host-compiler -O3 -Xnvlink=--verbose -Xptxas=--verbose -Xptxas=--warn-on-spills -std=c++20 -MD -MT -MF -x cu -lrt -lpthread -ldl -DKITTENS_HOPPER -arch=sm_90a -lcuda -lcudadevrt -lcudart_static -lcublas -lgomp -I/workspace/syad/ThunderKittens/include -I/workspace/syad/ThunderKittens/prototype -o matmul
ptxas info : (C7508) Potential Performance Loss: 'setmaxnreg' ignored; unable to determine register count at entry.
ptxas info : 3 bytes gmem
ptxas info : Compiling entry function '_Z6kernel14matmul_globals' for 'sm_90a'
ptxas info : Function properties for _Z6kernel14matmul_globals
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 154 registers, used 16 barriers, 128 bytes smem
ptxas info : Compile time = 73.157 ms
nvlink info : 0 bytes gmem
-------------------- M=4096 N=4096 K=4096 --------------------
Allocated host memory
Initialized matrices
Performed CPU matrix multiplication
Allocated device memory
Copied matrices to device
Avg Kernel execution time: 268.414 us
Achieved performance: 512.041 TFLOPs
Copied result back to host
Converted result back to float
Max error: 0.0982647
Error count: 0
Total count: 16777216
+ cd ../H100/
+ set +x
rm -f matmul
nvcc matmul.cu -DNDEBUG -Xcompiler=-fPIE -Xcompiler -fopenmp --expt-extended-lambda --expt-relaxed-constexpr -Xcompiler=-Wno-psabi -Xcompiler=-fno-strict-aliasing --use_fast_math -forward-unknown-to-host-compiler -O3 -Xnvlink=--verbose -Xptxas=--verbose -Xptxas=--warn-on-spills -std=c++20 -MD -MT -MF -x cu -lrt -lpthread -ldl -DKITTENS_HOPPER -arch=sm_90a -lcuda -lcudadevrt -lcudart_static -lcublas -lgomp -I/workspace/syad/ThunderKittens/include -I/workspace/syad/ThunderKittens/prototype -o matmul
ptxas info : 3 bytes gmem
ptxas info : Compiling entry function '_ZN7kittens9prototype3lcf6kernelI15matmul_templateILi2ELi4ELi8EEEEvNT_6layout7globalsE' for 'sm_90a'
ptxas info : Function properties for _ZN7kittens9prototype3lcf6kernelI15matmul_templateILi2ELi4ELi8EEEEvNT_6layout7globalsE
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 168 registers, used 16 barriers, 80 bytes smem
ptxas info : Compile time = 244.284 ms
nvlink info : 0 bytes gmem
-------------------- M=4096 N=4096 K=4096 --------------------
Block size: 128x256
Allocated host memory
Initialized matrices
Performed CPU matrix multiplication
Allocated device memory
Copied matrices to device
Launching warmup kernel with grid (132, 1), block (384)
Launching kernel with grid (132, 1), block (384)
terminate called after throwing an instance of 'std::runtime_error'
what(): Error in tile TMA descriptor creation: unspecified launch failure
Parameters:
batch: 1
depth: 1
rows: 4096
cols: 4096
ST::rows: 64
ST::cols: 64
cuTensorMapEncodeTiled arguments:
tma_map: 140730429461440
tma_format: 9
tma_dim: 5
global_addr: 139840108625920
global_addr memory type: valid device memory
gmem_shape: 140730429460672 [64, 4096, 64, 1, 1]
gmem_stride: 140730429459776 [8192, 128, 33554432, 33554432]
smem_shape: 140730429459712 [64, 64, 1, 1, 1]
smem_stride: 140730429459744 [1, 1, 1, 1, 1]
tma_interleave: 0
tma_swizzle: 3
tma_l2Promotion: 0
tma_oobFill: 0
Aborted (core dumped)
H100 failed
root@computeinstance-e00ccn3b72xn1s8y6y:/workspace/syad/ThunderKittens/kernels/matmul/H100#
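
The numbers in the log above can be cross-checked with a little arithmetic. The sketch below is not part of the original session, and the helper names `tflops` and `strides` are made up for illustration. It assumes the usual 2*M*N*K operation count for a matmul, and it assumes 2-byte elements, on the understanding that `tma_format: 9` corresponds to bfloat16 in the CUDA driver's tensor-map data type enum. It only checks that the printed figures are self-consistent; it does not diagnose the H100 failure.

```shell
#!/usr/bin/env bash
# Arithmetic checks on figures copied from the log; nothing here touches the GPU.

# TFLOP/s for an M x N x K matmul, assuming 2*M*N*K floating-point operations.
# Arguments: M N K time_in_microseconds
tflops() {
  awk -v m="$1" -v n="$2" -v k="$3" -v us="$4" \
    'BEGIN { printf "%.1f\n", 2 * m * n * k / (us * 1e-6) / 1e12 }'
}

# Byte strides for a row-major rows x cols matrix split into column chunks.
# Arguments: rows cols element_bytes chunk_elements
# Prints: row stride, chunk stride, total matrix bytes
# (element_bytes=2 is an assumption, see the note above the script).
strides() {
  awk -v rows="$1" -v cols="$2" -v eb="$3" -v chunk="$4" \
    'BEGIN { printf "%d %d %d\n", cols * eb, chunk * eb, rows * cols * eb }'
}

tflops 4096 4096 4096 268.414   # educational kernel: average time from the log
strides 4096 4096 2 64          # compare with the gmem_stride line in the log
```

Under these assumptions the first call should agree with the reported 512.041 TFLOPs to one decimal place, and the second should reproduce the 8192, 128 and 33554432 values printed for `gmem_stride`.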