Created April 9, 2024 16:46
874f1987f [frontend] Improve function def parsing with stacked decorators. (#3564)
walltime ms: 0.350
8237f1b45 [FRONTEND][NFC] Frontend cleanup (#3541)
walltime ms: 0.358
88abff689 [FRONTEND] changed hook format and added launch metadata for external tools (#3492)
walltime ms: 0.356
95d9b7f4a [FRONTEND][BACKEND] Move conversion of sw only fp8 types into the front end. (#3477)
walltime ms: 0.355
dca2d07c4 [FRONTEND] Fix arg name conflict bug (#3383)
walltime ms: 0.357
f08bdc1a8 [CACHE] Verify that when preloading a kernel its name matches what we have in specialization_data (#3395)
walltime ms: 0.357
55bb88744 [CACHE] Adding RuntimeError on signature mismatch with the cached function (#3389)
walltime ms: 0.366
d42ca115c Adding tl.const annotation to mark and validate that const tensors are not being stored to (#3360)
walltime ms: 0.269
72cba380a [AMD] Add amd f8 datatype (#3322)
walltime ms: 0.191
5a7bf72e2 [Easy][FRONTEND] Add pre run hooks to JITFunction (#3314)
walltime ms: 0.194
18cb30ca7 [easy][nfc] Consistently use sha256 for hashing (#3246)
walltime ms: 0.193
c681b5390 [RUNTIME] Include higher order function arguments in cache_key (#3137)
walltime ms: 0.190
d32880ce5 [INTERPRETER] Revive flash attention test (#3158)
walltime ms: 0.281
98b5945d2 [FRONTEND] Fix dtype serialization when preloading with dtype const (#3129)
walltime ms: 0.191
91641c329 [FRONTEND] Support preloading kernel (#3121)
walltime ms: 0.188
d883e9570 [FRONTEND] Remove specialization for divisible by 8 (#3122)
walltime ms: 0.175
b6e24b699 [FRONTEND] Allow `tl.{u}int{width}` annotations to bypass opportunistic value-based JIT-specialization (#3102)
walltime ms: 0.174
00c144eec Revert "[FRONTEND] Allow `tl.{u}int{width}` annotations to bypass opportunistic value-based JIT-specialization" (#3103)
walltime ms: 0.162
5aef9810f [FRONTEND] Allow `tl.{u}int{width}` annotations to bypass opportunistic value-based JIT-specialization
walltime ms: 0.173
dd2a32363 Remove experimental TMA and Warp specialization features (#3080)
walltime ms: 0.165
b844d519b [RUNTIME] Allow setting active driver (#2973)
walltime ms: 0.172
f3e2d8408 [FRONTEND] make CompiledKernel `metadata` a namedtuple instead of a dict, and pass it to hook in lieu of kernel object (#2929)
walltime ms: 0.169
8594268c8 [FRONTEND] Update jit.py to delay the import of InterpretedFunction to avoid being dependent on numpy by default (#2904)
walltime ms: 0.169
48034034c [FRONTEND] use standard plugin interface for CUDA (#2887)
walltime ms: 0.171
53d868113 [CLEANUP] Fix typos across the project (#2876)
walltime ms: 0.166
03ceaa64c [BACKEND] clean-up how we use LLVM (#2844)
walltime ms: 0.168
03678a3af [FRONTEND] make some cuda-specific functions more general; remove triton-translate (#2811)
walltime ms: 0.128
73a331925 [FRONTEND] split pybind11 src into multiple files (#2810)
walltime ms: 0.128
c6040bcbd When computing cache keys, be more strict about checkint the module name (#2713)
walltime ms: 0.128
755002bd3 [FRONTEND] clean-up runtime/jit.py (#2756)
walltime ms: 0.128
f2bc68ec0 Rewrite some very frequently called "try" statements (was too expensive). (#2742)
walltime ms: 0.147
72c983392 [FRONTEND] refactor `compiler` submodule (#2701)
walltime ms: 0.158
9998b1064 [RUNTIME] Ensure changed line numbers invalidate cache (#2600)
walltime ms: 0.166
df08301e7 Reformat Python code with yapf. (#2589)
walltime ms: 0.176
943330790 [FRONTEND] add do_not_specialize property back to JITFunction (#2573)
walltime ms: 0.178
12f906287 [FRONTEND] Refactor jit.py. (#2556)
walltime ms: 0.165
f88b01f55 Apply `ruff` pre-commit to python/triton/runtime. (#2558)
walltime ms: 0.102
768fc1fcd [FRONTEND] change hash to not require ptxas (#2476)
walltime ms: 0.103
29828fe49 [FRONTEND] add option to disable fp mul/add fusion (#2495)
walltime ms: 0.105
cb83b42ed [FRONTEND] using closure to create jit launcher (#2289)
walltime ms: 0.103
e686b4d6d [FRONTEND] interpreter rewrite (#2321)
walltime ms: 0.074
37f12497b [FRONTEND] Add PyTorch fp8 dtypes to Triton (#2279)
walltime ms: 0.075
9e9fbe01f [FRONTEND] Fix specialization on triton integer types (#2236)
walltime ms: 0.076
c6d33dceb [ROCM] Core Functionality for AMD (#1983)
walltime ms: 0.087
ab3e8b0da [FRONTEND] fix handling of do_not_specialize with interior constantexprs (#2188)
walltime ms: 0.088
ebfe0ffb2 [FRONTEND] fix for undefined dtypes in jit during loading defaults (#2114)
walltime ms: 0.087
6cb67185f [FRONTEND]To use proper default num_warps and num_stages based on the device backend in JITFucntion (#2130)
walltime ms: 0.074
23dd11d47 [BACKEND] Solidify f8e4m3 (#2105)
walltime ms: 0.064
fc667d1f8 [FRONTEND] fix new absolute imports (#2072)
walltime ms: 0.064
98372f46d [FRONTEND] Remove extra calls to _get_config causing runtime overhead (#2094)
walltime ms: 0.075
a01c116f7 [FRONTEND/BACKEND] Revived Float8E4B15x4 (#2090)
walltime ms: 0.257
776b3784c [FRONTEND] further improve version_key speed (#2073)
walltime ms: 0.248
0e11257b8 [FRONTEND] improve speed of computing version_key (#2071)
walltime ms: 0.249
30a331e62 [FRONTEND] Support jit functions without arguments (#2043)
walltime ms: 0.246
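A helper like the one below can find which commits moved the number most. It is a hypothetical sketch and not part of Triton. It assumes three things:

* Each commit line (abbreviated hash, then subject) is immediately followed by its `walltime ms:` line.
* Entries are listed newest first, as `git log` prints them, so each commit's delta is taken against the entry below it.
* Any leftover `|` table residue at line ends should be ignored.

```python
import re
from typing import List, Tuple

Entry = Tuple[str, str, float]  # (commit hash, subject, walltime in ms)

# Trailing " |" cell residue left over from a rendered table, if present.
TRAILING_PIPES = re.compile(r"(\s*\|)+$")
# A commit line: abbreviated (7+ hex chars) hash, whitespace, subject.
COMMIT_RE = re.compile(r"^([0-9a-f]{7,40})\s+(.*)$")
# A timing line such as "walltime ms: 0.350".
TIME_RE = re.compile(r"^walltime ms:\s*([0-9]*\.?[0-9]+)$")


def parse_log(text: str) -> List[Entry]:
    """Return (hash, subject, walltime_ms) tuples in the order they appear."""
    entries: List[Entry] = []
    pending = None  # (hash, subject) still waiting for its timing line
    for raw in text.splitlines():
        line = TRAILING_PIPES.sub("", raw.strip())
        if not line:
            continue
        t = TIME_RE.match(line)
        if t and pending is not None:
            entries.append((pending[0], pending[1], float(t.group(1))))
            pending = None
            continue
        c = COMMIT_RE.match(line)
        if c:
            pending = (c.group(1), c.group(2))
    return entries


def deltas(entries: List[Entry]) -> List[Entry]:
    """Pair each commit with its change versus the next (older) entry.

    Assumes newest-first order; the oldest entry has no baseline and is skipped.
    """
    return [
        (newer[0], newer[1], newer[2] - older[2])
        for newer, older in zip(entries, entries[1:])
    ]


def top_changes(entries: List[Entry], n: int = 5) -> List[Entry]:
    """Return the n largest walltime increases, biggest first."""
    return sorted(deltas(entries), key=lambda r: r[2], reverse=True)[:n]
```

Run over the full log above, `top_changes(parse_log(text), 1)` ranks 55bb88744 first, at roughly +0.097 ms (0.269 to 0.366). Sorting the other way surfaces 98372f46d, whose delta against a01c116f7 is roughly -0.182 ms.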