@jjsjann123
Created July 18, 2024 21:13
case 0001 - thunder with bookend enabled + https://github.com/NVIDIA/Fuser/pull/2630
case 0002 - thunder with bookend disabled + https://github.com/NVIDIA/Fuser/pull/2630
case 0003 - thunder with bookend enabled + nvfuser main (in PR 2630)
----------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-2-13b-hf-backward-bs1-thunder]': 3 tests ----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-13b-hf-backward-bs1-thunder] (0003_9d49ebe) 1.1494 (1.0) 1.8138 (1.08) 1.1747 (1.00) 0.0693 (1.05) 1.1571 (1.0) 0.0041 (1.01) 53;73 851.2685 (1.00) 871 1
test_litgpt_qkv_split_rope[Llama-2-13b-hf-backward-bs1-thunder] (0001_39ec109) 1.1506 (1.00) 1.6751 (1.0) 1.1745 (1.0) 0.0685 (1.04) 1.1574 (1.00) 0.0041 (1.0) 51;75 851.4553 (1.0) 870 1
test_litgpt_qkv_split_rope[Llama-2-13b-hf-backward-bs1-thunder] (0002_68bcaa0) 1.3169 (1.15) 1.8319 (1.09) 1.3401 (1.14) 0.0662 (1.0) 1.3235 (1.14) 0.0056 (1.38) 39;106 746.2110 (0.88) 760 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-2-13b-hf-backward-bs2-thunder]': 3 tests ----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-13b-hf-backward-bs2-thunder] (0002_68bcaa0) 2.1667 (1.0) 2.7531 (1.0) 2.2035 (1.0) 0.0718 (1.0) 2.1880 (1.0) 0.0069 (1.01) 24;86 453.8304 (1.0) 463 1
test_litgpt_qkv_split_rope[Llama-2-13b-hf-backward-bs2-thunder] (0001_39ec109) 2.3433 (1.08) 2.9658 (1.08) 2.3869 (1.08) 0.0741 (1.03) 2.3708 (1.08) 0.0068 (1.0) 22;65 418.9508 (0.92) 427 1
test_litgpt_qkv_split_rope[Llama-2-13b-hf-backward-bs2-thunder] (0003_9d49ebe) 2.3455 (1.08) 2.9712 (1.08) 2.3876 (1.08) 0.0761 (1.06) 2.3706 (1.08) 0.0069 (1.01) 22;47 418.8246 (0.92) 427 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-2-13b-hf-forward-bs1-thunder]': 3 tests -----------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-13b-hf-forward-bs1-thunder] (0003_9d49ebe) 574.0351 (1.0) 738.6361 (1.0) 582.7912 (1.0) 26.1511 (1.0) 576.6049 (1.0) 2.0040 (1.04) 43;52 1.7159 (1.0) 872 2
test_litgpt_qkv_split_rope[Llama-2-13b-hf-forward-bs1-thunder] (0001_39ec109) 574.6759 (1.00) 743.6601 (1.01) 583.0165 (1.00) 26.7738 (1.02) 576.9003 (1.00) 1.9354 (1.0) 41;50 1.7152 (1.00) 871 2
test_litgpt_qkv_split_rope[Llama-2-13b-hf-forward-bs1-thunder] (0002_68bcaa0) 590.7016 (1.03) 763.0377 (1.03) 602.1098 (1.03) 26.5964 (1.02) 596.2721 (1.03) 2.9230 (1.51) 40;44 1.6608 (0.97) 848 2
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-2-13b-hf-forward-bs2-thunder]': 3 tests -----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-13b-hf-forward-bs2-thunder] (0003_9d49ebe) 1.1485 (1.0) 1.4426 (1.00) 1.1709 (1.0) 0.0608 (1.13) 1.1531 (1.0) 0.0044 (1.02) 66;73 854.0682 (1.0) 873 1
test_litgpt_qkv_split_rope[Llama-2-13b-hf-forward-bs2-thunder] (0001_39ec109) 1.1488 (1.00) 1.4412 (1.0) 1.1713 (1.00) 0.0617 (1.15) 1.1533 (1.00) 0.0043 (1.0) 66;76 853.7724 (1.00) 872 1
test_litgpt_qkv_split_rope[Llama-2-13b-hf-forward-bs2-thunder] (0002_68bcaa0) 1.1541 (1.00) 1.4689 (1.02) 1.1766 (1.00) 0.0536 (1.0) 1.1643 (1.01) 0.0053 (1.24) 45;49 849.8995 (1.00) 867 1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-2-13b-hf-inference-bs1-thunder]': 3 tests -----------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-13b-hf-inference-bs1-thunder] (0003_9d49ebe) 564.5920 (1.0) 722.1149 (1.0) 572.8849 (1.0) 24.0246 (1.0) 567.1922 (1.0) 2.2240 (1.0) 42;58 1.7456 (1.0) 890 2
test_litgpt_qkv_split_rope[Llama-2-13b-hf-inference-bs1-thunder] (0001_39ec109) 564.7773 (1.00) 726.7585 (1.01) 573.2385 (1.00) 24.3502 (1.01) 567.6080 (1.00) 2.3965 (1.08) 42;50 1.7445 (1.00) 885 2
test_litgpt_qkv_split_rope[Llama-2-13b-hf-inference-bs1-thunder] (0002_68bcaa0) 586.7993 (1.04) 762.1502 (1.06) 600.1130 (1.05) 25.0755 (1.04) 594.5591 (1.05) 3.1121 (1.40) 41;48 1.6664 (0.95) 851 2
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-2-13b-hf-inference-bs2-thunder]': 3 tests -----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-13b-hf-inference-bs2-thunder] (0001_39ec109) 1.1483 (1.0) 1.4301 (1.00) 1.1747 (1.00) 0.0670 (1.33) 1.1525 (1.0) 0.0044 (1.07) 83;87 851.2901 (1.00) 871 1
test_litgpt_qkv_split_rope[Llama-2-13b-hf-inference-bs2-thunder] (0003_9d49ebe) 1.1490 (1.00) 1.4234 (1.0) 1.1747 (1.00) 0.0666 (1.32) 1.1527 (1.00) 0.0042 (1.0) 83;90 851.2566 (1.00) 871 1
test_litgpt_qkv_split_rope[Llama-2-13b-hf-inference-bs2-thunder] (0002_68bcaa0) 1.1523 (1.00) 1.4649 (1.03) 1.1731 (1.0) 0.0505 (1.0) 1.1612 (1.01) 0.0054 (1.30) 45;56 852.4226 (1.0) 870 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-2-70b-hf-backward-bs1-thunder]': 3 tests ----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-70b-hf-backward-bs1-thunder] (0003_9d49ebe) 1.3649 (1.0) 2.1050 (1.0) 1.3940 (1.0) 0.0763 (1.03) 1.3764 (1.0) 0.0071 (1.0) 36;82 717.3398 (1.0) 733 1
test_litgpt_qkv_split_rope[Llama-2-70b-hf-backward-bs1-thunder] (0001_39ec109) 1.4112 (1.03) 2.1406 (1.02) 1.4496 (1.04) 0.0853 (1.15) 1.4272 (1.04) 0.0095 (1.33) 38;108 689.8452 (0.96) 709 1
test_litgpt_qkv_split_rope[Llama-2-70b-hf-backward-bs1-thunder] (0002_68bcaa0) 2.1292 (1.56) 2.7618 (1.31) 2.1677 (1.56) 0.0743 (1.0) 2.1520 (1.56) 0.0077 (1.09) 24;64 461.3108 (0.64) 471 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-2-70b-hf-backward-bs2-thunder]': 3 tests ----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-70b-hf-backward-bs2-thunder] (0003_9d49ebe) 2.3181 (1.0) 2.7254 (1.0) 2.3639 (1.0) 0.0716 (1.0) 2.3468 (1.0) 0.0075 (1.16) 27;50 423.0269 (1.0) 431 1
test_litgpt_qkv_split_rope[Llama-2-70b-hf-backward-bs2-thunder] (0001_39ec109) 2.5545 (1.10) 3.0172 (1.11) 2.5984 (1.10) 0.0749 (1.05) 2.5806 (1.10) 0.0067 (1.04) 24;47 384.8508 (0.91) 392 1
test_litgpt_qkv_split_rope[Llama-2-70b-hf-backward-bs2-thunder] (0002_68bcaa0) 4.1344 (1.78) 4.5415 (1.67) 4.1754 (1.77) 0.0751 (1.05) 4.1559 (1.77) 0.0065 (1.0) 16;22 239.4967 (0.57) 242 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-2-70b-hf-forward-bs1-thunder]': 3 tests -----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-70b-hf-forward-bs1-thunder] (0002_68bcaa0) 1.1632 (1.0) 1.4903 (1.0) 1.1843 (1.0) 0.0513 (1.04) 1.1724 (1.0) 0.0047 (1.63) 44;57 844.3647 (1.0) 861 1
test_litgpt_qkv_split_rope[Llama-2-70b-hf-forward-bs1-thunder] (0003_9d49ebe) 1.2631 (1.09) 1.5493 (1.04) 1.2789 (1.08) 0.0495 (1.0) 1.2674 (1.08) 0.0033 (1.14) 38;49 781.9436 (0.93) 792 1
test_litgpt_qkv_split_rope[Llama-2-70b-hf-forward-bs1-thunder] (0001_39ec109) 1.2633 (1.09) 1.5520 (1.04) 1.2788 (1.08) 0.0501 (1.01) 1.2672 (1.08) 0.0029 (1.0) 38;65 781.9888 (0.93) 793 1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-2-70b-hf-forward-bs2-thunder]': 3 tests -----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-70b-hf-forward-bs2-thunder] (0002_68bcaa0) 2.0829 (1.0) 2.3992 (1.0) 2.1063 (1.0) 0.0529 (1.22) 2.0937 (1.0) 0.0061 (2.35) 26;32 474.7645 (1.0) 481 1
test_litgpt_qkv_split_rope[Llama-2-70b-hf-forward-bs2-thunder] (0003_9d49ebe) 2.4031 (1.15) 2.6825 (1.12) 2.4164 (1.15) 0.0435 (1.0) 2.4067 (1.15) 0.0026 (1.0) 19;32 413.8305 (0.87) 417 1
test_litgpt_qkv_split_rope[Llama-2-70b-hf-forward-bs2-thunder] (0001_39ec109) 2.4036 (1.15) 2.6477 (1.10) 2.4203 (1.15) 0.0456 (1.05) 2.4100 (1.15) 0.0047 (1.81) 20;29 413.1796 (0.87) 417 1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-2-70b-hf-inference-bs1-thunder]': 3 tests -----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-70b-hf-inference-bs1-thunder] (0002_68bcaa0) 1.1621 (1.0) 1.5012 (1.0) 1.1839 (1.0) 0.0562 (1.10) 1.1710 (1.0) 0.0048 (1.63) 44;50 844.6648 (1.0) 861 1
test_litgpt_qkv_split_rope[Llama-2-70b-hf-inference-bs1-thunder] (0001_39ec109) 1.2798 (1.10) 1.5350 (1.02) 1.2960 (1.09) 0.0510 (1.0) 1.2843 (1.10) 0.0029 (1.0) 37;60 771.6307 (0.91) 782 1
test_litgpt_qkv_split_rope[Llama-2-70b-hf-inference-bs1-thunder] (0003_9d49ebe) 1.2804 (1.10) 1.5786 (1.05) 1.2980 (1.10) 0.0578 (1.13) 1.2848 (1.10) 0.0032 (1.10) 37;52 770.4420 (0.91) 782 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-2-70b-hf-inference-bs2-thunder]': 3 tests -----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-70b-hf-inference-bs2-thunder] (0002_68bcaa0) 2.0836 (1.0) 2.4144 (1.0) 2.1082 (1.0) 0.0556 (1.17) 2.0955 (1.0) 0.0059 (1.87) 25;32 474.3402 (1.0) 480 1
test_litgpt_qkv_split_rope[Llama-2-70b-hf-inference-bs2-thunder] (0001_39ec109) 2.4364 (1.17) 2.7046 (1.12) 2.4526 (1.16) 0.0477 (1.0) 2.4417 (1.17) 0.0031 (1.0) 19;27 407.7310 (0.86) 411 1
test_litgpt_qkv_split_rope[Llama-2-70b-hf-inference-bs2-thunder] (0003_9d49ebe) 2.4371 (1.17) 2.6993 (1.12) 2.4523 (1.16) 0.0485 (1.02) 2.4409 (1.16) 0.0033 (1.05) 20;31 407.7883 (0.86) 411 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-2-7b-hf-backward-bs1-thunder]': 3 tests --------------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-7b-hf-backward-bs1-thunder] (0003_9d49ebe) 966.5266 (1.0) 1,274.6090 (1.0) 991.6783 (1.0) 67.3159 (1.32) 974.3124 (1.0) 4.6606 (1.21) 65;72 1,008.3916 (1.0) 1039 1
test_litgpt_qkv_split_rope[Llama-2-7b-hf-backward-bs1-thunder] (0001_39ec109) 968.3007 (1.00) 1,387.2627 (1.09) 993.5829 (1.00) 70.6508 (1.38) 975.4039 (1.00) 4.1781 (1.09) 65;76 1,006.4585 (1.00) 1035 1
test_litgpt_qkv_split_rope[Llama-2-7b-hf-backward-bs1-thunder] (0002_68bcaa0) 1,086.9848 (1.12) 1,358.3573 (1.07) 1,105.7024 (1.11) 51.0456 (1.0) 1,092.8805 (1.12) 3.8380 (1.0) 53;61 904.4025 (0.90) 922 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-2-7b-hf-backward-bs2-thunder]': 3 tests -----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-7b-hf-backward-bs2-thunder] (0002_68bcaa0) 1.7148 (1.0) 2.0211 (1.0) 1.7437 (1.0) 0.0579 (1.0) 1.7268 (1.0) 0.0188 (1.0) 36;36 573.4846 (1.0) 584 1
test_litgpt_qkv_split_rope[Llama-2-7b-hf-backward-bs2-thunder] (0003_9d49ebe) 1.9095 (1.11) 2.7698 (1.37) 1.9492 (1.12) 0.0868 (1.50) 1.9331 (1.12) 0.0192 (1.02) 28;30 513.0360 (0.89) 524 1
test_litgpt_qkv_split_rope[Llama-2-7b-hf-backward-bs2-thunder] (0001_39ec109) 1.9111 (1.11) 3.2624 (1.61) 1.9577 (1.12) 0.1137 (1.96) 1.9343 (1.12) 0.0195 (1.04) 28;50 510.7933 (0.89) 524 1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-2-7b-hf-forward-bs1-thunder]': 3 tests -----------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-7b-hf-forward-bs1-thunder] (0003_9d49ebe) 490.3623 (1.0) 670.0815 (1.0) 499.5598 (1.0) 29.3366 (1.07) 492.7265 (1.0) 1.8413 (1.0) 50;64 2.0018 (1.0) 1021 2
test_litgpt_qkv_split_rope[Llama-2-7b-hf-forward-bs1-thunder] (0001_39ec109) 490.5020 (1.00) 672.7516 (1.00) 499.6060 (1.00) 28.2535 (1.03) 493.1726 (1.00) 1.9012 (1.03) 48;59 2.0016 (1.00) 1021 2
test_litgpt_qkv_split_rope[Llama-2-7b-hf-forward-bs1-thunder] (0002_68bcaa0) 493.5381 (1.01) 677.0343 (1.01) 505.1594 (1.01) 27.5387 (1.0) 499.0739 (1.01) 2.4311 (1.32) 49;59 1.9796 (0.99) 1013 2
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------ benchmark 'test_litgpt_qkv_split_rope[Llama-2-7b-hf-forward-bs2-thunder]': 3 tests ------------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-7b-hf-forward-bs2-thunder] (0002_68bcaa0) 961.0066 (1.0) 1,301.5298 (1.02) 982.3816 (1.0) 56.3930 (1.0) 969.2824 (1.0) 4.6082 (1.15) 54;64 1,017.9344 (1.0) 1042 1
test_litgpt_qkv_split_rope[Llama-2-7b-hf-forward-bs2-thunder] (0003_9d49ebe) 976.0242 (1.02) 1,289.2867 (1.01) 1,002.0712 (1.02) 67.7384 (1.20) 980.8233 (1.01) 3.9912 (1.0) 91;100 997.9331 (0.98) 1025 1
test_litgpt_qkv_split_rope[Llama-2-7b-hf-forward-bs2-thunder] (0001_39ec109) 976.9667 (1.02) 1,278.4265 (1.0) 1,004.5210 (1.02) 69.1080 (1.23) 982.4364 (1.01) 4.3432 (1.09) 94;104 995.4993 (0.98) 1025 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-2-7b-hf-inference-bs1-thunder]': 3 tests -----------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-7b-hf-inference-bs1-thunder] (0001_39ec109) 480.2174 (1.0) 654.3170 (1.01) 488.9436 (1.00) 27.2999 (1.08) 482.5373 (1.0) 1.9888 (1.04) 52;66 2.0452 (1.00) 1044 2
test_litgpt_qkv_split_rope[Llama-2-7b-hf-inference-bs1-thunder] (0003_9d49ebe) 480.6588 (1.00) 645.2850 (1.0) 488.6043 (1.0) 25.3896 (1.0) 482.7022 (1.00) 1.9213 (1.0) 50;59 2.0466 (1.0) 1043 2
test_litgpt_qkv_split_rope[Llama-2-7b-hf-inference-bs1-thunder] (0002_68bcaa0) 492.8224 (1.03) 662.7725 (1.03) 502.9896 (1.03) 25.6136 (1.01) 497.3048 (1.03) 2.5826 (1.34) 48;60 1.9881 (0.97) 1016 2
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------ benchmark 'test_litgpt_qkv_split_rope[Llama-2-7b-hf-inference-bs2-thunder]': 3 tests ------------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-2-7b-hf-inference-bs2-thunder] (0002_68bcaa0) 955.6860 (1.0) 1,256.3746 (1.01) 976.5058 (1.0) 49.1856 (1.0) 965.0886 (1.0) 4.4480 (1.24) 53;65 1.0241 (1.0) 1046 1
test_litgpt_qkv_split_rope[Llama-2-7b-hf-inference-bs2-thunder] (0003_9d49ebe) 969.9231 (1.01) 1,245.4838 (1.00) 996.3445 (1.02) 65.7105 (1.34) 974.5723 (1.01) 3.5763 (1.0) 99;107 1.0037 (0.98) 1030 1
test_litgpt_qkv_split_rope[Llama-2-7b-hf-inference-bs2-thunder] (0001_39ec109) 970.6048 (1.02) 1,243.7003 (1.0) 997.3207 (1.02) 65.2221 (1.33) 975.9706 (1.01) 4.4382 (1.24) 98;104 1.0027 (0.98) 1032 1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-3-70B-backward-bs1-thunder]': 3 tests -----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-3-70B-backward-bs1-thunder] (0003_9d49ebe) 2.4899 (1.0) 2.8107 (1.0) 2.5343 (1.0) 0.0679 (1.30) 2.5172 (1.0) 0.0083 (1.38) 26;38 394.5923 (1.0) 402 1
test_litgpt_qkv_split_rope[Llama-3-70B-backward-bs1-thunder] (0001_39ec109) 2.5844 (1.04) 2.8996 (1.03) 2.6277 (1.04) 0.0669 (1.28) 2.6112 (1.04) 0.0065 (1.08) 24;43 380.5640 (0.96) 387 1
test_litgpt_qkv_split_rope[Llama-3-70B-backward-bs1-thunder] (0002_68bcaa0) 3.9971 (1.61) 4.3265 (1.54) 4.0280 (1.59) 0.0524 (1.0) 4.0167 (1.60) 0.0060 (1.0) 12;28 248.2615 (0.63) 251 1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-3-70B-backward-bs2-thunder]': 3 tests -----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-3-70B-backward-bs2-thunder] (0003_9d49ebe) 4.4816 (1.0) 5.1227 (1.0) 4.5144 (1.0) 0.0790 (1.43) 4.4972 (1.0) 0.0102 (1.24) 11;16 221.5114 (1.0) 224 1
test_litgpt_qkv_split_rope[Llama-3-70B-backward-bs2-thunder] (0001_39ec109) 4.9309 (1.10) 5.7774 (1.13) 4.9686 (1.10) 0.0960 (1.74) 4.9474 (1.10) 0.0083 (1.0) 11;24 201.2626 (0.91) 203 1
test_litgpt_qkv_split_rope[Llama-3-70B-backward-bs2-thunder] (0002_68bcaa0) 8.0227 (1.79) 8.3380 (1.63) 8.0559 (1.78) 0.0552 (1.0) 8.0443 (1.79) 0.0090 (1.09) 6;9 124.1321 (0.56) 125 1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-3-70B-forward-bs1-thunder]': 3 tests ----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-3-70B-forward-bs1-thunder] (0002_68bcaa0) 2.0732 (1.0) 2.4158 (1.0) 2.1001 (1.0) 0.0551 (1.04) 2.0873 (1.0) 0.0055 (1.63) 25;31 476.1693 (1.0) 482 1
test_litgpt_qkv_split_rope[Llama-3-70B-forward-bs1-thunder] (0003_9d49ebe) 2.3140 (1.12) 2.6080 (1.08) 2.3310 (1.11) 0.0532 (1.01) 2.3186 (1.11) 0.0037 (1.09) 21;33 428.9969 (0.90) 433 1
test_litgpt_qkv_split_rope[Llama-3-70B-forward-bs1-thunder] (0001_39ec109) 2.3167 (1.12) 2.6070 (1.08) 2.3350 (1.11) 0.0528 (1.0) 2.3229 (1.11) 0.0034 (1.0) 21;27 428.2653 (0.90) 433 1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-3-70B-forward-bs2-thunder]': 3 tests ----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-3-70B-forward-bs2-thunder] (0002_68bcaa0) 3.9257 (1.0) 4.2115 (1.0) 3.9547 (1.0) 0.0586 (1.19) 3.9409 (1.0) 0.0070 (1.67) 14;17 252.8613 (1.0) 255 1
test_litgpt_qkv_split_rope[Llama-3-70B-forward-bs2-thunder] (0003_9d49ebe) 4.6060 (1.17) 4.8767 (1.16) 4.6230 (1.17) 0.0507 (1.03) 4.6109 (1.17) 0.0048 (1.15) 11;12 216.3087 (0.86) 218 1
test_litgpt_qkv_split_rope[Llama-3-70B-forward-bs2-thunder] (0001_39ec109) 4.6070 (1.17) 4.8682 (1.16) 4.6255 (1.17) 0.0492 (1.0) 4.6145 (1.17) 0.0042 (1.0) 11;13 216.1908 (0.85) 218 1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-3-70B-inference-bs1-thunder]': 3 tests ----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-3-70B-inference-bs1-thunder] (0002_68bcaa0) 2.0745 (1.0) 2.3732 (1.0) 2.0966 (1.0) 0.0484 (1.0) 2.0855 (1.0) 0.0061 (2.30) 24;26 476.9625 (1.0) 482 1
test_litgpt_qkv_split_rope[Llama-3-70B-inference-bs1-thunder] (0003_9d49ebe) 2.3491 (1.13) 2.6522 (1.12) 2.3671 (1.13) 0.0558 (1.15) 2.3541 (1.13) 0.0034 (1.26) 21;31 422.4492 (0.89) 426 1
test_litgpt_qkv_split_rope[Llama-3-70B-inference-bs1-thunder] (0001_39ec109) 2.3524 (1.13) 2.6557 (1.12) 2.3702 (1.13) 0.0543 (1.12) 2.3577 (1.13) 0.0027 (1.0) 20;35 421.8964 (0.88) 426 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-3-70B-inference-bs2-thunder]': 3 tests ----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-3-70B-inference-bs2-thunder] (0002_68bcaa0) 3.9197 (1.0) 4.2165 (1.0) 3.9461 (1.0) 0.0484 (1.0) 3.9355 (1.0) 0.0075 (1.64) 12;14 253.4129 (1.0) 255 1
test_litgpt_qkv_split_rope[Llama-3-70B-inference-bs2-thunder] (0003_9d49ebe) 4.6805 (1.19) 4.9524 (1.17) 4.6981 (1.19) 0.0538 (1.11) 4.6860 (1.19) 0.0046 (1.0) 10;12 212.8498 (0.84) 214 1
test_litgpt_qkv_split_rope[Llama-3-70B-inference-bs2-thunder] (0001_39ec109) 4.6817 (1.19) 4.9944 (1.18) 4.7020 (1.19) 0.0568 (1.17) 4.6900 (1.19) 0.0047 (1.02) 10;10 212.6774 (0.84) 214 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-3-8B-backward-bs1-thunder]': 3 tests ----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-3-8B-backward-bs1-thunder] (0003_9d49ebe) 1.4446 (1.0) 1.8426 (1.00) 1.4762 (1.0) 0.0700 (1.16) 1.4567 (1.0) 0.0075 (1.03) 43;111 677.4310 (1.0) 692 1
test_litgpt_qkv_split_rope[Llama-3-8B-backward-bs1-thunder] (0001_39ec109) 1.4970 (1.04) 1.8397 (1.0) 1.5324 (1.04) 0.0664 (1.10) 1.5111 (1.04) 0.0199 (2.70) 42;52 652.5690 (0.96) 669 1
test_litgpt_qkv_split_rope[Llama-3-8B-backward-bs1-thunder] (0002_68bcaa0) 2.1523 (1.49) 2.4913 (1.35) 2.1841 (1.48) 0.0602 (1.0) 2.1724 (1.49) 0.0074 (1.0) 24;55 457.8636 (0.68) 466 1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-3-8B-backward-bs2-thunder]': 3 tests ----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-3-8B-backward-bs2-thunder] (0003_9d49ebe) 2.4904 (1.0) 3.1497 (1.0) 2.5344 (1.0) 0.0809 (1.39) 2.5168 (1.0) 0.0085 (1.35) 20;38 394.5744 (1.0) 402 1
test_litgpt_qkv_split_rope[Llama-3-8B-backward-bs2-thunder] (0001_39ec109) 2.7624 (1.11) 3.4113 (1.08) 2.8036 (1.11) 0.0802 (1.38) 2.7856 (1.11) 0.0078 (1.23) 19;44 356.6885 (0.90) 363 1
test_litgpt_qkv_split_rope[Llama-3-8B-backward-bs2-thunder] (0002_68bcaa0) 4.1678 (1.67) 4.4861 (1.42) 4.2055 (1.66) 0.0581 (1.0) 4.1913 (1.67) 0.0063 (1.0) 15;27 237.7855 (0.60) 240 1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-3-8B-forward-bs1-thunder]': 3 tests -----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-3-8B-forward-bs1-thunder] (0002_68bcaa0) 1.1831 (1.0) 1.5146 (1.0) 1.2054 (1.0) 0.0548 (1.06) 1.1926 (1.0) 0.0048 (1.74) 44;49 829.6001 (1.0) 846 1
test_litgpt_qkv_split_rope[Llama-3-8B-forward-bs1-thunder] (0003_9d49ebe) 1.2741 (1.08) 1.5905 (1.05) 1.2925 (1.07) 0.0556 (1.07) 1.2798 (1.07) 0.0027 (1.0) 38;60 773.6756 (0.93) 784 1
test_litgpt_qkv_split_rope[Llama-3-8B-forward-bs1-thunder] (0001_39ec109) 1.2757 (1.08) 1.5614 (1.03) 1.2930 (1.07) 0.0517 (1.0) 1.2813 (1.07) 0.0032 (1.16) 38;50 773.3818 (0.93) 785 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-3-8B-forward-bs2-thunder]': 3 tests -----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-3-8B-forward-bs2-thunder] (0002_68bcaa0) 2.1178 (1.0) 2.4669 (1.0) 2.1427 (1.0) 0.0608 (1.17) 2.1285 (1.0) 0.0055 (1.26) 25;28 466.6992 (1.0) 473 1
test_litgpt_qkv_split_rope[Llama-3-8B-forward-bs2-thunder] (0003_9d49ebe) 2.4166 (1.14) 2.7110 (1.10) 2.4370 (1.14) 0.0518 (1.0) 2.4243 (1.14) 0.0046 (1.06) 22;26 410.3326 (0.88) 414 1
test_litgpt_qkv_split_rope[Llama-3-8B-forward-bs2-thunder] (0001_39ec109) 2.4194 (1.14) 2.7481 (1.11) 2.4401 (1.14) 0.0542 (1.05) 2.4270 (1.14) 0.0043 (1.0) 23;30 409.8129 (0.88) 414 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-3-8B-inference-bs1-thunder]': 3 tests -----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-3-8B-inference-bs1-thunder] (0002_68bcaa0) 1.1811 (1.0) 1.4872 (1.0) 1.2027 (1.0) 0.0505 (1.0) 1.1910 (1.0) 0.0046 (1.94) 44;50 831.4394 (1.0) 846 1
test_litgpt_qkv_split_rope[Llama-3-8B-inference-bs1-thunder] (0003_9d49ebe) 1.2894 (1.09) 1.5821 (1.06) 1.3058 (1.09) 0.0521 (1.03) 1.2938 (1.09) 0.0024 (1.0) 37;75 765.8194 (0.92) 776 1
test_litgpt_qkv_split_rope[Llama-3-8B-inference-bs1-thunder] (0001_39ec109) 1.2899 (1.09) 1.5964 (1.07) 1.3085 (1.09) 0.0554 (1.10) 1.2963 (1.09) 0.0036 (1.50) 37;42 764.2224 (0.92) 776 1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Llama-3-8B-inference-bs2-thunder]': 3 tests -----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Llama-3-8B-inference-bs2-thunder] (0002_68bcaa0) 2.1130 (1.0) 2.4023 (1.0) 2.1352 (1.0) 0.0481 (1.0) 2.1245 (1.0) 0.0055 (1.99) 23;27 468.3308 (1.0) 474 1
test_litgpt_qkv_split_rope[Llama-3-8B-inference-bs2-thunder] (0003_9d49ebe) 2.4556 (1.16) 2.7771 (1.16) 2.4724 (1.16) 0.0578 (1.20) 2.4593 (1.16) 0.0028 (1.0) 19;32 404.4689 (0.86) 408 1
test_litgpt_qkv_split_rope[Llama-3-8B-inference-bs2-thunder] (0001_39ec109) 2.4574 (1.16) 2.7280 (1.14) 2.4756 (1.16) 0.0559 (1.16) 2.4627 (1.16) 0.0041 (1.47) 20;22 403.9436 (0.86) 408 1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Mistral-7B-v0.1-backward-bs1-thunder]': 3 tests --------------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-backward-bs1-thunder] (0003_9d49ebe) 833.6753 (1.0) 1,268.3570 (1.0) 863.2496 (1.0) 77.9489 (1.38) 843.4244 (1.0) 4.5239 (1.21) 75;79 1,158.4135 (1.0) 1200 1
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-backward-bs1-thunder] (0001_39ec109) 857.0496 (1.03) 1,880.3440 (1.48) 891.3220 (1.03) 90.5087 (1.60) 867.1694 (1.03) 6.5423 (1.75) 77;144 1,121.9290 (0.97) 1171 1
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-backward-bs1-thunder] (0002_68bcaa0) 1,164.3507 (1.40) 1,455.2809 (1.15) 1,185.1804 (1.37) 56.6036 (1.0) 1,170.6725 (1.39) 3.7281 (1.0) 53;61 843.7534 (0.73) 859 1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Mistral-7B-v0.1-backward-bs2-thunder]': 3 tests -----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-backward-bs2-thunder] (0003_9d49ebe) 1.3672 (1.0) 1.7846 (1.0) 1.3982 (1.0) 0.0768 (1.40) 1.3774 (1.0) 0.0057 (1.0) 46;99 715.1881 (1.0) 732 1
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-backward-bs2-thunder] (0001_39ec109) 1.4991 (1.10) 1.8608 (1.04) 1.5310 (1.09) 0.0727 (1.32) 1.5104 (1.10) 0.0099 (1.74) 42;65 653.1602 (0.91) 668 1
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-backward-bs2-thunder] (0002_68bcaa0) 2.2082 (1.62) 2.4679 (1.38) 2.2400 (1.60) 0.0549 (1.0) 2.2274 (1.62) 0.0066 (1.16) 28;59 446.4238 (0.62) 454 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Mistral-7B-v0.1-forward-bs1-thunder]': 3 tests -----------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-forward-bs1-thunder] (0002_68bcaa0) 611.8962 (1.0) 798.4489 (1.0) 623.7554 (1.0) 28.5427 (1.13) 617.4822 (1.0) 2.8514 (1.67) 38;46 1.6032 (1.0) 817 2
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-forward-bs1-thunder] (0003_9d49ebe) 655.7745 (1.07) 816.8439 (1.02) 663.9504 (1.06) 25.2701 (1.0) 658.0185 (1.07) 1.7098 (1.0) 37;55 1.5061 (0.94) 763 2
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-forward-bs1-thunder] (0001_39ec109) 656.9168 (1.07) 825.1602 (1.03) 666.5052 (1.07) 27.1270 (1.07) 660.2430 (1.07) 2.0489 (1.20) 38;53 1.5004 (0.94) 762 2
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Mistral-7B-v0.1-forward-bs2-thunder]': 3 tests ----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-forward-bs2-thunder] (0002_68bcaa0) 1.1841 (1.0) 1.5069 (1.0) 1.2073 (1.0) 0.0550 (1.05) 1.1945 (1.0) 0.0054 (1.63) 43;49 828.2721 (1.0) 843 1
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-forward-bs2-thunder] (0003_9d49ebe) 1.3161 (1.11) 1.6223 (1.08) 1.3346 (1.11) 0.0528 (1.00) 1.3218 (1.11) 0.0042 (1.27) 41;46 749.2623 (0.90) 761 1
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-forward-bs2-thunder] (0001_39ec109) 1.3189 (1.11) 1.6222 (1.08) 1.3369 (1.11) 0.0526 (1.0) 1.3240 (1.11) 0.0033 (1.0) 41;57 747.9992 (0.90) 760 1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Mistral-7B-v0.1-inference-bs1-thunder]': 3 tests -----------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-inference-bs1-thunder] (0002_68bcaa0) 608.6505 (1.0) 769.3740 (1.0) 619.2626 (1.0) 24.4132 (1.0) 613.8531 (1.0) 2.8908 (1.96) 39;46 1.6148 (1.0) 824 2
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-inference-bs1-thunder] (0003_9d49ebe) 662.1759 (1.09) 854.1443 (1.11) 671.3290 (1.08) 29.6139 (1.21) 664.4833 (1.08) 1.4757 (1.0) 37;57 1.4896 (0.92) 756 2
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-inference-bs1-thunder] (0001_39ec109) 662.7077 (1.09) 858.9285 (1.12) 672.5471 (1.09) 28.7203 (1.18) 666.2593 (1.09) 1.9625 (1.33) 36;49 1.4869 (0.92) 755 2
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[Mistral-7B-v0.1-inference-bs2-thunder]': 3 tests ----------------------------------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-inference-bs2-thunder] (0002_68bcaa0) 1.1816 (1.0) 1.4946 (1.0) 1.2019 (1.0) 0.0514 (1.04) 1.1900 (1.0) 0.0045 (1.52) 43;53 832.0504 (1.0) 849 1
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-inference-bs2-thunder] (0003_9d49ebe) 1.3313 (1.13) 1.5963 (1.07) 1.3467 (1.12) 0.0494 (1.0) 1.3353 (1.12) 0.0031 (1.07) 36;49 742.5479 (0.89) 752 1
test_litgpt_qkv_split_rope[Mistral-7B-v0.1-inference-bs2-thunder] (0001_39ec109) 1.3327 (1.13) 1.6244 (1.09) 1.3503 (1.12) 0.0523 (1.06) 1.3382 (1.12) 0.0029 (1.0) 36;55 740.5897 (0.89) 751 1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[phi-2-backward-bs1-thunder]': 3 tests -----------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[phi-2-backward-bs1-thunder] (0002_68bcaa0) 413.4714 (1.0) 593.5216 (1.0) 425.0547 (1.0) 28.9642 (1.0) 417.2190 (1.0) 2.0915 (1.08) 84;123 2.3526 (1.0) 1208 2
test_litgpt_qkv_split_rope[phi-2-backward-bs1-thunder] (0003_9d49ebe) 509.3822 (1.23) 719.9156 (1.21) 519.8101 (1.22) 30.5723 (1.06) 512.9115 (1.23) 2.1737 (1.12) 46;58 1.9238 (0.82) 982 2
test_litgpt_qkv_split_rope[phi-2-backward-bs1-thunder] (0001_39ec109) 509.7184 (1.23) 712.4012 (1.20) 519.8365 (1.22) 30.5913 (1.06) 513.0244 (1.23) 1.9437 (1.0) 46;60 1.9237 (0.82) 982 2
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[phi-2-backward-bs2-thunder]': 3 tests --------------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[phi-2-backward-bs2-thunder] (0002_68bcaa0) 855.8463 (1.0) 1,222.2901 (1.0) 878.0552 (1.0) 64.3214 (1.0) 861.5991 (1.0) 3.5341 (1.0) 72;86 1,138.8805 (1.0) 1169 1
test_litgpt_qkv_split_rope[phi-2-backward-bs2-thunder] (0001_39ec109) 1,061.1359 (1.24) 1,368.3066 (1.12) 1,085.7051 (1.24) 68.3327 (1.06) 1,067.7939 (1.24) 4.0680 (1.15) 59;71 921.0604 (0.81) 942 1
test_litgpt_qkv_split_rope[phi-2-backward-bs2-thunder] (0003_9d49ebe) 1,064.9329 (1.24) 1,368.9576 (1.12) 1,088.8843 (1.24) 69.0291 (1.07) 1,070.5930 (1.24) 3.9125 (1.11) 60;72 918.3713 (0.81) 940 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[phi-2-forward-bs1-thunder]': 3 tests --------------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[phi-2-forward-bs1-thunder] (0002_68bcaa0) 264.1503 (1.0) 32,047.8598 (53.22) 351.1163 (1.0) 1,630.3071 (58.81) 266.2939 (1.0) 1.1983 (1.0) 1;31 2.8481 (1.0) 380 10
test_litgpt_qkv_split_rope[phi-2-forward-bs1-thunder] (0001_39ec109) 421.4216 (1.60) 602.2383 (1.00) 431.5364 (1.23) 28.1007 (1.01) 425.1185 (1.60) 2.7438 (2.29) 58;70 2.3173 (0.81) 1191 2
test_litgpt_qkv_split_rope[phi-2-forward-bs1-thunder] (0003_9d49ebe) 422.7692 (1.60) 602.2230 (1.0) 433.5003 (1.23) 27.7237 (1.0) 427.0775 (1.60) 2.9320 (2.45) 60;70 2.3068 (0.81) 1185 2
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------ benchmark 'test_litgpt_qkv_split_rope[phi-2-forward-bs2-thunder]': 3 tests ------------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[phi-2-forward-bs2-thunder] (0002_68bcaa0) 316.5050 (1.0) 438.4018 (1.0) 324.0075 (1.0) 18.4277 (1.0) 319.5088 (1.0) 2.0470 (1.0) 60;68 3.0863 (1.0) 1054 3
test_litgpt_qkv_split_rope[phi-2-forward-bs2-thunder] (0001_39ec109) 882.9692 (2.79) 1,235.8259 (2.82) 913.8048 (2.82) 76.0675 (4.13) 888.9297 (2.78) 4.3295 (2.11) 108;116 1.0943 (0.35) 1133 1
test_litgpt_qkv_split_rope[phi-2-forward-bs2-thunder] (0003_9d49ebe) 884.2209 (2.79) 1,209.7368 (2.76) 915.3895 (2.83) 77.7803 (4.22) 889.7614 (2.78) 4.6969 (2.29) 108;114 1.0924 (0.35) 1131 1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[phi-2-inference-bs1-thunder]': 3 tests -----------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[phi-2-inference-bs1-thunder] (0002_68bcaa0) 241.7048 (1.0) 284.7171 (1.0) 244.7735 (1.0) 5.6194 (1.0) 243.4491 (1.0) 1.0933 (1.0) 24;28 4.0854 (1.0) 415 10
test_litgpt_qkv_split_rope[phi-2-inference-bs1-thunder] (0001_39ec109) 327.1885 (1.35) 441.3140 (1.55) 335.2728 (1.37) 19.1354 (3.41) 330.4615 (1.36) 2.4914 (2.28) 60;65 2.9826 (0.73) 1021 3
test_litgpt_qkv_split_rope[phi-2-inference-bs1-thunder] (0003_9d49ebe) 329.7934 (1.36) 451.6471 (1.59) 338.3180 (1.38) 20.3373 (3.62) 333.1862 (1.37) 2.2419 (2.05) 60;70 2.9558 (0.72) 1011 3
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------- benchmark 'test_litgpt_qkv_split_rope[phi-2-inference-bs2-thunder]': 3 tests -----------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_litgpt_qkv_split_rope[phi-2-inference-bs2-thunder] (0002_68bcaa0) 300.6080 (1.0) 430.4169 (1.0) 309.1545 (1.0) 21.1278 (1.0) 303.8944 (1.0) 2.1807 (1.0) 62;72 3.2346 (1.0) 1108 3
test_litgpt_qkv_split_rope[phi-2-inference-bs2-thunder] (0001_39ec109) 555.6252 (1.85) 713.3083 (1.66) 564.3503 (1.83) 24.5695 (1.16) 558.5134 (1.84) 2.3164 (1.06) 44;51 1.7719 (0.55) 900 2
test_litgpt_qkv_split_rope[phi-2-inference-bs2-thunder] (0003_9d49ebe) 557.1833 (1.85) 715.0518 (1.66) 565.4833 (1.83) 25.0493 (1.19) 559.5875 (1.84) 2.3595 (1.08) 43;56 1.7684 (0.55) 898 2
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Legend:
Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
OPS: Operations Per Second, computed as 1 / Mean
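
The OPS column and the parenthesised ratios in the tables above can be reproduced from the Mean column. Below is a minimal sketch, assuming the ratios are each value divided by the best (smallest) value in the same column. The helper names `ops_from_mean` and `relative_to_best` are hypothetical and not part of pytest-benchmark. Because the printed means are rounded to four decimals, the recomputed OPS only matches the table to about two decimal places.

```python
def ops_from_mean(mean_seconds):
    """Operations per second, computed as 1 / Mean (per the legend)."""
    return 1.0 / mean_seconds


def relative_to_best(values):
    """Ratio of each timing to the smallest timing, rounded to 2 decimals.

    Assumption: this mirrors the parenthesised numbers printed next to
    each time column, where the fastest case is shown as 1.0.
    """
    best = min(values)
    return [round(v / best, 2) for v in values]


# Mean column (ms) for Llama-3-70B-forward-bs2, cases 0002, 0003, 0001.
means_ms = [3.9547, 4.6230, 4.6255]

ops = [ops_from_mean(m / 1000.0) for m in means_ms]
ratios = relative_to_best(means_ms)

for case, o, r in zip(("0002", "0003", "0001"), ops, ratios):
    print(f"case {case}: OPS ~ {o:.2f}, mean ratio {r}")
```

Read this way, the bookend-enabled cases (0001 and 0003) are about 17% slower than the bookend-disabled case (0002) on this forward benchmark.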

jjsjann123 commented Jul 19, 2024

Looking at the backward RoPE pass, taking the benchmark litgpt_qkv_split_rope[Llama-2-70b-hf-backward-bs1-thunder] as the example.

With bookend

import torch
from thunder.executors.torchex import no_autocast

@torch.no_grad()
@no_autocast
def backward_fn(saved_for_backward, cotangents):
  # saved_for_backward: "Collection"
  # cotangents: "Collection"
  C0, C1, = saved_for_backward
  clear_mutable_collection(saved_for_backward)
  del saved_for_backward
  t6, t7, t8, = cotangents
  clear_mutable_collection(cotangents)
  del cotangents
  t31, t36, = C0
  clear_mutable_collection(C0)
  del C0
  i15, = C1
  clear_mutable_collection(C1)
  del C1
  t222 = torch_slice_prim_impl(t7, [0, 0, 0, 0], [1, 64, 4096, 128], [1, 1, 1, 1])  # t222: "cuda:0 bf16[1, 64, 4096, 128]"
  del t7
  t226 = torch_slice_prim_impl(t6, [0, 0, 0, 0], [1, 64, 4096, 128], [1, 1, 1, 1])  # t226: "cuda:0 bf16[1, 64, 4096, 128]"
  del t6
  t319 = torch.reshape(t8, (1, 8, 8, 4096, 128))  # t319: "cuda:0 bf16[1, 8, 8, 4096, 128]"
    # t319 = ltorch.reshape(t8, (1, 8, 8, 4096, 128))  # t319: "cuda:0 bf16[1, 8, 8, 4096, 128]"
      # t319 = prims.reshape(t8, (1, 8, 8, 4096, 128))  # t319: "cuda:0 bf16[1, 8, 8, 4096, 128]"
  del t8
  [t353] = nvFusion0(i15, t222, t226, t31, t319, t36)
    # t33 = prims.convert_element_type(t31, dtypes.float32)  # t33: "cuda:0 f32[1, 64, 4096, 128]"
    # t38 = prims.convert_element_type(t36, dtypes.float32)  # t38: "cuda:0 f32[1, 64, 4096, 128]"
    # t223 = prims.full([1, 64, 4096, 0], 0, device=devices.Device("cuda:0"), dtype=dtypes.bfloat16)  # t223: "cuda:0 bf16[1, 64, 4096, 0]"
    # t224 = prims.pad(t223, 0.0, ((0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 128, 0)))  # t224: "cuda:0 bf16[1, 64, 4096, 128]"
    # t229 = prims.convert_element_type(t222, dtypes.float32)  # t229: "cuda:0 f32[1, 64, 4096, 128]"
    # t233 = prims.mul(t38, t229)  # t233: "cuda:0 f32[1, 64, 4096, 128]"
    # t236 = prims.convert_element_type(t233, dtypes.bfloat16)  # t236: "cuda:0 bf16[1, 64, 4096, 128]"
    # t241 = prims.mul(t33, t229)  # t241: "cuda:0 f32[1, 64, 4096, 128]"
    # t248 = prims.convert_element_type(t224, dtypes.float32)  # t248: "cuda:0 f32[1, 64, 4096, 128]"
    # t250 = prims.add(t248, t241)  # t250: "cuda:0 f32[1, 64, 4096, 128]"
    # t253 = prims.slice_prim(t236, [0, 0, 0, 0], [1, 64, 4096, 64], [1, 1, 1, 1])  # t253: "cuda:0 bf16[1, 64, 4096, 64]"
    # t254 = prims.slice_prim(t236, [0, 0, 0, 64], [1, 64, 4096, 128], [1, 1, 1, 1])  # t254: "cuda:0 bf16[1, 64, 4096, 64]"
    # t255 = prims.convert_element_type(t253, dtypes.float32)  # t255: "cuda:0 f32[1, 64, 4096, 64]"
    # t256 = prims.neg(t255)  # t256: "cuda:0 f32[1, 64, 4096, 64]"
    # t257 = prims.convert_element_type(t256, dtypes.bfloat16)  # t257: "cuda:0 bf16[1, 64, 4096, 64]"
    # t258 = prims.pad(t257, 0.0, ((0, 0, 0), (0, 0, 0), (0, 0, 0), (64, 0, 0)))  # t258: "cuda:0 bf16[1, 64, 4096, 128]"
    # t260 = prims.convert_element_type(t258, dtypes.float32)  # t260: "cuda:0 f32[1, 64, 4096, 128]"
    # t261 = prims.add(t250, t260)  # t261: "cuda:0 f32[1, 64, 4096, 128]"
    # t263 = prims.pad(t254, 0.0, ((0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 64, 0)))  # t263: "cuda:0 bf16[1, 64, 4096, 128]"
    # t265 = prims.convert_element_type(t263, dtypes.float32)  # t265: "cuda:0 f32[1, 64, 4096, 128]"
    # t266 = prims.add(t261, t265)  # t266: "cuda:0 f32[1, 64, 4096, 128]"
    # t267 = prims.convert_element_type(t266, dtypes.bfloat16)  # t267: "cuda:0 bf16[1, 64, 4096, 128]"
    # t268 = prims.convert_element_type(t226, dtypes.float32)  # t268: "cuda:0 f32[1, 64, 4096, 128]"
    # t272 = prims.mul(t38, t268)  # t272: "cuda:0 f32[1, 64, 4096, 128]"
    # t275 = prims.convert_element_type(t272, dtypes.bfloat16)  # t275: "cuda:0 bf16[1, 64, 4096, 128]"
    # t284 = prims.mul(t33, t268)  # t284: "cuda:0 f32[1, 64, 4096, 128]"
    # t293 = prims.add(t248, t284)  # t293: "cuda:0 f32[1, 64, 4096, 128]"
    # t300 = prims.slice_prim(t275, [0, 0, 0, 0], [1, 64, 4096, 64], [1, 1, 1, 1])  # t300: "cuda:0 bf16[1, 64, 4096, 64]"
    # t301 = prims.slice_prim(t275, [0, 0, 0, 64], [1, 64, 4096, 128], [1, 1, 1, 1])  # t301: "cuda:0 bf16[1, 64, 4096, 64]"
    # t302 = prims.convert_element_type(t300, dtypes.float32)  # t302: "cuda:0 f32[1, 64, 4096, 64]"
    # t303 = prims.neg(t302)  # t303: "cuda:0 f32[1, 64, 4096, 64]"
    # t304 = prims.convert_element_type(t303, dtypes.bfloat16)  # t304: "cuda:0 bf16[1, 64, 4096, 64]"
    # t305 = prims.pad(t304, 0.0, ((0, 0, 0), (0, 0, 0), (0, 0, 0), (64, 0, 0)))  # t305: "cuda:0 bf16[1, 64, 4096, 128]"
    # t307 = prims.convert_element_type(t305, dtypes.float32)  # t307: "cuda:0 f32[1, 64, 4096, 128]"
    # t308 = prims.add(t293, t307)  # t308: "cuda:0 f32[1, 64, 4096, 128]"
    # t310 = prims.pad(t301, 0.0, ((0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 64, 0)))  # t310: "cuda:0 bf16[1, 64, 4096, 128]"
    # t312 = prims.convert_element_type(t310, dtypes.float32)  # t312: "cuda:0 f32[1, 64, 4096, 128]"
    # t313 = prims.add(t308, t312)  # t313: "cuda:0 f32[1, 64, 4096, 128]"
    # t314 = prims.convert_element_type(t313, dtypes.bfloat16)  # t314: "cuda:0 bf16[1, 64, 4096, 128]"
    # t324 = prims.reshape(t267, (1, 8, 8, 4096, 128))  # t324: "cuda:0 bf16[1, 8, 8, 4096, 128]"
    # t329 = prims.reshape(t314, (1, 8, 8, 4096, 128))  # t329: "cuda:0 bf16[1, 8, 8, 4096, 128]"
    # t335 = prims.convert_element_type(t319, dtypes.float32)  # t335: "cuda:0 f32[1, 8, 8, 4096, 128]"
    # t336 = prims.sum(t335, (0, 2))  # t336: "cuda:0 f32[8, 4096, 128]"
    # t337 = prims.convert_element_type(t336, dtypes.bfloat16)  # t337: "cuda:0 bf16[8, 4096, 128]"
    # t338 = prims.broadcast_in_dim(t337, [1, 8, 1, 4096, 128], [1, 3, 4])  # t338: "cuda:0 bf16[1, 8, 1, 4096, 128]"
    # t344 = prims.convert_element_type(t324, dtypes.float32)  # t344: "cuda:0 f32[1, 8, 8, 4096, 128]"
    # t345 = prims.sum(t344, (0, 2))  # t345: "cuda:0 f32[8, 4096, 128]"
    # t346 = prims.convert_element_type(t345, dtypes.bfloat16)  # t346: "cuda:0 bf16[8, 4096, 128]"
    # t347 = prims.broadcast_in_dim(t346, [1, 8, 1, 4096, 128], [1, 3, 4])  # t347: "cuda:0 bf16[1, 8, 1, 4096, 128]"
    # t353 = prims.cat((t329, t347, t338), i15)  # t353: "cuda:0 bf16[1, 8, 10, 4096, 128]"
  del i15, t222, t226, t31, t319, t36
  t359 = torch.permute(t353, (0, 3, 1, 2, 4))  # t359: "cuda:0 bf16[1, 4096, 8, 10, 128]"
    # t359 = ltorch.permute(t353, (0, 3, 1, 2, 4))  # t359: "cuda:0 bf16[1, 4096, 8, 10, 128]"
      # t359 = prims.transpose(t353, (0, 3, 1, 2, 4))  # t359: "cuda:0 bf16[1, 4096, 8, 10, 128]"
  del t353
  t365 = torch.reshape(t359, (1, 4096, 10240))  # t365: "cuda:0 bf16[1, 4096, 10240]"
    # t365 = ltorch.reshape(t359, (1, 4096, 10240))  # t365: "cuda:0 bf16[1, 4096, 10240]"
      # t365 = prims.reshape(t359, (1, 4096, 10240))  # t365: "cuda:0 bf16[1, 4096, 10240]"
  del t359
  return (t365, None, None)
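
The tail of the trace above (cat, then permute, then reshape) is mostly shape bookkeeping, so it can be checked by hand. The sketch below is a minimal pure-Python check using the shapes copied from the trace comments. `permute_shape` and `cat_shape` are hypothetical helpers written for this note, not Thunder or nvFuser APIs, and `i15 == 2` is an assumption inferred from the output shape of `t353`.

```python
from math import prod


def permute_shape(shape, dims):
    """Shape after permuting axes: output axis i takes input axis dims[i]."""
    return tuple(shape[d] for d in dims)


def cat_shape(shapes, dim):
    """Shape after concatenating along `dim`; all other axes must match."""
    first = shapes[0]
    for s in shapes:
        assert s[:dim] + s[dim + 1:] == first[:dim] + first[dim + 1:]
    return first[:dim] + (sum(s[dim] for s in shapes),) + first[dim + 1:]


# Shapes copied from the trace comments.
t329 = (1, 8, 8, 4096, 128)
t347 = (1, 8, 1, 4096, 128)
t338 = (1, 8, 1, 4096, 128)

# Assumption: i15 == 2 (the trace only shows the symbol, not its value).
t353 = cat_shape([t329, t347, t338], 2)

# torch.permute(t353, (0, 3, 1, 2, 4)) in the trace.
t359 = permute_shape(t353, (0, 3, 1, 2, 4))

# torch.reshape(t359, (1, 4096, 10240)) must preserve the element count.
t365 = (1, 4096, 10240)
assert prod(t359) == prod(t365)
print(t353, t359, t365)
```

The last three axes of `t359` (8, 10, 128) collapse into the 10240-wide axis of `t365`, which is why the reshape is valid only after the permute.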

without bookend

import torch
from thunder.executors.torchex import no_autocast

@torch.no_grad()
@no_autocast
def backward_fn(saved_for_backward, cotangents):
  # saved_for_backward: "Collection"
  # cotangents: "Collection"
  C0, C1, = saved_for_backward
  clear_mutable_collection(saved_for_backward)
  del saved_for_backward
  t6, t7, t8, = cotangents
  clear_mutable_collection(cotangents)
  del cotangents
  t31, t36, = C0
  clear_mutable_collection(C0)
  del C0
  i15, = C1
  clear_mutable_collection(C1)
  del C1
  t222 = torch_slice_prim_impl(t7, [0, 0, 0, 0], [1, 64, 4096, 128], [1, 1, 1, 1])  # t222: "cuda:0 bf16[1, 64, 4096, 128]"
  del t7
  t226 = torch_slice_prim_impl(t6, [0, 0, 0, 0], [1, 64, 4096, 128], [1, 1, 1, 1])  # t226: "cuda:0 bf16[1, 64, 4096, 128]"
  del t6
  t319 = torch.reshape(t8, (1, 8, 8, 4096, 128))  # t319: "cuda:0 bf16[1, 8, 8, 4096, 128]"
    # t319 = ltorch.reshape(t8, (1, 8, 8, 4096, 128))  # t319: "cuda:0 bf16[1, 8, 8, 4096, 128]"
      # t319 = prims.reshape(t8, (1, 8, 8, 4096, 128))  # t319: "cuda:0 bf16[1, 8, 8, 4096, 128]"
  del t8
  [t353] = nvFusion0(i15, t222, t226, t31, t319, t36)
    # t33 = prims.convert_element_type(t31, dtypes.float32)  # t33: "cuda:0 f32[1, 64, 4096, 128]"
    # t38 = prims.convert_element_type(t36, dtypes.float32)  # t38: "cuda:0 f32[1, 64, 4096, 128]"
    # t223 = prims.full([1, 64, 4096, 0], 0, device=devices.Device("cuda:0"), dtype=dtypes.bfloat16)  # t223: "cuda:0 bf16[1, 64, 4096, 0]"
    # t224 = prims.pad(t223, 0.0, ((0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 128, 0)))  # t224: "cuda:0 bf16[1, 64, 4096, 128]"
    # t229 = prims.convert_element_type(t222, dtypes.float32)  # t229: "cuda:0 f32[1, 64, 4096, 128]"
    # t233 = prims.mul(t38, t229)  # t233: "cuda:0 f32[1, 64, 4096, 128]"
    # t236 = prims.convert_element_type(t233, dtypes.bfloat16)  # t236: "cuda:0 bf16[1, 64, 4096, 128]"
    # t241 = prims.mul(t33, t229)  # t241: "cuda:0 f32[1, 64, 4096, 128]"
    # t248 = prims.convert_element_type(t224, dtypes.float32)  # t248: "cuda:0 f32[1, 64, 4096, 128]"
    # t250 = prims.add(t248, t241)  # t250: "cuda:0 f32[1, 64, 4096, 128]"
    # t253 = prims.slice_prim(t236, [0, 0, 0, 0], [1, 64, 4096, 64], [1, 1, 1, 1])  # t253: "cuda:0 bf16[1, 64, 4096, 64]"
    # t254 = prims.slice_prim(t236, [0, 0, 0, 64], [1, 64, 4096, 128], [1, 1, 1, 1])  # t254: "cuda:0 bf16[1, 64, 4096, 64]"
    # t255 = prims.convert_element_type(t253, dtypes.float32)  # t255: "cuda:0 f32[1, 64, 4096, 64]"
    # t256 = prims.neg(t255)  # t256: "cuda:0 f32[1, 64, 4096, 64]"
    # t257 = prims.convert_element_type(t256, dtypes.bfloat16)  # t257: "cuda:0 bf16[1, 64, 4096, 64]"
    # t258 = prims.pad(t257, 0.0, ((0, 0, 0), (0, 0, 0), (0, 0, 0), (64, 0, 0)))  # t258: "cuda:0 bf16[1, 64, 4096, 128]"
    # t260 = prims.convert_element_type(t258, dtypes.float32)  # t260: "cuda:0 f32[1, 64, 4096, 128]"
    # t261 = prims.add(t250, t260)  # t261: "cuda:0 f32[1, 64, 4096, 128]"
    # t263 = prims.pad(t254, 0.0, ((0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 64, 0)))  # t263: "cuda:0 bf16[1, 64, 4096, 128]"
    # t265 = prims.convert_element_type(t263, dtypes.float32)  # t265: "cuda:0 f32[1, 64, 4096, 128]"
    # t266 = prims.add(t261, t265)  # t266: "cuda:0 f32[1, 64, 4096, 128]"
    # t267 = prims.convert_element_type(t266, dtypes.bfloat16)  # t267: "cuda:0 bf16[1, 64, 4096, 128]"
    # t268 = prims.convert_element_type(t226, dtypes.float32)  # t268: "cuda:0 f32[1, 64, 4096, 128]"
    # t272 = prims.mul(t38, t268)  # t272: "cuda:0 f32[1, 64, 4096, 128]"
    # t275 = prims.convert_element_type(t272, dtypes.bfloat16)  # t275: "cuda:0 bf16[1, 64, 4096, 128]"
    # t284 = prims.mul(t33, t268)  # t284: "cuda:0 f32[1, 64, 4096, 128]"
    # t293 = prims.add(t248, t284)  # t293: "cuda:0 f32[1, 64, 4096, 128]"
    # t300 = prims.slice_prim(t275, [0, 0, 0, 0], [1, 64, 4096, 64], [1, 1, 1, 1])  # t300: "cuda:0 bf16[1, 64, 4096, 64]"
    # t301 = prims.slice_prim(t275, [0, 0, 0, 64], [1, 64, 4096, 128], [1, 1, 1, 1])  # t301: "cuda:0 bf16[1, 64, 4096, 64]"
    # t302 = prims.convert_element_type(t300, dtypes.float32)  # t302: "cuda:0 f32[1, 64, 4096, 64]"
    # t303 = prims.neg(t302)  # t303: "cuda:0 f32[1, 64, 4096, 64]"
    # t304 = prims.convert_element_type(t303, dtypes.bfloat16)  # t304: "cuda:0 bf16[1, 64, 4096, 64]"
    # t305 = prims.pad(t304, 0.0, ((0, 0, 0), (0, 0, 0), (0, 0, 0), (64, 0, 0)))  # t305: "cuda:0 bf16[1, 64, 4096, 128]"
    # t307 = prims.convert_element_type(t305, dtypes.float32)  # t307: "cuda:0 f32[1, 64, 4096, 128]"
    # t308 = prims.add(t293, t307)  # t308: "cuda:0 f32[1, 64, 4096, 128]"
    # t310 = prims.pad(t301, 0.0, ((0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 64, 0)))  # t310: "cuda:0 bf16[1, 64, 4096, 128]"
    # t312 = prims.convert_element_type(t310, dtypes.float32)  # t312: "cuda:0 f32[1, 64, 4096, 128]"
    # t313 = prims.add(t308, t312)  # t313: "cuda:0 f32[1, 64, 4096, 128]"
    # t314 = prims.convert_element_type(t313, dtypes.bfloat16)  # t314: "cuda:0 bf16[1, 64, 4096, 128]"
    # t324 = prims.reshape(t267, (1, 8, 8, 4096, 128))  # t324: "cuda:0 bf16[1, 8, 8, 4096, 128]"
    # t329 = prims.reshape(t314, (1, 8, 8, 4096, 128))  # t329: "cuda:0 bf16[1, 8, 8, 4096, 128]"
    # t335 = prims.convert_element_type(t319, dtypes.float32)  # t335: "cuda:0 f32[1, 8, 8, 4096, 128]"
    # t336 = prims.sum(t335, (0, 2))  # t336: "cuda:0 f32[8, 4096, 128]"
    # t337 = prims.convert_element_type(t336, dtypes.bfloat16)  # t337: "cuda:0 bf16[8, 4096, 128]"
    # t338 = prims.broadcast_in_dim(t337, [1, 8, 1, 4096, 128], [1, 3, 4])  # t338: "cuda:0 bf16[1, 8, 1, 4096, 128]"
    # t344 = prims.convert_element_type(t324, dtypes.float32)  # t344: "cuda:0 f32[1, 8, 8, 4096, 128]"
    # t345 = prims.sum(t344, (0, 2))  # t345: "cuda:0 f32[8, 4096, 128]"
    # t346 = prims.convert_element_type(t345, dtypes.bfloat16)  # t346: "cuda:0 bf16[8, 4096, 128]"
    # t347 = prims.broadcast_in_dim(t346, [1, 8, 1, 4096, 128], [1, 3, 4])  # t347: "cuda:0 bf16[1, 8, 1, 4096, 128]"
    # t353 = prims.cat((t329, t347, t338), i15)  # t353: "cuda:0 bf16[1, 8, 10, 4096, 128]"
  del i15, t222, t226, t31, t319, t36
  t359 = torch.permute(t353, (0, 3, 1, 2, 4))  # t359: "cuda:0 bf16[1, 4096, 8, 10, 128]"
    # t359 = ltorch.permute(t353, (0, 3, 1, 2, 4))  # t359: "cuda:0 bf16[1, 4096, 8, 10, 128]"
      # t359 = prims.transpose(t353, (0, 3, 1, 2, 4))  # t359: "cuda:0 bf16[1, 4096, 8, 10, 128]"
  del t353
  t365 = torch.reshape(t359, (1, 4096, 10240))  # t365: "cuda:0 bf16[1, 4096, 10240]"
    # t365 = ltorch.reshape(t359, (1, 4096, 10240))  # t365: "cuda:0 bf16[1, 4096, 10240]"
      # t365 = prims.reshape(t359, (1, 4096, 10240))  # t365: "cuda:0 bf16[1, 4096, 10240]"
  del t359
  return (t365, None, None)
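
The repeated slice / neg / pad / add chains inside nvFusion0 (for example t253 through t266) read like the backward of a rotate-half style op, which would fit the `qkv_split_rope` benchmark name; that reading is an assumption, not something the trace states. The sketch below mirrors the chain on a 1-D Python list and compares it with the equivalent concatenation. It leaves out the bf16/f32 conversions and the extra addend (t250 / t293) that the real trace carries, and `slice_pad_add` / `concat_form` are hypothetical names.

```python
def slice_pad_add(g):
    """Mirror slice -> neg -> pad -> add from the trace on a 1-D list."""
    half = len(g) // 2
    lo, hi = g[:half], g[half:]          # the two slice_prim calls
    # neg of the low half, then zero-pad on the low side: pad (half, 0, 0)
    neg_lo_padded = [0.0] * half + [-x for x in lo]
    # high half, zero-padded on the high side: pad (0, half, 0)
    hi_padded = hi + [0.0] * half
    # the two prims.add calls, without the extra addend from the trace
    return [a + b for a, b in zip(neg_lo_padded, hi_padded)]


def concat_form(g):
    """Same result written as a single concatenation."""
    half = len(g) // 2
    return g[half:] + [-x for x in g[:half]]


g = [1.0, 2.0, 3.0, 4.0]
assert slice_pad_add(g) == concat_form(g)
print(slice_pad_add(g))
```

Writing the op as two pads plus adds rather than one cat is what gives the fusion its pad/slice-heavy shape.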
