@nicolasvasilache
Created December 2, 2021 10:38
Matmul perf
export MLIR_RUNNER_UTILS_LIB=${IREE_LLVM_SANDBOX_BUILD_DIR}/lib/libmlir_runner_utils.so; cd ${IREE_LLVM_SANDBOX_SOURCE_DIR}; python -m python.examples.matmul.bench
###############################################################
Compile-time problem size {'m': 192, 'n': 128, 'k': 256}
Runtime problem size {'m': 192, 'n': 128, 'k': 256}
Problem types [<class 'numpy.float32'>, <class 'numpy.float32'>, <class 'numpy.float32'>]
Compilation expert <python.examples.core.transform.TransformationList object at 0x7f8724e10100>
compilation in 0.144s
xxxxxxxxxx : 10 iters time on 1 threads
------------------------------------------------------------------------------------------------------------------------
slowest p1 p10 p25 p50 p75 p90 p99 fastest unit
------------------------------------------------------------------------------------------------------------------------
1.9e-04 1.9e-04 1.6e-04 1.4e-04 1.3e-04 1.2e-04 1.2e-04 1.2e-04 1.2e-04 seconds
65.15 65.15 79.23 88.43 95.46 104.71 105.11 105.11 105.11 GFlops/s
2.71 2.71 3.30 3.68 3.98 4.36 4.38 4.38 4.38 GBs/s
Compilation expert <python.examples.core.transform.TransformationList object at 0x7f8724e10160>
compilation in 0.213s
xxxxxxxxxx : 10 iters time on 1 threads
------------------------------------------------------------------------------------------------------------------------
slowest p1 p10 p25 p50 p75 p90 p99 fastest unit
------------------------------------------------------------------------------------------------------------------------
2.1e-04 2.1e-04 1.3e-04 1.1e-04 1.0e-04 1.0e-04 1.0e-04 1.0e-04 1.0e-04 seconds
59.29 59.29 100.07 114.63 122.47 123.11 123.12 123.12 123.12 GFlops/s
2.47 2.47 4.17 4.78 5.10 5.13 5.13 5.13 5.13 GBs/s
###############################################################
Compile-time problem size {'m': 260, 'n': 280, 'k': 300}
Runtime problem size {'m': 260, 'n': 280, 'k': 300}
Problem types [<class 'numpy.float32'>, <class 'numpy.float32'>, <class 'numpy.float32'>]
Compilation expert <python.examples.core.transform.TransformationList object at 0x7f8724e10100>
compilation in 0.203s
xxxxxxxxxx : 10 iters time on 1 threads
------------------------------------------------------------------------------------------------------------------------
slowest p1 p10 p25 p50 p75 p90 p99 fastest unit
------------------------------------------------------------------------------------------------------------------------
4.8e-04 4.8e-04 4.8e-04 4.7e-04 4.2e-04 3.9e-04 3.5e-04 3.5e-04 3.5e-04 seconds
91.67 91.67 91.72 92.76 104.65 112.30 126.27 126.27 126.27 GFlops/s
2.58 2.58 2.58 2.61 2.95 3.16 3.56 3.56 3.56 GBs/s
Compilation expert <python.examples.core.transform.TransformationList object at 0x7f8724e10160>
compilation in 0.2786s
xxxxxxxxxx : 10 iters time on 1 threads
------------------------------------------------------------------------------------------------------------------------
slowest p1 p10 p25 p50 p75 p90 p99 fastest unit
------------------------------------------------------------------------------------------------------------------------
5.4e-04 5.4e-04 5.3e-04 5.2e-04 5.1e-04 4.7e-04 4.6e-04 4.6e-04 4.6e-04 seconds
80.67 80.67 82.39 83.64 86.05 93.04 95.97 95.97 95.97 GFlops/s
2.27 2.27 2.32 2.36 2.42 2.62 2.70 2.70 2.70 GBs/s
###############################################################
Compile-time problem size {'m': 1000, 'n': 1000, 'k': 1000}
Runtime problem size {'m': 1000, 'n': 1000, 'k': 1000}
Problem types [<class 'numpy.float32'>, <class 'numpy.float32'>, <class 'numpy.float32'>]
Compilation expert <python.examples.core.transform.TransformationList object at 0x7f8724e10100>
compilation in 0.2089s
xxxxxxxxxx : 10 iters time on 1 threads
------------------------------------------------------------------------------------------------------------------------
slowest p1 p10 p25 p50 p75 p90 p99 fastest unit
------------------------------------------------------------------------------------------------------------------------
2.4e-02 2.4e-02 2.4e-02 2.4e-02 2.3e-02 2.2e-02 2.2e-02 2.2e-02 2.2e-02 seconds
82.34 82.34 82.60 84.03 88.17 89.31 91.18 91.18 91.18 GFlops/s
0.66 0.66 0.66 0.67 0.71 0.71 0.73 0.73 0.73 GBs/s
Compilation expert <python.examples.core.transform.TransformationList object at 0x7f8724e10160>
compilation in 0.3373s
xxxxxxxxxx : 10 iters time on 1 threads
------------------------------------------------------------------------------------------------------------------------
slowest p1 p10 p25 p50 p75 p90 p99 fastest unit
------------------------------------------------------------------------------------------------------------------------
1.9e-02 1.9e-02 1.9e-02 1.9e-02 1.9e-02 1.8e-02 1.8e-02 1.8e-02 1.8e-02 seconds
103.13 103.13 103.59 104.38 107.42 108.87 109.76 109.76 109.76 GFlops/s
0.83 0.83 0.83 0.84 0.86 0.87 0.88 0.88 0.88 GBs/s
###############################################################
Compile-time problem size {'m': 1024, 'n': 1024, 'k': 1024}
Runtime problem size {'m': 1024, 'n': 1024, 'k': 1024}
Problem types [<class 'numpy.float32'>, <class 'numpy.float32'>, <class 'numpy.float32'>]
Compilation expert <python.examples.core.transform.TransformationList object at 0x7f8724e10100>
compilation in 0.1843s
xxxxxxxxxx : 10 iters time on 1 threads
------------------------------------------------------------------------------------------------------------------------
slowest p1 p10 p25 p50 p75 p90 p99 fastest unit
------------------------------------------------------------------------------------------------------------------------
2.5e-02 2.5e-02 2.4e-02 2.4e-02 2.3e-02 2.3e-02 2.1e-02 2.1e-02 2.1e-02 seconds
86.10 86.10 89.21 90.36 92.79 94.16 99.98 99.98 99.98 GFlops/s
0.67 0.67 0.70 0.71 0.72 0.74 0.78 0.78 0.78 GBs/s
Compilation expert <python.examples.core.transform.TransformationList object at 0x7f8724e10160>
compilation in 0.324s
xxxxxxxxxx : 10 iters time on 1 threads
------------------------------------------------------------------------------------------------------------------------
slowest p1 p10 p25 p50 p75 p90 p99 fastest unit
------------------------------------------------------------------------------------------------------------------------
2.2e-02 2.2e-02 2.1e-02 2.0e-02 1.9e-02 1.9e-02 1.8e-02 1.8e-02 1.8e-02 seconds
97.04 97.04 102.23 109.05 113.86 114.55 117.45 117.45 117.45 GFlops/s
0.76 0.76 0.80 0.85 0.89 0.89 0.92 0.92 0.92 GBs/s
###############################################################
Compile-time problem size {'m': 2040, 'n': 2040, 'k': 2040}
Runtime problem size {'m': 2040, 'n': 2040, 'k': 2040}
Problem types [<class 'numpy.float32'>, <class 'numpy.float32'>, <class 'numpy.float32'>]
Compilation expert <python.examples.core.transform.TransformationList object at 0x7f8724e10100>
compilation in 0.1959s
xxxxxxxxxx : 10 iters time on 1 threads
------------------------------------------------------------------------------------------------------------------------
slowest p1 p10 p25 p50 p75 p90 p99 fastest unit
------------------------------------------------------------------------------------------------------------------------
2.2e-01 2.2e-01 2.1e-01 2.1e-01 2.1e-01 2.1e-01 2.0e-01 2.0e-01 2.0e-01 seconds
78.97 78.97 79.40 79.43 81.41 82.31 84.93 84.93 84.93 GFlops/s
0.31 0.31 0.31 0.31 0.32 0.32 0.33 0.33 0.33 GBs/s
Compilation expert <python.examples.core.transform.TransformationList object at 0x7f8724e10160>
compilation in 0.3441s
xxxxxxxxxx : 10 iters time on 1 threads
------------------------------------------------------------------------------------------------------------------------
slowest p1 p10 p25 p50 p75 p90 p99 fastest unit
------------------------------------------------------------------------------------------------------------------------
1.6e-01 1.6e-01 1.6e-01 1.6e-01 1.5e-01 1.5e-01 1.5e-01 1.5e-01 1.5e-01 seconds
106.83 106.83 107.90 108.43 110.90 112.23 113.04 113.04 113.04 GFlops/s
0.42 0.42 0.42 0.43 0.43 0.44 0.44 0.44 0.44 GBs/s
###############################################################
Compile-time problem size {'m': 2040, 'n': 2041, 'k': 2042}
Runtime problem size {'m': 2040, 'n': 2041, 'k': 2042}
Problem types [<class 'numpy.float32'>, <class 'numpy.float32'>, <class 'numpy.float32'>]
Compilation expert <python.examples.core.transform.TransformationList object at 0x7f8724e10100>
compilation in 0.213s
xxxxxxxxxx : 10 iters time on 1 threads
------------------------------------------------------------------------------------------------------------------------
slowest p1 p10 p25 p50 p75 p90 p99 fastest unit
------------------------------------------------------------------------------------------------------------------------
2.5e-01 2.5e-01 2.3e-01 2.3e-01 2.3e-01 2.3e-01 2.3e-01 2.3e-01 2.3e-01 seconds
69.37 69.37 73.15 73.21 74.55 74.80 75.20 75.20 75.20 GFlops/s
0.27 0.27 0.29 0.29 0.29 0.29 0.29 0.29 0.29 GBs/s
Compilation expert <python.examples.core.transform.TransformationList object at 0x7f8724e10160>
compilation in 0.3449s
xxxxxxxxxx : 10 iters time on 1 threads
------------------------------------------------------------------------------------------------------------------------
slowest p1 p10 p25 p50 p75 p90 p99 fastest unit
------------------------------------------------------------------------------------------------------------------------
1.6e-01 1.6e-01 1.6e-01 1.6e-01 1.6e-01 1.5e-01 1.5e-01 1.5e-01 1.5e-01 seconds
104.21 104.21 105.87 106.12 109.26 110.66 111.86 111.86 111.86 GFlops/s
0.41 0.41 0.41 0.42 0.43 0.43 0.44 0.44 0.44 GBs/s
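
The GFlops/s and GBs/s rows above appear to be derived from the measured times and the problem size. The sketch below reproduces that arithmetic under two assumptions that are not confirmed by the benchmark source: a matmul is counted as 2*m*n*k floating-point operations, and the traffic is counted as A and B read once plus C read and written once. The second assumption is inferred from the ratio of the reported numbers (for example 105.11 / 4.38 is about 24 flops per byte for the 192x128x256 case). The function names are hypothetical and not part of the sandbox.

```python
# Hypothetical helpers; the names and the byte-count model are assumptions,
# inferred from the numbers in the log rather than taken from the benchmark code.

def matmul_flops(m, n, k):
    """One multiply and one add per (i, j, l) triple."""
    return 2 * m * n * k


def matmul_bytes(m, n, k, itemsize=4):
    """Assumed traffic: read A (m x k), read B (k x n), read and write C (m x n).

    itemsize=4 corresponds to numpy.float32, the type listed in the log.
    """
    return itemsize * (m * k + k * n + 2 * m * n)


def throughput(m, n, k, seconds, itemsize=4):
    """Return (GFlops/s, GB/s) for one matmul that took `seconds`."""
    gflops = matmul_flops(m, n, k) / seconds / 1e9
    gbytes = matmul_bytes(m, n, k, itemsize) / seconds / 1e9
    return gflops, gbytes


if __name__ == "__main__":
    # Fastest time of the first run in the log, as printed (rounded to 1.2e-04 s).
    gflops, gbytes = throughput(192, 128, 256, 1.2e-4)
    print(f"{gflops:.2f} GFlops/s, {gbytes:.2f} GB/s")
```

With the rounded time of 1.2e-04 s this gives roughly 104.86 GFlops/s and 4.37 GB/s, close to the 105.11 and 4.38 in the log. The small gap is consistent with the log printing times to two significant digits while computing rates from the unrounded values.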