As per a discussion on Reddit, it seems a workaround for Intel MKL's notorious SIMD throttling on AMD Zen CPUs is as simple as setting the `MKL_DEBUG_CPU_TYPE=5` environment variable. Intel removed this debug mode starting with MKL 2020.1, although MKL 2020.1 and later appear to have improved default performance on AMD to some extent.
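For scripts launched from Python, one way to apply the flag is to set it in the process environment before any MKL-backed library is imported. The following is a minimal sketch, assuming MKL reads the variable when it is first loaded; the helper name `enable_mkl_debug_flag` is hypothetical and not part of MKL or NumPy.

```python
import os


def enable_mkl_debug_flag(env=os.environ):
    """Set MKL_DEBUG_CPU_TYPE=5 in the given environment mapping.

    Hypothetical helper for illustration. It only has an effect on
    MKL versions that still honor the debug mode (2020.0 or below).
    """
    env["MKL_DEBUG_CPU_TYPE"] = "5"
    return env["MKL_DEBUG_CPU_TYPE"]


# Assumption: this must run before NumPy/PyTorch (and thus MKL) are imported.
enable_mkl_debug_flag()
# import numpy as np  # import MKL-backed libraries only after this point
```

Exporting the variable in the shell before starting Python achieves the same thing and avoids any import-order concerns.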
On AMD Zen CPUs, it is recommended to use MKL to speed up NumPy, SciPy, scikit-learn, and NumExpr even without the debug mode. You can get a further speed boost with `MKL_DEBUG_CPU_TYPE=5`, but you need to downgrade MKL to version 2020.0 or below.
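The version rule above can be written down as a small check. This is only a sketch: `debug_flag_honored` is a hypothetical helper, and it assumes the MKL version is available as a dotted string such as `"2020.0"` or `"2020.2"`.

```python
def debug_flag_honored(mkl_version: str) -> bool:
    """Return True if this MKL version is 2020.0 or below.

    Hypothetical helper: encodes the rule that the debug mode was
    removed starting with MKL 2020.1. Only the first two numeric
    components of the version string are compared.
    """
    major, minor = (int(part) for part in mkl_version.split(".")[:2])
    return (major, minor) <= (2020, 0)


for version in ("2019.4", "2020.0", "2020.1", "2020.2"):
    status = "honored" if debug_flag_honored(version) else "ignored"
    print(f"MKL {version}: MKL_DEBUG_CPU_TYPE is {status}")
```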
For TensorFlow on an AMD CPU, it is better to install the stock version using `pip install tensorflow` rather than tensorflow-mkl.
All three scripts were executed in the same Python 3.8 environment on an AMD Ryzen™ 7 5800X CPU.
Library | OpenBLAS | MKL2020.2 | MKL2020.0 | MKL with Flag |
---|---|---|---|---|
NumPy | 0.55s | 0.54s | 0.54s | 0.49s |
PyTorch | N/A | 0.68s | 0.62s | 0.60s |
TensorFlow | 0.18s | 0.17s | 0.17s | 0.17s |
Library | OpenBLAS | MKL2020.2 | MKL2020.0 | MKL with Flag |
---|---|---|---|---|
NumPy | 0.06ms | 0.03ms | 0.03ms | 0.03ms |
PyTorch | N/A | 0.02ms | 0.02ms | 0.02ms |
TensorFlow | 0.41ms | 0.91ms | 0.91ms | 0.90ms |
Library | OpenBLAS | MKL2020.2 | MKL2020.0 | MKL with Flag |
---|---|---|---|---|
NumPy | 0.71s | 0.32s | 0.32s | 0.25s |
PyTorch | N/A | 0.31s | 0.30s | 0.30s |
TensorFlow | 0.49s | 0.83s | 0.83s | 0.83s |
Library | OpenBLAS | MKL2020.2 | MKL2020.0 | MKL with Flag |
---|---|---|---|---|
NumPy | 0.08s | 0.07s | 0.07s | 0.07s |
PyTorch | N/A | 0.04s | 0.04s | 0.04s |
TensorFlow | 0.12s | 0.19s | 0.20s | 0.20s |
Library | OpenBLAS | MKL2020.2 | MKL2020.0 | MKL with Flag |
---|---|---|---|---|
NumPy | 3.29s | 3.09s | 3.07s | 2.58s |
PyTorch | N/A | 1.16s | 1.12s | 1.14s |
TensorFlow | 3.69s | 4.70s | 4.66s | 4.66s |
Note: TensorFlow might be handling eigendecomposition slightly differently than NumPy and PyTorch, hence the discrepancy.
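To reproduce timings like the ones in the tables, a small harness along these lines may help. It is a sketch, not the actual benchmark script: `time_call`, the matrix size, and the repeat count are all assumptions chosen for illustration, and the NumPy demo only runs if NumPy is installed.

```python
import time
from statistics import median


def time_call(fn, repeats=5):
    """Call fn() `repeats` times and return the median wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


if __name__ == "__main__":
    try:
        import numpy as np
    except ImportError:
        np = None  # the harness itself needs only the standard library

    if np is not None:
        # Assumed workload: eigendecomposition of a random square matrix.
        matrix = np.random.rand(256, 256)
        elapsed = time_call(lambda: np.linalg.eig(matrix))
        print(f"eigendecomposition (256x256): {elapsed:.4f}s")
```

Running the same script in the MKL and OpenBLAS environments, with and without the flag, gives a like-for-like comparison on your own hardware.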
The MKL environment is set up through Anaconda with the following commands:
$ conda create -n py38mkl python=3.8 && conda activate py38mkl
$ conda install numpy "blas=*=mkl"
$ conda install -c pytorch pytorch
$ conda install -c anaconda tensorflow-mkl
The OpenBLAS environment is set up through Anaconda with the following commands:
$ conda create -n py38nomkl python=3.8 && conda activate py38nomkl
$ conda install nomkl
$ conda install numpy "blas=*=openblas"
$ pip install tensorflow
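After creating either environment, it can be worth confirming which BLAS backend NumPy is actually linked against before benchmarking. The sketch below uses `numpy.show_config()`, which prints the build configuration; the wrapper `describe_blas` is a hypothetical helper, and the exact output format varies between NumPy versions.

```python
import contextlib
import io


def describe_blas():
    """Return NumPy's build configuration as text.

    Hypothetical helper: captures what numpy.show_config() prints so
    the output can be searched for "mkl" or "openblas".
    """
    try:
        import numpy as np
    except ImportError:
        return "numpy is not installed in this environment"
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


print(describe_blas())
```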
The NumPy benchmark was created by Markus Beuckelmann, adapted for PyTorch by /u/une-transaction, adapted for TensorFlow by Roman Ring, and modified by me.