As per a discussion on Reddit, the workaround for Intel MKL's notorious SIMD throttling on AMD Zen CPUs is as simple as setting the `MKL_DEBUG_CPU_TYPE=5` environment variable.
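The flag can also be set from inside Python rather than the shell. A minimal sketch, assuming you set it before any MKL-linked library loads (the commented import is a placeholder for whichever package you use):

```python
import os

# MKL reads MKL_DEBUG_CPU_TYPE when the library is first loaded, so the
# variable must be set before importing any MKL-linked package.
os.environ["MKL_DEBUG_CPU_TYPE"] = "5"

# import numpy as np  # MKL-linked imports must come after the flag is set
```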
All three scripts are executed in the same Python 3.7 environment on a first-gen AMD Zen CPU (Threadripper 1950X). The difference should be even larger on newer models, since first-gen Zen executes each 256-bit AVX2 instruction as two 128-bit operations.
| Library | OpenBLAS | MKL Default | MKL With Flag |
|---|---|---|---|
| NumPy | 0.58s | 1.00s | 0.56s |
| PyTorch | N/A | 0.48s | 0.26s |
| TensorFlow | 0.22s | 0.47s | 0.20s |
| Library | OpenBLAS | MKL Default | MKL With Flag |
|---|---|---|---|
| NumPy | 11.82s | 7.54s | 6.67s |
| PyTorch | N/A | 2.25s | 2.06s |
| TensorFlow | 8.61s | 6.51s | 6.73s |
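The numbers above come from timing individual linear-algebra operations. A minimal sketch of that pattern in NumPy, with a hypothetical `timed` helper and illustrative sizes (the actual benchmark uses much larger matrices and repeated runs):

```python
import time
import numpy as np

def timed(fn, *args):
    """Return the wall-clock seconds a single call takes."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

# Small sizes for illustration only.
rng = np.random.default_rng(0)
a = rng.standard_normal((256, 256))
b = rng.standard_normal((256, 256))

matmul_s = timed(np.dot, a, b)    # BLAS-heavy: dominated by MKL/OpenBLAS
eig_s = timed(np.linalg.eig, a)   # LAPACK path; backends may diverge here
print(f"matmul: {matmul_s:.4f}s, eig: {eig_s:.4f}s")
```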
Note: TensorFlow might handle eigendecomposition slightly differently than NumPy and PyTorch, hence the discrepancy.
Full results: NumPy, PyTorch, and TensorFlow.
The MKL environment is set up through Anaconda with the following commands:
$ conda create -n py37mkl python=3.7 && conda activate py37mkl
$ conda install numpy "blas=*=mkl"
$ conda install -c pytorch pytorch
$ conda install -c anaconda tensorflow-mkl
The OpenBLAS environment is set up through Anaconda with the following commands:
$ conda create -n py37nomkl python=3.7 && conda activate py37nomkl
$ conda install nomkl
$ conda install numpy "blas=*=openblas"
$ pip install tensorflow
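To double-check which backend an environment actually links, NumPy can print its build configuration (this only verifies NumPy; PyTorch and TensorFlow expose their own build info):

```python
import numpy as np

# Print NumPy's build configuration; the output names the BLAS/LAPACK
# backend (MKL or OpenBLAS) the installed package is linked against.
np.show_config()
```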
The NumPy benchmark was created by Markus Beuckelmann, adapted for PyTorch by /u/une-transaction, and adapted for TensorFlow by me.
Thanks for sharing, @Miffyli!