It seems to be a generic effect of numpy's parallel code (e.g. np.dot). I ran a test without MDAnalysis, and the code that runs after np.dot (regardless of whether it is numpy-related or not) tends to slow down when numpy is using multiple threads.
https://gist.github.com/yuxuanzhuang/82e1e7b57d0cda80ac964d1cd138f618
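For anyone who wants to try this locally, here is a minimal sketch of that kind of test. It is not the code from the gist linked above: the function name time_after_dot, the matrix size, and the pure-Python workload are made up for illustration, and it assumes numpy and threadpoolctl are installed. It times only the unrelated work that runs right after np.dot, under a given thread limit.

```python
import time

import numpy as np
from threadpoolctl import threadpool_limits


def time_after_dot(n_threads, size=500, repeats=5):
    """Mean seconds spent in a pure-Python loop that runs right after np.dot.

    Hypothetical helper for illustration; not taken from the linked gist.
    """
    a = np.random.rand(size, size)
    elapsed = 0.0
    # Cap the BLAS/OpenMP thread pools for the duration of the block.
    with threadpool_limits(limits=n_threads):
        for _ in range(repeats):
            np.dot(a, a)  # threaded BLAS call, deliberately not timed
            start = time.perf_counter()
            sum(i * i for i in range(100_000))  # unrelated, non-numpy work
            elapsed += time.perf_counter() - start
    return elapsed / repeats


if __name__ == "__main__":
    # 1 thread, 6 threads (physical cores here), 12 threads (with hyperthreading)
    for n in (1, 6, 12):
        print(n, time_after_dot(n))
```

If the effect described above is present, the value for 12 threads should come out higher than for 1 or 6 on a 6-core machine, but the numbers will depend on the BLAS build and the hardware.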
This is just weird.
I suppose the conclusion is still that oversubscribing hurts performance.
I can confirm it should be an OpenBLAS issue. For numpy built with MKL, the default number of threads is 6 (the number of physical cores), so oversubscription does not happen.
[{'filepath': '/home/scottzhuang/anaconda3/lib/libmkl_rt.so',
'prefix': 'libmkl_rt',
'user_api': 'blas',
'internal_api': 'mkl',
'version': '2020.0.0',
'num_threads': 6,
'threading_layer': 'intel'},
{'filepath': '/home/scottzhuang/anaconda3/lib/libiomp5.so',
'prefix': 'libiomp',
'user_api': 'openmp',
'internal_api': 'openmp',
'version': None,
'num_threads': 12}]
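A small sketch of how a list like the one above can be reduced to the numbers that matter here. The helper name blas_thread_counts is hypothetical, and it assumes the list has the shape shown (one dict per thread pool with user_api, internal_api and num_threads keys), which looks like what threadpoolctl.threadpool_info() returns.

```python
def blas_thread_counts(pool_info):
    """Map each BLAS backend to its thread count (hypothetical helper)."""
    return {
        pool["internal_api"]: pool["num_threads"]
        for pool in pool_info
        if pool["user_api"] == "blas"
    }


# Trimmed copy of the two entries shown above.
pool_info = [
    {"prefix": "libmkl_rt", "user_api": "blas",
     "internal_api": "mkl", "num_threads": 6},
    {"prefix": "libiomp", "user_api": "openmp",
     "internal_api": "openmp", "num_threads": 12},
]

print(blas_thread_counts(pool_info))  # {'mkl': 6}
```

To check a live interpreter, the list could be taken from threadpoolctl.threadpool_info() after importing numpy, assuming threadpoolctl is installed.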
The problem now is that OpenBLAS uses all hyperthreads by default, which the numerical code doesn't seem to benefit from (?) and which also tends to cause the trouble mentioned above. I guess we should limit threads to the number of physical cores for all the MDAnalysis code, at least until OpenBLAS changes its default.
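One way to do that without touching library code is to set the usual thread-count environment variables before numpy is imported. A minimal sketch follows; the helper name limit_blas_threads is made up, and the fallback of "logical CPUs divided by two" is only an assumption about hyperthreading, so pass the physical core count explicitly when you know it.

```python
import os

# Environment variables read by OpenMP, OpenBLAS and MKL when they are loaded.
BLAS_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limit_blas_threads(n_physical=None):
    """Set the thread-count variables; call this BEFORE importing numpy.

    Hypothetical helper for illustration, not part of MDAnalysis.
    """
    if n_physical is None:
        # ASSUMPTION: two hardware threads per physical core (hyperthreading).
        n_physical = max(1, (os.cpu_count() or 2) // 2)
    for var in BLAS_ENV_VARS:
        os.environ[var] = str(n_physical)
    return n_physical


limit_blas_threads(6)  # 6 physical cores, as on the machine discussed above
# import numpy only after this point so the limit is picked up
```

The variables are only read when the BLAS library is loaded, so setting them after numpy has already been imported has no effect in that process.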
Can you share your conclusions in the MDA issue MDAnalysis/mdanalysis#2975 for discussion so that other interested parties can also comment?
Without transformations, n_threads and oversubscription do not affect ts = self._read_next_timestep(), which takes ~5260 µs. With transformations, frame = self._xdr.read() takes ~5000 µs or 6000 µs at n=1 and n=6 (comparable to the above), but rises to 8279.6 µs when oversubscribed. That's odd.