Last active
March 13, 2017 11:57
Parallel analysis with MDAnalysis and dask
Hi @mrocklin, we are still very interested in dask + MDAnalysis. A student, @mkhoshle, has recently been working with it and did some benchmarking; see:
- mdnalysis-developers post: Draft of parallel MDAnalysis benchmark report
- Khoshlessan, Mahzad; Beckstein, Oliver (2017): Parallel analysis in the MDAnalysis Library: Benchmark of Trajectory File Formats. figshare. https://doi.org/10.6084/m9.figshare.4695742
She found pretty good scaling on a single node using dask.multiprocessing.get but performance problems with distributed.Client.get, and I think she was about to get in touch at some point. The report is pretty long, and a lot of it relates to the difficulty of ingesting data (the different file formats in use in our field were benchmarked), but towards the end there's data on distributed.
@mkhoshle is also on https://gitter.im/dask/dask (I think) so you can certainly also have a conversation there.
Oliver
Relevant conversation on blaze-dev mailing list (now defunct)
For performance, future readers might want to try a different scheduler that works with multiple processes, perhaps the distributed scheduler:
http://dask.pydata.org/en/latest/scheduler-choice.html
http://distributed.readthedocs.io/en/latest/
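To make the scheduler suggestion concrete, here is a minimal sketch of the split-apply-combine pattern with a swappable scheduler. It is not the code from the benchmark: `frame_blocks`, `analyze_block`, `combine` and `run_with_dask` are hypothetical names, and the toy per-frame observable stands in for reading frames from a real MDAnalysis trajectory. It also assumes a dask version that accepts the `scheduler=` keyword in `dask.compute`, which is newer than the `dask.multiprocessing.get` and `distributed.Client.get` calls mentioned in this thread.

```python
def frame_blocks(n_frames, n_blocks):
    """Split frame indices 0..n_frames-1 into contiguous (start, stop) blocks."""
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    base, extra = divmod(n_frames, n_blocks)
    blocks = []
    start = 0
    for i in range(n_blocks):
        # The first `extra` blocks take one additional frame each.
        stop = start + base + (1 if i < extra else 0)
        blocks.append((start, stop))
        start = stop
    return blocks


def analyze_block(start, stop):
    """Toy stand-in for a per-block analysis (assumption, not MDAnalysis code).

    A real version would open the trajectory inside the worker and loop
    over frames start..stop-1. Returns (sum of observable, frame count).
    """
    total = 0.0
    for frame in range(start, stop):
        total += 0.5 * frame  # hypothetical per-frame observable
    return total, stop - start


def combine(partials):
    """Reduce per-block (sum, count) pairs to the mean over all frames."""
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count


def run_with_dask(n_frames, n_blocks, scheduler="processes"):
    """Run the block analysis on the chosen dask scheduler.

    dask is imported here so the helpers above work without it installed.
    `scheduler` can be "synchronous", "threads" or "processes".
    """
    import dask

    tasks = [dask.delayed(analyze_block)(start, stop)
             for start, stop in frame_blocks(n_frames, n_blocks)]
    partials = dask.compute(*tasks, scheduler=scheduler)
    return combine(partials)
```

Calling `run_with_dask(10, 3, scheduler="synchronous")` should give the same mean as the serial `combine([analyze_block(a, b) for a, b in frame_blocks(10, 3)])`, which makes it a convenient check before switching to `"processes"`. With the process-based scheduler, the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn new interpreters. To try the distributed scheduler instead, creating a `distributed.Client` before computing makes dask use it by default, so the `scheduler` argument can then be dropped from `dask.compute`.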