@orbeckst
Last active March 13, 2017 11:57
Parallel analysis with MDAnalysis and dask
@mrocklin

For better performance, future readers might want to try a different scheduler that works with multiple processes, perhaps the distributed scheduler.

http://dask.pydata.org/en/latest/scheduler-choice.html
http://distributed.readthedocs.io/en/latest/
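
A minimal sketch of what switching schedulers can look like, assuming a recent dask release in which `compute()` accepts a `scheduler=` keyword. The function `analyze_block` and the frame counts are hypothetical stand-ins for a per-block trajectory analysis; they are not taken from the gist.

```python
from dask import delayed


def analyze_block(start, stop):
    """Hypothetical per-block analysis; stands in for work on frames [start, stop)."""
    return sum(i * i for i in range(start, stop))


def run(scheduler, n_frames=1000, n_blocks=4):
    """Split the frame range into blocks and combine the results with the given scheduler."""
    step = n_frames // n_blocks
    # One delayed task per block of frames (assumes n_frames is divisible by n_blocks).
    tasks = [delayed(analyze_block)(i * step, (i + 1) * step) for i in range(n_blocks)]
    total = delayed(sum)(tasks)
    # The scheduler is chosen at compute time; the task graph itself does not change.
    return total.compute(scheduler=scheduler)


if __name__ == "__main__":
    # "synchronous" runs in the calling thread, "threads" uses a thread pool,
    # and "processes" uses a pool of worker processes.
    for name in ("synchronous", "threads", "processes"):
        print(name, run(name))
```

The same graph can also be run on the distributed scheduler: with the `distributed` package installed, creating a `distributed.Client()` should make it the default scheduler for subsequent `compute()` calls.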

@orbeckst
Author

orbeckst commented Mar 12, 2017

Hi @mrocklin, we are still very interested in dask + MDAnalysis. A student, @mkhoshle, has recently been working with it and did some benchmarking, see

She found pretty good scaling on a single node using dask.multiprocessing.get but ran into performance problems with distributed.Client.get, and I think she was about to get in touch at some point. The report is pretty long, and much of it relates to the difficulty of ingesting data (the different file formats in use in our field were benchmarked), but towards the end there's data on distributed.

@mkhoshle is also on https://gitter.im/dask/dask (I think) so you can certainly also have a conversation there.

Oliver

@mrocklin

Relevant conversation on blaze-dev mailing list (now defunct)
