@dotsdl
Last active April 9, 2016 01:01
MDSynthesis+distributed+nglview: a lab demo
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# MDSynthesis + distributed + nglview\n",
"\n",
"> A match made in the seventh circle.\n",
" \n",
" ~ Dante"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Today I'm going to show off some of the most useful tools I've been playing/working with in recent months. We'll use MDSynthesis for gathering up existing datasets, but that library isn't necessary to use either of the other tools: **``distributed``** and **``nglview``**."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import warnings\n",
"warnings.filterwarnings(\"ignore\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import mdsynthesis as mds"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we gather up a Bundle of wild DIMS simulations (NapA, inward-to-outward transitions). Note that this will only work on our infrastructure, but it should work even if the Sims are read-only for you:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"b = mds.discover('/nfs/homes4/dldotson/Projects/Transporters/NapA/SYSTEMS/DIMS/dims/S1/in2out/fitted/')\n",
"b = b.flatten()\n",
"b = mds.Bundle(sorted(b))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<Bundle([<Sim: 'NapA_S1_i2o_10'>, <Sim: 'NapA_S1_i2o_11'>, <Sim: 'NapA_S1_i2o_12'>, <Sim: 'NapA_S1_i2o_13'>, <Sim: 'NapA_S1_i2o_14'>, <Sim: 'NapA_S1_i2o_15'>, <Sim: 'NapA_S1_i2o_16'>, <Sim: 'NapA_S1_i2o_17'>, <Sim: 'NapA_S1_i2o_18'>, <Sim: 'NapA_S1_i2o_19'>, <Sim: 'NapA_S1_i2o_1'>, <Sim: 'NapA_S1_i2o_20'>, <Sim: 'NapA_S1_i2o_21'>, <Sim: 'NapA_S1_i2o_22'>, <Sim: 'NapA_S1_i2o_23'>, <Sim: 'NapA_S1_i2o_2'>, <Sim: 'NapA_S1_i2o_24'>, <Sim: 'NapA_S1_i2o_25'>, <Sim: 'NapA_S1_i2o_26'>, <Sim: 'NapA_S1_i2o_27'>, <Sim: 'NapA_S1_i2o_28'>, <Sim: 'NapA_S1_i2o_29'>, <Sim: 'NapA_S1_i2o_30'>, <Sim: 'NapA_S1_i2o_31'>, <Sim: 'NapA_S1_i2o_32'>, <Sim: 'NapA_S1_i2o_33'>, <Sim: 'NapA_S1_i2o_34'>, <Sim: 'NapA_S1_i2o_35'>, <Sim: 'NapA_S1_i2o_36'>, <Sim: 'NapA_S1_i2o_3'>, <Sim: 'NapA_S1_i2o_37'>, <Sim: 'NapA_S1_i2o_38'>, <Sim: 'NapA_S1_i2o_39'>, <Sim: 'NapA_S1_i2o_40'>, <Sim: 'NapA_S1_i2o_41'>, <Sim: 'NapA_S1_i2o_42'>, <Sim: 'NapA_S1_i2o_43'>, <Sim: 'NapA_S1_i2o_44'>, <Sim: 'NapA_S1_i2o_45'>, <Sim: 'NapA_S1_i2o_46'>, <Sim: 'NapA_S1_i2o_47'>, <Sim: 'NapA_S1_i2o_48'>, <Sim: 'NapA_S1_i2o_49'>, <Sim: 'NapA_S1_i2o_4'>, <Sim: 'NapA_S1_i2o_50'>, <Sim: 'NapA_S1_i2o_5'>, <Sim: 'NapA_S1_i2o_6'>, <Sim: 'NapA_S1_i2o_7'>, <Sim: 'NapA_S1_i2o_8'>, <Sim: 'NapA_S1_i2o_9'>])>"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"b"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"50"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualization of molecular trajectories with ``nglview``"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the ``nglview`` docs:\n",
"\n",
"> An IPython/Jupyter widget to interactively view molecular structures and trajectories. Utilizes the embeddable NGL Viewer for rendering. Support for showing data from the file-system, RCSB PDB, simpletraj and from objects of analysis libraries mdtraj, pytraj, mdanalysis."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since we have a Bundle of DIMS simulations at the ready, we can have a look at some of them. We can make an ``nglview`` widget directly from an MDAnalysis Universe:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import nglview"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"s = b[0]"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<Sim: 'NapA_S1_i2o_10'>"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<Universe with 11748 atoms and 11842 bonds>"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s.universe"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"wg = nglview.show_mdanalysis(s.universe)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"wg"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By feeding in the full universe, we've fed in all the atoms in the system. We probably only care about a subset, though, and we can get better performance if we feed in an AtomGroup of the subset of atoms we actually want to look at:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"wg = nglview.show_mdanalysis(s.universe.select_atoms('protein and name CA'))"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"wg"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can add custom representations, too, using Chimera-style selections. "
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"wg.add_representation('spacefill', '.CA')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using ``distributed`` to export work to other machines"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the ``distributed`` docs:\n",
"\n",
"> Distributed is a lightweight library for distributed computing in Python. It extends both the ``concurrent.futures`` and ``dask`` APIs to moderate sized clusters. Distributed provides data-local computation by keeping data on worker nodes, running computations where data lives, and by managing complex data dependencies between tasks."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first step is to set up a scheduler. We can set this up on any machine in the lab, but I'll use my own workstation. On ``chipbacca``, I can do:\n",
"\n",
"```\n",
"dscheduler\n",
"```\n",
"\n",
"and we see that the scheduler server is listening on port 8786.\n",
"We can connect to it with an ``Executor``, which serves as our interface to shipping work to the scheduler's workers:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from distributed import Executor, progress"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"ex = Executor('chipbacca:8786')"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<Executor: scheduler=chipbacca:8786 workers=0 threads=0>"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ex"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But for us to get any work done, our scheduler needs some workers. We can log into any machine on our network and run:\n",
"\n",
"```\n",
"dworker --nprocs 8 --nthreads 1 chipbacca:8786\n",
"```\n",
"\n",
"to create 8 workers (processes) with 1 thread each. But since we have a queueing system for distributing work across all our machines, we should be nice and submit our workers as a job:\n",
"\n",
"```\n",
"qsub -b y -pe singlenode 8 -V dworker --nprocs 8 --nthreads 1 chipbacca:8786\n",
"```\n",
"\n",
"and we should see that our scheduler notices its new minions."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<Executor: scheduler=chipbacca:8786 workers=8 threads=8>"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ex"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's werk. We have a Bundle of DIMS simulations, each of which sees a transition of NapA from an inward- to an outward-facing state. Let's start with something easy: the z-position of the center of mass over time."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"4.7159480518117007"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s.universe.atoms.center_of_mass()[2]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We need to express this as a function so we can map the work out to our workers."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_z_position(sim, atomsel):\n",
"    \"\"\"Get z-component of the center-of-mass for each frame\n",
"    from given sim.\n",
"\n",
"    Parameters\n",
"    ----------\n",
"    sim : mdsynthesis.Sim\n",
"        Simulation to get z-component of center-of-mass for.\n",
"    atomsel : str\n",
"        Atom selection string to apply giving the atoms to\n",
"        get center-of-mass for.\n",
"\n",
"    Returns\n",
"    -------\n",
"    pandas.Series\n",
"        Series giving the z-component of the center-of-mass\n",
"        for each time.\n",
"\n",
"    \"\"\"\n",
"    import numpy as np\n",
"    import pandas as pd\n",
"    atoms = sim.universe.select_atoms(atomsel)\n",
"\n",
"    nframes = sim.universe.trajectory.n_frames\n",
"\n",
"    times = np.zeros(nframes)\n",
"    z_com = np.zeros(nframes)\n",
"\n",
"    for i, ts in enumerate(sim.universe.trajectory):\n",
"        times[i] = sim.universe.trajectory.time\n",
"        z_com[i] = atoms.center_of_mass()[2]\n",
"\n",
"    return pd.Series(z_com, index=pd.Float64Index(times), name='z-com')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's test it out on one simulation:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<Sim: 'NapA_S1_i2o_10' | active universe: 'main'>"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.000000 4.715948\n",
"1.000000 4.724737\n",
"2.000000 4.715122\n",
"3.000000 4.669568\n",
"4.000000 4.615996\n",
"5.000000 4.567842\n",
"6.000000 4.522301\n",
"7.000000 4.453215\n",
"8.000000 4.427373\n",
"9.000000 4.411255\n",
"10.000000 4.364684\n",
"11.000000 4.337990\n",
"12.000000 4.301215\n",
"13.000000 4.241995\n",
"14.000000 4.226140\n",
"15.000000 4.186588\n",
"16.000001 4.170983\n",
"17.000001 4.152306\n",
"18.000001 4.146538\n",
"19.000001 4.107781\n",
"20.000001 4.089231\n",
"21.000001 4.088165\n",
"22.000001 4.052312\n",
"23.000001 4.014691\n",
"24.000001 3.990937\n",
"25.000001 3.982774\n",
"26.000001 3.955422\n",
"27.000001 3.941301\n",
"28.000001 3.911046\n",
"29.000001 3.876631\n",
" ... \n",
"470.000015 1.268469\n",
"471.000015 1.256933\n",
"472.000016 1.242232\n",
"473.000016 1.278160\n",
"474.000016 1.232388\n",
"475.000016 1.217521\n",
"476.000016 1.230567\n",
"477.000016 1.208083\n",
"478.000016 1.206554\n",
"479.000016 1.179443\n",
"480.000016 1.152746\n",
"481.000016 1.135147\n",
"482.000016 1.198375\n",
"483.000016 1.154788\n",
"484.000016 1.169218\n",
"485.000016 1.179835\n",
"486.000016 1.173160\n",
"487.000016 1.164912\n",
"488.000016 1.169258\n",
"489.000016 1.192270\n",
"490.000016 1.186147\n",
"491.000016 1.203794\n",
"492.000016 1.205519\n",
"493.000016 1.206804\n",
"494.000016 1.212217\n",
"495.000016 1.168486\n",
"496.000016 1.142478\n",
"497.000016 1.158761\n",
"498.000016 1.121259\n",
"499.000016 1.143703\n",
"Name: z-com, dtype: float64"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_z_position(s, 'protein')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pretty fast! And what does this timeseries look like?"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7fe734030790>"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7fe7a01def10>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"get_z_position(s, 'protein').plot()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This simple measurement resolves the transition. Let's now get all of these serially, and see how long it takes."
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NapA_S1_i2o_9CPU times: user 49.3 s, sys: 6.64 s, total: 55.9 s\n",
"Wall time: 56.3 s\n",
"\n"
]
}
],
"source": [
"%%time\n",
"z_com = dict()\n",
"for sim in b:\n",
"    print \"\\r{}\".format(sim.name),\n",
"    z_com[sim.name] = get_z_position(sim, 'protein')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pretty slow. We could do basically the same thing with the ``Bundle.map`` method, which can run in parallel on this machine using the ``multiprocessing`` module internally:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 62.8 ms, sys: 64.1 ms, total: 127 ms\n",
"Wall time: 6.53 s\n"
]
}
],
"source": [
"%time z_com_map = b.map(get_z_position, atomsel='protein', processes=8)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Which is, predictably, about 8 times faster. But if this required more time to bake, with perhaps more than just 50 simulations, we wouldn't want to tie up our notebook session waiting for this to finish. We could instead use our ``distributed`` workers."
]
},
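{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick aside before we do: here's a small sketch that checks the 'about 8 times' figure by dividing the two wall times reported above (56.3 s for the serial loop, 6.53 s for ``Bundle.map`` with 8 processes). The variable names are mine, purely for illustration. The ratio comes out a little above 8; I'd assume that's ordinary run-to-run variation rather than anything special about ``multiprocessing``."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from __future__ import division, print_function\n",
"\n",
"serial_wall = 56.3     # seconds, wall time of the serial loop\n",
"parallel_wall = 6.53   # seconds, wall time of b.map(..., processes=8)\n",
"n_processes = 8\n",
"\n",
"speedup = serial_wall / parallel_wall\n",
"efficiency = speedup / n_processes   # 1.0 would be perfect scaling\n",
"\n",
"print('speedup: {:.1f}x, parallel efficiency: {:.2f}'.format(speedup, efficiency))"
]
},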
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [],
"source": [
"z_com_dist = ex.map(get_z_position, b, atomsel='protein')\n",
"progress(z_com_dist)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A good deal longer; there's some overhead to divvying out jobs to other machines, and the trajectories still have to all get pulled through the network. What if we submit another job of 8 workers to the queue?\n",
"\n",
"```\n",
"qsub -b y -pe singlenode 8 -V dworker --nprocs 8 --nthreads 1 chipbacca:8786\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<Executor: scheduler=chipbacca:8786 workers=16 threads=16>"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ex"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"z_com_dist = ex.map(get_z_position, b, atomsel='protein')\n",
"progress(z_com_dist)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Still pretty long. An important thing to remember when deciding how many workers to run at once: if they're iterating through trajectories with MDAnalysis, they will all be hammering the same NFS filesystem, and probably the same fileserver. The bottleneck can very quickly become I/O. So, less is often more. \n",
"\n",
"Tasks that take minutes as opposed to seconds are also better candidates for this kind of parallelism, since for tiny tasks the overhead of scheduling and passing the information (pickling the function, inputs) through the network can be much greater than the time required for the task itself.\n",
"\n",
"Regardless, a big plus is that we can continue working in our Python session while these tasks bake; we don't have to wait to keep working, and we could keep adding tasks through the ``Executor`` and let the scheduler do the work of making them happen."
]
}
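,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To put rough numbers on the point about task size, here's a minimal back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function ``estimated_wall_time`` is hypothetical, the 2-second per-task overhead is made up rather than measured, and the model (tasks run in waves across identical workers) ignores the NFS contention described above entirely. Under those assumptions, with 50 tasks on 16 workers, 1-second tasks see roughly a 4x speedup while 5-minute tasks see roughly 12x."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from __future__ import division, print_function\n",
"import math\n",
"\n",
"\n",
"def estimated_wall_time(n_tasks, task_seconds, overhead_seconds, n_workers):\n",
"    # Hypothetical cost model: tasks run in waves across identical workers,\n",
"    # and every task pays a fixed scheduling/pickling overhead.\n",
"    waves = math.ceil(n_tasks / n_workers)\n",
"    return waves * (task_seconds + overhead_seconds)\n",
"\n",
"\n",
"n_tasks = 50        # one task per Sim in the Bundle\n",
"n_workers = 16      # two batches of 8 workers\n",
"overhead = 2.0      # seconds per task; an assumed value, not a measurement\n",
"\n",
"for task_seconds in (1.0, 300.0):\n",
"    serial = n_tasks * task_seconds\n",
"    dist = estimated_wall_time(n_tasks, task_seconds, overhead, n_workers)\n",
"    print('task {:.0f} s: serial {:.0f} s, distributed {:.0f} s, speedup {:.1f}x'.format(\n",
"        task_seconds, serial, dist, serial / dist))"
]
}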
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}