Last active
August 27, 2018 16:59
-
-
Save orbeckst/26081375f3ea3152f08bbcc90c14c5eb to your computer and use it in GitHub Desktop.
benchmarking dihedral featurization in MDAnalysis
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Benchmarking dihedral featurization in MDAnalysis\n", | |
"\n", | |
"We compare a simple use of the `AtomGroup.dihedral.value()` over a list of dihedral angles against the recently implemented `MDAnalysis.analysis.dihedrals.Dihedral` analyis class, which uses optimized functions for calculating multiple dihedral angles." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Packages " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"0.18.1-dev\n" | |
] | |
} | |
], | |
"source": [ | |
"import MDAnalysis as mda\n", | |
"print(mda.__version__)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Test data\n", | |
"As test data we use our standard AdK trajectory from \n", | |
"\n", | |
"* Seyler, Sean; Beckstein, Oliver (2017): Molecular dynamics trajectory for benchmarking MDAnalysis. figshare. Fileset. https://doi.org/10.6084/m9.figshare.5108170.v1\n", | |
"\n", | |
"which can be downloaded with\n", | |
"```bash\n", | |
"wget https://ndownloader.figshare.com/files/8672074 -O 1ake_007-nowater-core-dt240ps.dcd\n", | |
"wget https://ndownloader.figshare.com/files/8672230 -O adk4AKE.psf\n", | |
"```\n", | |
"\n", | |
"Drop the trajectories into `../files`.\n", | |
"\n", | |
"In the code we convert the DCD trajectory to a XTC trajectory because all our previous benchmarks were done with XTC. If the trajectory already exists, no conversion is performed." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Code\n", | |
"\n", | |
"The two test scripts are attached to the gist. The key differences are shown here." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### dihedral_featurizer_simple.py\n", | |
"This code creates a list of `dihedral` instances and evalutes their `value()` at each time step.\n", | |
"\n", | |
"```python\n", | |
"\n", | |
"def angle2sincos(x):\n", | |
" \"\"\"Convert angle x to (cos x, sin x).\n", | |
"\n", | |
" Parameters\n", | |
" ----------\n", | |
" x : float or array_like\n", | |
"\n", | |
" Returns\n", | |
" -------\n", | |
" feature_vector : array\n", | |
" 1D feature vector ``[cos(x[0]), sin(x[0]), cos(x[1]), sin(x[1]), ...]``.\n", | |
" \"\"\"\n", | |
" x = np.deg2rad(x)\n", | |
" return np.ravel(np.transpose([np.cos(x), np.sin(x)]))\n", | |
"\n", | |
"\n", | |
"def residues_to_dihedrals(residues):\n", | |
" \"\"\"Return list of [phi1, psi1, phi2, psi2, ...] dihedral objects\"\"\"\n", | |
" # flatten the list, see https://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python\n", | |
" return list(chain.from_iterable(\n", | |
" (res.phi_selection().dihedral, res.psi_selection().dihedral)\n", | |
" for res in residues))\n", | |
"\n", | |
"def featurize_dihedrals(dihedrals):\n", | |
" angles = [dihedral.value() for dihedral in dihedrals]\n", | |
" return angle2sincos(angles)\n", | |
"\n", | |
"\n", | |
"# in the main code\n", | |
"if __name__ == \"__main__\":\n", | |
" \n", | |
" u = mda.Universe(TOP, XTC)\n", | |
" mobile = u.select_atoms(\"protein\")\n", | |
" residues = mobile.residues[1:-1]\n", | |
"\n", | |
" dihedrals = residues_to_dihedrals(residues)\n", | |
"\n", | |
" results = []\n", | |
" for ts in u.trajectory[::nstep]:\n", | |
" results.append(featurize_dihedrals(dihedrals))\n", | |
"\n", | |
" results = np.array(results)\n", | |
"```\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### dihedral_featurizer_analysis.py\n", | |
"\n", | |
"This code uses the new class [MDAnalysis.analysis.dihedrals.Dihedral](https://www.mdanalysis.org/mdanalysis/documentation_pages/analysis/dihedrals.html#MDAnalysis.analysis.dihedrals.Dihedral) which calculates dihedrals for large groups efficiently with the [MDAnalysis.lib.distances.calc_dihedrals](https://www.mdanalysis.org/mdanalysis/documentation_pages/lib/distances.html#MDAnalysis.lib.distances.calc_dihedrals) function.\n", | |
"\n", | |
"The key code components are\n", | |
"```python\n", | |
"def residues_to_dihedral_groups(residues):\n", | |
" \"\"\"Return list of [phi1, psi1, phi2, psi2, ...] atom groups\"\"\"\n", | |
" # flatten the list, see https://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python\n", | |
" return list(chain.from_iterable(\n", | |
" (res.phi_selection(), res.psi_selection())\n", | |
" for res in residues))\n", | |
"\n", | |
"# in the main code\n", | |
"if __name__ == \"__main__\":\n", | |
"\n", | |
" u = mda.Universe(TOP, XTC)\n", | |
" mobile = u.select_atoms(\"protein\")\n", | |
" residues = mobile.residues[1:-1]\n", | |
" \n", | |
" dihedral_groups = residues_to_dihedral_groups(residues)\n", | |
" \n", | |
" DA = MDAnalysis.analysis.dihedrals.Dihedral(dihedral_groups, step=nstep)\n", | |
" DA.run()\n", | |
" x = np.deg2rad(DA.angles)\n", | |
" results = np.concatenate((np.cos(x), np.sin(x)), axis=1)\n", | |
"```\n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Results\n", | |
"We analyze the $\\phi$ and $\\psi$ backbone dihedrals for residues 2 to 213 of AdK, i.e., $2 \\times 212 = 424$ dihedral angles.\n", | |
"\n", | |
"The results are for a step size `nstep=5`, i.e., only every fifth frame is analyzed. \n", | |
"\n", | |
"The output result is ordered differently between the two approaches. For the `dihedral_featurizer_analysis.py` it contains a \"dihedral feature vector\" per frame with `[cos(x0), cos(x1), ..., sin(x0), sin(x1) ...]` whereas `dihedral_featurizer_simple.py` produces `[cos(x0), sin(x0), cos(x1), sin(x1) ...]` per frame (this has historical reasons). They contain exactly the same data.\n", | |
"\n", | |
"The benchmarks were run on a Macbook Pro with a SSD and 3.1 GHz Intel Core i7, Python 3.6." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Results for `dihedral_featurizer_analysis.py`" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 32, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"0.18.1-dev\n" | |
] | |
}, | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"/Volumes/Data/oliver/Biop/Projects/SPIDAL/Research/parallel/notebooks/dihedral_featurizer_analysis.py:47: UserWarning: output trajectory /Volumes/Data/oliver/Biop/Projects/SPIDAL/Research/parallel/files/1ake_007-nowater-core-dt240ps.xtc exists, no conversion\n", | |
" \"\"\"\n" | |
] | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"--------------------------[ TIMING ]--------------------\n", | |
"load Universe 0.1141 s\n", | |
"select protein 0.0057 s\n", | |
"generate dihedral groups 0.4075 s\n", | |
"setup MDAnalysis.analysis.dihedrals.Dihedral 0.0420 s\n", | |
"get trajectory length 0.0000 s\n", | |
"featurize_dihedrals() (838 frames) 0.3550 s\n", | |
"result --> (cos(x), sin(x)) 0.0050 s\n", | |
"--------------------------------------------------------\n", | |
"total time 0.9293 s\n", | |
"--------------------------------------------------------\n", | |
"\n", | |
"\n", | |
"runtime per frame 0.000423605 s\n" | |
] | |
} | |
], | |
"source": [ | |
"%run dihedral_featurizer_analysis.py" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Results for `dihedral_featurizer_simple.py`" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"0.18.1-dev\n" | |
] | |
}, | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"/Volumes/Data/oliver/Biop/Projects/SPIDAL/Research/parallel/notebooks/dihedral_featurizer_simple.py:63: UserWarning: output trajectory /Volumes/Data/oliver/Biop/Projects/SPIDAL/Research/parallel/files/1ake_007-nowater-core-dt240ps.xtc exists, no conversion\n", | |
" warnings.warn(\"output trajectory {} exists, no conversion\".format(trj_out))\n" | |
] | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"--------------------------[ TIMING ]--------------------\n", | |
"load Universe 0.1203 s\n", | |
"select protein 0.0059 s\n", | |
"generate dihedrals 0.4407 s\n", | |
"featurize_dihedrals() (1 step) 0.0396 s\n", | |
"featurize_dihedrals() (838 frames) 31.2707 s\n", | |
"list --> array 0.0048 s\n", | |
"--------------------------------------------------------\n", | |
"total time 31.8820 s\n", | |
"--------------------------------------------------------\n", | |
"\n", | |
"\n", | |
"runtime per frame 0.0373159 s\n" | |
] | |
} | |
], | |
"source": [ | |
"%run dihedral_featurizer_simple.py" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Comparison \n", | |
"The version with the analysis function is *much* faster: Comparing the run time per frame:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 33, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"88.09126426741894" | |
] | |
}, | |
"execution_count": 33, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"0.0373159 / 0.000423605" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The speed-up is about 80 fold!" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Conclusion\n", | |
"\n", | |
"Even for moderate numbers of dihedral calculations, using the optimized analysis class [MDAnalysis.analysis.dihedrals.Dihedral](https://www.mdanalysis.org/mdanalysis/documentation_pages/analysis/dihedrals.html#MDAnalysis.analysis.dihedrals.Dihedral) (available in the upcoming 0.19.0 release of MDAnalysis) is *much faster* (> 80 fold in the test here). Thus, **`MDAnalysis.analysis.dihedrals.Dihedral` should be used when possible.**\n", | |
"\n", | |
"(Note that for the specific case of backbone angles a specialized class [MDAnalysis.analysis.dihedrals.Ramachandran](https://www.mdanalysis.org/mdanalysis/documentation_pages/analysis/dihedrals.html#MDAnalysis.analysis.dihedrals.Ramachandran) exists that automatically generates the selections. We could have used this class for the benchmark but instead opted for the more general class to keep the comparison between the two scripts as close as possible.)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.5" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/env python | |
import os | |
import warnings | |
import sys | |
import time | |
from collections import OrderedDict | |
from itertools import chain | |
import numpy as np | |
import MDAnalysis as mda | |
import MDAnalysis.analysis.dihedrals | |
class StopWatch(OrderedDict): | |
fmt = "{0:40s} {1:10.4f} s" | |
def tic(self, label): | |
if label in self: | |
raise ValueError("label {} already exists".format(label)) | |
self[label] = time.time() | |
def show(self): | |
print("--------------------------[ TIMING ]--------------------") | |
labels = list(self.keys()) | |
start = labels[0] | |
for stop in labels[1:]: | |
dt = self[stop] - self[start] | |
print(self.fmt.format(stop, dt)) | |
start = stop | |
print("".join(56*["-"])) | |
print(self.fmt.format("total time", | |
self[labels[-1]] - self[labels[0]])) | |
print("".join(56*["-"])) | |
def residues_to_dihedral_groups(residues): | |
"""Return list of [phi1, psi1, phi2, psi2, ...] atom groups""" | |
# flatten the list, see https://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python | |
return list(chain.from_iterable( | |
(res.phi_selection(), res.psi_selection()) | |
for res in residues)) | |
def convert_trajectory(top, trj_in, trj_out, force=False): | |
if os.path.exists(trj_out) and not force: | |
warnings.warn("output trajectory {} exists, no conversion".format(trj_out)) | |
return trj_out | |
u = mda.Universe(top, trj_in) | |
with mda.Writer(trj_out, n_atoms=u.atoms.n_atoms) as W: | |
for ts in u.trajectory: | |
W.write(u.atoms) | |
return trj_out | |
if __name__ == "__main__": | |
print(mda.__version__) | |
nstep = 5 | |
FILESDIR = os.path.abspath(os.path.join( | |
os.path.dirname(__file__), os.path.pardir, 'files')) | |
TOP = os.path.join(FILESDIR, 'adk4AKE.psf') | |
DCD = os.path.join(FILESDIR, '1ake_007-nowater-core-dt240ps.dcd') | |
XTC = os.path.join(FILESDIR, '1ake_007-nowater-core-dt240ps.xtc') | |
convert_trajectory(TOP, DCD, XTC) | |
timer = StopWatch() | |
timer.tic('init') | |
u = mda.Universe(TOP, XTC) | |
timer.tic('load Universe') | |
mobile = u.select_atoms("protein") | |
residues = mobile.residues[1:-1] | |
timer.tic('select protein') | |
dihedral_groups = residues_to_dihedral_groups(residues) | |
timer.tic('generate dihedral groups') | |
DA = MDAnalysis.analysis.dihedrals.Dihedral(dihedral_groups, step=nstep) | |
timer.tic('setup MDAnalysis.analysis.dihedrals.Dihedral') | |
# 1 frame | |
#timer.tic('featurize_dihedrals() (1 step)') | |
nframes = len(u.trajectory[::nstep]) | |
timer.tic("get trajectory length") | |
DA.run() | |
timer.tic('featurize_dihedrals() ({} frames)'.format(nframes)) | |
# reformat as a feature vector of [cos(x) cos(x) ... sin(x) sin(x)...] | |
# per frame | |
x = np.deg2rad(DA.angles) | |
results = np.concatenate((np.cos(x), np.sin(x)), axis=1) | |
timer.tic('result --> (cos(x), sin(x))') | |
timer.show() | |
# per-frame (pulling out data from timer instance is ugly) | |
label = 'featurize_dihedrals() ({} frames)'.format(nframes) | |
labels = list(timer.keys()) | |
ix = labels.index(label) | |
runtime = timer[label] - timer[labels[ix-1]] | |
runtime_per_frame = runtime / nframes | |
print("\n") | |
print("{0:40s} {1:g} s".format("runtime per frame", | |
runtime_per_frame)) | |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/env python | |
import os | |
import warnings | |
import sys | |
import time | |
from collections import OrderedDict | |
from itertools import chain | |
import numpy as np | |
import MDAnalysis as mda | |
class StopWatch(OrderedDict): | |
fmt = "{0:40s} {1:10.4f} s" | |
def tic(self, label): | |
if label in self: | |
raise ValueError("label {} already exists".format(label)) | |
self[label] = time.time() | |
def show(self): | |
print("--------------------------[ TIMING ]--------------------") | |
labels = list(self.keys()) | |
start = labels[0] | |
for stop in labels[1:]: | |
dt = self[stop] - self[start] | |
print(self.fmt.format(stop, dt)) | |
start = stop | |
print("".join(56*["-"])) | |
print(self.fmt.format("total time", | |
self[labels[-1]] - self[labels[0]])) | |
print("".join(56*["-"])) | |
def angle2sincos(x): | |
"""Convert angle x to (cos x, sin x). | |
Parameters | |
---------- | |
x : float or array_like | |
Returns | |
------- | |
feature_vector : array | |
1D feature vector ``[cos(x[0]), sin(x[0]), cos(x[1]), sin(x[1]), ...]``. | |
""" | |
x = np.deg2rad(x) | |
return np.ravel(np.transpose([np.cos(x), np.sin(x)])) | |
def residues_to_dihedrals(residues): | |
"""Return list of [phi1, psi1, phi2, psi2, ...] dihedral objects""" | |
# flatten the list, see https://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python | |
return list(chain.from_iterable( | |
(res.phi_selection().dihedral, res.psi_selection().dihedral) | |
for res in residues)) | |
def featurize_dihedrals(dihedrals): | |
angles = [dihedral.value() for dihedral in dihedrals] | |
return angle2sincos(angles) | |
def convert_trajectory(top, trj_in, trj_out, force=False): | |
if os.path.exists(trj_out) and not force: | |
warnings.warn("output trajectory {} exists, no conversion".format(trj_out)) | |
return trj_out | |
u = mda.Universe(top, trj_in) | |
with mda.Writer(trj_out, n_atoms=u.atoms.n_atoms) as W: | |
for ts in u.trajectory: | |
W.write(u.atoms) | |
return trj_out | |
if __name__ == "__main__": | |
print(mda.__version__) | |
nstep = 5 | |
FILESDIR = os.path.abspath(os.path.join( | |
os.path.dirname(__file__), os.path.pardir, 'files')) | |
TOP = os.path.join(FILESDIR, 'adk4AKE.psf') | |
DCD = os.path.join(FILESDIR, '1ake_007-nowater-core-dt240ps.dcd') | |
XTC = os.path.join(FILESDIR, '1ake_007-nowater-core-dt240ps.xtc') | |
convert_trajectory(TOP, DCD, XTC) | |
timer = StopWatch() | |
timer.tic('init') | |
u = mda.Universe(TOP, XTC) | |
timer.tic('load Universe') | |
mobile = u.select_atoms("protein") | |
residues = mobile.residues[1:-1] | |
timer.tic('select protein') | |
dihedrals = residues_to_dihedrals(residues) | |
timer.tic('generate dihedrals') | |
# 1 frame | |
dih_series = featurize_dihedrals(dihedrals) | |
timer.tic('featurize_dihedrals() (1 step)') | |
results = [] | |
traj_iter = u.trajectory[::nstep] | |
nframes = len(traj_iter) | |
for ts in traj_iter: | |
results.append(featurize_dihedrals(dihedrals)) | |
timer.tic('featurize_dihedrals() ({} frames)'.format(nframes)) | |
# feature vector of [cos(x) sin(x) ... cos(x) sin(x)] | |
# per frame | |
results = np.array(results) | |
timer.tic('list --> array') | |
timer.show() | |
# per-frame (pulling out data from timer instance is ugly) | |
label = 'featurize_dihedrals() ({} frames)'.format(nframes) | |
labels = list(timer.keys()) | |
ix = labels.index(label) | |
runtime = timer[label] - timer[labels[ix-1]] | |
runtime_per_frame = runtime / nframes | |
print("\n") | |
print("{0:40s} {1:g} s".format("runtime per frame", | |
runtime_per_frame)) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Note that the
MDAnalysis.analysis.dihedrals.Dihedral
functionality was added in PR MDAnalysis/mdanalysis#2033 and will be available in the upcoming release 0.19.0 of MDAnalysis.At the moment (August 2018) you need to install the development branch of MDAnalysis to use it.