Skip to content

Instantly share code, notes, and snippets.

@orbeckst
Last active August 27, 2018 16:59
Show Gist options
  • Save orbeckst/26081375f3ea3152f08bbcc90c14c5eb to your computer and use it in GitHub Desktop.
Save orbeckst/26081375f3ea3152f08bbcc90c14c5eb to your computer and use it in GitHub Desktop.
benchmarking dihedral featurization in MDAnalysis
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Benchmarking dihedral featurization in MDAnalysis\n",
"\n",
"We compare a simple use of the `AtomGroup.dihedral.value()` over a list of dihedral angles against the recently implemented `MDAnalysis.analysis.dihedrals.Dihedral` analyis class, which uses optimized functions for calculating multiple dihedral angles."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Packages "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.18.1-dev\n"
]
}
],
"source": [
"import MDAnalysis as mda\n",
"print(mda.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test data\n",
"As test data we use our standard AdK trajectory from \n",
"\n",
"* Seyler, Sean; Beckstein, Oliver (2017): Molecular dynamics trajectory for benchmarking MDAnalysis. figshare. Fileset. https://doi.org/10.6084/m9.figshare.5108170.v1\n",
"\n",
"which can be downloaded with\n",
"```bash\n",
"wget https://ndownloader.figshare.com/files/8672074 -O 1ake_007-nowater-core-dt240ps.dcd\n",
"wget https://ndownloader.figshare.com/files/8672230 -O adk4AKE.psf\n",
"```\n",
"\n",
"Drop the trajectories into `../files`.\n",
"\n",
"In the code we convert the DCD trajectory to a XTC trajectory because all our previous benchmarks were done with XTC. If the trajectory already exists, no conversion is performed."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Code\n",
"\n",
"The two test scripts are attached to the gist. The key differences are shown here."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### dihedral_featurizer_simple.py\n",
"This code creates a list of `dihedral` instances and evalutes their `value()` at each time step.\n",
"\n",
"```python\n",
"\n",
"def angle2sincos(x):\n",
" \"\"\"Convert angle x to (cos x, sin x).\n",
"\n",
" Parameters\n",
" ----------\n",
" x : float or array_like\n",
"\n",
" Returns\n",
" -------\n",
" feature_vector : array\n",
" 1D feature vector ``[cos(x[0]), sin(x[0]), cos(x[1]), sin(x[1]), ...]``.\n",
" \"\"\"\n",
" x = np.deg2rad(x)\n",
" return np.ravel(np.transpose([np.cos(x), np.sin(x)]))\n",
"\n",
"\n",
"def residues_to_dihedrals(residues):\n",
" \"\"\"Return list of [phi1, psi1, phi2, psi2, ...] dihedral objects\"\"\"\n",
" # flatten the list, see https://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python\n",
" return list(chain.from_iterable(\n",
" (res.phi_selection().dihedral, res.psi_selection().dihedral)\n",
" for res in residues))\n",
"\n",
"def featurize_dihedrals(dihedrals):\n",
" angles = [dihedral.value() for dihedral in dihedrals]\n",
" return angle2sincos(angles)\n",
"\n",
"\n",
"# in the main code\n",
"if __name__ == \"__main__\":\n",
" \n",
" u = mda.Universe(TOP, XTC)\n",
" mobile = u.select_atoms(\"protein\")\n",
" residues = mobile.residues[1:-1]\n",
"\n",
" dihedrals = residues_to_dihedrals(residues)\n",
"\n",
" results = []\n",
" for ts in u.trajectory[::nstep]:\n",
" results.append(featurize_dihedrals(dihedrals))\n",
"\n",
" results = np.array(results)\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### dihedral_featurizer_analysis.py\n",
"\n",
"This code uses the new class [MDAnalysis.analysis.dihedrals.Dihedral](https://www.mdanalysis.org/mdanalysis/documentation_pages/analysis/dihedrals.html#MDAnalysis.analysis.dihedrals.Dihedral) which calculates dihedrals for large groups efficiently with the [MDAnalysis.lib.distances.calc_dihedrals](https://www.mdanalysis.org/mdanalysis/documentation_pages/lib/distances.html#MDAnalysis.lib.distances.calc_dihedrals) function.\n",
"\n",
"The key code components are\n",
"```python\n",
"def residues_to_dihedral_groups(residues):\n",
" \"\"\"Return list of [phi1, psi1, phi2, psi2, ...] atom groups\"\"\"\n",
" # flatten the list, see https://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python\n",
" return list(chain.from_iterable(\n",
" (res.phi_selection(), res.psi_selection())\n",
" for res in residues))\n",
"\n",
"# in the main code\n",
"if __name__ == \"__main__\":\n",
"\n",
" u = mda.Universe(TOP, XTC)\n",
" mobile = u.select_atoms(\"protein\")\n",
" residues = mobile.residues[1:-1]\n",
" \n",
" dihedral_groups = residues_to_dihedral_groups(residues)\n",
" \n",
" DA = MDAnalysis.analysis.dihedrals.Dihedral(dihedral_groups, step=nstep)\n",
" DA.run()\n",
" x = np.deg2rad(DA.angles)\n",
" results = np.concatenate((np.cos(x), np.sin(x)), axis=1)\n",
"```\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results\n",
"We analyze the $\\phi$ and $\\psi$ backbone dihedrals for residues 2 to 213 of AdK, i.e., $2 \\times 212 = 424$ dihedral angles.\n",
"\n",
"The results are for a step size `nstep=5`, i.e., only every fifth frame is analyzed. \n",
"\n",
"The output result is ordered differently between the two approaches. For the `dihedral_featurizer_analysis.py` it contains a \"dihedral feature vector\" per frame with `[cos(x0), cos(x1), ..., sin(x0), sin(x1) ...]` whereas `dihedral_featurizer_simple.py` produces `[cos(x0), sin(x0), cos(x1), sin(x1) ...]` per frame (this has historical reasons). They contain exactly the same data.\n",
"\n",
"The benchmarks were run on a Macbook Pro with a SSD and 3.1 GHz Intel Core i7, Python 3.6."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Results for `dihedral_featurizer_analysis.py`"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.18.1-dev\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Volumes/Data/oliver/Biop/Projects/SPIDAL/Research/parallel/notebooks/dihedral_featurizer_analysis.py:47: UserWarning: output trajectory /Volumes/Data/oliver/Biop/Projects/SPIDAL/Research/parallel/files/1ake_007-nowater-core-dt240ps.xtc exists, no conversion\n",
" \"\"\"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------[ TIMING ]--------------------\n",
"load Universe 0.1141 s\n",
"select protein 0.0057 s\n",
"generate dihedral groups 0.4075 s\n",
"setup MDAnalysis.analysis.dihedrals.Dihedral 0.0420 s\n",
"get trajectory length 0.0000 s\n",
"featurize_dihedrals() (838 frames) 0.3550 s\n",
"result --> (cos(x), sin(x)) 0.0050 s\n",
"--------------------------------------------------------\n",
"total time 0.9293 s\n",
"--------------------------------------------------------\n",
"\n",
"\n",
"runtime per frame 0.000423605 s\n"
]
}
],
"source": [
"%run dihedral_featurizer_analysis.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Results for `dihedral_featurizer_simple.py`"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.18.1-dev\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Volumes/Data/oliver/Biop/Projects/SPIDAL/Research/parallel/notebooks/dihedral_featurizer_simple.py:63: UserWarning: output trajectory /Volumes/Data/oliver/Biop/Projects/SPIDAL/Research/parallel/files/1ake_007-nowater-core-dt240ps.xtc exists, no conversion\n",
" warnings.warn(\"output trajectory {} exists, no conversion\".format(trj_out))\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------[ TIMING ]--------------------\n",
"load Universe 0.1203 s\n",
"select protein 0.0059 s\n",
"generate dihedrals 0.4407 s\n",
"featurize_dihedrals() (1 step) 0.0396 s\n",
"featurize_dihedrals() (838 frames) 31.2707 s\n",
"list --> array 0.0048 s\n",
"--------------------------------------------------------\n",
"total time 31.8820 s\n",
"--------------------------------------------------------\n",
"\n",
"\n",
"runtime per frame 0.0373159 s\n"
]
}
],
"source": [
"%run dihedral_featurizer_simple.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Comparison \n",
"The version with the analysis function is *much* faster: Comparing the run time per frame:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"88.09126426741894"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"0.0373159 / 0.000423605"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The speed-up is about 80 fold!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"Even for moderate numbers of dihedral calculations, using the optimized analysis class [MDAnalysis.analysis.dihedrals.Dihedral](https://www.mdanalysis.org/mdanalysis/documentation_pages/analysis/dihedrals.html#MDAnalysis.analysis.dihedrals.Dihedral) (available in the upcoming 0.19.0 release of MDAnalysis) is *much faster* (> 80 fold in the test here). Thus, **`MDAnalysis.analysis.dihedrals.Dihedral` should be used when possible.**\n",
"\n",
"(Note that for the specific case of backbone angles a specialized class [MDAnalysis.analysis.dihedrals.Ramachandran](https://www.mdanalysis.org/mdanalysis/documentation_pages/analysis/dihedrals.html#MDAnalysis.analysis.dihedrals.Ramachandran) exists that automatically generates the selections. We could have used this class for the benchmark but instead opted for the more general class to keep the comparison between the two scripts as close as possible.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
#!/usr/bin/env python
import os
import warnings
import sys
import time
from collections import OrderedDict
from itertools import chain
import numpy as np
import MDAnalysis as mda
import MDAnalysis.analysis.dihedrals
class StopWatch(OrderedDict):
fmt = "{0:40s} {1:10.4f} s"
def tic(self, label):
if label in self:
raise ValueError("label {} already exists".format(label))
self[label] = time.time()
def show(self):
print("--------------------------[ TIMING ]--------------------")
labels = list(self.keys())
start = labels[0]
for stop in labels[1:]:
dt = self[stop] - self[start]
print(self.fmt.format(stop, dt))
start = stop
print("".join(56*["-"]))
print(self.fmt.format("total time",
self[labels[-1]] - self[labels[0]]))
print("".join(56*["-"]))
def residues_to_dihedral_groups(residues):
"""Return list of [phi1, psi1, phi2, psi2, ...] atom groups"""
# flatten the list, see https://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
return list(chain.from_iterable(
(res.phi_selection(), res.psi_selection())
for res in residues))
def convert_trajectory(top, trj_in, trj_out, force=False):
if os.path.exists(trj_out) and not force:
warnings.warn("output trajectory {} exists, no conversion".format(trj_out))
return trj_out
u = mda.Universe(top, trj_in)
with mda.Writer(trj_out, n_atoms=u.atoms.n_atoms) as W:
for ts in u.trajectory:
W.write(u.atoms)
return trj_out
if __name__ == "__main__":
print(mda.__version__)
nstep = 5
FILESDIR = os.path.abspath(os.path.join(
os.path.dirname(__file__), os.path.pardir, 'files'))
TOP = os.path.join(FILESDIR, 'adk4AKE.psf')
DCD = os.path.join(FILESDIR, '1ake_007-nowater-core-dt240ps.dcd')
XTC = os.path.join(FILESDIR, '1ake_007-nowater-core-dt240ps.xtc')
convert_trajectory(TOP, DCD, XTC)
timer = StopWatch()
timer.tic('init')
u = mda.Universe(TOP, XTC)
timer.tic('load Universe')
mobile = u.select_atoms("protein")
residues = mobile.residues[1:-1]
timer.tic('select protein')
dihedral_groups = residues_to_dihedral_groups(residues)
timer.tic('generate dihedral groups')
DA = MDAnalysis.analysis.dihedrals.Dihedral(dihedral_groups, step=nstep)
timer.tic('setup MDAnalysis.analysis.dihedrals.Dihedral')
# 1 frame
#timer.tic('featurize_dihedrals() (1 step)')
nframes = len(u.trajectory[::nstep])
timer.tic("get trajectory length")
DA.run()
timer.tic('featurize_dihedrals() ({} frames)'.format(nframes))
# reformat as a feature vector of [cos(x) cos(x) ... sin(x) sin(x)...]
# per frame
x = np.deg2rad(DA.angles)
results = np.concatenate((np.cos(x), np.sin(x)), axis=1)
timer.tic('result --> (cos(x), sin(x))')
timer.show()
# per-frame (pulling out data from timer instance is ugly)
label = 'featurize_dihedrals() ({} frames)'.format(nframes)
labels = list(timer.keys())
ix = labels.index(label)
runtime = timer[label] - timer[labels[ix-1]]
runtime_per_frame = runtime / nframes
print("\n")
print("{0:40s} {1:g} s".format("runtime per frame",
runtime_per_frame))
#!/usr/bin/env python
import os
import warnings
import sys
import time
from collections import OrderedDict
from itertools import chain
import numpy as np
import MDAnalysis as mda
class StopWatch(OrderedDict):
fmt = "{0:40s} {1:10.4f} s"
def tic(self, label):
if label in self:
raise ValueError("label {} already exists".format(label))
self[label] = time.time()
def show(self):
print("--------------------------[ TIMING ]--------------------")
labels = list(self.keys())
start = labels[0]
for stop in labels[1:]:
dt = self[stop] - self[start]
print(self.fmt.format(stop, dt))
start = stop
print("".join(56*["-"]))
print(self.fmt.format("total time",
self[labels[-1]] - self[labels[0]]))
print("".join(56*["-"]))
def angle2sincos(x):
"""Convert angle x to (cos x, sin x).
Parameters
----------
x : float or array_like
Returns
-------
feature_vector : array
1D feature vector ``[cos(x[0]), sin(x[0]), cos(x[1]), sin(x[1]), ...]``.
"""
x = np.deg2rad(x)
return np.ravel(np.transpose([np.cos(x), np.sin(x)]))
def residues_to_dihedrals(residues):
"""Return list of [phi1, psi1, phi2, psi2, ...] dihedral objects"""
# flatten the list, see https://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
return list(chain.from_iterable(
(res.phi_selection().dihedral, res.psi_selection().dihedral)
for res in residues))
def featurize_dihedrals(dihedrals):
angles = [dihedral.value() for dihedral in dihedrals]
return angle2sincos(angles)
def convert_trajectory(top, trj_in, trj_out, force=False):
if os.path.exists(trj_out) and not force:
warnings.warn("output trajectory {} exists, no conversion".format(trj_out))
return trj_out
u = mda.Universe(top, trj_in)
with mda.Writer(trj_out, n_atoms=u.atoms.n_atoms) as W:
for ts in u.trajectory:
W.write(u.atoms)
return trj_out
if __name__ == "__main__":
print(mda.__version__)
nstep = 5
FILESDIR = os.path.abspath(os.path.join(
os.path.dirname(__file__), os.path.pardir, 'files'))
TOP = os.path.join(FILESDIR, 'adk4AKE.psf')
DCD = os.path.join(FILESDIR, '1ake_007-nowater-core-dt240ps.dcd')
XTC = os.path.join(FILESDIR, '1ake_007-nowater-core-dt240ps.xtc')
convert_trajectory(TOP, DCD, XTC)
timer = StopWatch()
timer.tic('init')
u = mda.Universe(TOP, XTC)
timer.tic('load Universe')
mobile = u.select_atoms("protein")
residues = mobile.residues[1:-1]
timer.tic('select protein')
dihedrals = residues_to_dihedrals(residues)
timer.tic('generate dihedrals')
# 1 frame
dih_series = featurize_dihedrals(dihedrals)
timer.tic('featurize_dihedrals() (1 step)')
results = []
traj_iter = u.trajectory[::nstep]
nframes = len(traj_iter)
for ts in traj_iter:
results.append(featurize_dihedrals(dihedrals))
timer.tic('featurize_dihedrals() ({} frames)'.format(nframes))
# feature vector of [cos(x) sin(x) ... cos(x) sin(x)]
# per frame
results = np.array(results)
timer.tic('list --> array')
timer.show()
# per-frame (pulling out data from timer instance is ugly)
label = 'featurize_dihedrals() ({} frames)'.format(nframes)
labels = list(timer.keys())
ix = labels.index(label)
runtime = timer[label] - timer[labels[ix-1]]
runtime_per_frame = runtime / nframes
print("\n")
print("{0:40s} {1:g} s".format("runtime per frame",
runtime_per_frame))
@orbeckst
Copy link
Author

Note that the MDAnalysis.analysis.dihedrals.Dihedral functionality was added in PR MDAnalysis/mdanalysis#2033 and will be available in the upcoming release 0.19.0 of MDAnalysis.

At the moment (August 2018) you need to install the development branch of MDAnalysis to use it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment