Skip to content

Instantly share code, notes, and snippets.

@ChrisWellsWood
Created April 20, 2018 13:53
Show Gist options
  • Save ChrisWellsWood/41a8baf9785e1b39a39c70d651d98129 to your computer and use it in GitHub Desktop.
Save ChrisWellsWood/41a8baf9785e1b39a39c70d651d98129 to your computer and use it in GitHub Desktop.
Evaluation of Models Produced by ISAMBARD
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Evaluating Models"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import itertools\n",
"from pprint import pprint\n",
"\n",
"import nglview as nv\n",
"import numpy"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import isambard.specifications as specifications\n",
"import isambard.modelling as modelling"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def show_ball_and_stick(ampal):\n",
" view = nv.show_text(ampal.pdb)\n",
" view.add_ball_and_stick()\n",
" view.remove_cartoon()\n",
" return view"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we start, let's create some simple models to evaluate:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"dimer = specifications.CoiledCoil(2)\n",
"dimer = modelling.pack_side_chains_scwrl(\n",
" dimer, ['EIAALKQEIAALKKENAALKWEIAALKQ', 'EIAALKQEIAALKKENAALKWEIAALKQ'])\n",
"weird_dimer = specifications.CoiledCoil.from_parameters(2, 60, 10, 80, 180)\n",
"weird_trimer = specifications.CoiledCoil.from_parameters(3, 20, 6, 1000, 278)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Geometric Evaluation"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"import ampal"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `analyse_protein` module provides a range of tools to evaluating the backbone geometry of your model. One of the most useful metrics we can use is the residues per turn (RPT) of the α-helices in the model. At extreme parameter values, RPT can move away out of the range of values observed in α-helices of known protein structures (figure 2), which indicates that there is backbone strain in the model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<figure>\n",
" <img src=\"imgs/figure2.png\" alt=\"Values of residues per turn observed in known structures.\"\n",
" height=\"60%\" width=\"60%\">\n",
" <figcaption>\n",
" **Figure 2. Values of residues per turn observed in known structures**. Grey bars: Values of residues per \n",
" turn from helices in know structures. Mean = 3.65 (SD = 0.07). White bars: Values of RPT found in coiled-\n",
" coil crystal structures. Mean = 3.62 (SD = 0.07)\n",
" </figcaption>\n",
"</figure>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, this distribution is very tight, so if your number of residues per turn moves too far away from the mean, it's probably indicating that the model isn't very good. Let measure the RPT values on our models."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"dimer_rpt_values = [ampal.analyse_protein.residues_per_turn(p) for p in dimer]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[3.6505441188017955,\n",
" 3.6484983349076057,\n",
" 3.5746743031510766,\n",
" 3.5604492295087686,\n",
" 3.572250356600477,\n",
" 3.5910463517784237,\n",
" 3.5909617525888486,\n",
" 3.57224876932938,\n",
" 3.5807695978522007,\n",
" 3.595791630739698,\n",
" 3.5804148769732222,\n",
" 3.5721923679720984,\n",
" 3.5910457875081745,\n",
" 3.5912803606407313,\n",
" 3.571856594261234,\n",
" 3.5807785304653974,\n",
" 3.595555954507623,\n",
" 3.5801770411956553,\n",
" 3.5719633354189217,\n",
" 3.5916399389680036,\n",
" 3.590787557354051,\n",
" 3.571948736521602,\n",
" 3.581108163389263,\n",
" 3.569271088022024,\n",
" 3.5704004673760315,\n",
" 3.688781943424599,\n",
" 3.56563352995949,\n",
" None],\n",
" [3.6505441188017955,\n",
" 3.6484983349076057,\n",
" 3.5746743031510766,\n",
" 3.5604492295087686,\n",
" 3.572250356600477,\n",
" 3.5910463517784237,\n",
" 3.5909617525888486,\n",
" 3.57224876932938,\n",
" 3.5807695978522007,\n",
" 3.595791630739698,\n",
" 3.5804148769732222,\n",
" 3.5721923679720984,\n",
" 3.5910457875081745,\n",
" 3.5912803606407313,\n",
" 3.571856594261234,\n",
" 3.5807785304653974,\n",
" 3.595555954507623,\n",
" 3.5801770411956553,\n",
" 3.5719633354189217,\n",
" 3.5916399389680036,\n",
" 3.590787557354051,\n",
" 3.571948736521602,\n",
" 3.581108163389263,\n",
" 3.569271088022024,\n",
" 3.5704004673760315,\n",
" 3.688781943424599,\n",
" 3.56563352995949,\n",
" None]]\n"
]
}
],
"source": [
"pprint(dimer_rpt_values)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`residues_per_turn` take a `Polypeptide` object and returns a list of RPT values for each residue, so we needed to apply the function to each `Polypeptide` in the dimer `Assembly` (we used a list comprehension to do that, if you're not familiar with the comprehension syntax in Python, have a look at [the docs](https://docs.python.org/3.6/tutorial/datastructures.html#list-comprehensions)). `None` is returned for the last residues as RPT is undefined c-terminus."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's calculate the average value. First we want to merge the per chain values together and filter out `None`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"dimer_rpt_values = itertools.chain(*dimer_rpt_values) # This flattens a list of lists to a single list\n",
"dimer_rpt_values = [x for x in dimer_rpt_values if x is not None] # This removes the Nones"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[3.6505441188017955,\n",
" 3.6484983349076057,\n",
" 3.5746743031510766,\n",
" 3.5604492295087686,\n",
" 3.572250356600477,\n",
" 3.5910463517784237,\n",
" 3.5909617525888486,\n",
" 3.57224876932938,\n",
" 3.5807695978522007,\n",
" 3.595791630739698,\n",
" 3.5804148769732222,\n",
" 3.5721923679720984,\n",
" 3.5910457875081745,\n",
" 3.5912803606407313,\n",
" 3.571856594261234,\n",
" 3.5807785304653974,\n",
" 3.595555954507623,\n",
" 3.5801770411956553,\n",
" 3.5719633354189217,\n",
" 3.5916399389680036,\n",
" 3.590787557354051,\n",
" 3.571948736521602,\n",
" 3.581108163389263,\n",
" 3.569271088022024,\n",
" 3.5704004673760315,\n",
" 3.688781943424599,\n",
" 3.56563352995949,\n",
" 3.6505441188017955,\n",
" 3.6484983349076057,\n",
" 3.5746743031510766,\n",
" 3.5604492295087686,\n",
" 3.572250356600477,\n",
" 3.5910463517784237,\n",
" 3.5909617525888486,\n",
" 3.57224876932938,\n",
" 3.5807695978522007,\n",
" 3.595791630739698,\n",
" 3.5804148769732222,\n",
" 3.5721923679720984,\n",
" 3.5910457875081745,\n",
" 3.5912803606407313,\n",
" 3.571856594261234,\n",
" 3.5807785304653974,\n",
" 3.595555954507623,\n",
" 3.5801770411956553,\n",
" 3.5719633354189217,\n",
" 3.5916399389680036,\n",
" 3.590787557354051,\n",
" 3.571948736521602,\n",
" 3.581108163389263,\n",
" 3.569271088022024,\n",
" 3.5704004673760315,\n",
" 3.688781943424599,\n",
" 3.56563352995949]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dimer_rpt_values"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean = 3.5889655821932, STD = 0.028395400495633494\n"
]
}
],
"source": [
"# We can use the numpy package to calculate the means/std\n",
"print('Mean = {}, STD = {}'.format(numpy.mean(dimer_rpt_values), numpy.std(dimer_rpt_values)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So the mean is just under 3.6, which is quite low, but well within the distribution of observed RPT values."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> ### Problem\n",
"> Calculate the RPT values for the `weird_dimer` and `weird_trimer`. Do they fall inside the observed distributions?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> ### Note\n",
">There are two other useful metrics to evaluate backbone geometry rise per residue and residues per turn. You can access these directly on the polymer object (try this `dimer[0].rise_per_residue()` and `dimer[0].radii_of_curvature()`). We won't talk about these here but they can also be useful for evaluating structures."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## All-Atom Force Fields"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"import budeff"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another useful way to evaluate your structures is with an all-atom scoring function. The BUDE force field from the Sessions lab in Bristol (see [this paper](http://dx.doi.org/10.1093/comjnl/bxr091) for more details) is currently bundled with ISAMBARD. We like BUDE because it's fast and suits our purposes, but you can use any force field you like. We can calculate the BUDE energies using built-in properties of the AMPAL objects."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<BUFF Score -824.24: 9.22 St | -439.92 De | -393.55 Ch>\n"
]
}
],
"source": [
"dimer_score = budeff.get_internal_energy(dimer)\n",
"print(dimer_score)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dimer we created earlier has a total BUDE energy of -824.24, which is composed of a steric component (St), an energy of desolvation (De) and a charged component (Ch). This returns a BUDE Force Feild (BUFF) score object, which contains lots of information about the score including the individual components, and all the interactions that the make up the score."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> ### Note\n",
"> BUDE follows the thermodynamic convention of lower = more favourable."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-824.243057366055"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dimer_score.total_energy"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"9.224133372376127"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dimer_score.steric"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-439.92041419821857"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dimer_score.desolvation"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-393.54677654021606"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dimer_score.charge"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[((<Carbon Atom (CA). Coordinates: (-4.531, -2.214, 0.311)>,\n",
" <Carbon Atom (CB). Coordinates: (-2.472, 2.243, 0.403)>),\n",
" [0.0, -0.10226744144925334, 0.0]),\n",
" ((<Carbon Atom (CA). Coordinates: (-4.531, -2.214, 0.311)>,\n",
" <Carbon Atom (CD1). Coordinates: (-0.914, 2.655, -1.572)>),\n",
" [0.0, -0.06900849794588693, 0.0]),\n",
" ((<Carbon Atom (CA). Coordinates: (-4.531, -2.214, 0.311)>,\n",
" <Carbon Atom (CG1). Coordinates: (-1.282, 1.745, -0.420)>),\n",
" [0.0, -0.09619723357205417, 0.0]),\n",
" ((<Carbon Atom (CA). Coordinates: (-4.531, -2.214, 0.311)>,\n",
" <Carbon Atom (CG2). Coordinates: (-2.135, 3.506, 1.181)>),\n",
" [0.0, -0.07105751651638885, 0.0]),\n",
" ((<Carbon Atom (C). Coordinates: (-3.944, -1.131, 1.217)>,\n",
" <Carbon Atom (CB). Coordinates: (-2.472, 2.243, 0.403)>),\n",
" [0.0, -0.17137887804990476, 0.0])]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dimer_score.inter_scores[:5]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `inter_scores` list contains all the non-zero pairwise atom interaction scores from the structure. Each list item contains a tuple with the pair of atoms and a list of the different BUDE components _i.e._ `[steric, desolvation, charge]`. It can be useful to examine these scores to find clashes or important residues."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"clashes = [x for x in dimer_score.inter_scores if x[1][0] > 0] # filtering for clashes"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[((<Carbon Atom (CD1). Coordinates: (-0.276, -1.314, 3.587)>,\n",
" <Carbon Atom (CD1). Coordinates: (0.276, 1.314, 3.587)>),\n",
" [1.6336594546051064, -0.15007, 0.0]),\n",
" ((<Carbon Atom (CD1). Coordinates: (-0.646, -1.182, 14.124)>,\n",
" <Carbon Atom (CD1). Coordinates: (0.646, 1.182, 14.124)>),\n",
" [1.5420104154177583, -0.15007, 0.0]),\n",
" ((<Oxygen Atom (OD1). Coordinates: (0.448, 1.204, 21.296)>,\n",
" <Oxygen Atom (OD1). Coordinates: (-0.448, -1.204, 21.296)>),\n",
" [2.8595532748119767, 0.0, 2.8547649]),\n",
" ((<Carbon Atom (CD1). Coordinates: (-0.959, -0.941, 24.655)>,\n",
" <Carbon Atom (CD1). Coordinates: (0.959, 0.941, 24.655)>),\n",
" [1.6148625410099382, -0.15007, 0.0]),\n",
" ((<Carbon Atom (CD1). Coordinates: (-1.191, -0.626, 35.194)>,\n",
" <Carbon Atom (CD1). Coordinates: (1.191, 0.626, 35.194)>),\n",
" [1.5740476865313469, -0.15007, 0.0])]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"clashes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The third interaction is a clash between 2 oxygen atoms. We can get more information on the residues involved from the atoms themselves."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"(atom1, atom2), score_components = clashes[2] # we can unpack the interaction to get the atoms"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"('A', '16', 122) <Residue containing 8 Atoms. Residue code: ASN>\n",
"('B', '16', 339) <Residue containing 8 Atoms. Residue code: ASN>\n"
]
}
],
"source": [
"print(atom1.unique_id, atom1.parent)\n",
"print(atom2.unique_id, atom2.parent)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see here that it is the oxygen atoms on the asparagine side chains in the core are clashing slightly, which you can see in the structure of the first example. This indicates that the parameters used to create this model could be improved for this sequence, so let's do that!"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a0ffcee589694efc92a1ef2e448c07cd",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"NGLWidget()"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"show_ball_and_stick(dimer)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> ### Problem\n",
"> Try and find other interesting interactions, such as the atom pair with the lowest charged interaction."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Parametric modelling is very powerful because it can create sensible models for regions of protein-folding space that have not been observed in nature. However, y#ou should be aware that just because you can create a model, it doesn't mean that proteins **can actually** adopt that conformation. Always thoroughly evaluate your models using as many metrics as you have available to determine if your model is physiochemically viable. These could include:\n",
"\n",
"* Geometric analysis\n",
"* All-atom scoring functions\n",
"* Statistical potentials\n",
"* Predicting secondary structure\n",
"* Measuring contact order/packing quality\n",
"* Model quality\n",
"* Molecular dynamics\n",
"\n",
"We provide tools for some of these methods, but a huge range of other tools are available."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment