ChrisWellsWood/2_Advanced_Selections_and_Analysis.ipynb

## 2_Advanced_Selections_and_Analysis.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Advanced Selections and Analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are lots of tools in `isambard` for analysing protein structure, most of them are baked into the AMPAL objects themselves."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Selecting Secondary Structure"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's load-up a PDB file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import isambard\n",
    "my_protein_3qy1 = isambard.ampal.convert_pdb_to_ampal('3qy1.pdb')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can select all the helices or strands simply by typing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Assembly (3qy1) containing 20 Polypeptides>"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_protein_3qy1.helices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Assembly (3qy1) containing 10 Polypeptides>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_protein_3qy1.strands"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These attributes return `Assembly` objects, where each of the helices/strands are represented as a separate `Polypeptide`. All of the normal attributes and methods associated with `Assembly` and `Polypeptide` objects work just the same as before."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "my_helices_3qy1 = my_protein_3qy1.helices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['IDTLISNNALWSKMLVEE',\n",
       " 'GFFEKLA',\n",
       " 'AERLT',\n",
       " 'LNCLSVVQ',\n",
       " 'GGIKAAVE',\n",
       " 'INNWLLHIRDIWLK',\n",
       " 'SSLLG',\n",
       " 'RLDALYELNVMEQVYNLGH',\n",
       " 'TIMQSAWK',\n",
       " 'RETLENGYHKGISALSLKY',\n",
       " 'IDTLISNNALWSKMLVEE',\n",
       " 'GFFEKLAQ',\n",
       " 'AERLT',\n",
       " 'LNCLSVVQ',\n",
       " 'GGIKAAVE',\n",
       " 'INNWLLHIRDIWLK',\n",
       " 'SSLLG',\n",
       " 'RLDALYELNVMEQVYNLGH',\n",
       " 'TIMQSAWK',\n",
       " 'RETLENGYHKGISALSLKY']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_helices_3qy1.sequences"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Polypeptide containing 18 Residues. Sequence: IDTLISNNALWS...>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_helices_3qy1[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'IDTLISNNALWSKMLVEE'"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_helices_3qy1[0].sequence"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2076.36968"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_helices_3qy1[0].molecular_weight"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Residue containing 7 Atoms. Residue code: VAL>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_helices_3qy1[0][15]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Carbon Atom (CA). Coordinates: (23.340, -8.324, -33.970)>"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_helices_3qy1[0][15]['CA']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is worth noting that the `ampal_parent` attribute of the individual helix refers to the `Polypeptide` to which it belongs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "my_helix = my_helices_3qy1[2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Polypeptide containing 215 Residues. Sequence: DIDTLISNNALW...>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_helix.ampal_parent"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get back to the original `Assembly`, therefore, we call `ampal_parent` twice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Assembly (3qy1) containing 2 Polypeptides, 449 Ligands>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_helix.ampal_parent.ampal_parent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_helix.ampal_parent == my_helices_3qy1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_helix.ampal_parent.ampal_parent == my_protein_3qy1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Selecting All Residues or Atoms"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sometimes it's convinient to select all of the `Residues` or `Atoms` in an `Assembly` or `Polypeptide` object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<itertools.chain at 0x10e7a1908>"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_protein_3qy1.get_monomers()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<itertools.chain at 0x10e7a1860>"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_helices_3qy1.get_atoms()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see an itertools object is returned. This might be slightly confusing as you might expect a list, but this is what's known as an `iterator`, you can loop over it like a list or string, but if you want to use it repeatedly or examine its contents you'll need to convert it to a list. The advantage of returning an `iterator` is that it's much more memory efficient, and these lists could potentially be very large. If you'd like to know more about `iterables` and `iterators`, as well as related objects called `generators`, see the following link (this is quite advanced Python and is not essential for using any of the AMPAL framework: [Iterators and Generators](http://anandology.com/python-practice-book/iterators.html). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Nitrogen Atom (N). Coordinates: (14.714, -30.168, -26.423)>,\n",
       " <Carbon Atom (CA). Coordinates: (15.518, -30.153, -25.207)>,\n",
       " <Carbon Atom (C). Coordinates: (16.111, -28.769, -24.931)>,\n",
       " <Oxygen Atom (O). Coordinates: (15.960, -27.855, -25.734)>,\n",
       " <Carbon Atom (CB). Coordinates: (16.613, -31.220, -25.270)>,\n",
       " <Carbon Atom (CG). Coordinates: (16.067, -32.624, -25.153)>,\n",
       " <Oxygen Atom (OD1). Coordinates: (14.899, -32.777, -24.743)>,\n",
       " <Oxygen Atom (OD2). Coordinates: (16.807, -33.576, -25.474)>,\n",
       " <Nitrogen Atom (N). Coordinates: (16.782, -28.637, -23.789)>,\n",
       " <Carbon Atom (CA). Coordinates: (17.360, -27.364, -23.339)>,\n",
       " <Carbon Atom (C). Coordinates: (18.466, -26.876, -24.299)>,\n",
       " <Oxygen Atom (O). Coordinates: (18.513, -25.674, -24.586)>,\n",
       " <Carbon Atom (CB). Coordinates: (17.842, -27.509, -21.862)>,\n",
       " <Carbon Atom (CG1). Coordinates: (16.643, -27.442, -20.889)>,\n",
       " <Carbon Atom (CG2). Coordinates: (18.956, -26.516, -21.456)>,\n",
       " <Carbon Atom (CD1). Coordinates: (15.873, -26.033, -20.777)>,\n",
       " <Nitrogen Atom (N). Coordinates: (19.310, -27.788, -24.836)>,\n",
       " <Carbon Atom (CA). Coordinates: (20.370, -27.411, -25.786)>,\n",
       " <Carbon Atom (C). Coordinates: (19.801, -26.707, -27.030)>,\n",
       " <Oxygen Atom (O). Coordinates: (20.438, -25.781, -27.539)>]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list(my_protein_3qy1.get_atoms())[:20]  # Showing the first 20 elements for clarity"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Residue containing 8 Atoms. Residue code: ASP>,\n",
       " <Residue containing 8 Atoms. Residue code: ILE>,\n",
       " <Residue containing 8 Atoms. Residue code: ASP>,\n",
       " <Residue containing 7 Atoms. Residue code: THR>,\n",
       " <Residue containing 8 Atoms. Residue code: LEU>,\n",
       " <Residue containing 8 Atoms. Residue code: ILE>,\n",
       " <Residue containing 6 Atoms. Residue code: SER>,\n",
       " <Residue containing 8 Atoms. Residue code: ASN>,\n",
       " <Residue containing 8 Atoms. Residue code: ASN>,\n",
       " <Residue containing 5 Atoms. Residue code: ALA>,\n",
       " <Residue containing 8 Atoms. Residue code: LEU>,\n",
       " <Residue containing 14 Atoms. Residue code: TRP>,\n",
       " <Residue containing 6 Atoms. Residue code: SER>,\n",
       " <Residue containing 9 Atoms. Residue code: LYS>,\n",
       " <Residue containing 8 Atoms. Residue code: MET>,\n",
       " <Residue containing 8 Atoms. Residue code: LEU>,\n",
       " <Residue containing 7 Atoms. Residue code: VAL>,\n",
       " <Residue containing 9 Atoms. Residue code: GLU>,\n",
       " <Residue containing 9 Atoms. Residue code: GLU>,\n",
       " <Residue containing 8 Atoms. Residue code: ASP>]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list(my_protein_3qy1.get_monomers())[:20]  # Showing the first 20 elements for clarity"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Analysing Composition"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can easily look at the composition of sequences and structures using a `Counter` object. `Counter` can be fed any iterable (`lists` and `strings` are the most commonly used) and will count the occurence of each element inside. We can start by looking at the composition of amino acids in a sequence:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from collections import Counter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'DIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGELFVHRNVANLVIHTDLNCLSVVQYAVDVLEVEHIIICGHSGCGGIKAAVENPELGLINNWLLHIRDIWLKHSSLLGKMPEEQRLDALYELNVMEQVYNLGHSTIMQSAWKRGQNVTIHGWAYSINDGLLRDLDVTATNRETLENGYHKGISALSLKYI'"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_protein_3qy1['A'].sequence"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Counter({'A': 13,\n",
       "         'C': 4,\n",
       "         'D': 11,\n",
       "         'E': 16,\n",
       "         'F': 4,\n",
       "         'G': 16,\n",
       "         'H': 9,\n",
       "         'I': 16,\n",
       "         'K': 9,\n",
       "         'L': 29,\n",
       "         'M': 4,\n",
       "         'N': 14,\n",
       "         'P': 6,\n",
       "         'Q': 7,\n",
       "         'R': 9,\n",
       "         'S': 13,\n",
       "         'T': 8,\n",
       "         'V': 15,\n",
       "         'W': 6,\n",
       "         'Y': 6})"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Counter(my_protein_3qy1['A'].sequence)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But as stated before, you can use Counters on any iterable, not just strings. Let's make a list of all the pdb molecule codes of the ligands:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "my_ligands = my_protein_3qy1.get_ligands()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Ligand containing 1 Atom. Ligand code: ZN>"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_ligands[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'ZN'"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_ligands[0].mol_code  # This can be used to find the pdb molecule code of any residue"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are two ways to generate a list of the codes, a `for` loop or a `list` comprehension, use whichever you are comfortable with. If you'd like to know more about `list` comprehensions, please see the following link (this is relatively advanced Python but while not essential for using any of the AMPAL framework, it is very useful: [List Comprehensions](https://docs.python.org/3.5/tutorial/datastructures.html) (scroll down to the relevant section). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# With a for loop\n",
    "mol_codes_1 = []\n",
    "for lig in my_ligands:\n",
    "    mol_codes_1.append(lig.mol_code)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['ZN', 'HOH', 'HOH', 'HOH', 'HOH']"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mol_codes_1[:5]  # The first 5 elements"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# A list comprehension\n",
    "mol_codes_2 = [lig.mol_code for lig in my_ligands]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['ZN', 'HOH', 'HOH', 'HOH', 'HOH']"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mol_codes_2[:5]  # Showing the first 5 elements for clarity"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can use either of these methods, use whichever one you're more comfortable with."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mol_codes_1 == mol_codes_2  # The lists that re produced are exactly the same"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now the `list` of mol codes can be used to make a `Counter` object: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Counter({'HOH': 447, 'ZN': 2})"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Counter(mol_codes_1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, there are 447 water molecules and 2 zinc ions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Distance Analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can select all atoms in the protein and understand the structures composition, we can perform some simple analysis. Let's try and find all the residues that are close to the zinc ions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "zinc_1 = my_ligands[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Ligand containing 1 Atom. Ligand code: ZN>"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "zinc_1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All `Ligand` objects are `Monomers`, even if they only contain a single atom. So we use the zinc `Atom` itself when measuring distances:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Zinc Atom (ZN). Coordinates: (-5.817, -20.172, -18.798)>"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "zinc_1['ZN']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Measuring distances is simple, you can use the `isambard.geometry.distance` function. It takes two 3D vectors as an input, these can be in list form, tuples, or even `Atom` objects:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "28.179990720367528"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "isambard.geometry.distance(zinc_1['ZN'], (0, 0, 0))  # Distance from the origin"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "first_ca = my_protein_3qy1['A'][0]['CA']  # CA of the first residue in chain A"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Carbon Atom (CA). Coordinates: (15.518, -30.153, -25.207)>"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "first_ca"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "24.41060972200408"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "isambard.geometry.distance(zinc_1['ZN'], first_ca)  # Distance in angstroms"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we need to loop over all the atoms and find which are close (<= 3 Å) to the zinc. We can use the distance function in geometry to do this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "atoms_close_to_zinc = []\n",
    "for at in my_protein_3qy1.get_atoms():\n",
    "    if isambard.geometry.distance(zinc_1['ZN'], at) <= 3.0:\n",
    "        atoms_close_to_zinc.append(at)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Sulfur Atom (SG). Coordinates: (-4.322, -18.933, -17.640)>,\n",
       " <Oxygen Atom (OD2). Coordinates: (-4.771, -22.057, -19.213)>,\n",
       " <Nitrogen Atom (NE2). Coordinates: (-6.209, -19.569, -20.787)>,\n",
       " <Sulfur Atom (SG). Coordinates: (-7.753, -20.619, -17.709)>,\n",
       " <Zinc Atom (ZN). Coordinates: (-5.817, -20.172, -18.798)>]"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "atoms_close_to_zinc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are 5 atoms within 3 Å of the zinc, *including* the zinc atom itself. One way to get ignore this atom in the above block of code is to use the `ligands=False` flag in `get_atoms()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "atoms_close_to_zinc = []\n",
    "for at in my_protein_3qy1.get_atoms(ligands=False):\n",
    "    if isambard.geometry.distance(zinc_1['ZN'], at) <= 3.0:\n",
    "        atoms_close_to_zinc.append(at)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Sulfur Atom (SG). Coordinates: (-4.322, -18.933, -17.640)>,\n",
       " <Oxygen Atom (OD2). Coordinates: (-4.771, -22.057, -19.213)>,\n",
       " <Nitrogen Atom (NE2). Coordinates: (-6.209, -19.569, -20.787)>,\n",
       " <Sulfur Atom (SG). Coordinates: (-7.753, -20.619, -17.709)>]"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "atoms_close_to_zinc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now there are 4 atoms within 3 Å of the zinc, 2 sulphur atoms, an oxygen and a nitrogen. Let's find the residues that are coordinating the zinc:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Sulfur Atom (SG). Coordinates: (-4.322, -18.933, -17.640)>"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "atoms_close_to_zinc[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Residue containing 6 Atoms. Residue code: CYS>"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "atoms_close_to_zinc[0].ampal_parent"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can get all the residues using a for loop or a list comprehension:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "residues_close_to_zinc = []\n",
    "for at in atoms_close_to_zinc:\n",
    "    residues_close_to_zinc.append(at.ampal_parent)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Residue containing 6 Atoms. Residue code: CYS>,\n",
       " <Residue containing 8 Atoms. Residue code: ASP>,\n",
       " <Residue containing 10 Atoms. Residue code: HIS>,\n",
       " <Residue containing 6 Atoms. Residue code: CYS>]"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "residues_close_to_zinc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# The list comprehension is much more concise\n",
    "residues_close_to_zinc_2 = [at.ampal_parent for at in atoms_close_to_zinc]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Residue containing 6 Atoms. Residue code: CYS>,\n",
       " <Residue containing 8 Atoms. Residue code: ASP>,\n",
       " <Residue containing 10 Atoms. Residue code: HIS>,\n",
       " <Residue containing 6 Atoms. Residue code: CYS>]"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "residues_close_to_zinc_2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It looks like the zinc is coordinated by two cysteines, an aspartate and a histidine residue."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Is Within"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This kind of operation is very common when analysing proteins. So we have some built-in methods for handling this on `Assembly` and `Polymer` objects:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Sulfur Atom (SG). Coordinates: (-4.322, -18.933, -17.640)>,\n",
       " <Oxygen Atom (OD2). Coordinates: (-4.771, -22.057, -19.213)>,\n",
       " <Nitrogen Atom (NE2). Coordinates: (-6.209, -19.569, -20.787)>,\n",
       " <Sulfur Atom (SG). Coordinates: (-7.753, -20.619, -17.709)>,\n",
       " <Zinc Atom (ZN). Coordinates: (-5.817, -20.172, -18.798)>]"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_protein_3qy1.is_within(3, zinc_1['ZN'])  # It takes a distance and a point"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Nitrogen Atom (N). Coordinates: (-7.278, -11.112, -10.445)>,\n",
       " <Carbon Atom (C). Coordinates: (-7.525, -10.089, -8.339)>,\n",
       " <Oxygen Atom (O). Coordinates: (-8.676, -9.655, -8.177)>,\n",
       " <Carbon Atom (CA). Coordinates: (-11.364, -9.364, -12.255)>,\n",
       " <Carbon Atom (CB). Coordinates: (-10.337, -10.223, -12.972)>,\n",
       " <Carbon Atom (CD). Coordinates: (-10.752, -12.047, -11.188)>,\n",
       " <Oxygen Atom (OE1). Coordinates: (-10.046, -11.618, -10.265)>,\n",
       " <Nitrogen Atom (ND2). Coordinates: (-11.667, -9.798, -7.931)>]"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_protein_3qy1.is_within(3, (-10, -10, -10))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Sulfur Atom (SG). Coordinates: (-4.322, -18.933, -17.640)>,\n",
       " <Oxygen Atom (OD2). Coordinates: (-4.771, -22.057, -19.213)>,\n",
       " <Nitrogen Atom (NE2). Coordinates: (-6.209, -19.569, -20.787)>,\n",
       " <Sulfur Atom (SG). Coordinates: (-7.753, -20.619, -17.709)>,\n",
       " <Zinc Atom (ZN). Coordinates: (-5.817, -20.172, -18.798)>]"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_protein_3qy1['A'].is_within(3, zinc_1['ZN'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[]"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_protein_3qy1['B'].is_within(3, zinc_1['ZN']) \n",
    "# This list is empty as nothing on chain B is close to the zinc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There is a partner method to is `is_within`, every `Monomer` (this includes `Residues` and `Ligands`) has an `close_monomers` method. This returns all `Monomers` within a given cutoff value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Residue containing 6 Atoms. Residue code: CYS>,\n",
       " <Residue containing 8 Atoms. Residue code: ASP>,\n",
       " <Residue containing 10 Atoms. Residue code: HIS>,\n",
       " <Residue containing 6 Atoms. Residue code: CYS>,\n",
       " <Ligand containing 1 Atom. Ligand code: ZN>]"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "zinc_1.close_monomers(my_protein_3qy1) # default cutoff is 4.0 Å"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Residue containing 6 Atoms. Residue code: CYS>,\n",
       " <Residue containing 6 Atoms. Residue code: SER>,\n",
       " <Residue containing 8 Atoms. Residue code: ASP>,\n",
       " <Residue containing 6 Atoms. Residue code: SER>,\n",
       " <Residue containing 11 Atoms. Residue code: ARG>,\n",
       " <Residue containing 7 Atoms. Residue code: VAL>,\n",
       " <Residue containing 5 Atoms. Residue code: ALA>,\n",
       " <Residue containing 8 Atoms. Residue code: ASN>,\n",
       " <Residue containing 10 Atoms. Residue code: HIS>,\n",
       " <Residue containing 4 Atoms. Residue code: GLY>,\n",
       " <Residue containing 6 Atoms. Residue code: CYS>,\n",
       " <Residue containing 4 Atoms. Residue code: GLY>,\n",
       " <Residue containing 4 Atoms. Residue code: GLY>,\n",
       " <Residue containing 8 Atoms. Residue code: ILE>,\n",
       " <Ligand containing 1 Atom. Ligand code: ZN>,\n",
       " <Ligand containing 1 Atom. Ligand code: HOH>,\n",
       " <Ligand containing 1 Atom. Ligand code: HOH>,\n",
       " <Ligand containing 1 Atom. Ligand code: HOH>,\n",
       " <Ligand containing 1 Atom. Ligand code: HOH>]"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "zinc_1.close_monomers(my_protein_3qy1, cutoff=6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[]"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "zinc_1.close_monomers(my_helices_3qy1, cutoff=1) # Nothing is closer than 1 Å from the zinc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Geometry in ISAMBARD"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are a range of tools in ISAMBARD for performing geometric operations. We've already covered distance, but other commonly used functions include `angle_between_vectors`, `dihedral`, `unit_vector`, `find_foot`, `radius_of_circumcircle`. Be sure to check out the source code if you need a specific geometric function or have a look through the documentation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `dihedral` function is probably the most useful of these for analysing proteins, so let's use it to measure some torsion angles. It requires 4 3D vectors to calculate the dihedral, again these can be `lists`, `tuples`, `numpy.arrays` or `Atoms`. **Note:** This method of calculating torsion angles is only as an example, see the Tagging tutorial for the proper, low-effort method!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "r1 = my_protein_3qy1['B'][4]\n",
    "r2 = my_protein_3qy1['B'][5]\n",
    "r3 = my_protein_3qy1['B'][6]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "omega = isambard.geometry.dihedral(r1['CA'], r1['C'], r2['N'], r2['CA'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "phi = isambard.geometry.dihedral(r1['C'], r2['N'], r2['CA'], r2['C'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "psi = isambard.geometry.dihedral(r2['N'], r2['CA'], r2['C'], r3['N'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-179.8002001034783 -64.10861163046212 -45.84373968479124\n"
     ]
    }
   ],
   "source": [
    "print(omega, phi, psi)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use it to calculate the $\\chi$ torsion angles too. R2 is leucine, so we can calculate the $\\chi_1$ and $\\chi_2$ angles:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "OrderedDict([('N',\n",
       "              <Nitrogen Atom (N). Coordinates: (-5.186, -2.004, -31.807)>),\n",
       "             ('CA',\n",
       "              <Carbon Atom (CA). Coordinates: (-4.911, -3.362, -31.310)>),\n",
       "             ('C', <Carbon Atom (C). Coordinates: (-5.985, -4.346, -31.786)>),\n",
       "             ('O', <Oxygen Atom (O). Coordinates: (-5.650, -5.434, -32.255)>),\n",
       "             ('CB',\n",
       "              <Carbon Atom (CB). Coordinates: (-4.788, -3.418, -29.770)>),\n",
       "             ('CG',\n",
       "              <Carbon Atom (CG). Coordinates: (-3.838, -2.437, -29.061)>),\n",
       "             ('CD1',\n",
       "              <Carbon Atom (CD1). Coordinates: (-3.653, -2.831, -27.613)>),\n",
       "             ('CD2',\n",
       "              <Carbon Atom (CD2). Coordinates: (-2.478, -2.359, -29.736)>)])"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "r2.atoms"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "chi1 = isambard.geometry.dihedral(r2['N'], r2['CA'], r2['CB'], r2['CG'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "chi2 = isambard.geometry.dihedral(r2['CA'], r2['CB'], r2['CG'], r2['CD1'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-51.38040366349739 -168.70006206564494\n"
     ]
    }
   ],
   "source": [
    "print(chi1, chi2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our simple analysis shows that the leucine residue is in the gauche<sup>-</sup>/trans conformation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Summary and activities"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are lots of tools for making complex selections in ISAMBARD. Combined with the tools for geometry, detailed analysis can be performed on these selections."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Find all the residues that are:\n",
    "    1. within 5 Å of crystal water.\n",
    "    1. *not* within 5 Å of crystal water.\n",
    "1. Find how many cis-peptide bonds there are in this structure.\n",
    "1. Perform these activities on another PDB file."
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}