@ChrisWellsWood
Created June 27, 2018 15:06
Upgrading to ISAMBARD 2
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Upgrading to ISAMBARD 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## TL;DR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Development of ISAMBARD has now been moved into its own [organisation](https://github.com/isambard-uob).\n",
"* ISAMBARD no longer uses `.isambard_settings`; external dependencies must be available on PATH.\n",
"* ISAMBARD has been broken down into separate modules and submodules:\n",
" * `ampal` is now an independent package (https://github.com/isambard-uob/ampal)\n",
" * `buff` is now an independent package and has been renamed `budeff` (https://github.com/isambard-uob/budeff)\n",
" * `isambard` consists of 4 packages: `specifications`, `modelling`, `evaluation` and `optimisation`. These need to be imported individually.\n",
"* The `ampal_parent` attribute has been renamed `parent`.\n",
"* The `helices` and `strands` methods have been removed. All DSSP functionality is now in `ampal.dssp`.\n",
"* The `pack_new_sequences`/`pack_new_sequence` methods have been removed. Please use\n",
" `isambard.modelling.pack_side_chains_scwrl` instead. Don't forget to `import isambard.modelling` before using this.\n",
"* The `buff_internal_eval` and `buff_interaction_eval` class methods have been removed from all the optimizers. Please use the default constructor and supply an evaluation function."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Rationale"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Over the past year or so, the ISAMBARD code base has become increasingly awkward to develop, so I've decided to make a major version update, which unfortunately breaks backwards compatibility. The update had a few major goals:\n",
"\n",
"* Make the development process more transparent\n",
"* Simplify the installation process\n",
"* Break the project down into smaller, more convenient modules\n",
"* Fix some bad API design decisions!\n",
"* Clearly delineate between functionality performed by ISAMBARD and external programs\n",
"* Remove some of the \"magic\" that hid what ISAMBARD was actually doing\n",
"\n",
"I'll go over each of these things and explain what's changed."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Making the Development Process More Transparent"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to make the development process a bit more transparent, we've created an organisation specifically for ISAMBARD, which is available [here](https://github.com/isambard-uob). Now that we have a fully public organisation, we can have multiple repositories associated with ISAMBARD (more on that soon), and the development process can become slightly less decentralised.\n",
"\n",
"I also took the opportunity to switch the version numbering over to proper [semantic versioning](https://semver.org/), which makes it easy to see what an update to ISAMBARD might contain. The version number consists of three parts: the major, minor and patch numbers (MAJOR.MINOR.PATCH). For example, ISAMBARD 2.1.0 would be major version 2, minor version 1 and patch number 0. The patch number is incremented when ISAMBARD gets bug fixes, the minor version number is incremented when new features are added, and the major version number is incremented when an update breaks backwards compatibility.\n",
"\n",
"Whenever you perform analysis or design using ISAMBARD, be sure to note the version of ISAMBARD that you've used and include it in your supplementary information when you publish; this ensures that your work can be reproduced. I highly recommend using [`pipenv`](https://github.com/pypa/pipenv) to manage this, but you can also use `pip freeze` to dump out package versions.\n",
"\n",
"Previously we used semantic versioning for our internal development builds of ISAMBARD, but a year-based version number for releases, _e.g._ `2017.3.0`. These versions have been removed from PyPI, but they are still available through the [old ISAMBARD repository](https://github.com/woolfson-group/isambard)."
]
},
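{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the MAJOR.MINOR.PATCH rules above, the sketch below compares two version strings and reports which part changed. `parse_version`, `change_kind` and `is_backwards_compatible` are hypothetical helpers written for this guide, not part of ISAMBARD, and they assume plain numeric versions without pre-release tags such as `rc1`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helpers (not part of ISAMBARD) illustrating semantic versioning.\n",
"# Assumes plain numeric MAJOR.MINOR.PATCH strings with no pre-release tags.\n",
"\n",
"def parse_version(version):\n",
"    # '2.1.0' -> (2, 1, 0)\n",
"    major, minor, patch = (int(part) for part in version.split('.'))\n",
"    return major, minor, patch\n",
"\n",
"\n",
"def change_kind(old, new):\n",
"    # Name the most significant part of the version number that differs.\n",
"    for name, a, b in zip(('major', 'minor', 'patch'), parse_version(old), parse_version(new)):\n",
"        if a != b:\n",
"            return name\n",
"    return 'none'\n",
"\n",
"\n",
"def is_backwards_compatible(old, new):\n",
"    # Upgrading within the same major version should not break existing code.\n",
"    old_v, new_v = parse_version(old), parse_version(new)\n",
"    return old_v[0] == new_v[0] and new_v >= old_v\n",
"\n",
"\n",
"print(change_kind('2.0.0', '2.1.0'))              # minor\n",
"print(change_kind('2017.3.0', '2.0.0'))           # major\n",
"print(is_backwards_compatible('2.0.0', '2.1.0'))  # True"
]
},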
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Simplifying the Installation Process"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before ISAMBARD 2, a `.isambard_settings` file was required to use ISAMBARD. This file contained paths to external programs such as SCWRL, as well as various options for running these programs. The `.isambard_settings` file was the single largest source of problems during installation, so I decided to get rid of it. If you use a module that requires an external program, ISAMBARD now expects that program to be available on your system PATH under its default name, _e.g._ `Scwrl4` for SCWRL and `mkdssp` for DSSP. If your DSSP executable is called something else, consider creating a symbolic link named `mkdssp` in a directory on your PATH; an alias in your `.bashrc` is only seen by interactive shells, so it may not be visible to programs launched from Python."
]
},
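{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because ISAMBARD 2 simply looks external programs up on your PATH, a quick way to diagnose installation problems is to check for the executables yourself before running any modelling. The sketch below uses only the Python standard library (`shutil.which`) and the default names mentioned above; `check_external_programs` is a hypothetical helper, not an ISAMBARD function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"\n",
"# Default executable names that ISAMBARD 2 looks for on PATH (see above).\n",
"EXTERNAL_PROGRAMS = {'SCWRL': 'Scwrl4', 'DSSP': 'mkdssp'}\n",
"\n",
"\n",
"def check_external_programs(programs=EXTERNAL_PROGRAMS):\n",
"    # Hypothetical helper: map each program name to the full path of its\n",
"    # executable, or None if it cannot be found on PATH.\n",
"    return {name: shutil.which(exe) for name, exe in programs.items()}\n",
"\n",
"\n",
"for name, path in check_external_programs().items():\n",
"    print('{}: {}'.format(name, path or 'NOT FOUND on PATH'))"
]
},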
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Breaking Down the Project into Smaller Modules"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ISAMBARD project is getting pretty large now and contains a range of quite varied functionality, so I decided to break the functionality down into submodules and, where appropriate, into whole new modules. As a result we have two new modules, [AMPAL](https://github.com/isambard-uob/ampal) and [BUDEFF](https://github.com/isambard-uob/budeff). This means that if you're not planning on doing any modelling, you can just use AMPAL by itself:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<Assembly (3qy1) containing 2 Polypeptides, 449 Ligands>\n"
]
}
],
"source": [
"import ampal\n",
"\n",
"my_structure = ampal.load_pdb('3qy1.pdb')\n",
"print(my_structure)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Most of the functionality that was previously in the `isambard.ampal` submodule is now in this standalone module; some has moved into ISAMBARD submodules, and the remainder did not make it across. If there's anything missing that you used to use, please comment on [this issue](https://github.com/isambard-uob/isambard/issues/4).\n",
"\n",
"AMPAL is still a dependency of ISAMBARD, so it's imported automatically and is available as `isambard.ampal`:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on package ampal:\n",
"\n",
"NAME\n",
" ampal\n",
"\n",
"PACKAGE CONTENTS\n",
" align\n",
" amino_acids\n",
" ampal_warnings\n",
" analyse_protein\n",
" assembly\n",
" base_ampal\n",
" data\n",
" dssp\n",
" geometry\n",
" interactions\n",
" ligands\n",
" nucleic_acid\n",
" pdb_parser\n",
" protein\n",
" pseudo_atoms\n",
"\n",
"VERSION\n",
" 1.2.0\n",
"\n",
"FILE\n",
" /home/cw12401/code/work/ampal/src/ampal/__init__.py\n",
"\n",
"\n"
]
}
],
"source": [
"import isambard\n",
"\n",
"help(isambard.ampal)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All the rest of the ISAMBARD functionality has been moved into 4 submodules: `specifications`, `modelling`, `evaluation` and `optimisation`."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on package isambard:\n",
"\n",
"NAME\n",
" isambard\n",
"\n",
"PACKAGE CONTENTS\n",
" evaluation (package)\n",
" modelling (package)\n",
" optimisation (package)\n",
" specifications (package)\n",
"\n",
"FILE\n",
" /home/cw12401/code/work/isambard/src/isambard/__init__.py\n",
"\n",
"\n"
]
}
],
"source": [
"help(isambard)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each of these packages needs to be imported explicitly if you want to use it:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import isambard.specifications as specs\n",
"import isambard.modelling as modelling"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"my_cc = specs.CoiledCoil(2)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['EIAALKQEIAALKKENAALKWEIAALKQ', 'EIAALKQEIAALKKENAALKWEIAALKQ']"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"my_cc.basis_set_sequences"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Assembly containing 2 Polypeptides>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"modelling.pack_side_chains_scwrl(my_cc, ['EIAALKQEIAALKKENAALKWEIAALKQ']*2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, the documentation for both [AMPAL](https://isambard-uob.github.io/ampal/) and [ISAMBARD](https://isambard-uob.github.io/isambard/) has also been revamped, so please use these for the most up-to-date documentation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fix Some Bad API Design Decisions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are some small changes to the API that fix things that were annoying! The most important is that `ampal_parent` has been renamed `parent`:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"ca = my_structure[0][0]['CA']"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"ename": "AttributeError",
"evalue": "'Atom' object has no attribute 'ampal_parent'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-9-8c5a3c02f41c>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mca\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mampal_parent\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m: 'Atom' object has no attribute 'ampal_parent'"
]
}
],
"source": [
"ca.ampal_parent"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<Carbon Atom (CA). Coordinates: (15.518, -30.153, -25.207)>\n",
"<Residue containing 8 Atoms. Residue code: ASP>\n",
"<Polypeptide containing 215 Residues. Sequence: DIDTLISNNALW...>\n",
"<Assembly (3qy1) containing 2 Polypeptides, 449 Ligands>\n"
]
}
],
"source": [
"print(ca)\n",
"print(ca.parent)\n",
"print(ca.parent.parent)\n",
"print(ca.parent.parent.parent)"
]
},
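{
"cell_type": "markdown",
"metadata": {},
"source": [
"Older scripts that walked up the hierarchy with `ampal_parent` can simply be updated by hand, but if a script needs to run against both old and new objects, a small helper can hide the rename. `get_parent` and `ancestors` below are hypothetical helpers, not part of AMPAL; `ancestors` follows `parent` (falling back to the old `ampal_parent` name) until it reaches an object without a parent. With the `ca` atom defined above, `list(ancestors(ca))` should give the residue, polypeptide and assembly printed in the previous cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def get_parent(ampal_object):\n",
"    # Hypothetical helper: AMPAL objects in ISAMBARD 2 use `parent`;\n",
"    # older ISAMBARD objects used `ampal_parent`.\n",
"    if hasattr(ampal_object, 'parent'):\n",
"        return ampal_object.parent\n",
"    return getattr(ampal_object, 'ampal_parent', None)\n",
"\n",
"\n",
"def ancestors(ampal_object):\n",
"    # Yield each successive parent until an object has no parent.\n",
"    parent = get_parent(ampal_object)\n",
"    while parent is not None:\n",
"        yield parent\n",
"        parent = get_parent(parent)"
]
},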
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Delineate Between ISAMBARD and External Programs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At times it was difficult to distinguish between ISAMBARD functionality and functionality provided by external dependencies, as they were tightly integrated. For example, previous versions had `helices` and `strands` methods on all AMPAL `Assemblies` and `Polypeptides`. This was convenient, as you could type `my_structure.helices` and receive a list of all the helices. However, it was not clear that this functionality required DSSP, which was run in the background and whose output was parsed to extract regions of secondary structure. The problem with this approach is that when the functionality breaks, it's not clear exactly what isn't working: is it AMPAL or DSSP?\n",
"\n",
"To address this problem, any functionality that uses an external program has been moved into a separate module. The two most important programs are probably SCWRL and DSSP."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"SCWRL functionality can be found inside `isambard.modelling.scwrl`; the most important function is `pack_side_chains_scwrl`. This function takes an `Assembly` and a list of sequences (one per chain) and returns a _**new**_ `Assembly`, leaving the input `Assembly` untouched. `pack_side_chains_scwrl` also has an optional argument, `rigid_rotamer_model`, that specifies whether the rigid or the flexible rotamer model is used."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['GGGGGGGGGGGGGGGGGGGGGGGGGGGG', 'GGGGGGGGGGGGGGGGGGGGGGGGGGGG']\n"
]
}
],
"source": [
"print(my_cc.sequences)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"packed_cc = modelling.pack_side_chains_scwrl(my_cc, ['EIAALKQEIAALKKENAALKWEIAALKQ']*2, rigid_rotamer_model=False)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['GGGGGGGGGGGGGGGGGGGGGGGGGGGG', 'GGGGGGGGGGGGGGGGGGGGGGGGGGGG']\n"
]
}
],
"source": [
"print(my_cc.sequences)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['EIAALKQEIAALKKENAALKWEIAALKQ', 'EIAALKQEIAALKKENAALKWEIAALKQ']\n"
]
}
],
"source": [
"print(packed_cc.sequences)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All functionality related to DSSP has been moved into the `ampal.dssp` module. You can use this to tag secondary structure:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"isambard.ampal.dssp.tag_dssp_data(packed_cc)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'dssp_data': {'ss_definition': 'H', 'solvent_accessibility': 47, 'phi': -64.4, 'psi': -40.6}}\n"
]
}
],
"source": [
"print(packed_cc['A']['24'].tags)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"EIAALKQEIAALKKENAALKWEIAALKQ\n",
" HHHHHHHHHHHHHHHHHHHHHHHHHH \n"
]
}
],
"source": [
"print(packed_cc[0].sequence)\n",
"print(''.join(x.tags['dssp_data']['ss_definition'] for x in packed_cc[0]))"
]
},
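{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the old `helices` method is gone, one way to recover helical regions is to scan the DSSP secondary-structure string built in the previous cell for runs of helix codes, _e.g._ `find_ss_segments(''.join(r.tags['dssp_data']['ss_definition'] for r in packed_cc[0]))`. `find_ss_segments` is a hypothetical sketch, not part of AMPAL (check the `ampal.dssp` documentation for any built-in equivalent). It returns half-open `(start, end)` indices into the chain, using `H` (alpha helix) by default; passing `codes='HGI'` would also include 3-10 and pi helices."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def find_ss_segments(ss_string, codes='H', min_length=1):\n",
"    # Hypothetical helper: return half-open (start, end) index pairs for each\n",
"    # run of characters from `codes` in a DSSP secondary-structure string.\n",
"    segments = []\n",
"    start = None\n",
"    for i, ss in enumerate(ss_string):\n",
"        if ss in codes:\n",
"            if start is None:\n",
"                start = i\n",
"        elif start is not None:\n",
"            if i - start >= min_length:\n",
"                segments.append((start, i))\n",
"            start = None\n",
"    # Close a run that continues to the end of the string.\n",
"    if start is not None and len(ss_string) - start >= min_length:\n",
"        segments.append((start, len(ss_string)))\n",
"    return segments\n",
"\n",
"\n",
"print(find_ss_segments(' HHHH  EEE HH'))  # [(1, 5), (11, 13)]"
]
},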
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Removing the Magic"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When we were making some of the classes in ISAMBARD, we added class methods that helped to streamline use for what _we thought_ the default use case would be. It turns out that this often obscured what was actually going on and made it harder to modify or extend ISAMBARD's functionality. The clearest example of this is in the optimisation module."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `buff_internal_eval` and `buff_interaction_eval` class methods have been removed from all the optimizers, leaving just the default constructor. This means that you have to specify the evaluation function to be used:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"gen\tevals\tavg \tstd \tmin \tmax \n",
"0 \t75 \t-838.005\t45.2509\t-945.125\t-769.93\n",
"1 \t66 \t-879.655\t31.2709\t-951.853\t-826.846\n",
"2 \t48 \t-899.646\t25.7537\t-971.998\t-859.392\n",
"3 \t54 \t-911.689\t24.3328\t-971.998\t-879.196\n",
"4 \t63 \t-933.19 \t21.2215\t-972.672\t-900.244\n",
"Evaluated 406 models in total in 0:00:28.426540\n",
"Best fitness is (-972.6722789788769,)\n",
"Best parameters are [2, 28, 4.588555668410354, 149.00936761585876, 277.38397547340793]\n"
]
}
],
"source": [
"import budeff\n",
"import isambard.optimisation.evo_optimizers as ev_opts\n",
"from isambard.optimisation.evo_optimizers import Parameter\n",
"\n",
"specification = specs.CoiledCoil.from_parameters\n",
"sequences = [\n",
" 'EIAALKQEIAALKKENAALKWEIAALKQ',\n",
" 'EIAALKQEIAALKKENAALKWEIAALKQ'\n",
"]\n",
"parameters = [\n",
" Parameter.static('Oligomeric State', 2),\n",
" Parameter.static('Helix Length', 28),\n",
" Parameter.dynamic('Radius', 5.0, 1.0),\n",
" Parameter.dynamic('Pitch', 200, 60),\n",
" Parameter.dynamic('PhiCA', 283, 27), # 283 is equivalent to a g position\n",
"]\n",
"\n",
"def get_buff_total_energy(ampal_object):\n",
" return budeff.get_internal_energy(ampal_object).total_energy\n",
"\n",
"opt_ga = ev_opts.GA(specification, sequences, parameters, get_buff_total_energy)\n",
"opt_ga.run_opt(100, 5, cores=8)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This might seem a pain, but it makes it pretty obvious how you'd use a different scoring function. Let's optimise using a combination of the BUDEFF total energy and Mike Levitt's hydrophobic fitness function:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"import isambard.evaluation\n",
"\n",
"def hf_scaled(ampal_object):\n",
" bude_internal = budeff.get_internal_energy(ampal_object).total_energy\n",
" hf = isambard.evaluation.calculate_hydrophobic_fitness(ampal_object)\n",
" return bude_internal * (-hf)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"gen\tevals\tavg \tstd \tmin \tmax \n",
"0 \t81 \t-15013.7\t1704.28\t-18843.9\t-11338.6\n",
"1 \t63 \t-16533.3\t882.483\t-18843.9\t-14992.6\n",
"2 \t71 \t-17065 \t623.827\t-18843.9\t-16213.5\n",
"3 \t66 \t-17481.1\t506.559\t-18902.5\t-16685.7\n",
"4 \t62 \t-17768.5\t499.436\t-18902.7\t-17048 \n",
"Evaluated 443 models in total in 0:00:30.869721\n",
"Best fitness is (-18902.68646835705,)\n",
"Best parameters are [2, 28, 4.413133348291819, 161.59287629657942, 275.0212384062046]\n"
]
}
],
"source": [
"opt_ga = ev_opts.GA(specification, sequences, parameters, hf_scaled)\n",
"opt_ga.run_opt(100, 5, cores=8)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hopefully this example shows why it's advantageous to use the full interface. Many people asked me whether this kind of thing was possible; it always was, but it was hidden behind the other class methods."
]
},
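{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the optimisers accept any callable that takes an AMPAL object and returns a score, other combinations are easy to build. `weighted_sum` below is a hypothetical helper, not part of ISAMBARD, that combines terms additively rather than multiplicatively as `hf_scaled` does. Judging from the logs above, the GA reports the lowest score as the best fitness, so check the sign convention of each term before combining them; the weights are placeholders that would need tuning."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def weighted_sum(*weighted_terms):\n",
"    # Hypothetical helper: build an evaluation function from (weight, function)\n",
"    # pairs, where each function takes an AMPAL object and returns a number.\n",
"    def evaluate(ampal_object):\n",
"        return sum(weight * term(ampal_object) for weight, term in weighted_terms)\n",
"    return evaluate\n",
"\n",
"\n",
"# Example usage (not run here); the weight of 10.0 is an arbitrary placeholder:\n",
"# combined = weighted_sum((1.0, get_buff_total_energy),\n",
"#                         (10.0, isambard.evaluation.calculate_hydrophobic_fitness))\n",
"# opt_ga = ev_opts.GA(specification, sequences, parameters, combined)\n",
"# opt_ga.run_opt(100, 5, cores=8)"
]
},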
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusions\n",
"\n",
"Sorry I've broken your code, but hopefully you can see why this has been a useful and necessary process. If you have any problems please contact me on the [GitHub issues tracker](https://github.com/isambard-uob/isambard/issues), by email or on Twitter (@ChrisWellsWood)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}