Created
June 15, 2023 10:58
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"execution_count": null, | |
"metadata": { | |
"colab_type": "text", | |
"id": "view-in-github" | |
}, | |
"source": [ | |
"<a href=\"https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"execution_count": null, | |
"metadata": { | |
"id": "G4yBrceuFbf3" | |
}, | |
"source": [ | |
"<img src=\"https://raw.githubusercontent.com/sokrypton/ColabFold/main/.github/ColabFold_Marv_Logo_Small.png\" height=\"200\" align=\"right\" style=\"height:240px\">\n", | |
"\n", | |
"##ColabFold v1.5.2-patch: AlphaFold2 using MMseqs2\n", | |
"\n", | |
"Easy to use protein structure and complex prediction using [AlphaFold2](https://www.nature.com/articles/s41586-021-03819-2) and [Alphafold2-multimer](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1). Sequence alignments/templates are generated through [MMseqs2](mmseqs.com) and [HHsearch](https://github.com/soedinglab/hh-suite). For more details, see <a href=\"#Instructions\">bottom</a> of the notebook, checkout the [ColabFold GitHub](https://github.com/sokrypton/ColabFold) and read our manuscript.\n", | |
"Old versions: [v1.4](https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.4.0/AlphaFold2.ipynb), [v1.5.1](https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.5.1/AlphaFold2.ipynb)\n", | |
"\n", | |
"[Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: Making protein folding accessible to all.\n", | |
"*Nature Methods*, 2022](https://www.nature.com/articles/s41592-022-01488-1)\n", | |
"\n", | |
"-----------\n", | |
"\n", | |
"### News\n", | |
"- <b><font color='green'>2023/06/12: New databases! UniRef30 updated to 2023_02 and PDB to 230517. We now use PDB100 instead of PDB70 (see [notes](#pdb100)).</font></b>\n", | |
"- <b><font color='green'>2023/06/12: We introduced a new default pairing strategy: Previously, for multimer predictions with more than 2 chains, we only pair if all sequences taxonomically match (\"complete\" pairing). The new default \"greedy\" strategy pairs any taxonomically matching subsets.</font></b>" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"cellView": "form", | |
"id": "kOblAo-xetgx" | |
}, | |
"outputs": [], | |
"source": [ | |
"#@title Input protein sequence(s), then hit `Runtime` -> `Run all`\n", | |
"from google.colab import files\n", | |
"import os\n", | |
"import re\n", | |
"import hashlib\n", | |
"import random\n", | |
"\n", | |
"from sys import version_info\n", | |
"python_version = f\"{version_info.major}.{version_info.minor}\"\n", | |
"\n", | |
"def add_hash(x,y):\n", | |
" return x+\"_\"+hashlib.sha1(y.encode()).hexdigest()[:5]\n", | |
"\n", | |
"query_sequence = 'PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK' #@param {type:\"string\"}\n", | |
"#@markdown - Use `:` to specify inter-protein chainbreaks for **modeling complexes** (supports homo- and hetro-oligomers). For example **PI...SK:PI...SK** for a homodimer\n", | |
"jobname = 'test' #@param {type:\"string\"}\n", | |
"# number of models to use\n", | |
"num_relax = 0 #@param [0, 1, 5] {type:\"raw\"}\n", | |
"#@markdown - specify how many of the top ranked structures to relax using amber\n", | |
"template_mode = \"none\" #@param [\"none\", \"pdb100\",\"custom\"]\n", | |
"#@markdown - `none` = no template information is used. `pdb100` = detect templates in pdb100 (see [notes](#pdb100)). `custom` - upload and search own templates (PDB or mmCIF format, see [notes](#custom_templates))\n", | |
"\n", | |
"use_amber = num_relax > 0\n", | |
"\n", | |
"# remove whitespaces\n", | |
"query_sequence = \"\".join(query_sequence.split())\n", | |
"\n", | |
"basejobname = \"\".join(jobname.split())\n", | |
"basejobname = re.sub(r'\\W+', '', basejobname)\n", | |
"jobname = add_hash(basejobname, query_sequence)\n", | |
"\n", | |
"# check if directory with jobname exists\n", | |
"def check(folder):\n", | |
" if os.path.exists(folder):\n", | |
" return False\n", | |
" else:\n", | |
" return True\n", | |
"if not check(jobname):\n", | |
" n = 0\n", | |
" while not check(f\"{jobname}_{n}\"): n += 1\n", | |
" jobname = f\"{jobname}_{n}\"\n", | |
"\n", | |
"# make directory to save results\n", | |
"os.makedirs(jobname, exist_ok=True)\n", | |
"\n", | |
"# save queries\n", | |
"queries_path = os.path.join(jobname, f\"{jobname}.csv\")\n", | |
"with open(queries_path, \"w\") as text_file:\n", | |
" text_file.write(f\"id,sequence\\n{jobname},{query_sequence}\")\n", | |
"\n", | |
"if template_mode == \"pdb100\":\n", | |
" use_templates = True\n", | |
" custom_template_path = None\n", | |
"elif template_mode == \"custom\":\n", | |
" custom_template_path = os.path.join(jobname,f\"template\")\n", | |
" os.makedirs(custom_template_path, exist_ok=True)\n", | |
" uploaded = files.upload()\n", | |
" use_templates = True\n", | |
" for fn in uploaded.keys():\n", | |
" os.rename(fn,os.path.join(custom_template_path,fn))\n", | |
"else:\n", | |
" custom_template_path = None\n", | |
" use_templates = False\n", | |
"\n", | |
"print(\"jobname\",jobname)\n", | |
"print(\"sequence\",query_sequence)\n", | |
"print(\"length\",len(query_sequence.replace(\":\",\"\")))\n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"cellView": "form", | |
"id": "AzIKiDiCaHAn" | |
}, | |
"outputs": [], | |
"source": [ | |
"#@title Install dependencies\n", | |
"%%time\n", | |
"import os\n", | |
"USE_AMBER = use_amber\n", | |
"USE_TEMPLATES = use_templates\n", | |
"PYTHON_VERSION = python_version\n", | |
"\n", | |
"if not os.path.isfile(\"COLABFOLD_READY\"):\n", | |
" print(\"installing colabfold...\")\n", | |
" os.system(\"pip install -q --no-warn-conflicts 'colabfold[alphafold-minus-jax] @ git+https://github.com/sokrypton/ColabFold'\")\n", | |
" os.system(\"ln -s /usr/local/lib/python3.*/dist-packages/colabfold colabfold\")\n", | |
" os.system(\"ln -s /usr/local/lib/python3.*/dist-packages/alphafold alphafold\")\n", | |
" # patch for jax > 0.3.25\n", | |
" os.system(\"sed -i 's/weights = jax.nn.softmax(logits)/logits=jnp.clip(logits,-1e8,1e8);weights=jax.nn.softmax(logits)/g' alphafold/model/modules.py\")\n", | |
" os.system(\"touch COLABFOLD_READY\")\n", | |
"\n", | |
"if USE_AMBER or USE_TEMPLATES:\n", | |
" if not os.path.isfile(\"CONDA_READY\"):\n", | |
" print(\"installing conda...\")\n", | |
" os.system(\"wget -qnc https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh\")\n", | |
" os.system(\"bash Mambaforge-Linux-x86_64.sh -bfp /usr/local\")\n", | |
" os.system(\"mamba config --set auto_update_conda false\")\n", | |
" os.system(\"touch CONDA_READY\")\n", | |
"\n", | |
"if USE_TEMPLATES and not os.path.isfile(\"HH_READY\") and USE_AMBER and not os.path.isfile(\"AMBER_READY\"):\n", | |
" print(\"installing hhsuite and amber...\")\n", | |
" os.system(f\"mamba install -y -c conda-forge -c bioconda kalign2=2.04 hhsuite=3.3.0 openmm=7.7.0 python='{PYTHON_VERSION}' pdbfixer\")\n", | |
" os.system(\"touch HH_READY\")\n", | |
" os.system(\"touch AMBER_READY\")\n", | |
"else:\n", | |
" if USE_TEMPLATES and not os.path.isfile(\"HH_READY\"):\n", | |
" print(\"installing hhsuite...\")\n", | |
" os.system(f\"mamba install -y -c conda-forge -c bioconda kalign2=2.04 hhsuite=3.3.0 python='{PYTHON_VERSION}'\")\n", | |
" os.system(\"touch HH_READY\")\n", | |
" if USE_AMBER and not os.path.isfile(\"AMBER_READY\"):\n", | |
" print(\"installing amber...\")\n", | |
" os.system(f\"mamba install -y -c conda-forge openmm=7.7.0 python='{PYTHON_VERSION}' pdbfixer\")\n", | |
" os.system(\"touch AMBER_READY\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"cellView": "form", | |
"id": "C2_sh2uAonJH" | |
}, | |
"outputs": [], | |
"source": [ | |
"#@markdown ### MSA options (custom MSA upload, single sequence, pairing mode)\n", | |
"msa_mode = \"mmseqs2_uniref_env\" #@param [\"mmseqs2_uniref_env\", \"mmseqs2_uniref\",\"single_sequence\",\"custom\"]\n", | |
"pair_mode = \"unpaired_paired\" #@param [\"unpaired_paired\",\"paired\",\"unpaired\"] {type:\"string\"}\n", | |
"#@markdown - \"unpaired_paired\" = pair sequences from same species + unpaired MSA, \"unpaired\" = seperate MSA for each chain, \"paired\" - only use paired sequences.\n", | |
"\n", | |
"# decide which a3m to use\n", | |
"if \"mmseqs2\" in msa_mode:\n", | |
" a3m_file = os.path.join(jobname,f\"{jobname}.a3m\")\n", | |
"\n", | |
"elif msa_mode == \"custom\":\n", | |
" a3m_file = os.path.join(jobname,f\"{jobname}.custom.a3m\")\n", | |
" if not os.path.isfile(a3m_file):\n", | |
" custom_msa_dict = files.upload()\n", | |
" custom_msa = list(custom_msa_dict.keys())[0]\n", | |
" header = 0\n", | |
" import fileinput\n", | |
" for line in fileinput.FileInput(custom_msa,inplace=1):\n", | |
" if line.startswith(\">\"):\n", | |
" header = header + 1\n", | |
" if not line.rstrip():\n", | |
" continue\n", | |
" if line.startswith(\">\") == False and header == 1:\n", | |
" query_sequence = line.rstrip()\n", | |
" print(line, end='')\n", | |
"\n", | |
" os.rename(custom_msa, a3m_file)\n", | |
" queries_path=a3m_file\n", | |
" print(f\"moving {custom_msa} to {a3m_file}\")\n", | |
"\n", | |
"else:\n", | |
" a3m_file = os.path.join(jobname,f\"{jobname}.single_sequence.a3m\")\n", | |
" with open(a3m_file, \"w\") as text_file:\n", | |
" text_file.write(\">1\\n%s\" % query_sequence)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"cellView": "form", | |
"id": "ADDuaolKmjGW" | |
}, | |
"outputs": [], | |
"source": [ | |
"#@markdown ### Advanced settings\n", | |
"model_type = \"auto\" #@param [\"auto\", \"alphafold2_ptm\", \"alphafold2_multimer_v1\", \"alphafold2_multimer_v2\", \"alphafold2_multimer_v3\"]\n", | |
"#@markdown - if `auto` selected, will use `alphafold2_ptm` for monomer prediction and `alphafold2_multimer_v3` for complex prediction.\n", | |
"#@markdown Any of the mode_types can be used (regardless if input is monomer or complex).\n", | |
"num_recycles = \"auto\" #@param [\"auto\", \"0\", \"1\", \"3\", \"6\", \"12\", \"24\", \"48\"]\n", | |
"recycle_early_stop_tolerance = \"auto\" #@param [\"auto\", \"0.0\", \"0.5\", \"1.0\"]\n", | |
"#@markdown - if `auto` selected, will use 20 recycles if `model_type=alphafold2_multimer_v3` (with tol=0.5), all else 3 recycles (with tol=0.0).\n", | |
"pairing_strategy = \"greedy\" #@param [\"greedy\", \"complete\"] {type:\"string\"}\n", | |
"#@markdown - `greedy` = pair any taxonomically matching subsets, `complete` = all sequences have to match in one line.\n", | |
"\n", | |
"\n", | |
"#@markdown #### Sample settings\n", | |
"#@markdown - enable dropouts and increase number of seeds to sample predictions from uncertainty of the model.\n", | |
"#@markdown - decrease `max_msa` to increase uncertainity\n", | |
"max_msa = \"auto\" #@param [\"auto\", \"512:1024\", \"256:512\", \"64:128\", \"32:64\", \"16:32\"]\n", | |
"num_seeds = 1 #@param [1,2,4,8,16] {type:\"raw\"}\n", | |
"use_dropout = False #@param {type:\"boolean\"}\n", | |
"\n", | |
"num_recycles = None if num_recycles == \"auto\" else int(num_recycles)\n", | |
"recycle_early_stop_tolerance = None if recycle_early_stop_tolerance == \"auto\" else float(recycle_early_stop_tolerance)\n", | |
"if max_msa == \"auto\": max_msa = None\n", | |
"\n", | |
"#@markdown #### Save settings\n", | |
"save_all = False #@param {type:\"boolean\"}\n", | |
"save_recycles = False #@param {type:\"boolean\"}\n", | |
"save_to_google_drive = False #@param {type:\"boolean\"}\n", | |
"#@markdown - if the save_to_google_drive option was selected, the result zip will be uploaded to your Google Drive\n", | |
"dpi = 200 #@param {type:\"integer\"}\n", | |
"#@markdown - set dpi for image resolution\n", | |
"\n", | |
"if save_to_google_drive:\n", | |
" from pydrive.drive import GoogleDrive\n", | |
" from pydrive.auth import GoogleAuth\n", | |
" from google.colab import auth\n", | |
" from oauth2client.client import GoogleCredentials\n", | |
" auth.authenticate_user()\n", | |
" gauth = GoogleAuth()\n", | |
" gauth.credentials = GoogleCredentials.get_application_default()\n", | |
" drive = GoogleDrive(gauth)\n", | |
" print(\"You are logged into Google Drive and are good to go!\")\n", | |
"\n", | |
"#@markdown Don't forget to hit `Runtime` -> `Run all` after updating the form." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"cellView": "form", | |
"id": "mbaIO9pWjaN0" | |
}, | |
"outputs": [], | |
"source": [ | |
"#@title Run Prediction\n", | |
"display_images = True #@param {type:\"boolean\"}\n", | |
"\n", | |
"import sys\n", | |
"import warnings\n", | |
"warnings.simplefilter(action='ignore', category=FutureWarning)\n", | |
"from Bio import BiopythonDeprecationWarning\n", | |
"warnings.simplefilter(action='ignore', category=BiopythonDeprecationWarning)\n", | |
"from pathlib import Path\n", | |
"from colabfold.download import download_alphafold_params, default_data_dir\n", | |
"from colabfold.utils import setup_logging\n", | |
"from colabfold.batch import get_queries, run, set_model_type\n", | |
"from colabfold.plot import plot_msa_v2\n", | |
"\n", | |
"import os\n", | |
"import numpy as np\n", | |
"try:\n", | |
" K80_chk = os.popen('nvidia-smi | grep \"Tesla K80\" | wc -l').read()\n", | |
"except:\n", | |
" K80_chk = \"0\"\n", | |
" pass\n", | |
"if \"1\" in K80_chk:\n", | |
" print(\"WARNING: found GPU Tesla K80: limited to total length < 1000\")\n", | |
" if \"TF_FORCE_UNIFIED_MEMORY\" in os.environ:\n", | |
" del os.environ[\"TF_FORCE_UNIFIED_MEMORY\"]\n", | |
" if \"XLA_PYTHON_CLIENT_MEM_FRACTION\" in os.environ:\n", | |
" del os.environ[\"XLA_PYTHON_CLIENT_MEM_FRACTION\"]\n", | |
"\n", | |
"from colabfold.colabfold import plot_protein\n", | |
"from pathlib import Path\n", | |
"import matplotlib.pyplot as plt\n", | |
"\n", | |
"# For some reason we need that to get pdbfixer to import\n", | |
"if use_amber and f\"/usr/local/lib/python{python_version}/site-packages/\" not in sys.path:\n", | |
" sys.path.insert(0, f\"/usr/local/lib/python{python_version}/site-packages/\")\n", | |
"\n", | |
"def input_features_callback(input_features):\n", | |
" if display_images:\n", | |
" plot_msa_v2(input_features)\n", | |
" plt.show()\n", | |
" plt.close()\n", | |
"\n", | |
"def prediction_callback(protein_obj, length,\n", | |
" prediction_result, input_features, mode):\n", | |
" _model_name, relaxed = mode\n", | |
" if not relaxed:\n", | |
" if display_images:\n", | |
" plot_protein(protein_obj, Ls=length, dpi=150)\n", | |
" plt.show()\n", | |
" plt.close()\n", | |
"\n", | |
"result_dir = jobname\n", | |
"if 'logging_setup' not in globals():\n", | |
" setup_logging(Path(os.path.join(jobname,\"log.txt\")))\n", | |
" logging_setup = True\n", | |
"\n", | |
"queries, is_complex = get_queries(queries_path)\n", | |
"model_type = set_model_type(is_complex, model_type)\n", | |
"\n", | |
"if \"multimer\" in model_type and max_msa is not None:\n", | |
" use_cluster_profile = False\n", | |
"else:\n", | |
" use_cluster_profile = True\n", | |
"\n", | |
"download_alphafold_params(model_type, Path(\".\"))\n", | |
"results = run(\n", | |
" queries=queries,\n", | |
" result_dir=result_dir,\n", | |
" use_templates=use_templates,\n", | |
" custom_template_path=custom_template_path,\n", | |
" num_relax=num_relax,\n", | |
" msa_mode=msa_mode,\n", | |
" model_type=model_type,\n", | |
" num_models=5,\n", | |
" num_recycles=num_recycles,\n", | |
" recycle_early_stop_tolerance=recycle_early_stop_tolerance,\n", | |
" num_seeds=num_seeds,\n", | |
" use_dropout=use_dropout,\n", | |
" model_order=[1,2,3,4,5],\n", | |
" is_complex=is_complex,\n", | |
" data_dir=Path(\".\"),\n", | |
" keep_existing_results=False,\n", | |
" rank_by=\"auto\",\n", | |
" pair_mode=pair_mode,\n", | |
" pairing_strategy=pairing_strategy,\n", | |
" stop_at_score=float(100),\n", | |
" prediction_callback=prediction_callback,\n", | |
" dpi=dpi,\n", | |
" zip_results=False,\n", | |
" save_all=save_all,\n", | |
" max_msa=max_msa,\n", | |
" use_cluster_profile=use_cluster_profile,\n", | |
" input_features_callback=input_features_callback,\n", | |
" save_recycles=save_recycles,\n", | |
")\n", | |
"results_zip = f\"{jobname}.result.zip\"\n", | |
"os.system(f\"zip -r {results_zip} {jobname}\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"cellView": "form", | |
"id": "KK7X9T44pWb7" | |
}, | |
"outputs": [], | |
"source": [ | |
"#@title Display 3D structure {run: \"auto\"}\n", | |
"import py3Dmol\n", | |
"import glob\n", | |
"import matplotlib.pyplot as plt\n", | |
"from colabfold.colabfold import plot_plddt_legend\n", | |
"from colabfold.colabfold import pymol_color_list, alphabet_list\n", | |
"rank_num = 1 #@param [\"1\", \"2\", \"3\", \"4\", \"5\"] {type:\"raw\"}\n", | |
"color = \"lDDT\" #@param [\"chain\", \"lDDT\", \"rainbow\"]\n", | |
"show_sidechains = False #@param {type:\"boolean\"}\n", | |
"show_mainchains = False #@param {type:\"boolean\"}\n", | |
"\n", | |
"tag = results[\"rank\"][0][rank_num - 1]\n", | |
"jobname_prefix = \".custom\" if msa_mode == \"custom\" else \"\"\n", | |
"pdb_filename = f\"{jobname}/{jobname}{jobname_prefix}_unrelaxed_{tag}.pdb\"\n", | |
"pdb_file = glob.glob(pdb_filename)\n", | |
"\n", | |
"def show_pdb(rank_num=1, show_sidechains=False, show_mainchains=False, color=\"lDDT\"):\n", | |
" view = py3Dmol.view(js='https://3dmol.org/build/3Dmol.js',)\n", | |
" view.addModel(open(pdb_file[0],'r').read(),'pdb')\n", | |
"\n", | |
" if color == \"lDDT\":\n", | |
" view.setStyle({'cartoon': {'colorscheme': {'prop':'b','gradient': 'roygb','min':50,'max':90}}})\n", | |
" elif color == \"rainbow\":\n", | |
" view.setStyle({'cartoon': {'color':'spectrum'}})\n", | |
" elif color == \"chain\":\n", | |
" chains = len(queries[0][1]) + 1 if is_complex else 1\n", | |
" for n,chain,color in zip(range(chains),alphabet_list,pymol_color_list):\n", | |
" view.setStyle({'chain':chain},{'cartoon': {'color':color}})\n", | |
"\n", | |
" if show_sidechains:\n", | |
" BB = ['C','O','N']\n", | |
" view.addStyle({'and':[{'resn':[\"GLY\",\"PRO\"],'invert':True},{'atom':BB,'invert':True}]},\n", | |
" {'stick':{'colorscheme':f\"WhiteCarbon\",'radius':0.3}})\n", | |
" view.addStyle({'and':[{'resn':\"GLY\"},{'atom':'CA'}]},\n", | |
" {'sphere':{'colorscheme':f\"WhiteCarbon\",'radius':0.3}})\n", | |
" view.addStyle({'and':[{'resn':\"PRO\"},{'atom':['C','O'],'invert':True}]},\n", | |
" {'stick':{'colorscheme':f\"WhiteCarbon\",'radius':0.3}})\n", | |
" if show_mainchains:\n", | |
" BB = ['C','O','N','CA']\n", | |
" view.addStyle({'atom':BB},{'stick':{'colorscheme':f\"WhiteCarbon\",'radius':0.3}})\n", | |
"\n", | |
" view.zoomTo()\n", | |
" return view\n", | |
"\n", | |
"show_pdb(rank_num, show_sidechains, show_mainchains, color).show()\n", | |
"if color == \"lDDT\":\n", | |
" plot_plddt_legend().show()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"cellView": "form", | |
"id": "11l8k--10q0C" | |
}, | |
"outputs": [], | |
"source": [ | |
"#@title Plots {run: \"auto\"}\n", | |
"from IPython.display import display, HTML\n", | |
"import base64\n", | |
"from html import escape\n", | |
"\n", | |
"# see: https://stackoverflow.com/a/53688522\n", | |
"def image_to_data_url(filename):\n", | |
" ext = filename.split('.')[-1]\n", | |
" prefix = f'data:image/{ext};base64,'\n", | |
" with open(filename, 'rb') as f:\n", | |
" img = f.read()\n", | |
" return prefix + base64.b64encode(img).decode('utf-8')\n", | |
"\n", | |
"pae = image_to_data_url(os.path.join(jobname,f\"{jobname}{jobname_prefix}_pae.png\"))\n", | |
"cov = image_to_data_url(os.path.join(jobname,f\"{jobname}{jobname_prefix}_coverage.png\"))\n", | |
"plddt = image_to_data_url(os.path.join(jobname,f\"{jobname}{jobname_prefix}_plddt.png\"))\n", | |
"display(HTML(f\"\"\"\n", | |
"<style>\n", | |
" img {{\n", | |
" float:left;\n", | |
" }}\n", | |
" .full {{\n", | |
" max-width:100%;\n", | |
" }}\n", | |
" .half {{\n", | |
" max-width:50%;\n", | |
" }}\n", | |
" @media (max-width:640px) {{\n", | |
" .half {{\n", | |
" max-width:100%;\n", | |
" }}\n", | |
" }}\n", | |
"</style>\n", | |
"<div style=\"max-width:90%; padding:2em;\">\n", | |
" <h1>Plots for {escape(jobname)}</h1>\n", | |
" <img src=\"{pae}\" class=\"full\" />\n", | |
" <img src=\"{cov}\" class=\"half\" />\n", | |
" <img src=\"{plddt}\" class=\"half\" />\n", | |
"</div>\n", | |
"\"\"\"))\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"cellView": "form", | |
"id": "33g5IIegij5R" | |
}, | |
"outputs": [], | |
"source": [ | |
"#@title Package and download results\n", | |
"#@markdown If you are having issues downloading the result archive, try disabling your adblocker and run this cell again. If that fails click on the little folder icon to the left, navigate to file: `jobname.result.zip`, right-click and select \\\"Download\\\" (see [screenshot](https://pbs.twimg.com/media/E6wRW2lWUAEOuoe?format=jpg&name=small)).\n", | |
"\n", | |
"if msa_mode == \"custom\":\n", | |
" print(\"Don't forget to cite your custom MSA generation method.\")\n", | |
"\n", | |
"files.download(f\"{jobname}.result.zip\")\n", | |
"\n", | |
"if save_to_google_drive == True and drive:\n", | |
" uploaded = drive.CreateFile({'title': f\"{jobname}.result.zip\"})\n", | |
" uploaded.SetContentFile(f\"{jobname}.result.zip\")\n", | |
" uploaded.Upload()\n", | |
" print(f\"Uploaded {jobname}.result.zip to Google Drive with ID {uploaded.get('id')}\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"execution_count": null, | |
"metadata": { | |
"id": "UGUBLzB3C6WN", | |
"pycharm": { | |
"name": "#%% md\n" | |
} | |
}, | |
"source": [ | |
"# Instructions <a name=\"Instructions\"></a>\n", | |
"**Quick start**\n", | |
"1. Paste your protein sequence(s) in the input field.\n", | |
"2. Press \"Runtime\" -> \"Run all\".\n", | |
"3. The pipeline consists of 5 steps. The currently running step is indicated by a circle with a stop sign next to it.\n", | |
"\n", | |
"**Result zip file contents**\n", | |
"\n", | |
"1. PDB formatted structures sorted by avg. pLDDT and complexes are sorted by pTMscore. (unrelaxed and relaxed if `use_amber` is enabled).\n", | |
"2. Plots of the model quality.\n", | |
"3. Plots of the MSA coverage.\n", | |
"4. Parameter log file.\n", | |
"5. A3M formatted input MSA.\n", | |
"6. A `predicted_aligned_error_v1.json` using [AlphaFold-DB's format](https://alphafold.ebi.ac.uk/faq#faq-7) and a `scores.json` for each model which contains an array (list of lists) for PAE, a list with the average pLDDT and the pTMscore.\n", | |
"7. BibTeX file with citations for all used tools and databases.\n", | |
"\n", | |
"At the end of the job a download modal box will pop up with a `jobname.result.zip` file. Additionally, if the `save_to_google_drive` option was selected, the `jobname.result.zip` will be uploaded to your Google Drive.\n", | |
"\n", | |
"**MSA generation for complexes**\n", | |
"\n", | |
"For the complex prediction we use unpaired and paired MSAs. Unpaired MSA is generated the same way as for the protein structures prediction by searching the UniRef100 and environmental sequences three iterations each.\n", | |
"\n", | |
"The paired MSA is generated by searching the UniRef100 database and pairing the best hits sharing the same NCBI taxonomic identifier (=species or sub-species). We only pair sequences if all of the query sequences are present for the respective taxonomic identifier.\n", | |
"\n", | |
"**Using a custom MSA as input**\n", | |
"\n", | |
"To predict the structure with a custom MSA (A3M formatted): (1) Change the `msa_mode`: to \"custom\", (2) Wait for an upload box to appear at the end of the \"MSA options ...\" box. Upload your A3M. The first fasta entry of the A3M must be the query sequence without gaps.\n", | |
"\n", | |
"It is also possilbe to proide custom MSAs for complex predictions. Read more about the format [here](https://github.com/sokrypton/ColabFold/issues/76).\n", | |
"\n", | |
"As an alternative for MSA generation the [HHblits Toolkit server](https://toolkit.tuebingen.mpg.de/tools/hhblits) can be used. After submitting your query, click \"Query Template MSA\" -> \"Download Full A3M\". Download the A3M file and upload it in this notebook.\n", | |
"\n", | |
"**PDB100** <a name=\"pdb100\"></a>\n", | |
"\n", | |
"As of 23/06/08, we have transitioned from using the PDB70 to a 100% clustered PDB, the PDB100. The construction methodology of PDB100 differs from that of PDB70.\n", | |
"\n", | |
"The PDB70 was constructed by running each PDB70 representative sequence through [HHblits](https://github.com/soedinglab/hh-suite) against the [Uniclust30](https://uniclust.mmseqs.com/). On the other hand, the PDB100 is built by searching each PDB100 representative structure with [Foldseek](https://github.com/steineggerlab/foldseek) against the [AlphaFold Database](https://alphafold.ebi.ac.uk).\n", | |
"\n", | |
"To maintain compatibility with older Notebook versions and local installations, the generated files and API responses will continue to be named \"PDB70\", even though we're now using the PDB100.\n", | |
"\n", | |
"**Using custom templates** <a name=\"custom_templates\"></a>\n", | |
"\n", | |
"To predict the structure with a custom template (PDB or mmCIF formatted): (1) change the `template_mode` to \"custom\" in the execute cell and (2) wait for an upload box to appear at the end of the \"Input Protein\" box. Select and upload your templates (multiple choices are possible).\n", | |
"\n", | |
"* Templates must follow the four letter PDB naming with lower case letters.\n", | |
"\n", | |
"* Templates in mmCIF format must contain `_entity_poly_seq`. An error is thrown if this field is not present. The field `_pdbx_audit_revision_history.revision_date` is automatically generated if it is not present.\n", | |
"\n", | |
"* Templates in PDB format are automatically converted to the mmCIF format. `_entity_poly_seq` and `_pdbx_audit_revision_history.revision_date` are automatically generated.\n", | |
"\n", | |
"If you encounter problems, please report them to this [issue](https://github.com/sokrypton/ColabFold/issues/177).\n", | |
"\n", | |
"**Comparison to the full AlphaFold2 and AlphaFold2 Colab**\n", | |
"\n", | |
"This notebook replaces the homology detection and MSA pairing of AlphaFold2 with MMseqs2. For a comparison against the [AlphaFold2 Colab](https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb) and the full [AlphaFold2](https://github.com/deepmind/alphafold) system read our [paper](https://www.nature.com/articles/s41592-022-01488-1).\n", | |
"\n", | |
"**Troubleshooting**\n", | |
"* Check that the runtime type is set to GPU at \"Runtime\" -> \"Change runtime type\".\n", | |
"* Try to restart the session \"Runtime\" -> \"Factory reset runtime\".\n", | |
"* Check your input sequence.\n", | |
"\n", | |
"**Known issues**\n", | |
"* Google Colab assigns different types of GPUs with varying amount of memory. Some might not have enough memory to predict the structure for a long sequence.\n", | |
"* Your browser can block the pop-up for downloading the result file. You can choose the `save_to_google_drive` option to upload to Google Drive instead or manually download the result file: Click on the little folder icon to the left, navigate to file: `jobname.result.zip`, right-click and select \\\"Download\\\" (see [screenshot](https://pbs.twimg.com/media/E6wRW2lWUAEOuoe?format=jpg&name=small)).\n", | |
"\n", | |
"**Limitations**\n", | |
"* Computing resources: Our MMseqs2 API can handle ~20-50k requests per day.\n", | |
"* MSAs: MMseqs2 is very precise and sensitive but might find less hits compared to HHblits/HMMer searched against BFD or MGnify.\n", | |
"* We recommend to additionally use the full [AlphaFold2 pipeline](https://github.com/deepmind/alphafold).\n", | |
"\n", | |
"**Description of the plots**\n", | |
"* **Number of sequences per position** - We want to see at least 30 sequences per position, for best performance, ideally 100 sequences.\n", | |
"* **Predicted lDDT per position** - model confidence (out of 100) at each position. The higher the better.\n", | |
"* **Predicted Alignment Error** - For homooligomers, this could be a useful metric to assess how confident the model is about the interface. The lower the better.\n", | |
"\n", | |
"**Bugs**\n", | |
"- If you encounter any bugs, please report the issue to https://github.com/sokrypton/ColabFold/issues\n", | |
"\n", | |
"**License**\n", | |
"\n", | |
"The source code of ColabFold is licensed under [MIT](https://raw.githubusercontent.com/sokrypton/ColabFold/main/LICENSE). Additionally, this notebook uses the AlphaFold2 source code and its parameters licensed under [Apache 2.0](https://raw.githubusercontent.com/deepmind/alphafold/main/LICENSE) and [CC BY 4.0](https://creativecommons.org/licenses/by-sa/4.0/) respectively. Read more about the AlphaFold license [here](https://github.com/deepmind/alphafold).\n", | |
"\n", | |
"**Acknowledgments**\n", | |
"- We thank the AlphaFold team for developing an excellent model and open sourcing the software.\n", | |
"\n", | |
"- [KOBIC](https://kobic.re.kr) and [Söding Lab](https://www.mpinat.mpg.de/soeding) for providing the computational resources for the MMseqs2 MSA server.\n", | |
"\n", | |
"- Richard Evans for helping to benchmark the ColabFold's Alphafold-multimer support.\n", | |
"\n", | |
"- [David Koes](https://github.com/dkoes) for his awesome [py3Dmol](https://3dmol.csb.pitt.edu/) plugin, without whom these notebooks would be quite boring!\n", | |
"\n", | |
"- Do-Yoon Kim for creating the ColabFold logo.\n", | |
"\n", | |
"- A colab by Sergey Ovchinnikov ([@sokrypton](https://twitter.com/sokrypton)), Milot Mirdita ([@milot_mirdita](https://twitter.com/milot_mirdita)) and Martin Steinegger ([@thesteinegger](https://twitter.com/thesteinegger)).\n" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3 (ipykernel)", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"pygments_lexer": "ipython3", | |
"nbconvert_exporter": "python", | |
"version": "3.11.2" | |
}, | |
"accelerator": "GPU", | |
"colab": { | |
"gpuType": "T4", | |
"include_colab_link": true, | |
"machine_shape": "hm", | |
"name": "AlphaFold2.ipynb", | |
"provenance": [] | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 1 | |
} |