@leiterenato
Created April 25, 2023 21:53
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Copyright 2022 Google LLC\n",
"#\n",
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# http://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Quick Start: Colabfold inference pipeline with Cloud Batch and Workflows\n",
"\n",
"This notebook demonstrates how to submit inference pipeline runs.\n",
"\n",
"You use the utility functions in the `workflow_executor` module to configure and submit the runs. The `workflow_executor` module contains two functions:\n",
"- `prepare_args_for_experiment` - This function formats the runtime parameters for the Google Workflows workflows that implements the pipeline. It also sets default values for a number of runtime parameters\n",
"- `execute_workflow` - This function executes the Google Workflows workflow.\n",
"\n",
"This is a complete list of required and optional parameters accepted by the functions:\n",
"\n",
"```\n",
" project_id: str\n",
" region: str\n",
" input_dir: str\n",
" image_uri: str\n",
" job_gcs_path: str\n",
" labels: dict\n",
" machine_type: str = 'n1-standard-4'\n",
" cpu_milli: int = 8000\n",
" memory_mib: int = 30000\n",
" boot_disk_mib: int = 200000\n",
" gpu_type: str = \"nvidia-tesla-t4\"\n",
" gpu_count: int = 1\n",
" job_gcsfuse_local_dir: str = '/mnt/disks/gcs/colabfold'\n",
" parallelism: int = 8\n",
" template_mode: str = \"none\"\n",
" use_cpu: bool = False\n",
" use_gpu_relax: bool = False\n",
" use_amber: bool = False\n",
" msa_mode: str = 'mmseqs2_uniref_env'\n",
" model_type: str = 'auto'\n",
" num_models: int = 5\n",
" num_recycle: int = 3\n",
" custom_template_path: str = None\n",
" overwrite_existing_results: bool = False\n",
" rank_by: str = 'auto'\n",
" pair_mode: str = 'unpaired_paired'\n",
" stop_at_score: int = 100\n",
" zip_results: bool = False\n",
"```"
]
},
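{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"For illustration, the sketch below shows how optional parameters from the list above can be overridden when calling `prepare_args_for_experiment`. The project, bucket, and image values are placeholders (assumptions for this sketch, not part of this environment); the rest of the notebook runs the same flow step by step with real values, so this cell is not meant to be executed as part of the walkthrough."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch only: placeholder project, bucket and image values\n",
"from src import workflow_executor\n",
"\n",
"sketch_args = workflow_executor.prepare_args_for_experiment(\n",
"    project_id = 'my-project-id',\n",
"    region = 'us-central1',\n",
"    input_dir = 'my-bucket/input',                      # GCS folder with FASTA files\n",
"    image_uri = 'gcr.io/my-project-id/colabfold-batch',\n",
"    job_gcs_path = 'my-bucket',                         # bucket for result artifacts\n",
"    labels = {'experiment': 'demo'},\n",
"    machine_type = 'n1-standard-8',                     # optional override\n",
"    num_models = 1,                                     # optional override: fewer models, faster run\n",
"    zip_results = True                                  # optional override: zip results in GCS\n",
")"
]
},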
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Install python libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Install packages\n",
"! pip install -U google-cloud-firestore google-cloud-workflows google-cloud-storage"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Reload the kernel before proceeding\n",
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Execute Workflow"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from src import workflow_executor"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Please set the following variables according to the setup of your environment."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"project_id = 'rl-llm-dev' # Project ID. Example: \"my_project_id\"\n",
"region = 'us-central1' # Region where resources will be created. Example: \"us-central1\"\n",
"\n",
"input_dir = 'colabfold-results/input' # GCS path where you will upload FASTA files.\n",
" # Example: 'my_bucket/input_folder'\n",
"image_uri = 'gcr.io/rl-llm-dev/colabfold-batch' # Image built to execute Colabfold\n",
"job_gcs_path = 'colabfold-results' # Bucket name where the resulting artifacts will be created.\n",
" # Example: 'my_bucket'\n",
"\n",
"labels = {} # Labels to identify your job"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Copy local FASTA files to the GCS path."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_input_dir = '/path/to/my/files' # Local directory where your FASTA files are located"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Copy local files to GCS\n",
"! gsutil -m cp {local_input_dir}/*.fasta gs://{input_dir}"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Execute the following cell to start the Colabfold execution."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Prepare the environment for execution\n",
"args = workflow_executor.prepare_args_for_experiment(\n",
" project_id = project_id,\n",
" region = region,\n",
" input_dir = input_dir,\n",
" image_uri = image_uri,\n",
" job_gcs_path = job_gcs_path,\n",
" labels = labels\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"header = {args['project_id']: 'rl-llm-dev',\n",
" args['region']: 'us-central1',\n",
" args['image_uri']: 'gcr.io/rl-llm-dev/colabfold-batch',\n",
" args['job_gcs_path']: 'colabfold-results',\n",
" args['parallelism']: 8,\n",
" args['job_gcsfuse_local_dir']: '/mnt/disks/gcs/colabfold',\n",
" args['machine_type']: 'n1-standard-4',\n",
" args['cpu_milli']: 8000,\n",
" args['memory_mib']: 30000,\n",
" args['boot_disk_mib']: 200000,\n",
" args['gpu_type']: 'nvidia-tesla-t4',\n",
" args['gpu_count']: 1}"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"def split_list(list_of_items, number_of_items_per_list, header):\n",
" \"\"\"\n",
" Splits a list into smaller lists of the specified size.\n",
"\n",
" Args:\n",
" list_of_items: The list to split.\n",
" number_of_items_per_list: The size of each smaller list.\n",
"\n",
" Returns:\n",
" A list of smaller lists, each of which contains the specified number of items.\n",
" \"\"\"\n",
"\n",
" number_of_lists = len(list_of_items) // number_of_items_per_list\n",
" remaining_items = len(list_of_items) % number_of_items_per_list\n",
"\n",
" smaller_lists = []\n",
" for i in range(number_of_lists):\n",
" smaller_lists.append(\n",
" {**header, \n",
" 'runners': list_of_items[i * number_of_items_per_list: (i + 1) * number_of_items_per_list]})\n",
"\n",
" if remaining_items:\n",
" smaller_lists.append(\n",
" {**header,\n",
" 'runners': list_of_items[number_of_lists * number_of_items_per_list:]})\n",
"\n",
" return smaller_lists"
]
},
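{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"For illustration, the sketch below shows the structure `split_list` produces: each element of the returned list is one workflow payload containing the shared header fields plus a `runners` chunk. The toy header and runner names are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch only: toy header and runner names\n",
"toy_header = {'project_id': 'my-project-id', 'region': 'us-central1'}\n",
"toy_runners = ['seq-001.fasta', 'seq-002.fasta', 'seq-003.fasta']\n",
"\n",
"for payload in split_list(toy_runners, 2, toy_header):\n",
"    print(payload)\n",
"\n",
"# Expected output: two payloads, the second holding the single remaining item\n",
"# {'project_id': 'my-project-id', 'region': 'us-central1', 'runners': ['seq-001.fasta', 'seq-002.fasta']}\n",
"# {'project_id': 'my-project-id', 'region': 'us-central1', 'runners': ['seq-003.fasta']}"
]
},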
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"execution_plan = split_list(args['runners'], 400, header)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Execute the workflow\n",
"\n",
"for execution_args in execution_plan:\n",
" workflow_executor.execute_workflow(\n",
" workflow_name='colabfold-workflow',\n",
" args=execution_args\n",
" )"
]
},
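{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, you can check the status of the submitted executions with the `google-cloud-workflows` client installed at the top of the notebook. The cell below is a minimal sketch and assumes the workflow is named `colabfold-workflow`, as in the previous cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: list recent executions of the workflow and print their states\n",
"# Minimal sketch using the google-cloud-workflows client installed earlier\n",
"from google.cloud import workflows_v1\n",
"from google.cloud.workflows import executions_v1\n",
"\n",
"workflows_client = workflows_v1.WorkflowsClient()\n",
"executions_client = executions_v1.ExecutionsClient()\n",
"\n",
"parent = workflows_client.workflow_path(project_id, region, 'colabfold-workflow')\n",
"\n",
"for execution in executions_client.list_executions(request={'parent': parent}):\n",
"    print(execution.name, execution.state.name)"
]
},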
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"collapsed_sections": [],
"include_colab_link": true,
"name": "AlphaFold2_batch.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
},
"vscode": {
"interpreter": {
"hash": "d4d1e4263499bec80672ea0156c357c1ee493ec2b1c70f0acce89fc37c4a6abe"
}
}
},
"nbformat": 4,
"nbformat_minor": 0
}