leiterenato/example.ipynb

## example.ipynb
{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Copyright 2022 Google LLC\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#      http://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Quick Start: Colabfold inference pipeline with Cloud Batch and Workflows\n",
        "\n",
        "This notebook demonstrates how to submit inference pipeline runs.\n",
        "\n",
        "You use the utility functions in the `workflow_executor` module to configure and submit the runs. The `workflow_executor` module contains two functions:\n",
        "- `prepare_args_for_experiment` - This function formats the runtime parameters for the Google Workflows workflows that implements the pipeline. It also sets default values for a number of runtime parameters\n",
        "- `execute_workflow` - This function executes the Google Workflows workflow.\n",
        "\n",
        "This is a complete list of required and optional parameters accepted by the functions:\n",
        "\n",
        "```\n",
        "    project_id: str\n",
        "    region: str\n",
        "    input_dir: str\n",
        "    image_uri: str\n",
        "    job_gcs_path: str\n",
        "    labels: dict\n",
        "    machine_type: str = 'n1-standard-4'\n",
        "    cpu_milli: int = 8000\n",
        "    memory_mib: int = 30000\n",
        "    boot_disk_mib: int = 200000\n",
        "    gpu_type: str = \"nvidia-tesla-t4\"\n",
        "    gpu_count: int = 1\n",
        "    job_gcsfuse_local_dir: str = '/mnt/disks/gcs/colabfold'\n",
        "    parallelism: int = 8\n",
        "    template_mode: str = \"none\"\n",
        "    use_cpu: bool = False\n",
        "    use_gpu_relax: bool = False\n",
        "    use_amber: bool = False\n",
        "    msa_mode: str = 'mmseqs2_uniref_env'\n",
        "    model_type: str = 'auto'\n",
        "    num_models: int = 5\n",
        "    num_recycle: int = 3\n",
        "    custom_template_path: str = None\n",
        "    overwrite_existing_results: bool = False\n",
        "    rank_by: str = 'auto'\n",
        "    pair_mode: str = 'unpaired_paired'\n",
        "    stop_at_score: int = 100\n",
        "    zip_results: bool = False\n",
        "```"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Install python libraries"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Install packages\n",
        "! pip install -U google-cloud-firestore google-cloud-workflows google-cloud-storage"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Reload the kernel before proceeding\n",
        "%load_ext autoreload\n",
        "%autoreload 2"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Execute Workflow"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "outputs": [],
      "source": [
        "from src import workflow_executor"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Please set the following variables according to the setup of your environment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [],
      "source": [
        "project_id = 'rl-llm-dev'    # Project ID. Example: \"my_project_id\"\n",
        "region = 'us-central1'    # Region where resources will be created. Example: \"us-central1\"\n",
        "\n",
        "input_dir = 'colabfold-results/input'   # GCS path where you will upload FASTA files.\n",
        "                                                 # Example: 'my_bucket/input_folder'\n",
        "image_uri = 'gcr.io/rl-llm-dev/colabfold-batch'    # Image built to execute Colabfold\n",
        "job_gcs_path = 'colabfold-results'     # Bucket name where the resulting artifacts will be created.\n",
        "                                        # Example: 'my_bucket'\n",
        "\n",
        "labels = {}     # Labels to identify your job"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copy local FASTA files to the GCS path."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "local_input_dir = '/path/to/my/files'   # Local directory where your FASTA files are located"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Copy local files to GCS\n",
        "! gsutil -m cp {local_input_dir}/*.fasta gs://{input_dir}"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Execute the following cell to start the Colabfold execution."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Prepare the environment for execution\n",
        "args = workflow_executor.prepare_args_for_experiment(\n",
        "    project_id = project_id,\n",
        "    region = region,\n",
        "    input_dir = input_dir,\n",
        "    image_uri = image_uri,\n",
        "    job_gcs_path = job_gcs_path,\n",
        "    labels = labels\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {},
      "outputs": [],
      "source": [
        "header = {args['project_id']: 'rl-llm-dev',\n",
        " args['region']: 'us-central1',\n",
        " args['image_uri']: 'gcr.io/rl-llm-dev/colabfold-batch',\n",
        " args['job_gcs_path']: 'colabfold-results',\n",
        " args['parallelism']: 8,\n",
        " args['job_gcsfuse_local_dir']: '/mnt/disks/gcs/colabfold',\n",
        " args['machine_type']: 'n1-standard-4',\n",
        " args['cpu_milli']: 8000,\n",
        " args['memory_mib']: 30000,\n",
        " args['boot_disk_mib']: 200000,\n",
        " args['gpu_type']: 'nvidia-tesla-t4',\n",
        " args['gpu_count']: 1}"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 28,
      "metadata": {},
      "outputs": [],
      "source": [
        "def split_list(list_of_items, number_of_items_per_list, header):\n",
        "  \"\"\"\n",
        "  Splits a list into smaller lists of the specified size.\n",
        "\n",
        "  Args:\n",
        "    list_of_items: The list to split.\n",
        "    number_of_items_per_list: The size of each smaller list.\n",
        "\n",
        "  Returns:\n",
        "    A list of smaller lists, each of which contains the specified number of items.\n",
        "  \"\"\"\n",
        "\n",
        "  number_of_lists = len(list_of_items) // number_of_items_per_list\n",
        "  remaining_items = len(list_of_items) % number_of_items_per_list\n",
        "\n",
        "  smaller_lists = []\n",
        "  for i in range(number_of_lists):\n",
        "    smaller_lists.append(\n",
        "      {**header, \n",
        "       'runners': list_of_items[i * number_of_items_per_list: (i + 1) * number_of_items_per_list]})\n",
        "\n",
        "  if remaining_items:\n",
        "    smaller_lists.append(\n",
        "      {**header,\n",
        "      'runners': list_of_items[number_of_lists * number_of_items_per_list:]})\n",
        "\n",
        "  return smaller_lists"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 29,
      "metadata": {},
      "outputs": [],
      "source": [
        "execution_plan = split_list(args['runners'], 400, header)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Execute the workflow\n",
        "\n",
        "for execution_args in execution_plan:\n",
        "    workflow_executor.execute_workflow(\n",
        "        workflow_name='colabfold-workflow',\n",
        "        args=execution_args\n",
        "    )"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": []
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "collapsed_sections": [],
      "include_colab_link": true,
      "name": "AlphaFold2_batch.ipynb",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "base",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.9.16"
    },
    "vscode": {
      "interpreter": {
        "hash": "d4d1e4263499bec80672ea0156c357c1ee493ec2b1c70f0acce89fc37c4a6abe"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
	{
	"cells": [
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"# Copyright 2022 Google LLC\n",
	"#\n",
	"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
	"# you may not use this file except in compliance with the License.\n",
	"# You may obtain a copy of the License at\n",
	"#\n",
	"# http://www.apache.org/licenses/LICENSE-2.0\n",
	"#\n",
	"# Unless required by applicable law or agreed to in writing, software\n",
	"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
	"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
	"# See the License for the specific language governing permissions and\n",
	"# limitations under the License."
	]
	},
	{
	"attachments": {},
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# Quick Start: Colabfold inference pipeline with Cloud Batch and Workflows\n",
	"\n",
	"This notebook demonstrates how to submit inference pipeline runs.\n",
	"\n",
	"You use the utility functions in the `workflow_executor` module to configure and submit the runs. The `workflow_executor` module contains two functions:\n",
	"- `prepare_args_for_experiment` - This function formats the runtime parameters for the Google Workflows workflows that implements the pipeline. It also sets default values for a number of runtime parameters\n",
	"- `execute_workflow` - This function executes the Google Workflows workflow.\n",
	"\n",
	"This is a complete list of required and optional parameters accepted by the functions:\n",
	"\n",
	"```\n",
	" project_id: str\n",
	" region: str\n",
	" input_dir: str\n",
	" image_uri: str\n",
	" job_gcs_path: str\n",
	" labels: dict\n",
	" machine_type: str = 'n1-standard-4'\n",
	" cpu_milli: int = 8000\n",
	" memory_mib: int = 30000\n",
	" boot_disk_mib: int = 200000\n",
	" gpu_type: str = \"nvidia-tesla-t4\"\n",
	" gpu_count: int = 1\n",
	" job_gcsfuse_local_dir: str = '/mnt/disks/gcs/colabfold'\n",
	" parallelism: int = 8\n",
	" template_mode: str = \"none\"\n",
	" use_cpu: bool = False\n",
	" use_gpu_relax: bool = False\n",
	" use_amber: bool = False\n",
	" msa_mode: str = 'mmseqs2_uniref_env'\n",
	" model_type: str = 'auto'\n",
	" num_models: int = 5\n",
	" num_recycle: int = 3\n",
	" custom_template_path: str = None\n",
	" overwrite_existing_results: bool = False\n",
	" rank_by: str = 'auto'\n",
	" pair_mode: str = 'unpaired_paired'\n",
	" stop_at_score: int = 100\n",
	" zip_results: bool = False\n",
	"```"
	]
	},
	{
	"attachments": {},
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Install python libraries"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"# Install packages\n",
	"! pip install -U google-cloud-firestore google-cloud-workflows google-cloud-storage"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"# Reload the kernel before proceeding\n",
	"%load_ext autoreload\n",
	"%autoreload 2"
	]
	},
	{
	"attachments": {},
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Execute Workflow"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 1,
	"metadata": {},
	"outputs": [],
	"source": [
	"from src import workflow_executor"
	]
	},
	{
	"attachments": {},
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Please set the following variables according to the setup of your environment."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 2,
	"metadata": {},
	"outputs": [],
	"source": [
	"project_id = 'rl-llm-dev' # Project ID. Example: \"my_project_id\"\n",
	"region = 'us-central1' # Region where resources will be created. Example: \"us-central1\"\n",
	"\n",
	"input_dir = 'colabfold-results/input' # GCS path where you will upload FASTA files.\n",
	" # Example: 'my_bucket/input_folder'\n",
	"image_uri = 'gcr.io/rl-llm-dev/colabfold-batch' # Image built to execute Colabfold\n",
	"job_gcs_path = 'colabfold-results' # Bucket name where the resulting artifacts will be created.\n",
	" # Example: 'my_bucket'\n",
	"\n",
	"labels = {} # Labels to identify your job"
	]
	},
	{
	"attachments": {},
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Copy local FASTA files to the GCS path."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"local_input_dir = '/path/to/my/files' # Local directory where your FASTA files are located"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"# Copy local files to GCS\n",
	"! gsutil -m cp {local_input_dir}/*.fasta gs://{input_dir}"
	]
	},
	{
	"attachments": {},
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Execute the following cell to start the Colabfold execution."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 3,
	"metadata": {},
	"outputs": [],
	"source": [
	"# Prepare the environment for execution\n",
	"args = workflow_executor.prepare_args_for_experiment(\n",
	" project_id = project_id,\n",
	" region = region,\n",
	" input_dir = input_dir,\n",
	" image_uri = image_uri,\n",
	" job_gcs_path = job_gcs_path,\n",
	" labels = labels\n",
	")"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 6,
	"metadata": {},
	"outputs": [],
	"source": [
	"header = {args['project_id']: 'rl-llm-dev',\n",
	" args['region']: 'us-central1',\n",
	" args['image_uri']: 'gcr.io/rl-llm-dev/colabfold-batch',\n",
	" args['job_gcs_path']: 'colabfold-results',\n",
	" args['parallelism']: 8,\n",
	" args['job_gcsfuse_local_dir']: '/mnt/disks/gcs/colabfold',\n",
	" args['machine_type']: 'n1-standard-4',\n",
	" args['cpu_milli']: 8000,\n",
	" args['memory_mib']: 30000,\n",
	" args['boot_disk_mib']: 200000,\n",
	" args['gpu_type']: 'nvidia-tesla-t4',\n",
	" args['gpu_count']: 1}"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 28,
	"metadata": {},
	"outputs": [],
	"source": [
	"def split_list(list_of_items, number_of_items_per_list, header):\n",
	" \"\"\"\n",
	" Splits a list into smaller lists of the specified size.\n",
	"\n",
	" Args:\n",
	" list_of_items: The list to split.\n",
	" number_of_items_per_list: The size of each smaller list.\n",
	"\n",
	" Returns:\n",
	" A list of smaller lists, each of which contains the specified number of items.\n",
	" \"\"\"\n",
	"\n",
	" number_of_lists = len(list_of_items) // number_of_items_per_list\n",
	" remaining_items = len(list_of_items) % number_of_items_per_list\n",
	"\n",
	" smaller_lists = []\n",
	" for i in range(number_of_lists):\n",
	" smaller_lists.append(\n",
	" {**header, \n",
	" 'runners': list_of_items[i * number_of_items_per_list: (i + 1) * number_of_items_per_list]})\n",
	"\n",
	" if remaining_items:\n",
	" smaller_lists.append(\n",
	" {**header,\n",
	" 'runners': list_of_items[number_of_lists * number_of_items_per_list:]})\n",
	"\n",
	" return smaller_lists"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 29,
	"metadata": {},
	"outputs": [],
	"source": [
	"execution_plan = split_list(args['runners'], 400, header)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": [
	"# Execute the workflow\n",
	"\n",
	"for execution_args in execution_plan:\n",
	" workflow_executor.execute_workflow(\n",
	" workflow_name='colabfold-workflow',\n",
	" args=execution_args\n",
	" )"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": []
	}
	],
	"metadata": {
	"accelerator": "GPU",
	"colab": {
	"collapsed_sections": [],
	"include_colab_link": true,
	"name": "AlphaFold2_batch.ipynb",
	"provenance": []
	},
	"kernelspec": {
	"display_name": "base",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.9.16"
	},
	"vscode": {
	"interpreter": {
	"hash": "d4d1e4263499bec80672ea0156c357c1ee493ec2b1c70f0acce89fc37c4a6abe"
	}
	}
	},
	"nbformat": 4,
	"nbformat_minor": 0
	}