@j-wags
Last active June 21, 2024 17:14
name: 2024-smirnoff-workshop
channels:
- conda-forge
dependencies:
- openff-qcsubmit
- qcportal=0.53
- nglview
- ipywidgets=7.6.0
- plotly
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "c26bbfcd-1535-4650-826b-f806308620a0",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "c26bbfcd-1535-4650-826b-f806308620a0",
"outputId": "e27bc904-4270-403e-8e54-abb994c6d9fc"
},
"outputs": [],
"source": [
"!pip install -U https://github.com/conda-incubator/condacolab/archive/cuda-version-12.tar.gz\n",
"import condacolab\n",
"condacolab.install_mambaforge()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2eb5268c-d168-4ed5-9c6a-0dfa09e8e39d",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2eb5268c-d168-4ed5-9c6a-0dfa09e8e39d",
"outputId": "15209d12-b27c-4bdf-adbc-2e4215e95fc1"
},
"outputs": [],
"source": [
"!wget -qN https://gist.githubusercontent.com/j-wags/57cd258af4465610205da9e486791140/raw/736b90c0e34056963372e89048674ba5b33bb12a/env.yaml\n",
"!mamba env update -q --prefix /usr/local --file=env.yaml"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9470a40b-638c-427b-8471-7175f2f6d599",
"metadata": {
"id": "9470a40b-638c-427b-8471-7175f2f6d599"
},
"outputs": [],
"source": [
"from google.colab import output\n",
"output.enable_custom_widget_manager()"
]
},
{
"cell_type": "markdown",
"id": "a868818f",
"metadata": {
"id": "a868818f"
},
"source": [
"# Retrieving datasets from QCFractal with `openff-qcsubmit`\n",
"\n",
"**Based on the existing example**: https://github.com/openforcefield/openff-qcsubmit/blob/main/examples/retrieving-results.ipynb\n",
"\n",
"This example shows how QCSubmit can be used to retrieve the results of quantum chemical (QC) calculations from a [QCFractal] instance such as [QCArchive].\n",
"\n",
"In particular, it demonstrates how:\n",
"\n",
"* raw torsion drive, optimised geometry and hessian result records can be retrieved from the public\n",
" [QCArchive] server and stored in a result collection\n",
"\n",
"* the retrieved result records can be filtered and curated using a set of built-in filters\n",
"\n",
"* the result collection can be saved and loaded from disk\n",
"\n",
"[QCFractal]: http://docs.qcarchive.molssi.org/projects/qcfractal/en/latest/\n",
"[QCArchive]: https://qcarchive.molssi.org/\n",
"\n",
"For the sake of clarity all verbose warnings will be disabled in this tutorial:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "21b7f1f5",
"metadata": {
"id": "21b7f1f5",
"pycharm": {
"name": "#%%\n"
},
"tags": []
},
"outputs": [],
"source": [
"import warnings\n",
"\n",
"warnings.filterwarnings(\"ignore\")\n",
"import logging\n",
"\n",
"logging.getLogger(\"openff.toolkit\").setLevel(logging.ERROR)"
]
},
{
"cell_type": "markdown",
"id": "e1d9d3f7",
"metadata": {
"id": "e1d9d3f7",
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Retrieving result collections\n",
"\n",
"QCSubmit provides a suite of utilities for retrieving and curating collections of QC results directly from a running QCFractal server, or an already computed QCPortal dataset. This functionality is provided through three main classes:\n",
"\n",
"* `BasicResultCollection` - stores references to simple QCPortal result record that may contain energies, gradients, or hessians computed for a molecule in a single conformation.\n",
"\n",
"* `OptimizationResultCollection` - stores references to full optimization result records (i.e. `OptimizationRecord`\n",
" objects), as well as the final minimised conformer produced by the optimization.\n",
"\n",
"* `TorsionDriveResultCollection` - stores references to full torsion drive result records (i.e. `TorsionDriveRecord`\n",
" objects), as well as the minimum energy conformer associated with each torsion angle that was scanned.\n",
"\n",
"Each of these collections can be generated directly from a running `QCFractal` server using the `from_server` class\n",
"method.\n",
"\n",
"We begin by creating a QCPortal `FractalClient` instance that will allow us to communicate with the running\n",
"server. By default, `FractalClient` connects to the main QCArchive server:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7ec53772",
"metadata": {
"id": "7ec53772",
"pycharm": {
"name": "#%%\n"
},
"tags": []
},
"outputs": [],
"source": [
"from qcportal import PortalClient\n",
"\n",
"qc_client = PortalClient(\"https://api.qcarchive.molssi.org:443\")"
]
},
{
"cell_type": "markdown",
"id": "bb9705e5",
"metadata": {
"id": "bb9705e5",
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Other servers can be accessed by providing the server's URI.\n",
"\n",
"We can then use this to generate our result collections:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8669729e",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 373
},
"id": "8669729e",
"outputId": "04d3b2a1-7f86-4102-ef7d-b824f164aac2",
"pycharm": {
"name": "#%%\n"
},
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"from openff.qcsubmit.results import (\n",
" BasicResultCollection,\n",
" OptimizationResultCollection,\n",
" TorsionDriveResultCollection,\n",
")\n",
"\n",
"# Pull down the torsion drive records from the 'OpenFF Protein Capped 3-mer Backbones v1.0' dataset.\n",
"torsion_drive_result_collection = TorsionDriveResultCollection.from_server(\n",
" client=qc_client,\n",
" datasets=\"OpenFF Protein Capped 3-mer Backbones v1.0\",\n",
" spec_name=\"default\",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "1d0dae9c",
"metadata": {
"id": "1d0dae9c",
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"*Note: currently only complete results are pulled down by the `from_server` method*\n",
"\n",
"There are two main inputs to the `from_server` method, in addition to the fractal client:\n",
"\n",
"* the name(s) of the existing datasets to retrieve the results of. This can either be the name of a single dataset or a list of dataset names\n",
"* the name of the specification used to compute the records. Each specification corresponds to a particular basis, method, program and additional settings.\n",
"\n",
"Let's print out some basic information about each of these result collections:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cbcd16da",
"metadata": {
"id": "cbcd16da",
"pycharm": {
"name": "#%%\n"
},
"tags": []
},
"outputs": [],
"source": [
"print(\"===TORSION DRIVE RESULTS===\")\n",
"\n",
"print(f\"N RESULTS: {torsion_drive_result_collection.n_results}\")\n",
"print(f\"N MOLECULES: {torsion_drive_result_collection.n_molecules}\")"
]
},
{
"cell_type": "markdown",
"id": "b540312e",
"metadata": {
"id": "b540312e",
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"This allows results generated by multiple different servers (e.g. a local fractal instance and the public QCArchive\n",
"server) to be stored in a single result collection object.\n",
"\n",
"The references to the actual data are then stored in corresponding lists:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e70715d6",
"metadata": {
"id": "e70715d6",
"pycharm": {
"name": "#%%\n"
},
"tags": []
},
"outputs": [],
"source": [
"torsion_drive_result_collection.entries[qc_client.address][:10]"
]
},
{
"cell_type": "markdown",
"id": "c5695286",
"metadata": {
"id": "c5695286",
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"After running the above command, notice that the entries stored in the collection are not the actual result\n",
"records generated and stored on the server, but rather a reference to them. In particular, the unique ID of the record is stored along with a SMILES depiction of the molecule the result was generated for.\n",
"\n",
"The main reason for doing this is that we often would like to be able to state which data we would like to use in\n",
"an application without having to create multiple copies of the data. Not only can this take up large amounts of disk space, it runs the risk of data becoming out of sync with the original if the format the records are stored in changes or the local copy of the data is accidentally mutated. Storing a reference to the original data and then retrieving it when needed is typically a cleaner and safer solution.\n",
"\n",
"## Retrieving the result records\n",
"\n",
"The raw result record objects can be easily retrieved using the result collection objects. This allows us to filter the collection to only retrieve the results we want. For example, we can apply a SMARTS string to retrieve the cysteine record:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ba06d9de",
"metadata": {
"id": "ba06d9de",
"tags": []
},
"outputs": [],
"source": [
"from openff.qcsubmit.results.filters import SMARTSFilter\n",
"\n",
"filtered_collection = torsion_drive_result_collection.filter(SMARTSFilter(smarts_to_include=[\"C[SH]\"]))\n",
"filtered_collection"
]
},
{
"cell_type": "markdown",
"id": "62d2b53d",
"metadata": {
"id": "62d2b53d"
},
"source": [
"Then we can download the actual results. This can be very slow, so it's worth filtering aggressively:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "067f6de7",
"metadata": {
"id": "067f6de7",
"pycharm": {
"name": "#%%\n"
},
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"torsion_drive_records = filtered_collection.to_records()\n",
"torsion_drive_records"
]
},
{
"cell_type": "markdown",
"id": "33a913bb",
"metadata": {
"id": "33a913bb",
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"QCSubmit seamlessly takes care of pulling the data from the server in the most efficient way making sure to take\n",
"advantage of the pagination that QCFractal provides. Further, it attempts to cache all calls to the server so that\n",
"multiple calls to `to_records` does not need to constantly query the server.\n",
"\n",
"Notice that not only are the raw result records retrieved, but also an OpenFF `Molecule` object is created for each result record. This molecule has the correct ordering and also stores any conformers associated with the\n",
"result collection. For basic collections, the conformer is the one that was used in any calculations; for optimization collections, it is the final conformer yielded by the optimization; and for torsion drives, it is the lowest energy conformer for each sampled torsion angle.\n",
"\n",
"## Inspecting results\n",
"\n",
"In the case of torsion drive records, we can easily iterate over the grid ID, the associated conformer, and the\n",
"associated energy in one go:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "71dc2247",
"metadata": {
"id": "71dc2247",
"pycharm": {
"name": "#%%\n"
},
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"torsion_drive_record, molecule = torsion_drive_records[0]\n",
"import numpy as np\n",
"from matplotlib import pyplot\n",
"from openff.units import unit\n",
"\n",
"energy_grid = np.zeros((24, 24))\n",
"psi_labels = [\"\"] * 24\n",
"phi_labels = [\"\"] * 24\n",
"for (phi, psi), qc_conformer in zip(\n",
" molecule.properties[\"grid_ids\"], molecule.conformers\n",
"):\n",
" qc_energy = torsion_drive_record.final_energies[(phi, psi)]\n",
"\n",
" phi_bin = int(phi + 165) // 15\n",
" psi_bin = int(psi + 165) // 15\n",
" energy_grid[psi_bin, phi_bin] = qc_energy\n",
" psi_labels[psi_bin] = psi\n",
" phi_labels[phi_bin] = phi\n",
" # print(f\"({phi}, {psi}) {phi_bin},{psi_bin} E={qc_energy:.4f} Ha\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "63b4665a",
"metadata": {
"id": "63b4665a",
"tags": []
},
"outputs": [],
"source": [
"import plotly.graph_objects as go\n",
"from ipywidgets import widgets\n",
"\n",
"fig = go.FigureWidget(\n",
" data=go.Heatmap(\n",
" z=energy_grid,\n",
" x=phi_labels,\n",
" y=psi_labels,\n",
" colorbar={\"title\": \"Energy (Ha)\"},\n",
" hovertemplate=\"phi: %{x}\\npsi: %{y}\\nenergy: %{z} Ha\",\n",
" ),\n",
" layout=go.Layout(\n",
" title=\"Val-Ala-Val - central backbone torsiondrive (Ha)\",\n",
" xaxis_title=\"Phi\",\n",
" yaxis_title=\"Psi\",\n",
" # autosize=False,\n",
" yaxis_scaleanchor=\"x\",\n",
" xaxis_scaleanchor=\"y\",\n",
" ),\n",
")\n",
"\n",
"view = molecule.visualize(\"nglview\")\n",
"\n",
"def on_click(trace, points, selector):\n",
" print(points)\n",
" for x, y in points.point_inds:\n",
" view.frame = x * 24 + y\n",
"\n",
"\n",
"heatmap = fig.data[0]\n",
"heatmap.on_click(on_click)\n",
"\n",
"container = widgets.GridBox(\n",
" [fig, view],\n",
" # layout=widgets.Layout(grid_template_columns=f\"repeat(2, {600}px)\"),\n",
")\n",
"container"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9c4a7f6",
"metadata": {
"id": "d9c4a7f6"
},
"outputs": [],
"source": []
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}