@ssiddhantsharma
Created October 21, 2021 19:26
Simple conversion of a SMILES string to three different chemical data formats: SELFIES, InChI, and DeepSMILES. See Figure 1 of https://iopscience.iop.org/article/10.1088/2632-2153/aba947/meta
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Molecular_Representations.ipynb",
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "Lpqzq6S--hXf"
},
"source": [
"### **We start by installing RDKit, SELFIES v2, and DeepSMILES with `!pip`**"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "M8nbMHcL-9K5",
"outputId": "21dbb97f-7cd9-45ee-df7d-7f5400393766"
},
"source": [
"!pip install rdkit-pypi \n",
"!pip install selfies --upgrade \n",
"!pip install --upgrade deepsmiles"
],
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Requirement already satisfied: rdkit-pypi in /usr/local/lib/python3.7/dist-packages (2021.3.5.1)\n",
"Requirement already satisfied: numpy>=1.19 in /usr/local/lib/python3.7/dist-packages (from rdkit-pypi) (1.19.5)\n",
"Requirement already satisfied: selfies in /usr/local/lib/python3.7/dist-packages (2.0.0)\n",
"Requirement already satisfied: deepsmiles in /usr/local/lib/python3.7/dist-packages (1.0.1)\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "psqSkp58_nXR"
},
"source": [
"### **Importing the relevant libraries and drawing a small organic molecule: 3,4-methylenedioxymethamphetamine (MDMA)**"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 317
},
"id": "z8XQwFgEAHys",
"outputId": "525a8261-1b4e-4e3a-84a5-73f15d6946ba"
},
"source": [
"from rdkit import Chem\n",
"from rdkit.Chem import Draw\n",
"from rdkit.Chem.Draw import IPythonConsole  # RDKit molecule rendering in notebooks\n",
"import selfies as sf\n",
"import deepsmiles\n",
"\n",
"IPythonConsole.drawOptions.addAtomIndices = True  # label each atom with its index\n",
"IPythonConsole.molSize = (300, 300)  # image width and height in pixels\n",
"\n",
"# SMILES string for 3,4-methylenedioxymethamphetamine (MDMA)\n",
"mol = Chem.MolFromSmiles('CNC(C)CC1=CC=C2C(=C1)OCO2')\n",
"mol"
],
"execution_count": 3,
"outputs": [
{
"output_type": "execute_result",
"data": {
"image/png": "\n",
"text/plain": [
"<rdkit.Chem.rdchem.Mol at 0x7f0bdd97f080>"
]
},
"metadata": {},
"execution_count": 3
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fBGPOIqSBiGt"
},
"source": [
"### **Converting the SMILES string to SELFIES v2 and InChI**"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "SYd7RO4UBygy",
"outputId": "2e0de0f5-fa93-48cd-f8a0-f92c112b85bf"
},
"source": [
"SMILES = \"CNC(C)CC1=CC=C2C(=C1)OCO2\"\n",
"SELFIES = sf.encoder(SMILES)  # SMILES --> SELFIES v2\n",
"print(f\"Generated SELFIES: {SELFIES}\")\n",
"\n",
"InChI = Chem.MolToInchi(mol)  # RDKit Mol from the previous cell (same SMILES) --> InChI\n",
"print(f\"Generated InChI: {InChI}\")\n"
],
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Generated SELFIES: [C][N][C][Branch1][C][C][C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O][Ring1][=Branch1]\n",
"Generated InChI: InChI=1S/C11H15NO2/c1-8(12-2)5-9-3-4-10-11(6-9)14-7-13-10/h3-4,6,8,12H,5,7H2,1-2H3\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wb9JG0guCugN"
},
"source": [
"### **Converting the SMILES string to DeepSMILES**"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "xI8vJ-rDCzyp",
"outputId": "b9265321-c0fc-47d0-e61c-a0012178ebe6"
},
"source": [
"converter = deepsmiles.Converter(rings=True, branches=True)  # use DeepSMILES syntax for both ring closures and branches\n",
"DeepSMILES = converter.encode(SMILES)  # SMILES from the previous cell --> DeepSMILES\n",
"print(f\"Generated DeepSMILES: {DeepSMILES}\")\n"
],
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Generated DeepSMILES: CNCC)CC=CC=CC=C6)OCO5\n"
]
}
]
}
]
}
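The SELFIES string printed in the notebook is a sequence of bracketed symbols, which is what makes SELFIES convenient as a token sequence for machine-learning models. The `selfies` package provides `sf.split_selfies` for this; the stdlib-only sketch below is an assumed equivalent for well-formed SELFIES (bracketed symbols, with `.` separating disconnected fragments). `split_selfies_symbols` and `GENERATED_SELFIES` are hypothetical names used only here.

```python
import re
from typing import List

# SELFIES string printed by the notebook for MDMA
GENERATED_SELFIES = (
    "[C][N][C][Branch1][C][C][C][C][=C][C][=C][C][=Branch1][Ring2]"
    "[=C][Ring1][=Branch1][O][C][O][Ring1][=Branch1]"
)

# A token is either one bracketed symbol or the '.' fragment separator
SELFIES_TOKEN = re.compile(r"\[[^\]]*\]|\.")


def split_selfies_symbols(selfies_str: str) -> List[str]:
    """Hypothetical helper: split a well-formed SELFIES string into its symbols."""
    tokens = SELFIES_TOKEN.findall(selfies_str)
    # Reject input with characters that do not belong to any token
    if "".join(tokens) != selfies_str:
        raise ValueError(f"not a well-formed SELFIES string: {selfies_str!r}")
    return tokens


symbols = split_selfies_symbols(GENERATED_SELFIES)
print(len(symbols))       # → 22
print(len(set(symbols)))  # → 8 distinct symbols
print(sorted(set(symbols)))
```

With the `selfies` package installed, `list(sf.split_selfies(GENERATED_SELFIES))` should give the same list; the distinct symbols collected over a dataset form the vocabulary a model would be trained on.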