@adelenelai
Last active October 25, 2020 14:46
20201024_inchikey_cis_trans
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Issue: Nitrogen sp2 isomers get the same InChIKey\n",
"\n",
"https://sourceforge.net/p/rdkit/mailman/message/37135337/"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import rdkit.Chem as Chem\n",
"import rdkit.Chem.Draw as Draw"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"m1_cis = Chem.MolFromSmiles(\"C/N=C(/NC#N)NCCSCc1nc[nH]c1C\")\n",
"m1_trans = Chem.MolFromSmiles(\"C/N=C(\\\\NC#N)NCCSCc1nc[nH]c1C\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<PIL.PngImagePlugin.PngImageFile image mode=RGB size=600x200 at 0x12023FA90>"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Draw.MolsToGridImage([m1_cis,m1_trans],legends=[\"cis\",\"trans\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Compare InChIKeys"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'AQIXAKUUQRKLND-UHFFFAOYSA-N'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inchikey_cis = Chem.inchi.MolToInchiKey(m1_cis)\n",
"inchikey_cis"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'AQIXAKUUQRKLND-UHFFFAOYSA-N'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inchikey_trans = Chem.inchi.MolToInchiKey(m1_trans)\n",
"inchikey_trans"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inchikey_cis == inchikey_trans\n",
"#True"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Compare InChIs"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'InChI=1S/C10H16N6S/c1-8-9(16-7-15-8)5-17-4-3-13-10(12-2)14-6-11/h7H,3-5H2,1-2H3,(H,15,16)(H2,12,13,14)'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inchi_cis = Chem.inchi.MolToInchi(m1_cis)\n",
"inchi_cis"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'InChI=1S/C10H16N6S/c1-8-9(16-7-15-8)5-17-4-3-13-10(12-2)14-6-11/h7H,3-5H2,1-2H3,(H,15,16)(H2,12,13,14)'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inchi_trans = Chem.inchi.MolToInchi(m1_trans)\n",
"inchi_trans"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inchi_cis == inchi_trans\n",
"#True"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Compare Auxiliary Info"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('InChI=1S/C10H16N6S/c1-8-9(16-7-15-8)5-17-4-3-13-10(12-2)14-6-11/h7H,3-5H2,1-2H3,(H,15,16)(H2,12,13,14)',\n",
" 'AuxInfo=1/1/N:17,1,8,9,11,5,14,16,12,3,6,2,7,4,15,13,10/rA:17CNCNCNNCCSCCNCNCC/rB:s1;d+2;s3;s4;t5;s3;s7;s8;s9;s10;s11;s12;d13;s14;d12s15;s16;/rC:;;;;;;;;;;;;;;;;;')"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inchi_aux_cis = Chem.inchi.MolToInchiAndAuxInfo(m1_cis)\n",
"inchi_aux_cis"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('InChI=1S/C10H16N6S/c1-8-9(16-7-15-8)5-17-4-3-13-10(12-2)14-6-11/h7H,3-5H2,1-2H3,(H,15,16)(H2,12,13,14)',\n",
" 'AuxInfo=1/1/N:17,1,8,9,11,5,14,16,12,3,6,2,7,4,15,13,10/rA:17CNCNCNNCCSCCNCNCC/rB:s1;d-2;s3;s4;t5;s3;s7;s8;s9;s10;s11;s12;d13;s14;d12s15;s16;/rC:;;;;;;;;;;;;;;;;;')"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inchi_aux_trans = Chem.inchi.MolToInchiAndAuxInfo(m1_trans)\n",
"inchi_aux_trans"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inchi_aux_cis == inchi_aux_trans"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
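"The isomers differ only in their aux info: d+2 vs. d-2.\n",
"\n",
"Their InChIs and InChIKeys as generated by RDKit are otherwise identical. A likely explanation: the (H2,12,13,14) term in the /h layer marks the guanidine hydrogens as mobile, so standard InChI does not treat the C=N bond as a fixed double bond and emits no /b stereo layer for it."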
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Practical Application\n",
"\n",
"Originally, Gustavo wanted to generate all stereoisomers, then filter out duplicates by InChIKey.\n",
"\n",
"Since the InChIKey cannot tell these two isomers apart, why not filter out duplicates using canonical SMILES instead?"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'C/N=C(/NC#N)NCCSCc1nc[nH]c1C'"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1_cis_can = Chem.CanonSmiles('C/N=C(/NC#N)NCCSCc1nc[nH]c1C')\n",
"m1_cis_can"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'C/N=C(\\\\NC#N)NCCSCc1nc[nH]c1C'"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1_trans_can = Chem.CanonSmiles('C/N=C(\\\\NC#N)NCCSCc1nc[nH]c1C')\n",
"m1_trans_can"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1_cis_can == m1_trans_can"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Above, RDKit's canonical SMILES distinguish the two stereoisomers.\n",
"\n",
"However, canonical SMILES are toolkit-specific: different toolkits canonicalise differently, so the deduplication must stay within RDKit throughout.\n",
"\n",
"See Greg's post on canonical SMILES:\n",
"https://github.com/rdkit/rdkit/issues/2747\n",
"\n",
"See Paolo's gist on comparing molecule identity:\n",
"https://gist.github.com/ptosco/1e1c23ad24c90444993fa1db21ccb48b"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Possible related issue\n",
"\n",
"Example from \n",
"\n",
"https://sourceforge.net/p/inchi/mailman/inchi-discuss/thread/CA%2BZ3Zfc0nLLRJPYwQ2EJL-ePPH8RgzedbFxkRMKi6ifcsp4FHw%40mail.gmail.com/#msg31493868"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"m2_cis = Chem.MolFromSmiles(\"C1CCCCCC(=O)OCC/C=C\\\\C1\")\n",
"m2_trans = Chem.MolFromSmiles(\"C1CCCCCC(=O)OCC/C=C/C1\")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'FZKPUQQWULXMCD-ALCCZGGFSA-N'"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inchikey_cis = Chem.inchi.MolToInchiKey(m2_cis)\n",
"inchikey_cis"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'FZKPUQQWULXMCD-FNORWQNLSA-N'"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inchikey_trans = Chem.inchi.MolToInchiKey(m2_trans)\n",
"inchikey_trans"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inchikey_cis == inchikey_trans"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"RDKit's MolToInchiKey works as expected for the cis/trans isomers in this case: the two keys differ in their second block.\n",
"\n",
"Check their aux info:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('InChI=1S/C12H20O2/c13-12-10-8-6-4-2-1-3-5-7-9-11-14-12/h5,7H,1-4,6,8-11H2/b7-5-',\n",
" 'AuxInfo=1/0/N:1,2,14,3,13,4,12,5,11,6,10,7,8,9/rA:14CCCCCCCOOCCCCC/rB:s1;s2;s3;s4;s5;s6;d7;s7;s9;s10;s11;d-12;s1s13;/rC:;;;;;;;;;;;;;;')"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inchi_aux_cis = Chem.inchi.MolToInchiAndAuxInfo(m2_cis)\n",
"inchi_aux_cis"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('InChI=1S/C12H20O2/c13-12-10-8-6-4-2-1-3-5-7-9-11-14-12/h5,7H,1-4,6,8-11H2/b7-5+',\n",
" 'AuxInfo=1/0/N:1,2,14,3,13,4,12,5,11,6,10,7,8,9/rA:14CCCCCCCOOCCCCC/rB:s1;s2;s3;s4;s5;s6;d7;s7;s9;s10;s11;d+12;s1s13;/rC:;;;;;;;;;;;;;;')"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inchi_aux_trans = Chem.inchi.MolToInchiAndAuxInfo(m2_trans)\n",
"inchi_aux_trans"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inchi_aux_cis == inchi_aux_trans"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
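"As before, the aux info differs: d-12 vs. d+12. Here, however, the InChI itself also differs in its /b layer (b7-5- vs. b7-5+), which is why the InChIKeys differ too."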
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<PIL.PngImagePlugin.PngImageFile image mode=RGB size=600x200 at 0x12084DCD0>"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Draw.MolsToGridImage([m2_cis,m2_trans],legends=[\"cis\",\"trans\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Not sure why the two isomers are depicted identically; they should not be. This may be a depiction issue."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "rdkit-kernel",
"language": "python",
"name": "my-rdkit-env"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
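A note on the InChI and aux info comparisons in the notebook above. The sketch below uses only the Python standard library and the strings printed by the notebook (no RDKit calls). It splits an InChI into its layers to check whether a /b (double-bond stereo) layer is present, and lists the /rB tokens on which two AuxInfo strings differ. The helper names `inchi_layers` and `differing_rb_tokens` are hypothetical, and the parsing assumes the simple slash-separated form seen in these outputs; it is not a general InChI parser.

```python
# Strings copied from the notebook outputs above.
GUANIDINE_INCHI = (
    "InChI=1S/C10H16N6S/c1-8-9(16-7-15-8)5-17-4-3-13-10(12-2)14-6-11"
    "/h7H,3-5H2,1-2H3,(H,15,16)(H2,12,13,14)"
)
LACTONE_CIS_INCHI = (
    "InChI=1S/C12H20O2/c13-12-10-8-6-4-2-1-3-5-7-9-11-14-12"
    "/h5,7H,1-4,6,8-11H2/b7-5-"
)
AUX_CIS = (
    "AuxInfo=1/1/N:17,1,8,9,11,5,14,16,12,3,6,2,7,4,15,13,10"
    "/rA:17CNCNCNNCCSCCNCNCC"
    "/rB:s1;d+2;s3;s4;t5;s3;s7;s8;s9;s10;s11;s12;d13;s14;d12s15;s16;"
    "/rC:;;;;;;;;;;;;;;;;;"
)
# In the notebook, the trans AuxInfo differs from the cis one only in this token.
AUX_TRANS = AUX_CIS.replace("d+2", "d-2")


def inchi_layers(inchi):
    """Split an InChI into a dict: 'version', 'formula', then one entry per layer letter."""
    parts = inchi.split("/")
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        layers[part[0]] = part[1:]  # e.g. 'b7-5-' -> layers['b'] = '7-5-'
    return layers


def differing_rb_tokens(aux_a, aux_b):
    """Return (token position, token_a, token_b) for each differing /rB: entry."""
    def rb_tokens(aux):
        for field in aux.split("/"):
            if field.startswith("rB:"):
                return field[3:].split(";")
        return []

    pairs = zip(rb_tokens(aux_a), rb_tokens(aux_b))
    return [(i, a, b) for i, (a, b) in enumerate(pairs, start=1) if a != b]


print("b" in inchi_layers(GUANIDINE_INCHI))       # False: no double-bond stereo layer
print(inchi_layers(LACTONE_CIS_INCHI)["b"])       # 7-5-
print(differing_rb_tokens(AUX_CIS, AUX_TRANS))    # [(2, 'd+2', 'd-2')]
```

On these strings, the guanidine InChI carries no /b layer at all, while the lactone InChI does, which matches the InChIKey behaviour seen in the notebook.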
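A note on the "Practical Application" section of the notebook: the deduplication step can be sketched as an order-preserving filter with a pluggable key. The example below is a minimal illustration using only the standard library and the canonical SMILES and InChIKey values printed in the notebook; `dedupe` and `ISOMERS` are hypothetical names, not part of RDKit.

```python
def dedupe(records, key):
    """Keep the first record for each distinct key(record), preserving input order."""
    seen = set()
    unique = []
    for record in records:
        k = key(record)
        if k not in seen:
            seen.add(k)
            unique.append(record)
    return unique


# (canonical SMILES, InChIKey) pairs copied from the notebook outputs.
ISOMERS = [
    ("C/N=C(/NC#N)NCCSCc1nc[nH]c1C", "AQIXAKUUQRKLND-UHFFFAOYSA-N"),
    ("C/N=C(\\NC#N)NCCSCc1nc[nH]c1C", "AQIXAKUUQRKLND-UHFFFAOYSA-N"),
]

by_inchikey = dedupe(ISOMERS, key=lambda rec: rec[1])
by_smiles = dedupe(ISOMERS, key=lambda rec: rec[0])

print(len(by_inchikey))  # 1: both isomers collapse onto one InChIKey
print(len(by_smiles))    # 2: canonical SMILES keeps them apart
```

With real molecules, the key would be computed rather than looked up, for example with `Chem.MolToSmiles(mol)` inside RDKit, keeping in mind the notebook's caveat that canonical SMILES are only comparable within one toolkit.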