@leelasd
Created March 2, 2017 01:55
Convert SMILES strings to 3D structures and save them to an SDF file
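import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem

# Expected columns: a library name in the first column, plus SMILES and SOURCE_ID.
df = pd.read_csv('SMILES.csv')
libs = df[df.columns[0]]

writer = Chem.SDWriter('TEST.sdf')
for lib, smi, sid in zip(libs, df.SMILES, df.SOURCE_ID):
    mol = Chem.MolFromSmiles(smi)
    if mol is None:  # RDKit returns None for SMILES it cannot parse
        print('Skipping unparsable SMILES: %s' % smi)
        continue
    mol = Chem.AddHs(mol)  # add explicit hydrogens before 3D embedding
    # EmbedMolecule returns -1 when no conformer could be generated
    if AllChem.EmbedMolecule(mol, AllChem.ETKDG()) == -1:
        print('Skipping %s: 3D embedding failed' % smi)
        continue
    # Prints 0 if the UFF minimization converged, 1 if it needs more iterations
    print(AllChem.UFFOptimizeMolecule(mol, maxIters=1000))
    # '_Name' sets the SDF title line. Other property names starting with '_'
    # are private in RDKit and are not written as SD data fields, so the
    # data fields below use plain names.
    mol.SetProp('_Name', '%s' % sid)
    mol.SetProp('Library', '%s' % lib)
    mol.SetProp('SourceID', '%s' % sid)
    mol.SetProp('SMILES', '%s' % smi)
    writer.write(mol)
writer.close()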

Ruibin-Liu commented Jun 13, 2023

Your SMILES.csv file needs a header line, so at a minimum its content looks like:

SMILES
CC1N=C(c2cc3ccccc3o2)N=C1c1ccco1

Update: the header above covers only the 'SMILES' column. The code also reads a 'SOURCE_ID' column and the first column of the file, whose name the code never states but which appears to hold the library name for each compound. So for the code to work, 'SMILES.csv' should look like:

LIBRARY,SMILES,SOURCE_ID
Pubchem,CC1N=C(c2cc3ccccc3o2)N=C1c1ccco1,11234

Here 'Pubchem' and '11234' are just example values for LIBRARY and SOURCE_ID, respectively. The 'LIBRARY' column must come first, but the other columns can appear in any order.
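
The layout described above can be checked before running the script. The following is a minimal sketch using only the Python standard library; the function name `check_layout` and the inline sample text are hypothetical and not part of the gist.

```python
import csv
import io

REQUIRED = ("SMILES", "SOURCE_ID")  # columns the gist reads by name


def check_layout(csv_text):
    """Return a list of problems with the header row (empty list if none)."""
    rows = csv.reader(io.StringIO(csv_text))
    header = next(rows, None)
    if not header:
        return ["file is empty"]
    problems = ["missing column: %s" % name for name in REQUIRED if name not in header]
    # The gist takes the library name from whichever column comes first,
    # so the first column must not be SMILES or SOURCE_ID.
    if header[0] in REQUIRED:
        problems.append("first column is %s, not a library column" % header[0])
    return problems


sample = "LIBRARY,SMILES,SOURCE_ID\nPubchem,CC1N=C(c2cc3ccccc3o2)N=C1c1ccco1,11234\n"
print(check_layout(sample))
```

An empty list means the header matches what the script expects; any returned message names the column that needs fixing.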

If you are comfortable with the pandas library and the CSV format, you can instead change the original code to match your own CSV file.
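
One way to adapt the code is to look up every column by name rather than by position. This is a minimal sketch, assuming a hypothetical file whose columns are called `compound_id`, `smiles`, and `source`. The `COLUMNS` mapping and the `read_records` helper are illustrative names, and the standard `csv` module stands in for pandas here (with pandas, the same idea is indexing `df[column_name]`).

```python
import csv
import io

# Map the roles the gist needs to the column names in your file (assumed names).
COLUMNS = {"library": "source", "smiles": "smiles", "source_id": "compound_id"}


def read_records(csv_text, columns=COLUMNS):
    """Yield (library, smiles, source_id) tuples, looking columns up by name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [name for name in columns.values() if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing columns: %s" % ", ".join(missing))
    for row in reader:
        yield row[columns["library"]], row[columns["smiles"]], row[columns["source_id"]]


sample = "compound_id,smiles,source\n11234,CC1N=C(c2cc3ccccc3o2)N=C1c1ccco1,Pubchem\n"
records = list(read_records(sample))
print(records)
```

Each tuple can then feed the molecule-building loop, replacing the positional `df.columns[0]` lookup, so column order no longer matters.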
