@leelasd
Created March 2, 2017 01:55
Convert SMILES codes to 3D structures and save them to SDF
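import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem

# SMILES.csv needs a header line with SMILES and SOURCE_ID columns;
# the first column (whatever its name) is used as the library label.
df = pd.read_csv('SMILES.csv')
# MolFromSmiles returns None for an invalid SMILES, which makes AddHs fail.
mols = [Chem.MolFromSmiles(smi) for smi in df.SMILES]
# Add explicit hydrogens before generating 3D coordinates.
hmols = [Chem.AddHs(m) for m in mols]
for mol in hmols:
    # Embed one 3D conformer with ETKDG (returns -1 if embedding fails).
    AllChem.EmbedMolecule(mol, AllChem.ETKDG())
    # Relax it with UFF; prints 0 if converged, 1 if more iterations are needed.
    print(AllChem.UFFOptimizeMolecule(mol, maxIters=1000))
smiles = list(df.SMILES)
sid = list(df.SOURCE_ID)
libs = list(df[df.columns[0]])
writer = Chem.SDWriter('TEST.sdf')
for mol, lib, source_id, smi in zip(hmols, libs, sid, smiles):
    # _Name becomes the title line of the SDF record.
    mol.SetProp("_Name", str(source_id))
    # Property names starting with "_" are private and are not written as
    # SDF data fields, so the remaining properties use plain names.
    mol.SetProp("Library", str(lib))
    mol.SetProp("SourceID", str(source_id))
    mol.SetProp("SMILES", str(smi))
    writer.write(mol)
writer.close()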
@filipsPL

Thanks!

@gebdu

gebdu commented Sep 30, 2021

CC1N=C(c2cc3ccccc3o2)N=C1c1ccco1

I tried a SMILES file containing the line above, but got an error:

mols = [Chem.MolFromSmiles(smi) for smi in df.SMILES]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib64/python3.9/site-packages/pandas/core/generic.py", line 5487, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'SMILES'
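This error follows from how `pd.read_csv` treats the first line of a file: by default it becomes the header. In a headerless file the SMILES string itself turns into the column name, so `df.SMILES` has nothing to find. A minimal sketch reproducing this, assuming an in-memory `io.StringIO` as a stand-in for SMILES.csv:

```python
import io

import pandas as pd

smi = "CC1N=C(c2cc3ccccc3o2)N=C1c1ccco1"

# Without a header line, pandas uses the first row as the column names,
# so the SMILES string becomes a column name and no data rows remain.
headerless = pd.read_csv(io.StringIO(smi + "\n"))
print(list(headerless.columns))
print(len(headerless))

# With a "SMILES" header line, df.SMILES resolves to the column.
with_header = pd.read_csv(io.StringIO("SMILES\n" + smi + "\n"))
print(list(with_header.SMILES))
```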

@Ruibin-Liu

Ruibin-Liu commented Jun 13, 2023

Your SMILES.csv file needs a header line naming the column, so its contents should be:

SMILES
CC1N=C(c2cc3ccccc3o2)N=C1c1ccco1

Update: the change above only fixes the 'SMILES' column. The script also reads a 'SOURCE_ID' column and a first column that appears to hold the library name of each compound (it is read by position, so its name does not matter). To get the code to work, 'SMILES.csv' should look like:

LIBRARY,SMILES,SOURCE_ID
Pubchem,CC1N=C(c2cc3ccccc3o2)N=C1c1ccco1,11234

Here 'Pubchem' and '11234' are just example values for LIBRARY and SOURCE_ID, respectively. The 'LIBRARY' column must come first, but the other columns can appear in any order.
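Before running the gist, it may help to check that a CSV has this layout. A minimal sketch, assuming the column rules described above; `check_layout` and `REQUIRED` are hypothetical names, not part of the original script:

```python
import io

import pandas as pd

# Columns the gist script reads by name; the first column is read by
# position (df.columns[0]) and used as the library label.
REQUIRED = ("SMILES", "SOURCE_ID")


def check_layout(df: pd.DataFrame) -> list[str]:
    """Hypothetical helper: list problems that would break or mislabel the gist's output."""
    problems = [f"missing column {name!r}" for name in REQUIRED if name not in df.columns]
    # If SMILES or SOURCE_ID sits in the first position, the script would
    # silently use it as the library name.
    if len(df.columns) > 0 and df.columns[0] in REQUIRED:
        problems.append(f"first column is {df.columns[0]!r}; it would be used as the library name")
    return problems


good = pd.read_csv(io.StringIO(
    "LIBRARY,SMILES,SOURCE_ID\nPubchem,CC1N=C(c2cc3ccccc3o2)N=C1c1ccco1,11234\n"))
bad = pd.read_csv(io.StringIO("SMILES\nCC1N=C(c2cc3ccccc3o2)N=C1c1ccco1\n"))
print(check_layout(good))
print(check_layout(bad))
```

An empty list means the file matches the layout the script expects.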

If you are comfortable with pandas and the CSV format, you can instead change the original code to match your CSV file.
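For example, a headerless file with one SMILES per line can be loaded into the layout the script expects. A minimal sketch, assuming `header=None` with `names=["SMILES"]`; the `load_smiles` helper, the "unknown" library placeholder, and the `mol1`, `mol2`, ... source IDs are hypothetical choices, not part of the original script:

```python
import io

import pandas as pd


def load_smiles(source, library="unknown"):
    """Hypothetical helper: read one SMILES per line into LIBRARY, SMILES, SOURCE_ID columns.

    The library placeholder and the generated SOURCE_IDs are illustrative
    assumptions, not values from the original gist.
    """
    # header=None stops pandas from treating the first SMILES as a column name.
    df = pd.read_csv(source, header=None, names=["SMILES"])
    # The script reads the library label by position, so put it first.
    df.insert(0, "LIBRARY", library)
    df["SOURCE_ID"] = [f"mol{i + 1}" for i in range(len(df))]
    return df


df = load_smiles(io.StringIO("CC1N=C(c2cc3ccccc3o2)N=C1c1ccco1\nc1ccccc1\n"))
print(df)
```

The resulting `df` can replace the `pd.read_csv('SMILES.csv')` line; the rest of the script then runs unchanged.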
