Created
March 2, 2017 01:55
Convert SMILES codes to 3D and save to SDF
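import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem

# SMILES.csv needs a header line: a library-name column first,
# plus SMILES and SOURCE_ID columns in any order.
df = pd.read_csv('SMILES.csv')
smiles = list(df.SMILES)
sid = list(df.SOURCE_ID)
libs = list(df[df.columns[0]])

writer = Chem.SDWriter('TEST.sdf')
for lib, name, smi in zip(libs, sid, smiles):
    mol = Chem.MolFromSmiles(smi)
    if mol is None:  # RDKit could not parse this SMILES
        print('Skipping invalid SMILES: %s' % smi)
        continue
    mol = Chem.AddHs(mol)
    # EmbedMolecule returns -1 when no 3D conformer could be generated.
    if AllChem.EmbedMolecule(mol, AllChem.ETKDG()) == -1:
        print('Embedding failed, skipping: %s' % smi)
        continue
    # Prints 0 if the UFF optimization converged, 1 if it needs more iterations.
    print(AllChem.UFFOptimizeMolecule(mol, maxIters=1000))
    mol.SetProp("_Name", "%s" % name)  # becomes the SDF title line
    # RDKit treats property names starting with "_" as private, and SDWriter
    # does not write those as data fields, so plain names are used here.
    mol.SetProp("Library", "%s" % lib)
    mol.SetProp("SourceID", "%s" % name)
    mol.SetProp("SMILES", "%s" % smi)
    writer.write(mol)
writer.close()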
Your SMILES.csv file should contain a header line.
Update: a header that names only the 'SMILES' column is not enough, because the code also reads the 'SOURCE_ID' column and the first column, whose name is not known to us but which looks like a library name for the corresponding compounds. So to get the code to work, 'SMILES.csv' should start with a header line naming the LIBRARY, SOURCE_ID and SMILES columns, followed by one row per compound, where 'Pubchem' and '11234' would be example values for LIBRARY and SOURCE_ID respectively. The 'LIBRARY' column has to come first, but the other two can be in either order.
If you know the pandas library and the CSV format well, you can instead change the original code to match your own CSV file.
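The layout described above can be sketched with the standard library alone. This is only an illustration: the column name LIBRARY, the second source ID and both SMILES strings are placeholder assumptions, not values from the original gist or comment, and `write_sample_csv` and `check_header` are hypothetical helper names.

```python
import csv
import io

# Placeholder rows for illustration only. The SMILES strings and the
# second ID are assumptions, not real records from any library.
SAMPLE_ROWS = [
    {"LIBRARY": "Pubchem", "SOURCE_ID": "11234", "SMILES": "CCO"},
    {"LIBRARY": "Pubchem", "SOURCE_ID": "11235", "SMILES": "c1ccccc1"},
]


def write_sample_csv(handle):
    """Write a header line plus the sample rows, library column first."""
    writer = csv.DictWriter(handle, fieldnames=["LIBRARY", "SOURCE_ID", "SMILES"])
    writer.writeheader()
    writer.writerows(SAMPLE_ROWS)


def check_header(handle):
    """Return the header fields, raising if a column the script reads is missing."""
    header = next(csv.reader(handle))
    missing = {"SMILES", "SOURCE_ID"} - set(header)
    if missing:
        raise ValueError("missing columns: %s" % ", ".join(sorted(missing)))
    return header


# Build the CSV in memory, then confirm the header has what the script needs.
buffer = io.StringIO()
write_sample_csv(buffer)
buffer.seek(0)
header = check_header(buffer)
print(buffer.getvalue())
```

To produce a real file instead of an in-memory buffer, the same helper can be called as `write_sample_csv(open('SMILES.csv', 'w', newline=''))`, ideally inside a `with` block so the file is closed afterwards.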