Skip to content

Instantly share code, notes, and snippets.

@iwatobipen
Created August 8, 2021 13:27
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save iwatobipen/6d8708d8c77c615cfffbb89409be730d to your computer and use it in GitHub Desktop.
Save iwatobipen/6d8708d8c77c615cfffbb89409be730d to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@ale94mleon
Copy link

Hi I was wondering about this example:

smi1 = 'CCCOCC'
smi2 = 'CCCC(=O)CC'

In this case I was expecting that the ether oxygen and the carbonyl group are the difference . But it is not the result

@TSAndrews
Copy link

Hi I was wondering about this example:

smi1 = 'CCCOCC'
smi2 = 'CCCC(=O)CC'

In this case I was expecting that the ether oxygen and the carbonyl group are the difference . But it is not the result

Unfortunately, rdFMCS.FindMCS finds the maximum common substructure and not the differences between two molecules and so this code can only highlight what is not in the maximum common substructure. Since the ether oxygen does not match the carbonyl carbon the maximal substructure can only include a single proyl chain as the mismatching ether/carbonyl breaks the molecule in half and so the other propyl chain is not in the same substructure.

If you want to extend this code to allow it to identify differences such as the one you describe, what you would have to do is iteratively split the molecules along the bonds that connect the matched/unmatched subgroups, followed by repeating the matching step between the unmatched subgroups until rdFMCS.FindMCS cannot find anymore.

The connecting bonds can be found with a function like:

def find_connecting_bonds(mol:Chem.rdchem.Mol,atomIds:list[int]):
        target_atoms=set(atomIds)
        bonds=[]
        for atomA in atomIds:
            atom_bonds=mol.GetAtomWithIdx(atomA).GetBonds()
            for b in atom_bonds:
                atomB=b.GetEndAtomIdx() if b.GetEndAtomIdx()!=atomA else b.GetBeginAtomIdx()
                if (atomB not in target_atoms): 
                    bonds.append(b.GetIdx())
        return bonds

Where atomIds is the list of atom Ids that were highlighted as mismatching by view_differences (i.e. target_atm1/target_atm2).
You could then use something like this for each iteration:

    target_atm1,target_atm2=difference(mol1, mol2)
    excess1=find_connecting_bonds(mol1,target_atm1)
    if excess1:
            atom_map=[]         #List to contain a map so we can track atoms through split
            split=Chem.FragmentOnBonds(mol, tuple(excess1))
            mol1_frags=Chem.GetMolFrags(split, asMols=True, frags= atom_map)

            unaccounted_frags1=set()
            for atom in target_atm1:
                unaccounted_frags1.add(atom_map[atom])

            if len(unaccounted_frags)==1:
                mol1_unnacounted = mol_frags[unaccounted_frags.pop()]

You can then compare mol1_unnacounted to mol2_unnacounted in the same way as you did mol1 to mol2! Rinse and repeat until you're left with your carbonyl and ether and (if you've tracked atom IDs properly) highlight them in your original molecule.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment