John Mayfield johnmay

View GitHub Profile
@johnmay
johnmay / CdkExprs.java
Created March 16, 2021 20:27
CDK SMARTS Expr API
IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
IAtomContainer mol = new QueryAtomContainer(bldr);
if (!Smarts.parse(mol, "[C,N;H0,H1+]-*", Smarts.FLAVOR_LOOSE)) {
    System.err.println("ERROR - " + Smarts.getLastErrorMesg());
    System.err.println(Smarts.getLastErrorLocation());
    return;
}
QueryAtom qatom1 = (QueryAtom) mol.getAtom(0); // instanceof for safety
Expr expr = qatom1.getExpression();
System.err.println(expr); // AND(OR(ALIPHATIC_ELEMENT=6,ALIPHATIC_ELEMENT=7),OR(TOTAL_H_COUNT=0,AND(TOTAL_H_COUNT=1,FORMAL_CHARGE=1)))
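
The printed expression is a small boolean tree: the comma in `[C,N;H0,H1+]` becomes an OR, the semicolon a low-precedence AND, and the adjacent `H1+` an implicit AND. As a rough illustration of how such a tree evaluates against an atom, here is a minimal sketch using plain `java.util.function.Predicate`. The `AtomInfo` class and the helper names are hypothetical stand-ins for this example; they are not CDK's `Expr` API.

```java
import java.util.function.Predicate;

final class ExprSketch {

    // Hypothetical, minimal atom description (not a CDK type).
    static final class AtomInfo {
        final int element;      // atomic number
        final boolean aromatic; // true if the atom is aromatic
        final int totalH;       // total hydrogen count
        final int charge;       // formal charge

        AtomInfo(int element, boolean aromatic, int totalH, int charge) {
            this.element = element;
            this.aromatic = aromatic;
            this.totalH = totalH;
            this.charge = charge;
        }
    }

    // Leaf predicates named after the leaves in the printed expression.
    static Predicate<AtomInfo> aliphaticElement(int z) {
        return a -> !a.aromatic && a.element == z;
    }

    static Predicate<AtomInfo> totalHCount(int h) {
        return a -> a.totalH == h;
    }

    static Predicate<AtomInfo> formalCharge(int q) {
        return a -> a.charge == q;
    }

    // AND(OR(C, N), OR(H0, AND(H1, +1))), mirroring [C,N;H0,H1+].
    static Predicate<AtomInfo> example() {
        Predicate<AtomInfo> elem = aliphaticElement(6).or(aliphaticElement(7));
        Predicate<AtomInfo> hAndCharge =
            totalHCount(0).or(totalHCount(1).and(formalCharge(1)));
        return elem.and(hAndCharge);
    }
}
```

Under these assumptions, `ExprSketch.example().test(new ExprSketch.AtomInfo(7, false, 1, 1))` accepts an aliphatic nitrogen with one hydrogen and a +1 charge, while the same atom with a neutral charge is rejected.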
curl ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/Weekly/2020-01-05/Extras/CID-SMILES.gz | \
gunzip -c | \
head -n 10000000 | \
awk '{print $2 " " $1}' > pubchem_first10m.smi
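
The `awk` step swaps the two columns so each output line reads as SMILES followed by the CID. For readers who prefer to do this step in Java, a minimal sketch of the same per-line transformation might look like the following. The class and method names are made up for this example, and unlike `awk` it returns lines with fewer than two fields unchanged.

```java
final class ColumnSwap {

    // Swap the first two whitespace-separated fields of a line,
    // e.g. "CID<tab>SMILES" becomes "SMILES CID".
    static String swap(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
            return line; // assumption: pass short lines through untouched
        }
        return parts[1] + " " + parts[0];
    }
}
```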
elements = {'H': 1, 'He': 2}  # etc. (remaining elements elided)
def strtoelem(element):
    return elements.get(element, 0)
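
Since the other snippets in this listing are Java, the same lookup can be sketched there as well. This is only an illustration: the class name is hypothetical, and the table is left at the two entries shown above rather than filled in.

```java
import java.util.HashMap;
import java.util.Map;

final class ElementLookup {

    private static final Map<String, Integer> ELEMENTS = new HashMap<>();

    static {
        ELEMENTS.put("H", 1);
        ELEMENTS.put("He", 2);
        // remaining elements elided, as in the snippet above
    }

    // Returns the atomic number for a symbol, or 0 when the symbol is unknown.
    static int strToElem(String symbol) {
        return ELEMENTS.getOrDefault(symbol, 0);
    }
}
```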
johnmay / CDKAlign.java
Created October 11, 2018 18:17
Align CDK molecule to substructure
public static void alignMoleculeToSubstructure(IAtomContainer mol,
                                               IAtomContainer sub,
                                               boolean fixBonds) throws CDKException {
    Pattern substructurePattern = Pattern.findSubstructure(sub);
    Mappings mappings = substructurePattern.matchAll(mol);
    Set<IAtom> fixedAtoms = new HashSet<IAtom>();
    Set<IBond> fixedBonds = new HashSet<IBond>();
    for (Map<IChemObject, IChemObject> map : mappings.toAtomBondMap()) {
        GeometryUtil.scaleMolecule(sub,
johnmay / Main.java
Last active January 8, 2018 21:02
Using CDK's RGroupQuery
public static void main(String[] args) throws CDKException {
    IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
    SmilesParser smipar = new SmilesParser(bldr);
    IAtomContainer root = smipar.parseSmiles("CC1=CC=NC2=C1C=CC(=C2)C1=CC([R1])=C([R2])C([R3])=C1");
    Map<IAtom, Map<Integer,IBond>> rootAttach = new HashMap<>();
    Map<Integer,RGroupList> rgrpMap = new HashMap<>();
    defineRgroup(root, rootAttach, rgrpMap, "R1", newRGroupList("[H].[CH2]CO.[CH2]Cl", 1));
    defineRgroup(root, rootAttach, rgrpMap, "R2", newRGroupList("[H].[CH2]CN.[CH2]F", 2));
    defineRgroup(root, rootAttach, rgrpMap, "R3", newRGroupList("[H].[CH2]CCl.[CH2]F", 3));
Building on Sand: Standard InChIs on Non-Standard Molfiles
John Mayfield
The molfile serves as a de facto standard for chemical information exchange. It is perhaps the most
widely supported format with its core syntax being easy to understand, parse, and generate. Beyond
the core syntax, more advanced features such as sgroups and enhanced stereochemistry are rarely
supported, often only being partially implemented and used. Additionally, several vendors,
toolkits, and service providers have added extended syntax to their molfiles to solve particular
corner cases or representation problems. This talk will provide a brief summary of the less widely
supported features of the molfile including sgroups and enhanced stereochemistry. Additionally,
public static String toSmiles(CircularFingerprinter.FP fp, IAtomContainer mol) throws CDKException
{
    IAtomContainer part = mol.getBuilder().newAtomContainer();
    Set<IAtom> aset = new HashSet<>();
    int[] hcounts = new int[mol.getAtomCount()];
    for (int idx : fp.atoms) {
        IAtom atom = mol.getAtom(idx);
        aset.add(atom);
        part.addAtom(atom);
        hcounts[idx] = atom.getImplicitHydrogenCount();
Aromaticity arom = new Aromaticity(ElectronDonation.cdk(),
Cycles.cdkAromaticSet());
SmilesParser smipar = new SmilesParser(SilentChemObjectBuilder.getInstance());
String smi = "*CCO*";
IAtomContainer mol = smipar.parseSmiles(smi);
Sgroup sgrp = new Sgroup();
sgrp.addAtom(mol.getAtom(1));
sgrp.addAtom(mol.getAtom(2));
sgrp.addAtom(mol.getAtom(3));
sgrp.addBond(mol.getBond(0)); // bond crossing bracket (xbond)
public static void main(String[] args) throws CDKException, IOException {
    String rxnfile = "$RXN\n"
                   + "\n"
                   + "\n"
                   + "\n"
                   + " 2 1 0\n"
                   + "$MOL\n"
                   + "\n"
                   + " Ketcher 01271718382D 1 1.00000 0.00000 0\n"
                   + "\n"