Last active
August 29, 2015 14:03
SMILES
<?xml version="1.0" encoding="UTF-8"?> | |
<!--Xholon Workbook http://www.primordion.com/Xholon/gwt/ MIT License, Copyright (C) Ken Webb, Sat Jun 28 2014 08:10:36 GMT-0400 (EDT)--> | |
<XholonWorkbook> | |
<Notes><![CDATA[ | |
Xholon | |
------ | |
Title: SMILES | |
Description: Simplified Molecular Input Line Entry System | |
Url: http://www.primordion.com/Xholon/gwt/ | |
InternalName: 6c71b62ab83af820939a | |
Keywords: | |
My Notes | |
-------- | |
According to Wikipedia (1):
The Simplified Molecular-Input Line-Entry System or SMILES is a specification in form of a line notation for describing the structure of chemical molecules using short ASCII strings. SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules. | |
In this workbook I explore how SMILES can be integrated with Xholon. | |
In a chemical graph, the nodes are atoms, and the edges are semi-rigid bonds that can be single, double, or triple according to the rules of valence bond theory. (3)
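The graph view above can be sketched in a few lines of JavaScript. This is only an illustration, not part of the workbook: the function name toGraph and the shape of the returned object are my own assumptions, and it only handles unbranched strings made of single-letter atoms and the bond symbols - = #.

```javascript
// Hypothetical helper (not in the workbook): build a node/edge graph
// from a linear SMILES string with single-letter atoms only.
function toGraph(smiles) {
  var orders = { "-": 1, "=": 2, "#": 3 }; // bond symbol -> bond order
  var atoms = [];  // graph nodes
  var bonds = [];  // graph edges: {from, to, order}
  var pending = 1; // adjacent atoms get a single bond unless a symbol says otherwise
  for (var i = 0; i < smiles.length; i++) {
    var ch = smiles.charAt(i);
    if (orders.hasOwnProperty(ch)) {
      pending = orders[ch];
    } else {
      atoms.push(ch);
      if (atoms.length > 1) {
        bonds.push({ from: atoms.length - 2, to: atoms.length - 1, order: pending });
      }
      pending = 1;
    }
  }
  return { atoms: atoms, bonds: bonds };
}

console.log(JSON.stringify(toGraph("O=C=O")));
// {"atoms":["O","C","O"],"bonds":[{"from":0,"to":1,"order":2},{"from":1,"to":2,"order":2}]}
```

Branches, ring closures, bracket atoms and two-letter symbols such as Cl are deliberately left out of this sketch.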
Xholon containment hierarchy doesn't seem to make sense for molecules and SMILES. But I am using Xholon hierarchy to represent branches in the chemical structure. | |
TODO | |
- use explicit ports between siblings | |
- the bonds are the active objects; all ports are from the bonds to the atoms | |
- the bonds are any of: Sngl Dobl Trpl Rmtc | |
- possibly include explicit Sngl bonds between all otherwise unbonded siblings | |
- handle cycles and cross-branch bonds | |
- add an extra bond to represent the final part of the cycle ? | |
- `In a SMILES string such as "C1CCCCC1", the first occurrence of a ring-closure number (an "rnum") creates an "open bond" to the atom that precedes the ring-closure number (the "rnum"). When that same rnum is encountered later in the string, a bond is made between the two atoms, which typically forms a cyclic structure.` (3)
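The rnum rule quoted above can be sketched as follows. This is a minimal illustration, not the code used in this workbook: ringClosures is a hypothetical name, and it assumes organic-subset atoms with single-digit rnums only (no bracket atoms, no %nn two-digit closures).

```javascript
// Hypothetical helper (not in the workbook): pair up single-digit ring-closure numbers.
function ringClosures(smiles) {
  var atomIndex = -1; // index of the most recently seen atom
  var open = {};      // rnum -> index of the atom that holds the open bond
  var bonds = [];     // completed ring-closure bonds as [atomA, atomB]
  for (var i = 0; i < smiles.length; i++) {
    var ch = smiles.charAt(i);
    if (/[A-Za-z]/.test(ch)) {
      // treat the two-letter symbols Cl and Br as one atom
      var two = smiles.substring(i, i + 2);
      if (two === "Cl" || two === "Br") { i++; }
      atomIndex++;
    } else if (/[0-9]/.test(ch)) {
      if (open.hasOwnProperty(ch)) {
        // second occurrence: close the ring
        bonds.push([open[ch], atomIndex]);
        delete open[ch]; // the rnum may be reused later in the string
      } else {
        // first occurrence: remember the open bond
        open[ch] = atomIndex;
      }
    }
  }
  return bonds;
}

console.log(JSON.stringify(ringClosures("C1CCCCC1"))); // [[0,5]]
```

For the cubane string C12C3C4C1C5C4C3C25 the same function yields five closing bonds, one for each matched pair of rnums.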
Tentative Conclusions June 28, 2014 | |
--------------------- | |
My exploration of SMILES is incomplete, but I do have some tentative conclusions. | |
- SMILES chemical branches are analogous to Xholon hierarchy | |
- SMILES branch chains are effectively contained within a SMILES main chain
- SMILES siblings are connected with single bonds by default, | |
while Xholon siblings are not connected (SMILES .) by default | |
- SMILES siblings are ordered, while Xholon siblings are unordered | |
- I don't think SMILES has a way of naming main and branch chains | |
- if there's only a main chain, then its name is the same as the molecule name
- SMILES branches are specified using unnamed ( and ) | |
- every SMILES branch has an ASCII-specified structure which functions as an implicit name | |
- all Xholon subtrees have names which are separate from the details of their structure | |
- SMILES allows any atom to bond with any other atom, | |
which is analogous to Xholon ports | |
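The analogy between SMILES branches and a containment hierarchy can be sketched with nested arrays, where each ( ... ) pair becomes a child array of the chain that contains it. This is only an illustration under simplifying assumptions: toTree is a hypothetical name, every character other than ( and ) is treated as one token, and siblings keep their order.

```javascript
// Hypothetical helper (not in the workbook): map SMILES branches to nested arrays.
function toTree(smiles) {
  var root = [];      // the main chain
  var stack = [root]; // chain currently being filled is on top
  for (var i = 0; i < smiles.length; i++) {
    var ch = smiles.charAt(i);
    var current = stack[stack.length - 1];
    if (ch === "(") {
      // a branch is contained within the chain it starts from
      var branch = [];
      current.push(branch);
      stack.push(branch);
    } else if (ch === ")") {
      if (stack.length > 1) { stack.pop(); }
    } else {
      current.push(ch);
    }
  }
  return root;
}

console.log(JSON.stringify(toTree("CCN(CC)CC")));
// ["C","C","N",["C","C"],"C","C"]
```

The branch array has no name of its own, which mirrors the point above that a SMILES branch is identified only by its structure.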
References | |
---------- | |
(1) http://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system | |
(2) http://www.daylight.com/dayhtml/doc/theory/ | |
(3) http://www.opensmiles.org/opensmiles.html | |
(4) http://cactus.nci.nih.gov/chemical/structure | |
http://cactus.nci.nih.gov/chemical/structure/aspirin/smiles | |
converts chemical names to SMILES and other formats | |
(5) http://pubchem.ncbi.nlm.nih.gov/edit2/index.html | |
converts SMILES to SVG | |
]]></Notes> | |
<_-.XholonClass> | |
<MoleculeSystem/> | |
<Mlcl superClass="Attribute_String"/> <!-- molecule --> | |
<Molecule/> | |
<Atom> | |
<!-- Organic Subset, in SMILES, these atoms don't require square brackets around them --> | |
<Rgnc> | |
<Rgnl> | |
<!-- | |
aliphatic_organic ::= 'B' | 'C' | 'N' | 'O' | 'S' | 'P' | 'F' | 'Cl' | 'Br' | 'I' | |
--> | |
<B/> | |
<C/> | |
<N/> | |
<O/> | |
<S/> | |
<P/> | |
<F/> | |
<Cl/> | |
<Br/> | |
<I/> | |
</Rgnl> | |
<Rgnr> | |
<!-- | |
aromatic_organic ::= 'b' | 'c' | 'n' | 'o' | 's' | 'p' | |
--> | |
<b/> | |
<c/> | |
<n/> | |
<o/> | |
<s/> | |
<p/> | |
</Rgnr> | |
</Rgnc> | |
<!-- | |
BRACKET ATOMS in SMILES, these atoms DO require square brackets around them | |
element_symbols ::= 'H' | 'He' | 'Li' | 'Be' | 'B' | 'C' | 'N' | 'O' | 'F' | 'Ne' | 'Na' | 'Mg' | 'Al' | 'Si' | 'P' | 'S' | 'Cl' | 'Ar' | 'K' | 'Ca' | 'Sc' | 'Ti' | 'V' | 'Cr' | 'Mn' | 'Fe' | 'Co' | 'Ni' | 'Cu' | 'Zn' | 'Ga' | 'Ge' | 'As' | 'Se' | 'Br' | 'Kr' | 'Rb' | 'Sr' | 'Y' | 'Zr' | 'Nb' | 'Mo' | 'Tc' | 'Ru' | 'Rh' | 'Pd' | 'Ag' | 'Cd' | 'In' | 'Sn' | 'Sb' | 'Te' | 'I' | 'Xe' | 'Cs' | 'Ba' | 'Hf' | 'Ta' | 'W' | 'Re' | 'Os' | 'Ir' | 'Pt' | 'Au' | 'Hg' | 'Tl' | 'Pb' | 'Bi' | 'Po' | 'At' | 'Rn' | 'Fr' | 'Ra' | 'Rf' | 'Db' | 'Sg' | 'Bh' | 'Hs' | 'Mt' | 'Ds' | 'Rg' | 'Cn' | 'Fl' | 'Lv' | 'La' | 'Ce' | 'Pr' | 'Nd' | 'Pm' | 'Sm' | 'Eu' | 'Gd' | 'Tb' | 'Dy' | 'Ho' | 'Er' | 'Tm' | 'Yb' | 'Lu' | 'Ac' | 'Th' | 'Pa' | 'U' | 'Np' | 'Pu' | 'Am' | 'Cm' | 'Bk' | 'Cf' | 'Es' | 'Fm' | 'Md' | 'No' | 'Lr' | |
aromatic_symbols ::= 'b' | 'c' | 'n' | 'o' | 'p' | 's' | 'se' | 'as' | |
--> | |
<Brkt> | |
<Brkl> | |
<_H/> | |
<_He/> | |
<_Li/> | |
<_Be/> | |
<_B/> | |
<_C/> | |
<_N/> | |
<_O/> | |
<_F/> | |
<_Ne/> | |
<_Na/> | |
<_Mg/> | |
<_Al/> | |
<_Si/> | |
<_P/> | |
<_S/> | |
<_Cl/> | |
<_Ar/> | |
<_K/> | |
<_Ca/> | |
<_Sc/> | |
<_Ti/> | |
<_V/> | |
<_Cr/> | |
<_Mn/> | |
<_Fe/> | |
<_Co/> | |
<_Ni/> | |
<_Cu/> | |
<_Zn/> | |
<_Ga/> | |
<_Ge/> | |
<_As/> | |
<_Se/> | |
<_Br/> | |
<_Kr/> | |
<_Rb/> | |
<_Sr/> | |
<_Y/> | |
<_Zr/> | |
<_Nb/> | |
<_Mo/> | |
<_Tc/> | |
<_Ru/> | |
<_Rh/> | |
<_Pd/> | |
<_Ag/> | |
<_Cd/> | |
<_In/> | |
<_Sn/> | |
<_Sb/> | |
<_Te/> | |
<_I/> | |
<_Xe/> | |
<_Cs/> | |
<_Ba/> | |
<_Hf/> | |
<_Ta/> | |
<_W/> | |
<_Re/> | |
<_Os/> | |
<_Ir/> | |
<_Pt/> | |
<_Au/> | |
<_Hg/> | |
<_Tl/> | |
<_Pb/> | |
<_Bi/> | |
<_Po/> | |
<_At/> | |
<_Rn/> | |
<_Fr/> | |
<_Ra/> | |
<_Rf/> | |
<_Db/> | |
<_Sg/> | |
<_Bh/> | |
<_Hs/> | |
<_Mt/> | |
<_Ds/> | |
<_Rg/> | |
<_Cn/> | |
<_Fl/> | |
<_Lv/> | |
<_La/> | |
<_Ce/> | |
<_Pr/> | |
<_Nd/> | |
<_Pm/> | |
<_Sm/> | |
<_Eu/> | |
<_Gd/> | |
<_Tb/> | |
<_Dy/> | |
<_Ho/> | |
<_Er/> | |
<_Tm/> | |
<_Yb/> | |
<_Lu/> | |
<_Ac/> | |
<_Th/> | |
<_Pa/> | |
<_U/> | |
<_Np/> | |
<_Pu/> | |
<_Am/> | |
<_Cm/> | |
<_Bk/> | |
<_Cf/> | |
<_Es/> | |
<_Fm/> | |
<_Md/> | |
<_No/> | |
<_Lr/> | |
</Brkl> | |
<Brkr> | |
<_b/> | |
<_c/> | |
<_n/> | |
<_o/> | |
<_p/> | |
<_s/> | |
<_se/> | |
<_as/> | |
</Brkr> | |
</Brkt> | |
</Atom> | |
<!-- chemical bonds --> | |
<Bond> | |
<!-- - single bond --> | |
<Sngl/> | |
<!-- = double bond --> | |
<Dobl/> | |
<!-- # triple bond --> | |
<Trpl/> | |
<!-- $ quadruple bond OpenSMILES -->
<Qdpl/> | |
<!-- : aromatic bond --> | |
<Rmtc/> | |
<!-- / directional bonds --> | |
<!-- \ directional bonds --> | |
</Bond> | |
<!-- disconnected structures; indicates that adjacent atoms are not bonded to each other --> | |
<Dscn/> | |
<Brch/> | |
</_-.XholonClass> | |
<xholonClassDetails> | |
<Sngl xhType="XhtypePureActiveObject"/> | |
<Dobl xhType="XhtypePureActiveObject"/> | |
<Trpl xhType="XhtypePureActiveObject"/> | |
<Rmtc xhType="XhtypePureActiveObject"/> | |
</xholonClassDetails> | |
<MoleculeSystem> | |
<Mlcl roleName="ethane">CC</Mlcl> | |
<Mlcl roleName="carbon dioxide">O=C=O</Mlcl> | |
<Mlcl roleName="triethylamine">CCN(CC)CC</Mlcl> | |
<Mlcl roleName="pentane">CCCCC</Mlcl> | |
<Mlcl roleName="aspirin">C1=CC=CC(=C1C(O)=O)OC(C)=O</Mlcl> | |
<Mlcl roleName="thiosulfate">OS(=O)(=S)O</Mlcl> | |
<!-- TODO + and - are not yet handled --> | |
<Mlcl roleName="sodium chloride">[Na+].[Cl-]</Mlcl> | |
<Mlcl roleName="ring">C1CCCCC1</Mlcl> | |
<Mlcl roleName="cubane">C12C3C4C1C5C4C3C25</Mlcl> | |
<Mlcl roleName="ring-closure number test">C0123456789C0C1C2C3C4C5C6C7C8C9</Mlcl> | |
<Mlcl roleName="syntax test">BCNOSPFIbcnospBrCl-=#:()[]XYZxyz</Mlcl> | |
<!-- TODO "arbitrary atom names" needs more work --> | |
<!--<Mlcl roleName="arbitrary atom names">[one]2[two]([three][three]3[three])[four]2[five]3</Mlcl>--> | |
</MoleculeSystem> | |
<MoleculeSystembehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[ | |
var me; | |
var allowArbitraryAtomNames = false; | |
var beh = { | |
postConfigure: function() { | |
me = this.cnode.parent(); | |
$wnd.xh.param("MaxPorts","2"); | |
var service = $wnd.xh.service("XholonHelperService"); | |
var mlcl = me.first(); | |
while (mlcl) { | |
if (mlcl.xhc().name() != "Mlcl") {break;} | |
var txt = mlcl.text().trim(); | |
me.println(txt); | |
var xml = this.parse(txt, mlcl.role()); | |
service.call(-2013, xml, me); | |
var mlclNext = mlcl.next(); | |
mlcl.remove(); | |
mlcl = mlclNext; | |
} | |
this.cnode.remove(); | |
}, // end postConfigure() | |
parse: function(txt, role) { | |
var xml = '<Molecule roleName="' + role + '">'; | |
xml += "<Annotation>" + txt + "</Annotation>"; | |
var i = 0; | |
while (i < txt.length) { | |
var token = txt.charAt(i); | |
switch (token) { | |
case 'B': | |
if (txt.charAt(i+1) == "r") { | |
i++; | |
token = "Br"; | |
} | |
xml += this.makeXmlNode(token); | |
break; | |
case 'C': | |
if (txt.charAt(i+1) == "l") { | |
i++; | |
token = "Cl"; | |
} | |
xml += this.makeXmlNode(token); | |
break; | |
case 'N': | |
case 'O': | |
case 'S': | |
case 'P': | |
case 'F': | |
case 'I': | |
case 'b': | |
case 'c': | |
case 'n': | |
case 'o': | |
case 's': | |
case 'p': | |
xml += this.makeXmlNode(token); | |
break; | |
// bond | |
case '-': | |
xml += this.makeXmlNode("Sngl"); | |
break; | |
case '=': | |
xml += this.makeXmlNode("Dobl"); | |
break; | |
case '#': | |
xml += this.makeXmlNode("Trpl"); | |
break; | |
case ':': | |
xml += this.makeXmlNode("Rmtc"); | |
break; | |
// branch | |
case '(': | |
xml += "<Brch>"; | |
break; | |
case ')': | |
xml += "</Brch>"; | |
break; | |
// bracketed atom | |
case '[': | |
var bracketedXml = this.parseBracketed(txt, i); // [H] becomes <_H/>
xml += bracketedXml; // empty string if the bracket contents are not recognized
var close = txt.indexOf("]", i);
// skip the whole bracket, including any charge, so that '-' in [Cl-] is not read as a single bond
if (close != -1) {i = close;}
break; | |
case ']': | |
// no need to do anything | |
break; | |
// ring-closure number (an "rnum") | |
case '0': | |
case '1': | |
case '2': | |
case '3': | |
case '4': | |
case '5': | |
case '6': | |
case '7': | |
case '8': | |
case '9': | |
xml += '<' + 'Sngl' + ' val="' + '10' + token + '.0"' + '/>'; | |
break; | |
// charge | |
case '+':
// TODO charges are not yet handled; a '-' never reaches this point
// because the single-bond case above matches it first
break;
// disconnection | |
case '.': | |
xml += this.makeXmlNode("Dscn"); | |
break; | |
default: break; | |
} // end switch | |
i++; | |
} // end while | |
xml += "</Molecule>\n"; | |
me.println(xml); | |
return xml; | |
}, // end parse() | |
makeXmlNode: function(tagName) { | |
return "<" + tagName + "/>"; | |
}, | |
// txt.charAt(i) equals '[' | |
parseBracketed: function(txt, i) { | |
var start = ++i;
var end = txt.indexOf("]", i);
if (end == -1) {return "";} // unterminated bracket atom
var token = txt.substring(start, end);
$wnd.console.log(token); | |
if (token) { | |
// the token may end with + or - | |
var lastChar = token.charAt(token.length-1); | |
if (lastChar == "+" || lastChar == "-") { | |
token = token.substring(0, token.length-1); | |
} | |
$wnd.console.log(token); | |
switch (token) { | |
case 'H': | |
case 'He': | |
case 'Li': | |
case 'Be': | |
case 'B': | |
case 'C': | |
case 'N': | |
case 'O': | |
case 'F': | |
case 'Ne': | |
case 'Na': | |
case 'Mg': | |
case 'Al': | |
case 'Si': | |
case 'P': | |
case 'S': | |
case 'Cl': | |
case 'Ar': | |
case 'K': | |
case 'Ca': | |
case 'Sc': | |
case 'Ti': | |
case 'V': | |
case 'Cr': | |
case 'Mn': | |
case 'Fe': | |
case 'Co': | |
case 'Ni': | |
case 'Cu': | |
case 'Zn': | |
case 'Ga': | |
case 'Ge': | |
case 'As': | |
case 'Se': | |
case 'Br': | |
case 'Kr': | |
case 'Rb': | |
case 'Sr': | |
case 'Y': | |
case 'Zr': | |
case 'Nb': | |
case 'Mo': | |
case 'Tc': | |
case 'Ru': | |
case 'Rh': | |
case 'Pd': | |
case 'Ag': | |
case 'Cd': | |
case 'In': | |
case 'Sn': | |
case 'Sb': | |
case 'Te': | |
case 'I': | |
case 'Xe': | |
case 'Cs': | |
case 'Ba': | |
case 'Hf': | |
case 'Ta': | |
case 'W': | |
case 'Re': | |
case 'Os': | |
case 'Ir': | |
case 'Pt': | |
case 'Au': | |
case 'Hg': | |
case 'Tl': | |
case 'Pb': | |
case 'Bi': | |
case 'Po': | |
case 'At': | |
case 'Rn': | |
case 'Fr': | |
case 'Ra': | |
case 'Rf': | |
case 'Db': | |
case 'Sg': | |
case 'Bh': | |
case 'Hs': | |
case 'Mt': | |
case 'Ds': | |
case 'Rg': | |
case 'Cn': | |
case 'Fl': | |
case 'Lv': | |
case 'La': | |
case 'Ce': | |
case 'Pr': | |
case 'Nd': | |
case 'Pm': | |
case 'Sm': | |
case 'Eu': | |
case 'Gd': | |
case 'Tb': | |
case 'Dy': | |
case 'Ho': | |
case 'Er': | |
case 'Tm': | |
case 'Yb': | |
case 'Lu': | |
case 'Ac': | |
case 'Th': | |
case 'Pa': | |
case 'U': | |
case 'Np': | |
case 'Pu': | |
case 'Am': | |
case 'Cm': | |
case 'Bk': | |
case 'Cf': | |
case 'Es': | |
case 'Fm': | |
case 'Md': | |
case 'No': | |
case 'Lr': | |
return this.makeXmlNode("_" + token); | |
default: | |
if (allowArbitraryAtomNames) { | |
return this.makeXmlNode(token); | |
} | |
return ""; | |
} // end switch | |
} // end if | |
return ""; | |
} // end parseBracketed() | |
} // end beh | |
]]></MoleculeSystembehavior> | |
<Snglbehavior implName="org.primordion.xholon.base.Behavior_gwtjs"><![CDATA[ | |
var bond; | |
var beh = { | |
postConfigure: function() { | |
bond = this.cnode.parent(); | |
//bond.println(bond.toString()); | |
var rnum = bond.val(); | |
if (rnum == 0) { | |
bond.port(0, this.findPreviousAtom(bond.prev())); | |
bond.port(1, bond.next()); | |
} | |
else { | |
var resultNode = this.findMatchingNode(rnum, bond.next()); | |
if (resultNode) { | |
//bond.println(" resultNode: " + resultNode.toString()); | |
bond.port(0, this.findPreviousAtom(bond.prev())); | |
bond.port(1, this.findPreviousAtom(resultNode.prev())); | |
resultNode.remove(); | |
} | |
} | |
this.cnode.remove(); | |
}, | |
// find the next node with the same rnum | |
findMatchingNode: function(rnum, node) { | |
if (node == null) {return null;} | |
if (rnum == node.val()) {return node;} | |
if (node.first()) { | |
var resultNode = this.findMatchingNode(rnum, node.first()); | |
if (resultNode) {return resultNode;} | |
} | |
if (node.next()) {
return this.findMatchingNode(rnum, node.next());
}
return null; // no later node carries this rnum
},
// find a node's previous sibling that's an atom; skip over bond nodes | |
findPreviousAtom: function(node) { | |
while (node != null) { | |
if (node.xhc().parent() && node.xhc().parent().name() == "Bond") { | |
node = node.prev(); | |
} | |
else { | |
return node; | |
} | |
} | |
return null; | |
} | |
} | |
]]></Snglbehavior> | |
<SvgClient><Attribute_String roleName="svgUri"><![CDATA[data:image/svg+xml, | |
<svg width="100" height="50" xmlns="http://www.w3.org/2000/svg"> | |
<g> | |
<title>SMILES</title> | |
<rect id="MoleculeSystem" fill="#98FB98" height="50" width="50" x="25" y="0"/> | |
<g> | |
<title>Carbon</title> | |
<rect id="MoleculeSystem/Molecule/C" fill="#6AB06A" height="50" width="10" x="80" y="0"/> | |
</g> | |
</g> | |
</svg> | |
]]></Attribute_String><Attribute_String roleName="setup">${MODELNAME_DEFAULT},${SVGURI_DEFAULT}</Attribute_String></SvgClient> | |
</XholonWorkbook> |