Skip to content

Instantly share code, notes, and snippets.

@ucarion
Last active December 21, 2015 02:19
Show Gist options
  • Save ucarion/6234600 to your computer and use it in GitHub Desktop.
Save ucarion/6234600 to your computer and use it in GitHub Desktop.
How the project works

How this project works can be separated into three parts:

  1. Precalculating the data
  2. Searching the data
  3. Integrating with ChemAxon

Precalculating the Data

Data is precalculated using the LigandDistanceDataTreeBuilder. This class only generates data for distance interactions between ligands and other sorts of groups, so it does not generate protein-protein data. Right now, it is limited to searching through interactions that are up to 5 angstroms in length. If you want to create your own set of precalculated data, the LigandDistanceDataTreeBuilder is most likely the only file you'll have to modify too much. Add your own rules for what interactions to include and DistanceDataTree will take care of saving the interactions efficiently.

Which brings us to DistanceDataTree. This class is essentially just a tree. It is nested somewhat deeply, but because the nesting is an exact and constant depth, I elected to implement each level of the tree as its own class. So, within the DistanceDataTree class you also have 5 subclasses. They are:

  • OriginGroupTree
  • OriginElementTree
  • OriginAtomNameTree
  • TargetGroupTree
  • TargetElementTree

The terms "origin" and "target" are my own, and in my particular case they tend to be replaceable with the words "ligand" and "residue", respectively. However, I did not name them that way to emphasize to future users that any sort of atom will work for the origin and target data. So what does each level of the tree mean? They mean:

  • DistanceDataTree takes in origin group names ('HEM', 'PO4', etc.) and returns OriginGroupTrees.
  • OriginGroupTree takes in origin elements (Element.O, Element.C, etc.) and returns OriginElementTrees.
  • OriginElementTree takes in origin atom names ('OXT', 'CD1', etc.) and returns OriginAtomNameTrees.
  • OriginAtomNameTree takes in target group names ('ALA', 'TYR', etc.) and returns TargetGroupTrees.
  • TargetGroupTree takes in target elements (Element.O, Element.C, etc.) and returns TargetElementTrees.
  • TargetElementTree takes in target atom names ('OXT', 'CD1', etc.) and returns lists of DistanceResults.

How these classes work is not very complicated but very tedious to explain, so I won't go there. It's basically just a plain old tree.

DistanceResults are the final things stored. They keep track of a distance interaction that took place in some structure. LigandDistanceDataTreeBuilder takes care of generating these guys. Stored in a DistanceResult is the PDB ID where this interaction took place, the serial numbers of the two atoms in the bond, and the distance (rounded down to the tenths place) between the two atoms.

So now that we've covered what data is being stored, here's how we precalculate this stuff.

Our final goal is to save distance results in a very efficient and fast-to-retrieve manner. We will do this by saving them in a directory structure. But first, we must find all distance results in the PDB, and next we must save them to disk. Unfortunately, this first part cannot be done in one fell swoop; the PDB is far too large and you'd run out of memory.

The solution, then, is to calculate all the distance results and save them in serialized distance data trees in groups of a thousand PDB IDs. Then, once you've gone through the whole PDB, you go back and generate the directory structure for the whole PDB, going one thousand-PDB-ID-tree at a time. It takes awhile to do this -- over 24 hours, I'd estimate (I've never done this whole process continuously). Exactly how this is implemented can be seen in the DistanceDataTreeBuilderDriver class.

The directory structure used is not very fancy. It is really the directory counterpart to the DistanceDataTrees we stored them in. The folders are layed out like this:

LigandGroup /
  LigandElement /
    LigandAtomName /
      TargetGroup / 
        TargetElement /
          TargetAtomName.ser

Here, TargetAtomName.ser is a serialized file that compactly saves information about a list of DistanceResults. Each distance result goes on a separate line in that file. Exactly how I save and re-load those DistanceResults is handled by DistanceResult#toSerializedForm() and DistanceResult#parseSerializedResult().

For an example of what the directory looks like, here's a real-life demonstration of what the ligand "ZZZ" has in its directory structure:

ulysse$ tree ZZZ/ | more
ZZZ/
├── C
│   ├── C10
│   │   ├── ASP
│   │   │   ├── N
│   │   │   │   └── N.ser
│   │   │   └── O
│   │   │       └── OD1.ser
│   │   ├── GLY
│   │   │   └── O
│   │   │       └── O.ser
│   │   ├── ILE
│   │   │   └── C
│   │   │       └── CD1.ser
│   │   ├── LEU
│   │   │   └── C
│   │   │       └── CD2.ser
│   │   └── PHE
│   │       └── C
│   │           ├── CD1.ser
│   │           └── CZ.ser
│   ├── C2
│   │   ├── ALA
│   │   │   └── C
│   │   │       └── CB.ser
│   │   ├── ASP
│   │   │   └── O
│   │   │       └── O.ser
│   │   ├── GLY
│   │   │   └── O
│   │   │       └── O.ser
│   │   ├── ILE
│   │   │   ├── C
│   │   │   │   ├── C.ser
│   │   │   │   ├── CA.ser
│   │   │   │   ├── CB.ser
│   │   │   │   ├── CD1.ser
│   │   │   │   └── CG1.ser
│   │   │   ├── N
│   │   │   │   └── N.ser
│   │   │   └── O

The process of saving DistanceDataTrees into directory structures is handled by DistanceDataTreeBuilderDriver. Saving and re-loading DistanceResults is taken care of by DistanceDataTreeSerializer and DistanceResult.

Searching the Data

Once all the data has been saved into the directory structure thanks to DistanceDataTreeSerializer, then searching is done with DistanceDataTreeDirSearcher. What this directory searcher does is it takes in a DistanceQuery and looks through the directory structure for folders and files that match the parameters in the distance query. For instance, if you specify that the interaction you want takes place with a ligand called "XYZ", then we will search through the "XYZ/" directory. If you also specify the element of your ligand is Nitrogen, then we'll search through "XYZ/N/". If you don't specify the atom name of your ligand atom, then we'll search through "XYZ/N/*/" (in other words, we'll search through all directories under "XYZ/N/"). If you specify that the interaction happens on an Alanine, then the directories we'll search through must look like this: "XYZ/N/*/ALA/". Nothing terribly magical is going on here.

Once we reach the bottom of the directory structure, we arrive on a bunch "serialized" DistanceResults. Here's an example of what those serialized distance results look like:

ulysse$ cat HEM/C/C1A/ALA/C/CA.ser
19HC~48~4440~61
19HC~48~4835~2268
1CZJ~48~959~786
2WIY~47~3124~2695
4EP6~49~3066~2668

Though that data might seem a little cryptic, it has a very simple meaning. The data is separated by tildes (I'm not sure why I chose to use tildes), and each portion the data means:

PDB ID of the structure where this interaction happened ~ (Distance (in Angstroms) of the interaction) * 10, rounded down ~ Serial number of the origin atom ~ Serial number of the target atom

For example, the first line, which is 19HC~48~4440~61, indicates that there was an interaction between atom #4440 and atom #61 in structure 19HC. The atoms are approximately 4.8 A away from each other.

In case you're wondering how to reproduce these calculated numbers, here's an example of some method that gets the serialized form of a distance result for an interaction, given the two atoms in the interaction and the structure this interaction is occuring in.

public static String getSerializedForm(Structure structure, Atom origin, Atom target) {
  String pdbID = structure.getPDBCode();
  int distance = (int) (Calc.getDistance(origin, target) * 10);
  String originSerial = origin.getPDBserial();
  String targetSerial = target.getPDBserial();

  return pdbID + "~" + distance + "~" + originSerial + "~" + targetSerial;
}

Note that the directory structure does not index by distance; as I've already explained, the information about the distance at which an interaction occured is stored in the DistanceResult. Thus, if the user has specified a particular distance range they want their interactions to fall into, then the directory structure searcher must open up the file containing the distance results, parse each of them, and then make sure their distances are between the user's specified minimum and maximum distances.

Integrating with ChemAxon

This is where things start to get a little hairy, unfortunately. When I say "integrating by ChemAxon", I mean getting this search tool to work with ChemAxon's MarvinView (an applet for drawing molecules), and getting it to work with the substructure search tool they (ChemAxon) provide. The first part of this section, getting it to work with MarvinSketch, is not something I've found an elegant way of doing. I've attached to this gist the code I've been using to try to extract a distance query from a drawing of a molecule, but as you can see it's not at all a very simple thing to do. As complicated as the specification might be, the source code pretty much speaks for itself here, so I won't talk about this too much. Here are a few things to know about ChemAxon's MarvinSketch if you want to try to modify / use that code:

  1. MarvinSketch exports queries into a type of XML called 'MRV' files. These are human-readable and it helps to look at those files directly when trying to understand what is going on in a query.
  2. When you "add data" in the MarvinSketch GUI, behind the scenes this creates a DataSgroup in the exported 'MRV' file. I look at these quite a bit in my code.
  3. It's possible to instantly get the MRV version of your drawing if you want that -- just go to 'File > Save As ...', and save it as an 'MRV' file.
  4. The final product of this tool would most likely need to have amino acids that have their atom names built-in. To do this, you must first create those molecules in marvin sketch, then save all of your amino acids in an MRV file. Make sure this MRV file is located in a directory where users can simply deliver an XmlHttpRequest to get that data from client-side JavaScript. Once you've put your file in place, then go to chemApplet.jsp, and where there are a bunch of calls to msketch_param you can add a line that goes as
msketch_param("tmpls0", ":My Amino Acids:path/to/your/amino/acids.mrv");

(The directory you provide should be relative to the chemApplet.jsp file.) (The "0" in "tmpls0" means "put this in position 0"; you can change this to "1" or "2" or some other number; if you set it to be the same position value as the one the default amino acids are at, then you can replace the amino acids altogether.)

So once we have the query, the next step is to parse the query to extract the information the user provided. This is done in the two files attached to this gist. In particular, the MarvinQueryParser class takes care of extracting information about the target atom, as well as finding the origin atom as well as finding out what the range of the distance interaction is. Right now, these three things are found using these assumptions:

  1. All atoms without an associated residue are part of the ligand (origin).
  2. The distance bond will have the field name of 'distance'; the attached data will be text such that after the text is split along spaces, the first and last token represent the minimum and maximum distance, respectively.
  3. The origin atom will be the atom in the distance bond that is also a ligand.

We find the associated residue of the atoms using the MolAtom#getResidueType() method. In MRV files, this data looks like this:

<!-- This atom has a nonzero residue type. -->
<atom id="a7" elementType="O" residueType="Ser"
  x2="5.7750000762939475" y2="1.386846438394310" />

<!-- This atom has a residue type of 0. -->
<atom id="a8" elementType="C"
  x2="-3.575000047683716" y2="1.4300000071525574" />

To find the distance property, we use ChemAxon's Molecule#getSgroupArray; an Sgroup takes many forms, including the DataSgroup, which is what we're interested in and is what's created when a user attaches data to something. I look for a data Sgroup that has the field name 'Distance' (capitalization is irrelevant) and then from this extract the minimum and maxmimum distance as well as figure out which atom is the origin atom based on which of the two atoms in the interaction has / hasn't an associated residue type.

Once we have extracted all the data that's possible to get from the query, the hard part begins: running the substructure search and figuring out atom names. This is taken care of in the DistanceQuerySearchTool. This program uses a lot of ChemAxon stuff that's a little weird (and by "weird" I mean "I don't entirely understand what's going on"), but it certainly works and is copied from the LigandDAO search tool in the PDB.

In the distance query search tool, we get the data from MarvinQueryParser, then we make a new JChemSearch that will search for the "origin" molecule (this molecule was aquired from MarvinQueryParser). This will return a list of substructure matches. However, we are in particular interested in finding precisely what atoms were matched in the substructure match. To do this, we must use the MolSearch class, which gives us detailed substructure match information for a given pair of molecules. The MolSearch will provide a MolSearch#findAll() method which returns an int[][].

If we let hits be the value returned from MolSearch#findAll(), then hits[i] represents the i-th substructure match between the two passed atoms (it's possible for the query atom to show up within the target atoms multiple times, and each of those gets its own hit). hits[i] provides information about converting indicies in your query to the corresponding indicies in the MolSearch's target. For example, if you have an atom whose index is '5' in your query, then hits[i][5] will give you the index of that atom in your target molecule.

Now that we have a way of finding out what atoms are being matched in a substructure search, the next step is to figure out the names of the atoms the user provided. This is actually a relatively complicated process that requires that we implement Morgan's algorithm.

Morgan's Algorithm

Morgan's algorithm is a way to identify atoms based on their extended connectivity. If done right, Morgan's algorithm can also allow us to consider most atoms to be different, but in some cases, due to pi-centeredness, consider some atoms to be equivalent (for example, we want to consider the two oxygens in a carboxyl to be equivalent).

Before we can implement Morgan's algorithm, we need to create a graph from a molecule. This needs to be done twice; once for a ChemAxon molecule, and once for a Biojava molecule. Morgan's algorithm will allow us to go back and forth. Implementing the programs to create the graphs from the molecules is not very difficult to do and is done in the files ChemAxonGraphMaker and PDBGraphMaker in the PDBDistances repository.

We then implement Morgan's algorithm. This is done in the Morgan and GraphDriver classes. At a high level, Morgan's algorithm looks like this:

initialize atom values
while the number of unique atom values continues to increase:
  for each atom:
    atom.value += the sum of the values of its neighbors

Once this algorithm stops (because the number of unique values has stopped increasing), then the atoms have acquired their final values. Two atoms with the same values are equivalent.

Initializing the atom values is not very complicated, and this is where some chemical heuristics have to be made. An atom's initial value is calculated as:

Initial Value = Atomic Number * 10 + Pi-Centeredness

Pi-Centeredness is 1 if the atom is in a pi-center, or 0 if it isn't. We determine an atom to be in a pi center if it meets any of these criteria:

  1. It forms a bond with another atom, and that bond has an order greater than 1.
  2. It is electronegative (more on this later), and is single-bonded to an atom that is in turn double-bonded to an electronegative atom.

We determine an atom to be electronegative if its valence electron count is between 5 and 7, inclusive.

That's all there is to it! Now, let's look back at the Distance query pipeline...

... Back to Distance Queries and ChemAxon

You'll recall that at this point we have found our substructure hits and we have a way of getting which atoms matched in a substructure search. So next, we apply morgan's algorithm to both the PDB's version of the match (which will also have atom names) and the ChemAxon's version of the match, and use the atom values to convert MolAtoms to atom names. The code sort of speaks for itself in this case. Once we have the origin atom name(s), we go through each possible name and create a distance query for each. Then, we just run the distance query we've created, and presto magico! the program works.

Attached are the two files that are not part of PDBDistances. These would go in PDB code or something.

Afterword: Adding amino acids with atom names to the PDB site.

As I've mentioned before, the code for setting template libraries goes like this:

msketch_param("tmpls0", ":My Amino Acids:path/to/your/amino/acids.mrv");

But what do we put in that MRV file? Well, what you could do is modify the attached allAminoAcidsTemplate.mrv file, which contains an MRV file with all the amino acids (this MRV file is a valid MarvinSketch template library, and you can import it in MarvinSketch by going to "Insert > Template Library" and then clicking on the book with the green "plus" icon). Then, you could open these up in MarvinView and modify the atoms by right-clicking on an atom, going to "Edit Properties", then add a "Name" property with the appropriate name as a value. Then, once you've finished working on an amino acid, save that atom as a MRV file, and replace your template with the new version of the amino acids with the atom names attached.

I haven't actually tried this, so I imagine complications may arise.

Of course, the more proper way to do this would be to run Morgan's algorithm on the amino acid as well, and figure out atom names that way. As a bonus, this method will also identify equivalent atoms in an amino acid too.

<?xml version="1.0" encoding="UTF-8"?>
<cml version="ChemAxon file format v6.0, generated by v6.0.0">
<MDocument>
<MChemicalStruct>
<molecule title="L-alanine " molID="m1" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6" elementType="N C C C O O" x2="1.5399999999999987 2.3099999999999996 3.8499999999999996 1.54 2.3100000000000005 0.0" y2="2.667358243656071 1.3336791218280357 1.3336791218280368 0.0 -1.3336791218280353 0.0" />
<bondArray>
<bond atomRefs2="a2 a1" order="1">
<bondStereo>W</bondStereo>
</bond>
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a2 a4" order="1" />
<bond atomRefs2="a4 a5" order="1" />
<bond atomRefs2="a4 a6" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-arginine" molID="m2" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12" elementType="N C C C C N C N N C O O" x2="13.598395609140175 13.598395609140175 12.264716487312139 10.931037365484105 9.597358243656068 8.263679121828034 6.929999999999998 5.596320878171962 6.929999999999996 14.93207473096821 16.265753852796244 14.93207473096821" y2="6.311037365484106 4.771037365484106 4.001037365484105 4.771037365484107 4.001037365484106 4.771037365484107 4.001037365484107 4.771037365484108 2.461037365484107 4.001037365484105 4.771037365484105 2.4610373654841053" />
<bondArray>
<bond atomRefs2="a2 a1" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="1" />
<bond atomRefs2="a5 a6" order="1" />
<bond atomRefs2="a6 a7" order="1" />
<bond atomRefs2="a7 a8" order="1" />
<bond atomRefs2="a7 a9" order="2" />
<bond atomRefs2="a2 a10" order="1" />
<bond atomRefs2="a10 a11" order="1" />
<bond atomRefs2="a10 a12" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-asparagine " molID="m3" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9" elementType="N C C C N O C O O" x2="3.6436791218280353 3.643679121828035 2.309999999999999 0.9763208781719639 -0.35735824365607183 0.976320878171963 4.97735824365607 6.311037365484106 4.977358243656069" y2="3.6436791218280358 2.1036791218280357 1.3336791218280357 2.103679121828036 1.3336791218280362 3.643679121828036 1.3336791218280353 2.1036791218280353 -0.20632087817196476" />
<bondArray>
<bond atomRefs2="a2 a1" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="1" />
<bond atomRefs2="a4 a6" order="2" />
<bond atomRefs2="a2 a7" order="1" />
<bond atomRefs2="a7 a8" order="1" />
<bond atomRefs2="a7 a9" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-aspartic acid " molID="m4" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9" elementType="N C C C O O C O O" x2="3.6436791218280353 3.643679121828035 2.309999999999999 0.9763208781719639 -0.35735824365607183 0.976320878171963 4.97735824365607 6.311037365484106 4.977358243656069" y2="3.6436791218280358 2.1036791218280357 1.3336791218280357 2.103679121828036 1.3336791218280362 3.643679121828036 1.3336791218280353 2.1036791218280353 -0.20632087817196476" />
<bondArray>
<bond atomRefs2="a2 a1" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="1" />
<bond atomRefs2="a4 a6" order="2" />
<bond atomRefs2="a2 a7" order="1" />
<bond atomRefs2="a7 a8" order="1" />
<bond atomRefs2="a7 a9" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-cysteine " molID="m5" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7" elementType="N C C S C O O" x2="3.8499999999999996 2.3099999999999996 1.5399999999999987 2.3099999999999983 1.54 2.3100000000000005 0.0" y2="1.3336791218280368 1.3336791218280357 2.667358243656071 4.001037365484107 0.0 -1.3336791218280353 0.0" />
<bondArray>
<bond atomRefs2="a2 a1" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a2 a5" order="1" />
<bond atomRefs2="a5 a6" order="1" />
<bond atomRefs2="a5 a7" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-glutamine " molID="m6" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10" elementType="N C C C C N O C O O" x2="8.621037365484105 8.621037365484105 7.287358243656069 5.953679121828034 4.619999999999998 3.2863208781719635 4.619999999999997 9.95471648731214 11.288395609140176 9.95471648731214" y2="4.977358243656071 3.437358243656071 2.667358243656071 3.4373582436560715 2.6673582436560714 3.4373582436560723 1.1273582436560718 2.6673582436560705 3.4373582436560706 1.1273582436560705" />
<bondArray>
<bond atomRefs2="a2 a1" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="1" />
<bond atomRefs2="a5 a6" order="1" />
<bond atomRefs2="a5 a7" order="2" />
<bond atomRefs2="a2 a8" order="1" />
<bond atomRefs2="a8 a9" order="1" />
<bond atomRefs2="a8 a10" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-glutamic acid" molID="m7" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10" elementType="N C C C C O O C O O" x2="8.621037365484105 8.621037365484105 7.287358243656069 5.953679121828034 4.619999999999998 3.2863208781719635 4.619999999999997 9.95471648731214 11.288395609140176 9.95471648731214" y2="4.977358243656071 3.437358243656071 2.667358243656071 3.4373582436560715 2.6673582436560714 3.4373582436560723 1.1273582436560718 2.6673582436560705 3.4373582436560706 1.1273582436560705" />
<bondArray>
<bond atomRefs2="a2 a1" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="1" />
<bond atomRefs2="a5 a6" order="1" />
<bond atomRefs2="a5 a7" order="2" />
<bond atomRefs2="a2 a8" order="1" />
<bond atomRefs2="a8 a9" order="1" />
<bond atomRefs2="a8 a10" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-glycine" molID="m8">
<propertyList>
<property dictRef="type" title="type">
<scalar>common</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5" elementType="N C C O O" x2="1.5399999999999987 2.3099999999999996 1.54 2.3100000000000005 0.0" y2="2.667358243656071 1.3336791218280357 0.0 -1.3336791218280353 0.0" />
<bondArray>
<bond atomRefs2="a1 a2" order="1" />
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a3 a5" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-histidine" molID="m9" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11" elementType="N C C C C N C N C O O" x2="-1.6075441648932682 -0.2738650430652325 1.0598140787628032 1.0598140787628036 2.3057002501002226 1.8298140787628048 0.2898140787628045 -0.18607209257461532 -0.2738650430652325 -1.6075441648932682 1.0598140787628034" y2="3.6200022448621403 4.39000224486214 3.620002244862141 2.0800022448621416 1.1748129563317335 -0.2898140787628034 -0.2898140787628045 1.174812956331732 5.930002244862141 6.700002244862141 6.700002244862141" />
<bondArray>
<bond atomRefs2="a2 a1" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="2" />
<bond atomRefs2="a5 a6" order="1" />
<bond atomRefs2="a6 a7" order="2" />
<bond atomRefs2="a7 a8" order="1" />
<bond atomRefs2="a4 a8" order="1" />
<bond atomRefs2="a2 a9" order="1" />
<bond atomRefs2="a9 a10" order="1" />
<bond atomRefs2="a9 a11" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-isoleucine" molID="m10" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9" elementType="C C C C C N C O O" x2="-0.35735824365607183 0.9763208781719639 2.309999999999999 2.3099999999999987 3.643679121828035 3.6436791218280353 4.97735824365607 6.311037365484106 4.977358243656069" y2="1.3336791218280362 2.103679121828036 1.3336791218280357 -0.20632087817196432 2.1036791218280357 3.6436791218280358 1.3336791218280353 2.1036791218280353 -0.20632087817196476" />
<bondArray>
<bond atomRefs2="a1 a2" order="1" />
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1">
<bondStereo>W</bondStereo>
</bond>
<bond atomRefs2="a3 a5" order="1" />
<bond atomRefs2="a5 a6" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a5 a7" order="1" />
<bond atomRefs2="a7 a8" order="1" />
<bond atomRefs2="a7 a9" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-leucine" molID="m11" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9" elementType="C C C C C N C O O" x2="-0.35735824365607183 0.9763208781719639 0.976320878171963 2.309999999999999 3.643679121828035 3.6436791218280353 4.97735824365607 6.311037365484106 4.977358243656069" y2="1.3336791218280362 2.103679121828036 3.643679121828036 1.3336791218280357 2.1036791218280357 3.6436791218280358 1.3336791218280353 2.1036791218280353 -0.20632087817196476" />
<bondArray>
<bond atomRefs2="a1 a2" order="1" />
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a2 a4" order="1" />
<bond atomRefs2="a4 a5" order="1" />
<bond atomRefs2="a5 a6" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a5 a7" order="1" />
<bond atomRefs2="a7 a8" order="1" />
<bond atomRefs2="a7 a9" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-lysine" molID="m12" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10" elementType="N C C C C C N C O O" x2="1.9526417563439278 3.2863208781719635 4.619999999999998 5.953679121828034 7.287358243656069 8.621037365484105 8.621037365484105 9.95471648731214 11.288395609140176 9.95471648731214" y2="2.6673582436560723 3.4373582436560723 2.6673582436560714 3.4373582436560715 2.667358243656071 3.437358243656071 4.977358243656071 2.6673582436560705 3.4373582436560706 1.1273582436560705" />
<bondArray>
<bond atomRefs2="a1 a2" order="1" />
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="1" />
<bond atomRefs2="a5 a6" order="1" />
<bond atomRefs2="a6 a7" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a6 a8" order="1" />
<bond atomRefs2="a8 a9" order="1" />
<bond atomRefs2="a8 a10" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-methionine" molID="m13" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9" elementType="C S C C C N C O O" x2="3.2863208781719635 4.619999999999998 5.953679121828034 7.287358243656069 8.621037365484105 8.621037365484105 9.95471648731214 11.288395609140176 9.95471648731214" y2="3.4373582436560723 2.6673582436560714 3.4373582436560715 2.667358243656071 3.437358243656071 4.977358243656071 2.6673582436560705 3.4373582436560706 1.1273582436560705" />
<bondArray>
<bond atomRefs2="a1 a2" order="1" />
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="1" />
<bond atomRefs2="a5 a6" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a5 a7" order="1" />
<bond atomRefs2="a7 a8" order="1" />
<bond atomRefs2="a7 a9" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-phenylalanine" molID="m14" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12" elementType="N C C C C C C C C C O O" x2="-1.3336791218280317 4.440892098500626E-15 1.333679121828039 1.3336791218280368 2.6673582436560745 2.6673582436560714 1.3336791218280346 0.0 1.3322676295501878E-15 7.105427357601002E-15 -1.3336791218280273 1.333679121828044" y2="3.850000000000005 4.620000000000003 3.850000000000002 2.310000000000002 1.5399999999999978 -2.220446049250313E-15 -0.77 1.9984014443252818E-15 1.5400000000000018 6.160000000000004 6.930000000000007 6.9300000000000015" />
<bondArray>
<bond atomRefs2="a2 a1" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="2" />
<bond atomRefs2="a5 a6" order="1" />
<bond atomRefs2="a6 a7" order="2" />
<bond atomRefs2="a7 a8" order="1" />
<bond atomRefs2="a8 a9" order="2" />
<bond atomRefs2="a4 a9" order="1" />
<bond atomRefs2="a2 a10" order="1" />
<bond atomRefs2="a10 a11" order="1" />
<bond atomRefs2="a10 a12" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-proline" molID="m15" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8" elementType="N C C C C C O O" x2="2.305700250100223 1.8298140787628037 0.28981407876280363 -0.18607209257461577 1.0598140787628036 1.0598140787628039 -0.27386504306523163 2.39349320059084" y2="1.1748129563317322 -0.28981407876280374 -0.28981407876280396 1.1748129563317327 2.0800022448621416 3.6200022448621416 4.390002244862141 4.39000224486214" />
<bondArray>
<bond atomRefs2="a1 a2" order="1" />
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="1" />
<bond atomRefs2="a1 a5" order="1" />
<bond atomRefs2="a5 a6" order="1">
<bondStereo>W</bondStereo>
</bond>
<bond atomRefs2="a6 a7" order="1" />
<bond atomRefs2="a6 a8" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-serine" molID="m16" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7" elementType="N C C O C O O" x2="3.8499999999999996 2.3099999999999996 1.5399999999999987 2.3099999999999983 1.54 2.3100000000000005 0.0" y2="1.3336791218280368 1.3336791218280357 2.667358243656071 4.001037365484107 0.0 -1.3336791218280353 0.0" />
<bondArray>
<bond atomRefs2="a2 a1" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a2 a5" order="1" />
<bond atomRefs2="a5 a6" order="1" />
<bond atomRefs2="a5 a7" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-threonine" molID="m17" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8" elementType="C C O C N C O O" x2="2.3099999999999983 1.5399999999999987 -1.3322676295501878E-15 2.3099999999999996 3.8499999999999996 1.54 2.3100000000000005 0.0" y2="4.001037365484107 2.667358243656071 2.667358243656071 1.3336791218280357 1.3336791218280368 0.0 -1.3336791218280353 0.0" />
<bondArray>
<bond atomRefs2="a1 a2" order="1" />
<bond atomRefs2="a2 a3" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a2 a4" order="1" />
<bond atomRefs2="a4 a5" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a4 a6" order="1" />
<bond atomRefs2="a6 a7" order="1" />
<bond atomRefs2="a6 a8" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-tryptophan" molID="m18" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15" elementType="N C C C C N C C C C C C C O O" x2="-3.280575045059234 -2.250113911266592 -0.7437666061365313 -0.2678804347991126 -1.173069723329521 -0.2678804347991135 1.1967466002954232 1.1967466002954241 2.5304257221234607 3.864104843951496 3.8641048439514947 2.530425722123459 -2.726000082604011 -4.232347387734071 -1.6955389488113686" y2="0.5834282635591262 1.7278712947943133 1.4076872909349638 -0.05693974415957259 -1.302825915496991 -2.5487120868344113 -2.0728259154969924 -0.5328259154969925 0.2371740845030097 -0.5328259154969918 -2.072825915496992 -2.842825915496992 3.1924983298888505 3.5126823337482 4.336941361124037" />
<bondArray>
<bond atomRefs2="a2 a1" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="2" />
<bond atomRefs2="a5 a6" order="1" />
<bond atomRefs2="a6 a7" order="1" />
<bond atomRefs2="a7 a8" order="2" />
<bond atomRefs2="a4 a8" order="1" />
<bond atomRefs2="a8 a9" order="1" />
<bond atomRefs2="a9 a10" order="2" />
<bond atomRefs2="a10 a11" order="1" />
<bond atomRefs2="a11 a12" order="2" />
<bond atomRefs2="a7 a12" order="1" />
<bond atomRefs2="a2 a13" order="1" />
<bond atomRefs2="a13 a14" order="1" />
<bond atomRefs2="a13 a15" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-tyrosine" molID="m19" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13" elementType="N C C C C C C O C C C O O" x2="-1.3336791218280317 4.440892098500626E-15 1.333679121828039 1.3336791218280368 2.6673582436560745 2.6673582436560714 1.3336791218280346 1.3336791218280317 0.0 1.3322676295501878E-15 7.105427357601002E-15 -1.3336791218280273 1.333679121828044" y2="3.850000000000005 4.620000000000003 3.850000000000002 2.310000000000002 1.5399999999999978 -2.220446049250313E-15 -0.77 -2.31 1.9984014443252818E-15 1.5400000000000018 6.160000000000004 6.930000000000007 6.9300000000000015" />
<bondArray>
<bond atomRefs2="a2 a1" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="2" />
<bond atomRefs2="a5 a6" order="1" />
<bond atomRefs2="a6 a7" order="2" />
<bond atomRefs2="a7 a8" order="1" />
<bond atomRefs2="a7 a9" order="1" />
<bond atomRefs2="a9 a10" order="2" />
<bond atomRefs2="a4 a10" order="1" />
<bond atomRefs2="a2 a11" order="1" />
<bond atomRefs2="a11 a12" order="1" />
<bond atomRefs2="a11 a13" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
<MDocument>
<MChemicalStruct>
<molecule title="L-valine " molID="m20" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8" elementType="C C C C N C O O" x2="2.3099999999999983 1.5399999999999987 -1.3322676295501878E-15 2.3099999999999996 3.8499999999999996 1.54 2.3100000000000005 0.0" y2="4.001037365484107 2.667358243656071 2.667358243656071 1.3336791218280357 1.3336791218280368 0.0 -1.3336791218280353 0.0" />
<bondArray>
<bond atomRefs2="a1 a2" order="1" />
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a2 a4" order="1" />
<bond atomRefs2="a4 a5" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a4 a6" order="1" />
<bond atomRefs2="a6 a7" order="1" />
<bond atomRefs2="a6 a8" order="2" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
</cml>
<?xml version="1.0" ?>
<cml>
<MDocument>
<MChemicalStruct>
<molecule title="L-serine" molID="m1" absStereo="true">
<propertyList>
<property dictRef="type" title="type">
<scalar>systematic</scalar>
</property>
</propertyList>
<atomArray>
<atom id="a1" elementType="N" residueType="Ser"
x2="2.695000076293945" y2="4.054204682050381" />
<atom id="a2" elementType="C" residueType="Ser"
x2="3.4650000762939444" y2="2.720525560222345" />
<atom id="a3" elementType="C" residueType="Ser"
x2="2.6950000762939466" y2="1.3868464383943078" />
<atom id="a4" elementType="O" residueType="Ser" sgroupRef="sg1"
x2="1.1550000762939439" y2="1.386846438394306">
<scalar id="a13:prop1" title="Name" convention="marvin:atomprop" dataType="xsd:string" value="OG" />
</atom>
<atom id="a5" elementType="C" residueType="Ser"
x2="5.005000076293947" y2="2.720525560222346" />
<atom id="a6" elementType="O" residueType="Ser"
x2="5.775000076293947" y2="4.054204682050383" />
<atom id="a7" elementType="O" residueType="Ser"
x2="5.7750000762939475" y2="1.386846438394310" />
<atom id="a8" elementType="C"
x2="-3.575000047683716" y2="1.4300000071525574" />
<atom id="a9" elementType="C"
x2="-5.115000047683716" y2="1.4300000071525574" />
<atom id="a10" elementType="C"
x2="-6.203944490710999" y2="0.3410555641252744" />
<atom id="a11" elementType="O"
x2="-5.115000047683716" y2="-0.7478888789020088" />
<atom id="a12" elementType="O"
x2="-2.486055604656433" y2="0.3410555641252744" />
<atom id="a13" elementType="O" sgroupRef="sg1"
x2="-2.486055604656433" y2="2.5189444501798404" />
</atomArray>
<bondArray>
<bond atomRefs2="a2 a1" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a2 a5" order="1" />
<bond atomRefs2="a5 a6" order="1" />
<bond atomRefs2="a5 a7" order="2" />
<bond atomRefs2="a8 a9" order="1" />
<bond atomRefs2="a9 a10" order="1" />
<bond atomRefs2="a10 a11" order="1" />
<bond atomRefs2="a8 a12" order="1" />
<bond atomRefs2="a8 a13" order="2" />
<bond atomRefs2="a13 a4" order="1" queryType="Any" />
</bondArray>
<molecule id="sg1" role="DataSgroup" molID="m2"
atomRefs="a13 a4" fieldName="Distance"
x="-1.7160556046564328" y="3.2889444501798404" unitsDisplayed="Unit displayed" context="Bond"
fieldData="2.5 - 2.9"
/>
</molecule>
</MChemicalStruct>
</MDocument>
</cml>
package org.pdb.query.simple.distance;
import java.io.File;
import java.io.IOException;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
import org.apache.commons.io.FileUtils;
import org.biojava.bio.structure.Element;
import org.hibernate.StatelessSession;
import org.pdb.ormapping.util.HibernateUtils;
import chemaxon.jchem.db.CacheRegistrationUtil;
import chemaxon.jchem.db.JChemSearch;
import chemaxon.license.LicenseManager;
import chemaxon.license.LicenseProcessingException;
import chemaxon.sss.search.JChemSearchOptions;
import chemaxon.sss.search.MolSearch;
import chemaxon.sss.search.SearchException;
import chemaxon.struc.MolAtom;
import chemaxon.struc.Molecule;
import chemaxon.util.ConnectionHandler;
import com.ulyssecarion.pdb.distances.DistanceDataTreeDirSearcher;
import com.ulyssecarion.pdb.distances.DistanceQuery;
import com.ulyssecarion.pdb.distances.DistanceQuery.DistanceQueryBuilder;
import com.ulyssecarion.pdb.distances.DistanceResult;
import com.ulyssecarion.pdb.morgan.graph.ChemAxonGraphMaker;
import com.ulyssecarion.pdb.morgan.graph.GraphDriver.ChemAxonAtom;
import com.ulyssecarion.pdb.morgan.graph.GraphDriver.PDBAtom;
import com.ulyssecarion.pdb.morgan.graph.Morgan;
import com.ulyssecarion.pdb.morgan.graph.PDBGraphMaker;
public class DistanceQuerySearchTool {
static {
try {
LicenseManager
.setLicenseFile("/Users/ulysse/Documents/workspace/pdbinabox/licences/chemaxon/license.cxl");
} catch (LicenseProcessingException e) {
e.printStackTrace();
}
initCache();
}
public static void main(String[] args) throws Exception {
String s = null;
try {
s = FileUtils.readFileToString(new File(
"/Users/ulysse/marvin/distanceQueryFinal.mrv"));
} catch (IOException e) {
e.printStackTrace();
}
runPlainSSS(s);
}
public static void runPlainSSS(String marvinQuery) {
MarvinQueryParser marvinQueryParser = new MarvinQueryParser(marvinQuery);
Molecule ligand = marvinQueryParser.getLigand();
DistanceQueryBuilder queryBuilder = marvinQueryParser.getQueryBuilder();
MolAtom origin = marvinQueryParser.getOrigin();
int indexOfOrigin = ligand.indexOf(origin);
JChemSearch chemSearch = getSearcher();
chemSearch.setQueryStructure(ligand);
try {
chemSearch.run();
} catch (Exception e) {
e.printStackTrace();
}
Molecule[] molecules = null;
try {
molecules = chemSearch.getHitsAsMolecules(chemSearch.getResults(),
null, null, null);
} catch (Exception e) {
e.printStackTrace();
}
System.out.println("Number of hits: " + molecules);
for (Molecule subStructMatch : molecules) {
if (subStructMatch.getName().length() > 3) {
System.out.println("Skipping " + subStructMatch.getName()
+ "; it doesn't look like a ligand.");
continue;
}
System.out.println(subStructMatch.getName());
MolSearch molSearch = new MolSearch();
molSearch.getSearchOptions().setKeepQueryOrder(true);
molSearch.setQuery(ligand);
molSearch.setTarget(subStructMatch);
long start = System.nanoTime();
List<PDBAtom> pdbGraph = PDBGraphMaker.getGraph(subStructMatch
.getName());
List<ChemAxonAtom> cmxGraph = ChemAxonGraphMaker
.getGraph(subStructMatch);
long stop = System.nanoTime();
System.out.println("GRAPH: " + (stop - start));
start = System.nanoTime();
Morgan.morganify(pdbGraph);
Morgan.morganify(cmxGraph);
stop = System.nanoTime();
System.out.println("MORGAN: " + (stop - start));
start = System.nanoTime();
int[][] hits = null;
try {
hits = molSearch.findAll();
} catch (SearchException e) {
e.printStackTrace();
}
stop = System.nanoTime();
System.out.println("MOLSEARCH: " + (stop - start));
for (int i = 0; i < hits.length; i++) {
int[] hit = hits[i];
int atomOfInterestCDID = hit[indexOfOrigin];
MolAtom atomOfInterestInLigand = subStructMatch
.getAtom(atomOfInterestCDID);
List<String> potentialAtomNames = Morgan.getAtomNamesOfMolAtom(
atomOfInterestInLigand, cmxGraph, pdbGraph);
for (String originAtomName : potentialAtomNames) {
// everything is preconfigured except the info about the
// origin
queryBuilder.originAtom(originAtomName);
queryBuilder.originElement(Element.valueOfIgnoreCase(origin
.getSymbol()));
queryBuilder.originGroup(subStructMatch.getName());
DistanceQuery query = queryBuilder.build();
start = System.nanoTime();
List<DistanceResult> results = DistanceDataTreeDirSearcher
.search(query);
stop = System.nanoTime();
System.out.println("DISTANCE: " + (stop - start));
for (DistanceResult result : results) {
System.out.println(result);
}
}
}
}
}
/*
* All code after this is boilerplate code for getting connections with
* ChemAxon to work correctly.
*/
@SuppressWarnings("deprecation")
private static JChemSearch getSearcher() {
StatelessSession session = HibernateUtils.getStatelessSession();
ConnectionHandler connectionHandler = new ConnectionHandler();
connectionHandler.setConnection(session.connection());
JChemSearch searcher = new JChemSearch();
searcher.setConnectionHandler(connectionHandler);
searcher.setStructureTable("jchem_named_ligands");
searcher.setRunMode(JChemSearch.RUN_MODE_SYNCH_COMPLETE);
JChemSearchOptions searchOptions = new JChemSearchOptions();
searchOptions.setSearchType(JChemSearch.SUBSTRUCTURE);
searcher.setSearchOptions(searchOptions);
return searcher;
}
/**
* During a thunderous storm did one day the clouds part, and did descend
* the Great Almighty, and He commandeth the programmer:
*
* " Ye shall use the initCache() method, and ye shall not question it, for
* its majesty is beyond the understanding of mere mortals! "
*
* And thus did the ChemAxon API work, and there was much rejoicing.
*
* TL;DR: DO NOT TOUCH THIS METHOD. IT JUST WORKS.
*/
private static void initCache() {
try {
String identifier = "unique_cache_identifier";
CacheRegistrationUtil.setPermanentCacheID(identifier);
StatelessSession sess = HibernateUtils.getStatelessSession();
Connection conn = sess.connection();
conn.setAutoCommit(true);
ConnectionHandler ch = new ConnectionHandler();
ch.setConnection(conn);
CacheRegistrationUtil cru = new CacheRegistrationUtil(ch);
cru.unRegisterCache();
cru.registerCache();
} catch (SQLException e) {
e.printStackTrace();
}
}
}
package org.pdb.query.simple.distance;
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import org.biojava.bio.structure.Element;
import chemaxon.formats.MolExporter;
import chemaxon.formats.MolFormatException;
import chemaxon.formats.MolImporter;
import chemaxon.struc.MolAtom;
import chemaxon.struc.MolBond;
import chemaxon.struc.Molecule;
import chemaxon.struc.Sgroup;
import chemaxon.struc.sgroup.DataSgroup;
import com.ulyssecarion.pdb.distances.DistanceQuery;
import com.ulyssecarion.pdb.distances.DistanceQuery.DistanceQueryBuilder;
public class MarvinQueryParser {
/**
* The field name users should attach to their distance bond. This value is
* case-insensitive.
*/
private static final String DISTANCE_FIELD_NAME = "distance";
private DistanceQueryBuilder queryBuilder;
private Molecule ligand;
private MolAtom origin;
public static void main(String[] args) {
String s = null;
try {
s = FileUtils.readFileToString(new File(
"/Users/ulysse/marvin/distanceQueryFinal.mrv"));
} catch (IOException e) {
e.printStackTrace();
}
MarvinQueryParser mqp = new MarvinQueryParser(s);
Molecule ligand = mqp.getLigand();
MolAtom origin = mqp.getOrigin();
DistanceQuery dq = mqp.getQueryBuilder().build();
try {
System.out.println(MolExporter.exportToFormat(ligand, "smiles"));
} catch (Exception e) {
e.printStackTrace();
}
System.out.println(origin);
System.out.println(dq);
System.out.println(ligand.indexOf(origin));
}
public MarvinQueryParser(String marvin) {
Molecule m = null;
try {
m = MolImporter.importMol(marvin, "mrv");
} catch (MolFormatException e) {
e.printStackTrace();
}
getBuilderWithTarget(m);
getLigand(m);
}
public DistanceQueryBuilder getQueryBuilder() {
return queryBuilder;
}
public Molecule getLigand() {
return ligand;
}
public MolAtom getOrigin() {
return origin;
}
/*
* Sets the origin and queryBuilder.
*/
private void getBuilderWithTarget(Molecule m) {
queryBuilder = new DistanceQueryBuilder();
for (Sgroup sGroup : m.getSgroupArray()) {
if (sGroup.getType() == Sgroup.ST_DATA) {
DataSgroup dataSGroup = (DataSgroup) sGroup;
if (dataSGroup.getFieldName().equalsIgnoreCase(
DISTANCE_FIELD_NAME)) {
if (dataSGroup.getAtomCount() != 2) {
throw new RuntimeException(
"Distance query should be between exactly two atoms!");
}
MolAtom a = dataSGroup.getAtom(0);
MolAtom b = dataSGroup.getAtom(1);
String nameA = (String) a.getProperty("Name");
String nameB = (String) b.getProperty("Name");
if (nameA == null && nameB == null || nameA != null
&& nameB != null) {
throw new RuntimeException(
"Distance query should be between a ligand and an amino acid!");
}
MolAtom amino = nameA == null ? b : a;
origin = amino == a ? b : a;
double[] minMax = getMinMax(dataSGroup);
queryBuilder.minDistance(minMax[0]);
queryBuilder.maxDistance(minMax[1]);
String residue = Molecule.residueSymbolOf(
amino.getResidueType()).toUpperCase();
queryBuilder.targetGroup(residue);
Element element = Element.valueOfIgnoreCase(amino
.getSymbol());
queryBuilder.targetElement(element);
String atomName = (String) amino.getProperty("Name");
queryBuilder.targetAtom(atomName);
}
}
}
}
/*
* Sets the ligand.
*/
private void getLigand(Molecule m) {
ligand = new Molecule();
for (MolAtom atom : m.getAtomArray()) {
if (atom.getResidueType() == 0) {
ligand.add(atom);
}
}
for (MolBond bond : m.getBondArray()) {
if (ligand.contains(bond.getAtom1())
&& ligand.contains(bond.getAtom2())) {
ligand.add(bond);
}
}
}
private static double[] getMinMax(DataSgroup distance) {
String[] data = distance.getData().split(" ");
double min = Double.parseDouble(data[0]);
double max = Double.parseDouble(data[data.length - 1]);
return new double[] { min, max };
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment