@datablend
Created January 29, 2013 17:53
RandomAccessMDLReader reader = new RandomAccessMDLReader(new File(...));
EncodingFingerprint fingerprinter = new Encoding2DMolprint();
// Assumption: the gist does not show how jedis and fingerprintlist are declared;
// a local Redis connection and a plain list are used here for illustration
Jedis jedis = new Jedis("localhost");
List<String> fingerprintlist = new ArrayList<String>();
// Use a pipeline to speed up persisting: commands are buffered and sent in batches
Pipeline p = jedis.pipelined();
// Iterate the compounds one by one
for (int i = 0; i < reader.getSize(); i++) {
    // Retrieve the molecule and the fingerprints for this molecule
    Molecule molecule = reader.getMol(i);
    FeatureMap fingerprints = new FeatureMap(fingerprinter.getFingerprint(molecule));
    // Retrieve the compound properties we want to use later on
    String compound_cid = (String) molecule.getProperty("PUBCHEM_COMPOUND_CID");
    // Iterate the fingerprints
    for (IFeature fingerprint : fingerprints.getKeySet()) {
        // Check whether we already encountered the feature and register it if not
        String thefeaturestring = fingerprint.featureToString();
        if (!fingerprintlist.contains(thefeaturestring)) {
            fingerprintlist.add(thefeaturestring);
        }
        // Get the index of the fingerprint (its position in the list)
        int fingerprintindex = fingerprintlist.indexOf(thefeaturestring);
        // Increment the weight of this fingerprint (number of occurrences)
        p.incr(fingerprintindex + ":w");
        // Create the inverted indexes
        // Add the fingerprint to the set of fingerprints of this compound
        p.sadd(compound_cid + ":f", String.valueOf(fingerprintindex));
        // Add the compound to the set of compounds of this fingerprint
        p.sadd(fingerprintindex + ":c", compound_cid);
    }
    // Send the buffered commands for this compound to Redis
    p.sync();
}
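
The loop above looks up each feature with `contains` and `indexOf` on a list, and both calls scan the list linearly, so the cost grows with the number of distinct features seen so far. A minimal sketch of an alternative, assuming the only requirement is a stable integer index per distinct feature string, keeps a hash map from feature string to index. The class name `FeatureIndex` and the sample feature strings are hypothetical and are not part of the original gist.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: assigns a stable integer index to each distinct feature string. */
class FeatureIndex {
    private final Map<String, Integer> indexByFeature = new HashMap<>();

    /** Returns the index of the feature, assigning the next free index on first sight. */
    int indexOf(String feature) {
        Integer existing = indexByFeature.get(feature);
        if (existing != null) {
            return existing;
        }
        // The next free index equals the number of features registered so far,
        // which matches the position the feature would get in an append-only list.
        int next = indexByFeature.size();
        indexByFeature.put(feature, next);
        return next;
    }

    /** Number of distinct features registered so far. */
    int size() {
        return indexByFeature.size();
    }

    public static void main(String[] args) {
        FeatureIndex index = new FeatureIndex();
        // Made-up feature strings, for illustration only
        System.out.println(index.indexOf("feature-a"));
        System.out.println(index.indexOf("feature-b"));
        System.out.println(index.indexOf("feature-a"));
    }
}
```

In the loop, `int fingerprintindex = index.indexOf(thefeaturestring);` would then replace the `contains`, `add`, and `indexOf` calls, while the Redis keys (`<index>:w`, `<cid>:f`, `<index>:c`) stay unchanged.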