asad /
Created May 28, 2017 11:01 — forked from sujaikumar/
UniRef90 protein BLAST database with taxon IDs


  • To create UniRef90 protein databases for NCBI BLAST and DIAMOND
  • To create a tab-delimited taxid mapping file with two columns: sequenceID\tNCBITaxonID


Download the UniRef90 XML file first (warning: this is ~15 GB and will take a while).
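
As a rough illustration of the taxid mapping goal above, the sketch below streams a UniRef XML file and emits sequenceID and taxon ID pairs. It assumes each <entry> element carries its cluster ID in an `id` attribute and its taxon in a child <property type="common taxon ID" value="..."/> element; check this against the downloaded file before relying on it. The function name `iter_taxid_rows` and the two sample records are hypothetical, and this is a Python stand-in, not necessarily the tooling the original gist uses.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical two-record sample, shaped like the assumed UniRef XML layout.
SAMPLE = b"""<UniRef90 xmlns="http://uniprot.org/uniref">
  <entry id="UniRef90_TEST1">
    <name>Cluster: test protein 1</name>
    <property type="member count" value="1"/>
    <property type="common taxon ID" value="9606"/>
  </entry>
  <entry id="UniRef90_TEST2">
    <name>Cluster: test protein 2</name>
    <property type="common taxon ID" value="562"/>
  </entry>
</UniRef90>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_taxid_rows(xml_source):
    """Yield (sequence_id, taxon_id) pairs from a UniRef-style XML stream.

    xml_source may be a file path or a binary file object. Entries are
    handled one at a time so the multi-GB file is never held as one tree.
    """
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != "entry":
            continue
        seq_id = elem.get("id")
        taxid = None
        for child in elem:
            # Assumed location of the taxon ID: a direct <property> child.
            if (local_name(child.tag) == "property"
                    and child.get("type") == "common taxon ID"):
                taxid = child.get("value")
                break
        if seq_id and taxid:
            yield seq_id, taxid
        elem.clear()  # release this entry's children once processed


if __name__ == "__main__":
    # Print the two-column, tab-delimited mapping for the sample records.
    for seq_id, taxid in iter_taxid_rows(io.BytesIO(SAMPLE)):
        print(f"{seq_id}\t{taxid}")
```

To run it on the real download, pass the path of the decompressed XML file (or a `gzip.open(path, "rb")` handle) instead of the in-memory sample and redirect the output to the mapping file.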

asad / FPTestCases
Created July 31, 2012 20:57
Example molecules for clustering
mol.1 "C1=CC2=CC=C3C4=CC5=CC6=CC=CC=C6C=C5C=C4C=CC3=C2C=C1"
mol.2 "C1=CC2=CC3=CC=CC=C3C=C2C=C1"
mol.3 "C1=CC2=CC=CC=C2C=C1"
mol.4 "C1=CC=CC=C1"
mol.5 "C1CCCCC1"
mol.6 "C1CCC=CC1"
mol.7 "C1CC=CC=C1"
mol.8 "CCCCCC"
mol.9 "CCCCC(C)C"
mol.10 "CC1=CC(C)=CC=C1"
asad / Nina's example
Created July 23, 2012 10:37
CDK Fingerprinter files on these molecules
asad / SPNina
Created July 23, 2012 02:19
SP-based fingerprint output
Nina's example for CDK Hashed Fingerprinter failure.
asad / smsd-cdk compile
Created July 26, 2011 09:10
smsd cdk doc errors
asad:cdksmsdgithub Asad$ ant -Dmodule=smsd qa-module
Buildfile: /users/Asad/Software/GITROOT/cdksmsdgithub/build.xml
asad / AtomTypeTest
Created June 9, 2011 08:28
Atom typing
KEGG ID     Atom Count  Formula         Failed Cases
C00023.mol  1           Fe              Fe,
C00032.mol  43          C34FeN4O4       Fe,
C00034.mol  1           Mn              Mn,
C00038.mol  1           Zn              Zn,
C00070.mol  1           Cu              Cu,
C00125.mol  65          C42FeN8O8S2R4   Fe,
C00126.mol  65          C42FeN8O8S2R4   Fe,
C00150.mol  1           Mo              Mo,
C00194.mol  109         C72CoN18O17P    Co,
asad / UnionTest
Created April 23, 2011 08:13
Unique union of the mols
package tools;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.junit.Test;
import org.openscience.cdk.AtomContainer;
import org.openscience.cdk.Bond;
import org.openscience.cdk.DefaultChemObjectBuilder;
asad / 1.mol
Created April 21, 2011 17:02
rBLAST 02051100062D
31 33 0 0 0 0 999 V2000
-10.5253 -1.9067 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0
-11.7515 -0.8026 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0
-11.6905 -1.9677 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-11.1384 -1.3546 0.0000 P 0 0 0 0 0 0 0 0 0 0 0 0
-10.5864 -0.7415 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-9.7794 -0.9131 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
asad / Connected
Created April 21, 2011 17:00
SDF File
CDK 0421111417
21 20 0 0 0 0 0 0 0 0999 V2000
4.7690 -1.0005 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
6.0010 -0.1345 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
5.1350 1.3655 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
2.5369 -2.1345 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
2.2690 1.5976 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
4.2690 -2.1345 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
asad / arom.sdf
Created March 1, 2011 20:03
arom molecules for MCS calculation
7 7 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0
0.0000 0.0000 0.0000 I 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0