Adrian Schreyer aschreyer

@aschreyer
aschreyer / chembl10.sql
Created August 9, 2011 07:48
PostgreSQL version of the ChEMBL10 database schema
--
-- PostgreSQL database dump
--
-- Dumped from database version 9.0.4
-- Dumped by pg_dump version 9.0.1
-- Started on 2011-07-22 13:30:36
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
aschreyer / ChEMBL11.sql
Created August 12, 2011 08:25
PostgreSQL version of the ChEMBL11 database schema
--
-- PostgreSQL database dump
--
-- Dumped from database version 9.0.4
-- Dumped by pg_dump version 9.0.1
-- Started on 2011-08-12 09:23:17
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
aschreyer / ligand-buried-surface-areas.py
Created August 22, 2011 12:20
Binding site atom surface area contributions
from credoscript import *
# FETCH PDB ENTRY 3CS9
s = StructureAdaptor().fetch_by_pdb('3cs9')
s.title
>>> 'Human ABL kinase in complex with nilotinib'
# 3CS9 CONTAINS 4 BIOLOGICAL ASSEMBLIES
s.Biomolecules
>>> {1: <Biomolecule(1)>,
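The gist title refers to per-atom surface area contributions in a binding site. A common way to get these is to compute each atom's solvent accessible surface area (SASA) with and without the binding partner. The difference is the buried area, and dividing by the total buried area gives the atom's fractional contribution. The sketch below assumes those per-atom SASA values are already available. `buried_area_contributions` is a hypothetical helper, not part of credoscript, and the atom names and areas are invented, not taken from 3CS9.

```python
def buried_area_contributions(sasa_free, sasa_bound):
    """Return per-atom buried area and each atom's fraction of the total.

    sasa_free / sasa_bound: dicts with the same keys, mapping atom identifiers
    to SASA in square angstroms without and with the binding partner present.
    """
    buried = {}
    for atom, free in sasa_free.items():
        # Area that becomes inaccessible on complex formation (clamped at zero).
        buried[atom] = max(free - sasa_bound[atom], 0.0)
    total = sum(buried.values())
    # Fractional contribution of each atom to the total buried area.
    fractions = {atom: (area / total if total else 0.0) for atom, area in buried.items()}
    return buried, fractions


# Hypothetical per-atom SASA values (square angstroms) for three ligand atoms.
free = {"N1": 20.0, "C2": 30.0, "O3": 10.0}
bound = {"N1": 5.0, "C2": 0.0, "O3": 10.0}
buried, fractions = buried_area_contributions(free, bound)
print(buried)  # {'N1': 15.0, 'C2': 30.0, 'O3': 0.0}
```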
aschreyer / credoscript-chembl-activitycliffs.py
Created September 14, 2011 15:33
ChEMBL extension in credoscript
from credoscript.extensions import chembl
# ma is not defined in this excerpt; presumably a ChEMBL molecule adaptor created beforehand
m = ma.fetch_by_molregno(410891)
# get all activity cliffs of this molecule
m.get_activity_cliffs(chembl.ActivityCliff.sali>10)
>>> [<ActivityCliff(455590, 2030543, 2030536)>,
<ActivityCliff(455592, 2030585, 2030582)>,
<ActivityCliff(455594, 2030630, 2030623)>,
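The `sali>10` filter refers to the Structure-Activity Landscape Index (SALI). For a pair of molecules i and j it is defined as |A_i - A_j| / (1 - sim(i, j)), where A is an activity value (for example a pIC50) and sim is a structural similarity such as the Tanimoto coefficient. The sketch below applies that formula and the threshold filter in plain Python. The molecule labels, activities and similarities are hypothetical. The sketch does not describe how credoscript computes or stores its `ActivityCliff` rows.

```python
from itertools import combinations


def sali(act_i, act_j, sim_ij):
    """Structure-Activity Landscape Index for one pair of molecules."""
    if sim_ij >= 1.0:
        # SALI is undefined for identical fingerprints; treat any activity
        # difference as an infinitely steep cliff and no difference as flat.
        return float("inf") if act_i != act_j else 0.0
    return abs(act_i - act_j) / (1.0 - sim_ij)


def activity_cliffs(activities, similarity, threshold=10.0):
    """Return (i, j, sali) for every pair whose SALI exceeds the threshold."""
    cliffs = []
    for i, j in combinations(sorted(activities), 2):
        value = sali(activities[i], activities[j], similarity[frozenset((i, j))])
        if value > threshold:
            cliffs.append((i, j, value))
    return cliffs


# Hypothetical pIC50 values and pairwise Tanimoto similarities.
activities = {"a": 8.5, "b": 6.0, "c": 8.3}
similarity = {
    frozenset(("a", "b")): 0.90,
    frozenset(("a", "c")): 0.95,
    frozenset(("b", "c")): 0.50,
}
cliffs = activity_cliffs(activities, similarity)
print(cliffs)  # one cliff, between "a" and "b", with SALI close to 25
```

A large activity difference combined with a high similarity gives a large SALI, which is why a fixed cutoff such as 10 works as a simple cliff detector.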
sift = (func.sum(cast(subquery.c.credo_contacts_is_covalent, INTEGER)),
func.sum(cast(subquery.c.credo_contacts_is_vdw_clash, INTEGER)),
func.sum(cast(subquery.c.credo_contacts_is_vdw, INTEGER)),
func.sum(cast(subquery.c.credo_contacts_is_proximal, INTEGER)),
func.sum(cast(subquery.c.credo_contacts_is_hbond, INTEGER)),
func.sum(cast(subquery.c.credo_contacts_is_weak_hbond, INTEGER)),
func.sum(cast(subquery.c.credo_contacts_is_xbond, INTEGER)),
func.sum(cast(subquery.c.credo_contacts_is_ionic, INTEGER)),
func.sum(cast(subquery.c.credo_contacts_is_metal_complex, INTEGER)),
func.sum(cast(subquery.c.credo_contacts_is_aromatic, INTEGER)),
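The truncated expression above builds a count vector of interaction types. It casts each boolean contact flag to an integer and sums the results in SQL. The sketch below does the same aggregation in plain Python. It assumes each contact is a dict of boolean flags, with keys that shorten the ten `credo_contacts_is_*` columns visible in the excerpt (the full gist may list more). The contacts are made up. As with SQL `SUM`, a missing flag simply adds nothing.

```python
# Interaction types in a fixed order, mirroring the summed contact flags.
CONTACT_TYPES = ("covalent", "vdw_clash", "vdw", "proximal", "hbond",
                 "weak_hbond", "xbond", "ionic", "metal_complex", "aromatic")


def contact_type_counts(contacts):
    """Sum boolean contact flags per interaction type (True counts as 1)."""
    return tuple(sum(int(c.get(t, False)) for c in contacts) for t in CONTACT_TYPES)


# Hypothetical contacts between a ligand and its binding site.
contacts = [
    {"vdw": True, "hbond": True},
    {"vdw": True, "proximal": True},
    {"proximal": True, "aromatic": True},
]
print(contact_type_counts(contacts))  # (0, 0, 2, 2, 1, 0, 0, 0, 0, 1)
```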
aschreyer / rdkit-bfp-gist.sql
Created October 24, 2011 15:07
RDKit PostgreSQL cartridge KNN-GIST benchmark
select molregno, tanimoto_sml(morganbv_fp('CC(C(=O)c1ccccc1)C[NH+]1CC[NH+](CC(c2ccc(F)cc2)[NH+]2CC[NH+](C)CC2)CC1',2), circular_fp) as tanimoto
from chembl.fps
where morganbv_fp('CC(C(=O)c1ccccc1)C[NH+]1CC[NH+](CC(c2ccc(F)cc2)[NH+]2CC[NH+](C)CC2)CC1',2) % circular_fp
order by 2 desc
limit 10;
molregno | tanimoto
----------+-------------------
10464 | 1
10451 | 0.869565217391304
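The query above does three things. It keeps only rows where the cartridge's `%` operator is true, meaning the Tanimoto similarity to the query fingerprint meets the `rdkit.tanimoto_threshold` setting (0.5 by default). It then sorts by similarity and returns the ten best hits. The sketch below is a brute-force Python equivalent of that filter, sort and limit logic, with fingerprints represented as sets of on-bit indices. The IDs and bit sets are invented and are not RDKit Morgan fingerprints. Treating the threshold as inclusive is an assumption. A full scan like this is exactly what the GiST index in the benchmark is meant to avoid.

```python
import heapq


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(query, fingerprints, threshold=0.5, limit=10):
    """Return up to `limit` (id, similarity) pairs, most similar first.

    Only pairs with similarity >= threshold are kept (inclusive: an assumption).
    """
    hits = [(mol_id, tanimoto(query, fp)) for mol_id, fp in fingerprints.items()]
    hits = [(mol_id, sim) for mol_id, sim in hits if sim >= threshold]
    return heapq.nlargest(limit, hits, key=lambda hit: hit[1])


# Hypothetical fingerprints keyed by made-up molregno values.
fingerprints = {
    101: {1, 5, 9, 12},
    102: {1, 5, 9, 13},
    103: {2, 6, 10},
}
print(similarity_search({1, 5, 9, 12}, fingerprints))  # [(101, 1.0), (102, 0.6)]
```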
aschreyer / oefp-gist-screening-func.sql
Created November 1, 2011 14:10
GIST index for OpenEye fingerprints
CREATE OR REPLACE FUNCTION tanimoto_sim_query(smiles TEXT)
RETURNS TABLE (molregno INTEGER, similarity REAL) AS
$BODY$
SELECT molregno,
-- Tanimoto similarity
fp % make_circular_fp($1)
FROM fps
-- Boolean operator returning true if the Tanimoto similarity
-- is above the user-defined limit
WHERE fp %? make_circular_fp($1)
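The function uses two different operators. `%` returns the Tanimoto similarity as a number. `%?` is a Boolean that is true when the similarity is above a user-defined limit, and it is the operator an index can use to prune candidates. The sketch below reproduces that split in Python, with fingerprints stored as integers used as bit strings. `make_circular_fp` is not reproduced here, and the fingerprints and the 0.7 limit are hypothetical. The excerpt does not show whether the real `%?` comparison is strict. The sketch uses `>` to match the comment "above the user-defined limit".

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a, b):
    """Analogue of `fp % query`: Tanimoto similarity of two integer bit strings."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def above_limit(a, b, limit):
    """Analogue of `fp %? query`: True if the similarity is above the limit."""
    return tanimoto(a, b) > limit


def tanimoto_sim_query(query_fp, fps, limit=0.7):
    """Return (molregno, similarity) rows, like the SQL set-returning function."""
    return [(molregno, tanimoto(fp, query_fp))
            for molregno, fp in fps.items()
            if above_limit(fp, query_fp, limit)]


# Hypothetical 8-bit fingerprints keyed by made-up molregno values.
fps = {1: 0b10110110, 2: 0b10110100, 3: 0b01001001}
print(tanimoto_sim_query(0b10110110, fps))  # [(1, 1.0), (2, 0.8)]
```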
aschreyer / cifstore-entry-chemcomps.py
Created March 20, 2012 14:49
Querying CIFStore with Python
# db: presumably a pymongo Database handle for CIFStore (connection setup not shown)
for res in db.structures.find({"entry.id": "2P33"}, {'pdbx_entity_nonpoly.comp_id': 1}):
    print(res)
{u'pdbx_entity_nonpoly': [{u'comp_id': u'J07'}, {u'comp_id': u'HOH'}], u'_id': u'2P33'}
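The projection `{'pdbx_entity_nonpoly.comp_id': 1}` returns only the chemical component IDs of the non-polymer entities. For 2P33 these include water (`HOH`) as well as the ligand `J07`. The sketch below adds a small post-processing step that drops water and leaves the ligand codes. It works on the document printed above rather than on a live database connection. Excluding only `HOH` is a simplifying assumption; solvent or buffer components would need their own entries in the exclusion set.

```python
# Document as returned by the find() call above, copied from the printed output.
doc = {"pdbx_entity_nonpoly": [{"comp_id": "J07"}, {"comp_id": "HOH"}], "_id": "2P33"}

# Hypothetical exclusion list: component IDs that are not ligands of interest.
EXCLUDED = {"HOH"}


def ligand_comp_ids(document, excluded=EXCLUDED):
    """Return the non-polymer component IDs of an entry, minus excluded ones."""
    return [entity["comp_id"]
            for entity in document.get("pdbx_entity_nonpoly", [])
            if entity["comp_id"] not in excluded]


print(ligand_comp_ids(doc))  # ['J07']
```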
aschreyer / drugbank-invalid-smiles.csv
Created May 1, 2012 10:47
SMILES strings from drugbank.xml that cannot be parsed with OEChem
DB03304;NCc1cnc2N=C(N)NC(=O)c12
DB02543;[O-]C(=O)c1cccn1
DB03254;c1ncc(n1)C1=CC=CC=C1
DB01653;C[C@@H](O)[C@@H](N)C1=N\C(=C/c2cnc3ccccc23)C(=O)N1CCO
DB02931;CC(C)(CO[P@](O)(=O)O[P@@](O)(=O)OC[C@@H]1O[C@H]([C@H](O)[C@H]1OP(O)(O)=O)N1C=NC2=C1N=CN=C2N)[C@@H](O)C(=O)NCCC(=O)NCCSCC(=O)NCCc1cnc2ccccc12
DB01876;NC(=N)c1ccc2nc(nc2c1)C(=O)c1nc2ccc(cc2n1)C(N)=N
DB04534;[O-][N+](=O)c1ccc2nncc2c1
DB01912;c1ccn2[Pt]3n4ccccc4-c4cccc(-c2c1)n34
DB03164;C[n+]1cnc2ncnc2c1N
DB02557;CC(C)C[C@@H](N[P@](O)(=O)O[C@H]1O[C@@H](C)[C@H](O)[C@@H](O)[C@@H]1O)C(=O)N[C@H](Cc1cnc2ccccc12)C(O)=O
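The file separates the DrugBank ID from the SMILES string with a semicolon, not a comma, so a viewer that expects comma-separated values cannot show it as a table. Python's `csv` module reads it once the delimiter is set. The two rows in the sketch below are copied from the file. For a file on disk, pass a handle from `open(path, newline="")` instead of the `StringIO` object.

```python
import csv
import io

# Two rows copied from drugbank-invalid-smiles.csv (semicolon-delimited, no header).
data = """DB03304;NCc1cnc2N=C(N)NC(=O)c12
DB02543;[O-]C(=O)c1cccn1
"""


def read_id_smiles(handle):
    """Yield (drugbank_id, smiles) pairs from a semicolon-delimited file."""
    for row in csv.reader(handle, delimiter=";"):
        if row:  # skip blank lines
            drugbank_id, smiles = row
            yield drugbank_id, smiles


pairs = list(read_id_smiles(io.StringIO(data)))
print(pairs[0])  # ('DB03304', 'NCc1cnc2N=C(N)NC(=O)c12')
```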
aschreyer / 1M48-FRG-A-301.pdb
Created May 23, 2012 11:57
Different bond orders in the ligands from PDB entry 1M48
COMPND    INTERLEUKIN-2
CRYST1   50.470   58.020   93.080  90.00  90.00  90.00 P 21 21 21    8
HETATM 2060  C1  FRG A 301       0.168   6.479  -8.783  1.00 32.66           C
HETATM 2059  C2  FRG A 301       1.023   7.189  -7.948  1.00 29.42           C
HETATM 2058  C3  FRG A 301       1.016   8.567  -7.956  1.00 31.14           C
HETATM 2055  C4  FRG A 301       0.160   9.195  -8.824  1.00 28.64           C
HETATM 2056  C5  FRG A 301      -0.699   8.509  -9.669  1.00 33.72           C
HETATM 2057  C6  FRG A 301      -0.693   7.128  -9.658  1.00 31.32           C
HETATM 2054  C7  FRG A 301       0.114  10.638  -8.862  1.00 25.03           C
HETATM 2053  C8  FRG A 301       0.045  11.930  -8.932  1.00 26.20           C
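HETATM records carry no bond orders, but interatomic distances computed from the coordinates give a hint. The sketch below parses the standard fixed-column PDB fields: the atom name is in columns 13-16 and the x, y and z coordinates are in columns 31-54. It then lists every atom pair closer than 1.9 Å as a candidate bond. That cutoff is a rough heuristic chosen for illustration. The records are the FRG atoms above, re-typed in standard column layout and cut off after the B-factor. The resulting lengths can be compared with typical single, double, aromatic and triple bond lengths. They do not settle the bond order on their own.

```python
import math
from itertools import combinations

# FRG atoms from 1M48 in standard PDB column layout (truncated after the B-factor).
RECORDS = """\
HETATM 2060  C1  FRG A 301       0.168   6.479  -8.783  1.00 32.66
HETATM 2059  C2  FRG A 301       1.023   7.189  -7.948  1.00 29.42
HETATM 2058  C3  FRG A 301       1.016   8.567  -7.956  1.00 31.14
HETATM 2055  C4  FRG A 301       0.160   9.195  -8.824  1.00 28.64
HETATM 2056  C5  FRG A 301      -0.699   8.509  -9.669  1.00 33.72
HETATM 2057  C6  FRG A 301      -0.693   7.128  -9.658  1.00 31.32
HETATM 2054  C7  FRG A 301       0.114  10.638  -8.862  1.00 25.03
HETATM 2053  C8  FRG A 301       0.045  11.930  -8.932  1.00 26.20
"""


def parse_hetatm(text):
    """Return {atom_name: (x, y, z)} from fixed-column HETATM records."""
    atoms = {}
    for line in text.splitlines():
        if line.startswith("HETATM"):
            name = line[12:16].strip()                                 # columns 13-16
            xyz = tuple(float(line[c:c + 8]) for c in (30, 38, 46))   # columns 31-54
            atoms[name] = xyz
    return atoms


def candidate_bonds(atoms, cutoff=1.9):
    """Return (name1, name2, distance) for atom pairs closer than cutoff (angstroms)."""
    bonds = []
    for (n1, p1), (n2, p2) in combinations(atoms.items(), 2):
        d = math.dist(p1, p2)
        if d < cutoff:
            bonds.append((n1, n2, round(d, 3)))
    return bonds


atoms = parse_hetatm(RECORDS)
bonds = candidate_bonds(atoms)
for n1, n2, d in bonds:
    print(f"{n1}-{n2}: {d} A")
```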