Tweets from 7th Sheffield Conference on Chemoinformatics #ShefChem16
nathanbroo @WendyAnneWarr @baoilleach @mvkrier @rguha awesome! We have a go! #ShefChem16. Looking forward to it!
dgelemi @WendyAnneWarr @nathanbroon @baoilleach @mvkrier @rguha nice. See you at #ShefChem16 on Monday
nathanbroo Hashtag for Sheffield #Chemoinformatics conference is #ShefChem16
http://cisrg.shef.ac.uk/shef2016 
#CompChem #RealTimeChem pic.twitter.com/8J2fU9eZwZ
conViktion Sorry to have to miss out on #ShefChem16.
I hope it's a great event!
nathanbroo Using #ShefChem16 for Sheffield Chemoinformatics conference. @OpenEyeSoftware @CCG_MOE @ccdc_cambridge @ChEMBL @3dsBIOVIA @SimulationsPlus
mvkrier #ShefChem16 here we come! I'm transiting via Manchester on Monday morning. Anyone else? https://twitter.com/nathanbroon/status/749191728956071936
nathanbroo Using #ShefChem16 for Sheffield Chemoinformatics conference. @cressetgroup @nmsoftware @MolInformatics @JCIM_ACS @Molomics @ChemDraw
nathanbroo Using #ShefChem16 for Sheffield Chemoinformatics conference. @macinchem @RSC_CICAG @GDCh_CIC
WendyAnneW @dgelemi @nathanbroon @baoilleach @mvkrier @rguha #ShefChem16 Driving from Cheshire on Monday morning. Looking forward to great conference
georgeisyo @dgelemi early start for me with the 07:24 train from St Pancras. #ShefChem16
ZINClick Moving from #medchemie in Dublin to Sheffield... -2 days #ShefChem16
nathanbroo People to follow for #ShefChem16: @WendyAnneWarr @drjohndholliday @baoilleach @DrJoshuaBox @mvkrier @georgeisyourman @dgelemi
nathanbroo I've setup a @lanyrd page for the #ShefChem16 conference so you can follow online: http://lanyrd.com/cgkxwk 
mvkrier Looking forward to hear from Christos Nicolaou at #ShefChem16 https://twitter.com/jcim_acs/status/748214524486914048
mvkrier #ShefChem16 Don't miss Andrew's talk (#15): Calvin Mooers & the Early History of Chemical Information https://twitter.com/mvkrier/status/628279344708866048
mvkrier @ssalentin Looking forward to your talk ( #19 ) at #ShefChem16 https://twitter.com/F1000Research/status/680767705704722432
mvkrier @GJPvWesten Looking forward to tweetup at #ShefChem16
drjohndhol Have a safe journey to #ShefChem16 - See you all tomorrow
georgeisyo On the train to my alma mater, representing the @ChEMBL group with @mmmnow. Can't believe this is my 4th Sheffield conference! #ShefChem16
dgelemi On my way to #ShefChem16, but will miss @georgeisyourman talk after delayed train and missed connection.
cressetgro Today we present on 'Examining the Diversity of Large Collections of Building Blocks in 3D' at Sheffield Chemoinformatics conf. #ShefChem16
GJPvWesten At #ShefChem16 , interesting line up of talks!
mvkrier Waiting for the train at MIA to take to #shefchem16
nathanbroo Nice relaxing breakfast and. Pe ready to enjoy #ShefChem16.
dr_greg_la This will be my first Sheffield meeting. Crazy.
#ShefChem16
EnricoBera Excited for the Sheffield meeting! #ShefChem16
nathanbroo Prof. Peter Willett is opening #ShefChem16 with a history of the conference over the years.
stewartadc Peter Willet is now opening #ShefChem16. I just gave wireless key to @WendyAnneWarr so I'm expecting some insightful live tweets!
nathanbroo First up is @georgeisyourman talking about @ChEMBL, @SureChEMBL & @Open_PHACTS. #ShefChem16
dr_greg_la And we kick off with @georgeisyourman talking about @ChEMBL and @SureChEMBL #ShefChem16
baoilleach #shefchem16 Sheffield Conference on Cheminformatics
WendyAnneW @stewartadcock #ShefChem16 Willett with 2 t's. Not insightful... Sorry
baoilleach #shefchem16 @georgeisyourman on SureChEMBl
WendyAnneW #ShefChem16 mining began with GENSAL
baoilleach #shefchem16 Patent knowledge may never appear elsewhere, and even if it does, there's a lag. First disclosure often in patent.
baoilleach #shefchem16 2007 SureChem. 2010 Macmillan purchase. 2013 donated to EMBL-EBI. 2015 bioannotations.
WendyAnneW #ShefChem16 SureChembl data now in Open PHACTS
baoilleach #shefchem16 Structures from text, from images and from provided MOL files. Fully automatic - no manual curation.
WendyAnneW #ShefChem16 SureCHEMBL tiresome syntax to tweet. I will cheat
baoilleach #shefchem16 Can do fuzzy substructure search, e.g. positional variation with halogen.
nathanbroo .@SureChEMBL allows structure and substructure search from automatically extract chemistry patent data. #ShefChem16
baoilleach #shefchem16 Not just exemplified structures, but reagents, intermediates, solvents. Don't yet handle Markush though.
WendyAnneW #ShefChem16 Markush not handled yet
baoilleach #shefchem16 18M unique structures. ~80k novel cmpds per month. 1-7 days from publication to entry into SureChEMBL.
WendyAnneW #ShefChem16 1 to 7 days only before patent appears in SureChembl
baoilleach #shefchem16 Comparison to SciFinder. Set of 47 patents chosen. 65% of SF cmpds were found by SureChEMBL. Missed were Markush, or from tables
WendyAnneW #ShefChem16 65% of scifinder cmpds are found in SureChembl
baoilleach #shefchem16 ..but these patents were quite old, with OCR mistakes, so 65% is lower bound.
nathanbroo Assessment of chemistry databases generated by automated extraction of chemical structures from patents. http://www.ncbi.nlm.nih.gov/m/pubmed/26457120/ #ShefChem16
WendyAnneW #ShefChem16 the missing 35% caused by OCR errors etc Or out of scope
baoilleach #shefchem16 Can download 'map file' of cmpd to patent info. Or via UniChem, and or a data client feed to keep up to date incrementally.
WendyAnneW #ShefChem16 unichem has 135 million struts
baoilleach #shefchem16 Bio-annotations made with Termite (SciBite) run over life-science patents. Finds genes, diseases, frequencies in different parts
baoilleach #shefchem16 ...also relevance score (0-3) to remove noise as inventors often mention large amounts of diseases/proteins, e.g. over 50
WendyAnneW #ShefChem16 oops. Structures
nathanbroo Bio annotation in @SureChEMBL using Termite text-mining engine #ShefChem16
baoilleach #shefchem16 ...same for compounds. Real scope versus large numbers of mentioned drugs.
baoilleach #shefchem16 To do this, use term frequencies but also the position of the entity, e.g. in title or caption or heading.
WendyAnneW #ShefChem16 relevance ranking by freq and position of occurrence of bio term etc. (SureChembl)
baoilleach #shefchem16 Compounds with busy chemical space around them are interesting. cf. Hatori et al, Tyrchan et al.
nathanbroo Using FCFP_4 fingerprints to identify nearest neighbours of relevance. Wonder if too fuzzy? #ShefChem16
WendyAnneW #ShefChem16 JCIM vol 52 p 1480
nathanbroo .@Open_PHACTS offers linked databases in an easy-to-use API. #ShefChem16
WendyAnneW #ShefChem16 DDT vol 17, 21
baoilleach #ShefChem16 All integrated with OpenPHACTS so can query via its API on disease, target, compound.
baoilleach #ShefChem16 Can infer links between cmpds/targets or disease/targets etc. using Open Phacts API. (Example shown)
nathanbroo Use cases of patent searching: from patent to genes, targets, diseases & structure. #ShefChem16
WendyAnneW #ShefChem16 SureChembl use case 1: targets and diseases using openphacts API. Go to patent. Extract MCS
nathanbroo MCS is a good approximation for manually curated Markush structures in patents. #ShefChem16
baoilleach #ShefChem16 MCS of structures and Markush structure often very similar.
WendyAnneW #ShefChem16 use case 2. targets and indications. Search the patent one
baoilleach #ShefChem16 eluxadoline - 17 patents in the "patentome" - extract relevant targets and indications
nathanbroo Now possible to link structured data with @SureChEMBL & @Open_PHACTS #ShefChem16
baoilleach #ShefChem16 Uses: target validation/druggability, novelty checking, add pharmacology and pathway info via ChEMBL
nathanbroo Next steps: target validation, druggability, novelty checking & due diligence. @SureChEMBL #ShefChem16
WendyAnneW #ShefChem16 it is now poss to access the patent corpus by high thruput mining. Can do target validn. Do novelty check.
nathanbroo .@ChEMBL v22 will have bioactivity data mined from medicinal chemistry patents. #ShefChem16
baoilleach #ShefChem16 ChEMBL22! BindingDB info - only 5% in ChEMBL already. Many targets not well represented in ChEMBL. Patent activity info.
baoilleach #ShefChem16 @georgeisyourman has delivered what is surely the best talk of the conference so far
nathanbroo Q from the floor: comparison of @ChEMBL & IBM. IBM slightly better in recent study. #ShefChem16
nathanbroo Q: can database overlaps be used to estimate errors? Yes & no… insufficient reference sets. #ShefChem16
baoilleach #ShefChem16 BindingDB is manually extracting activity info from patents
nathanbroo Next up is Marc C. Nicklaus from @theNCI talking about SAVI: Synthetically Accessible Virtual Inventory #ShefChem16
baoilleach #ShefChem16 Marc Nicklaus on SAVI - synthetically accessible virtual inventory - Yuri Pevzner and WDI
mvkrier #ShefChem16 Now Marc Nicklaus from @theNCI about SAVI
baoilleach #ShefChem16 Screening database sizes - what size is enough? NCI ~0.3m, ChemNavigator 3m
DrJoshuaBo Great talk from @georgeisyourman on @SureChEMBL. Will this make companies even sneakier when drafting new patents? #ShefChem16
nathanbroo Medicinal chemistry relevant structure space is incredibly vast > 10^40 unique structures #ShefChem16
baoilleach #ShefChem16 SCSORS (semi-custom synthesis online request system) - attrition rate high - failures, low yields, too much time/money lost
baoilleach #ShefChem16 Not "what is a potential bioactive molecule? then make it", instead "what can I easily and cheaply make/get?"
baoilleach #ShefChem16 Let's try for a billion: some chemistry rules, inexpensive starting materials, cheminf engine
WendyAnneW #shefchem16 getting in a mess with mobile app
baoilleach #ShefChem16 Hartenfeller reaction rules, but lack chemical context. Worked with Lhasa to create rules. Cactvs for cheminf.
WendyAnneW #shefchem16 Marc Nicklaus using Lhasa rules and CACTVS
baoilleach #ShefChem16 2312 retro syn transforms based on Corey's work in 70s. Written in CHMTRN/PATRAN - an "Interesting" language.
nathanbroo 2,312 retro synthetic transformations based on E. J. Corey's work in the 70s: combination of British English & 70s FORTRAN. #ShefChem16
mvkrier #ShefChem16 M. Nicklaus: Hartenfeller paper describes robust reactions, but are lacking chemical context information
baoilleach #ShefChem16 One new transform added: azide-alkyne Huisgen cu catalysed cycloaddition - "click chemistry". More to come.
dgelemi @WendyAnneWarr should do like @baoilleach and use laptop #typewriter #fasttyping #ShefChem16
WendyAnneW #ShefChem16 Marc shows CHMTRN transform
mvkrier #ShefChem16 M. Nicklaus: we are forward synthetic!
drjohndhol #ShefChem16 well under way. A good response to our first presentation.
baoilleach #ShefChem16 Originally for retrosyn but adapted for forward direction.
WendyAnneW #ShefChem16 starting materials from sigma Aldrich
baoilleach #ShefChem16 Using Sigma-Aldrich building blocks - 0.4m high availability
WendyAnneW #ShefChem16 Marc shows Lhasa react workflow
baoilleach #ShefChem16 Paal-Knorr pyrrole synthesis example, same reactants but different products and scores
baoilleach #ShefChem16 2015 created 0.6m available for download based on 11 "productive" transforms in one-step reactions
baoilleach #ShefChem16 side-comment on PAINS filtering is not good - not explained
nathanbroo "PAINS is not good. You shouldn't use it." In its current form, couldn't agree more!!! #ShefChem16
baoilleach #ShefChem16 possible products based on 14 rules and 337k starting is 420m, 330m after filtering
WendyAnneW #ShefChem16 Marc 's prods generated as SDfiles. Then apply PAINS etc. 420 million poss prods from 377K build blocks
cressetgro Drop by our booth to see case studies and find out about our academic program #ShefChem16 #InnovativeSoftware pic.twitter.com/8aRgO51kcE
baoilleach #ShefChem16 combinational explosion - need to filter - non-flatness, TPSA, types of rings/scaffold, rule-of-X, Bruns&Watson rules (JMC2012)
WendyAnneW #ShefChem16 JMC 2012 55 9763 Lilly rules for transformations
nathanbroo Uses Bruns & Watson 275 rules from Lilly for undesirable structures. http://pubs.acs.org/doi/abs/10.1021/jm301008n #ShefChem16 #OpenAccess pic.twitter.com/h3wanzPUnK
baoilleach #ShefChem16 Are novel? Only 0.8% found in PubChem 2015.
nathanbroo Must be clear they are demerits *not* rules or filters! I don't like rules! #ShefChem16
baoilleach #ShefChem16 looking at stat distributions. Making new rings? Yes - both aromatic and aliphatic.
WendyAnneW #shefchem16 Prop distributions and distrib of cost of starting mats shown. No exotic cmpds. New rings made
baoilleach #ShefChem16 Novel rings? Comparison with Ertl et al (2006). Some novel - missed the details...
WendyAnneW #shefchem16 Marc. Novel space is generated
nathanbroo Analysis of novel rings using Peter Ertl's excellent Quest for the Rings paper. http://pubs.acs.org/doi/abs/10.1021/jm060217p #ShefChem16 pic.twitter.com/r9ODgu4wqe
baoilleach #ShefChem16 New transforms being developed by LHASA. Goal still for ~1bn interesting cmpds. Will be available freely, searchable, orderable
baoilleach #ShefChem16 (Meta: Battery about to run out) Collab between Xemistry, Marc, Lhasa, Novartis, Sigma and Merck.
dgelemi @WendyAnneWarr @baoilleach tweets will go down through the day with fingers and batteries dying #ShefChem16 . Next talk on solubility...
nathanbroo Next up: Solubility - in search of a structural solution. Beth Thomas @ccdc_cambridge #ShefChem16
WendyAnneW #ShefChem16 Beth Thomas CCDC on solubility
nathanbroo 40% of drugs are practically insoluble! Poor absorption, low target exposure, higher dose. #ShefChem16
stewartadc Beth Thomas of @ccdc_cambridge standing up to talk about solubility and the data available in the CSD. #shefchem16 pic.twitter.com/BRSyLTXUL6
nathanbroo General Solubility Equation requires melting points but difficult to predict. #ShefChem16
nathanbroo Solubility is balance of melting point and lipophilicity. Crystal lattice and 'happiness' in water #ShefChem16
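[Editor's sketch] The General Solubility Equation mentioned in the two tweets above is usually written (Yalkowsky's form) as log S = 0.5 - 0.01 (MP - 25) - log P, with MP in degrees Celsius and S in mol/L. The snippet below is only an illustration of that formula, not something shown in the talk; the function name and the example values are hypothetical, and clamping the melting-point term to zero for liquids is the usual convention assumed here.

```python
def general_solubility_equation(melting_point_c: float, log_p: float) -> float:
    """Estimate log10 aqueous solubility (mol/L) from melting point and logP.

    Implements log S = 0.5 - 0.01 * (MP - 25) - logP.
    For compounds that are liquid at 25 C the melting-point term is
    taken as zero (assumed convention).
    """
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_p


# Hypothetical compound: melts at 125 C, logP of 2.0.
example_log_s = general_solubility_equation(125.0, 2.0)
print(f"estimated logS = {example_log_s:.2f}")
```

The two terms mirror the tweet: the melting-point term stands in for crystal lattice stability, and logP for how 'happy' the molecule is in water.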
WendyAnneW #ShefChem16 we should predict solubility cos for design we need understanding
dgelemi #ShefChem16 very little effort from pharma at doing compound X-Ray to understand solubility at the early stage of the project
mvkrier #shefchem16 Beth Thomas from @ccdc_cambridge : Let's see if we can improve solubility by design
nathanbroo Solubility decision tree is presented. Many parameters driving solubility: complementarity, polarity, disruption of plane #ShefChem16
nathanbroo 'Solubility Cliffs' derived from matched molecular pairs. Cf. work from MedChemica #ShefChem16
WendyAnneW #ShefChem16 use CSD for lattice energy
nathanbroo Use @ccdc_cambridge Access Structures to look at structures in 3D https://summary.ccdc.cam.ac.uk/structure-summary-form #ShefChem16
dgelemi @WendyAnneWarr #ShefChem16 website available to get info. Can be used without own crystal to get data @ccdc_cambridge
WendyAnneW #ShefChem16 Beth Thomas mapping interactions around C=O using isostar. Isostar methodology explained
stewartadc Play along with Beth's talk by entering the refcode at https://summary.ccdc.cam.ac.uk/structure-summary-form #shefchem16
WendyAnneW #ShefChem16 Thomas why do some cmpds have this lattice and some not. Stacking in ureas, notoriously insol cmpd
WendyAnneW #ShefChem16 interaction map for benzamide. High density high MP
nathanbroo Try CSD refcode: DPUREA for interactive 3D structure visualisation: https://summary.ccdc.cam.ac.uk/structure-summary-form #ShefChem16 pic.twitter.com/7yUokEyM3p
WendyAnneW #ShefChem16 rufinamide JMC 59, 2346
WendyAnneW #ShefChem16 further down sol decision tree motif synergy can also affect soly JMC 59 , 1052
dgelemi #ShefChem16 example where increase logP >> increase solubility by disrupting crystal packing. Me addition on primary amide
WendyAnneW #ShefChem16 steric bulk increasing soly. Related cmpds in CSD shown
nathanbroo Disrupting planarity by introducing ortho twist but how much is enough? #ShefChem16
WendyAnneW #ShefChem16 packing disruption ortho twist example 57 3719 JMC
nathanbroo Disrupting planarity through ring saturation. Cyclopropyl example - cool little functional group (recent review @JMedChem) #ShefChem16
WendyAnneW #ShefChem16 tstlilb and vemluu cyclopropyl example JMC 58 130
WendyAnneW #ShefChem16 at bottom of decision tree , knowledge driven approach. MMP anal of CSD
nathanbroo 'Isostructurality' introduced… Good title for a new book, perhaps… #ShefChem16
WendyAnneW #ShefChem16 CCDC the MMP research is a work in progress
WendyAnneW #ShefChem16 MP and lipophilicity both designed in to our mols. Send her soly examples that are unusual
nathanbroo Call for USOs @ccdc_cambridge 'Unexplained Solubility Outliers'. Example given is a simple regioisomer. #ShefChem16
WendyAnneW #ShefChem16 pic.twitter.com/GmLTcYqS8H
pwk2013 #MedChem #CompChem #cheminformatics #ShefChem16 https://twitter.com/nathanbroon/status/749197360211890176
rguha kudos to @nathanbroon @baoilleach @WendyAnneWarr @dgelemi @mvkrier for great coverage of #shefchem16
dr_greg_la I wonder when the last time I stayed in student housing was. #allpartofthefun
#ShefChem16
WendyAnneW #ShefChem16 first 3 papers after lunch will be tosco of cresset on diversity, leach on shape of mols, nicolaou on design of cmpd collectns
WendyAnneW @dr_greg_landrum #ShefChem16 I stayed in Downing college Cambridge for CCDC 50th last year. Accommodation was excellent. Shef not so posh
InfoSchool The 7th Joint Sheffield Conference on Chemoinformatics is today til Wed., organised by our academics! #ShefChem16 http://bit.ly/29qMUKN 
ZINClick Waiting for your talk Paolo! #ShefChem16 https://twitter.com/cressetgroup/status/749879294655131649
deniseOme .@nathanbroon tuned in #ShefChem16 to find out how @targetvalidate w/ info on genes, disease, drugs, pathways can help in #drug validation
cressetgro Next up at #ShefChem16 is Paolo Tosco who will present on ‘Examining the Diversity of Large Collections of Building Blocks in 3D’ #CompChem
dgelemi Time to restart #ShefChem16 missing plugs along seats to charge phone/tablet/laptop pic.twitter.com/IrhAoRqZ7m
baoilleach #ShefChem16 (meta: batteries back up)
baoilleach #ShefChem16 Paolo Tosco @cressetgroup on 3D diversity of building blocks
baoilleach #ShefChem16 Comprehensive Fragment Library (CFL) - which fragments to include from large potential set?
WendyAnneW Tosco's Leap to lead platform #ShefChem16
baoilleach #ShefChem16 3d similarity necessary or just 2d sufficient? Let's find out...
WendyAnneW #ShefChem16 tosco pilot study to see if 3D similarity feasible. Bioblocks used
nathanbroo Up next is Paolo Tosco @cressetgroup on the analysis of 3D building blocks. #ShefChem16
dgelemi #ShefChem16 3D clustering from @cressetgroup on 800k virtual library collection using shape and field, including diastereoisomers #bigtask
WendyAnneW #ShefChem16 tosco diastereo enumeration pitfalls discussed
baoilleach #ShefChem16 diastereomer enumeration - not all independent for ring systems - not all are actually stereocenters - nitrogen inversion pssble
mvkrier #ShefChem16 Paolo Tosco from @cressetgroup describing a diastereoisomer generation workflow
WendyAnneW #ShefChem16 accelrys draw and RDkit compared. Used latter. Redo coords after enumeration to see which stereoisomers feasible
baoilleach #ShefChem16 try to generate diastereomers and discard those not possible in 3D, also those with high E
baoilleach #ShefChem16 XED force field - eXtended Electron Distribution - multipoles are modelled via additional monopoles with dummy atoms Vinter JCAMD
WendyAnneW Jcamd 1994 ref for XED force field #ShefChem16
WendyAnneW #ShefChem16 XED FF gives detailed interaction patterns
baoilleach #ShefChem16 Force field gives detailed electrostatic interaction patterns - matches exptal much better than traditional FFs
baoilleach #ShefChem16 Molecular interaction fields (MIFS) contain too much info to be used computationally quickly so compress down to a few points
baoilleach #ShefChem16 Whole MIF used for sim measure but field points used in first steps
dr_greg_la Collecting @RDKit_org bugs during Paolo Tosco's talk! I love it when people really push the code. #ShefChem16
WendyAnneW #ShefChem16 cresset MIFs contain too much info. So condense to fewer field points. But 3D similarity uses all field
baoilleach #ShefChem16 Sim measured as a combination of shape (0.25) and fields (0.75)
WendyAnneW #ShefChem16 pair wise similarity assessment assesses 25 % shape and 75% field
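[Editor's sketch] The 25% shape / 75% field weighting reported above can be read as a weighted sum of two per-pair scores. This is a minimal sketch under that assumption; the tweets give only the weights, so the linear form, the 0-1 score range, and the function name are hypothetical and not Cresset's actual implementation.

```python
def combined_similarity(shape_sim: float, field_sim: float,
                        shape_weight: float = 0.25,
                        field_weight: float = 0.75) -> float:
    """Blend shape and field similarity into one score.

    Assumes both inputs are already normalised to the range 0..1 and
    that the weights (taken from the tweets) sum to 1.
    """
    for value in (shape_sim, field_sim):
        if not 0.0 <= value <= 1.0:
            raise ValueError("similarities are expected in the range 0..1")
    return shape_weight * shape_sim + field_weight * field_sim


# A pair with good shape overlap but weaker field agreement.
print(combined_similarity(0.8, 0.4))
```

With these weights a pair that matches well on fields but poorly on shape still scores higher than the reverse, which is consistent with the emphasis on fields in the talk.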
baoilleach #ShefChem16 3h for pairwise ECFP4 similarity; 250 CPU days for shape sim! 20k x 20k molecules
WendyAnneW #ShefChem16 20k diverse cmpds used in pilot took 96 cpu days
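[Editor's sketch] To get a feel for why the 3D comparison is so expensive, the arithmetic below counts the unique pairs in a 20k set and spreads the quoted 250 CPU days over them. It assumes only unique unordered pairs are scored (the tweet's "20k x 20k" may instead mean the full matrix), and it uses the 250 CPU-day figure rather than the 96 CPU-day figure from the other tweet; both are assumptions for illustration.

```python
def unique_pairs(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


n_molecules = 20_000
pairs = unique_pairs(n_molecules)

cpu_days = 250                      # figure quoted in the tweet above
cpu_seconds = cpu_days * 24 * 3600  # convert CPU days to CPU seconds
seconds_per_pair = cpu_seconds / pairs

print(f"{pairs:,} unique pairs")
print(f"about {seconds_per_pair:.3f} CPU seconds per pair")
```

The quadratic growth in pairs is what makes the full 800k set impractical and motivates clustering a diverse subset first.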
baoilleach #ShefChem16 2D vs 3D correlation: very little correlation - this is expected for molecules with low similarity
dgelemi #ShefChem16 no correlation at all between 2D ecfp4 and field 3D similarity #expected
WendyAnneW #ShefChem16 sim value distributions better for 3D than 2D. no correlation between the two sets of structures
baoilleach #ShefChem16 data clustered with k-medoids/CLARANS algorithm (in-house C++) - cluster tightness assessed by silhouette metric
WendyAnneW #ShefChem16 tosco used silhouette metric to...
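[Editor's sketch] The silhouette metric mentioned above scores each item by comparing its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), as (b - a) / max(a, b), and then averages over all items. The pure-Python version below is a generic illustration of that standard definition working from a precomputed distance matrix; it is not the in-house C++ code from the talk, and the toy data are made up.

```python
def silhouette_score(dist, labels):
    """Mean silhouette coefficient from a square distance matrix.

    dist[i][j] is the distance between items i and j; labels[i] is the
    cluster assignment of item i. Singleton clusters contribute 0.
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # s(i) is defined as 0 for a singleton cluster
        # a: mean distance to the other members of the same cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        if denominator > 0:
            total += (b - a) / denominator
    return total / len(labels)


# Toy example: four points on a line forming two obvious groups.
points = [0.0, 1.0, 10.0, 11.0]
toy_dist = [[abs(p - q) for q in points] for p in points]
good = silhouette_score(toy_dist, [0, 0, 1, 1])
bad = silhouette_score(toy_dist, [0, 1, 0, 1])
print(f"good clustering: {good:.3f}, bad clustering: {bad:.3f}")
```

Values near 1 indicate tight, well-separated clusters; values near or below 0 indicate items that sit closer to another cluster than to their own.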
WendyAnneW #ShefChem16 tosco now showing clustering examples pic.twitter.com/o1xbxOgZJF
baoilleach #ShefChem16 2D data diastereomers point in different directions when considered in 3D, but using field+shape everything lines up better
WendyAnneW #ShefChem16 3D fields and shapes method gives mols with much better fields than 2D ecfp4
baoilleach #ShefChem16 advantages of 3D - finds similarities across diastereomers and confs
WendyAnneW #ShefChem16 full set of 800k cmpds not feasible so used diverse 150k to do the clusters
baoilleach #ShefChem16 full set of 750K would take too long - chose diverse set of 150K. 5K CPU days. Used compressed files to reduce I/O contention.
dgelemi #ShefChem16 need better CPUs. 2 months calculation on 2k CPU! Need strategy to take number down. Cluster diverse set then assigned the rest
baoilleach #ShefChem16 First mention of cloudy cloud - Amazon EC2
WendyAnneW #ShefChem16 tosco discusses his job distribution strategy (nodes etc.) Amazon elastic cloud
WendyAnneW #ShefChem16 tosco discusses hec cloud infrastructure
baoilleach #ShefChem16 One master m3.large on demand, 64 c3.8xlarge spot instances, NFS-mounted EBS, 2048 jobs in parallel
WendyAnneW #ShefChem16 clustering took 4 weeks
baoilleach #ShefChem16 Took 4 days in the end. Clustering then took 4 weeks.
dgelemi #ShefChem16 cluster with large number of nodes and flexibility using EC2 from @amazon pic.twitter.com/SnwULAeBAL
dr_greg_la Paolo's also providing a nice intro to some AWS features. Good to see this in a #cheminformatics talk
#ShefChem16
WendyAnneW #ShefChem16 silhouette very similar to what he would have got using all 800k cmpds
nathanbroo Large number of comparisons taking many thousands of CPU hours cluster 3D building blocks. Tosco #ShefChem16
WendyAnneW #ShefChem16 all cmpds in 800k set now linked to ...
baoilleach #ShefChem16 This was a collab with BioClocks. Thanks to #rdkit too which was used throughout.
baoilleach #ShefChem16 (typo: BioBlocks)
nathanbroo Now sat on the floor at the back of the room as @georgeisyourman stole my seat. #ShefChem16
drjohndhol A good crowd in at #ShefChem16 pic.twitter.com/PoWWDXRxLs
baoilleach @georgeisyourman #ShefChem16 Correction - first mention of cloudy cloud in a non-plenary :-)
baoilleach #ShefChem16 Andrew Leach - yes that one - no, the other one - on impact of shape on its bio and phy props
WendyAnneW #ShefChem16 pic.twitter.com/6ZwN7Wbhzc
baoilleach #ShefChem16 no matched molecular pairs will be mentioned
WendyAnneW #ShefChem16 leach. Why shape is important
baoilleach #ShefChem16 Fischer + Pauling lock-and-key - shape is important
nathanbroo Next up: Andrew Leach on molecular shape and the importance in biological activity. #ShefChem16 @pwk2013 pic.twitter.com/YwB7CDxmM2
WendyAnneW #ShefChem16 lock and key of course
WendyAnneW #ShefChem16 leach. Isosteres are important in meds hem
baoilleach #ShefChem16 Isosteres - "iso" + "stereos" - equal solid/hard
baoilleach #ShefChem16 Example: cimetidine->ranitidine/nizatidine - ranitidine built the Stevenage site for GSK (undergrad rumour :-)
WendyAnneW #ShefChem16 shape has strong influence on pharm props
baoilleach #ShefChem16 To test for "shape": look at enantiomers - if shape unimportant then should have same props
dgelemi #ShefChem16 Andrew leach on shape. One case study are enantiomers where shape influence can be tested on pharmaceutical properties
baoilleach #ShefChem16 Looking at pairs of enantiomers (enants from now on) - same functional groups arranged differently in space
WendyAnneW #ShefChem16 bl...y type ahead. I meant medchem not mend hem or meds chem. Think I'll stick to photos
baoilleach @georgeisyourman #ShefChem16 'sokay - I'm presenting the plenary poster
baoilleach #ShefChem16 Leach et al. MedChemComm 2012 is all about this
WendyAnneW #ShefChem16 Leach medchemcomm 2012 3 528
mvkrier #ShefChem16 Andrew G. Leach from @LJMU takes up the cudgels for enantiomers and their difference in properties
cressetgro See Paolo Tosco's presentation from #ShefChem16 http://bit.ly/CresBB3D  #BuildingBlocksIn3D #CompChem #rdkit
baoilleach #ShefChem16 Gleeson binned data awakening the ghost of @pwk2013
WendyAnneW #ShefChem16 pic.twitter.com/u2zQlglQX4
baoilleach #ShefChem16 "shape" has improved the lives of many people, and pharma CEOs too
nathanbroo Shape influences biological recognition. Important influence on pharmaceutical properties. Enantiomers used as case study. #ShefChem16
baoilleach #ShefChem16 a quiz!! help me internets
baoilleach #ShefChem16 what shape is this? (shows a cube)
baoilleach #ShefChem16 what shape is this? (shows a molecular shape)
GJPvWesten @baoilleach #ShefChem16 did you get it right? ;)
WendyAnneW #ShefChem16 shape of vancomycin is v important but how do we talk about it?
nathanbroo Shapes are important but how do we process them and what is the recognition process? #ShefChem16 pic.twitter.com/QS4FtnSk0X
mvkrier #shefchem16 as my greek medchem colleague always says: without chirality, there is no life
baoilleach #ShefChem16 What shape is this? (shows an interesting carrot)
WendyAnneW #ShefChem16 speaks for itself... pic.twitter.com/fiBYOfpYk6
baoilleach #ShefChem16 ROCS @OpenEyeSoftware can measure shape sim via atom-centered Gaussians
dgelemi #ShefChem16 waiting to see if @WendyAnneWarr photo of the slide is going to pass NSFW filter from Twitter...
WendyAnneW #ShefChem16 ROCS is the answer JCIM 2005, 45, 673
baoilleach #ShefChem16 Need reference shapes. Haigh/Pickup/Grant/Nicholls - shape Tanimoto - diversity selection of dataset
WendyAnneW #ShefChem16 leach uses shape tanimoto and ROCS
baoilleach #ShefChem16 Low shape Tanimotos are in the noise so best to use a cutoff and only consider as similar if above cutoff
WendyAnneW #ShefChem16 noise can be a problem
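[Editor's sketch] The shape Tanimoto discussed above is commonly defined from volume overlaps as O_AB / (O_AA + O_BB - O_AB), and the tweets suggest treating scores below a cutoff as noise. The snippet below illustrates that idea only; the overlap values are invented, the cutoff of 0.7 is a hypothetical placeholder (no value was given in the talk coverage), and this is not the ROCS API.

```python
def shape_tanimoto(overlap_ab: float, self_overlap_a: float,
                   self_overlap_b: float) -> float:
    """Shape Tanimoto from overlap volumes: O_AB / (O_AA + O_BB - O_AB)."""
    denominator = self_overlap_a + self_overlap_b - overlap_ab
    if denominator <= 0:
        raise ValueError("overlap volumes are inconsistent")
    return overlap_ab / denominator


def is_shape_similar(overlap_ab: float, self_overlap_a: float,
                     self_overlap_b: float, cutoff: float = 0.7) -> bool:
    """Call two shapes similar only when the score clears the cutoff.

    The default cutoff is a hypothetical value chosen for illustration.
    """
    return shape_tanimoto(overlap_ab, self_overlap_a, self_overlap_b) >= cutoff


# Invented overlap volumes for two molecule pairs.
print(shape_tanimoto(50.0, 100.0, 100.0))     # weak overlap
print(is_shape_similar(90.0, 100.0, 100.0))   # strong overlap
```

Thresholding turns a noisy continuous score into a yes/no call, which matches the advice to ignore low shape Tanimotos rather than rank by them.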
baoilleach #ShefChem16 Haigh et al compared shape fps to ROCS, but didn't link to measured props and work out what the fp comparisons really mean
baoilleach #ShefChem16 We are interested in which set of ref shape is best, what do the fp Tanimotos really mean, and something else I missed
baoilleach #ShefChem16 Ligand Expo was used to derive our Shape Database
WendyAnneW #ShefChem16 leach used Ligand Expo to derive his shape database. Test set of robin Taylor et al
baoilleach #ShefChem16 Taylor et al JCAMD 2012 dataset of bio activity where molecules are known to share a binding mode
nathanbroo Is shape another way of looking at molecular complexity? #ShefChem16
baoilleach #ShefChem16 Compare sim of molecules that bind to same vs different targets - logistic regression
WendyAnneW #ShefChem16 leach results pic.twitter.com/axjYHopmUO
baoilleach #ShefChem16 (@georgeisyourman sees neighbourhood behaviour behind every graph)
dgelemi #ShefChem16 complex graphs there. 2 histograms (up and down the plot) plus a line for ROC #overcrowded
baoilleach #ShefChem16 Our calculations use only shape. Comparisons of docking and shape isn't really like-with-like.
WendyAnneW #ShefChem16 not comparing like with like. Only using shape
baoilleach #ShefChem16 There is a remarkable ability to identify molecules with shared bioactivity when ONLY shape is considered
baoilleach #ShefChem16 All sets of reference shapes will be made available
baoilleach #ShefChem16 ...switching to solubility...
WendyAnneW #ShefChem16 pic.twitter.com/g90KuMcATA
baoilleach #ShefChem16 Lists 20 references on screen with solubility data
WendyAnneW #ShefChem16 leach. And now apply this to solubility
baoilleach #ShefChem16 About 5600 compounds with solubility and mp data - divided in training/test - built models with only SlogP
WendyAnneW #ShefChem16 models built with slogp & slogp plus MP and slogp plus GPs
baoilleach #ShefChem16 R-squared in 0.6 range even for training.
nathanbroo Now onto solubility (again!). Data collated from many sources. Models using fingerprints & ClogP seem better than MP & ClogP. #ShefChem16
baoilleach #ShefChem16 Some influential shapes. (I've missed the connection to solubility here - oops)
WendyAnneW #ShefChem16 leach shows some influential shapes
dgelemi @baoilleach #ShefChem16 similar to USR from @pjballester http://www.ncbi.nlm.nih.gov/pubmed/17342716  with shape similarity
WendyAnneW #ShefChem16 sigh. Slogp plus fingerprints not gps
baoilleach #ShefChem16 Dedication to Andy Grant who spurred his interest in this area
baoilleach @pwk2013 #ShefChem16 (ghostly present here at the conference)
baoilleach #ShefChem16 Christos Nicolaou on ...Designing compound collections for success
nathanbroo Next up (my some-time co-author), Christos Nicolaou from Lilly on designing compound collections. #ShefChem16
baoilleach #ShefChem16 Has previously spoken on Proximal Lilly Collection (PLC). What's new?
dgelemi #ShefChem16 Nicolaou started his talk to say that it's similar to one 3 yrs ago. What's new? Now they have more data and use in real life
baoilleach #ShefChem16 sees drug discov as a computational optimization - multiple objectives - need effective search of chemical space
mvkrier #ShefChem16 Christos A. Nicolaou from @LillyPad will tell us about Proximal Lilly Collection in practice for 3 years
nathanbroo Nicolaou almost using the Douglas Adams quote that [chemistry] space is big! Really, really big! #ShefChem16
GJPvWesten @drjohndholliday But picked up several followers already... #ShefChem16
baoilleach #ShefChem16 number of atoms on earth (10^50) is less than the size of the small molecule universe (10^60)
nathanbroo Nicolaou using mass of the earth as metaphor for size of chemistry space. I use a double decker bus. #ShefChem16
WendyAnneW #ShefChem16 people calculate the number of atoms on earth. Think how long to make all mols in chem space
baoilleach #ShefChem16 Working on enabling the generation of large virtual libs and real collections, and methods to probe those spaces.
WendyAnneW #ShefChem16 the general approach to exploring space is search based. Lilly's focus is interestingness not just bigness
baoilleach #ShefChem16 What about interestingness? He doesn't think enough work has been done on this - partly because of size involved
baoilleach #ShefChem16 PLC - building blocks, virtual synthesis via rules -> large number of compounds
baoilleach #ShefChem16 see also BI-Claim, PGVL, PLC, Scubidoo, SAVI (Marc's talk earlier)
WendyAnneW #ShefChem16 proximal Lilly collection published in JCIM. Bi claim pgvl and other work acknowledged
baoilleach #ShefChem16 multi-step DNA encoded library (DEL) via multi-component reaction
baoilleach #ShefChem16 combinatorial chemistry on steroids
WendyAnneW #ShefChem16 DNA encoded library (DEL) technology overview
GJPvWesten #ShefChem16 DNA-Encoded library by Christos Nicolaou, million cmpds in a testtube.. cool!
nathanbroo DNA-encoded libraries and characteristics of the new chemical space that is opened up. #ShefChem16
baoilleach #ShefChem16 DNA tag added that identifies compound. Mixture of 1 million potentially. Can screen, and work out the DNA codes for hits.
baoilleach #ShefChem16 Can use a ltd no of robust rxns, or finite no of building blocks, full enumeration.
dgelemi #ShefChem16 DEL has limited BB and number of reactions but still huge numbers. Also full enumeration. But is this space interesting?
dr_greg_la Christos keeps telling us large chem space is automatically interesting. I'm waiting for the "but"
#ShefChem16
baoilleach #ShefChem16 Took some of these sets and analysed them for "interestingness"
WendyAnneW #ShefChem16 100 X 500 X 500 combinatorial library. Some people are actually trying to make these huge spaces. He does not try this
baoilleach #ShefChem16 intra-collection diversity, property values, ...
baoilleach #ShefChem16 Compare real collections (e.g. subset of PubChem, Lilly), and virtual collections
baoilleach #ShefChem16 Used #rdkit for analysis via @knime
baoilleach #ShefChem16 looking at intraset near-neighbour distance (paper on the way explaining all this)
WendyAnneW #ShefChem16 nicolaou results pic.twitter.com/lBWIgawbI1
baoilleach #ShefChem16 DEL sets have larger no of compds per cluster, but fluctuates largely
baoilleach #ShefChem16 C80% - how many compounds would you need to screen to ensure that you find a nbr within 0.8
baoilleach @dgelemi #ShefChem16 @WendyAnneWarr 10 min left on battery again!!
dgelemi #ShefChem16 there are DELs and there are DELs. Not all libraries are equal even with same technology (large space A != large space B)
WendyAnneW #ShefChem16 diversity analysis . How many cmpds need to be compared worse for PLC than for DEL
baoilleach #ShefChem16 DEL libs are highly homogeneous, explores rel few compound classes
baoilleach #ShefChem16 gc3tk used ?? to do analysis
WendyAnneW #ShefChem16 DEL designs differ a lot. Which to make first? Merck atom pair FPs and c80% diversity anal. NN distributn & other approaches
WendyAnneW #ShefChem16 they work within diversity neighbourhoods eg drugs drugs, frags frags
mvkrier #shefchem16 What does your chemical diversity neighbourhood look like? (A. Nicolaou)
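
The intraset near-neighbour analysis mentioned in the tweets above can be illustrated with a minimal sketch. It assumes fingerprints are represented as plain sets of "on" bit indices and uses Tanimoto similarity; the toy fingerprints and the function names are hypothetical, and this is not the C80% calculation from the talk (which reportedly used Merck atom pair fingerprints), only the nearest-neighbour idea behind it.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbour_similarities(fps):
    """For each fingerprint, the highest similarity to any other set member."""
    result = []
    for i, fp in enumerate(fps):
        best = max(tanimoto(fp, other) for j, other in enumerate(fps) if j != i)
        result.append(best)
    return result


# Hypothetical toy fingerprints (sets of bit indices), not real compounds.
toy_set = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {10, 11, 12}]
nn = nearest_neighbour_similarities(toy_set)

# Fraction of the set that has at least one neighbour at similarity >= 0.8.
fraction_with_close_neighbour = sum(s >= 0.8 for s in nn) / len(nn)
print(nn, fraction_with_close_neighbour)
```

A homogeneous collection would show most members with a close neighbour, while a diverse one would show many low nearest-neighbour values.
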
WendyAnneW #ShefChem16 and now for tea - recharge our batteries, in all respects
nathanbroo Time for coffee! Very much needed! #ShefChem16
rguha @baoilleach how is interestingness being defined? #ShefChem16
dgelemi #ShefChem16 poster session coming ... Or seating next to the wall to charge phone. Another good session.
WendyAnneW #ShefChem16 Paul Hawkins of Open Eye talks about SNOWFLAKE. Data driven decisions in lead discovery
nathanbroo Up next: Paul Hawkins @OpenEyeSoftware on SNOWFLAKE: data-driven decisions in lead discovery #ShefChem16
dgelemi #ShefChem16 session to start. 1st talk from Hawkins from #openeye. Snowflake with abstract containing cloud and GPUs
baoilleach #ShefChem16 Paul Hawkins @OpenEyeSoftware on Snowflake - data-driven decisions in lead discov
baoilleach #ShefChem16 we've been analysing VS results wrong
baoilleach #ShefChem16 VS about separating scores - decoys vs actives
nathanbroo Combative start: we're analysing virtual screening results in the wrong way. #ShefChem16
baoilleach #ShefChem16 Talking about ROCS scores
mvkrier #Shefchem16 Ready for Paul Hawkins from @OpenEyeSoftware; snowflake, hope it doesn't develop into a snowstorm
baoilleach #ShefChem16 No more sampling to infer population - because of cloudy cloud and GPU
mvkrier the cloud and the gpu=> no more sampling (P. Hawkins) #ShefChem16
WendyAnneW #ShefChem16 vhts work by separating score distributions. He uses ROCS shape similarity but his talk applies to other s/w too. Probability
WendyAnneW #ShefChem16 Hawkins. No need for sampling any more. Cloud and clusters can be used
baoilleach #ShefChem16 10^7 molecules: ROCS 60-90 mins over 1200 CPUs, FastROCS (2 GPUs): 1-2 mins
WendyAnneW #ShefChem16 distributions for ALL mols.
baoilleach #ShefChem16 "The cutoff" - based on scores, or rank/number - what is the right cutoff?
nathanbroo "What's the perfect cut-off for ROCS? There isn't one. It depends…" #ShefChem16
dgelemi #ShefChem16 "what's the best cutoff from VS", best answer is "it depends" @OpenEyeSoftware
WendyAnneW #ShefChem16 analysing a virtual screen you used to set a cutoff. You do not Need to.
baoilleach #ShefChem16 Cutoff should be based on the results, not set in advance....
nathanbroo Virtual screening cut-offs: we still have to do something. What are the options? #ShefChem16
baoilleach #ShefChem16 The cutoff decides precision vs recall (signal-to-noise) - Goldilocks "just about right" cutoff
nathanbroo The #JustAboutRight Goldilocks cut-off. #ShefChem16
WendyAnneW #ShefChem16 cutoff means too much noise or too little signal. He recommends the just about right cutoff. The goldilocks cutoff
baoilleach #ShefChem16 Example: 9m molecules from eMolecules, 96 queries from DUDE, and ROCS score distribs
dgelemi #ShefChem16 presentation from Hawkins available here https://openeye.box.com/s/g6kmw8kc3ey1c3uoo3g64d2b131qxgz1 @OpenEyeSoftware for our Twitter friend
baoilleach #ShefChem16 The query affects the distribution of scores. Fixed score cutoffs are a bad idea.
nathanbroo "Fixed score cut-offs are a bad idea" no disagreement there. Does anyone still do that? #ShefChem16
WendyAnneW #ShefChem16 96 queries from DUDE database.get ROCS score distributions. Fixed score cutoff are a bad idea. Mean distributn differs by query
baoilleach @neysanev #ShefChem16 Mean score @nathanbroon @WendyAnneWarr @dgelemi We each have a unique spin on it.
rguha you could also consider cutoff in terms of resource availability (take as many as you can follow up)? #shefchem16 https://twitter.com/baoilleach/status/749983971535097856
nathanbroo Rank scoring also a bad idea… #ShefChem16
baoilleach #ShefChem16 Snowflake ref from Fight Club "you are not a snowflake"
pjballeste @dgelemi @baoilleach #ShefChem16 Now easy to use USR/USRCAT prospectively: http://nar.oxfordjournals.org/content/early/2016/04/22/nar.gkw320.abstract e.g.w/cytarabine: http://usr.marseille.inserm.fr/iview/?575581c69641d65b5da98dab
WendyAnneW #ShefChem16 score distributions in lb virtual screening are unique for the population
nathanbroo "Virtual screening is a beautiful and unique snowflake" (Palahniuk) Must be treated differently every time #ShefChem16
baoilleach #ShefChem16 pessimism in analysing VS, e.g. if scoring is not working, scores are random; if no actives present
WendyAnneW #ShefChem16 Hawkins. Pessimism in analysing VS? Database scores are distributed randomly.
baoilleach #ShefChem16 Background distribution is considered to be normal (central limit theorem)
WendyAnneW #ShefChem16 the population score is accessible
baoilleach #ShefChem16 assumptions: deviations from normality indicate actives
nathanbroo Snowflake assumptions: scale, scores & significance. Access entire population, one normal distribution #ShefChem16
baoilleach #ShefChem16 normal distrib is "unbounded" - sits on real number line
baoilleach #ShefChem16 real scores are bounded (0->1, etc.)
nathanbroo Fitting to a normal *not* calculating mean and standard deviation. #ShefChem16
mvkrier #shefchem16 Miss Smilla's Feeling for Snow to be applied for #virtual_screening
baoilleach #ShefChem16 ...so unbound it "logit p = log(p/(1-p))" logit transformation
WendyAnneW #ShefChem16 deviations from normal at high scoring end found by snowflake
baoilleach #ShefChem16 can we decide whether a VS works or not before exptal testing?
dgelemi @WendyAnneWarr and used to know if a VS did work or not using statistics, and comes QQ plot. #ShefChem16
baoilleach #ShefChem16 QQ plot - want to see more high scoring molecules than expected by chance - comparison to normal
baoilleach #ShefChem16 how to locate the Goldilocks cutoff - don't just stare at the top of the list
WendyAnneW #ShefChem16 looking just at the top of the hit list is not the way to get the cutoff. Look at all the results
baoilleach #ShefChem16 look at the whole list, and fit to normal distribution, and think about z-scores
nathanbroo "Looking at just the top of the hitlist is uninformative." Paul Hawkins #ShefChem16
baoilleach #ShefChem16 what no of molecules do you expect at a particular z-score?
baoilleach #ShefChem16 Consider the actives as a small Gaussian sitting on top of a large background Gaussian
georgeisyo @GJPvWesten Z-scores #ShefChem16
baoilleach #ShefChem16 Binomial test: what's the chance that the selected cutoff is valid
WendyAnneW #ShefChem16 binomial test for probability of success in goldilocks cutoff (GC)
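
One way to read the binomial test mentioned above is as a tail probability: if the background distribution predicts that a fraction p of molecules score above a cutoff, how likely is it to see k or more above it among n molecules by chance? The sketch below is an assumption about the general idea, not the Snowflake implementation; the numbers and the function name are hypothetical.

```python
import math


def binomial_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    below = sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - below)


# Hypothetical numbers: 1000 molecules screened, the background model
# expects 0.1% of them above the cutoff, and 5 are actually observed there.
n_screened = 1000
background_tail = 0.001
observed_above_cutoff = 5

p_value = binomial_tail(n_screened, background_tail, observed_above_cutoff)
print(p_value)
```

A small value suggests more high scorers than the background alone would produce at that cutoff.
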
baoilleach #ShefChem16 Think about probabilities instead of scores; the relevant probability of success
WendyAnneW #ShefChem16 the language of odds is easier for the chemist to understand. Score to z score to odds
baoilleach #ShefChem16 z-score of 4: 1 in 31574, etc.
baoilleach #ShefChem16 can turn molecule scores into z-scores - can tell if interesting vs uninteresting - select hits based on probabilities
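
The score to z-score to odds chain described in these tweets can be sketched as follows. This is a simplified illustration with hypothetical function names: it unbounds scores in (0, 1) with the logit transformation and then standardises them using the sample mean and standard deviation, whereas the talk stressed that Snowflake fits a normal distribution rather than simply calculating those two statistics.

```python
import math
from statistics import mean, stdev


def logit(p):
    """Map a bounded score in (0, 1) onto the whole real line."""
    return math.log(p / (1.0 - p))


def z_scores(scores):
    """Standardise logit-transformed scores (simplification: sample mean/sd)."""
    transformed = [logit(s) for s in scores]
    mu, sigma = mean(transformed), stdev(transformed)
    return [(t - mu) / sigma for t in transformed]


def odds_against(z):
    """One-sided normal tail at z, expressed as N in '1 in N'."""
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    return 1.0 / tail


def expected_count(n_molecules, z):
    """Number of background molecules expected at or above z by chance."""
    return n_molecules / odds_against(z)


print(round(odds_against(4.0)))           # the "z-score of 4" case from the tweet
print(expected_count(9_000_000, 4.0))     # chance hits in a 9m-molecule screen
```

Talking in odds rather than raw scores makes results from different queries comparable, since each query has its own score distribution.
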
WendyAnneW #ShefChem16 you can find extremely unusual molecules - the ones to make
nathanbroo Snowflake method for prioritising interesting hits. Paul Hawkins. #ShefChem16 pic.twitter.com/xxsoMLtqZy
baoilleach #ShefChem16 CP3A4 example: no cutoff can be found - no actives are present, or at least no confidence that top results are better than random
WendyAnneW #ShefChem16 examples. In one case no GC found . less than 90k to 1 chance of mols
baoilleach #ShefChem16 XIAP: lines cross, and can find goldilocks cutoff. One in 3.5million chance scores.
jwmay Anyone else reminded of this: http://www.daylight.com/meetings/emug97/Bradshaw/Significant_Similarity/Significant_Similarity.html #ShefChem16
WendyAnneW One in 3.5 million example next #ShefChem16
WendyAnneW #ShefChem16 third eg 4.2 million to one against
baoilleach #ShefChem16 Easy to digest way of approaching significance in sim measures. Graphical.
baoilleach #ShefChem16 Use all your hard-won data - the entire distrib. Treat each run as unique - diff cutoff. ...and st else
cdsouthan . @WendyAnneWarr appreciating your #ShefChem16 tweets but some are a tad cryptic
nathanbroo Use all the data: fit entire distribution. Treat each VS As unique. Talk about results in terms of odds *not* scores. #ShefChem16
nathanbroo Snowflake in summary. #ShefChem16 pic.twitter.com/nvNeXELz8V
baoilleach @cdsouthan #ShefChem16 @WendyAnneWarr dude - this ain't easy, listening, understanding and typing soundbites :-)
rguha in this case, wouldn't fitting a mixture model be more robust? (e.g. Tong et al http://dl.acm.org/citation.cfm?id=1234656) #shefchem16 https://twitter.com/baoilleach/status/749987098124517376
mvkrier #ShefChem16 acknowledgements go to @mattgeballe , @craigbruce and Forrest York
rguha @baoilleach what is the 'entire distrib' or 'population' that is being referred to? The entire screening deck? #shefchem16
GJPvWesten @georgeisyourman #ShefChem16 loving them ;)
WendyAnneW @baoilleach @cdsouthan #ShefChem16 I do better with my laptop...
nathanbroo Up next: Yi Mok @ICR_London talking on EnCore a new method we have developed for scaffold morphing to enrich SAR #ShefChem16
baoilleach #ShefChem16 Yi Mok: ICR on enriching SAR info by scaffold enum
GJPvWesten @georgeisyourman #ShefChem16 improved MCC from ~0.20 --> ~0.40 for MCNBC ...
nathanbroo I'll probably step away from tweeting this one. Might be a bit biased! #ShefChem16
dgelemi #ShefChem16 next talk from @ICR_London Yi Mok on scaffold enumeration and clustering. Scaffold Tree and HTS on abstract
baoilleach #ShefChem16 What is a mol scaffold? Part not changing within a cmpd series
mvkrier #ShefChem16 Yi Mok from @ICR_London : reminding us of the different definitions of a molecular #scaffold
baoilleach #ShefChem16 Defn preferably dataset independent, objective and invariant. Example: Scaffold Tree (Ansgar in audience!)
cdsouthan . @baoilleach @WendyAnneWarr #ShefChem16 For us (envious) remotees there's no panic on the Titanic i.e. slower more cogent tweets better
WendyAnneW #ShefChem16 scaffolds should be independent of dataset. E.g., Novartis scaffold tree
baoilleach #ShefChem16 Such scaffolds can be used in clustering, represents "compd series", easily interpretable, may overlook key functional groups
baoilleach #ShefChem16 introduce "controlled fuzziness" in scaffold representations
dgelemi #ShefChem16 not science related but my neighbour is doing an art like drawing. I'm impressed. #scribbling on notepad
WendyAnneW #ShefChem16 clustering using core scaffolds, objective scaffolds. Good 4 SAR and interprets but may be 2 stringent or miss key funct groups
baoilleach #ShefChem16 EnCore: Enumeration of Core Scaffold. Mimic scaffold exploration efforts and enrich SAR info during HTS hit id
WendyAnneW #ShefChem16 EnCore. Mok and brown JCIM submitted
baoilleach #ShefChem16 (aside: didn't they just make this name up last week?? @nathanbroon :-)
dgelemi #ShefChem16 paper from Mok's presentation, EnCore submitted in JCIM with @nathanbroon Soon in your RSS alert
baoilleach #ShefChem16 Features: CNO elemental changes (mutation), preserve aromaticity, keep unique mutated scaffolds-->enumerated scaffold cluster
baoilleach #ShefChem16 ...several generations, iterative
WendyAnneW #ShefChem16 C N O changes. Mutation on scaffold once per generation. Keep aromaticity. From mutated scaffolds get enumerated scaffold cluster
baoilleach #ShefChem16 Example: imidazole as parent scaffold. 1st gen gives 6 mutated scaffolds, 2nd gen gives an additional 8
baoilleach #ShefChem16 Relevant to med chem? Let's find out...
baoilleach #ShefChem16 Can the scaffold exploration recover explored scaffolds in a med chem compound series?
WendyAnneW #ShefChem16 can the scaffolds retrieve scaffolds explored in medchem literature?
baoilleach #ShefChem16 Mok & Brown, JCIM, submitted
nathanbroo @baoilleach @WendyAnneWarr the anticipation is palpable! #ShefChem16
baoilleach #ShefChem16 Looking at all 62 publications with at least 100 compounds with IC50. After filters --> 43 med chem series. Variety of targets
WendyAnneW #ShefChem16 selection of JMC series made. At least 100 cmpds with ic50. Series for wide spectrum of targets.
nathanbroo 43 medicinal chemistry series defined from @ChEMBL in submitted manuscript. #ShefChem16
baoilleach #ShefChem16 The top level 1 scaffold in each series was used for EnCore enumeration
baoilleach #ShefChem16 20 of the series were found at the Level 1, 10 found at Level 2, the few remaining at 3/4
baoilleach #ShefChem16 (meta: need to sort out battery again! Firefox/tweetdeck is not light)
WendyAnneW #ShefChem16 his scaffolds ARE relevant. 20 scaffolds at first generation, 9 more at second, 2 more at third. EnCore mimics the medchem
WendyAnneW #ShefChem16 how many generations best?
baoilleach #ShefChem16 DrugBank approved drugs contain 475 unique level 1 scaffolds.
baoilleach #ShefChem16 Two rings connected by very long scaffold leads to many enumerated scaffolds.
WendyAnneW #ShefChem16 used DrugBank to test this. Lipinski-like cmpds. 475 unique level 1 scaffolds
baoilleach #ShefChem16 (meta: sitting close to a plug at the back now, but could do with a binoculars)
WendyAnneW #ShefChem16 average epfp7 Structural similarity studied. tanimoto below 6 by third or fourth generation
baoilleach #ShefChem16 Can EnCore associate structurally related screening cmpds to the parent scaffold?
WendyAnneW #ShefChem16 now some work on overlapping series,
baoilleach #ShefChem16 HTS screening set: ~0.25m -> 23K Level 1 scaffolds.
baoilleach #ShefChem16 After two gens of mutations, 74% of scaffold clusters found. Enrichment achieved!
georgeisyo @nathanbroon I will count this as the first mention of @chembl data at #ShefChem16
deniseOme .@baoilleach Wonder if these targets are associated with diseases in the #targetvalidation platform https://www.targetvalidation.org/  #ShefChem16
dgelemi #ShefChem16 sent to the past by listening to the talk from Y Mok when we were both at @UoDLifeSciences
WendyAnneW Can enrich SAR by using his enumerated scaffolds #ShefChem16
baoilleach #ShefChem16 Looking at singletons originally, but after enumeration can find many screening cmpds.
baoilleach @deniseOme #ShefChem16 Pinging @nathanbroon
DrJoshuaBo The DUDE abides #ShefChem16
baoilleach #ShefChem16 I should read this paper when it comes out.
WendyAnneW @baoilleach #ShefChem16 ten thousand compounds in one case
DrJoshuaBo We are all special, beautiful and unique snowflakes #ShefChem16
WendyAnneW #ShefChem16 now Peter Kolb winner of MGMS prize. Exploring chemical and GPCR ligand space
nathanbroo Next up: Peter Kolb - recipient of the MGMS Silver Jubilee with his award lecture. #ShefChem16
baoilleach #ShefChem16 Peter Kolb - In-silico exploration of chemical and GPCR ligand space
baoilleach #ShefChem16 Talking about docking, GPCRs and kinases, and method dev
mvkrier #ShefChem16 @MgmsUpdates MGMS Lecture from Peter Kolb from @Uni_MR
DrJoshuaBo Yi Mok cookin raw with the Brooklyn boy #EnCore #ShefChem16
nathanbroo Docking is key technique applied in the Kolb lab. Now their applications in GPCRs. #ShefChem16
dgelemi #ShefChem16 start of the presentation with acknowledgement and telling that it's not going to be a method talk. GPCR, docking, MD on the way
baoilleach #ShefChem16 Docking good because can deal with orphan targets, novel chemotypes, hit rates of 1% (HTS is only 0.1%)
baoilleach #ShefChem16 How GPCRs work. G-protein, beta arrestin. rhodopsin first to be xtallised. Since 2007 many more.
WendyAnneW #ShefChem16 intro. advantages of docking. Gpcrs transmit signals
GJPvWesten #ShefChem16 Beta2, sexy binding site :)
DrJoshuaBo If a dog barks at me I get a dopamine rush but I get what you're saying Pete! #ShefChem16
baoilleach #ShefChem16 B2AR. Nice binding pocket. Like Bordeaux wine glass. 5 known inverse agonists.
WendyAnneW #ShefChem16 inverse agonists of beta 2 AR. Five known
baoilleach #ShefChem16 Kolb PNAS 2009 - found 6 compds from 25 tested. Six potent and novel hits.
WendyAnneW #ShefChem16 Kolb found six potent and novel hits pnas 2009, 106, 6843
baoilleach #ShefChem16 Some of the hits look like expected, but novel effects also found. Have since been expanded upon. Very encouraging for docking.
WendyAnneW #ShefChem16 so docking to gpcrs is a worthwhile endeavour
baoilleach #ShefChem16 We can also identify fragment like binders, typically thought to be more difficult.
WendyAnneW #ShefChem16 can also identify fragmentlike binders
baoilleach #ShefChem16 Can we find molecules with tailored selectivity: CXCR3 vs CXCR4
jwmay Nice talk but can't help but wonder which tools makes 'OH' red and white? #ShefChem16 pic.twitter.com/pfeeA7NCLF
WendyAnneW #ShefChem16 selectivity cxcr3 versus 4 investigated by docking
baoilleach #ShefChem16 3m molecules docked to both receptors. Exptally dual binders are not described in ChEMBL - diff props.
WendyAnneW #ShefChem16 homologous modelling was necessary
WendyAnneW #ShefChem16 homology not homologous. Sigh
baoilleach #ShefChem16 17 predictions, and all three selectivity groups were 50% correct (i.e. only CXCR3, only 4, both)
dgelemi #ShefChem16 finding dual binder difficult. That's my daily job at @exscientialtd for GPCR and across gene family. It's possible
MgmsUpdate Peter Kolb, winner of the 8th MGMS Silver Jubilee Award, speaking at #ShefChem16 on his docking work against GPCRs pic.twitter.com/1SbwHd6DhK
WendyAnneW #ShefChem16 different binding mode for a significant compound. Dual binder
baoilleach #ShefChem16 Different binding mode (exptally) for a dual binder.
baoilleach #ShefChem16 Checked for non-specific inhibition. Detergent, spinning, dynamic light scattering, screening on unrelated target.
WendyAnneW #ShefChem16 proved that the binding is "real"
baoilleach #ShefChem16 suvorexant has unusual shape - horseshoe shape in xtal structure. Could we have found it? Yes. And other tool compds too.
WendyAnneW #ShefChem16 he was able to reproduce suvorexant binding. Almorexant has intro I stacking
baoilleach #ShefChem16 K003JG022 used as id for molecule - almost a Wiswesser Line Notation in itself!
WendyAnneW #ShefChem16 sorry - hit send key too soon in last tweet. Rubbish...
WendyAnneW #ShefChem16 docking can find novel chemotypes JCIM 2015, 55, 1824
baoilleach #ShefChem16 Chevillard JCIM 2015 - building blocks, 58 robust reactions -> 21m - SCUBIDOO
nathanbroo It's not just me that uses Lego in slides on enumerating chemistry space! This one is rather good! #ShefChem16 pic.twitter.com/il1uMdA6Zy
baoilleach #ShefChem16 stratified balanced sampling to select sample (aside: why not random?)
WendyAnneW #ShefChem16 SCUBIDOO described in that JCIM paper
baoilleach #ShefChem16 SCUBIDOO useful for finding tool cmpds, virtual screening. Available online. http://kolblab.org/scubidoo 
baoilleach #ShefChem16 Traffic light system used for showing hits: red->caution, might polymerise.
nathanbroo SCUBIDOO: Screenable Chemical Universe - http://www.kolblab.org/scubidoo  #ShefChem16 pic.twitter.com/p46LxMxxYH
baoilleach @WendyAnneWarr #ShefChem16 You've called my bluff!
baoilleach #ShefChem16 Overall impression is that Kolb is a GPCR inhibitor finding machine. Or his software is.
nathanbroo .@baoilleach @WendyAnneWarr I mentioned them in my new book. #ShefChem16
WendyAnneW #ShefChem16 fragment built up and compatible building blocks found to get ligands for beta 2 AR binding site
baoilleach #ShefChem16 Collab with Taros (?) in Dortmund, testing 240 molecules for b2AR. Need to ask for details - hot off the press...
WendyAnneW #ShefChem16 conclusion. docking is great for finding ligands for gpcrs
nathanbroo Nice to see success stories on docking. It is useful but you need to be careful how and when you use it & modulate expectations. #ShefChem16
baoilleach #ShefChem16 Question from floor on viz inspection step, importance thereof. Kolb tries to select novel ones which would be game changing.
baoilleach #ShefChem16 ...has also tried to use machine learning to choose them, like a chemist would, but not quite there yet.
WendyAnneW @baoilleach #ShefChem16 yes Taros
DrJoshuaBo I dream of a world where picking molecules gives you street cred #ShefChem16
nathanbroo @baoilleach @WendyAnneWarr maybe… #ShefChem16 #shamelessadvertising pic.twitter.com/CQkUs7Vknr
nathanbroo Now onto the @MgmsUpdates AGM. #ShefChem16
jwmay Quick break - maybe time for a quick read (Shef circa '84) #ShefChem16 pic.twitter.com/RiIZuNw1AY
ssalentin Great Conference #ShefChem16 so far! :) Looking forward to present #PLIP tomorrow here in Sheffield. #protein #ligand #interactions
dgelemi #ShefChem16 @DDU_Dundee is present! @aschreyer Credo's database mentioned as well pic.twitter.com/OmxDUIOkMN
nathanbroo Starting another day with a conference breakfast. #ShefChem16 pic.twitter.com/vW1bcz6t6I
dgelemi #ShefChem16 9am start approaching and breakfast room getting less crowded pic.twitter.com/RudiHLsHbg
jwmay My Poster (21) for #ShefChem16: "Sketchy Sketches". http://www.slideshare.net/NextMoveSoftware/sketchy-sketches-hiding-chemistry-in-plain-sight?qid=b10cc5a4-9f47-4eb1-ad48-d6ae50d3f159&v=&b=&from_search=1
baoilleach #ShefChem16 Nadine Schneider talking about reaction chemistry space
nathanbroo Up first: Nadine Schneider and back to the roots of chem(o)informatics from NIBR. #ShefChem16
baoilleach #ShefChem16 Syn of cmpds is one of the bottlenecks in drug discov
baoilleach #ShefChem16 Bias to more easily accessible compounds and to a small set of common reactions
WendyAnneW #ShefChem16 Nadine Schneider of Novartis. Why we care about reactions
nathanbroo We know lots of molecules but do we know how to make them. New ways to design synthetic routes. #ShefChem16
WendyAnneW #ShefChem16 Schneider. And why we use patents
baoilleach #ShefChem16 Why use patent data? Rich source, only small overlap 6% with compounds in literature, rxns are actually used in pharma
baoilleach #ShefChem16 Text-mining can extract a lot of info from patents
nathanbroo Text mining allows extraction of compounds, targets, bio-affinities, melting points, yields whole reaction schemes #ShefChem16
baoilleach #ShefChem16 Schneider et al JMC 2016 analysis of 40 years worth of patents
WendyAnneW #ShefChem16 Schneider . Admittedly patent reactions are error prone
dgelemi #ShefChem16 extracting reactions from patents, paper picked up by @Dereklowe lots of interest on this patent data from chemists
WendyAnneW #ShefChem16 chemists are interested in the data. See C&EN and in the pipeline blog
nathanbroo Here is the paper from Schneider et al. http://pubs.acs.org/doi/abs/10.1021/acs.jmedchem.6b00153 #ShefChem16 pic.twitter.com/5nsXxlWMa6
baoilleach #ShefChem16 Textmining - identify exptal section in patent - extract product (usually in title, but not always) - convert to chemical struct
WendyAnneW #ShefChem16 mining patents. Extract product is easy. Getting the structure Is harder
baoilleach #ShefChem16 Textmining: extract the remainder of the reaction. Shoutout for Daniel Lowe of @nextmovesoftware
WendyAnneW #ShefChem16 Schneider cites Lowe PhD thesis
baoilleach #ShefChem16 Reactions are noisy: cleanup required - crude sanity checks - standardisation & duplicate removal
baoilleach #ShefChem16 Duplicates removed with canonical reaction smiles
nathanbroo Reaction standardisation & duplicate removal first. Canonical SMILES, lexicographic sort, hashing function #ShefChem16
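
The duplicate-removal step described above (canonical SMILES, lexicographic sort, hashing) can be sketched as below. This is an illustration under assumptions, not the pipeline from the talk: it presumes every component SMILES has already been canonicalised by a cheminformatics toolkit, and the function names and example reactions are hypothetical.

```python
import hashlib


def reaction_key(reactants, products):
    """Order-independent hash for a reaction given canonical component SMILES."""
    left = ".".join(sorted(reactants))    # lexicographic sort removes ordering noise
    right = ".".join(sorted(products))
    reaction_smiles = left + ">>" + right
    return hashlib.sha1(reaction_smiles.encode("utf-8")).hexdigest()


def deduplicate(reactions):
    """Keep the first occurrence of each distinct reaction."""
    seen = set()
    unique = []
    for reactants, products in reactions:
        key = reaction_key(reactants, products)
        if key not in seen:
            seen.add(key)
            unique.append((reactants, products))
    return unique


# Hypothetical input: the first two entries differ only in reactant order.
reactions = [
    (["CC(=O)O", "CCO"], ["CCOC(C)=O"]),
    (["CCO", "CC(=O)O"], ["CCOC(C)=O"]),
    (["CC(=O)O", "CO"], ["COC(C)=O"]),
]
unique_reactions = deduplicate(reactions)
print(len(unique_reactions))
```

Sorting before hashing means the same reaction written with its components in a different order collapses to a single key.
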
WendyAnneW #ShefChem16 the reactions need clean up, de duplicating standardised etc. Then do reaction mapping
baoilleach #ShefChem16 reaction role assignment and atom-to-atom mapping - often have more atoms on left than on right
nathanbroo Reaction role assignment & atom-to-atom mapping next: reactants, reagents & products #ShefChem16
baoilleach #ShefChem16 Which are reactants and which are reagents? Can do atom mapping - find corresponding atoms - uses MCS based approach
baoilleach #ShefChem16 Atom mapping: many different algorithms - huge challenge is identifying the atoms and bonds in the rxn
WendyAnneW #ShefChem16 Chen in WiRES cited on mapping
baoilleach #ShefChem16 For many applications, the exact mapping is not required, just need the reaction role assignment
nathanbroo Sub graph matching with changing atoms and invariants. New fingerprint method for reaction role assignment. #ShefChem16
baoilleach #ShefChem16 A new method for rxn role assignment without mapping, based on fingerprints
WendyAnneW #ShefChem16 schneider presents new fp assignment method without using mapping
nathanbroo Fingerprint is very local form of Morgan fingerprint (radius = 1) with simple topological characteristics. #ShefChem16
baoilleach #ShefChem16 Morgan FPs (radius 1); detailed FP versus scaffold FP (the latter only has atomic no and no bond types)
dgelemi #ShefChem16 lots of different atom mapping methods, including reaction fingerprint by adapting ecfp
WendyAnneW #ShefChem16 Schneider explains reaction role assignment
baoilleach #ShefChem16 Using count-based fingerprints; combine reactant fps in different combinations; ignore irrelevant combinations using heuristic
baoilleach #ShefChem16 Build reaction fps based on difference between fps for product and reactant combinations
WendyAnneW #ShefChem16 she did not cite InfoChem JCIM paper overview on mapping
nathanbroo Rogers & Hahn cited for Morgan fingerprint (or Extended-Connectivity Fingerprint) http://pubs.acs.org/doi/abs/10.1021/ci100050t #ShefChem16 pic.twitter.com/8kPeTCRm1T
nathanbroo Reaction assignment made by new scoring function for difference fingerprints of reaction fingerprints. #ShefChem16
baoilleach #ShefChem16 Score each fp based on bits mapped and bits left over - based on both detailed and scaffold fp - gives which are the reactants
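
The idea of assigning reactant roles from count-based difference fingerprints can be sketched roughly as follows. Everything specific here is an assumption for illustration: the fingerprints are hand-written feature counts rather than Morgan fingerprints, all combinations are enumerated without any pruning heuristic, and the score (features mapped minus features left over) is a stand-in, not the scoring function presented in the talk.

```python
from collections import Counter
from itertools import combinations


def score(product_fp, combo_fp):
    """Features the combination explains in the product, minus leftovers on both sides."""
    mapped = sum((product_fp & combo_fp).values())        # element-wise minimum
    unexplained = sum((product_fp - combo_fp).values())   # in product, not in combo
    surplus = sum((combo_fp - product_fp).values())       # in combo, not in product
    return mapped - unexplained - surplus


def assign_reactants(product_fp, candidate_fps):
    """Return the indices of the candidate combination with the best score."""
    best_combo, best_score = None, None
    for size in range(1, len(candidate_fps) + 1):
        for combo in combinations(range(len(candidate_fps)), size):
            combo_fp = sum((candidate_fps[i] for i in combo), Counter())
            s = score(product_fp, combo_fp)
            if best_score is None or s > best_score:
                best_combo, best_score = combo, s
    return best_combo


# Hypothetical feature counts: candidates 0 and 1 together account for the
# product, while candidate 2 behaves like a reagent or solvent.
product = Counter({"C": 3, "O": 2, "N": 1})
candidates = [
    Counter({"C": 2, "O": 2}),
    Counter({"C": 1, "N": 1}),
    Counter({"C": 1, "Cl": 2}),
]
reactant_indices = assign_reactants(product, candidates)
print(reactant_indices)
```

No atom-to-atom mapping is needed here; the combination whose summed fingerprint best matches the product is taken as the set of reactants.
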
WendyAnneW #ShefChem16 she says no datasets available. Has she seen the InfoChem set?
baoilleach #ShefChem16 What to compare to? The patent reaction dataset of 1.3m unique rxns of which 62% are classified using SMIRKS by NameRxn
WendyAnneW #ShefChem16 use of next move name run
baoilleach #ShefChem16 Dataset 1: 683 unbalanced rxns from patents with 228 rxn types
philbiggin Great afternoon of talks at the 7th Sheffield Cheminformatics Conference #ShefChem16 yesterday. Thanks again to all who made the MGMS AGM!
nathanbroo Dataset construction to validate new assignment method. #ShefChem16
baoilleach #ShefChem16 Dataset 2: randomly 50k from patents, all of the atoms need to be atom-mapped and only including classified rxns
WendyAnneW #ShefChem16 she also uses indigo toolkit
baoilleach #ShefChem16 (First mention of indigo toolkit)
WendyAnneW #ShefChem16 grr! Next move name rxn not name run
baoilleach #ShefChem16 NameRxn works by applying a pattern to the LHS and check whether the result matches the RHS
nathanbroo Quick summary of NameRxn @nmsoftware which is a classification-based atom mapping procedure. #ShefChem16
WendyAnneW #ShefChem16 indigo reaction mapping uses substructure based mapping
baoilleach #ShefChem16 Indigo toolkit uses a substructure-based algorithm
WendyAnneW #ShefChem16 Schneider : 87% of reactants correctly assigned
baoilleach #ShefChem16 Typical failures first: assignment of correct reactant only possible knowing the rxn mechanism
nathanbroo 87% correct assignments in first set of results. Most issues in oxidations. #ShefChem16
baoilleach #ShefChem16 Noisy unbalanced patent data may contain multiple rxns in one scheme
baoilleach #ShefChem16 2 possible solns predicted for 1.5% of dataset
baoilleach #ShefChem16 Dataset 2: 97% agreement with NameRxn results
nathanbroo Results on second dataset gives 97% agreement on 50k unique reactions from patents. Very fast to run. #ShefChem16
baoilleach #ShefChem16 96% agreement with Indigo TK results
WendyAnneW #ShefChem16 better than indigo toolkit
WendyAnneW #ShefChem16 fp method even handles subtle examples
baoilleach #ShefChem16 FP-based method even gets subtle diffs between reactants - summary: it works well, no atom mapping required, efficient, robust
nathanbroo Summary from Schneider: efficient and robust fingerprint-based methods as alternative to atom-to-atom mapping step. #ShefChem16
WendyAnneW #ShefChem16 the new method will be included in RDkit
baoilleach #ShefChem16 New method sheds some light on unclassified rxns in patent dataset
baoilleach #ShefChem16 RDKit UGM 26-28 Oct in Novartis, Basel - come along
nathanbroo 5th #RDKit UGM to be held in Basel on 26-28th October. Free registration still available. Hope to attend myself! #ShefChem16
WendyAnneW #ShefChem16 next up Ben Allen & Sree Vadlamudi on network pharmacology and NCEs
baoilleach #ShefChem16 Ben Allen & Sree Vadlamudi e-Therapeutics; Network Pharmacology and New Chemical Entities
nathanbroo Up next: two-hander of Ben Allen & Sree Vadlamudi from e-Therapeutics on Network Pharmacology & New Chemical Entities #ShefChem16
baoilleach #ShefChem16 What is Network Pharmacology? And how can it be used for drug discovery?
baoilleach #ShefChem16 Think about diseases in terms of interacting proteins
nathanbroo Network pharmacology to elicit protein network of pathways and dependencies. Biology is highly robust & adaptive. #ShefChem16
baoilleach #ShefChem16 Network is robust and adaptive, emergent function
baoilleach #ShefChem16 Nodes+edges=network, network props, node props, community structure
WendyAnneW #ShefChem16 networks nodes and edges explained
dgelemi #ShefChem16 2 part talks from e-therapeutics. 1st on work from 2 yrs ago and 2nd on the results from it and results from cell assay ("real")
nathanbroo Many ways to characterise networks: diameters, distances, etc. Cf. social networks and small world phenomena #ShefChem16
baoilleach #ShefChem16 Network diameter (max dist) - to what extent the network can route around a missing edge
WendyAnneW #ShefChem16 network and node properties discussed
baoilleach #ShefChem16 Centrality of nodes - probably important - betweenness centrality - how often a node is on the shortest paths
baoilleach #ShefChem16 Modules, cliques, clustering - parts of the network more connected to each other than to the rest - communities
nathanbroo Community structure: identify functional organisation of complex networks. Biology: connection between structure & function. #ShefChem16
baoilleach #ShefChem16 Network science considers a random network; randomly deleting nodes makes them vulnerable
baoilleach #ShefChem16 Biological networks have power-law degree distribution (not gaussian); robust to random node deletion but vulnerable to targeted
WendyAnneW #ShefChem16 vulnerability of networks. Biol networks have no inherent scale and are robust
nathanbroo Biological networks have power-law degree distribution. Robust to random node deletion; brittle to targeted node deletion. #ShefChem16
baoilleach #ShefChem16 Apply to perturb a protein-protein interaction network - intervention needs to be both multiple and targeted
dgelemi #ShefChem16 lots of stats on biological networks. They are not like random and it's possible to identify important nodes
baoilleach #ShefChem16 Can disrupt by picking highest degree and highest betweenness nodes
WendyAnneW #ShefChem16 he explains how to disrupt a network
baoilleach #ShefChem16 Have team of biologists spending between 2-6 mnths building up a network for a particular problem
WendyAnneW #ShefChem16 you need experts to construct the networks from lit searching and pathway analysis
baoilleach #ShefChem16 Cmpds are promiscuous binders and pleiotropic (?) in action
baoilleach #ShefChem16 Overall footprint of cmpd affects many targets and if hit the right ones, then can have important network effect
dgelemi #ShefChem16 expert biologists involved for months to help build networks with some meaning #manualwork
baoilleach #ShefChem16 Have an in-house target profile database used to identify molecules that will have the desired effect
nathanbroo Generate large profile of compound-protein interactions, filling in the gaps using chemoinformatics. Not elaborated. #ShefChem16
WendyAnneW #ShefChem16 looking for compounds that have a significant effect on the network but don't care about structures of ligand or protein
baoilleach #ShefChem16 Very spare dataset as full interaction profile not known from ChEMBL
WendyAnneW #ShefChem16 yes pleiotropic
nathanbroo As-is tends to mine known pharmacology. Now moving into predictive models to find the unknown. #ShefChem16
dgelemi #ShefChem16 Bayesian and fingerprint mentioned as prediction. Try to predict the entire pharmacology profile of a compound
WendyAnneW #ShefChem16 network impact calculation explained
baoilleach #ShefChem16 Network impact calculation: cmpd->footprint in database->remove corresponding nodes from network (approximation)
baoilleach #ShefChem16 Then calculate impact on network, e.g. largest component size and so forth...now for some of the results
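[Editor's sketch] The network impact calculation described in the tweets above (map a compound to its target footprint, delete those nodes, measure what is left) can be illustrated roughly as follows. This is a minimal sketch, not e-Therapeutics' implementation: the toy network, the node names, and the `network_impact` score (fractional loss of the largest connected component) are all assumptions made for illustration.

```python
from collections import deque


def largest_component_size(adjacency, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    remaining = set(adjacency) - set(removed)
    seen = set()
    best = 0
    for start in remaining:
        if start in seen:
            continue
        # Breadth-first search over the nodes that survive the deletion.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adjacency[node]:
                if nbr in remaining and nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


def network_impact(adjacency, footprint):
    """Hypothetical impact score: fraction of the largest component that is lost."""
    before = largest_component_size(adjacency)
    after = largest_component_size(adjacency, footprint)
    return 1 - after / before


# Toy protein network (made up): one hub "H" plus five low-degree proteins.
toy_network = {
    "H": {"A", "B", "C", "D", "E"},
    "A": {"H", "B"},
    "B": {"H", "A"},
    "C": {"H"},
    "D": {"H"},
    "E": {"H"},
}

hub_impact = network_impact(toy_network, {"H"})   # targeted deletion of the hub
leaf_impact = network_impact(toy_network, {"C"})  # deletion of a peripheral node
```

In this toy case removing the hub fragments the network far more than removing a peripheral node, which mirrors the "robust to random deletion, vulnerable to targeted deletion" point made earlier in the talk.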
nathanbroo Moving on to Sree to talk about case studies using network pharmacologies. #ShefChem16
WendyAnneW #ShefChem16 @baoilleach sparse not spare
dgelemi #ShefChem16 end of part 1. Now real application as a result of investment in 2012 to use the platform that was developed before that.
nathanbroo Heavy use of CROs for compound acquisition and testing. #ShefChem16
baoilleach #ShefChem16 12 projects currently happening; some in med chem/lead opt stage; immuno-oncology, oncology, CNS
nathanbroo 12 projects; 30-35 FTEs working over Europe. Two cancer projects presented: telomerase and hedgehog. #ShefChem16
WendyAnneW #ShefChem16 oncology projects are of the most interest to them
baoilleach #ShefChem16 Telomerase: involved in cell cycle reg, exp of anti-apoptotic proteins; DNA damage response, reg of Myc, NF-kB and Wnt/b-catenin
baoilleach #ShefChem16 Hypothesis: telomerase is involved in many cancer-related activities
dgelemi #ShefChem16 e-therapeutics moving as a biotech with assay dvt, hit and lead discovery with CROs. Focus on cancer, CNS and inflammation
WendyAnneW #ShefChem16 new telomerase biology - approach for targeting the telomerase
baoilleach @WendyAnneWarr #ShefChem16 Spot the journal editor :-)
baoilleach #ShefChem16 Telomerase project: had cmpds that hit 4 or 5 key nodes, bought similar cmpds for screening
nathanbroo Identify compounds that impact 4 or 5 nodes. #ShefChem16
baoilleach #ShefChem16 Test cmpds in cell proliferation assays, and telomerase expression in two cancer cell lines
dgelemi #ShefChem16 find compounds that affect key nodes from network with telomerase. And test in cell assays to show reduction of expression
baoilleach #ShefChem16 393 cmpds tested; 106 active, 9 selected based on potency/selectivity/IP, lead opt -> 100x improvement, patents filed
baoilleach #ShefChem16 Targeting the hedgehog pathway (?)
WendyAnneW #ShefChem16 non SMO based methods for hedgehog binding
baoilleach #ShefChem16
nathanbroo Rapid progress of initial hits through optimisation of drug parameters: potency, DMPK, etc. #ShefChem16
dgelemi #ShefChem16 aim for candidate nomination by Q4 2016 after lead optimisation effort to improve activity by 100 fold. #success
baoilleach @WendyAnneWarr #ShefChem16 "when" not "where"
dgelemi #ShefChem16 first 3D pie chart of the conference #dataviz . Good result to show out of it though
baoilleach #ShefChem16 (...oops lost the thread, responding to typo corrections)
DrJoshuaBo Difference between network and graph? Networks are graphs where nodes/edges have attributes... Sounds like a graph to me! #ShefChem16
baoilleach #ShefChem16 This time, 1146 cmpds tested, 63 active, lead opt --> 1000x potency improvement, multiple patents filed, scale up planned
nathanbroo Both cancer targets expected to deliver in candidate selection in Q4 2016. #ShefChem16
WendyAnneW #ShefChem16 they also have TNF projects but no time to discuss
baoilleach #ShefChem16 Summary: unique discov platform; proven ability to id high-potency mols; open to collab
deniseOme .@nathanbroon @baoilleach + extraction of target-disease associations @EuropePMC http://phenoday2016.bio-lark.org/pdf/1.pdf  #ShefChem16 pic.twitter.com/lwiwmeMd7w
WendyAnneW #ShefChem16 they have unique patent protected platform...
dgelemi #ShefChem16 @baoilleach no mention on the medchem side and how hit to lead is done (usual way assumed). e-Platform for target/hit ID
baoilleach #ShefChem16 Planning on developing MOA platform to generate more info for networks
baoilleach #ShefChem16 Question: isn't optimising for 1 target hard enough, without thinking about 5 simultaneously?
baoilleach #ShefChem16 Answer: not straightforward, is different than one-target opt
WendyAnneW #ShefChem16 next up Tim James of Evotec. Building a phenotypic screening collection
nathanbroo Challenge of optimising selectively to five targets is huge but confident can progress. Multiple chemotypes so backups available #ShefChem16
dgelemi #ShefChem16 @baoilleach from the answer I got they do phenotypic screening and not target based. Am I correct? @nathanbroon @WendyAnneWarr
baoilleach #ShefChem16 Tim James Evotec: challenges in optimising public data for phenotypic screening
nathanbroo Up next: Tim James on building & analysing phenotypic screening collections from Evotec #ShefChem16
baoilleach #ShefChem16 Phenotypic drug discov - work backwards from readouts to identify targets
dgelemi #ShefChem16 next one is #evotec for phenotypic screening by Tim James and analysing data results and some target assignment
baoilleach #ShefChem16 HTS library - potential to find novel targets no-one has worked on before - exptal techniques work but not always
WendyAnneW #ShefChem16 HTS library has potential for finding novel targets but bios note the
nathanbroo CROs have access to large amounts of data but they don't own it so can't use it. #ShefChem16
WendyAnneW #ShefChem16...bio annotate the library. Sorry for previous typo
baoilleach #ShefChem16 Using bioannotated library - target not novel, but the connection to a particular disease is
baoilleach #ShefChem16 Screening data + annotations -> initial hypotheses
WendyAnneW #ShefChem16 thus establish some initial hypotheses. Then assess the target using much publicly available data
baoilleach #ShefChem16 Target assessment: knockout/mutation data, clinical data (SIDER/ClinVar), pharmac. probes, literature evidence
baoilleach #ShefChem16 Target validation: experimental, CRISPR, siRNA, overexpression
WendyAnneW #ShefChem16 experts then do target validation. He will home in on target deconvolution
baoilleach #ShefChem16 Approaches to target deconvolution: cheminf is part of a set of complementary techniques, e.g. phospho-proteomics
nathanbroo Many approaches to target deconvolution. Chemoinformatics not used in "glorious isolation" #ShefChem16
nathanbroo ~5k unique drugs, probes, tools in phenotypic screening library. #ShefChem16
baoilleach #ShefChem16 Bioannotated screening collection: maximise target coverage while minimising cmpd duplication, 8019 samples->5k unique
nathanbroo Key point on (any) screening library: continuous curation! #ShefChem16
baoilleach #ShefChem16 Manual checking of chemical structures; 800 fixes for stereo; outright errors; subtle errors
WendyAnneW #ShefChem16 much curation of the bio annotated library must be done manually eg stereo
dgelemi #ShefChem16 use bio annotated library as screening deck for target deconvolution in phenotypic screening. Similar to Irwin lab at UCSF
baoilleach #ShefChem16 Important to have correct structures or everything that follows will be wrong if the structure is wrong
nathanbroo Database is bio annotated from multiple sources: @ChEMBL, DrugBank, @pubchem (qualitative), suppliers & manual curation. #ShefChem16
baoilleach #ShefChem16 ~900k data pts from public sources (e.g. PubChem, DrugBank, ChEMBL), 172 data pts per cmpd
WendyAnneW #ShefChem16 manual curation for quantitative activities
baoilleach #ShefChem16 Text-mining the supplier descriptions identifies 2549 compound-target annotations
baoilleach #ShefChem16 (aside: am wondering whether patents could also be a useful source of cmpd target annotations)
WendyAnneW #ShefChem16 text mining for compound-target annotations covers only 35% of the collection
nathanbroo Using public data 60-70% of data can be corroborated. #ShefChem16
baoilleach #ShefChem16 Comparing annotations from multiple sources: 60% consistent, 6% inconsistent
WendyAnneW #ShefChem16 most of the cmpds are relatively selective
georgeisyo @nathanbroon best quote so far #shefchem16
baoilleach #ShefChem16 Assessing polypharmacology: most cmpds appear selective, median of 0 for highly-potent, median of 3 for <= 10 micromolar
WendyAnneW #ShefChem16 19% of cmpds have no annotation at all for activity
nathanbroo Highly polypharmacological compounds should be retired: promiscuous, false-positives (assay tech.)? #ShefChem16
baoilleach #ShefChem16 Quality of biological probes: PAINS 7%, MedChem flagged 12%, Badapple 22% flagged
baoilleach #ShefChem16 Quinalizarin both a catechol and a quinone, appears to be potent at 16 different targets
WendyAnneW #ShefChem16 up to 20% of cmpds have problems re PAINS and the like
baoilleach #ShefChem16 Why does 19% of the collection lack any protein-level annotations: prodrugs, antibiotics/virals/non-human, poorly characterised
baoilleach #ShefChem16 Need to retire some cmpds and find more data for others
baoilleach #ShefChem16 Do statistical analysis to find enriched targets, enriched GO terms, enriched pathways
WendyAnneW #ShefChem16 from phenotypic screen to MOA
baoilleach #ShefChem16 Even for the top targets, we find both cmpds known to be active, and active - we want to see the weight of evidence
baoilleach #ShefChem16 (Missed how the stats are worked out)
baoilleach #ShefChem16 What other data sources: STITCH, ....
WendyAnneW #ShefChem16 how to extend the coverage of the library. Integrity and Reaxys are expensive
baoilleach #ShefChem16 Assessing library coverage; we cover ~80% of target space, but requiring selectivity reduces this down to 20%, but don't need this
WendyAnneW #ShefChem16 ChEMBL and she lectivity..
WendyAnneW #ShefChem16 oops - selectivity
dgelemi #ShefChem16 difficulty to get a library of properly annotated compounds, manual effort, expensive other source of information...
baoilleach #ShefChem16 Context is important - cmpd would interact but not permeable into cell - could ChEMBL tell this? not much data available on this
WendyAnneW #ShefChem16 establishing context is critical and it cannot be done from public data
dgelemi #ShefChem16 using @ChEMBL for permeability data is not fantastic. Few data points published.
nathanbroo Summary: knowledge-driven phenotypic deconvolution can be done using public data (to a certain extent). Curation essential! #ShefChem16
baoilleach #ShefChem16 Summary: pharmacology data well reported but contextual data not; data curation important; public data CAN be used for this
nathanbroo I'm chairing the next session so tweets probably at a minimum. @WendyAnneWarr, @baoilleach & @dgelemi will be tweeting harder! #ShefChem16
dgelemi #ShefChem16 @baoilleach getting ready for next session. pic.twitter.com/z1Q9KK9pLL
baoilleach #ShefChem16 Next up is my boss, Roger Sayle, on Chemical Similarity based on Graph Edit Distance
WendyAnneW #ShefChem16 after the coffee break we have Roger Sayle on chem similarity based on graph edit distance
dgelemi #ShefChem16 @nathanbroon preparing to chair next session. Starting with @nmsoftware pic.twitter.com/vOXONiXfcg
PedroFranc #ShefChem16 pic.twitter.com/ntBWtyLzv1
baoilleach #ShefChem16 Tribute to Andy Grant, who was involved back at the start in 2012
baoilleach #ShefChem16 2D chemical similarity - fingerprints or MCS-based approaches (shoutout to Ed Duesbury)
baoilleach #ShefChem16 MCS results more intuitive but more difficult to implement - shoutout to Val Gillet
baoilleach #ShefChem16 Limitations of fps: sim based on local substructure "all the same notes just in a different order"
baoilleach #ShefChem16 Can have saturation of features/chemical space (e.g. Andrew Dalke's Chemistry Toolkit Rosetta benzodiazepine set)
baoilleach #ShefChem16 fps: no distinction between atom type changes (e.g. Cl->Br), tautomers may have low similarity
dgelemi #ShefChem16 fingerprints have limitations. Local substructures, difficulty for peptides, C16+ alkanes, atom type and stereochemistry
baoilleach #ShefChem16 Examples: Central change in structure leads to low Tanimoto
baoilleach #ShefChem16 Given the known deficiencies with binary fps, is it possible to do better with modern hardware?
dgelemi #ShefChem16 Genesis of small world from @nmsoftware https://www.nextmovesoftware.com/smallworld.html 
WendyAnneW #ShefChem16 can we do better with 2D similarity. JCIM 46, 1912
baoilleach #ShefChem16 Grant et al JCIM 2006 - Lingos - "the emperor has no clothes" - a simple method that beats path-based fps
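[Editor's sketch] For readers unfamiliar with Lingos: the idea is to compare molecules through overlapping fixed-length substrings of their SMILES strings. The code below is a simplified illustration only. It assumes substrings of length 4 and a plain Tanimoto coefficient over substring counts; the published method normalises the SMILES first and uses its own similarity formula, neither of which is reproduced here.

```python
from collections import Counter


def lingos(smiles, q=4):
    """Multiset of overlapping length-q substrings of a SMILES string."""
    if len(smiles) < q:
        # Assumption: very short strings are treated as a single token.
        return Counter([smiles])
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_tanimoto(smiles_a, smiles_b, q=4):
    """Tanimoto coefficient over substring counts (simplified, not the published formula)."""
    a, b = lingos(smiles_a, q), lingos(smiles_b, q)
    shared = sum((a & b).values())  # element-wise minimum of counts
    total = sum((a | b).values())   # element-wise maximum of counts
    return shared / total if total else 0.0


# Butanol vs butylamine, written as plain SMILES strings.
similarity = lingo_tanimoto("CCCCO", "CCCCN")
```

Because it works directly on strings, no graph perception or fingerprint generation is needed, which is part of why such a simple method was a surprising baseline.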
baoilleach #ShefChem16 41% of mols in dataset with same Bemis-Murcko scaffold have same bioactivity; for MMPs, it's 86%, same mol formula, it's 83%
baoilleach #ShefChem16 R-groups (anti-scaffold); 50% of the time same bioactivity (keep the R groups and throw away the scaffolds)
baoilleach #ShefChem16 String edit distance, but applied to molecular graphs
WendyAnneW #ShefChem16 MMPs and mol formula are better than bemis Murcko for similarity of bio activity
baoilleach #ShefChem16 1965 Levenshtein no. of insertions, deletions and subs to transform one string into another - big in bioinformatics
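[Editor's sketch] The Levenshtein distance mentioned above is the standard dynamic-programming string edit distance. A minimal version is shown below; applying it to SMILES strings is purely illustrative here, since the talk applies the edit-distance idea to molecular graphs rather than to strings.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete char_a
                curr[j - 1] + 1,                  # insert char_b
                prev[j - 1] + substitution_cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


# Benzene vs pyridine as aromatic SMILES: one character differs.
distance = levenshtein("c1ccccc1", "c1ccncc1")
```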
baoilleach #ShefChem16 Graph Edit Distance is the same concept applied to graphs - Sanfeliu 1983
WendyAnneW #ShefChem16 advances in bioinformatics mean that GED matching is feasible
baoilleach #ShefChem16 Examples of edits: benzene->pyridine 1 mutation; benzene->cyclohexane change of hybridisation; R group additions
baoilleach #ShefChem16 Unfortunately, calculating GED is a generalisation of calculation of MCS which is NP-Hard
WendyAnneW #ShefChem16 calcn of GED is generalisation of MCS. Computationally expensive
baoilleach #ShefChem16 GED and MCS are related by deleting atoms from A to get the MCS, and then adding atoms to get B
baoilleach #ShefChem16 Video of SmallWorld in action https://www.youtube.com/watch?v=hZ4QyQSeSWg
nmsoftware #ShefChem16 Video of SmallWorld in action https://www.youtube.com/watch?v=hZ4QyQSeSWg
baoilleach #ShefChem16 Shows demo - can fine-tune edit distance searching - turn on/off mutations
dgelemi #ShefChem16 recorded video instead of live demo. Not taking risk with network
WendyAnneW #ShefChem16 demo shows that it IS fast enough - so now onto the math
baoilleach #ShefChem16 Possible to solve MCS problem in polynomial time using preprocessed graphs Messmer ref
WendyAnneW #ShefChem16 the secret is in the preprocessing. Loads of sub graphs found and stored
baoilleach #ShefChem16 Enumerate all subgraphs for a given molecule, sorted by the number of bonds. MCS becomes a trivial set intersection or lookup.
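[Editor's sketch] The "preprocess, then intersect" idea can be shown with a deliberately tiny analogue. The code below assumes molecules are unbranched chains written as strings of one-letter atom symbols, so that connected subgraphs are just contiguous runs and a run can be canonicalised by comparing it with its reverse. Real molecules need proper graph canonicalisation, and this is not how SmallWorld is implemented; the function names are hypothetical.

```python
def chain_subgraphs(chain):
    """All connected subgraphs of a linear chain, in a direction-independent form."""
    subgraphs = set()
    for start in range(len(chain)):
        for end in range(start + 1, len(chain) + 1):
            run = chain[start:end]
            # A chain read backwards is the same subgraph, so keep one canonical form.
            subgraphs.add(min(run, run[::-1]))
    return subgraphs


def chain_mcs(index_a, index_b):
    """Largest common subgraph, found by set intersection of precomputed indexes."""
    common = index_a & index_b
    return max(common, key=len) if common else ""


# Preprocessing step: enumerate subgraphs once per molecule.
ether_index = chain_subgraphs("CCOC")
amine_index = chain_subgraphs("CCNC")

# Query step: the MCS is a lookup in the intersection.
shared = chain_mcs(ether_index, amine_index)
```

The expensive work (enumeration) happens once per molecule at indexing time, so each comparison at query time reduces to a set operation.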
dgelemi #ShefChem16 thanks to math research and pre processing it is now possible to do fast search. #crossdiscipline
baoilleach #ShefChem16 A single molecule could have millions of subgraphs. Shows graph of molecule connected to its subgraphs, all the way down...
baoilleach #ShefChem16 Can find all the molecules one atom away, two atoms away, etc. very quickly - find nearest neighbours even in large databases
WendyAnneW #ShefChem16 his tree-like view of chemical space. In SmallWorld. One bond at base, structure with seven bonds higher up the tree
baoilleach #ShefChem16 The bigger the database (e.g. PubChem) the faster the search for nearest neighbours - counter intuitive
baoilleach #ShefChem16 Traditional MCS starts from one bond and grows; SmallWorld starts with N bonds and shrinks
miltonbrew #ShefChem16 Build reaction fps based on difference between fps for product and reactant combinations
baoilleach #ShefChem16 Grand unified theory of cheminf forces: mol identity, sub search, chem sim. All can be done with the SmallWorld graph database.
WendyAnneW #ShefChem16 fraction of practical chemical space. Is actually much smaller than theoretical space
baoilleach #ShefChem16 Different edge types in graph: terminal edges, link edges, ring gen/breaking
baoilleach #ShefChem16 "Dalke wormholes" - need to work down from A to MCS and then go up, not vice versa to avoid wormholes through unlikely structure
baoilleach #ShefChem16 Efficient subgraph enumeration - benzene has 7 anon subgraphs, atorvastatin has 3million
WendyAnneW #ShefChem16 efficiencies in MCES. Eg delete just one bond
baoilleach #ShefChem16 Roger's starting to go super fast! Slides going by in a whizz
baoilleach #ShefChem16 Today db has 19billion nodes and 76billion edges, 2.85 terabytes
WendyAnneW #ShefChem16 19 billion nodes. 76 billion edges In SmallWorld
baoilleach #ShefChem16 Connected vs disconnected MCS
WendyAnneW #ShefChem16 in SmallWorld index, that is
WendyAnneW #ShefChem16 representing activity cliffs becomes intuitive
baoilleach #ShefChem16 Representing activity cliffs - intuitive in terms of graph edit distance. Any molecule more similar to A than to B...
baoilleach #ShefChem16 Summary: sublinear behaviour in searching, blessing of dimensionality, continual advances in hardware make this more attractive
baoilleach #ShefChem16 Question: have you considered applying weights to particular edit distances?
baoilleach #ShefChem16 Answer: can be done, e.g. PAM matrices in biology, difficult to come up with consistent weights
WendyAnneW #ShefChem16 in memoriam Andrew Grant
baoilleach #ShefChem16 Question: possible to work out GED between 2 molecules directly? Answer: yes - but not practical for db search
WendyAnneW #ShefChem16 next up Richard Hall of Astex on their fragment network
baoilleach #ShefChem16 Richard Hall Astex: The Astex Fragment Network
dgelemi #ShefChem16 need more after talk. Slide on FP issues with protein and peptide but no example during the talk #shortoftime
baoilleach #ShefChem16 More on graph networks and recommendation systems - to find near nbrs for hit validation
WendyAnneW #ShefChem16 the Astex fragment network builds heavily on sayle's work
baoilleach #ShefChem16 Networks used to manage Big Data - think of PageRank - we have built a graph network for chemistry recommendations
dgelemi #ShefChem16 astex talk on their approach to similarity graph search. Comparison to smallworld from @nmsoftware is in the abstract
baoilleach #ShefChem16 "Fragments like this may also be of interest to you"
WendyAnneW #ShefChem16 Astex has graph network for chemistry recommendations
baoilleach #ShefChem16 Fragment hit validation - do I really need the phenyl, can I grow the fragment in direction X, can I substitute on the ring?
baoilleach #ShefChem16 Many of these are not matched pairs.
DrJoshuaBo Roger Sayle presenting SmallWorld long in the making. Another tribute to the genius of Andy Grant #ShefChem16
baoilleach #ShefChem16 Path-based FPs don't work well in the fragment space. Substructure searching difficult to find diff rings, longer linkers, O/M/P
baoilleach #ShefChem16 (ortho/meta/para substituted rings)
dgelemi #ShefChem16 traditional fingerprint similarity measures don't work well on fragments. Similar fragments end up in the noise
WendyAnneW #ShefChem16 chemical FPs do not work well for fragments. Substructure search may be laborious. Network approach looks appealing
baoilleach #ShefChem16 Generate a list of nodes and edges by systematically removing (not atoms) but rings, linkers and functional groups
baoilleach #ShefChem16 Shows a graph where fragments are connected to subfragments
DrJoshuaBo Remember Andy asking me to write depiction webservice for this. I added loads of features and of course it was too slow! #ShefChem16
WendyAnneW #ShefChem16 build network by removing rings linkers funct groups systematically
WendyAnneW As network grows, we begin to connect similar cmpds #ShefChem16
baoilleach #ShefChem16 Network built up to include 4.5 million cmpds - nodes are labelled with source/supplier info
WendyAnneW #ShefChem16 search network by looking for friends of friends
baoilleach #ShefChem16 To search, we can easily find nbrs which are additions/deletions to a substructure, find nbrs of nbrs, search time 0.1-0.2s
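[Editor's sketch] A "neighbours of neighbours" search is a breadth-first traversal limited to a fixed number of edges. The sketch below is an assumption-laden illustration in plain Python: the toy fragment graph and the function name are made up, and it does not reproduce Astex's network or database queries.

```python
from collections import deque


def neighbours_within(graph, start, max_edges=2):
    """Map each node reachable within `max_edges` hops to its hop count."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_edges:
            continue  # do not expand beyond the hop limit
        for nbr in graph[node]:
            if nbr not in distances:
                distances[nbr] = distances[node] + 1
                queue.append(nbr)
    del distances[start]  # the query itself is not a result
    return distances


# Toy fragment graph (made up): an edge means "differs by one group".
fragment_graph = {
    "phenol": {"benzene"},
    "benzene": {"phenol", "toluene"},
    "toluene": {"benzene", "xylene"},
    "xylene": {"toluene"},
}

hits = neighbours_within(fragment_graph, "phenol", max_edges=2)
```

Here toluene is found from phenol only via benzene, the shared subfragment, which is the "friends of friends" behaviour described in the talk.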
baoilleach #ShefChem16 Up to 16 heavy atoms (fragments) - Use neo4j 23M nodes, 107M edges - cypher query language
WendyAnneW #ShefChem16 system called neo4j.
baoilleach #ShefChem16 Grouping of results into sets, e.g. para-sub of upper ring, upper ring replacement, hydroxy group replacement
nathanbroo Roger from @nmsoftware showing enumeration of alkanes that is similar to Arthur Cayley's kenograms. #ShefChem16 pic.twitter.com/jxPogMOsj5
baoilleach #ShefChem16 We sort the results by no of registry/ChEMBL observations - number of times particular subs occur - more likely subs shown first
baoilleach #ShefChem16 Peter Ertl JS viewer used again (also by Roger Sayle)
baoilleach #ShefChem16 Web-based simple interface - each result annotated with source information
baoilleach #ShefChem16 Chemists love it - can find 95% of the interesting cmpds with 5% of the effort
baoilleach #ShefChem16 Comparison to other sim measures (shoutout to #rdkit)
baoilleach #ShefChem16 Gotchas: some changes are not as small as you might think, e.g. changing two OH to two OMe's
WendyAnneW #ShefChem16 interesting comparison. Some frag net cmpds would be hidden in noise. A few outliers found by both methods. Many don't overlap
baoilleach #ShefChem16 The search results are only for 2 edge distance (i.e. nbrs of nbrs)
baoilleach #ShefChem16 Can't jump directly from ring to ring
WendyAnneW #ShefChem16 chemists and modellers at Astex love it
baoilleach #ShefChem16 Retrieves cmpds in line with med chem intuition
baoilleach #ShefChem16 Future work: add additional edge types (e.g. tautomer, similar rings), add "like" functionality, more comparisons to fps
baoilleach #ShefChem16 Analyse connectivity - might be a good source of variation (synthetically mutable)
baoilleach #ShefChem16 (previous tweet is a reference to highly connected molecules in the database)
baoilleach #ShefChem16 Landrum points out an atom-atom map sim method more appropriate for fragments
baoilleach #ShefChem16 Andrew Dalke on Calvin Mooers and the early history of chemical information
baoilleach #ShefChem16 Started looking into the history of how to do substructure screens
dgelemi #ShefChem16 next talk from Andrew Dalke on Calvin Mooers and history of chemical information #historyLecture
baoilleach #ShefChem16 1940s "there was too much data" was the complaint - people wanted to do complex queries
baoilleach #ShefChem16 People used punched cards (this is before computers were really around)
dgelemi #ShefChem16 pictures of punched cards. Back to a time that I don't know #TooYoung
baoilleach #ShefChem16 Two types: edge-notched for manual systems and interior-notched for sorting machines - use a knitting needle to search notches
WendyAnneW #ShefChem16 I remember both types of punch cards but I never used edge-notched ones
baoilleach #ShefChem16 Lots of punch card coding systems were developed - not a binary search system, was more complex than that
baoilleach #ShefChem16 C&EN 1945: Frear stat investigation of correlation between structure and toxicity - Frear Codes - substructure keys
baoilleach #ShefChem16 1945 "As we may think" - Vannevar Bush proposed a memex machine to search information - big inspiration at the time
dgelemi #ShefChem16 big data in 1945 with 1M records in CAS. Start thinking on how to retrieve data and index it
baoilleach #ShefChem16 Over 1 million records at CAS at the time - Chemistry was Big Data
WendyAnneW #ShefChem16 first time I have heard of frear code but everyone knows about memex. Only 1 million records in CAS then
baoilleach #ShefChem16 Calvin Mooers invented information retrieval at ACS conference 1950 (or st like this)
baoilleach #ShefChem16 Mooers coined the term descriptor. Is there a clever coding system for sparse data?
nathanbroo Mooers coined the term "descriptors" #ShefChem16
WendyAnneW #ShefChem16 mooers coined the term descriptor
baoilleach #ShefChem16 Used to call false positives "false drops" due to physical dropping of cards from the needles
baoilleach #ShefChem16 More and more complicated codes to support card searching of more than 1 descriptor and supporting "AND" queries
WendyAnneW #ShefChem16 mooers and superimposition of random codes
WendyAnneW #ShefChem16 Calvin Mooers that is...
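[Editor's sketch] Superimposed random coding can be illustrated as follows: each descriptor is assigned a few pseudo-random notch positions, a card carries the union of the notches of all its descriptors, and a card "drops" for a query when it has every notch the query requires. The card size (40 positions), the 4 notches per descriptor, and the example descriptors are arbitrary assumptions for illustration, not Mooers' actual parameters.

```python
import random

N_POSITIONS = 40            # assumed number of notch positions on a card
NOTCHES_PER_DESCRIPTOR = 4  # assumed number of notches per descriptor


def descriptor_code(descriptor):
    """Pseudo-random notch positions for a descriptor, reproducible per descriptor."""
    rng = random.Random(descriptor)  # seeding with the string keeps codes stable
    return frozenset(rng.sample(range(N_POSITIONS), NOTCHES_PER_DESCRIPTOR))


def card_code(descriptors):
    """Superimpose (OR together) the codes of all descriptors on one card."""
    code = set()
    for descriptor in descriptors:
        code |= descriptor_code(descriptor)
    return frozenset(code)


def card_drops(card, query_descriptors):
    """A card drops if it carries every notch required by the query (AND query)."""
    return card_code(query_descriptors) <= card


card = card_code(["ketone", "chlorine", "six-membered ring"])
found_single = card_drops(card, ["ketone"])
found_and = card_drops(card, ["ketone", "chlorine"])
```

A card indexed with a descriptor always drops for it, so nothing relevant is missed. The price is that notches from several descriptors can jointly cover the code of a descriptor the card never had, which produces a false drop.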
baoilleach #ShefChem16 Zatocoding - Sharp and Dohme were the first customers
dgelemi #ShefChem16 description on problem with punched card, needles and collisions. Created way to encode descriptors then moves to chemical
nathanbroo Chemical Zatocoding by Mooers is the first time descriptors were used. #ShefChem16
baoilleach #ShefChem16 Indexing limitations - human error in indexing - a non-index search was very slow, e.g. 3 months search of CAS
nathanbroo Information Retrieval has its roots in chemistry! #ShefChem16
WendyAnneW #ShefChem16 I just learned something from Noel. I never knew why we used the term "false drop"
baoilleach #ShefChem16 Zatopleg coding - awesome name
WendyAnneW #ShefChem16 I am beginning to feel young! I never heard of zatopleg coding
baoilleach #ShefChem16 Substructure search described as a constraint satisfaction problem suitable for early computers (UNIVAC)
baoilleach #ShefChem16 Mooers strong armed people over the patent and was challenged with an injunction - took years of lawsuits to resolve
dgelemi #ShefChem16 zatopleg coding. Not in Wikipedia! Best to go to Andrew Dalke website http://www.dalkescientific.com/writings/diary/archive/2014/06/19/Calvin_Mooers.html
baoilleach #ShefChem16 Mooers started working at National Bureau of Standards. Came up with N-tuples (early Daylight fps?) of atoms.
baoilleach #ShefChem16 1960s: Can finally buy a computer!
WendyAnneW #ShefChem16 but surely we have all heard of Ray and kirsch and their connection table. They referenced mooers
baoilleach #ShefChem16 Ernst Meyer at BASF implemented Mooers' constraint satisfaction sub search
baoilleach #ShefChem16 CIDS project by the US army
WendyAnneW #ShefChem16 Meyer at BASF computerised mooers work in GREMAS in the 60s
WendyAnneW #ShefChem16 mooers is referenced in the Morgan paper too
baoilleach #ShefChem16 Canonicalisation. Gluck published an algorithm. Mooers is mentioned by Morgan (1965), and may have suggested the algorithm.
baoilleach #ShefChem16 Dysonian ciphers, ordering zermelo - more awesome names
dgelemi #ShefChem16 chemical space can be defined by graphs, in 1960's
WendyAnneW #ShefChem16 by 1973-1975 references to mooers disappear
WendyAnneW #ShefChem16 lefkovitz cited mooers
baoilleach #ShefChem16 Mooers' influence lasted till 70s - e.g. Lefkovitz (1975) "the large db file structure dilemma"
dgelemi @WendyAnneWarr #ShefChem16 but Sheffield was on the slide for the 70's. Difficult to tweet and follow this talk. Lots of unknown stuff
baoilleach #ShefChem16 Feldman and Hodes (1975) - a superimposed code for substructure searching
WendyAnneW #ShefChem16 so did Feldman and Hodes. I visited Feldman once, many years ago
dgelemi #ShefChem16 and here is the daylight slide https://twitter.com/baoilleach/status/750286354697838592
baoilleach #ShefChem16 summary of zatocoding problems, and now the impact on cheminf
WendyAnneW #ShefChem16 zatocoding does help explain daylight fp work
baoilleach #ShefChem16 Mooers - "Father of the connection table"? Yes, though reinvented multiple times
baoilleach #ShefChem16 The biologists liked the encoding better than the chemists
baoilleach #ShefChem16 Thanks to the Charles Babbage Institute - Minnesota
dgelemi #ShefChem16 makes you wonder what those scientists in the 40s-60s would have done with modern computers! #foundation
dgelemi #ShefChem16 lunch breaks down poster session. May need it as next talk has integral equation theory in the title
rguha @dgelemi thankful for the lunch break. So I catch up on the deluge that is #ShefChem16
dgelemi #ShefChem16 MMP and pharmacophores. But different definition of pharmacophores from @Discngine graphs MMPs pic.twitter.com/KaNeLrc50t
baoilleach #ShefChem16 David Palmer on "Is there a role for the integral eqn theory of mol liquids in chem informatics?"
baoilleach #ShefChem16 Go Dave!
baoilleach #ShefChem16 LogP/LogD etc all rely on solvent effects - can we calc these a bit more rigorously and quickly?
dgelemi @rguha and we're back with RISM-MOL-INF (obvious) #ShefChem16
baoilleach #ShefChem16 Hydration free E - how much the mol wants to be in gas or soln phase
WendyAnneW #ShefChem16 David Palmer univ Strathclyde integral equation theory of molecular liquids is a statistical mechanics method. Not my field
baoilleach #ShefChem16 pKa, solub, lipophilicity, complex formation all linked to hydration free E
baoilleach #ShefChem16 Two main methods: explicit solvent (MD, MC - rigorous but slow) versus implicit (solvent distrib fns - continuum electrostats)
baoilleach #ShefChem16 Another implicit method - integral eqn theory of liquids - fairly quick - more info than continuum models
baoilleach #ShefChem16 "Fairly shocking slide to see shortly after lunch" - the equation for the 3D RISM method
WendyAnneW Yes all those 3D RISM equations are a bit of a blow after lunch #ShefChem16
baoilleach @pwk2013 #ShefChem16 Yes - through the MCS.
baoilleach #ShefChem16 Around since the 1930s, but not used much because doesn't give great results.
baoilleach #ShefChem16 But Palmer et al 2010 have shown that including the partial molar volume makes it work
dr_greg_la @WendyAnneWarr yeah, it's not that much math, but still brutal after the food. #ShefChem16
baoilleach #ShefChem16 Extrapolates from dataset of 185 molecules to drug-like molecules - very robust
WendyAnneW Jctc 2014, 10, 934 #ShefChem16
baoilleach #ShefChem16 Original model had 2 regression params, cavity correction since then (1 param), ensemble correction (0 param)
WendyAnneW J phys chem let 2014 5 1935 #ShefChem16
dgelemi #ShefChem16 improvement from the 1st equation by different groups and a new one in press. Longer and longer equations to correct for errors
baoilleach #ShefChem16 He has developed a PC+ Correction (JChemPhys2015 in press) using the knowledge of the others
WendyAnneW #ShefChem16 his own work is j phys chem in press
baoilleach #ShefChem16 No empirical params so doesn't just work for neutral molecules in standard conditions
WendyAnneW J chem phys 2015, 142, 091105 #ShefChem16
baoilleach #ShefChem16 Works for ionized small organics - the MD calcs take 1000 times longer
WendyAnneW #ShefChem16 j phys chem b 2016, 120, 975
rguha fears of overfitting? #ShefChem16 https://twitter.com/dgelemi/status/750316190946525184
baoilleach #ShefChem16 Recently been working on salting in/out - can predict Setschenow's constants very well
baoilleach #ShefChem16 Non-polar solvents can be modelled as Lennard-Jones spheres.. More difficult to solve the equations due to more atoms.
baoilleach #ShefChem16 Using a coarse-grained model to handle this. Well-studied problem in chemical engineering so using info from that.
baoilleach #ShefChem16 Doesn't work so well for non-spherical solutes, e.g. long linear alkanes
WendyAnneW J phys chem b in press on nonpolar solvents #ShefChem16
baoilleach #ShefChem16 Prediction of solubility from mol simulation - try to do it from first principles - still a difficult problem
baoilleach #ShefChem16 Not yet accurate enough for practical applications. Need to move to higher levels of theory.
baoilleach #ShefChem16 Want a fast general method to predict a number of different outputs - should be fast enough for screening databases
baoilleach #ShefChem16 If we can compute the solvent density very quickly how can this be converted to useful descriptors?
baoilleach #ShefChem16 Showing the RISM functional from 0 -> 20 Ang. Integral is the hydration free energy. Shape is specific to the specific molecule.
baoilleach #ShefChem16 Put them into machine-learning algorithms, e.g. random forest. Mol Pharm 2015 Very accurate predictions
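The "integral is the hydration free energy" point can be illustrated with a minimal sketch. The integrand below is a made-up decaying function, not real RISM output, and the names `trapezoid`, `r` and `integrand` are hypothetical; the sketch only shows numerical integration of a radial profile sampled from 0 to 20 Angstrom.

```python
import math

def trapezoid(xs, ys):
    """Integrate sampled values ys over grid xs with the trapezoidal rule."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )

# Hypothetical radial integrand sampled every 0.1 Angstrom from 0 to 20 Angstrom.
# This is an assumed toy function, not output from a RISM calculation.
r = [i * 0.1 for i in range(201)]
integrand = [-math.exp(-x) for x in r]

# The area under the profile plays the role of the hydration free energy.
hydration_free_energy = trapezoid(r, integrand)
```

The sampled profile itself (not just its integral) is what would be passed on as a descriptor vector to a machine-learning model such as a random forest.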
baoilleach #ShefChem16 One conformer used, but what about averaging over many? Shoutout for confab #openbabel
baoilleach #ShefChem16 Included SAMPL data + Mobley et al - the multiconf method works slightly better (or is it worse?)
baoilleach #ShefChem16 Caco-2 permeability - Mol Pharm 2015 - we do okay - as good or better than others
baoilleach #ShefChem16 Using 1D RISM - could move to 3D RISM - should improve things but slower
baoilleach #ShefChem16 Review article in Chem Rev just submitted
baoilleach #ShefChem16 Working on intrinsic aqueous solubility - need to get that right first. There's a systematic error - from the xtal term.
WendyAnneW #ShefChem16 David Palmer won a CSA Trust grant, I remember
baoilleach #ShefChem16 ....but very little data available on sublimation thermodynamics.
baoilleach #ShefChem16 The 1st solvation shell tends to be the best predictor in the models.
baoilleach #ShefChem16 Sereina Riniker Teaching distance geometry about exptal torsion prefs
baoilleach @pwk2013 #ShefChem16 Sorry - am dropping the ball on answering - will have to catch up later....
WendyAnneW #ShefChem16 Sereina Riniker on conformer generator
dgelemi #ShefChem16 next is Sereina Riniker on her work on 3D conformation with @dr_greg_landrum . Method already in @RDKit_org
baoilleach #ShefChem16 Conformer generation - many applications - want diverse structures - how to validate?
baoilleach #ShefChem16 Methods: Systematic search (ltd to few rotatable bonds) vs stochastic search (can handle highly flexible mols but may miss stuff)
baoilleach #ShefChem16 Knowledge based methods, distance-geometry based
baoilleach #ShefChem16 Distance geom - have a distance bounds matrix based on connectivity and try to embed into 3D Blaney/Dixon/RevCompChem1994
baoilleach #ShefChem16 How it's done. 1-2 distances, 1-3 distances, 1-4, other distances. #rdkit algorithm explained step by step.
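As a rough illustration of how 1-3 entries in a distance bounds matrix can follow from 1-2 distances, here is a minimal sketch using the law of cosines. It is not the RDKit implementation; the function names, the tolerance and the example bond length and angle are assumptions chosen for illustration.

```python
import math

def distance_13(d12, d23, angle_deg):
    """1-3 distance from two bond lengths and the bond angle (law of cosines)."""
    theta = math.radians(angle_deg)
    return math.sqrt(d12 ** 2 + d23 ** 2 - 2.0 * d12 * d23 * math.cos(theta))

def bounds(distance, tolerance=0.01):
    """Lower and upper bound around an ideal distance (tolerance is an assumed value)."""
    return (distance - tolerance, distance + tolerance)

# Assumed example: two C-C single bonds (1.54 Angstrom) at a tetrahedral angle.
d13 = distance_13(1.54, 1.54, 109.5)
lower, upper = bounds(d13)
```

Pairs further apart than 1-4 would get looser bounds, which is why the embedded coordinates can need cleaning up afterwards.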
baoilleach #ShefChem16 v. fast - only needs 2D structure. Disadvantage - can lead to "not so good" looking conformations
baoilleach #ShefChem16 Normally a forcefield opt is done. Our idea is to improve this by using exptal torsion prefs.
WendyAnneW #ShefChem16 distance geometry DG fast but embedding can lead to some not-pretty conformers
baoilleach #ShefChem16 Rarey published paper with 392 SMARTS with torsion prefs.
baoilleach #ShefChem16 In the end, we regenerated everything from the CSD data. Still not enough. Added "basic knowledge" terms for minimization step.
WendyAnneW #ShefChem16 JCIM 2015, 55, 2562 Riniker and landrum
baoilleach #ShefChem16 ETKDG: exptal torsions + knowledge distance geometry
WendyAnneW #ShefChem16 it is called ETKDG
baoilleach #ShefChem16 Showing examples of fitted torsion potentials, and comparing to Rarey paper
baoilleach #ShefChem16 Datasets: CSD dataset (Hawkins 2010) and PDB dataset, and some more from CSD
baoilleach #ShefChem16 Looked at RMSD to xtal structure for best conformer (symmetry corrected)
baoilleach #ShefChem16 Riniker and Landrum J Cheminf 2015
WendyAnneW ETKDG outperforms DG both by rmsd and tfd analysis #ShefChem16
baoilleach #ShefChem16 Within 1Ang can reproduce 80% of the CSD molecules - comparison to Confect
WendyAnneW #ShefChem16 also compared it with CONFECT
WendyAnneW #ShefChem16 the analysis is in the Riniker landrum paper in JCIM 2015
baoilleach #ShefChem16 Examples where basic DG beats ETKDG and v.v.
baoilleach #ShefChem16 ETKDG is 2-3 times slower than basic DG, but double the speed of DG+UFF opt,
WendyAnneW #ShefChem16 of course it takes 2 or 3 times as long to carry out compared with DG
baoilleach #ShefChem16 Further devel: chirality issues in DG fixed, improved planarity for aromatic rings and sp2-atoms
WendyAnneW #ShefChem16 since publication chirality issues fixed
baoilleach #ShefChem16 ETKDG v 2 : updated torsion SMARTS pattern Guba et al JCIM 2016, now 408 patterns, 364 in original
WendyAnneW #ShefChem16 JCIM 2016 56 1
baoilleach #ShefChem16 ETKDG v 3 : additional torsion patterns for aliphatic ring bonds, currently DG generated
WendyAnneW #ShefChem16 next version will include aliphatic rings torsion patterns
baoilleach #ShefChem16 (aside: nice to see academics developing software that becomes part of a toolkit...just like Confab!)
baoilleach #ShefChem16 Summary: DG+exptal torsion prefs is a good way to create conformers
baoilleach #ShefChem16 Torsion deviation fingerprint also used to assess performance
baoilleach #ShefChem16 Rarey makes the point that handling intermolecular hydrogen bonds might actually be a bad idea for biological uses
baoilleach #ShefChem16 David Schaller Target-pairs for synergistic bio activity
dr_greg_la @WendyAnneWarr But! Similar accuracy with 1/4 the number of conformers #ShefChem16
WendyAnneW #ShefChem16 David schaller trying too identify drug-target pairs
baoilleach #ShefChem16 For each target we set a different threshold automatically based on the data
WendyAnneW #ShefChem16 try again... Trying to identify drug-target pairs by systematic data mining
baoilleach #ShefChem16 Need to separate out data on agonist, antagonist
dgelemi #ShefChem16 next talk is David Schaller on finding multi targets that can be used for a disease. @ChEMBL mentioned in first few slides
baoilleach #ShefChem16 Working on obesity - causes diabetes, cardio diseases and cancer
baoilleach #ShefChem16 A set of different targets described: PPARa, LEPR, GHSR, 41 targets in total with act in ChEMBL
baoilleach #ShefChem16 In total had 57k activities - first challenge is to filter with a cutoff
dgelemi #ShefChem16 workflow in @knime around filtering chembl. Need to consider different activity threshold per target
WendyAnneW #ShefChem16 41 targets with activity data in ChEMBL. First challenge is activity filtering. Problem if you set a threshold.
baoilleach #ShefChem16 Set the threshold 3 log units higher than the most active and use that as a cutoff
WendyAnneW #ShefChem16 diff thresh
WendyAnneW #ShefChem16 diff threshold needed for each target...
baoilleach #ShefChem16 Sometimes this is too strict, or too high. So needed to set max and min values.
WendyAnneW #ShefChem16 so activity threshold borders done for 41 targets
baoilleach #ShefChem16 Less well discovered targets tended to have higher thresholds, of course.
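The per-target threshold described above can be sketched as follows. This is an assumed reading of the tweets, not the speaker's KNIME workflow: activities are taken as pActivity values (-log10 of molar activity), the cutoff sits 3 log units below the most active compound, and the clamp values `lowest` and `highest` are hypothetical.

```python
def activity_cutoff(p_activities, window=3.0, lowest=5.0, highest=7.0):
    """Cutoff 3 log units from the best compound, clamped to assumed min/max values."""
    best = max(p_activities)
    return min(max(best - window, lowest), highest)

def filter_actives(p_activities, **kwargs):
    """Keep only the measurements at or above the per-target cutoff."""
    cutoff = activity_cutoff(p_activities, **kwargs)
    return [p for p in p_activities if p >= cutoff]

# Hypothetical pActivity values for one target.
kept = filter_actives([9.0, 6.5, 5.2])
```

A sparsely explored target with a weak best compound ends up at the `lowest` clamp, which matches the remark that such targets tend to get more permissive thresholds.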
WendyAnneW #ShefChem16 ziprasidone target families discussed
baoilleach #ShefChem16 Target families; merged all subtypes into one, e.g. all 5-HT, dopamine receptors.
WendyAnneW #ShefChem16 and now to chemical similarity
WendyAnneW #ShefChem16 21 targets 14 families and ?
baoilleach #ShefChem16 Down to 235 compounds now that show activities over different target families.
dgelemi #ShefChem16 keeping only if similarity between compounds is high, FCFP>0.7. Only 235 activities left
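The similarity filter can be sketched with a Tanimoto coefficient on sets of feature identifiers. Real FCFP fingerprints need a cheminformatics toolkit; the integer feature sets below are hypothetical stand-ins, and only the 0.7 threshold comes from the tweet.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two collections of feature identifiers."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical feature sets standing in for FCFP fingerprints of two compounds.
compound_1 = {1, 2, 3, 4}
compound_2 = {1, 2, 3, 4, 5}

is_similar = tanimoto(compound_1, compound_2) > 0.7
```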
baoilleach #ShefChem16 Data divided into agonist/partial agonist vs antagonist/inv agonist/inhib - must be done manually - not in ChEMBL typically
WendyAnneW #ShefChem16 getting the activity type, e.g. agonist, is important and that is not in ChEMBL. So do manually
baoilleach #ShefChem16 Want to find synergistic effects, two targets with one compound
rguha a somewhat different usage of ‘synergy’. Is this actually looking for dual inhibitors? #ShefChem16 https://twitter.com/baoilleach/status/750330245912363009
WendyAnneW #ShefChem16 h3r and mchr1 receptors
baoilleach #ShefChem16 Need to hit two targets to avoid cancelling out inhibitory effect of receptor
baoilleach #ShefChem16 Manual validation - predictions - 3 hits < 1 micromolar
WendyAnneW #ShefChem16 known h3r antag plus some bought in similar cmpds were shape matched - work in progress
baoilleach #ShefChem16 ..brain starting to drift from moorings...my fault, not speaker
WendyAnneW #ShefChem16 11 target pairs found inc one confirmed
baoilleach #ShefChem16 All done with Knime
baoilleach #ShefChem16 Summary: found potential target-pairs for appetite inhibition, and have found some potential hits
WendyAnneW #ShefChem16 I omitted some of the similarity matrices and manual validation. Hard to listen well and type well at the same time
georgeisyo Data mining for dual pharmacology using @ChEMBL data and @knime. Great logo too. #ShefChem16 pic.twitter.com/VU3kihCCBp
WendyAnneW #ShefChem16 wonder if there will be more of those yummy scones with jam and cream for tea.
jezwicker Mmm, donuts, best coffee break ever #ShefChem16 pic.twitter.com/FbMco6wmD8
baoilleach @dgelemi #ShefChem16 @rguha It was that slide with the picture of the axons.
WendyAnneW #ShefChem16 I preferred the scones to the doughnuts
baoilleach #ShefChem16 Sebastian Salentin PLIP Fully auto protein-ligand interaction profiler
WendyAnneW #ShefChem16 https://GitHub
baoilleach #ShefChem16 A tool to detect relevant non-covalent interactions from PDB structures
baoilleach #ShefChem16 Published in NAR web server issue
WendyAnneW #ShefChem16 oops! http://GitHub.com/ssalentin/plip 
baoilleach #ShefChem16 Overview: lists an overview
baoilleach #ShefChem16 ligand-binding depends on specific patterns of non-covalent interactions
baoilleach #ShefChem16 Diff types: polar interactions (H bonds, salt bridges, water bridges, halogen bonds), aromatic interactions (pi cation+stacking)
baoilleach #ShefChem16 People say we don't need to consider halogen bonding and other less frequent ones but....
dgelemi #ShefChem16 multiple types of interactions and even if some are not frequent, they can be important.
baoilleach #ShefChem16 ...these are sometimes v important for particular targets, e.g. ABC-transporter (pi-cation interactions in several directions)
baoilleach #ShefChem16 Why useful to know? For viz, characterisation (e.g. design, docking/md - can postprocess), comparison (repositioning drugs)
WendyAnneW #ShefChem16 can be used in visualisation characterisation and comparison of interactions
baoilleach #ShefChem16 We are interested in drug repositioning - using microarray data, polypharmacology networks
baoilleach #ShefChem16 Looking at pairs of targets with similar binding sites - look for similar interaction patterns for drug repositioning
WendyAnneW #ShefChem16 he uses plip in drug repositioning
baoilleach #ShefChem16 Web server and open source command line tool: fast, few seconds per structure, easy-to-use one-click, 8 interaction types
baoilleach #ShefChem16 ...soon, will have REST interface and protein-protein interactions
baoilleach #ShefChem16 Shows example image created automatically, which is publication-ready
WendyAnneW #ShefChem16 fast. No need to prepare pdb file. Easy to use. Publication ready images. 8 interaction types
WendyAnneW #ShefChem16 soon they will offer REST service and will handle protein protein interactions
baoilleach #ShefChem16 Example shown of DNA/RNA interaction diagram. Takes care of the assembly of the whole strand without any input from the user.
baoilleach #ShefChem16 Can provide PDB code, 3D viz in JSMOL, can download PyMol session file, all results also available as XML/RST (human readable)
dgelemi #ShefChem16 interface a bit simpler than CREDO (http://marid.bioc.cam.ac.uk/credo  ), and user can upload their own file
baoilleach #ShefChem16 Distance and angular constraints applied for different interactions to help identify interactions
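A geometric interaction test of this kind can be sketched for a hydrogen bond. This is not PLIP's code or API; the function names and the two cutoffs (`max_dist`, `min_angle`) are placeholders chosen for illustration.

```python
import math

def angle_deg(a, b, c):
    """Angle at point b (in degrees) formed by points a and c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))

def is_hbond(donor, hydrogen, acceptor, max_dist=4.1, min_angle=100.0):
    """Distance plus angle check; both cutoffs are assumed, tunable values."""
    close_enough = math.dist(donor, acceptor) <= max_dist
    well_aligned = angle_deg(donor, hydrogen, acceptor) >= min_angle
    return close_enough and well_aligned
```

Loosening `max_dist` or `min_angle` is the sort of parameter tuning one might want for low-resolution structures.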
baoilleach #ShefChem16 Can automate over large numbers of proteins and tune params, e.g. for low resolution
baoilleach #ShefChem16 Application: docking postprocessing
baoilleach #ShefChem16 Interaction profiles help identify false positives in docking - let's say you want halogen bonds in your ligand
WendyAnneW #ShefChem16 application. Helps identify false positives in docking. (Been published)
WendyAnneW #ShefChem16 another app is binding mode matrices - work in progress
baoilleach #ShefChem16 Working on tools to handle an ensemble of complexes - i.e. different ligands to same targets - and visualise diffs in binding modes
baoilleach #ShefChem16 Repositioning: search for similar interaction features + binding sites
dgelemi #ShefChem16 using interactions to predict new targets with similarity by interaction features
WendyAnneW #ShefChem16 target prediction PLoS one 2013 discovering similarity
ZINClick #ShefChem16 Just used PLIP on my phone, very easy and fast! (pdb id: 2gvj) @ssalentin pic.twitter.com/iYw9A8P9bW
baoilleach #ShefChem16 Virtual screening of 125K PDB complexes based on interaction profile (I think) - followed by docking of 200 candidates
WendyAnneW #ShefChem16 hsp27 pattern matching pipeline shown. 2 out of 6 hits validated
baoilleach #ShefChem16 ...docking into Hsp27 and TK - exptal validation - 2/6 validated hits in functional assay and cells
baoilleach #ShefChem16 Interaction fingerprinting: ligand-centric + protein-centric -> fusion concept (multiple proteins - multiple ligands)
WendyAnneW #ShefChem16 interaction fingerprinting is future work
baoilleach #ShefChem16 Plip web service URL too long to type - Google it
baoilleach #ShefChem16 Available on PyPi and Github Apache v2.0
baoilleach #ShefChem16 PLIP uses #openbabel (yay!)
WendyAnneW #ShefChem16 I have tweeted the URL
baoilleach #ShefChem16 Sergio Ruiz Dynamic undocking and the quasi-bound state as tools for drug design
baoilleach #ShefChem16 Barcelona group working on MD, chemoinformatics, and virtual screening, target validation, new chemical probes, non-standard MMoA
WendyAnneW #ShefChem16 ruiz-carmona on dynamic undocking DUck and quasi-bound state WQB
baoilleach #ShefChem16 rDock is the docking program developed in the group
WendyAnneW #ShefChem16 rDock published in PLoS comput. Biol
baoilleach #ShefChem16 Speed vs accuracy tradeoff in molecular docking
baoilleach #ShefChem16 Goal to develop novel tools complementary to existing methods to improve their performance
baoilleach #ShefChem16 Quasi-bound state - what is it? Delta G binding - diff between bound+unbound.
WendyAnneW #ShefChem16 manuscript has been submitted on DUck and WQB
baoilleach #ShefChem16 Forget about energy of binding. Instead, diff ligands have diff E profiles.
dgelemi #ShefChem16 presentation work from Ruiz-Carmona has been submitted and under review.
baoilleach #ShefChem16 Hypothesis: active ligands have narrow delta G minimum, i.e. structurally stable. Test whether resistant to small perturbation.
baoilleach #ShefChem16 Quasi-bound: point along the dissociation curve where it has just broken the most important contacts
WendyAnneW #ShefChem16 h bond interaction potentials have deep and narrow minima
baoilleach #ShefChem16 H-bond interaction potentials are deep and narrow. Water-shielded H bonds present steep barriers (strongly resist breaking)
baoilleach #ShefChem16 Most proteins contain an essential H bond, fulfilled by all ligands, even the smallest ligands form at least one H-bond
WendyAnneW #ShefChem16 water shielded h bonds present steep barriers
baoilleach #ShefChem16 We think that stability comes from H bonds and we use them to calculate stability.
baoilleach #ShefChem16 Quasi bound: ...where the ligand has just broken the key essential (predefined) H bond
WendyAnneW #ShefChem16 most Prots contain an h bond. And one more reason...
baoilleach #ShefChem16 Dynamic undocking - DUck - particular implementation of steered MD - measure work to reach quasi-bound state
WendyAnneW #ShefChem16 DUck is implementation of steered molecular dynamics SMD
WendyAnneW #ShefChem16 work to reach quasi-bound state = WQB
baoilleach #ShefChem16 When doing the MD, only keep a protein portion, local environment, explicit water solvation, equilibrate (1ns): 20min (GPU)
baoilleach #ShefChem16 SMD from 2.5A -> 5.0A in 0.5ns: 10min, work is the maximum along the curve
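Taking "work is the maximum along the curve" literally, the work to reach the quasi-bound state (WQB) can be sketched as the peak of a work profile over the pulled distance. The profile values and the function name below are hypothetical; only the 2.5 to 5.0 Angstrom window comes from the tweet.

```python
def work_to_quasi_bound(distances, work, start=2.5, end=5.0):
    """Maximum of the work profile inside the steered distance window."""
    window = [w for d, w in zip(distances, work) if start <= d <= end]
    if not window:
        raise ValueError("no samples inside the steering window")
    return max(window)

# Hypothetical work profile (arbitrary energy units) from one steered MD run.
distances = [2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
work = [0.0, 2.1, 5.4, 4.8, 3.9, 3.5]

w_qb = work_to_quasi_bound(distances, work)
```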
WendyAnneW #ShefChem16 DUck can discriminate between active and inactive cdk2 ligands
baoilleach #ShefChem16 Hmmm...not sure how they are placing the actives/inactives in the binding site?
baoilleach #ShefChem16 Shows a distribution of the work value is shifted to smaller values for the inactives vs actives
dgelemi @baoilleach #ShefChem16 yes it does. nice ROC scores with examples presented (0.85)
WendyAnneW #ShefChem16 AUC 0.85 for cdk2 or 0.87 for aa2r or 0.86 for another
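For reference, a ROC AUC like those quoted can be computed as the probability that a randomly chosen active scores higher than a randomly chosen inactive. The sketch below uses that pairwise definition; the score lists are hypothetical, not the CDK2 data from the talk.

```python
def roc_auc(active_scores, inactive_scores):
    """Fraction of active/inactive pairs ranked correctly (ties count as half)."""
    wins = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))

# Hypothetical work values: actives tend to need more work to reach the quasi-bound state.
auc = roc_auc([6.1, 7.4, 5.0, 8.2], [2.0, 5.5, 3.1, 1.2])
```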
WendyAnneW #ShefChem16 docking and DUck are complementary
baoilleach #ShefChem16 Docking and DUck are complementary - docking has *All* interactions, DUck focuses on a single interaction
baoilleach #ShefChem16 "Dock...then DUck!"
WendyAnneW #ShefChem16 dock then DUck !
baoilleach #ShefChem16 Comparing histograms of actives/inactives along both axes of a scatter plot of docking scores and work scores.
baoilleach #ShefChem16 (aside: this work reminds me of my work on receptor depth scaling in GOLD - i.e. upscore deep H bonds in docking)
WendyAnneW #ShefChem16 work with vernalis on prospective app to hsp90 virtual screen
baoilleach #ShefChem16 Prospective application to Hsp90 (seems to be a popular target at this meeting)
baoilleach #ShefChem16 8M->500 topscoring->139 candidates->28 strong Work -> tested 21 -> 8 active (38%) vs 4.4% from fragment screening
baoilleach #ShefChem16 Here's that URL for PLIP... https://twitter.com/ssalentin/status/750355387711365120
dgelemi #ShefChem16 on prospective work the undocking method is good at removing false positive from docking.
WendyAnneW #ShefChem16 strong binders -bought & tested 21 compounds. 8 active in fragment screens. Medium 15 bought 1 active. Weak of 11 none active
baoilleach #ShefChem16 Collab with Vernalis
WendyAnneW #ShefChem16 sorry Only 4.4% by fragment screening but 38% of strong etc. NOT 8 tested in fragment screens
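The hit rates quoted in these tweets work out as below. The counts per bin are those tweeted (strong 8 of 21, medium 1 of 15, weak 0 of 11) and 4.4% is the quoted fragment-screening rate; the variable names are just for illustration.

```python
def hit_rate(active, tested):
    """Fraction of tested compounds that turned out active."""
    return active / tested

# (actives, tested) per work-score bin, as reported in the tweets.
bins = {"strong": (8, 21), "medium": (1, 15), "weak": (0, 11)}
rates = {name: hit_rate(a, t) for name, (a, t) in bins.items()}

# Enrichment of the "strong" bin over the quoted 4.4% fragment-screening hit rate.
enrichment_vs_fragment_screen = rates["strong"] / 0.044
```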
WendyAnneW #ShefChem16 Agnes Meyder is last speaker of the day. From the Rarey group
baoilleach #ShefChem16 Agnes Meyder EDIA: Estimating electron density support for individual atoms in X-ray structures
baoilleach #ShefChem16 Works on rescoring Hyde results - wanted to improve quality of x ray structures in benchmarks
baoilleach #ShefChem16 Want to highlight well-supported versus unsupported atoms to visually analyse ligand, but also automatically score it
WendyAnneW #ShefChem16 rescoring with Hyde . Support prot structure with electron density. Identify hi quality data for benchmarks
baoilleach #ShefChem16 Various pose comparison methods: RMSE, IBAC, GARD, RSE, but what about exptal uncertainties?
baoilleach #ShefChem16 Comparing model and expt: RSR, RSRn, RSCC, RSZD - compare observed and calcd electron density
WendyAnneW #ShefChem16 shortcomings of current comparisons outlined. RSR, RSZD etc.
baoilleach #ShefChem16 RSR and friends bad (intuitive but several probs); RSZD good (statistically sound, reproducible, needs diff map)
WendyAnneW #ShefChem16 EDIA = estimating electron density support for individual atoms in X-Ray structures
baoilleach #ShefChem16 New scoring method! Compute score on 2Fo-Fc map; atomwise; score should be comparable between structs with diff resolutions
baoilleach #ShefChem16 Published in Nittinger et al 2015
baoilleach #ShefChem16 EDIA reports unexpectedly shaped electron density, overlapping electron d spheres between atoms, ranges from 0 -> 1.2
WendyAnneW #ShefChem16 EDIA initially reported in JCIM 2015, 15, page?
baoilleach #ShefChem16 Method: oversample electron d grid; assign electron d radius to each atom; more details...read the paper everyone
baoilleach #ShefChem16 Method now described in pictures, starting to sound like stock/stakeholder partitioning used for partial charges
baoilleach #ShefChem16 (or is it Mulliken charges I'm thinking of)
baoilleach #ShefChem16 Outer vs inner electron density sphere - I've missed out what this is I'm afraid
baoilleach #ShefChem16 Can identify unaccounted density, not enough density, and possible clashes
baoilleach #ShefChem16 Given the atomic EDIAs, we can work out the EDIA for a set of atoms - a sort of weighted sum that falls off quickly if unsupported
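One aggregation with the described behaviour (dominated by the worst-supported atoms) is a power mean with a negative exponent. The sketch below is an assumption for illustration: the tweets do not give the actual formula, and the `exponent` and `shift` values are hypothetical defaults.

```python
def combined_score(atom_scores, exponent=-2.0, shift=0.1):
    """Power mean with a negative exponent, so low per-atom scores dominate.

    The shift keeps a score of exactly 0 from causing a division by zero.
    """
    n = len(atom_scores)
    mean_power = sum((s + shift) ** exponent for s in atom_scores) / n
    return mean_power ** (1.0 / exponent) - shift

# Hypothetical per-atom scores: one unsupported atom drags the whole set down.
well_supported = combined_score([1.0, 1.0, 1.0])
one_bad_atom = combined_score([1.0, 1.0, 0.0])
```

With a plain arithmetic mean the second set would still score about 0.67, which is why a simple average would hide a single badly supported atom.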
baoilleach #ShefChem16 Nice diagrams displaying different possibilities for EDIA for single atom - even I can understand this one
baoilleach #ShefChem16 EDIA stability analysis - EDIA>0.8 good, 0.4 medium and less than that it's bad (for resolution of up to 2 Ang)
baoilleach #ShefChem16 Poorer resolution is not suitable for EDIA analysis - electron d probably not Gaussian which is one of the assumptions
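The stability bands above can be sketched as a small classifier. The 0.8 and 0.4 boundaries and the 2 Angstrom resolution limit come from the tweets; treating exactly 0.4 as "medium" and returning "not assessed" for poorer resolution are assumptions.

```python
def classify_edia(score, resolution):
    """Map an EDIA score to good/medium/bad for structures up to 2 Angstrom."""
    if resolution > 2.0:
        # Poorer resolution is outside the range the method is meant for.
        return "not assessed"
    if score > 0.8:
        return "good"
    if score >= 0.4:
        return "medium"
    return "bad"

labels = [classify_edia(s, 1.8) for s in (0.95, 0.6, 0.2)]
```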
baoilleach #ShefChem16 Looked at correlation of EDIA with RSCC - dataset is combined Iridium HT - Astex set
baoilleach #ShefChem16 RSCC only checks for shape of electron d; EDIA checks for shape and density
baoilleach #ShefChem16 Applications: identify multiple confs, bogus single atoms (e.g. chlorine)
WendyAnneW #ShefChem16 apps. Can detect clashes
WendyAnneW #ShefChem16 example of too much density
baoilleach #ShefChem16 Showing some nice examples of PDB errors
WendyAnneW #ShefChem16 alternative conformation problem detection
WendyAnneW #ShefChem16 water versus chlorine issue
baoilleach #ShefChem16 Can screen the whole of the PDB, subset thereof with 2.0A rez
WendyAnneW #ShefChem16 now some comments on metals in pdb
WendyAnneW #ShefChem16 could also use EDIA to curate hi quality benchmark datasets
baoilleach #ShefChem16 Features: intuitive scoring scheme, integrated fault analysis, 10s per binding pocket, good for curation, analysis
WendyAnneW #ShefChem16 clearly an important paper. lots of questions even tho we are running late
WendyAnneW #ShefChem16 conference dinner in Sheffield cathedral tonight
nathanbroo Conference dinner in Sheffield Cathedral. #ShefChem16 pic.twitter.com/PfQRxyE3Hr
WillPitt1 Professor Peter Willett and @WendyAnneWarr at #ShefChem16 dinner pic.twitter.com/mgEvpLl3lP
nathanbroo We're playing guess the molecule with Andrew Dalke's homemade index punch cards. #ShefChem16 pic.twitter.com/WSSmusLND7
nathanbroo Guess who is having this scurvy fighting cocktail? #ShefChem16 pic.twitter.com/GkM5xGl1Z6
nathanbroo Index cards for chemical structure lookup by Andrew Dalke. I am in awe!!! #ShefChem16 pic.twitter.com/eHA1GsHH2d
dgelemi #ShefChem16 and we're back for last days. 1st speaker is H. J. Patel from Oxford. Lots of logos, diamond, ucb, GSK, sgc, epsrc, mrc...
baoilleach #ShefChem16 Hannah Jemi Patel Novelty Score: Prioritising cmpds that are novel scaffolds or form novel PL interactions
baoilleach #ShefChem16 First talk ever at a conference - good luck!
baoilleach #ShefChem16 Drug discov process: subjective decision making - synthetic intuition, existing biases, experience
mvkrier #shefchem16 Starting today with @HannahJemi from @UniofOxford: which compound to make next? The Novelty Score
baoilleach #ShefChem16 Novelty score: primary goal of drug discov is identification of novel active cmpds
baoilleach #ShefChem16 Prioritise molecules that are most diff to known binders in terms of interactions with protein, placement in binding site, structure
baoilleach #ShefChem16 Algorithm: create three grids after aligning protein structures, 3D matched molecular pairs mentioned,..
nathanbroo Last conference breakfast. #ShefChem16 pic.twitter.com/tfPQiinyiM
baoilleach #ShefChem16 Protonate protein with PDB2PQR, calculate pharmacophoric features with #rdkit
baoilleach #ShefChem16 Create grid that contains counts of elements, also a grid of counts of pharmacophoric features, include xtal occupancy too
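A grid of element counts can be sketched with a dictionary keyed by voxel index. This is not the speaker's implementation; the grid spacing, the function names and the atom list are hypothetical, and the structures are assumed to be already aligned.

```python
from collections import Counter, defaultdict

def grid_index(coord, spacing=1.0):
    """Voxel index containing a 3D coordinate (spacing is an assumed value)."""
    return tuple(int(c // spacing) for c in coord)

def build_count_grid(atoms, spacing=1.0):
    """Count elements per voxel over all ligand atoms from aligned structures."""
    grid = defaultdict(Counter)
    for element, coord in atoms:
        grid[grid_index(coord, spacing)][element] += 1
    return grid

# Hypothetical ligand atoms (element, xyz in Angstrom) from aligned complexes.
atoms = [
    ("C", (0.2, 0.3, 0.1)),
    ("C", (0.8, 0.9, 0.4)),
    ("N", (1.5, 0.2, 0.0)),
    ("O", (-0.5, 0.1, 0.2)),
]
grid = build_count_grid(atoms)
```

A candidate atom landing in a voxel with low or zero counts for its element would then count as novel placement; the same structure could hold pharmacophore-feature counts instead of elements.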
baoilleach #ShefChem16 Interaction defns taken from PLIP (see talk yesterday)
dgelemi #ShefChem16 PLIP for protein interaction presented yesterday is mentioned as the source of features
baoilleach #ShefChem16 Place candidate cmpds into binding site with 3D MMPs - after alignment to library, constrained conf generation with rdkit
nathanbroo Up now: Hannah Patel from Oxford Uni talking about prioritising molecules for protein-ligand interactions. #ShefChem16
baoilleach #ShefChem16 Score based on grid - something like how novel the structure is compared to the grids based on elements, pharm feats, interacts
baoilleach #ShefChem16 For interactions, also takes into account the direction vector rather than just counts
nathanbroo Now for some scaffold hopping: prioritise novelty in elements and making same interactions. #ShefChem16
baoilleach #ShefChem16 Interpretation: can prioritise scaffold hopping - want novel element score, but deprior novel interaction score
baoilleach #ShefChem16 ...or can prioritise novel interactions
nathanbroo Novelty: prioritise pharmacophore and interaction score. #ShefChem16
WendyAnneW #ShefChem16 Patel high element score low interaction score for scaffold hopping
mvkrier #shefchem16 CRANks is web-browser based
baoilleach #ShefChem16 Showing web server demo; spreadsheet of results, showing mols on left, scores and distributions to their right
WendyAnneW #ShefChem16 they are developing a way of visualising the grid
nathanbroo Live demo! Clear and interactive interface. #ShefChem16
jezwicker Great demo of Novelty Score by Hannah Patel from University of Oxford #ShefChem16
baoilleach #ShefChem16 Showing 3D structure view of aligned proteins and showing the grids. Can load a conformer in and see how overlaps.
dgelemi #ShefChem16 live demo of the web service currently in dvt. It's called CRANKs, interactions visible by lines or grid points with cpd or not
baoilleach #ShefChem16 Only in first year of PhD! A lot done...
WendyAnneW #ShefChem16 very preliminary testing with hiv 1 protease
baoilleach #ShefChem16 Preliminary test: example HIV-1 protease, lots of data available, high mutation rate so need to be careful to match seq identity
baoilleach #ShefChem16 Shows two hydrophobic regions with split in-between where the catalytic site is
nathanbroo #ShefChem16 comparison with other dissimilarity methods.
baoilleach #ShefChem16 Comparison to using fps for same purpose; novelty score can separate molecules used or not to create the grid but fps cannot
WendyAnneW #ShefChem16 she is testing grid on brd1
nathanbroo #ShefChem16 planning to test on BRD1 in future: testing ranking of novelty score for enrichment & relationship of novelty score & others.
baoilleach #ShefChem16 Future: testing on more proteins, can rank actives vs inactives? more comparisons to other methods, param tuning, handle water?
WendyAnneW #ShefChem16 in future will dev algorithm with pharmacophore features and waters
dgelemi #ShefChem16 future dvt will be to include the water. Hot topic at the moment
nathanbroo #ShefChem16 planning to include waters in future, particularly bridging waters.
baoilleach #ShefChem16 Collab with GSK, UCB, Diamond
dgelemi #ShefChem16 presentation with HIV protease leads to questions about it, old target > expert in the room (mutation, flexibility)
baoilleach #ShefChem16 Oliver Koch - Analysing the framework of PL interactions
baoilleach #ShefChem16 Overall topic is binding site comparison
mvkrier #shefchem16 Oliver Koch from @TU_Dortmund will talk about ligand sensing cores and privileged scores.
baoilleach #ShefChem16 Movie time - GSK3 & TryS have similar ligand sensing cores - pass the popcorn...
mvkrier #shefchem16 overall binding site comparison; example GSK3 & TryS
WendyAnneW #ShefChem16 Oliver Koch from Dortmund movie of gsk3 and TryS. Similar ligand sensing cores
baoilleach #ShefChem16 Secondary structural element arrangements are similar though overall binding modes are different
nathanbroo Up now: Oliver Koch from Dortmund. Identifying privileged scaffolds. Lots of scaffolds this morning. #ShefChem16
mvkrier #shefchem16 have a look at #pdb 1q3w and 2vpm
WendyAnneW #ShefChem16 in trypanosomatid diseases. Koch has chapter in vch 2013 book
baoilleach #ShefChem16 Both targets have ligands that have similar scaffold, but different side chains
WendyAnneW #ShefChem16 future med chem 2011 3 699 secondary protein structure
baoilleach #ShefChem16 Does nature reuse specific folding pattern to recreate particular binding site. Skolnick 2015
baoilleach #ShefChem16 Similar backbone - can lead to similar but also dissimilar pockets
nathanbroo #ShefChem16 different view on molecular design today. This time metaphor is shape sorting toy.
WendyAnneW #ShefChem16 Skolnick bioorg med chem lett 2015 15:25(6) 1163 pocket space and shape
nathanbroo #ShefChem16 ligand sensing cores and molecular scaffolds: lots of possibilities of binding different proteins.
baoilleach #ShefChem16 Emphasising the interplay of both mol scaffold and protein structural elements
HannahJemi First talk at a conference complete! *phew* Thanks for having me! #ShefChem16 #relieved
mvkrier #shefchem16 nature likes symmetry; pattern recognition
dr_greg_la So great to have a mix of talks from people early in their careers and more experienced researchers. #ShefChem16
WendyAnneW #ShefChem16 it is not just like shape sorting there are "wheels within wheels". Now use in identification of new ligands
baoilleach #ShefChem16 Needed software to compare only secondary structural elements around binding site - no such software available- did it ourselves
baoilleach #ShefChem16 Scaffoldhunter to generate scaffolds from bioactivity data
baoilleach #ShefChem16 Combined with protein target info to find scaffolds annotated with target info
WendyAnneW #ShefChem16 similar scaffold, different targets. Found a novel ligand sensing core
baoilleach #ShefChem16 Applying to find new ligands: wanted new inhibs for BRD4
dgelemi #ShefChem16 goal of O. Koch is to find privileged scaffolds where the selectivity is found by decorations, using a rational approach.
mvkrier to get here http://scaffoldhunter.sourceforge.net/  #scaffolds #ShefChem16 https://twitter.com/baoilleach/status/750608804514062337
nathanbroo #ShefChem16 using ScaffoldHunter tool to identify new & interesting scaffolds: http://scaffoldhunter.sourceforge.net pic.twitter.com/QFhfAjRy2Y
dgelemi #ShefChem16 on privileged scaffold, a review here http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2908274/ , and examples in tables 1 and 4
baoilleach #ShefChem16 4 proteins with ligand info; sim search of screening library and tested; some hits (0, 1, 3, 11% hit rate)
WendyAnneW #ShefChem16 brd4 similar to 4 other proteins. Get their 4 ligands. To 4 libraries. Test vs brd4. Hits confirmed for prot a and b, and best c
WendyAnneW #ShefChem16 so new brd4 inhibitors found
nathanbroo #ShefChem16 new BRD4 inhibitors identified & validated with thermal shift and crystallography.
baoilleach #ShefChem16 Also did docking: hit rate 5%, but hits from ligand sensing core approach were almost completely different
WendyAnneW Now docking-based VS tried as well. Only 2 cmpds similar from the two approaches #ShefChem16
baoilleach #ShefChem16 Runtimes are fast: compare 3 pockets to 3000 takes 11s, 39k x 39k is 1.5d
baoilleach #ShefChem16 How to do pocket detection? ligand-based or automated?
WendyAnneW #ShefChem16 automated pocket selection methods not robust
dgelemi #ShefChem16 the hypothesis: similar scaffold of binding sites bind to same compound scaffold can be validated
baoilleach #ShefChem16 Ligsite - automated pocket detection not so good for MAO example
WendyAnneW #ShefChem16 the pocket detection method used can have major effect on your results
nathanbroo #ShefChem16 automated pocket detection method: Ligsite.
baoilleach #ShefChem16 ...but method so fast that no need to rely on automated pocket detection - just compare whole protein if necessary
WendyAnneW #ShefChem16 he does not have to rely on auto pocket detection
baoilleach #ShefChem16 Enzyme classification can be done based on ligand-sensing cores - shows network with different regions for different classes
nathanbroo #ShefChem16 looks like @Gephi used for network visualisation but not specified.
WendyAnneW #ShefChem16 all against all comparison of pdb. Had to use auto pocket detection in this case
WendyAnneW #ShefChem16 comparison took about 1 month on up to date workstation
baoilleach #ShefChem16 Can do all-against-all on PDB (1 month of time) - comparing scores to GRAlign and Usearch
WendyAnneW #ShefChem16 interesting pocket pairs being analysed now
baoilleach #ShefChem16 Showing interesting pairs of proteins found as having similar pockets from the PDB
WendyAnneW #ShefChem16 TryS versus pdb atp subset work in progress
baoilleach #ShefChem16 Developing ligands for kinase based on tubulin...synthetase (TryS)
WendyAnneW #ShefChem16 use in finding selective ligands
baoilleach #ShefChem16 LOOKING FOR A POST-DOC
baoilleach #ShefChem16 Everyone invited to Fulda for GCC
WendyAnneW #ShefChem16 he is looking for a postdoc
deniseOme @WendyAnneWarr I wonder if these 4 proteins correspond to the ones displayed in @targetvalidate? #ShefChem16 pic.twitter.com/PPYS6gOZ75
baoilleach #ShefChem16 "Really embarrassing" exit of people who haven't checked out yet - tut tut
WendyAnneW #ShefChem16 Lena Kalinowsky diverse test set for validating scoring function
baoilleach #ShefChem16 Lena Kalinowsky: Diverse test set for validation of scoring fns based on MMPs
WendyAnneW #ShefChem16 ...based on matched molecular pairs
baoilleach #ShefChem16 Many studies evaluating scoring fns for pose pred, ranking, and affinity pred
baoilleach #ShefChem16 Docking problem solved in the case of rigid receptors, scoring problem not yet solved though
baoilleach #ShefChem16 Most scoring fns correlate more with mol size than with binding affinity - want to address this issue
WendyAnneW #ShefChem16 JCIM 54, 1717. Wang in JMC. Kitchen in nature... All have compared scoring functions. Scores match mol size better than affinity
baoilleach #ShefChem16 Novel validation dataset based on MMPs
baoilleach #ShefChem16 MMPs differ by one well defined transformation; e.g. H -> CH3, with corresponding effect on Ki
baoilleach #ShefChem16 (Nice idea)
baoilleach #ShefChem16 Starting pt was PDBbind - broad collection of data for many targets
WendyAnneW Dataset of MMPs made from PDB bind #ShefChem16
baoilleach #ShefChem16 We used the core set of PDBBind - 195 diverse structures and binding data
baoilleach #ShefChem16 Assigned PDB IDs to UniProt codes (oh oh I don't know bioinformatics..)
WendyAnneW Assign to uniprot iD and cluster #ShefChem16
baoilleach #ShefChem16 Clustered based on sequence alignment (I think...)
WendyAnneW #ShefChem16 sequence ID better than 90% used
baoilleach #ShefChem16 Overall 2137 protein-ligand complexes were present
baoilleach #ShefChem16 MMPs generated by Knime Erl Wood nodes
baoilleach #ShefChem16 Sequence alignment within one MMP (seq id >= 90%)...more details..leading to 10200 3D MMPs
WendyAnneW #ShefChem16 thus 2127 prot ligand pairs. MMPs generated. Seq. align. Verify orientation and loc of ligands
WendyAnneW #ShefChem16 restrictions on MMPs explained
baoilleach #ShefChem16 Some restrictions: cyclic substituents were restricted to max 9 nonH atoms, others to 5 nonH atoms, sub should be max 50% of mol
baoilleach #ShefChem16 IC50 only accepted if from same publication --> now only 1224 MMPs diverse in terms of transformation
baoilleach #ShefChem16 Now want diversity in terms of the targets and the binding affinity effect
baoilleach #ShefChem16 This left 99 3D MMPs, arranged into 33 target clusters
nathanbroo #ShefChem16 interesting talk on 3D Matched Molecular Pairs leading to 99 in test set for validating scoring functions.
WendyAnneW #ShefChem16 now 1224 pairs left. Now apply diversity in terms of target and binding affinity. Now only 99 MMPs in 33 target clusters left
baoilleach #ShefChem16 Can the scoring fns reproduce the affinity differences between the MMPs?
WendyAnneW #ShefChem16 receptor preparation without water
baoilleach #ShefChem16 Scoring fns: FF-based (GBVI/WSA, GoldScore, AutoDock 4.2)
baoilleach #ShefChem16 Knowledge-based scoring fns: ASP, AutoDock Vina, DSX
baoilleach #ShefChem16 Empirical: MOE London dG, X-Score, ChemPLP, ChemScore
WendyAnneW #ShefChem16 there are force field, knowledge and empirical based scoring functions
nathanbroo #ShefChem16 investigating also the effects of modelling with and without waters to see what effects they may have on scoring.
baoilleach #ShefChem16 No docking - re-scoring was used
baoilleach #ShefChem16 Transformation effect: Score A - Score B = result
nathanbroo #ShefChem16 no docking conducted, only rescoring. Wonder if this has an effect on the score if it's not used in pose generation.
WendyAnneW #ShefChem16 distributions in the dates
baoilleach #ShefChem16 Seems to have matched series of length 3 for each target if I understood correctly
baoilleach #ShefChem16 Showing pie charts to give idea of the diversity - maybe a histogram better?
WendyAnneW #ShefChem16 distributions of affinity, transformation effect, and transformations in set shown.
baoilleach #ShefChem16 Showing Pearson correlations - not sure of what exactly - missed that...
baoilleach #ShefChem16 All correlations look very low, best is R of 0.36, rule of thumb is that R of 0.8 for pred of binding affinity
WendyAnneW #ShefChem16 scoring functions and Pearson coefficient tabulated (with and without water). Best were X-score and consensus Not up to 80% tho
dgelemi #ShefChem16 correlation between scoring function and delta activity. With Pearson correlation, no correlation, best R score is 0.36
baoilleach #ShefChem16 Do I gain or lose affinity, rather than predict exact value? How often the scoring fn predicted the correct trend...
WendyAnneW #ShefChem16 trend of transformation effect on affinity. X-score and consensus good but only About 60%
baoilleach #ShefChem16 Best is now 69% of the time. X-Score. ASP also good.
dgelemi #ShefChem16 however trends (improve/decrease) are better for the scoring functions. (Pearson correlation not appropriate)
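The two evaluation measures tweeted above (Pearson correlation of the transformation effect, and how often the sign of the change is predicted correctly) can be sketched in a few lines of Python. This is only an illustrative sketch, not the speaker's code: the per-pair numbers are made up, and it assumes a convention where a higher score means stronger predicted binding (many scoring functions use the opposite sign).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def trend_agreement(delta_scores, delta_affinities):
    """Fraction of pairs where the score change and the affinity change share a sign."""
    hits = sum(1 for ds, da in zip(delta_scores, delta_affinities) if ds * da > 0)
    return hits / len(delta_scores)

# Hypothetical values per matched pair: score(A) - score(B) and pKi(A) - pKi(B).
delta_scores = [0.4, -0.2, 1.1, 0.3, -0.6]
delta_affinities = [1.0, 0.5, 0.2, 0.8, -0.3]

r = pearson_r(delta_scores, delta_affinities)
agreement = trend_agreement(delta_scores, delta_affinities)
print(f"Pearson r = {r:.2f}, correct trend in {agreement:.0%} of pairs")
```

A scoring function can show a weak Pearson correlation and still get the direction of change right most of the time, which is why the two numbers are reported separately.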
baoilleach #ShefChem16 Overall 61% of the 3D-MMPs had increased affinity if increased size of the R group
WendyAnneW #ShefChem16 correlation between mol size and score in 78% of cases
WendyAnneW #ShefChem16 some transformations where function fails to predict Correct effect
baoilleach #ShefChem16 Showing example of increased affinity from H->CH3; only correct prediction if water molecules considered
baoilleach #ShefChem16 A pyrazole->benzene example
WendyAnneW #ShefChem16 only asp works well here
WendyAnneW #ShefChem16 worse results usually if you consider waters
nathanbroo #ShefChem16 plan is to release set of 3D-MMPs to the community for evaluation studies.
baoilleach #ShefChem16 Dataset of diverse 99 MMPs - considering water mols leads in general to *worse* results - dataset will be made available
DrJoshuaBo @HannahJemi Congrats, great talk. A bit of a baptism of fire! #ShefChem16
dgelemi #ShefChem16 current validation of the test set with external collaborator. Then set will be published, however can contact to access now
nathanbroo #ShefChem16 looks like scoring functions are just no good at predicting activity. Waters make it worse. Massively significant, if true!
baoilleach #ShefChem16 Advert for Big Data MGMS 1-day meeting at UCL early Sept @mgmsupdates
baoilleach #ShefChem16 Bob Clark Estimating the uncertainty of ensemble regression model predictions
WendyAnneW #ShefChem16 last session is the qsar one. First up Bob Clark estimating uncertainty of ensemble regression model predictions
baoilleach #ShefChem16 "Lots of statistics but we can't escape"
nathanbroo Up next: my old friend Bob Clark from @SimulationsPlus talking about estimating uncertainty in model predictions. #ShefChem16
baoilleach #ShefChem16 Med chemists should care about uncertainty because it's hard to balance lost opps versus risk of wasted effort
nathanbroo Uncertainty important for chemistry, biology, management and regulators. #ShefChem16
mvkrier #shefchem16 Robert (Bob) D. Clark from @SimulationsPlus warns us that his talk will include a lot of #statistics
baoilleach #ShefChem16 Biologists should care because in silico replications can help prioritise when they need to redo assays
WendyAnneW #ShefChem16 medchemists biologists managers regulators should all care about uncertainty
baoilleach #ShefChem16 Management should care because of costs (many types thereof)
baoilleach #ShefChem16 Regulators need to know whether to trust QSAR/QSPR results
baoilleach #ShefChem16 Pooling errors assumes that all preds are equally uncertain, but in fact ensemble analysis can pinpoint which are more trusty
WendyAnneW Redistribute errors based on Gamma analysis of ENSEMBLE model #ShefChem16
baoilleach #ShefChem16 For this, we need to do dataset partitioning - not like train/test/verify
baoilleach #ShefChem16 We need more data - because we need to sample and see the spread of data - cut out 10->20% of the data for building model, ...
InfoSchool The final day of #ShefChem16 is in full swing! pic.twitter.com/MoKmbqM9gW
baoilleach #ShefChem16 ..the rest for uncertainty estimation
baoilleach #ShefChem16 Ensembles of neural networks - each network sees the same test set and val set but sees a diff scrambling of the training pool
WendyAnneW #ShefChem16 80 to 90% of your data set must be in validation set. Neural networks in the ensemble
baoilleach #ShefChem16 To estimate uncertainty, look at to what extent different networks in the ensemble agree
WendyAnneW #ShefChem16 how much agreement among all the networks in the ensemble used to estimate uncertainty
baoilleach #ShefChem16 Try to avoid the "shadow" of the training set on the test set
baoilleach #ShefChem16 What would independent binomial errors look like?
WendyAnneW #ShefChem16 looking for logp greater or less than 2
baoilleach #ShefChem16 Look at tally of positive votes (from multiple networks) and estimating particular error rates from sensitivity/specificity
baoilleach #ShefChem16 Given the no of positive votes, what is the chance that it is correct?
WendyAnneW #ShefChem16 curves for tally of positive votes shown. The models are NOT independent
baoilleach #ShefChem16 Actual curves look quite different than ideal curves, but not completely hopeless. The models are not completely independent.
nathanbroo #ShefChem16 beta binomials: a set of conjugate binomials that mimics cases where trials are not independent.
baoilleach #ShefChem16 Beta binomial distribution - a conjugate dist - a set of binomials. This mimics cases where trials are not indep. Better agrees.
WendyAnneW #ShefChem16 beta binomial distributions are useful tools for uncertainty analysis
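For readers unfamiliar with it, the beta-binomial is a binomial whose success probability is itself drawn from a beta distribution, which is one way to model votes that are not independent. The Python sketch below compares it with a plain binomial for a vote tally; the ensemble size and the shape parameters a and b are arbitrary illustrative choices, not values from the talk.

```python
import math

def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def binomial_pmf(k, n, p):
    """P(k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def beta_binomial_pmf(k, n, a, b):
    """P(k successes in n trials when p is drawn from Beta(a, b)."""
    return math.comb(n, k) * math.exp(log_beta(k + a, n - k + b) - log_beta(a, b))

n = 10           # hypothetical number of networks voting
a, b = 0.5, 0.5  # hypothetical shape parameters; mean vote rate a / (a + b) = 0.5

binom = [binomial_pmf(k, n, 0.5) for k in range(n + 1)]
betabinom = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]

for k in range(n + 1):
    print(f"{k:2d} positive votes: binomial {binom[k]:.3f}, beta-binomial {betabinom[k]:.3f}")
```

With these parameters the beta-binomial puts far more weight on unanimous tallies and less on the evenly split centre than independent voting would.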
baoilleach #ShefChem16 The better your model, the worse the sampling in the centre - where 50% of the networks disagree.
WendyAnneW #ShefChem16 it really does work see j cheminf 2014 6 34
baoilleach #ShefChem16 Showing validation set results. Fits well the curve from the training data. Clark J Cheminf. 2014
baoilleach #ShefChem16 Given the no of positive votes for a particular obs, this is the confidence/uncertainty
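The idea of reading a confidence off the vote tally can be illustrated with a simple empirical table built from a held-out set. This Python sketch is a simplification under stated assumptions: it uses raw observed fractions rather than the fitted beta-binomial curves described in the talk, and the tallies and labels are invented.

```python
from collections import defaultdict

def confidence_by_tally(vote_tallies, true_labels):
    """Map each positive-vote tally to the observed fraction of true positives."""
    counts = defaultdict(lambda: [0, 0])  # tally -> [n_true_positive, n_total]
    for tally, label in zip(vote_tallies, true_labels):
        counts[tally][0] += int(label)
        counts[tally][1] += 1
    return {t: pos / tot for t, (pos, tot) in sorted(counts.items())}

# Hypothetical held-out observations: positive votes out of 10 networks, and the true class.
tallies = [0, 0, 5, 5, 10, 10, 10, 5]
labels = [0, 0, 1, 0, 1, 1, 1, 1]

table = confidence_by_tally(tallies, labels)
print(table)
```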
WendyAnneW #ShefChem16 people like to think of confidence rather than uncertainty
baoilleach #ShefChem16 Kolmogorov-Smirnov statistic - hang on, I think I've used this one
baoilleach #ShefChem16 ...used to assess fit via cumulative distributions
WendyAnneW #ShefChem16 kolmogorov-Smirnov statistic...
nathanbroo #ShefChem16 here is Bob's paper on beta binomials: https://jcheminf.springeropen.com/articles/10.1186/1758-2946-6-34 #OpenAccess pic.twitter.com/Fzn30Kir9q
HannahJemi Dinner at Sheffield Cathedral last night #ShefChem16 @ Sheffield… https://www.instagram.com/p/BHhDbl1AEBNgtBE6XeSGZMhjg9MNoMIDuJEjAo0/
baoilleach #ShefChem16 Solubility prediction results shown
WendyAnneW #ShefChem16 solubility regression model q-squared train set .86. And .81 Validation set
baoilleach #ShefChem16 Errors should follow chi-squared if identically & independently distributed (IID)
baoilleach #ShefChem16 qq plot - what value do you see vs what value you expect if normally distributed
WendyAnneW #ShefChem16 normalised squared errors should follow Chi-squared
baoilleach #ShefChem16 qq plot shows points where the stddev is underestimated versus overestimated
baoilleach #ShefChem16 Conclusion: sigma and s? different for different cmpds - gamma distribs to the rescue...
baoilleach #ShefChem16 ...once it is applied then the qq plot shows the points directly along the 45 degree line
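The calibration check behind these tweets rests on a standard fact: if each error is normally distributed with the standard deviation claimed for it, the squared error divided by that variance follows a chi-squared distribution with one degree of freedom, whose mean is 1. The Python sketch below shows this on simulated data; the sigma range and sample size are arbitrary, and it does not reproduce the gamma-based correction from the talk.

```python
import random

def mean_scaled_squared_error(errors, sigmas):
    """Average of (error / sigma)^2; close to 1 when the sigmas are well calibrated."""
    return sum((e / s) ** 2 for e, s in zip(errors, sigmas)) / len(errors)

rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Simulated compounds, each with its own true uncertainty.
true_sigmas = [rng.uniform(0.2, 1.0) for _ in range(10000)]
errors = [rng.gauss(0.0, s) for s in true_sigmas]

well_calibrated = mean_scaled_squared_error(errors, true_sigmas)
# Claiming half the true sigma (overconfidence) inflates the statistic fourfold.
overconfident = mean_scaled_squared_error(errors, [s / 2 for s in true_sigmas])

print(f"calibrated: {well_calibrated:.2f}, overconfident: {overconfident:.2f}")
```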
WendyAnneW #ShefChem16 gamma distributions to the rescue - math skimmed over. Unpublished work
baoilleach #ShefChem16 Explains how the gamma distribution is worked out - lots of details - manuscript in preparation so stay tuned
baoilleach #ShefChem16 The validation set plot allows you to make inferences about which preds are good/bad/ugly; also help discard dubious values;...
nathanbroo #ShefChem16 "If you lie to a model often enough, it will begin to agree with you." - Bob Clark
baoilleach #ShefChem16 ...also, it highlights where all the networks get it wrong. "if you lie to the model enough times it will get it wrong"
baoilleach #ShefChem16 Some binning in diagrams of results. Makes me feel uneasy. But now shows full dataset - just too many to show on one graph.
WendyAnneW #ShefChem16 plot scaled squared errors . 84% of points are greater than or equal to 2. We are now getting the numbers right in logS predict
baoilleach #ShefChem16 Looking at errors: detergents at top end, flame retardants at bottom end
baoilleach #ShefChem16 Toluene sulfonic acid is super soluble - more like water dissolved in it, than v.v.
WendyAnneW #ShefChem16 a few examples with excessive chi-squared. All pull water into a soggy crystal and are underestimated
baoilleach #ShefChem16 Ryo Yoshida: A Bayesian algorithm for finding novel small organic molecules
WendyAnneW #ShefChem16 Ryo Yoshdia Bayesian algorithm for finding novel small organic mols
nathanbroo Up next: Ryo Yoshdia on Bayesian modelling for finding novel small molecules. Self-defined newcomer to the field. Welcome! #ShefChem16
baoilleach #ShefChem16 Goal to create novel promising hypothetical mols that exhibit desired props under multi-objective design
dgelemi #ShefChem16 using the method in different fields in the recent years. From drug design to organic material and now inorganic
nathanbroo #ShefChem16 transforming forward models into the backward prediction.
WendyAnneW #ShefChem16 sorry - his name is Yoshida. It was wrong on his slide!
dgelemi #ShefChem16 Bayesian equation presented!
baoilleach #ShefChem16 Bayesian analysis: explains the eqn - likelihood of observation given posterior distribution
mvkrier #ShefChem16 Ryo Yoshida (cannot decide on the twitter account to pick) from @tousuuken explores the chemical space by means of Bayesian Algo
baoilleach #ShefChem16 Refers to training as forward prediction, and application of model to unknowns as backward prediction (I think)
WendyAnneW #ShefChem16 fingerprint descriptors in forward predictn machine learning. Then backward prediction to get novel mols.
baoilleach #ShefChem16 Working on materials, e.g. polymers (thermal conductivity, glass transition temp), solar cells, Li batteries and drugs
baoilleach #ShefChem16 Data science can be an alternative to QM calcs
WendyAnneW #ShefChem16 model stitching method. Ultra fast property prediction
baoilleach #ShefChem16 Data production from QM -> Structure/Prop data -> Machine learning
baoilleach #ShefChem16 About 10^8 small mols from PubChem, fully optimized with QM, ...
baoilleach #ShefChem16 Early results on predicting HOMO-LUMO gap: deep learning (h2o), model stitching based on #CDK fingerprints
WendyAnneW #ShefChem16 deep learning and model stitching compared for homo lumo gap
dgelemi #ShefChem16 Deep learning! But not fully optimised. Model stitching is doing better (now explanation on the difference between ensemble)
baoilleach #ShefChem16 Model stitching is about using different models for different areas of activity space (my words - I'm not familiar with this)
WendyAnneW #ShefChem16 loss of info when fp made. So use ensemble learning: model stitching
mvkrier #shefchem16 @ProfvLilienfeld would enjoy Yoshidasan's talk a lot..
nathanbroo #ShefChem16 looks like generating ensembles of local models that are predictive locally.
baoilleach #ShefChem16 ...sorry, different models for diff areas of structural space. Use the most accurate model for a particular structure.
baoilleach #ShefChem16 One dataset for training. Another for estimating the accuracy fn (weighted voting).
baoilleach #ShefChem16 Kolekota-Roth ... a type of fp?
dgelemi #ShefChem16 also use different descriptors as well as different models https://twitter.com/baoilleach/status/750642202091593728
baoilleach @pwk2013 #ShefChem16 very possibly - wasn't quite clear on the why
baoilleach #ShefChem16 Backward prediction is about randomly sampling structures that fit to the posterior distribs
baoilleach #ShefChem16 Graph mutation - restricting geometry, chemically-unfavourable bonds, and more...
baoilleach #ShefChem16 Done using Chemical Language Model. Structure generation using NLP techniques to mutate SMILES string. Brackets must be matched.
dgelemi #ShefChem16 natural language processing used to learn from smiles string presented. Chemical language model
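The "brackets must be matched" constraint on mutated SMILES strings is the easiest part of the syntax to check. The Python sketch below is a hypothetical helper, not part of the speaker's software: it only tests that branch parentheses and atom square brackets are balanced, and says nothing about ring-closure digits, valence or chemical sense.

```python
def brackets_balanced(smiles):
    """Return True if ( ) and [ ] in a SMILES string are properly nested and closed."""
    closers = {")": "(", "]": "["}
    stack = []
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in closers:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack  # anything left open is an error

for candidate in ["CC(=O)O", "CC(=O", "C[NH3+]", "CC)("]:
    print(candidate, brackets_balanced(candidate))
```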
baoilleach #ShefChem16 More than 70% of generated molecules matched signif to ones in PubChem.
baoilleach #ShefChem16 (I love the idea of randomly generating molecules - this is a novel way of doing it.)
WendyAnneW #ShefChem16 paper by the student. hisaki ikebata is being reviewed. We have been asked not to photograph unpublished results
baoilleach #ShefChem16 For inverse molecular design, searching for molecules with particular H-L gap and internal energy
WendyAnneW #ShefChem16 software iqspr is at the CRAN repository
baoilleach #ShefChem16 (aside: myself and @ghutchis have a paper on this particular topic too - we used genetic algorithm working on monomers)
the_cdk Lots of CDK depictions used by Yoshida, some from the (nice) new but also the old renderings. #ShefChem16
http://cdkdepict-openchem.rhcloud.com/ 
baoilleach #ShefChem16 Shows scatterplot of HL gap vs E - very few in desired prop region - (wondering is the search completely random or is it directd
WendyAnneW @dgelemi #ShefChem16 thanks for the CRAN url
nathanbroo #ShefChem16 @baoilleach did you see this earlier this year from @Piman314 - https://medium.com/@nf508/de-novo-design-without-the-chemistry-d183e8a9f150#.l5z8zkpbq
baoilleach #ShefChem16 A problem is the poor prediction of trained forward models at far tails of the distrib of existing molecules
baoilleach #ShefChem16 ...but goal of mol design is to go where no-one has gone before - i.e. there won't be molecules available there to train on
baoilleach #ShefChem16 ...so we put the generated data back into the model, and then do the predictions again - hopefully will improve things
baoilleach #ShefChem16 Collabs with Mitsubishi-Tanabe, Harvard CEP, among others
GJPvWesten @baoilleach I like the star trek reference.. :) #ShefChem16
dgelemi #ShefChem16 questions get lost in translation between German English to Japanese speaker. #speakslow #clarity
WendyAnneW #ShefChem16 van westen on which method to use for modelling public data
baoilleach #ShefChem16 Gerard van Westen Target prediction, QSAR, PCM, Deep learning - reliable cross target modeling of ChEMBL?
baoilleach #ShefChem16 Based in Leiden, working closely with experimentalists
dgelemi #ShefChem16 reference to aeronautic engineering as 1st slide from @GJPvWesten
nathanbroo #ShefChem16 up now: @GJPvWesten from Leiden LACDR talking about target prediction, QSAR, PCM & deep learning.
baoilleach #ShefChem16 Predictions from models not found to be that reliable
baoilleach #ShefChem16 Try ensembles - major vote - still had trouble
WendyAnneW #ShefChem16 he wants more reliable target predictions even using networks
baoilleach #ShefChem16 What we did was, to go back to ChEMBL, and investigate it more
nathanbroo #ShefChem16 target prediction from models did not work well enough to use. Back to @ChEMBL…
baoilleach #ShefChem16 ChEMBL size displayed - growing quickly - all good...but not perfect
baoilleach #ShefChem16 On average each molecule tested on 1.5 protein targets. Sparse. Want to predict for other targets not present.
WendyAnneW #ShefChem16 looked at ChEMBL. Amazing resource...but sparse matrix and there is huge matrix of drug target pairs
WendyAnneW #ShefChem16 high confidence part of ChEMBL very very much smaller
baoilleach #ShefChem16 When you limit to high quality pChEMBL values, there are only about 2-3% of the activity values retained
baoilleach #ShefChem16 One of the problems is the 10 micromolar activity cutoff. Then 90% of the data points are active.
nathanbroo #ShefChem16 a certain @georgeisyourman did *not* approve this message! @ChEMBL data difficult to model.
WendyAnneW #ShefChem16 Jenkins et al on target fishing. Van westen could not get it to work. Threshold cutoff 10 micro mol not ideal
baoilleach #ShefChem16 So how to model it. XKCD cartoon on standards shown.
WendyAnneW #ShefChem16 he decided to use one dataset with 14 methods.
WendyAnneW #ShefChem16 used 300 nM threshold
baoilleach #ShefChem16 What did we do differently? Use ~300nM as activity threshold. Biologically more relevant - better sep between acts and inacts
baoilleach #ShefChem16 Several ways to model a single dataset: binary class QSAR, multiclass, PCM (proteochemometrics), multiclass DNN (deep learnin')
WendyAnneW #ShefChem16 binary qsar, multi class qsar, PCM, multi class DNN compared
baoilleach #ShefChem16 Python/Lasagne + #RDKIT, R + Pipeline Pilot
baoilleach #ShefChem16 All scripts made available
WendyAnneW #ShefChem16 hi qual dataset is available
baoilleach #ShefChem16 High quality data from ChEMBL: has pCHEMBL, no data validity comment, no literature dups
nathanbroo #ShefChem16 @xkcdComic used to illustrate growing standards: https://xkcd.com/927/ pic.twitter.com/ZBmx9L3ytK
baoilleach #ShefChem16 Explains how the Bayes classifiers were trained
WendyAnneW #ShefChem16 PLoS one 2015, 10.3: e0121492. PLoS comput biol 2013, 9:10 pp?
egonwillig this is the first time I see someone compare stat modelling with standards... not intuitive. Why??? #ShefChem16 https://twitter.com/nathanbroon/status/750649600072187904
baoilleach #ShefChem16 Who has already used DNNs? Only the last speaker puts up his hand.
WendyAnneW #ShefChem16 deep neural networks DNN explained
WendyAnneW #ShefChem16 DNN implemented in lasagne
baoilleach #ShefChem16 Feed forward NN; used dropout (25%) to prevent overtraining; 3 layers; no. of output nodes equal to no. of targets
nathanbroo #ShefChem16 deep learning neural networks explained… I should get into this. Impossible to interpret or interrogate models I'm guessing.
egonwillig I like that XKCD comic, but I don't think it has anything to do with selecting modeling methods... there're no standards 4 that #ShefChem16
baoilleach #ShefChem16 Validation: usually with random split of data; we use temporal split of the data; predict the future (I did this too!)
WendyAnneW #ShefChem16 danger in 70 30 Data set split cos medchemists make series. temporal split of the data is more challenging
WendyAnneW #ShefChem16 MCC: Matthews correlation coefficient
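For reference, the Matthews correlation coefficient is computed from the four cells of a binary confusion matrix and runs from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). A minimal Python sketch follows; returning 0.0 when a row or column of the matrix is empty is a common convention adopted here, not something stated in the talk.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when the coefficient is undefined
    return (tp * tn - fp * fn) / denom

print(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10))
```

Unlike plain accuracy, the coefficient stays informative when actives and inactives are heavily imbalanced, which matters for sparse activity data.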
baoilleach #ShefChem16 Compares random splitting and temporal validation; with temporal validation performance goes way down
WendyAnneW #ShefChem16 temporal - performance goes down a lot. Best
baoilleach #ShefChem16 Reason is that med chemists work in series at a time, so random splits puts some of the series in training and some in the test
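The difference between the two validation schemes can be made concrete with a toy data set. In the Python sketch below the records, field names and cutoff year are all hypothetical; it only illustrates that a random split scatters a chemical series across training and test sets, while a temporal split holds out everything published from a cutoff year onwards.

```python
import random

def random_split(records, test_fraction=0.3, seed=0):
    """Shuffle the records and hold out a fixed fraction as the test set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def temporal_split(records, cutoff_year):
    """Train on records before the cutoff year, test on the cutoff year and later."""
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test

# Hypothetical data: series A was published 2010-2012, series B 2013-2014.
records = [
    {"compound": f"cpd{i}", "series": "A" if i < 6 else "B", "year": 2010 + i // 2}
    for i in range(10)
]

rand_train, rand_test = random_split(records)
time_train, time_test = temporal_split(records, cutoff_year=2013)

print("temporal train series:", {r["series"] for r in time_train})
print("temporal test series:", {r["series"] for r in time_test})
```

In the temporal split the test set contains only the newer series, so the model has to extrapolate to chemistry it has not seen, which is the harder and more realistic task.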
WendyAnneW #ShefChem16 oops best was mc DNN?
baoilleach #ShefChem16 Compares run times; DNN fast if use GPUs
WendyAnneW #ShefChem16 z-score also used. He likes the DNNs
baoilleach #ShefChem16 In a fair comparison DNN outperform the other methods
nathanbroo #ShefChem16 Bayesian models comparable to deep learning neural nets. Bayesian more interpretable? DLNN can improve with complexity?
baoilleach #ShefChem16 Did a param sweep of 63 DNNs models - took only 1 week between idea and results
WendyAnneW Proteochemometric DNNs = PCM DNN #ShefChem16
baoilleach #ShefChem16 Best are deep and wide nets, large pooled datasets and increased descriptors - ensembles best of all
baoilleach #ShefChem16 Future work: more scope to investigate DNNs, and do expts - "can tell experimentalists - we can help you now, trust me again"
WendyAnneW #ShefChem16 no single method is best. Diff methods good in diff cases. He likes DNN. PCM DNN better but slower. PCM RF bad
rguha @dgelemi @WendyAnneWarr first R tool at #shefchem16?
baoilleach #ShefChem16 Jameed Hussain: MOEsaic - making SAR analysis easier thru use of MMPs
nathanbroo #ShefChem16 next up: Jameed Hussain @CCG_MOE talking about #MOEsaic their new tool for easier SAR analysis using Matched Molecular Pairs.
WendyAnneW #ShefChem16 jameed Hussain of CCG. Making sar analysis easier thru use of matched mol pairs (MMPs). MOEsaic
rguha @baoilleach klekota roth (structural keys) #shefchem16
baoilleach #ShefChem16 Many tools available, but few are integrated environments - chemists frustrated as need to know to use many tools
dgelemi #ShefChem16 no mention of datawarrior as visualisation tool next to spotfire! If you don't know it, get it http://www.openmolecules.org/datawarrior/ 
baoilleach #ShefChem16 Interactive viz of SAR - can be a challenge - time consuming - many assays, many chemical series
WendyAnneW #ShefChem16 moesaic is web-based app for sar analysis and compound design
baoilleach #ShefChem16 MOEsaic - a web-based app for SAR analysis and cmpd design
nathanbroo #ShefChem16 Spotfire & Vortex mentioned. Don't forget the free DataWarrior for #dataviz http://www.openmolecules.org/datawarrior/ pic.twitter.com/FmZD0pQKY5
baoilleach #ShefChem16 Explore effects of structural change, investigate if a trend is general or scaffold-dependent
WendyAnneW #ShefChem16 addresses typical medchem workflows
baoilleach #ShefChem16 MedChem cycle: Have data, look at it, gen hypothesis, use to design new cmpds, come together and discuss, make them, test, repeat
baoilleach #ShefChem16 Software built to match this process
WendyAnneW #ShefChem16 browse design and document
baoilleach #ShefChem16 Browse, design and document workflow
baoilleach #ShefChem16 Example of COX-2 inhibitor: what is the effect of making a particular change? Shows a graph comparing activities of MMPs
WendyAnneW #ShefChem16 what is effect of change of funct Group on activity etc. See the scaffolds and the 3D ligand-prot model
nathanbroo #ShefChem16 design-make-test cycle in medicinal chemistry explained. Here's my take on it. pic.twitter.com/qWXiPqEnGI
WendyAnneW #ShefChem16 compare the props , the structures , the MMPs all on one screen
baoilleach #ShefChem16 Refs @DrBostrom paper on oxadiazoles as example of MMPs
rguha Did the DNN use precomputed descs? Or did they learn a representation? Former may explain similar perf #shefchem16 https://twitter.com/nathanbroon/status/750651707844165632
baoilleach #ShefChem16 Other examples: BioDig (GSK), WizePairZ (AZ), FindPairs (Pfizer)
baoilleach #ShefChem16 Challenge to present MMPA to make it easy enough to use for med chemists
baoilleach #ShefChem16 "MMP Explore" - a mode in the software that highlights parts of the molecule where MMPs are available to explore
baoilleach #ShefChem16 Click on the highlighted bond to see the corresponding MMPs. Background colours used to indicate property value.
nathanbroo #ShefChem16 nice shiny new tool from @CCG_MOE. Can't wait to give it a try with some of our internal data.
dgelemi #ShefChem16 will @nmsoftware be mentioned during the talk...
baoilleach #ShefChem16 Can also look at changes in linker sizes (from double-cuts I presume) by clicking on two bonds
WendyAnneW #ShefChem16 forgive me saying so, but this is a bit of a product review
baoilleach #ShefChem16 Following the network of MMPs from one to another - a bit like the earlier talks on SmallWorld and Astex Fragment Network
baoilleach #ShefChem16 Can map different properties to y axis to see whether the SAR is transferable, and for what scaffolds
WendyAnneW #ShefChem16 R-group profile plots shown on busy slide
baoilleach #ShefChem16 Can sort the molecules along x-axis to see whether particular properties highlight a trend with respect to MMPs
baoilleach #ShefChem16 Can sort R groups by chemical similarity - often groups similarly behaving groups together
baoilleach #ShefChem16 Explains non-transferability for particular MMP in terms of steric hindrance
dgelemi #ShefChem16 smooth and lots of visualisation. But still requires a med-chemist to sit and click on what he/she wants to see.
baoilleach #ShefChem16 Design module: don't want to break the flow between analysis and design - avoid "speed bumps"
baoilleach #ShefChem16 Can modify structures and profile on-the-fly using visual models
baoilleach #ShefChem16 When saving a novel cmpd design, can add comments on why you think good, and you don't need to go out to another tool
baoilleach #ShefChem16 Document model - describe the workflow of your analysis and capture any insights - add notes and jot ideas
WendyAnneW #ShefChem16 pic.twitter.com/HzjFmbTgQM
baoilleach #ShefChem16 Summary: built to address typical workflows of med chemists and keep track of insights, mirrors how SAR is generated
baoilleach #ShefChem16 Particularly suitable for series which are not amenable to R-groups, especially at early stage
WendyAnneW #ShefChem16 pic.twitter.com/ayFdnmfo2D
rguha @DrJoshuaBox @nathanbroon @GJPvWesten so learned directly from molecular graph ?(e.g. neural fps https://arxiv.org/abs/1509.09292 ) #ShefChem16
baoilleach #ShefChem16 Based on ideas/experience from 12 years of GSK
baoilleach #ShefChem16 The end
WendyAnneW @CCG_MOE #ShefChem16 pic.twitter.com/Clu8W0jlq2
WendyAnneW #ShefChem16 not quite the end. Can have more discussions over lunch https://twitter.com/baoilleach/status/750659722949296130
egonwillig @baoilleach @WendyAnneWarr @dgelemi @nathanbroon thanks for the detailed #ShefChem16 coverage! https://twitter.com/baoilleach/status/750659722949296130
WendyAnneW Overall, papers and posters were high quality #ShefChem16
nathanbroo #ShefChem16 conference closed. See you in three years! pic.twitter.com/8848re7eVy
GJPvWesten #ShefChem16 taxis did not show up so we took the 120 bus.. Adventures in #sheffield
GJPvWesten #ShefChem16 taxis did not show up so we took the 120 bus.. Adventures in #sheffield @nathanbroon @georgeisyourman @WendyAnneWarr
dgelemi #shefchem16 taxi queue. Like Cambridge train station. May do like @GJPvWesten or try uber pic.twitter.com/Uzcux6t1ba
dgelemi #shefchem16 been called "love" and "darling" when buying train ticket @HannahJemi @DrJoshuaBox
dgelemi #shefchem16 been good to meet in person @WendyAnneWarr @nathanbroon @mvkrier @DrJoshuaBox now can put a face on tweets!
mattymattm Safe journey home to all @ #shefchem16 and for any interested parties a highly recommended job prospect at Sheffield! pic.twitter.com/lD4CfTAlO0
cressetgro Didn't see our #software case studies at #ShefChem16? See them at http://bit.ly/CressCase  #CompChem #VirtualScreening #SARanalysis
nathanbroo #ShefChem16 Legonath having a quick pint before his train back to London. The experiment went a little wrong… pic.twitter.com/NTGGgNjQoY
nathanbroo Great tweeting guys! #ShefChem16 @WendyAnneWarr @baoilleach @mvkrier @dgelemi @GJPvWesten @georgeisyourman @HannahJemi @DrJoshuaBox
cressetgro Missed our #DiscoveryServices case studies at #ShefChem16? See them at http://bit.ly/CDScase  #AnalyzeSAR #BridgeResourceGaps #Outsource
mvkrier After #shefchem16 meeting: SAVI
DrJoshuaBo Beautiful pint of mint chocolate stout @SheffieldTap to mark the end of another great #ShefChem16. That's all folks! pic.twitter.com/MNDGEqYfu0
cressetgro Missed details of our academic licensing program at #ShefChem16? See http://bit.ly/CresAcad  #Teaching #Student #Research #Free #LowCost
drjohndhol Have a safe journey home everyone #ShefChem16
GJPvWesten #ShefChem16 no draught beer at cafe balzar Manchester Airport, there is at bar mcr...
nathanbroo #ShefChem16 remember the next UK-QSAR & #Chemoinformatics meeting is being held @ICR_London on Wed 19th October. Registration opens soon!
egonwillig @WendyAnneWarr well, the flow was impressive and overwhelming! #ShefChem16 pic.twitter.com/iO6OB8Hdf6
egonwillig @egonwillighagen @baoilleach @WendyAnneWarr @dgelemi @nathanbroon it gives me a lot to catch up with! #ShefChem16 pic.twitter.com/lczl4AWiNu
nathanbroo Shall we @Storify #ShefChem16? Anyone any experience? @baoilleach @WendyAnneWarr @mvkrier @dgelemi
GJPvWesten @rguha also presented as poster on #ShefChem16, interesting indeed! pic.twitter.com/OmGXQpyn43
DrJoshuaBo @HannahJemi didn't have time to ask in Sheff. How are you doing protein pharmacophores in RDKit? Using a feature factory? #ShefChem16
HannahJemi @DrJoshuaBox Yes exactly. I modified a feature factory written by a previous student that only includes protein-specific SMARTS. #ShefChem16
baoilleach #ShefChem16 Just published the paper corresponding to my poster on "which fp is best?" http://jcheminf.springeropen.com/articles/10.1186/s13321-016-0148-0
nmsoftware #ShefChem16 Just published the paper corresponding to my poster on "which fp is best?" http://jcheminf.springeropen.com/articles/10.1186/s13321-016-0148-0
nmsoftware My #ShefChem16 poster "Which is the best fingerprint for medicinal chemistry?" #cheminformatics http://www.slideshare.net/NextMoveSoftware/which-is-the-best-fingerprint-for-medicinal-chemistry via @SlideShare
baoilleach My #ShefChem16 poster "Which is the best fingerprint for medicinal chemistry?" #cheminformatics http://www.slideshare.net/NextMoveSoftware/which-is-the-best-fingerprint-for-medicinal-chemistry via @SlideShare
baoilleach This was my first time live-tweeting a conference. Was my coverage of #ShefChem16...
AmethystD2 Had an excellent time at the #ShefChem16 conference. Thank you for organising. Already looking forward to the next time!