import re
import time
import itertools
import doctest
ITERATIONS = 1000000
# From IBM Research's Rxn4Chemistry:
# https://github.com/rxn4chemistry/rxn-chemutils/blob/main/src/rxn/chemutils/tokenization.py
SMILES_TOKENIZER_PATTERN = r"(\%\([0-9]{3}\)|\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\||\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>>?|\*|\$|\%[0-9]{2}|[0-9])"
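The rest of this file is not shown here, so the following is only a minimal sketch of how a pattern like the one above might be applied. The `tokenize` helper and its round-trip check are assumptions made for illustration, not necessarily what the original script or rxn-chemutils does.

```python
import re

# Same pattern as above, repeated so the sketch is self-contained.
SMILES_TOKENIZER_PATTERN = r"(\%\([0-9]{3}\)|\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\||\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>>?|\*|\$|\%[0-9]{2}|[0-9])"
_TOKEN_REGEX = re.compile(SMILES_TOKENIZER_PATTERN)


def tokenize(smiles):
    """Split a SMILES string into tokens (hypothetical helper).

    findall() silently skips characters the pattern cannot match, so the
    tokens are joined back together and compared with the input to detect
    anything that was dropped.
    """
    tokens = _TOKEN_REGEX.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("Could not tokenize: %r" % smiles)
    return tokens


if __name__ == "__main__":
    print(tokenize("CC(=O)Cl"))
    print(tokenize("[NH4+].[Cl-]"))
```

Note that alternatives are tried left to right, so `Cl?` consumes `Cl` as one token before a bare `C` can match, and a bracket atom such as `[NH4+]` comes out as a single token.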
I will have attended 8 out of 12 (if we include the remotes).
12th - 2023 Mainz - Paul Czodrowski (Uni Mainz) - Talk on SmiZip
11th - 2022 Berlin - Bayer ML - Shared talk with Jan Jensen on Gabby
10th - 2021 Remote
9th - 2020 Remote - Flash presentation on "An efficient algorithm to find matched pairs of a peptide"
8th - 2019 Hamburg - Emanuel Ehmki (Uni Hamburg) **didn't attend**
7th - 2018 Cambridge - Andreas Bender (Uni Cambridge) - Flash presentation on DeepSMILES...almost
6th - 2017 Berlin - Andrea Volkamer (Charite Berlin) and Gerhard Wolber (FU Berlin) **didn't attend**
5th - 2016 Basel - Nadine Schneider (Novartis) **didn't attend**
I have no Twitter notes from the first day. Here are my notes from Days 2 and 3...
#shef2023 Adele Hardie (Uni Edinburgh) on an sMD/MSM approach for rational design of allosteric modulators.
Have come up with a workflow to predict allostery. Examples from two protein systems.
Orthosteric inhibition is where you stick a molecule into the active site, blocking it. Allosteric inhibition is where the molecule interacts somewhere else and affects protein activity. How can we predict this? Using MD.
Different methods have different costs. We use classical mechanics to compute the energies of the system: bonds, angles, torsion angles. The constants come from sets of precomputed params called forcefields. We can look at systems as big as protein-ligand, and ns timescales.
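As an aside (not from the talk), here is a rough sketch of what "classical mechanics with forcefield constants" means for the bond term alone. It assumes the common harmonic form E = 0.5 * k * (r - r0)^2; real forcefields differ in conventions (some fold the 0.5 into k) and add angle, torsion and non-bonded terms. All names and parameter values below are made up for illustration.

```python
import math


def harmonic_bond_energy(r, r0, k):
    """Energy of one bond stretched from its equilibrium length r0.

    Assumes the harmonic form 0.5 * k * (r - r0)**2; k and r0 are the
    kind of constants a forcefield would supply.
    """
    return 0.5 * k * (r - r0) ** 2


def total_bond_energy(coords, bonds):
    """Sum the harmonic bond energies over all bonds.

    coords: list of (x, y, z) tuples, one per atom.
    bonds:  list of (i, j, r0, k) tuples, where i and j index into coords.
    """
    total = 0.0
    for i, j, r0, k in bonds:
        r = math.dist(coords[i], coords[j])  # requires Python 3.8+
        total += harmonic_bond_energy(r, r0, k)
    return total


if __name__ == "__main__":
    # Two atoms 1.1 apart with a hypothetical equilibrium length of 1.0
    coords = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)]
    bonds = [(0, 1, 1.0, 300.0)]
    print(total_bond_energy(coords, bonds))
```

An MD engine evaluates sums like this (and their gradients) at every timestep, which is why the accessible timescale depends so much on system size.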
We can do Markov State Modelling (MSM), where we model the probabilities of states (conformations). If the probabilities of the active vs inactive state change in the presence of a ligand then it's a modulator. Difficulty is that this is millisecond to second timescale - t
> set PATH=C:\MinGW\bin;%PATH%
C:\Tools\zlib\zlib-1.2.8> C:\MinGW\bin\mingw32-make.exe -fwin32/Makefile.gcc
gcc -O3 -Wall -c -o adler32.o adler32.c
gcc -O3 -Wall -c -o compress.o compress.c
gcc -O3 -Wall -c -o crc32.o crc32.c
gcc -O3 -Wall -c -o deflate.o deflate.c
gcc -O3 -Wall -c -o gzclose.o gzclose.c
gcc -O3 -Wall -c -o gzlib.o gzlib.c
gcc -O3 -Wall -c -o gzread.o gzread.c
\documentclass{article}
% -- Add this section to your LaTeX doc
% Remember to use "pdflatex -shell-escape myfile.tex"
% or it won't allow LaTeX to call any command-line
% programs!
\usepackage{graphicx}
\newcounter{smilescounter}
\setcounter{smilescounter}{1}
\newcommand{\smiles}[1]{
Here's part of extconf.rb from OB after some edits to add support for --prefix. Unfortunately, this doesn't work (see discussion of mkmf2, which tries to fix some of these problems).
require 'getoptlong'
makeopts = {}
opts = GetoptLong.new(["--prefix", "-p", GetoptLong::OPTIONAL_ARGUMENT],
                      ["--with-openbabel-lib", "-L", GetoptLong::OPTIONAL_ARGUMENT],
                      ["--with-openbabel-include", "-I", GetoptLong::OPTIONAL_ARGUMENT]
                     ).each{|o, a| makeopts[o[%r/[^-].*/]] = a}
prefix = makeopts.delete('prefix') || nil
oblib = makeopts.delete('with-openbabel-lib') || nil
Monday morning - Analysis of Large Chemical Datasets
--------------------------------------------
https://twitter.com/ConferenceNoel/status/1536235381313753090
I missed the first tweet as I was setting up this Twitter a/c but it should have been:
#2022iccs Maximilian Beckers (Novartis) on 25 years of small molecule optimization at Novartis: A retrospective analysis of chemical series evolution
#2022iccs A chemical series is a subjective concept. Kruger JCIM 2020 published automated identification of chemical series.
#2022iccs Specificity of a scaffold is the probability of a random match of a scaffold. More meaningful scaffolds have fewer random matches per scaffold.
#2022iccs The dataset includes a whole bunch of different properties from their Novartis in-house dataset. Filtering removes bifunctional degraders and others (e.g. >5 amide bonds). 310K cmpds in the end.
#2022iccs Ran the scaffold analysis of the dataset. 72% of the compounds were assigned to a scaffold. Median is 60 cmpds assigned to a scaffold; typical on
import math
import pybel
def sqr_dist(a, b):
    ac = a.coords
    bc = b.coords
    return (ac[0]-bc[0])**2 + (ac[1]-bc[1])**2 + (ac[2]-bc[2])**2
# Definitions taken from
# http://baoilleach.blogspot.com/2007/07/pybel-hack-that-sd-file.html
(Use "quit;" to exit mysql prompt)
1. Download chembl_15_mysql.tar.gz
2. Get rid of the existing: "drop database chembldb"
mysql> drop database chembldb;
ERROR 1010 (HY000): Error dropping database (can't rmdir '.\chembldb', errno: 41)
(...the error was because I exported a file to this folder: C:\ProgramData\MySQL\MySQL Server 5.5\data\chembldb
I went there and deleted it and repeated the command - it worked fine)
3. create database chembl_15;
from collections import defaultdict
def toposort(graph):
    """http://code.activestate.com/recipes/578272-topological-sort/
    Dependencies are expressed as a dictionary whose keys are items
    and whose values are a set of dependent items. Output is a list of
    sets in topological order. The first set consists of items with no
    dependencies, each subsequent set consists of items that depend upon
    items in the preceding sets.