@baoilleach
baoilleach / time_tokenizers.py
Created December 28, 2023 19:12
Code to tokenize a SMILES string
import re
import time
import itertools
import doctest
ITERATIONS = 1000000
# From IBM Research's Rxn4Chemistry:
# https://github.com/rxn4chemistry/rxn-chemutils/blob/main/src/rxn/chemutils/tokenization.py
SMILES_TOKENIZER_PATTERN = r"(\%\([0-9]{3}\)|\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\||\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>>?|\*|\$|\%[0-9]{2}|[0-9])"
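A minimal sketch of how the pattern above might be applied to split a SMILES string into tokens. The `tokenize` helper and its round-trip check are illustrative assumptions, not part of the gist; only the pattern itself is copied from it.

```python
import re

# Pattern copied verbatim from the gist (originally from rxn-chemutils).
SMILES_TOKENIZER_PATTERN = r"(\%\([0-9]{3}\)|\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\||\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>>?|\*|\$|\%[0-9]{2}|[0-9])"
SMILES_REGEX = re.compile(SMILES_TOKENIZER_PATTERN)


def tokenize(smiles):
    """Hypothetical helper: split a SMILES string into a list of tokens.

    re.findall silently skips characters that no alternative matches, so
    the tokens are joined back together and compared with the input to
    detect anything the pattern did not cover.
    """
    tokens = SMILES_REGEX.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not fully tokenize: {smiles!r}")
    return tokens


if __name__ == "__main__":
    print(tokenize("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
    print(tokenize("[NH4+].[Cl-]"))           # bracket atoms stay whole
```

Because `Br?` and `Cl?` are tried before the single-letter alternatives, two-letter halogens come out as one token rather than two.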
I will have attended 8 out of 12 (if we include the remotes).
12th - 2023 Mainz - Paul Czodrowski (Uni Mainz) - Talk on SmiZip
11th - 2022 Berlin - Bayer ML - Shared talk with Jan Jensen on Gabby
10th - 2021 Remote
9th - 2020 Remote - Flash presentation on "An efficient algorithm to find matched pairs of a peptide"
8th - 2019 Hamburg - Emanuel Ehmki (Uni Hamburg) **didn't attend**
7th - 2018 Cambridge - Andreas Bender (Uni Cambridge) - Flash presentation on DeepSMILES...almost
6th - 2017 Berlin - Andrea Volkamer (Charite Berlin) and Gerhard Wolber (FU Berlin) **didn't attend**
5th - 2016 Basel - Nadine Schneider (Novartis) **didn't attend**
baoilleach / 2023_Sheffield_Conference.txt
Created June 27, 2023 19:17
Notes from Ninth Joint Sheffield Conference on Chemoinformatics
I have no Twitter notes from the first day. Here are my notes from Days 2 and 3...
#shef2023 Adele Hardie (Uni Edinburgh) on an sMD/MSM approach for rational design of allosteric modulators.
Have come up with a workflow to predict allostery. Examples from two protein systems.
Orthosteric inhibition is where you stick a molecule into the active site, blocking it. Allosteric inhibition is where the molecule interacts somewhere else and affects protein activity. How can we predict this? Using MD.
Diff methods have diff cost. We use classical mechanics to compute the energies of the system, bonds, angles, torsion angles. The constants come from sets of precomputed params called forcefields. We can look at systems as big as protein-ligand, and ns timescales.
We can do Markov State Modelling (MSM), where we model probs of states (conformations). If the probabilities of the active vs inactive state change in the presence of a ligand then it's a modulator. Difficulty is that this is millsec to sec timescale - t
baoilleach / zlib_mingw.txt
Created July 11, 2013 13:47
Compile zlib statically with mingw
> set PATH=C:\MinGW\bin;%PATH%
C:\Tools\zlib\zlib-1.2.8> C:\MinGW\bin\mingw32-make.exe -fwin32/Makefile.gcc
gcc -O3 -Wall -c -o adler32.o adler32.c
gcc -O3 -Wall -c -o compress.o compress.c
gcc -O3 -Wall -c -o crc32.o crc32.c
gcc -O3 -Wall -c -o deflate.o deflate.c
gcc -O3 -Wall -c -o gzclose.o gzclose.c
gcc -O3 -Wall -c -o gzlib.o gzlib.c
gcc -O3 -Wall -c -o gzread.o gzread.c
baoilleach / texsmiles.tex
Created March 26, 2012 19:49
Example TeX file showing how to embed SMILES strings into LaTeX as images
\documentclass{article}
% -- Add this section to your LaTeX doc
% Remember to use "pdflatex -shell-escape myfile.tex"
% or it won't allow LaTeX to call any command-line
% programs!
\usepackage{graphicx}
\newcounter{smilescounter}
\setcounter{smilescounter}{1}
\newcommand{\smiles}[1]{
baoilleach / gist:771081
Created January 8, 2011 19:31
extconf.rb
Here's part of extconf.rb from OB after some edits to add support for --prefix. Unfortunately, this doesn't work (see discussion of mkmf2 which tries to fix some of these problems)
require 'getoptlong'
makeopts = {}
opts = GetoptLong.new(["--prefix", "-p", GetoptLong::OPTIONAL_ARGUMENT],
                      ["--with-openbabel-lib", "-L", GetoptLong::OPTIONAL_ARGUMENT],
                      ["--with-openbabel-include", "-I", GetoptLong::OPTIONAL_ARGUMENT]
                     ).each{|o, a| makeopts[o[%r/[^-].*/]] = a}
prefix = makeopts.delete('prefix') || nil
oblib = makeopts.delete('with-openbabel-lib') || nil
baoilleach / ICCS_2022_Conference_Notes.txt
Created July 3, 2022 20:14
Notes from International Conference on Chemical Structures 2022
Monday morning - Analysis of Large Chemical Datasets
--------------------------------------------
https://twitter.com/ConferenceNoel/status/1536235381313753090
I missed the first tweet as I was setting up this Twitter a/c but it should have been:
#2022iccs Maximilian Beckers (Novartis) on 25 years of small molecule optimization at Novartis: A retrospective analysis of chemical series evolution
#2022iccs A chemical series is a subjective concept. Kruger JCIM 2020 published automated id of chemical series.
#2022iccs Specificity of a scaffold is the probability of a random match of a scaffold. More meaningful scaffolds have fewer random matches per scaffold.
#2022iccs The dataset includes a whole bunch of different properties from their Novartis in-house dataset. Filtering removes bifunctional degrader and others (e.g. >5 amide bonds). 310K cmpds in the end.
#2022iccs Ran the scaffold analysis of the dataset. 72% of the compounds were assigned to a scaffold. Median is 60 cmpds assigned to a scaffold; typical on
import math
import pybel
def sqr_dist(a, b):
    # Squared Euclidean distance between two atoms' 3D coordinates
    ac = a.coords
    bc = b.coords
    return (ac[0]-bc[0])**2 + (ac[1]-bc[1])**2 + (ac[2]-bc[2])**2
# Definitions taken from
# http://baoilleach.blogspot.com/2007/07/pybel-hack-that-sd-file.html
baoilleach / chembl_mysql.txt
Last active November 12, 2021 10:55
Import and Export ChEMBL activities to/from MySQL
(Use "quit;" to exit mysql prompt)
1. Download chembl_15_mysql.tar.gz
2. Get rid of the existing: "drop database chembldb"
mysql> drop database chembldb;
ERROR 1010 (HY000): Error dropping database (can't rmdir '.\chembldb', errno: 41)
(...the error was because I exported a file to this folder: C:\ProgramData\MySQL\MySQL Server 5.5\data\chembldb
I went there and deleted it and repeated the command - it worked fine)
3. create database chembl_15;
baoilleach / longestpath.py
Created November 24, 2013 17:54
How to find the longest path in a directed acyclic graph
from collections import defaultdict
def toposort(graph):
    """http://code.activestate.com/recipes/578272-topological-sort/
    Dependencies are expressed as a dictionary whose keys are items
    and whose values are a set of dependent items. Output is a list of
    sets in topological order. The first set consists of items with no
    dependencies, each subsequent set consists of items that depend upon
    items in the preceding sets.
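
A minimal sketch of the longest-path idea named in the description: process the nodes in topological order and relax each outgoing edge. This is not the gist's own code. The `longest_path` function, its edge-list input, and the use of Kahn's algorithm (instead of the recipe's dictionary-of-sets `toposort`) are assumptions made for illustration, and path length is counted in edges.

```python
from collections import defaultdict


def longest_path(edges):
    """Return one longest path (as a list of nodes) in a DAG.

    `edges` is an iterable of (u, v) pairs meaning u -> v.
    Raises ValueError if the graph contains a cycle.
    """
    graph = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for u, v in edges:
        graph[u].append(v)
        indegree[v] += 1
        nodes.update((u, v))

    # Kahn's algorithm: repeatedly emit nodes with no remaining predecessors.
    order = []
    ready = [n for n in nodes if indegree[n] == 0]
    while ready:
        n = ready.pop()
        order.append(n)
        for m in graph[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # Relax edges in topological order; dist[n] is the longest path ending at n.
    dist = {n: 0 for n in nodes}
    pred = {}
    for n in order:
        for m in graph[n]:
            if dist[n] + 1 > dist[m]:
                dist[m] = dist[n] + 1
                pred[m] = n

    if not nodes:
        return []
    # Walk back from the node with the largest distance (ties broken arbitrarily).
    path = [max(nodes, key=dist.get)]
    while path[-1] in pred:
        path.append(pred[path[-1]])
    return path[::-1]
```

Each node and edge is visited a constant number of times, so the running time is linear in the size of the graph.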