
# ChrisRackauckas/systems_pharmacology.html

Created December 15, 2019 14:52

How Inexact Models Can Guide Decision Making in Systems Pharmacology

Chris Rackauckas
October 3rd, 2019

Pre-clinical Quantitative Systems Pharmacology (QSP) is about trying to understand how a drug target affects an outcome. If I affect this part of the biological pathway, how will it induce toxicity? Will it be effective?

Recently I have been pulling in a lot of technical colleagues to help with the development of next-generation QSP tooling. For those without a background in biological modeling, I found it difficult to explain the "how" and "why" of pharmacological modeling. Why is it differential equations, and where do these "massively expensive global optimization" runs come from? What kinds of problems can you solve with such models when you know that they are only approximate?

To answer these questions, I took a step back and tried to explain a decision-making scenario with a simple model, to showcase how playing with a model can allow one to distinguish between intervention strategies and uncover a way forward. This is my attempt. Instead of talking about something small and foreign like chemical reaction concentrations, let's talk about something mathematically equivalent that's easy to visualize: ecological intervention.

Basic Modeling and Fitting

Let's take everyone's favorite ecology model: the Lotka-Volterra model. The model is the following:

• Left alone, the rabbit population will grow exponentially

• Rabbits are eaten by wolves in proportion to the number of wolves (number of mouths to feed), and in proportion to the number of rabbits (ease of food access: you eat more at a buffet!)

• Wolf populations grow exponentially, as long as there is a proportional amount of food around (rabbits)

• Wolves die over time of old age, and every generation dies at a similar age (no major wolf medical discoveries)
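In equation form, with x(t) the rabbit population and y(t) the wolf population, these four assumptions give the classic Lotka-Volterra system (α, β, γ, δ correspond to p[1] through p[4] in the code below):

```latex
\frac{dx}{dt} = \alpha x - \beta x y, \qquad
\frac{dy}{dt} = -\gamma y + \delta x y
```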

The model is then the ODE:

using OrdinaryDiffEq, Plots

function f(du,u,p,t)
    du[1] = dx = p[1]*u[1] - p[2]*u[1]*u[2]
    du[2] = dy = -p[3]*u[2] + p[4]*u[1]*u[2]
end

u0 = [1.0;1.0]
tspan = (0.0,10.0)
p = [1.5,1.0,3.0,1.0]
prob = ODEProblem(f,u0,tspan,p)
sol = solve(prob,Tsit5())
plot(sol,label=["Rabbits" "Wolves"])

Except, showing you that picture glossed over a major detail: every piece of the model is not only mechanistic, but also contains a parameter. For example, rabbits grow exponentially, but what's the growth rate? To make that plot I chose a value for that growth rate (1.5), but in reality we need to get it from data, since the results can be wildly different:

p = [0.1,1.0,3.0,1.0]
prob = ODEProblem(f,u0,tspan,p)
sol = solve(prob,Tsit5())
plot(sol)

Here the exponential growth rate of rabbits is too low to sustain a wolf population, so the wolf population dies out. But then the rabbits have no predators and grow exponentially, which is a common route to ecological collapse, since they will then destroy the local ecosystem. More on that later.

Data and Model Issues

Okay, so we need parameters from data, but no single data source is great. One gives us a noisy yearly sample of both populations; another samples every month, but only for the first two years and only for the wolves, etc.:

function f_true(du,u,p,t)
    du[1] = dx = p[1]*u[1] - p[2]*u[1]*u[2] - p[5]*u[1]^2
    du[2] = dy = -p[3]*u[2] + p[4]*u[1]*u[2]
end

p = [1.0,1.0,3.0,1.0,0.1]
prob = ODEProblem(f_true,u0,tspan,p)
sol = solve(prob,Tsit5())

data1 = sol(0:1:10)
data2 = sol(0:0.1:2;idxs=2)
scatter(data1)
scatter!(data2)

Oh, and notice that this ODE is not the Lotka-Volterra model: it also adds a term p[5]*u[1]^2 for a rabbit disease which requires high rabbit density.
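In equation form, the data-generating model differs from plain Lotka-Volterra by just that one density-dependent death term (writing ε for p[5]):

```latex
\frac{dx}{dt} = \alpha x - \beta x y - \varepsilon x^{2}, \qquad
\frac{dy}{dt} = -\gamma y + \delta x y
```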

Assessing Intervention

The local population is very snobby and wants the rabbit population decreased. You're trying to find out how best to intervene so that the rabbit population always stays below 4 million, but without causing a population collapse. What should you be targeting? The rabbit birth rate? The ability of predators to find rabbits?

(In systems pharmacology, this is: which reactions should I interact with in order to achieve my target goals while not introducing toxic side effects?)

In a complex system, these interventions will all act differently, so you need to simulate what happens under uncertainty with the model. For example, if I attack the birth rate too hard, we already saw that this can lead to population collapse. But is birth rate a more robust target than wolf lifespan (i.e., could I change wolf lifespan more and get the same effect, but with less chance of collapse)?
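As a minimal sketch of what such a what-if simulation looks like (this reuses the data-generating model from above; the 4-million ceiling comes from the problem statement, while the wolf collapse threshold of 0.1 is purely an assumption for illustration):

```julia
using OrdinaryDiffEq

# Predator-prey model with a density-dependent rabbit death term
function rabbits_wolves(du,u,p,t)
    du[1] = p[1]*u[1] - p[2]*u[1]*u[2] - p[5]*u[1]^2
    du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end

u0 = [1.0,1.0]
tspan = (0.0,10.0)
p = [1.0,1.0,3.0,1.0,0.1]

# For each candidate reduction of the rabbit birth rate p[1], check:
#   rabbits_ok:  does the rabbit population always stay below 4?
#   no_collapse: do the wolves stay above an (assumed) collapse threshold?
reductions = 0.0:0.1:0.9
results = map(reductions) do r
    test_p = [p[1]-r, p[2], p[3], p[4], p[5]]
    sol = solve(ODEProblem(rabbits_wolves,u0,tspan,test_p),Tsit5())
    (reduction = r,
     rabbits_ok = maximum(sol[1,:]) < 4,
     no_collapse = minimum(sol[2,:]) > 0.1)  # assumed threshold
end
```

A target is robust when a wide range of perturbations satisfies both flags at once, and the same loop works for any other parameter.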

The Modeler's Problem

But one caveat: in order to do these simulations you need to know the model and its parameters, since you want to investigate what happens when you change those parameters. All you have is "your best model" and "data". So you need to figure out how to get the best model you can, along with the parameters of said model; once you have that, you can assess the effects of each target.

The Modeler's Approach

function f(du,u,p,t)
    du[1] = dx = p[1]*u[1] - p[2]*u[1]*u[2]
    du[2] = dy = -p[3]*u[2] + p[4]*u[1]*u[2]
end

u0 = [1.0;1.0]
tspan = (0.0,10.0)
p = ones(4)
prob = ODEProblem(f,u0,tspan,p)
sol = solve(prob,Tsit5())
plot(sol)

Here I took all of the parameters to be 1, since my only guess is their relative order of magnitude, which should be about correct (maybe?).

To get some parameter values, I then need to do some parameter fitting. Let's define a cost function as the difference between our simulation and the data sources we have:

using LinearAlgebra
function cost(p)
    # Needed for the optimizer to reject parameters out of the domain
    any(x->x<0,p) && return Inf

    # Solve the ODE with current parameters p
    prob = ODEProblem(f,u0,tspan,p)
    sol = solve(prob,Tsit5(),abstol=1e-8,reltol=1e-8)

    # Check the difference from the data
    norm(sol(0:1:10) - data1) + norm(data2 - sol(0:0.1:2;idxs=2))
end
cost(ones(4))
10.917779941345977

Yeah, those original parameters are pretty bad. But now let's optimize them:

using Optim
opt = optimize(cost,ones(4),BFGS())
p = Optim.minimizer(opt)
prob = ODEProblem(f,u0,tspan,p)
sol = solve(prob,Tsit5())
plot(sol)
scatter!(data1)
scatter!(data2)

Oh wow, that fit is pretty bad! Generally, for large models (such models are usually >200 lines long), this is pretty standard for a local optimizer. So okay, we need to use a global optimizer.

using BlackBoxOptim
bound = Tuple{Float64, Float64}[(0, 10),(0, 10),(0, 10),(0, 10)]
result = bboptimize(cost;SearchRange = bound, MaxSteps = 21e3)
Starting optimization with optimizer BlackBoxOptim.DiffEvoOpt{BlackBoxOptim.AdaptiveDiffEvoRandBin{3},BlackBoxOptim.RandomBound{BlackBoxOptim.ContinuousRectSearchSpace}}
0.00 secs, 0 evals, 0 steps
0.50 secs, 942 evals, 848 steps, improv/step: 0.259 (last = 0.2594), fitness=3.317135922
1.00 secs, 1894 evals, 1800 steps, improv/step: 0.198 (last = 0.1439), fitness=3.023363462
1.50 secs, 3060 evals, 2966 steps, improv/step: 0.183 (last = 0.1595), fitness=2.962818427
2.00 secs, 3962 evals, 3868 steps, improv/step: 0.180 (last = 0.1707), fitness=2.940117187
2.52 secs, 5128 evals, 5034 steps, improv/step: 0.177 (last = 0.1664), fitness=2.936803328
3.02 secs, 6190 evals, 6096 steps, improv/step: 0.177 (last = 0.1742), fitness=2.936688386
3.53 secs, 7538 evals, 7444 steps, improv/step: 0.181 (last = 0.2010), fitness=2.936640663
4.03 secs, 8692 evals, 8599 steps, improv/step: 0.179 (last = 0.1662), fitness=2.936639336
4.53 secs, 9864 evals, 9771 steps, improv/step: 0.175 (last = 0.1468), fitness=2.936639323
5.03 secs, 10967 evals, 10874 steps, improv/step: 0.175 (last = 0.1741), fitness=2.936639322
5.53 secs, 11951 evals, 11858 steps, improv/step: 0.177 (last = 0.2043), fitness=2.936639322
6.03 secs, 13041 evals, 12948 steps, improv/step: 0.176 (last = 0.1661), fitness=2.936639322
6.53 secs, 14214 evals, 14123 steps, improv/step: 0.176 (last = 0.1719), fitness=2.936639322
7.03 secs, 15501 evals, 15410 steps, improv/step: 0.174 (last = 0.1453), fitness=2.936639322
7.54 secs, 16720 evals, 16629 steps, improv/step: 0.165 (last = 0.0509), fitness=2.936639322
8.04 secs, 17918 evals, 17827 steps, improv/step: 0.156 (last = 0.0359), fitness=2.936639322
8.54 secs, 19245 evals, 19154 steps, improv/step: 0.147 (last = 0.0249), fitness=2.936639322
9.05 secs, 20566 evals, 20475 steps, improv/step: 0.139 (last = 0.0220), fitness=2.936639322

Optimization stopped after 21001 steps and 9.28 seconds
Termination reason: Max number of steps (21000) reached
Steps per second = 2264.01
Function evals per second = 2273.82
Improvements/step = 0.13557
Total function evaluations = 21092

Best candidate found: [0.766198, 1.31019, 2.92441, 1.12677]

Fitness: 2.936639322
p = result.archive_output.best_candidate
prob = ODEProblem(f,u0,tspan,p)
sol = solve(prob,Tsit5())
plot(sol)
scatter!(data1)
scatter!(data2)

Oh dang, still no good. This means our model is misspecified. We see that we overshoot over time, so there's some kind of decay missing. So we go talk to our colleagues a bit more and find out that there's this weird bunny disease that is a huge problem every few years: that may be an effect that is required to reproduce the data!

So then we change our model. What if this disease is just old-age related? Then we would have a decay of rabbits unrelated to wolves, so the model would be:

function f2(du,u,p,t)
    du[1] = dx = p[1]*u[1] - p[2]*u[1]*u[2] - p[5]*u[1]
    du[2] = dy = -p[3]*u[2] + p[4]*u[1]*u[2]
end

So now let's optimize with this:

function cost2(p)
    # Needed for the optimizer to reject parameters out of the domain
    any(x->x<0,p) && return Inf

    # Solve the ODE with current parameters p
    prob = ODEProblem(f2,u0,tspan,p)
    sol = solve(prob,Tsit5(),abstol=1e-8,reltol=1e-8)

    # Check the difference from the data
    norm(sol(0:1:10) - data1) + norm(data2 - sol(0:0.1:2;idxs=2))
end
bound = Tuple{Float64, Float64}[(0, 10),(0, 10),(0, 10),(0, 10),(0, 10)]
result = bboptimize(cost2;SearchRange = bound, MaxSteps = 21e3)
Starting optimization with optimizer BlackBoxOptim.DiffEvoOpt{BlackBoxOptim.AdaptiveDiffEvoRandBin{3},BlackBoxOptim.RandomBound{BlackBoxOptim.ContinuousRectSearchSpace}}
0.00 secs, 0 evals, 0 steps
0.50 secs, 740 evals, 612 steps, improv/step: 0.252 (last = 0.2516), fitness=5.269508848
1.00 secs, 1385 evals, 1252 steps, improv/step: 0.175 (last = 0.1016), fitness=4.380580433
1.50 secs, 2006 evals, 1873 steps, improv/step: 0.146 (last = 0.0870), fitness=4.136444391
2.00 secs, 2791 evals, 2659 steps, improv/step: 0.140 (last = 0.1247), fitness=3.503409075
2.51 secs, 3675 evals, 3543 steps, improv/step: 0.140 (last = 0.1403), fitness=3.224799080
3.02 secs, 4657 evals, 4525 steps, improv/step: 0.139 (last = 0.1385), fitness=2.972947845
3.52 secs, 5697 evals, 5565 steps, improv/step: 0.141 (last = 0.1471), fitness=2.940026109
4.02 secs, 6788 evals, 6656 steps, improv/step: 0.144 (last = 0.1622), fitness=2.936773525
4.52 secs, 8059 evals, 7927 steps, improv/step: 0.144 (last = 0.1432), fitness=2.936654026
5.02 secs, 9410 evals, 9280 steps, improv/step: 0.145 (last = 0.1515), fitness=2.936639903
5.52 secs, 10848 evals, 10718 steps, improv/step: 0.144 (last = 0.1370), fitness=2.936639336
6.03 secs, 12051 evals, 11922 steps, improv/step: 0.141 (last = 0.1163), fitness=2.936639327
6.53 secs, 13156 evals, 13027 steps, improv/step: 0.143 (last = 0.1593), fitness=2.936639322
7.03 secs, 14131 evals, 14004 steps, improv/step: 0.144 (last = 0.1638), fitness=2.936639322
7.53 secs, 14953 evals, 14826 steps, improv/step: 0.145 (last = 0.1521), fitness=2.936639322
8.03 secs, 16116 evals, 15989 steps, improv/step: 0.145 (last = 0.1470), fitness=2.936639322
8.53 secs, 17517 evals, 17390 steps, improv/step: 0.144 (last = 0.1292), fitness=2.936639322
9.03 secs, 18811 evals, 18684 steps, improv/step: 0.141 (last = 0.1036), fitness=2.936639322
9.53 secs, 20128 evals, 20003 steps, improv/step: 0.135 (last = 0.0546), fitness=2.936639322

Optimization stopped after 21001 steps and 9.93 seconds
Termination reason: Max number of steps (21000) reached
Steps per second = 2114.05
Function evals per second = 2126.64
Improvements/step = 0.12981
Total function evaluations = 21126

Best candidate found: [6.53541, 1.31019, 2.92441, 1.12677, 5.76921]

Fitness: 2.936639322
p = result.archive_output.best_candidate
prob = ODEProblem(f2,u0,tspan,p)
sol = solve(prob,Tsit5())
plot(sol)
scatter!(data1)
scatter!(data2)

Okay, that model is not right yet, but it's better. Let's try the local optimizer and see what we get:

opt = optimize(cost2,ones(5),BFGS())
p = Optim.minimizer(opt)
prob = ODEProblem(f2,u0,tspan,p)
sol = solve(prob,Tsit5())
plot(sol)
scatter!(data1)
scatter!(data2)

So we go back to our colleagues who know the local ecology and tell them that the model is giving us junk. Then we learn that this death is population dependent: it only occurs when the population is high! So we need to add an interaction between bunnies. A simple model for this is to assume that bunnies always have to die in pairs, so:

function f3(du,u,p,t)
    du[1] = dx = p[1]*u[1] - p[2]*u[1]*u[2] - p[5]*u[1]^2
    du[2] = dy = -p[3]*u[2] + p[4]*u[1]*u[2]
end

function cost3(p)
    # Needed for the optimizer to reject parameters out of the domain
    any(x->x<0,p) && return Inf

    # Solve the ODE with current parameters p
    prob = ODEProblem(f3,u0,tspan,p)
    sol = solve(prob,Tsit5(),abstol=1e-8,reltol=1e-8)

    # Check the difference from the data
    norm(sol(0:1:10) - data1) + norm(data2 - sol(0:0.1:2;idxs=2))
end

opt = optimize(cost3,ones(5),BFGS())
p = Optim.minimizer(opt)
prob = ODEProblem(f3,u0,tspan,p)
sol = solve(prob,Tsit5())
plot(sol)
scatter!(data1)
scatter!(data2)

That might just be an optimization issue? So let's run our big model with global optimization on the cluster:

bound = Tuple{Float64, Float64}[(0, 10),(0, 10),(0, 10),(0, 10),(0, 10)]
result = bboptimize(cost3;SearchRange = bound, MaxSteps = 21e3)
Starting optimization with optimizer BlackBoxOptim.DiffEvoOpt{BlackBoxOptim.AdaptiveDiffEvoRandBin{3},BlackBoxOptim.RandomBound{BlackBoxOptim.ContinuousRectSearchSpace}}
0.00 secs, 0 evals, 0 steps
0.50 secs, 1427 evals, 1325 steps, improv/step: 0.267 (last = 0.2672), fitness=3.659818646
1.00 secs, 3162 evals, 3062 steps, improv/step: 0.194 (last = 0.1376), fitness=2.236392570
1.50 secs, 4556 evals, 4456 steps, improv/step: 0.180 (last = 0.1506), fitness=1.053944784
2.00 secs, 5627 evals, 5528 steps, improv/step: 0.180 (last = 0.1782), fitness=0.433969616
2.50 secs, 6614 evals, 6516 steps, improv/step: 0.177 (last = 0.1619), fitness=0.159514518
3.01 secs, 8011 evals, 7913 steps, improv/step: 0.174 (last = 0.1589), fitness=0.057836419
3.51 secs, 9013 evals, 8915 steps, improv/step: 0.171 (last = 0.1467), fitness=0.020524079
4.01 secs, 9754 evals, 9656 steps, improv/step: 0.169 (last = 0.1525), fitness=0.016040864
4.51 secs, 11169 evals, 11071 steps, improv/step: 0.168 (last = 0.1618), fitness=0.005913857
5.01 secs, 12885 evals, 12787 steps, improv/step: 0.166 (last = 0.1474), fitness=0.002874341
5.51 secs, 14017 evals, 13919 steps, improv/step: 0.166 (last = 0.1714), fitness=0.002487151
6.02 secs, 15310 evals, 15212 steps, improv/step: 0.166 (last = 0.1609), fitness=0.002450532
6.52 secs, 16441 evals, 16343 steps, improv/step: 0.165 (last = 0.1618), fitness=0.002449064
7.02 secs, 17621 evals, 17523 steps, improv/step: 0.166 (last = 0.1763), fitness=0.002448736
7.54 secs, 19167 evals, 19069 steps, improv/step: 0.168 (last = 0.1953), fitness=0.002448716
8.05 secs, 20881 evals, 20784 steps, improv/step: 0.170 (last = 0.1878), fitness=0.002448716

Optimization stopped after 21001 steps and 8.10 seconds
Termination reason: Max number of steps (21000) reached
Steps per second = 2591.44
Function evals per second = 2603.41
Improvements/step = 0.17048
Total function evaluations = 21098

Best candidate found: [0.999909, 1.00034, 3.00008, 1.00015, 0.0999369]

Fitness: 0.002448716
p = result.archive_output.best_candidate
prob = ODEProblem(f3,u0,tspan,p)
sol = solve(prob,Tsit5())
plot(sol)
scatter!(data1)
scatter!(data2)

Oh bingo! That looks good! So okay, is it safe to reduce the rabbit birth rate, and how much would we need to target it by to get it under 4?

test_p = p - [0.5,0,0,0,0]
prob = ODEProblem(f3,u0,tspan,test_p)
sol = solve(prob,Tsit5())
plot(sol)
test_p = p - [0.58,0,0,0,0]
prob = ODEProblem(f3,u0,tspan,test_p)
sol = solve(prob,Tsit5())
plot(sol)
test_p = p - [0.7,0,0,0,0]
prob = ODEProblem(f3,u0,tspan,test_p)
sol = solve(prob,Tsit5())
plot(sol)

Ouch. We see that when affecting the birth rate, we would have to hit a reduction in the range [0.55, 0.7] to get the effect we want without introducing population collapse. What about affecting wolf population growth?

test_p = p + [0.0,0,0.0,0.25,0]
prob = ODEProblem(f3,u0,tspan,test_p)
sol = solve(prob,Tsit5())
plot(sol)
test_p = p + [0.0,0,0.0,0.5,0]
prob = ODEProblem(f3,u0,tspan,test_p)
sol = solve(prob,Tsit5())
plot(sol)
test_p = p + [0.0,0,1.5,2.0,0]
prob = ODEProblem(f3,u0,tspan,test_p)
sol = solve(prob,Tsit5())
plot(sol)

Wow, it's very easy to intervene this way and get the rabbit population below 4 million without causing a population collapse!
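A quick steady-state calculation makes it clear why the wolf-side intervention is so much more robust than the birth-rate one. Setting both derivatives of the fitted model to zero gives the coexistence equilibrium:

```latex
x^{*} = \frac{\gamma}{\delta}, \qquad
y^{*} = \frac{\alpha - \varepsilon x^{*}}{\beta}
      = \frac{\alpha - \varepsilon\gamma/\delta}{\beta}
```

Reducing the birth rate α does not move the rabbit equilibrium x* at all; it only lowers y*, and the wolves go extinct once α ≤ εγ/δ (with the fitted values, once α drops below about 0.3, i.e., a reduction of about 0.7, matching the simulations above). Increasing δ lowers x* = γ/δ directly while raising y*, so there is a wide safety margin.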

Board Meeting Recommendations

From these results you go to your board meeting and recommend:

• Do not attempt to decrease the rabbit population by killing baby rabbits. This seems like the obvious approach, but the models show that it can easily lead to population collapse.

• However, making it easier for wolves to feast on rabbits is a robust change to the ecosystem that gives the people what they want.

• We recommend clearing a lot of bushes to make it harder for rabbits to hide, along with a long-term investment in research on wolf binoculars.

• As we go through our results, we tell our colleagues to never use a local optimizer.

• During our talk, we tell our colleagues that the global optimization takes forever, so we should probably parallelize it, or do other things, like port our models to Julia to use its faster tools.

Translating This Example to Systems Pharmacology

We have a cancer pathway, and our team is synthesizing molecules to target different aspects of the pathway. Our goal as the modeler is to help guide the choice of which part of the pathway we should target. So we:

1. Talk to biologists and consult the literature to learn about the pathway

2. Build a model

3. Find some data in the literature about how the pathway should generally act

4. Try to fit the data

5. If we don't fit well, go back to (1)

6. Great, we now have a good enough model. Investigate what happens to the cancer patient if we affect X vs. Y. Does X introduce toxic effects? Can we know whether it only produces toxic effects in men?

7. Gather these plots to showcase that yes, the model gives a reliable fit, and the analysis shows that targeting X will cause ...

Possible Improvements for Tooling

Given this process, the possible improvements to tooling are: