@CamDavidsonPilon
Last active March 26, 2020 23:33
"""
I consider this a greedy algorithm, since at each time step, I ask which is a better "twist". I don't think it's optimal.
The idea is to estimate the probability of discovering tau in the next time step, given your current position and knowledge (position being left or right, denoted 1 and 2 here). We calculate it as follows:
t1 is the max time observed in position 1, and t2 the max time observed in position 2. Denote by P the random variable for which position tau is in (1 or 2), and by small p our current position. Suppose we start in position 1, i.e. p=1.
Pr(discover tau in next delta time | t1, t2, p=1)
    = Pr(discover tau in next delta time | t1, t2, P=1, p=1) * Pr(P=1)
      + Pr(discover tau in next delta time | t1, t2, P=2, p=1) * Pr(P=2)
    = Pr(discover tau in next delta time | t1, t2, P=1, p=1) * Pr(P=1) + 0 * Pr(P=2)
    = Pr(discover tau in next delta time | t1, t2, P=1, p=1) * Pr(P=1)
The first term is just the complement of the conditional survival curve:
    Pr(discover tau in next delta time | t1, t2, P=1, p=1) = 1 - S(t1 + delta) / S(t1)
The second term is the fraction of "remaining white space" left to explore:
    Pr(P=1) = S(t1) / (S(t1) + S(t2))
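For example, with made-up round numbers (not the exact Weibull values used in the code below): if S(t1) = 0.70 and S(t2) = 0.37, then Pr(P=1) = 0.70 / (0.70 + 0.37) ≈ 0.65, and if S(t1 + delta) / S(t1) = 0.99, the first term is 1 - 0.99 = 0.01, so the probability of discovering tau in the next delta while staying in position 1 is roughly 0.65 * 0.01 ≈ 0.0065.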
So at each time step, we compare the following two quantities and choose the max:
    Pr(discover tau in next delta time | t1, t2, p=1)
    Pr(discover tau in next delta time | t1, t2, p=2)
(and increment t1 or t2 as appropriate).
It gets trickier when we have positive t1 and positive t2. I figure that the cost of switching (i.e. having to wait again through regions you already know are not correct) needs to be less than the "reward" of staying. I thought about this for a while, and figured that "amortizing" the cost of switching over the potential next reward is a decent comparison.
Ex:
    argmax(
        if current in 1:
            w * (1 - S(t1 + delta) / S(t1)),
            (1 - w) * (1 - S(t2 + delta) / S(t2)) / t2
        else:
            w * (1 - S(t1 + delta) / S(t1)) / t1,
            (1 - w) * (1 - S(t2 + delta) / S(t2))
    )
    where w = S(t1) / (S(t1) + S(t2))
"""
from time import sleep

import numpy as np

DELTA = 0.01


def survival_function(t, lambda_=1.0, rho=1.5):
    # Assume a simple Weibull model for the survival function of tau.
    return np.exp(-(t / lambda_) ** rho)
def w(t1, t2):
    # Equal to Pr(P = 1): position 1's share of the remaining "white space".
    return survival_function(t1) / (survival_function(t1) + survival_function(t2))
def determine_best_action(current_position, t1, t2):
    # Per-step probabilities of discovering tau by staying in position 1 vs position 2.
    p1 = w(t1, t2) * (1 - survival_function(t1 + DELTA) / survival_function(t1))
    p2 = (1 - w(t1, t2)) * (1 - survival_function(t2 + DELTA) / survival_function(t2))
    if current_position == 1:
        # Amortize the switch-to-2 reward over the t2 you would have to re-traverse.
        if p1 > p2 / max(t2, 1):
            return 1
        else:
            return 2
    else:
        # Symmetric rule when currently in position 2.
        if p1 / max(t1, 1) > p2:
            return 1
        else:
            return 2
def simulate():
    explored = [0.00, 0.00]  # [t1, t2]: time explored in each position
    time = 0.00
    # Choose position 1 initially.
    current_position = 1
    explored[current_position - 1] += DELTA
    while True:
        previous_position = current_position
        choice = determine_best_action(current_position, *explored)
        if choice == 1:
            current_position = 1
        else:
            current_position = 2
        explored[current_position - 1] += DELTA
        if previous_position != current_position:
            # Skip ahead to the frontier of the newly chosen region, paying the
            # cost of re-traversing what was already explored there.
            time += explored[current_position - 1]
            print("SWITCHED")
        time += DELTA
        print("Time: %.2f, current position: %d, t1: %.2f, t2: %.2f" % (time, current_position, *explored))
        sleep(0.02)
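# A quick, optional sanity check of the pieces above before kicking off the
# open-ended simulation below (illustrative values only): with nothing explored
# the two positions are symmetric, so w(0, 0) should be 0.5, and the per-step
# discovery probabilities for any state can be inspected directly.
print("w(0, 0) = %.2f" % w(0.0, 0.0))  # 0.50 by symmetry
p1_example = w(0.5, 0.0) * (1 - survival_function(0.5 + DELTA) / survival_function(0.5))
p2_example = (1 - w(0.5, 0.0)) * (1 - survival_function(0.0 + DELTA) / survival_function(0.0))
print("At t1=0.5, t2=0.0: p1=%.4f, p2=%.4f" % (p1_example, p2_example))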
simulate()
yupbank commented Mar 26, 2020

nice work!

"amatorizing" the cost

this part is a bit confusing, since DELTA is 0.01 instead of 1.
shouldn't it be this?
(1-w) * (1-S(t2 + delta) / S(t2)) / t2 -> (1-w) * (1-S(t2 + delta) / S(t2)) / (t2/DELTA)
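i.e. amortizing over the number of DELTA-sized steps already spent in the other region rather than the raw time. A minimal sketch of that variant of determine_best_action, assuming the same survival_function, w, and DELTA as above:

def determine_best_action(current_position, t1, t2):
    # suggested variant: divide by the step count t / DELTA instead of raw time t
    p1 = w(t1, t2) * (1 - survival_function(t1 + DELTA) / survival_function(t1))
    p2 = (1 - w(t1, t2)) * (1 - survival_function(t2 + DELTA) / survival_function(t2))
    if current_position == 1:
        return 1 if p1 > p2 / max(t2 / DELTA, 1) else 2
    else:
        return 1 if p1 / max(t1 / DELTA, 1) > p2 else 2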
