CamDavidsonPilon/shower_problem.py

## shower_problem.py
"""
I consider this a greedy algorithm, since at each time step, I ask which is a better "twist". I don't think it's optimal.

The idea is to estimate the probability of discovering tau in the next time step, given your current position and knowledge (position being left or right, denoted 1 and 2 here). We calculate the probability of discovering tau in the next time step as follows:

t1 is the max time observed in position 1, and t2 in position 2. Denote P the random variable of which position tau is in (1 or 2). Small p is our current position. Suppose we start in position 1, i.e. p=1


Pr(discover tau in next delta time| t1, t2, p=1) =
        Pr(discover tau in next delta time| t1, t2, P=1, p=1) * Pr(P=1) +
        Pr(discover tau in next delta time| t1, t2, P=2, p=1) * Pr(P=2)

        = Pr(discover tau in next delta time| t1, t2, P=1 p=1) * Pr(P=1) + 0 * Pr(Pr=2)
        = Pr(discover tau in next delta time| t1, t2, P=1 p=1) * Pr(P=1)


The first term is just the compliment of the conditional survival curve:

    Pr(discover tau in next delta time| t1, t2, P=1, p=1) = 1 - S(t1+delta) / S(t1)

The second term is the division of "remaining white space" to explore:

    Pr(P=1) = S(t1) / (S(t1) + S(t2))


So for each time step, we compare the following and choose the max

Pr(discover tau in next delta time| t1, t2, p=1)
Pr(discover tau in next delta time| t1, t2, p=2)

(and increment t1, t2 as appropriate)


It gets tricker when we have positive t1 and positive t2. I figure that the cost of switching (i.e. incurring waiting again in regions you know are not correct) needs to be less than the "reward" of staying. I thought about this for a while, and figured "amatorizing" the cost of switching over the potential next reward is a decent comparison.

Ex:

argmax(
    if current in 1:
        w * (1-S(t1 + delta)/S(t1)),
        (1-w) * (1-S(t2 + delta) / S(t2)) / t2
    else:
        w * (1-S(t1 + delta)/S(t1)) / t1,
        (1-w) * (1-S(t2 + delta) / S(t2))
)

w = S(t1) / (S(t1) + S(t2))


"""
from time import sleep

DELTA = 0.01

def survival_function(t, lambda_=1., rho=1.5):
    # Assume simple Weibull model
    return np.exp(-(t/lambda_) ** rho)


def w(t1, t2):
    # equal to Pr(X = t1)
    return survival_function(t1) / (survival_function(t1) + survival_function(t2))


def determine_best_action(current_position, t1, t2):
    p1 = w(t1, t2) * (1-survival_function(t1 + DELTA) / survival_function(t1))
    p2 = (1-w(t1, t2)) * (1-survival_function(t2 + DELTA) / survival_function(t2))

    if current_position == 1:
        if p1 > p2/max(t2, 1):
            return 1
        else:
            return 2
    else:
        if p1/max(t1, 1) > p2:
            return 1
        else:
            return 2


def simulate():

    explored = [0.00, 0.00]
    time = 0.00

    # choose 1 initially
    current_position = 1
    explored[current_position-1] += DELTA


    while True:
        previous_position = current_position
        choice = determine_best_action(current_position, *explored)
        if choice == 1:
            current_position = 1
        else:
            current_position = 2

        explored[current_position-1] += DELTA

        if previous_position != current_position:
            # skip ahead to new region
            time += explored[current_position-1]
            print("SWITCHED")

        time += DELTA

        print("Time: %.2f, current position: %d, t1: %.2f, t2: %.2f" % (time, current_position, *explored))
        sleep(0.02)


simulate()
	"""
	I consider this a greedy algorithm, since at each time step, I ask which is a better "twist". I don't think it's optimal.

	The idea is to estimate the probability of discovering tau in the next time step, given your current position and knowledge (position being left or right, denoted 1 and 2 here). We calculate the probability of discovering tau in the next time step as follows:

	t1 is the max time observed in position 1, and t2 in position 2. Denote P the random variable of which position tau is in (1 or 2). Small p is our current position. Suppose we start in position 1, i.e. p=1


	Pr(discover tau in next delta time\| t1, t2, p=1) =
	Pr(discover tau in next delta time\| t1, t2, P=1, p=1) * Pr(P=1) +
	Pr(discover tau in next delta time\| t1, t2, P=2, p=1) * Pr(P=2)

	= Pr(discover tau in next delta time\| t1, t2, P=1 p=1) * Pr(P=1) + 0 * Pr(Pr=2)
	= Pr(discover tau in next delta time\| t1, t2, P=1 p=1) * Pr(P=1)


	The first term is just the compliment of the conditional survival curve:

	Pr(discover tau in next delta time\| t1, t2, P=1, p=1) = 1 - S(t1+delta) / S(t1)

	The second term is the division of "remaining white space" to explore:

	Pr(P=1) = S(t1) / (S(t1) + S(t2))


	So for each time step, we compare the following and choose the max

	Pr(discover tau in next delta time\| t1, t2, p=1)
	Pr(discover tau in next delta time\| t1, t2, p=2)

	(and increment t1, t2 as appropriate)


	It gets tricker when we have positive t1 and positive t2. I figure that the cost of switching (i.e. incurring waiting again in regions you know are not correct) needs to be less than the "reward" of staying. I thought about this for a while, and figured "amatorizing" the cost of switching over the potential next reward is a decent comparison.

	Ex:

	argmax(
	if current in 1:
	w * (1-S(t1 + delta)/S(t1)),
	(1-w) * (1-S(t2 + delta) / S(t2)) / t2
	else:
	w * (1-S(t1 + delta)/S(t1)) / t1,
	(1-w) * (1-S(t2 + delta) / S(t2))
	)

	w = S(t1) / (S(t1) + S(t2))



	"""
	from time import sleep

	DELTA = 0.01

	def survival_function(t, lambda_=1., rho=1.5):
	# Assume simple Weibull model
	return np.exp(-(t/lambda_) ** rho)


	def w(t1, t2):
	# equal to Pr(X = t1)
	return survival_function(t1) / (survival_function(t1) + survival_function(t2))


	def determine_best_action(current_position, t1, t2):
	p1 = w(t1, t2) * (1-survival_function(t1 + DELTA) / survival_function(t1))
	p2 = (1-w(t1, t2)) * (1-survival_function(t2 + DELTA) / survival_function(t2))

	if current_position == 1:
	if p1 > p2/max(t2, 1):
	return 1
	else:
	return 2
	else:
	if p1/max(t1, 1) > p2:
	return 1
	else:
	return 2


	def simulate():

	explored = [0.00, 0.00]
	time = 0.00

	# choose 1 initially
	current_position = 1
	explored[current_position-1] += DELTA


	while True:
	previous_position = current_position
	choice = determine_best_action(current_position, *explored)
	if choice == 1:
	current_position = 1
	else:
	current_position = 2

	explored[current_position-1] += DELTA

	if previous_position != current_position:
	# skip ahead to new region
	time += explored[current_position-1]
	print("SWITCHED")

	time += DELTA

	print("Time: %.2f, current position: %d, t1: %.2f, t2: %.2f" % (time, current_position, *explored))
	sleep(0.02)


	simulate()