Skip to content

Instantly share code, notes, and snippets.

@backnotprop
Last active July 21, 2019 13:19
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save backnotprop/b0a7e5728ee240a0abf00e1ebefeb6ca to your computer and use it in GitHub Desktop.
Save backnotprop/b0a7e5728ee240a0abf00e1ebefeb6ca to your computer and use it in GitHub Desktop.
Can the bear population survive

Looking for a solution to my below game problem. I believe it to require some sort of reinforcement learning, dynamic programming, or probabilistic programming solution, but am unsure... This is my original problem, and is part of an initiative to create "unique and challenging problem that you're able to conceptualize and then solve. 3 Judging criteria: uniqueness, complexity, and solution (no particular weighting and scoring may favor uniqueness/challenge over solution"

Inspirations: Conway's Game of Life, DeepMind's Starcraft Challenge, deep Q-learning, probabilistic programming

BEAR SURVIVAL

A bear is preparing for hibernation. A bear must reach life-strength 1000 in order to rest & survive the winter. A bear starts off at a health of 500. A bear explores an environment of magic berries. A bear makes a move (chosen randomly with no optional direction) and comes across a berry each time. There are 100 different types of berries that all appear across the wilderness equally and infinitely.

A magic berry always consumes 20 life from the bear upon arrival (this not an energy cost for moving and we should not think of it as such). A bear may then choose to give more, all, or none of its remaining life to the berry. If eaten, the berry may provide back to the bear 2x the amount of life given. Berries, however, are not the same and a bear knows this. A bear knows that any berry has some percentage of being poisonous. Of the 100 different types of berries, each may be 0%-100% poisonous. A berry that is 0% poisonous is the perfect berry and a bear knows that it should commit all of its remaining life to receive max health gain. If a bear wants to eat the berry, it must commit at least 20 more health. Again, a bear does not have to eat the berry, but if it chooses not to, it walks away and does not get the original 20 back.

Example: On a bear's first move (at 500 life), it comes across a magic berry and the berry automatically takes 20 life. The bear notices that the berry is 0% poisonous, the perfect berry, and gives its remaining 480 health, eats the berry, and then receives 1000 health gain. The bear has reached it's goal, hibernates, and wins the game. However, if that first berry was 100% poisonous, the anti-berry, and the bear committed all of its remaining life it would've received back 0 health gain, died, and lost the game. A bear knows to never eat the anti-berry. It knows it can come across any poisonous probability value from 0-100 (3,25,52,99, etc).

A bear must be picky & careful, but also bold & smart about how much life it wants to commit per berry, per move. A bear knows that if it never eats, it will eventually die as it loses 20 health per berry, per move.

While it's important for an individual bear to survive, it is even more important for the bear population to not go extinct. A population is going extinct if they lose over half of the population that year. Bears in a population, and their consumption of berries, are completely independent of each other.

Questions:

  1. May we find a bear's optimal strategy for committing health & eating berries to reach 1000 health gain?
  2. Is the bear population eventually doomed to a unfavorable environment?

Bonus Complexity:

Winter is coming, and conditions grow progressively harsher over time. A bear knows that every 10 moves, each berry will consume 20 * (fib(i)/ environmentFactor). fib(i) stands for fibonacci-sequence at index i, starting at 1. For all indexes where the progression is less than 20, a berry's initial health consumption remains at 20. environmentFactor is a single environment's progressive-harshness variable (how harsh winter becomes over time). The bear population is currently in an environment with environmentFactor of 4. Spelled out:

Moves 01-10: Berries consume 20 --  20*(1/4)
Moves 10-20: Berries consume 20 --  20*(1/4)
Moves 20-30: Berries consume 20 --  20*(2/4)
Moves 30-40: Berries consume 20 --  20*(3/4)
Moves 40-50: Berries consume 25 --  20*(5/4)
Moves 50-60: Berries consume 40 --  20*(8/4)
Moves 60-70: Berries consume 65 --  20*(13/4)
Moves 70-80: Berries consume 105 --  20*(21/4)
Moves 80-90: Berries consume 170 --  20*(34/4)
Moves 90-100: Berries consume 275 --  20*(55/4)
Moves 100-110: Berries consume 445 --  20*(89/4)
... and so on ...

Same questions as above, with a third: if this environment is proven unfavorable, and extinction unavoidable, what maximum environment/environmentFactor must the bear population move to in order to avoid extinction? (this may or may not exist if a berry's requirement of 20 initial life is always unfavorable without any progression).

Further Details:

QUESTION:

Can you give an example of what happens when a bear eats a semi-poisonous berry (e.g. 20%)?

Also, is the bear always immediately aware of the poison value of berries?

In this, it also seems like you're using health, life, strength, life-strength, health-gain, etc interchangeably. Are they all the same thing?

ANSWER:

if the bear eats the poison berry of 20%, then it becomes a probability problem of whether or not the berry provides back life or keeps the health amount committed by the bear. Example: the bear is at 400, the next move & berry take the initial 20 (bear is at 380 now), the bear decides to commit an additional 80 (now at 300) and eat the berry. 8/10 times the berry will return to the bear 200 (2x 100 committed -- bear ends turn at 500), 2/10 times the berry returns nothing and the bear must move on with 300 life.

the bear is always immediately aware of the poison value of a berry. life/health/strength are all the same thing.

@backnotprop
Copy link
Author

backnotprop commented Jul 20, 2019

First approach (not with the complex scenario): An aggressive strategy implements a financial investment approach. The approach, Kelly Criterion, has generated up to a 92% success rate with 1M bears. But KC creates a risk to variance and smaller population sizes. With KC, I wonder about variance with such an aggressive strategy and what that means in terms of chances that a population goes extinct in the case that there is an overall downswing in variance. I guess with the right population size this wouldn't matter, and then wonder if it's possible to calculate what that size needs to be.

import random
import math


STARTING_HEALTH = 500
WINNING_HEALTH = 1000
LOSING_HEALTH = 0

MOVE_COST = 20
EAT_COST = 20

NR_OF_GAMES = 1000000

def move(health):
    consumed_by_berry = MOVE_COST

    berry_poisonous_chance = random.randint(0, 101)

    if berry_poisonous_chance < 50:
        consumed_by_berry += EAT_COST

        # Commit health to berry according to heuristic
        value = heuristic(berry_poisonous_chance, health)
        consumed_by_berry += value

        random_roll = random.randint(1, 100)
        if random_roll >= berry_poisonous_chance:
            # Berry is not poisonous
            health -= consumed_by_berry
            return health + 2 * consumed_by_berry

    return health - consumed_by_berry


def heuristic(berry_poisonous_chance, health):
    # Kelly criterion
    return math.floor(((100 - berry_poisonous_chance) - berry_poisonous_chance) / 100 * health)


def simulate_game():
    health = STARTING_HEALTH
    while health > LOSING_HEALTH:
        health = move(health)

        if health >= WINNING_HEALTH:
            return 1

    return 0


def main():
    wins = 0
    for i in range(0, NR_OF_GAMES):
        wins += simulate_game()

    print(f'{wins} out of {NR_OF_GAMES}, {wins / NR_OF_GAMES * 100}%')


main()

@backnotprop
Copy link
Author

I believe the simulation is not completely accurate. It allows consumed_by_berry to be larger that the bear's strength (essentially, allows "cheating" by betting up to health + MOVE_COST + EAT_COST). LOSING_HEALTH should be MOVE_COST + EAT_COST - 1 as you need to "kill the bear" if its health ever drops below MOVE_COST + EAT_COST (at that point it can eat no more berries and will surely die, the current code allows it to bet "into the negative" and recover). Also, when determining the final "consumed_by_berry" you should replace consume_by_berry += value with:

consumed_by_berry = max(consumed_by_berry, value)

Interestingly, these changes seem to slightly increase the p(win) from 92.6 to 92.9% - which is counterintuitive, as one of them outright kills bears that remained alive previously

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment