backnotprop/bear_survival.md

## bear_survival.md

      
    Raw
  

              bear_survival.md
            
          
    Looking for a solution to my below game problem. I believe it to require some sort of reinforcement learning, dynamic programming, or probabilistic programming solution, but am unsure... This is my original problem, and is part of an initiative to create "unique and challenging problem that you're able to conceptualize and then solve. 3 Judging criteria: uniqueness, complexity, and solution (no particular weighting and scoring may favor uniqueness/challenge over solution"
Inspirations: Conway's Game of Life, DeepMind's Starcraft Challenge, deep Q-learning, probabilistic programming
BEAR SURVIVAL

A bear is preparing for hibernation. A bear must reach life-strength 1000 in order to rest & survive the winter. A bear starts off at a health of 500. A bear explores an environment of magic berries. A bear makes a move  (chosen randomly with no optional direction) and comes across a berry each time. There are 100 different types of berries that all appear across the wilderness equally and infinitely.
A magic berry always consumes 20 life from the bear upon arrival (this not an energy cost for moving and we should not think of it as such). A bear may then choose to give more, all, or none of its remaining life to the berry. If eaten, the berry may provide back to the bear 2x the amount of life given. Berries, however, are not the same and a bear knows this. A bear knows that any berry has some percentage of being poisonous.  Of the 100 different types of berries, each may be 0%-100% poisonous. A berry that is 0% poisonous is the perfect berry and a bear knows that it should commit all of its remaining life to receive max health gain. If a bear wants to eat the berry, it must commit at least 20 more health. Again, a bear does not have to eat the berry, but if it chooses not to, it walks away and does not get the original 20 back.
Example: On a bear's first move (at 500 life), it comes across a magic berry and the berry automatically takes 20 life.  The bear notices that the berry is 0% poisonous, the perfect berry, and gives its remaining 480 health, eats the berry, and then receives 1000 health gain. The bear has reached it's goal, hibernates, and wins the game. However, if that first berry was 100% poisonous, the anti-berry, and the bear committed all of its remaining life it would've received back 0 health gain, died, and lost the game. A bear knows to never eat the anti-berry. It knows it can come across any poisonous probability value from 0-100 (3,25,52,99, etc).
A bear must be picky & careful, but also bold & smart about how much life it wants to commit per berry, per move. A bear knows that if it never eats, it will eventually die as it loses 20 health per berry, per move.
While it's important for an individual bear to survive, it is even more important for the bear population to not go extinct. A population is going extinct if they lose over half of the population that year. Bears in a population, and their consumption of berries, are completely independent of each other.
Questions:

May we find a bear's optimal strategy for committing health & eating
berries to reach 1000 health gain?
Is the bear population eventually doomed to a unfavorable environment?

Bonus Complexity:
Winter is coming, and conditions grow progressively harsher over time. A bear knows that every 10 moves, each berry will consume 20 * (fib(i)/ environmentFactor). fib(i) stands for fibonacci-sequence at index i, starting at 1. For all indexes where the progression is less than 20, a berry's initial health consumption remains at 20. environmentFactor is a single environment's progressive-harshness variable (how harsh winter becomes over time). The bear population is currently in an environment with environmentFactor of 4. Spelled out:
Moves 01-10: Berries consume 20 --  20*(1/4)
Moves 10-20: Berries consume 20 --  20*(1/4)
Moves 20-30: Berries consume 20 --  20*(2/4)
Moves 30-40: Berries consume 20 --  20*(3/4)
Moves 40-50: Berries consume 25 --  20*(5/4)
Moves 50-60: Berries consume 40 --  20*(8/4)
Moves 60-70: Berries consume 65 --  20*(13/4)
Moves 70-80: Berries consume 105 --  20*(21/4)
Moves 80-90: Berries consume 170 --  20*(34/4)
Moves 90-100: Berries consume 275 --  20*(55/4)
Moves 100-110: Berries consume 445 --  20*(89/4)
... and so on ...

Same questions as above, with a third: if this environment is proven unfavorable, and extinction unavoidable, what maximum environment/environmentFactor must the bear population move to in order to avoid extinction? (this may or may not exist if a berry's requirement of 20 initial life is always unfavorable without any progression).
Further Details:
QUESTION:

Can you give an example of what happens when a bear eats a semi-poisonous berry (e.g. 20%)?


Also, is the bear always immediately aware of the poison value of berries?


In this, it also seems like you're using health, life, strength, life-strength, health-gain, etc interchangeably. Are they all the same thing?

ANSWER:

if the bear eats the poison berry of 20%, then it becomes a probability problem of whether or not the berry provides back life or keeps the health amount committed by the bear. Example: the bear is at  400, the next move & berry take the initial  20 (bear is at 380 now), the bear decides to commit an additional 80 (now at 300) and eat the berry. 8/10 times the berry will return to the bear 200 (2x 100 committed -- bear ends turn at 500), 2/10 times the berry returns nothing and the bear must move on with 300 life.
the bear is always immediately aware of the poison value of a berry.
life/health/strength are all the same thing.