Skip to content

Instantly share code, notes, and snippets.

@XMPPwocky
Created August 9, 2021 21:55
Show Gist options
  • Save XMPPwocky/cc605d33059d3d387382652278370a0d to your computer and use it in GitHub Desktop.
Save XMPPwocky/cc605d33059d3d387382652278370a0d to your computer and use it in GitHub Desktop.
# ye olde imports
import math, random

Imagine a virus which is, somehow, a single protein with only 3 "amino acids" (actually just uppercase letters here, oh well)

Currently, only one type is circulating, which is "XYZ".

EXISTING_TYPE = "XYZ"

But there's a possible variant, the Super Nasty Variant, "BAD", which escapes antibodies and is hard to even create good antibodies against (meaning even a vaccine targeted specifically at it has trouble):

SUPER_NASTY_TYPE = "BAD"

Now we'll run a comically oversimplified simulation of evolution: starting from a population of 100 copies of the existing virus (XYZ)...

POPULATION_SIZE = 100

def make_starting_population(): return [EXISTING_TYPE] * POPULATION_SIZE

Each generation, every virus makes a copy of itself (to form a population twice as big), possibly mutating in the process.

Then, each virus may die (in this case, be neutralized by antibodies or whatever); the odds of survival each generation is the virus's fitness.

Note that the steady-state here happens for viruses with fitness of 0.5 (they make a copy of themselves each generation, but on average the original or the copy dies).

def evolve(starting_population, fitness):
    population = starting_population

    new_population = [virus for virus in population] 
    
    
    # each virus makes a copy of itself, maybe mutating...
    for virus in population:
        new_population.append(maybe_mutate(virus))
    # but then some maybe die, based on their fitness:

    population = []
    
    for virus in new_population:
        if fitness(virus) > random.random():
            # virus survived
            population.append(virus)
            
    if not population:
        population = [EXISTING_TYPE] # goofy hack so that the virus never completely dies out

    return population

def geometric_mean(x):
    if not x: return float("nan")
    
    return math.prod(x) / len(x)

MAX_GENERATIONS = 2500 # timeout to keep things sensible
def run_experiment(fitness_fn, quiet=True):
    population = make_starting_population()

    if not quiet: print("Generation\tPop. size\tAvg. fitness\tMax. fitness")
    for generation in range(MAX_GENERATIONS):
        if not population:
            if not quiet: print("Virus died out...")
            break
            
        if generation % 100 == 0:
            fitnesses = [fitness_fn(virus) for virus in population]
            avg_fitness = geometric_mean(fitnesses)
            max_fitness = max(fitnesses)
            if not quiet: print("{:5}\t\t{:8}\t{:5.4f}\t\t{:5.4f}".format(generation, len(population), avg_fitness, max_fitness))

        population = evolve(population, fitness_fn)
        if SUPER_NASTY_TYPE in population:
            if not quiet: print("Super Nasty Variant evolved in {} generations!".format(generation))
            return generation

In this model, "mutations" are just random amino acid substitutions:

ERROR_RATE = 1e-3 # 1/1000 chance per amino acid to replace it with a random one

def random_letter():
    return chr(ord("A")+random.randint(0, 26))

def maybe_mutate(virus):
    # random substitutions
    newvirus = virus
    
    for i in range(len(newvirus)):
        if random.random() < ERROR_RATE:
            # mutate this position
            newvirus = newvirus[:i] + random_letter() + newvirus[i+1:]
    return newvirus

If each of the individual 3 substitutions (X->B, Y->A, Z->D) required to get from the existing type to the Super Nasty Type confers a small advantage on its own...

SINGLE_SUBSTITUTION_FITNESS_GAIN = 1.05 # 5% advantage per substitution

BASE_FITNESS = 0.5 # steady state...

def fitness_normal(virus):
    num_mutations = 0
    if virus[0] == "B": num_mutations += 1
    if virus[1] == "A": num_mutations += 1
    if virus[2] == "D": num_mutations += 1

    return BASE_FITNESS * pow(SINGLE_SUBSTITUTION_FITNESS_GAIN, num_mutations)

then within a few thousand generations, the Super Nasty Variant usually shows up:

for experiment in range(20):
    generations_to_nasty = run_experiment(fitness_normal, quiet=True)
    if generations_to_nasty is not None:
        print("Experiment #{}: Super Nasty Variant evolved in {} generations".format(experiment, generations_to_nasty))
    else:
        print("Experiment #{}: Super Nasty Variant never evolved".format(experiment))
Experiment #0: Super Nasty Variant never evolved
Experiment #1: Super Nasty Variant evolved in 2445 generations
Experiment #2: Super Nasty Variant evolved in 701 generations
Experiment #3: Super Nasty Variant never evolved
Experiment #4: Super Nasty Variant evolved in 331 generations
Experiment #5: Super Nasty Variant evolved in 722 generations
Experiment #6: Super Nasty Variant evolved in 472 generations
Experiment #7: Super Nasty Variant evolved in 2227 generations
Experiment #8: Super Nasty Variant never evolved
Experiment #9: Super Nasty Variant evolved in 627 generations
Experiment #10: Super Nasty Variant evolved in 516 generations
Experiment #11: Super Nasty Variant evolved in 195 generations
Experiment #12: Super Nasty Variant evolved in 693 generations
Experiment #13: Super Nasty Variant evolved in 300 generations
Experiment #14: Super Nasty Variant never evolved
Experiment #15: Super Nasty Variant evolved in 1506 generations
Experiment #16: Super Nasty Variant evolved in 837 generations
Experiment #17: Super Nasty Variant evolved in 643 generations
Experiment #18: Super Nasty Variant evolved in 302 generations
Experiment #19: Super Nasty Variant never evolved

But if all 3 mutations must occur together to get a fitness gain...

def fitness_new(virus):
    num_mutations = 0
    if virus[0] == "B": num_mutations += 1
    if virus[1] == "A": num_mutations += 1
    if virus[2] == "D": num_mutations += 1

    # now, without all 3 mutations, it's no better than no mutations:
    if num_mutations < 2: num_mutations = 0
        
    return BASE_FITNESS * pow(SINGLE_SUBSTITUTION_FITNESS_GAIN, num_mutations)

... it's much rarer (and slower) for the variant to evolve in the first place, even though it's just as bad when it does show up.

for experiment in range(20):
    generations_to_nasty = run_experiment(fitness_new, quiet=True)
    if generations_to_nasty is not None:
        print("Experiment #{}: Super Nasty Variant evolved in {} generations".format(experiment, generations_to_nasty))
    else:
        print("Experiment #{}: Super Nasty Variant never evolved".format(experiment))
Experiment #0: Super Nasty Variant never evolved
Experiment #1: Super Nasty Variant never evolved
Experiment #2: Super Nasty Variant never evolved
Experiment #3: Super Nasty Variant never evolved
Experiment #4: Super Nasty Variant never evolved
Experiment #5: Super Nasty Variant never evolved
Experiment #6: Super Nasty Variant never evolved
Experiment #7: Super Nasty Variant never evolved
Experiment #8: Super Nasty Variant never evolved
Experiment #9: Super Nasty Variant never evolved
Experiment #10: Super Nasty Variant never evolved
Experiment #11: Super Nasty Variant never evolved
Experiment #12: Super Nasty Variant never evolved
Experiment #13: Super Nasty Variant never evolved
Experiment #14: Super Nasty Variant never evolved
Experiment #15: Super Nasty Variant never evolved
Experiment #16: Super Nasty Variant never evolved
Experiment #17: Super Nasty Variant never evolved
Experiment #18: Super Nasty Variant evolved in 943 generations
Experiment #19: Super Nasty Variant never evolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment