@sfentress
Last active Oct 2, 2018

biologica 2.0 RFC

Motivation 1

To have a more maintainable library with type definitions

Motivation 2

I have always hated needing to transform organisms back and forth between the BioLogica.Organism class objects and plain-object formats. Frequently in Geni* we need to turn objects into BioLogica.Organisms before we can use them, or, worse, check whether we've already been passed an Organism or a plain object.

For this reason it would be nice to define an Organism so that it is simply a typed object, which can be serialized, read, and used without any further transformations.

As a result of this, it seems that the library should consist only of pure functions which take Organisms (or other typed objects) as inputs (e.g. createGametes(Organism) instead of Organism.createGametes()). A worry is that this may result in some awkward syntax.

Organism definition and creation

Organism

  {
    alleles: "a-a b-b ...",
    sex: "female",
    // optional
    phenotype: {
      trait: characteristic
    },
    // optional
    secondXAlleles: "c d ..."
  }
  • alleles: string
    Fully-defined allele string. Two organisms with the same allele string will always have the same phenotype. Or should this be called genotype?
  • sex: string
  • phenotype: { [s: string]: string; }
    It's useful to be able to read the phenotype of an organism easily, e.g. createOrganism(...).phenotype.color instead of org = createOrganism(...); getCharacteristic(org, 'color'), so it seems to make sense for this to be part of any organism generated by the library. However, it is redundant information, so it probably doesn't need to be passed in when using an organism.
  • secondXAlleles: string
    This formalizes the pattern from Geni* that allows us to switch sexes without changing the phenotype. Maybe this is unneeded? We could presumably also have a changeSex function that explicitly picks second-x-alleles that don't change the phenotype. Do we need this for anything else?
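
As a rough TypeScript sketch of the shape described above, an organism could be a plain object that survives a JSON round trip unchanged. The `Sex` alias and the exact optionality of each field are assumptions for illustration, not a settled API.

```typescript
// Hypothetical type names; the RFC does not fix them.
type Sex = "female" | "male";

interface Organism {
  /** Fully-defined allele string, e.g. "a-a b-b". */
  alleles: string;
  sex: Sex;
  /** Optional: derived from alleles, so redundant as an input. */
  phenotype?: { [trait: string]: string };
  /** Optional: kept so sex can be switched without changing the phenotype. */
  secondXAlleles?: string;
}

const org: Organism = { alleles: "a-a b-b", sex: "female" };

// Because an Organism is plain data, serializing and parsing it
// yields an equivalent object with no class to re-instantiate.
const roundTripped: Organism = JSON.parse(JSON.stringify(org));
```

Nothing here needs a `BioLogica.Organism` wrapper, which is the point of Motivation 2.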

Creation

  organism({
    /** optional. One would pass in fully-specified alleles when re-generating
      phenotypes */
    alleles: "a-a b-b ...",
    /** optional. Explicitly specifying the authored syntax */
    authoredAlleles: "a-a b- -c d-[D1, D2] e-^[E1] XY",
    /** optional */
    sex: "male"
  }): Organism
  
  breed(org1: Organism, org2: Organism, crossover=true): Organism
  
  breed(org1: Organism, org2: Organism, quantity: number, crossover=true): Organism[]
  
  fertilize(gamete1: Gamete, gamete2: Gamete): Organism

Problem: Species

In biologica 1.0, we need to pass in a BioLogica.Species object in any Organism creation method. This seems ugly, and would be even more so now because we'd no longer create Organism classes that have their own references to the species, so we'd have to either add the entire species spec to every organism, or pass in the species spec in every single method (e.g. breed(Organism, Organism, Species)).

One solution would be to have an initialization method that sets up the species from the start:

  import biologica from 'biologica'

  const Drakes = biologica(drakeSpecies)
  const org1 = Drakes.organism(...)
  const child = Drakes.breed(org1, org2)

  // instead of

  import { organism, breed } from 'biologica'

  const org1 = organism(..., drakeSpecies)
  const child = breed(org1, org2, drakeSpecies)

The first form seems a little uglier than the second, but it saves us from passing drakeSpecies everywhere. Thoughts?

Edit: Or maybe we can keep the top solution but make it cleaner by using

  import biologica from 'biologica'
  const { organism, breed } = biologica(drakeSpecies)
  
  const org1 = organism(...)
  const child = breed(org1, org2)
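
One way the initialization approach could work is a factory that closes over the species and hands back species-free functions. This is only a sketch under stated assumptions: the `Species` shape, the `organismFor` helper, and the toy validation are all made up for illustration, and `breed` is left out because its logic is not specified here.

```typescript
// Placeholder shapes; the real species spec is much richer.
interface Species { name: string; alleles: string[] }
interface Organism { alleles: string; sex: "female" | "male" }

// Species-explicit pure function (the second style above).
function organismFor(spec: Organism, species: Species): Organism {
  // Toy validation: every allele named in the string must belong to the species.
  const named = spec.alleles.split(/[\s-]+/).filter((a) => a.length > 0);
  for (const allele of named) {
    if (species.alleles.indexOf(allele) === -1) {
      throw new Error(allele + " is not a " + species.name + " allele");
    }
  }
  return { alleles: spec.alleles, sex: spec.sex };
}

// Factory: binds the species once and returns functions that no longer take it.
function biologica(species: Species) {
  return {
    organism: (spec: Organism) => organismFor(spec, species),
  };
}

const drakeSpecies: Species = { name: "Drake", alleles: ["W", "w", "T", "t"] };
const { organism } = biologica(drakeSpecies);
const org1 = organism({ alleles: "W-w T-t", sex: "female" });
```

With this layout the species-explicit functions stay pure and testable, and the factory is just partial application on top of them.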

Small problem of not using classes

This is probably not so important, but classes with methods can often be read in a more literate manner: If we wanted to, say, get the image src for an authored organism in 1.0, we could write

new Organism(authoredAlleles).getImageSrc()

In 2.0 it seems that we would either need to split it into two lines or write it inside-out:

getImageSrc(organism(authoredAlleles))

(or we could use the futuristic syntax organism(authoredAlleles)::getImageSrc() but let's not.)

Authored alleles syntax

  a1-a2 b1- -c2 d1 e1-[e2, e3] f1-^[f3] XY

This makes the new dashed syntax the prime format (though we probably need to support authoring with the old a:a1,b:a1 syntax) and adds a couple of features Geni* has been needing:

  • a1-a2 b1- -c2
    Any allele listed on one side or the other of a dash indicates a requirement for that allele on the left or right chromosome. (Note: we need to make explicit which side comes from which parent. In 1.0 this is only implicit.)
  • d1
    A required allele that could be on either side.
  • [e2, e3]
    Either e2 or e3
  • ^[f3]
    Not f3
  • XY
    It seems like it would be nice to be able to specify the sex of an organism directly in the allele string. The sex chromosomes are part of the genes, after all, and this would allow for authoring organisms with a single string. Unsure whether it should also be part of the fully-defined allele string, because it is redundant with sex, and it seems like the sex property is a useful one to have in an Organism. That said, sex could also be treated like phenotype: part of the organism object when created by biologica, but not necessary for any functions.
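
To make the grammar above concrete, here is a minimal parsing sketch. Everything in it is an assumption for illustration: the type names (`SideSpec`, `GeneSpec`, `AuthoredSpec`), the choice to treat a bare `XX` or `XY` token as the sex chromosomes, and the assumption that allele names never contain dashes, brackets, or whitespace.

```typescript
type SideSpec =
  | { kind: "any" }
  | { kind: "oneOf"; alleles: string[] }
  | { kind: "noneOf"; alleles: string[] };

type GeneSpec =
  | { left: SideSpec; right: SideSpec }
  | { either: SideSpec };

interface AuthoredSpec {
  genes: GeneSpec[];
  sexChromosomes?: "XX" | "XY";
}

// Parses one side of a dash: "", "a1", "[e2, e3]" or "^[f3]".
function parseSide(text: string): SideSpec {
  if (text === "") return { kind: "any" };
  const negated = text.charAt(0) === "^";
  const body = negated ? text.slice(1) : text;
  if (body.charAt(0) === "[" && body.charAt(body.length - 1) === "]") {
    const alleles = body.slice(1, -1).split(",").map((a) => a.trim());
    return negated ? { kind: "noneOf", alleles } : { kind: "oneOf", alleles };
  }
  // Anything else is treated as a single required allele name.
  return { kind: "oneOf", alleles: [text] };
}

function parseAuthored(authored: string): AuthoredSpec {
  // Bracket groups may contain spaces, so tokenize with a regex
  // rather than splitting on whitespace.
  const tokens = authored.match(/(?:\^?\[[^\]]*\]|[^\s\[\]])+/g) || [];
  const spec: AuthoredSpec = { genes: [] };
  for (const token of tokens) {
    if (token === "XX" || token === "XY") {
      spec.sexChromosomes = token;
      continue;
    }
    const dash = token.indexOf("-");
    if (dash === -1) {
      spec.genes.push({ either: parseSide(token) });
    } else {
      spec.genes.push({
        left: parseSide(token.slice(0, dash)),
        right: parseSide(token.slice(dash + 1)),
      });
    }
  }
  return spec;
}

const parsed = parseAuthored("a1-a2 b1- -c2 d1 e1-[e2, e3] f1-^[f3] XY");
```

Resolving such a spec into a fully-defined allele string would still need the species definition, which this sketch deliberately ignores.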

Thoughts?

Gametes

Gametes are generated through meiosis, and it's useful to know where the crossovers occurred and the provenance of the alleles, so that we can accurately depict a meiosis animation if needed.

In 1.0 we pass back an object with gametes and metadata as separate properties. I think we can do something similar, with the addition that the MeiosisData object should have everything it needs to display meiosis, so it includes the original genotype:

MeiosisData

  {
    genotype: `a1-a2 b1-b2 ...`,
    crosses: {
        [chromosome1Name-a]: [500, 1252, ...],
        [chromosome1Name-b]: [...],
        [chromosome2Name-a]: [...],
        [chromosome2Name-b]: [...]
        ...
    },
    haploidCells: [
        [chromosome1Name-a-1, chromosome2Name-b-1, ...]
        ...
    ],
    gametes: [
        `a1 b2 ... X`,
        `a2 b1 ... Y`,
        ...
    ]
  }
  • genotype
    Original genotype of the organism, so we don't need to maintain a reference to it in order to display meiosis
  • crosses
    The locations along the length of the chromatid where crosses occurred. Note that a chromosome named chromosome1Name will eventually split into four chromatids. These are made up of two pairs with complementary crosses. So we can call them chromosome1Name-a-1, chromosome1Name-a-2, chromosome1Name-b-1, chromosome1Name-b-2, where chromosome1Name-a-1 and chromosome1Name-a-2 have the exact same crosses but alleles from opposite parents.
  • haploidCells
    These are four cells which become the gamete cells, but here we specify which chromatid ends up in each cell, rather than their actual alleles. We need to specify chromatids because the actual alleles may be identical, but we still want to know which one came from which parent.
  • gametes
    The genotype of each gamete. This is redundant and can be deduced from the above, but it seems useful to put this here rather than requiring another step to transform the above into alleles.
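
The structure above might be typed roughly as follows. The field names come from the description; the two helper functions and the example data are hypothetical, and the check only covers the naming scheme, not whether the gametes agree with the crosses.

```typescript
// Field names follow the RFC; the helper functions are hypothetical.
interface MeiosisData {
  genotype: string;
  // Keyed by "<chromosome>-a" / "<chromosome>-b"; values are cross positions.
  crosses: { [chromatidPair: string]: number[] };
  // Four cells, each listing chromatid names such as "chr1-a-1".
  haploidCells: string[][];
  gametes: string[];
}

// The four chromatid names for one chromosome: a-1, a-2, b-1, b-2.
function chromatidNames(chromosomeName: string): string[] {
  const names: string[] = [];
  for (const pair of ["a", "b"]) {
    for (const copy of ["1", "2"]) {
      names.push(chromosomeName + "-" + pair + "-" + copy);
    }
  }
  return names;
}

// True when every chromatid of the chromosome lands in exactly one cell.
function eachChromatidUsedOnce(data: MeiosisData, chromosomeName: string): boolean {
  const placed: string[] = [];
  for (const cell of data.haploidCells) {
    for (const chromatid of cell) placed.push(chromatid);
  }
  return chromatidNames(chromosomeName).every(
    (name) => placed.filter((p) => p === name).length === 1
  );
}

// Made-up data for a single chromosome "chr1" carrying one gene.
const example: MeiosisData = {
  genotype: "a1-a2",
  crosses: { "chr1-a": [500], "chr1-b": [] },
  haploidCells: [["chr1-a-1"], ["chr1-a-2"], ["chr1-b-1"], ["chr1-b-2"]],
  gametes: ["a1", "a2", "a1", "a2"],
};
```

Since the whole object is plain data, it can be stored alongside an organism and replayed later to animate meiosis.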

I'm not sure whether there is a clearer way to represent this, and I'd welcome thoughts on the "haploidCells/gametes" redundancy.

A single gamete in this case is just a string, so we'll have

fertilize(gamete1: string, gamete2: string): Organism
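
A minimal sketch of what this could look like with string gametes follows. It rests on several assumptions that the text above does not settle: both gametes list their alleles in the same gene order, the last token is the sex chromosome, gamete1 supplies the left side of each pair, and any Y makes the organism male. The phenotype is not computed here.

```typescript
interface Organism { alleles: string; sex: "female" | "male" }

// Splits "a1 b2 X" into its alleles and its trailing sex chromosome.
function splitGamete(gamete: string): { alleles: string[]; sexChromosome: string } {
  const parts = gamete.trim().split(/\s+/);
  const sexChromosome = parts[parts.length - 1] || "";
  return { alleles: parts.slice(0, -1), sexChromosome };
}

function fertilize(gamete1: string, gamete2: string): Organism {
  const g1 = splitGamete(gamete1);
  const g2 = splitGamete(gamete2);
  if (g1.alleles.length !== g2.alleles.length) {
    throw new Error("gametes must carry the same number of alleles");
  }
  // Assumption: gamete1 fills the left side of each pair, gamete2 the right.
  const pairs = g1.alleles.map((allele, i) => allele + "-" + g2.alleles[i]);
  const hasY = g1.sexChromosome === "Y" || g2.sexChromosome === "Y";
  return { alleles: pairs.join(" "), sex: hasY ? "male" : "female" };
}

const child = fertilize("a1 b2 X", "a2 b1 Y");
// child.alleles is "a1-a2 b2-b1" and child.sex is "male"
```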