@brentp
Last active January 3, 2018 05:59
  • how to handle samples with unknown phenotype, and check the places where we currently use them.

  • any additional places to emit warnings?

  • there is a --permissive flag (strict=False) that prevents considering family relations in the filtering. default is strict.

  • function signatures are: def auto_rec(self, min_depth=0, gt_ll=False, strict=True):

  • if there are no affecteds, do we return nothing (and a warning?)

auto_rec

genotypes

  • all affected must be hom_alt
  • no unaffected can be hom_alt (can be unknown)

strict

  • if parents exist they must be unaffected and het for all affected kids (can't have unknown parent).
  • if there are no affecteds that have a parent, a warning is issued.
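The auto_rec rules above can be sketched roughly as follows. This is a minimal illustration, not GEMINI's implementation: the `Sample` class, the string encodings for phenotype and genotype, and this simplified `auto_rec(family, strict)` signature are all assumptions made for the example. The open question about families with no affecteds is not resolved here; such a family simply passes the genotype checks and triggers the strict-mode warning.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    # Hypothetical container; field names and string codes are assumptions.
    name: str
    phenotype: str  # "affected", "unaffected", or "unknown"
    genotype: str   # "hom_ref", "het", "hom_alt", or "unknown"
    mom: Optional["Sample"] = None
    dad: Optional["Sample"] = None


def auto_rec(family, strict=True):
    """Return True if one variant's genotypes fit the autosomal recessive rules."""
    affecteds = [s for s in family if s.phenotype == "affected"]
    unaffecteds = [s for s in family if s.phenotype == "unaffected"]

    # Genotype rules: every affected is hom_alt, no unaffected is hom_alt.
    if not all(s.genotype == "hom_alt" for s in affecteds):
        return False
    if any(s.genotype == "hom_alt" for s in unaffecteds):
        return False
    if not strict:
        return True

    # Strict rules: every parent of an affected kid must be unaffected and het.
    kids_with_parents = 0
    for kid in affecteds:
        parents = [p for p in (kid.mom, kid.dad) if p is not None]
        if parents:
            kids_with_parents += 1
        for parent in parents:
            if parent.phenotype != "unaffected" or parent.genotype != "het":
                return False
    if kids_with_parents == 0:
        warnings.warn("no affected sample has a parent; strict auto_rec cannot be tested")
    return True
```

With two unaffected het parents and a hom_alt affected kid this returns True in both modes. If one parent's phenotype is unknown, the strict check rejects the family while the permissive check still accepts it.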

auto_dom

genotypes

  • all affecteds must be het
  • no unaffected can be het or homalt (can be unknown)
  • de_novo mutations are not auto_dom (at least not in the first generation)

strict

  • parents of affected can't have unknown phenotype.
  • all affected kids must have at least 1 affected parent
  • if no affected has a parent, a warning is issued.

questions

  • should this be autosomal dominant (mom is unknown):

    3_dad(3_dad;affected),3_mom(3_mom;unknown),3_kid(3_kid;affected) T/C,T/C,T/C
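To make the question concrete, here is a minimal sketch of the auto_dom rules as written above, applied to this family. It does not answer whether the family should count as autosomal dominant; it only shows what a literal reading of the current rules gives. The dict layout, the string codes, and the `sample` helper are assumptions for illustration, not GEMINI's API.

```python
import warnings


def sample(phenotype, genotype, mom=None, dad=None):
    # Hypothetical record; mom/dad hold sample names or None.
    return {"phenotype": phenotype, "genotype": genotype, "mom": mom, "dad": dad}


def auto_dom(family, strict=True):
    """family maps sample name -> record built by sample()."""
    affecteds = [s for s in family.values() if s["phenotype"] == "affected"]
    unaffecteds = [s for s in family.values() if s["phenotype"] == "unaffected"]

    # Genotype rules: all affecteds het, no unaffected het or hom_alt.
    if not all(s["genotype"] == "het" for s in affecteds):
        return False
    if any(s["genotype"] in ("het", "hom_alt") for s in unaffecteds):
        return False
    if not strict:
        return True

    # Strict rules apply only to affecteds whose parents are in the family.
    any_parent_seen = False
    for kid in affecteds:
        parents = [family[p] for p in (kid["mom"], kid["dad"]) if p in family]
        if not parents:
            continue  # founder: nothing to check
        any_parent_seen = True
        if any(p["phenotype"] == "unknown" for p in parents):
            return False
        if not any(p["phenotype"] == "affected" for p in parents):
            return False
    if not any_parent_seen:
        warnings.warn("no affected sample has a parent; strict auto_dom cannot be tested")
    return True


# The family from the question: T/C is treated as het for all three samples.
family = {
    "3_dad": sample("affected", "het"),
    "3_mom": sample("unknown", "het"),
    "3_kid": sample("affected", "het", mom="3_mom", dad="3_dad"),
}
print(auto_dom(family, strict=True))   # False: mom has unknown phenotype
print(auto_dom(family, strict=False))  # True: only the genotype rules apply
```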

de_novo

genotypes

  • all affected must be het
  • all unaffected must be homref (or homalt)
  • at least 1 affected kid must have unaffected parents

strict

  • if an affected has affected parents, it's not de_novo. (see note from Aaron).
  • all affected kids must have unaffected (or no) parents
  • if none of the affected samples have parents, a warning is issued.

comp_het

filter by variant:

  • affecteds must be het
  • unaffecteds are not hom_alt
  • TODO: consider case where 1 var of CH is de novo

aggregate by gene, test by pairs:

  • phase above sites to remove false positives (Jessica, we think all sites can be phased; agree?)
  • remove candidate pairs iff unaffected has same pair

strict

  • ???
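The two comp_het stages above (filter by variant, then test pairs within each gene) might look roughly like the sketch below. Several things are assumptions: the variant and phenotype dict layouts, the function names, and the reading of "unaffected has same pair" as "some unaffected sample is het at both sites". The phasing step and the de novo TODO are left out entirely.

```python
from collections import defaultdict
from itertools import combinations


def passes_variant_filter(genotypes, phenotypes):
    """Per-variant rules: affecteds must be het, unaffecteds must not be hom_alt."""
    for name, gt in genotypes.items():
        phenotype = phenotypes[name]
        if phenotype == "affected" and gt != "het":
            return False
        if phenotype == "unaffected" and gt == "hom_alt":
            return False
    return True


def comp_het_pairs(variants, phenotypes):
    """variants: list of dicts with 'id', 'gene', and 'genotypes' (name -> genotype).

    Returns (gene, id_a, id_b) tuples for candidate pairs that survive.
    """
    # Stage 1: keep variants that pass the per-variant filter, grouped by gene.
    by_gene = defaultdict(list)
    for v in variants:
        if passes_variant_filter(v["genotypes"], phenotypes):
            by_gene[v["gene"]].append(v)

    # Stage 2: drop a pair if any unaffected sample is het at both sites.
    unaffecteds = [n for n, ph in phenotypes.items() if ph == "unaffected"]
    pairs = []
    for gene, gene_variants in by_gene.items():
        for a, b in combinations(gene_variants, 2):
            shared = any(
                a["genotypes"][n] == "het" and b["genotypes"][n] == "het"
                for n in unaffecteds
            )
            if not shared:
                pairs.append((gene, a["id"], b["id"]))
    return pairs
```

For a trio where the kid is het at three sites in one gene, a pair is kept only when no single unaffected parent is het at both of its sites.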

mendelian_error

(no strict mode)

"opposite homozygotes" == one sample is hom_ref, the other is hom_alt

LOH

kid and one parent are opposite homozygotes other parent is het.

uniparental disomy

parents are opposite homozygotes. kid is homozygote.

implausible de_novo

everyone is homozygote. kid is opposite to parents.

plausible de_novo

kid is het. parents are both hom_ref or both hom_alt.
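The four mendelian_error categories above can be expressed as a small trio classifier. This is a sketch under assumptions: genotypes are encoded as the strings "hom_ref", "het", and "hom_alt", and the function name and return labels are made up for the example.

```python
HOMOZYGOUS = {"hom_ref", "hom_alt"}


def opposite(a, b):
    """True if a and b are opposite homozygotes (one hom_ref, the other hom_alt)."""
    return a in HOMOZYGOUS and b in HOMOZYGOUS and a != b


def mendelian_error(kid, mom, dad):
    """Classify one trio's genotypes; return None if no category applies."""
    # Kid is het, parents are both hom_ref or both hom_alt.
    if kid == "het" and mom == dad and mom in HOMOZYGOUS:
        return "plausible de_novo"
    # Everyone is homozygous and the kid is opposite to both parents.
    if mom == dad and opposite(kid, mom):
        return "implausible de_novo"
    # Parents are opposite homozygotes and the kid is homozygous.
    if kid in HOMOZYGOUS and opposite(mom, dad):
        return "uniparental disomy"
    # Kid and one parent are opposite homozygotes, the other parent is het.
    if (opposite(kid, mom) and dad == "het") or (opposite(kid, dad) and mom == "het"):
        return "LOH"
    return None
```

For example, `mendelian_error("hom_alt", "hom_ref", "het")` falls into the LOH category, while a trio consistent with Mendelian inheritance such as `("het", "het", "hom_ref")` returns None.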

@jxchong

jxchong commented May 19, 2015

I might be being dense, but I'm not sure I understand what "if parents exist they must be either affected or het for all affected kids" (for auto_rec model) or "if no affected has a parent, a warning is issued" means?

For comp_het, I'm not sure it makes sense to do this, but throwing it out there just in case:
One could in theory partially phase during the by-variant step -- a variant that is het in all the affected kids should be het in one and only one parent (if parents are available). Then during the by-gene step, you check possible comp_het pairs to make sure that in each pair, one variant comes from each parent. I agree that with parents, you can always phase every variant. Of course without parents, you have to skip the in-tool phasing step and just produce all passing, shared comp_het pairs.

Another mendelian_error possibility that's not super important but interesting to consider: comp_het pairs in which one allele is inherited from a parent and the other allele in the pair is a de novo.

@jxchong

jxchong commented May 19, 2015

For de_novo --strict, produce a warning when there is only the affected proband in a family without parents? (Essentially GEMINI would be outputting candidate de novo variants, which would be every het variant in the proband.)

@arq5x

arq5x commented May 20, 2015

Hey Jessica, good questions.

I might be being dense, but I'm not sure I understand what "if parents exist they must be either affected or het for all affected kids" (for auto_rec model) or "if no affected has a parent, a warning is issued" means?

The "strict" mode is largely motivated by the potential for families where there are individuals (especially parents) whose phenotype is unknown. For example, lacking "strict" mode for autosomal recessive, the rule is: all affecteds must be homozygous alternate and anyone that is either unaffected or _unknown_ must not be hom_alt. This unknown opens up the possibility that a parent of an affected child, while the phenotype is unknown, may in truth be affected and this truth would violate the "strictness" of the expected autosomal recessive inheritance pattern.

Therefore, in "strict" mode for autosomal recessive, we would further enforce that for affected kids (i.e., excluding affected grandparents in the eldest generation of a pedigree), both parents must be unaffected and heterozygous (I think @brentp just had a few typos here).

Now, the raising a warning bit is meant to alert users when a given family does not make sense for a given inheritance model. For example, if a child and her parent are affected and the user asks for auto_rec, we could warn that it doesn't make sense. Similarly, if an affected kid doesn't have parents in the database for a given family, then we can't test for the expected inheritance pattern.

For comp_het, I'm not sure it makes sense to do this, but throwing it out there just in case:
One could in theory partially phase during the by-variant step -- a variant that is het in all the affected kids should be het in one and only one parent (if parents are available). Then during the by-gene step, you check possible comp_het pairs to make sure that in each pair, one variant comes from each parent. I agree that with parents, you can always phase every variant. Of course without parents, you have to skip the in-tool phasing step and just produce all passing, shared comp_het pairs.

This is interesting - we need to give that a bit of thought.

Another mendelian_error possibility that's not super important but interesting to consider: comp_het pairs in which one allele is inherited from a parent and the other allele in the pair is a de novo.

Yes, completely agree. That is an interesting case and it is something we could possibly test for and add a special status to the output to flag these cases.

@arq5x

arq5x commented May 20, 2015

@brentp - "if an affected has affected parents, it's not de_novo." I think this may not be phrased quite right, as when an affected parent has a causal de novo themselves and passes it on to an offspring, that would lead to this situation (i.e., the de novo dominant model).

@oleraj

oleraj commented May 21, 2015

"For example, lacking "strict" mode for autosomal recessive, the rule is: all affecteds must be homozygous alternate and anyone that is either unaffected or unknown must not be hom_alt"

Aaron, just double-checking here, do you mean in the non-strict model, the rule would be "all affecteds must be homozygous alternate and anyone that is unaffected must not be hom_alt" (ignoring those that have unknown phenotype)? That's my interpretation of the auto_rec model notes above. In particular this phrase from the notes above for auto_rec -- although I'm not quite sure I understand, it seems the "unknown" part is referring to genotype instead of phenotype:
"no unaffected can be hom_alt (can be unknown)".

@brentp
Author

brentp commented May 21, 2015

@oleraj, I agree, there is confusion about unknown phenotype vs unknown genotype. I'm trying to address that. I'm looking through the other comments and will adjust for those as well.
