- how to use samples with unknown phenotype, and check the places where we do use them.
- any additional places to emit warnings?
- there is a --permissive flag (strict=False) that prevents family relations from being considered in the filtering. the default is strict.
- function signatures are:
def auto_rec(self, min_depth=0, gt_ll=False, strict=True):
- if there are no affecteds, do we return nothing (and issue a warning)?
auto_rec:
- all affected must be hom_alt
- no unaffected can be hom_alt (can be unknown)
- if parents exist they must be unaffected and het for all affected kids (can't have unknown parent).
- if there are no affecteds that have a parent, a warning is issued.
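The auto_rec rules above could be sketched roughly as follows. This is only an illustration, not the actual implementation: the Sample class, the string genotype labels and the is_auto_rec name are all made up here, and returning False with a warning when there are no affecteds is just one possible answer to the open question above. It also assumes the parent checks are the part that strict=False switches off.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:  # hypothetical container, not the real sample class
    name: str
    affected: Optional[bool]        # None means unknown phenotype
    gt: str                         # "hom_ref", "het", "hom_alt" or "unknown"
    mom: Optional["Sample"] = None
    dad: Optional["Sample"] = None


def is_auto_rec(family, strict=True):
    affecteds = [s for s in family if s.affected is True]
    unaffecteds = [s for s in family if s.affected is False]
    if not affecteds:
        # assumption: warn and report no match when nobody is affected
        warnings.warn("no affected samples in family")
        return False
    # all affected must be hom_alt
    if any(s.gt != "hom_alt" for s in affecteds):
        return False
    # no unaffected can be hom_alt (an unknown genotype is allowed)
    if any(s.gt == "hom_alt" for s in unaffecteds):
        return False
    if not strict:
        return True
    with_parents = [s for s in affecteds
                    if s.mom is not None or s.dad is not None]
    if not with_parents:
        warnings.warn("no affected sample has a parent")
    # existing parents of affected kids must be unaffected and het
    for kid in with_parents:
        for parent in (kid.mom, kid.dad):
            if parent is None:
                continue
            if parent.affected is not False or parent.gt != "het":
                return False
    return True
```

With two unaffected het parents and an affected hom_alt kid this returns True; if one parent is hom_ref it returns False in strict mode and True with strict=False.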
auto_dom:
- all affecteds must be het
- no unaffected can be het or hom_alt (can be unknown)
- de_novo mutations are not auto_dom (at least not in the first generation)
- parents of affected can't have unknown phenotype.
- all affected kids must have at least 1 affected parent
- if no affected has a parent, a warning is issued.
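A rough sketch of the auto_dom rules above, in the same spirit. The Sample class, the genotype labels and is_auto_dom are hypothetical names for illustration only, and it assumes that the parent-related rules are the ones skipped when strict=False.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:  # hypothetical container, not the real sample class
    name: str
    affected: Optional[bool]        # None means unknown phenotype
    gt: str                         # "hom_ref", "het", "hom_alt" or "unknown"
    mom: Optional["Sample"] = None
    dad: Optional["Sample"] = None


def is_auto_dom(family, strict=True):
    affecteds = [s for s in family if s.affected is True]
    unaffecteds = [s for s in family if s.affected is False]
    if not affecteds:
        return False
    # all affecteds must be het
    if any(s.gt != "het" for s in affecteds):
        return False
    # no unaffected can be het or hom_alt (an unknown genotype is allowed)
    if any(s.gt in ("het", "hom_alt") for s in unaffecteds):
        return False
    if not strict:
        return True
    with_parents = [s for s in affecteds
                    if s.mom is not None or s.dad is not None]
    if not with_parents:
        warnings.warn("no affected sample has a parent")
    for kid in with_parents:
        parents = [p for p in (kid.mom, kid.dad) if p is not None]
        # parents of an affected can't have unknown phenotype
        if any(p.affected is None for p in parents):
            return False
        # every affected kid needs at least 1 affected parent,
        # which also rules out de novo events
        if not any(p.affected for p in parents):
            return False
    return True
```

Under these rules as written, a trio with an affected het dad, a het mom of unknown phenotype and an affected het kid is rejected in strict mode by the unknown-parent rule, and accepted with strict=False.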
- should this be autosomal dominant (mom is unknown):
3_dad(3_dad;affected),3_mom(3_mom;unknown),3_kid(3_kid;affected) T/C,T/C,T/C
de_novo:
- all affected must be het
- all unaffected must be homref (or homalt)
- at least 1 affected kid must have unaffected parents
- if an affected has affected parents, it's not de_novo. (see note from Aaron).
- all affected kids must have unaffected (or no) parents
- warning if none of the affected samples have parents.
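One way the de_novo rules above might fit together, as a sketch only. Sample, the genotype labels and is_de_novo are hypothetical. Two readings are assumed here and may not match the intent: "unaffected parents" is taken to mean both parents are present and unaffected, and the "all affected kids must have unaffected (or no) parents" rule is treated as the strict-only part.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:  # hypothetical container, not the real sample class
    name: str
    affected: Optional[bool]        # None means unknown phenotype
    gt: str                         # "hom_ref", "het", "hom_alt" or "unknown"
    mom: Optional["Sample"] = None
    dad: Optional["Sample"] = None


def parents_of(sample):
    return [p for p in (sample.mom, sample.dad) if p is not None]


def is_de_novo(family, strict=True):
    affecteds = [s for s in family if s.affected is True]
    unaffecteds = [s for s in family if s.affected is False]
    if not affecteds:
        return False
    # all affected must be het
    if any(s.gt != "het" for s in affecteds):
        return False
    # all unaffected must be hom_ref (or hom_alt)
    if any(s.gt not in ("hom_ref", "hom_alt") for s in unaffecteds):
        return False
    with_parents = [s for s in affecteds if parents_of(s)]
    if not with_parents:
        warnings.warn("none of the affected samples have parents")
    # at least 1 affected kid must have unaffected parents
    # (assumption: both parents present and both unaffected)
    if not any(len(parents_of(k)) == 2
               and all(p.affected is False for p in parents_of(k))
               for k in with_parents):
        return False
    if strict:
        # assumption: strict-only rule, every existing parent of an
        # affected kid must be unaffected
        for kid in with_parents:
            if any(p.affected is not False for p in parents_of(kid)):
                return False
    return True
```

A het affected kid with two unaffected hom_ref parents passes; adding an affected grandchild of those parents still passes with strict=False but fails in strict mode.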
comp_het:
filter by variant:
- affecteds must be het
- unaffecteds are not hom_alt
- TODO: consider the case where 1 variant of the comp_het (CH) pair is de novo
aggregate by gene, test by pairs:
- phase above sites to remove false positives (Jessica, we think all sites can be phased; agree?)
- remove candidate pairs iff unaffected has same pair
- ???
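The two comp_het stages above (filter by variant, then test pairs within a gene) could look something like this sketch. The variant dict layout and the comp_het_pairs name are made up for illustration, phasing is skipped entirely, and "unaffected has same pair" is read as "some unaffected sample is het at both sites", which is an assumption.

```python
from collections import defaultdict
from itertools import combinations


def comp_het_pairs(variants, affected, unaffected):
    """variants: list of dicts like
    {"id": "v1", "gene": "G", "gts": {"kid": "het", ...}} (hypothetical layout).
    Returns a list of (gene, variant_id_1, variant_id_2) tuples."""
    # stage 1: filter by variant
    by_gene = defaultdict(list)
    for v in variants:
        gts = v["gts"]
        if any(gts[s] != "het" for s in affected):
            continue  # affecteds must be het
        if any(gts[s] == "hom_alt" for s in unaffected):
            continue  # unaffecteds are not hom_alt
        by_gene[v["gene"]].append(v)

    # stage 2: aggregate by gene, test by pairs
    pairs = []
    for gene, gene_variants in by_gene.items():
        for a, b in combinations(gene_variants, 2):
            # drop the pair if an unaffected carries both variants
            # (unphased reading: het at both sites)
            if any(a["gts"][s] == "het" and b["gts"][s] == "het"
                   for s in unaffected):
                continue
            pairs.append((gene, a["id"], b["id"]))
    return pairs
```

Without phasing this keeps pairs where both variants could sit on the same haplotype in the kid, so it is only the permissive end of what is described above.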
mendelian_error (no strict mode):
"opposite homozygotes" == one sample is hom_ref, the other is hom_alt
- kid and one parent are opposite homozygotes; other parent is het.
- parents are opposite homozygotes; kid is homozygote.
- everyone is homozygote; kid is opposite to parents.
- kid is het; parents are both hom_ref or both hom_alt.
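The four trio patterns above can be told apart with a small classifier. This is a sketch: classify_trio and the returned labels are made-up names that just describe each pattern, and genotypes are plain strings.

```python
HOM = {"hom_ref", "hom_alt"}


def opposite(a, b):
    """True when a and b are opposite homozygotes."""
    return a in HOM and b in HOM and a != b


def classify_trio(kid, mom, dad):
    """Return a descriptive label (hypothetical) for the pattern a trio
    matches, or None if it matches none of the four patterns."""
    if kid in HOM:
        # kid and one parent are opposite homozygotes; other parent is het
        if (opposite(kid, mom) and dad == "het") or \
           (opposite(kid, dad) and mom == "het"):
            return "kid_opposite_one_parent_other_het"
        # parents are opposite homozygotes; kid is homozygote
        if opposite(mom, dad):
            return "parents_opposite_kid_hom"
        # everyone is homozygote; kid is opposite to both parents
        if mom == dad and opposite(kid, mom):
            return "kid_opposite_both_parents"
    elif kid == "het":
        # kid is het; parents are both hom_ref or both hom_alt
        if mom == dad and mom in HOM:
            return "kid_het_parents_same_hom"
    return None
```

For example, classify_trio("het", "hom_ref", "hom_ref") matches the last pattern, while an ordinary transmission such as classify_trio("het", "het", "hom_ref") returns None.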
I might be being dense, but I'm not sure I understand what "if parents exist they must be either affected or het for all affected kids" (for the auto_rec model) or "if no affected has a parent, a warning is issued" means.
For comp_het, I'm not sure it makes sense to do this, but throwing it out there just in case:
One could in theory partially phase during the by-variant step -- a variant that is het in all the affected kids should be het in one and only one parent (if parents are available). Then during the by-gene step, you check possible comp_het pairs to make sure that in each pair, one variant comes from each parent. I agree that with parents, you can always phase every variant. Of course without parents, you have to skip the in-tool phasing step and just produce all passing, shared comp_het pairs.
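To make the partial-phasing idea concrete, here is a minimal sketch for a single trio. parent_of_origin and is_trans_pair are hypothetical names, genotypes are plain strings in (kid, mom, dad) order, and requiring the non-het parent to be hom_ref is an extra assumption on top of "het in one and only one parent".

```python
def parent_of_origin(kid, mom, dad):
    """By-variant step: say which parent a het kid's alt allele came
    from, or None if the site can't be assigned to exactly one parent."""
    if kid != "het":
        return None
    if mom == "het" and dad == "hom_ref":
        return "mom"
    if dad == "het" and mom == "hom_ref":
        return "dad"
    return None


def is_trans_pair(v1, v2):
    """By-gene step: a candidate comp_het pair passes only if one
    variant comes from each parent. v1 and v2 are (kid, mom, dad)
    genotype tuples."""
    o1 = parent_of_origin(*v1)
    o2 = parent_of_origin(*v2)
    return o1 is not None and o2 is not None and o1 != o2
```

A pair like ("het", "het", "hom_ref") and ("het", "hom_ref", "het") passes, while two variants that are both het only in mom do not. A site that is het in the kid and hom_ref in both parents gets no parent of origin here, so a pair with one de novo allele would need separate handling.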
Another mendelian_error possibility that's not super important but interesting to consider: comp_het pairs in which one allele is inherited from a parent and the other allele in the pair is de novo.