- State from the beginning that you are focusing on exome
- Talk through the pLI figure in more detail
- Any slide needs to be talked through in detail if you present it
- Make it clear that missense variants are what you are seeing
- Should the observed CpG density be modulated by the codon content of the region?
- Model ideas?
- Exclude singletons???
- Use just synonymous density as the independent variable (a sketch follows this list)
- Molly Przeworski, CpG
- Simplify the explanation of each of the essential gene descriptions
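A minimal sketch of the model idea in the notes above (synonymous density as the only independent variable); the per-region table, its column names, and the ordinary least squares fit are illustrative assumptions, not the model that was actually discussed:

import pandas as pd
import statsmodels.api as sm

# hypothetical per-region table: one row per exome region
regions = pd.read_csv("regions.tsv", sep="\t")

# single independent variable, per the note above
X = sm.add_constant(regions[["synonymous_density"]])
y = regions["observed_missense"]  # or observed CpG density

fit = sm.OLS(y, X).fit()
print(fit.summary())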
genome.txt
chr1 10
go.sh
cat a.bed
chr1 10 50 10
cat b.bed
chr1 20 40 20
cat c.bed
chr1 30 33 30
# Find the sub-intervals shared and unique to each file.
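A minimal sketch (not the go.sh solution) of that partitioning in Python, hard-coding the three intervals shown above; in practice bedtools multiinter does the same thing across many files and chromosomes:

# cut chr1 at every interval boundary and report which files cover each piece
beds = {"a.bed": (10, 50), "b.bed": (20, 40), "c.bed": (30, 33)}

points = sorted({p for se in beds.values() for p in se})
for lo, hi in zip(points, points[1:]):
    covered = [name for name, (s, e) in beds.items() if s <= lo and hi <= e]
    if covered:
        print("chr1", lo, hi, ",".join(sorted(covered)), sep="\t")

Rows listing all three files are the shared sub-intervals; rows listing a single file are unique to that file.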
notes.md
merge-callable.py
#!/usr/bin/env python
"""
given a callable file from goleft depth on stdin, merge adjacent LOW_COVERAGE and CALLABLE regions.
also split regions that are larger than max_region.
this is to even out parallelism for sending regions to freebayes.
"""
from __future__ import print_function, division
import sys
from itertools import groupby
from operator import itemgetter
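The preview stops at the imports; what follows is a minimal sketch of the merge/split the docstring describes, not the script's actual code. The 4-column chrom/start/end/label input format, the max_region default, and the function name are assumptions:

def merge_and_split(lines, max_region=50000):  # default size is a guess
    # keep only LOW_COVERAGE and CALLABLE rows from goleft depth output
    rows = (l.rstrip("\r\n").split("\t") for l in lines)
    keep = [(c, int(s), int(e)) for c, s, e, label in rows
            if label in ("LOW_COVERAGE", "CALLABLE")]
    # merge runs of touching regions on the same chromosome
    merged = []
    for chrom, start, end in keep:
        if merged and merged[-1][0] == chrom and merged[-1][2] == start:
            merged[-1][2] = end
        else:
            merged.append([chrom, start, end])
    # split anything longer than max_region into max_region-sized chunks
    for chrom, start, end in merged:
        for s in range(start, end, max_region):
            yield chrom, s, min(s + max_region, end)

for chrom, start, end in merge_and_split(sys.stdin):
    print("%s\t%d\t%d" % (chrom, start, end))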
gist:0dbc7f40a29c898f79482b98ace232a8
# GeneDx: https://www.genedx.com/test-catalog/disorders/early-onset-epileptic-encephalopathy-andor-infantile-spasms/
# Invitae: https://www.invitae.com/en/physician/tests/03402/
ADSL
ALDH7A1
ALG13
ARHGEF9
ARID1B
ARX
ATP1A2
ATP6AP2 |
run.py
# STEP 1: for all fams, print each SGS region (chrom, start, end)
import sys
import subprocess as sub

def run_jobs(commands):
    """
    This function takes a set of max_work commands and executes
    them with rj
    """
    f = open('rungemini.sh', 'w')
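    # (sketch, not the original code) one way run_jobs could continue: write one
    # command per line to rungemini.sh and then execute it; how the "rj" tool
    # mentioned in the docstring is actually invoked is not shown in this preview.
    for cmd in commands:
        f.write(cmd + "\n")
    f.close()
    sub.check_call(["bash", "rungemini.sh"])  # or hand rungemini.sh to rj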
gist:8d9c2767d2495ba4b9bf6f555d29c088
# make sure all rows have 99 fields
$ awk 'BEGIN{FS="\t"} {print NF}' /uufs/chpc.utah.edu/common/home/u1072926/gemini_queries/all.txt | uniq
99
# how many HET (1) and HOM_ALT (3) genotypes were there?
$ awk 'BEGIN{FS="\t"} {print $99}' /uufs/chpc.utah.edu/common/home/u1072926/gemini_queries/all.txt | sort | uniq -c
# get rid of headers except for the first one
(head -n 1 /uufs/chpc.utah.edu/common/home/u1072926/gemini_queries/all.txt; grep -v gt_types /uufs/chpc.utah.edu/common/home/u1072926/gemini_queries/all.txt)
make-simrep-micsat-bed.sh
# download simple repeats from UCSC and convert to BED
curl -s http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/simpleRepeat.txt.gz \
  | gzcat \
  | cut -f 2-5 \
  > simrep.hg19.bed
# download microsatellites from UCSC and convert to BED
curl -s http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/microsat.txt.gz \
  | gzcat \
  | cut -f 2-5 \
make-humvar-snps.bed
# get HumVar
wget ftp://genetics.bwh.harvard.edu/pph2/training/humvar-2011_12.predictions.tar.gz
tar -zxvf humvar-2011_12.predictions.tar.gz
# get dbSNP
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp138.txt.gz
gunzip snp138.txt.gz
# get the deleterious SNPs
grep snp138.txt -wFf <(grep rs humvar-2011_12.deleterious.pph.output | cut -f 5) \
sim.R
tosses <- 200
experiments <- 1000
hist((rbinom(experiments, tosses, 0.5) / tosses),
     breaks=20, xlim=c(0,1),
     main=paste("Distribution of % heads from", experiments,
                "experiments with", tosses, "tosses each"),
     xlab = "Fraction of tosses that were heads")