Aaron Quinlan arq5x

@arq5x
arq5x / go.sh
Last active Feb 3, 2020
Compute average scores for shared intervals
View go.sh
cat a.bed
chr1 10 50 10
cat b.bed
chr1 20 40 20
cat c.bed
chr1 30 33 30
# Find the sub-intervals shared and unique to each file.
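The step described in the comment above can be illustrated in plain Python, independent of whatever tool the gist itself goes on to use. This is a minimal sketch: the function name `sub_intervals` and the in-memory `files` dictionary are hypothetical, and it assumes each file's intervals do not overlap one another. It cuts the chromosome at every interval boundary, records which files cover each piece, and averages their scores.

```python
def sub_intervals(files):
    """files maps a file name to a list of (chrom, start, end, score) tuples.

    Returns (chrom, start, end, [names], mean_score) for every sub-interval
    covered by at least one file. Hypothetical helper, for illustration only.
    """
    out = []
    chroms = sorted({iv[0] for ivs in files.values() for iv in ivs})
    for chrom in chroms:
        # Every interval boundary on this chromosome is a potential cut point.
        points = sorted({p for ivs in files.values()
                         for c, s, e, _ in ivs if c == chrom
                         for p in (s, e)})
        for start, end in zip(points, points[1:]):
            # Files whose interval covers the whole segment [start, end).
            hits = [(name, score)
                    for name, ivs in sorted(files.items())
                    for c, s, e, score in ivs
                    if c == chrom and s <= start and end <= e]
            if hits:
                names = [name for name, _ in hits]
                mean = sum(score for _, score in hits) / len(hits)
                out.append((chrom, start, end, names, mean))
    return out


# The three BED files shown above, held in memory.
files = {
    "a": [("chr1", 10, 50, 10)],
    "b": [("chr1", 20, 40, 20)],
    "c": [("chr1", 30, 33, 30)],
}

for row in sub_intervals(files):
    print(*row, sep="\t")
```

With the example data, the segment 30-33 is shared by all three files, 20-30 and 33-40 are shared by a and b, and 10-20 and 40-50 are unique to a.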
arq5x / notes.md
Created Mar 8, 2017
Jim's Thesis Talk
View notes.md
  1. State from the beginning that you are focusing on exome
  2. Talk through the pLI figure in more detail
  3. Any slide needs to be talked through in detail if you present it
  4. Make it clear which missense variants you are seeing
  5. Should the observed CpG density be modulated by the codon content of the region?
  6. Model ideas?
    • Exclude singletons???
    • use just synonymous density as the independent variable
  7. Molly Przeworski, CpG
  8. Simplify the explanation of each of the essential gene descriptions
View merge-callable.py
#!/usr/bin/env python
"""
given a callable file from goleft depth on stdin, merge adjacent LOW_COVERAGE and CALLABLE regions.
also split regions that are larger than max_region.
this is to even out parallelism for sending regions to freebayes.
"""
from __future__ import print_function, division
import sys
from itertools import groupby
from operator import itemgetter
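The preview stops before the script body, so the following is only a sketch of the two operations the docstring names, not the author's implementation. It assumes Python 3, position-sorted input rows of `(chrom, start, end, label)`, and that any label other than `LOW_COVERAGE` or `CALLABLE` simply breaks a run and is dropped. The function names and the `max_region` value used below are hypothetical.

```python
MERGEABLE = {"LOW_COVERAGE", "CALLABLE"}


def merge_adjacent(rows):
    """Merge runs of touching LOW_COVERAGE/CALLABLE regions on one chromosome."""
    cur = None
    for chrom, start, end, label in rows:
        if label not in MERGEABLE:
            # Assumption: other labels end the current run and are skipped.
            if cur:
                yield cur
                cur = None
            continue
        if cur and cur[0] == chrom and cur[2] == start:
            cur = (chrom, cur[1], end)  # extend the current run
        else:
            if cur:
                yield cur
            cur = (chrom, start, end)
    if cur:
        yield cur


def split_region(chrom, start, end, max_region):
    """Cut one region into consecutive pieces no longer than max_region."""
    for s in range(start, end, max_region):
        yield (chrom, s, min(s + max_region, end))


def merge_and_split(rows, max_region):
    for region in merge_adjacent(rows):
        yield from split_region(*region, max_region)


rows = [
    ("chr1", 0, 100, "CALLABLE"),
    ("chr1", 100, 250, "LOW_COVERAGE"),
    ("chr1", 250, 300, "NO_COVERAGE"),
    ("chr1", 300, 350, "CALLABLE"),
]
for chrom, start, end in merge_and_split(rows, max_region=100):
    print(chrom, start, end, sep="\t")
```

Capping the region size keeps any single downstream job from dominating the run time, which is the parallelism concern the docstring mentions.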
arq5x / gist:0dbc7f40a29c898f79482b98ace232a8
Created Nov 13, 2016
Union EIEE Genes on GeneDx and Invitae Gene Panels
View gist:0dbc7f40a29c898f79482b98ace232a8
# GeneDx: https://www.genedx.com/test-catalog/disorders/early-onset-epileptic-encephalopathy-andor-infantile-spasms/
# Invitae: https://www.invitae.com/en/physician/tests/03402/
ADSL
ALDH7A1
ALG13
ARHGEF9
ARID1B
ARX
ATP1A2
ATP6AP2
arq5x / run.py
Created Nov 10, 2016
python batch submission
View run.py
# STEP 1: for all fams, print each SGS region (chrom, start, end)
import sys
import subprocess as sub
def run_jobs(commands):
    """
    This function takes a set of max_work commands and executes
    them with rj
    """
    f = open('rungemini.sh', 'w')
View gist:8d9c2767d2495ba4b9bf6f555d29c088
# make sure all rows have 99 fields
$ awk 'BEGIN{FS="\t"} {print NF}' /uufs/chpc.utah.edu/common/home/u1072926/gemini_queries/all.txt | uniq
99
# how many HET (1) and HOM_ALT (3) genotypes were there?
$ awk 'BEGIN{FS="\t"} {print $99}' /uufs/chpc.utah.edu/common/home/u1072926/gemini_queries/all.txt | sort | uniq -c
# get rid of headers except for the first one
(head -n 1 /uufs/chpc.utah.edu/common/home/u1072926/gemini_queries/all.txt; grep -v gt_types /uufs/chpc.utah.edu/common/home/u1072926/gemini_queries/all.txt)
View make-simrep-micsat-bed.sh
# download simple repeats from UCSC and convert to BED
curl -s http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/simpleRepeat.txt.gz \
| gzcat \
| cut -f 2-5 \
> simrep.hg19.bed
# download microsatellites from UCSC and convert to bed
curl -s http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/microsat.txt.gz \
| gzcat \
| cut -f 2-5 \
arq5x / make-humvar-snps.bed
Last active Feb 6, 2016
Make a BED file of HumVar variants with rsIds
View make-humvar-snps.bed
# get HumVar
wget ftp://genetics.bwh.harvard.edu/pph2/training/humvar-2011_12.predictions.tar.gz
tar -zxvf humvar-2011_12.predictions.tar.gz
# get db snp
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp138.txt.gz
gunzip snp138.txt.gz
# get the deleterious SNPs
grep snp138.txt -wFf <(grep rs humvar-2011_12.deleterious.pph.output | cut -f 5) \
arq5x / sim.R
Created Feb 3, 2016
Binomial coin toss simulation
View sim.R
tosses <- 200
experiments <- 1000
hist(rbinom(experiments, tosses, 0.5) / tosses,
     breaks = 20, xlim = c(0, 1),
     main = paste("Distribution of % heads from", experiments,
                  "experiments with", tosses, "tosses each"),
     xlab = "Fraction of tosses that were heads")
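For readers without R, the same experiment can be sketched with Python's standard library. The helper names `fraction_heads` and `bin_counts` are hypothetical, and the counts are printed as text rather than plotted.

```python
import random


def fraction_heads(experiments=1000, tosses=200, p=0.5, seed=None):
    """Return the fraction of heads observed in each simulated experiment."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(tosses)) / tosses
        for _ in range(experiments)
    ]


def bin_counts(values, bins=20):
    """Count values in equal-width bins spanning [0, 1]."""
    counts = [0] * bins
    for v in values:
        # A value of exactly 1.0 goes in the last bin.
        counts[min(int(v * bins), bins - 1)] += 1
    return counts


fracs = fraction_heads(seed=1)
for i, n in enumerate(bin_counts(fracs)):
    if n:
        print(f"{i / 20:.2f}-{(i + 1) / 20:.2f}  {n}")
```

With 200 tosses per experiment, the fractions cluster tightly around 0.5; lowering `tosses` widens the spread.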