Aaron Quinlan arq5x

arq5x /
Last active Feb 3, 2020
compute average scores for shared intervals
cat a.bed
chr1 10 50 10
cat b.bed
chr1 20 40 20
cat c.bed
chr1 30 33 30
# Find the sub-intervals shared and unique to each file.
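
The preview stops before the actual command, so the following is only a minimal Python sketch of the idea, not the gist's own method. It assumes each file holds intervals on a single chromosome, that intervals within one file do not overlap each other, and that the fourth BED column is the score. The names `Interval` and `shared_subintervals` are hypothetical. It cuts the axis at every start and end coordinate, records which files cover each piece, and averages their scores.

```python
from collections import namedtuple

# Hypothetical record type: one BED line with a score in column 4.
Interval = namedtuple("Interval", "chrom start end score")


def shared_subintervals(files):
    """Split the axis at every breakpoint and average the covering scores.

    files: dict mapping a label (e.g. "a") to a list of Interval records.
    Assumes one chromosome and no overlaps within a single file.
    Returns a list of (start, end, sorted labels, mean score) tuples.
    """
    # Every interval start and end is a potential boundary of a sub-interval.
    points = sorted({p for ivs in files.values()
                     for iv in ivs for p in (iv.start, iv.end)})
    result = []
    for left, right in zip(points, points[1:]):
        # Scores of the files whose interval fully covers [left, right).
        hits = {name: iv.score
                for name, ivs in files.items()
                for iv in ivs
                if iv.start <= left and right <= iv.end}
        if hits:
            mean = sum(hits.values()) / len(hits)
            result.append((left, right, sorted(hits), mean))
    return result


# The three example files shown above.
files = {
    "a": [Interval("chr1", 10, 50, 10)],
    "b": [Interval("chr1", 20, 40, 20)],
    "c": [Interval("chr1", 30, 33, 30)],
}
for row in shared_subintervals(files):
    print(*row, sep="\t")
```

With the example input, the piece from 30 to 33 is covered by all three files (mean score 20), while 10 to 20 and 40 to 50 are unique to a.bed (mean score 10).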
arq5x /
Created Mar 8, 2017
Jim's Thesis Talk
  1. State from the beginning that you are focusing on exome
  2. Talk through the pLI figure in more detail
  3. Any slide needs to be talked through in detail if you present it
  4. Make it clear what missense variants are and what you are seeing
  5. Should the observed CpG density be modulated by the codon content of the region?
  6. Model ideas?
    • Exclude singletons???
    • use just synonymous density as the independent variable
  7. Molly Przeworski, CpG
  8. Simplify the explanation of each of the essential gene descriptions
#!/usr/bin/env python
"""
given a callable file from goleft depth on stdin, merge adjacent LOW_COVERAGE and CALLABLE regions.
also split regions that are larger than max_region.
this is to even out parallelism for sending regions to freebayes.
"""
from __future__ import print_function, division
import sys
from itertools import groupby
from operator import itemgetter
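
The body of the script is not shown in the preview. The sketch below is one possible reading of the docstring, not the author's code. It assumes each input row is a `(chrom, start, end, label)` tuple sorted by chromosome and start, that rows with any other label are dropped, and that "adjacent" means one region ends exactly where the next begins. The function names, the `NO_COVERAGE` label in the example, and the `max_region` default are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Labels to keep, taken from the docstring above.
KEEP = {"LOW_COVERAGE", "CALLABLE"}


def split_region(chrom, start, end, max_region):
    """Yield consecutive chunks of at most max_region bases."""
    for s in range(start, end, max_region):
        yield (chrom, s, min(s + max_region, end))


def merge_and_split(rows, max_region=1000000):
    """Merge adjacent kept regions, then split any that exceed max_region.

    rows: iterable of (chrom, start, end, label), sorted by chrom and start.
    """
    kept = [r for r in rows if r[3] in KEEP]
    for chrom, group in groupby(kept, key=itemgetter(0)):
        cur_start = cur_end = None
        for _, start, end, _ in group:
            if cur_end is not None and start == cur_end:
                cur_end = end  # adjacent to the open region: extend it
                continue
            if cur_end is not None:
                # Gap found: flush the open region before starting a new one.
                yield from split_region(chrom, cur_start, cur_end, max_region)
            cur_start, cur_end = start, end
        if cur_end is not None:
            yield from split_region(chrom, cur_start, cur_end, max_region)


# Made-up example rows; the coordinates are illustrative only.
rows = [
    ("chr1", 0, 100, "CALLABLE"),
    ("chr1", 100, 250, "LOW_COVERAGE"),
    ("chr1", 250, 300, "NO_COVERAGE"),
    ("chr1", 300, 350, "CALLABLE"),
    ("chr2", 0, 50, "CALLABLE"),
]
for region in merge_and_split(rows, max_region=100):
    print(*region, sep="\t")
```

Capping the region size keeps any single job from dominating the run time when the regions are handed out in parallel.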
arq5x / gist:0dbc7f40a29c898f79482b98ace232a8
Created Nov 13, 2016
Union EIEE Genes on GeneDx and Invitae Gene Panels
# GeneDx:
# Invitae:
arq5x /
Created Nov 10, 2016
python batch submission
# STEP 1: for all fams, print each SGS region (chrom, start, end)
import sys
import subprocess as sub
def run_jobs(commands):
    """
    This function takes a set of max_work commands and executes
    them with rj
    """
    f = open('', 'w')
View gist:8d9c2767d2495ba4b9bf6f555d29c088
# make sure all rows have 99 fields
$ awk 'BEGIN{FS="\t"} {print NF}' /uufs/ | uniq
# how many HET (1) and HOM_ALT (3) genotypes were there?
$ awk 'BEGIN{FS="\t"} {print $99}' /uufs/ | sort | uniq -c
# get rid of headers except for the first one
(head -n 1 /uufs/; grep -v gt_types /uufs/
# download simple repeats from UCSC and convert to BED
curl -s \
| gzcat \
| cut -f 2-5 \
> simrep.hg19.bed
# download microsatellites from UCSC and convert to BED
curl -s \
| gzcat \
| cut -f 2-5 \
arq5x / make-humvar-snps.bed
Last active Feb 6, 2016
Make a BED file of HumVar variants with rsIds
# get HumVar
tar -zxvf humvar-2011_12.predictions.tar.gz
# get dbSNP
gunzip snp138.txt.gz
# get the deleterious SNPs
grep snp138.txt -wFf <(grep rs humvar-2011_12.deleterious.pph.output | cut -f 5) \
arq5x / sim.R
Created Feb 3, 2016
Binomial coin toss simulation
tosses <- 200
experiments <- 1000
hist((rbinom(experiments, tosses, 0.5) / tosses),
breaks=20, xlim=c(0,1),
main=paste("Distribution of % heads from", experiments,
"experiments with", tosses, "tosses each"),
xlab = "Fraction of tosses that were heads")
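
For readers without R, the same experiment can be sketched with the Python standard library. This is an illustrative translation under stated assumptions, not part of the original gist: the function name `heads_fractions` and the fixed seed are hypothetical choices, and the histogram is replaced by summary statistics. With 200 fair tosses per experiment, the fraction of heads should centre on 0.5, with a standard deviation close to sqrt(0.5 * 0.5 / 200), about 0.035.

```python
import random
import statistics


def heads_fractions(experiments=1000, tosses=200, p=0.5, seed=0):
    """Return the fraction of heads observed in each simulated experiment."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    return [
        sum(rng.random() < p for _ in range(tosses)) / tosses
        for _ in range(experiments)
    ]


fractions = heads_fractions()
print("mean fraction of heads:", statistics.mean(fractions))
print("standard deviation:", statistics.stdev(fractions))
```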