Aaron Quinlan arq5x

@arq5x
arq5x / make-humvar-snps.bed
Last active February 6, 2016 21:52
Make a BED file of HumVar variants with rsIds
# get HumVar
wget ftp://genetics.bwh.harvard.edu/pph2/training/humvar-2011_12.predictions.tar.gz
tar -zxvf humvar-2011_12.predictions.tar.gz
# get db snp
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp138.txt.gz
gunzip snp138.txt.gz
# get the deleterious SNPs
grep -wFf <(grep rs humvar-2011_12.deleterious.pph.output | cut -f 5) snp138.txt \
arq5x / sim.R
Created February 3, 2016 03:45
Binomial coin toss simulation
tosses <- 200
experiments <- 1000
hist(rbinom(experiments, tosses, 0.5) / tosses,
     breaks = 20, xlim = c(0, 1),
     main = paste("Distribution of fraction of heads from", experiments,
                  "experiments with", tosses, "tosses each"),
     xlab = "Fraction of tosses that were heads")
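The same experiment can be sketched without R. The version below is a rough equivalent using only Python's standard library; the function name `simulate_head_fractions` and the `seed` parameter are illustrative and not part of the original gist, and it returns the fractions rather than plotting them.

```python
import random


def simulate_head_fractions(experiments=1000, tosses=200, p=0.5, seed=None):
    """Return the fraction of heads observed in each experiment."""
    rng = random.Random(seed)  # seed is an added convenience for reproducibility
    fractions = []
    for _ in range(experiments):
        # Each toss is a Bernoulli trial that comes up heads with probability p.
        heads = sum(1 for _ in range(tosses) if rng.random() < p)
        fractions.append(heads / tosses)
    return fractions


fractions = simulate_head_fractions(seed=42)
mean = sum(fractions) / len(fractions)
print(f"{len(fractions)} experiments, mean fraction of heads = {mean:.3f}")
```

With 200 tosses per experiment the fractions should cluster around 0.5 with a standard deviation of about sqrt(0.25 / 200) ≈ 0.035, which is why the histogram looks narrow on a 0 to 1 axis.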
arq5x / methods.sh
Last active March 14, 2018 20:05
breast-cancer-evolution-cnv-segmentation
# bedtools --version
# bedtools v2.24.0-14-gaa11ef9
########################################################
# Create a BED file of 5kb windows with 2.5kb overlap
# tiling build 37 (hg19) of the human genome
########################################################
bedtools makewindows -g hg19.txt -w 5000 -s 2500 > hg19.w5k.s2.5k.bedg
########################################################
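To make the windowing concrete, here is a small Python sketch of what a 5 kb window with a 2.5 kb step produces. It assumes 0-based, half-open BED coordinates and that windows running past the end of a chromosome are clipped to the chromosome length, which is my understanding of `bedtools makewindows`. The `make_windows` function and the toy chromosome are illustrative only.

```python
def make_windows(chrom_sizes, window=5000, step=2500):
    """Yield (chrom, start, end) half-open windows tiling each chromosome."""
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, step):
            # Clip the last windows so they never run past the chromosome end.
            yield chrom, start, min(start + window, size)


# Hypothetical 12 kb chromosome, just to show the overlap pattern.
toy = {"chrToy": 12000}
for chrom, start, end in make_windows(toy):
    print(chrom, start, end, sep="\t")
```

Each base away from the chromosome ends is covered by two windows, since the step is half the window size.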
arq5x / example.sh
Last active January 24, 2019 13:02
Natural sort a VCF
chmod a+x vcfsort.sh
./vcfsort.sh trio.trim.vep.vcf.gz
arq5x / example.sh
Created April 24, 2015 16:22
aws s3 CLI
sudo pip install awscli
aws configure
aws s3 ls
aws s3 ls s3://gqt-data
arq5x / example.sh
Created April 4, 2015 20:26
minimum tiling path
cat ivl.bed
chr1 10 30
cat data.bed
chr1 9 20 d1
chr1 12 18 d2
chr1 12 20 d3
chr1 15 16 d4
chr1 25 40 d5
chr1 26 30 d6
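The preview stops before showing how the tiling path is computed, so the gist's actual method is not visible here. One common approach is a greedy sweep: at each position, pick the interval that starts at or before the current position and reaches furthest right. The sketch below assumes half-open BED coordinates and a single chromosome; `greedy_tiling` and its gap reporting are illustrative choices, not the gist's code.

```python
def greedy_tiling(target, intervals):
    """Greedily choose intervals covering target; also report uncovered gaps.

    target: (start, end); intervals: list of (start, end, name), half-open.
    """
    t_start, t_end = target
    ivs = sorted(intervals)
    chosen, gaps = [], []
    pos, i, n = t_start, 0, len(ivs)
    while pos < t_end:
        best = None
        # Among intervals starting at or before pos, keep the furthest-reaching.
        while i < n and ivs[i][0] <= pos:
            if ivs[i][1] > pos and (best is None or ivs[i][1] > best[1]):
                best = ivs[i]
            i += 1
        if best is None:
            # Nothing covers pos: record the gap and jump to the next start.
            if i >= n or ivs[i][0] >= t_end:
                gaps.append((pos, t_end))
                break
            gaps.append((pos, ivs[i][0]))
            pos = ivs[i][0]
            continue
        chosen.append(best)
        pos = best[1]
    return chosen, gaps


data = [(9, 20, "d1"), (12, 18, "d2"), (12, 20, "d3"),
        (15, 16, "d4"), (25, 40, "d5"), (26, 30, "d6")]
chosen, gaps = greedy_tiling((10, 30), data)
print([name for _, _, name in chosen], gaps)
```

With only the six rows visible above, this picks `d1` and `d5` and reports positions 20 to 25 as uncovered, because no listed interval spans that stretch.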
arq5x / cl.py
Last active August 29, 2015 14:15
Python simulation of Chutes and Ladders
import sys
import numpy as np
"""
Simulate chutes and ladders.
Reports the number of moves it takes one player to reach the end,
followed by the list of that player's rolls.
Run as follows for 100000 games with 1 player. Report the total
number of moves made by the winning player:
arq5x / complexity.py
Last active February 6, 2019 21:14
kmer fun with jellyfish
import sys
from itertools import *
"""
compute the complexity of each kmer passed in
given the format of the output of `jellyfish dump -ct`
complexity is measured as the number of runs divided
by the total length of the sequence.
e.g., "AAAAA" would be 1/5
and "ACTGC" would be 5/5
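The measure described in the docstring can be sketched in a few lines. This is a minimal illustration of "runs divided by length" using `itertools.groupby`, not the gist's own implementation (which is cut off here); the function name `complexity` and the empty-string check are assumptions.

```python
from itertools import groupby


def complexity(kmer):
    """Number of runs of identical bases divided by the k-mer length."""
    if not kmer:
        raise ValueError("kmer must be non-empty")
    # groupby yields one group per maximal run of consecutive equal characters.
    runs = sum(1 for _ in groupby(kmer))
    return runs / len(kmer)


print(complexity("AAAAA"))  # one run over five bases
print(complexity("ACTGC"))  # five runs over five bases
```

Homopolymers score lowest, and any k-mer with no two adjacent identical bases scores 1.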
arq5x / table_s1.txt
Created January 2, 2015 22:55
Vogelstein Table S1
Cancer_type Lifetime_cancer_incidence Total_cells_tissue Total_Stem_Cells Stem_cell_divisions_per_year Stem_cell_divisions_per_lifetime LCSD
ALL 0.0041 3000000000000 135000000 12 960 129900000000
BCC 0.3 180000000000 5820000000 7.6 608 3550000000000
CLL 0.0052 3000000000000 135000000 12 960 129900000000
Colorectal 0.048 30000000000 200000000 73 5840 1168000000000
Colorectal_FAP 1 30000000000 200000000 73 5840 1168000000000
Colorectal_Lynch 0.5 30000000000 200000000 73 5840 1168000000000
Duodenum_adenocarcinoma 0.0003 680000000 4000000 24 1947 7796000000
Duodenum_adenocarcinoma_with_FAP 0.035 680000000 4000000 24 1947 7796000000
Esophageal_squamous_cell_carcinoma 0.001938 3240000000 846000 17.4 1390 1203000000
arq5x / workflow.sh
Last active August 29, 2015 14:12
big multi-file intersect examples
# 1. Download BED files of 349 DHS experiments from Science, 337, no. 6099, pp. 1190-1195, 7 Sep. 2012
# http://www.uwencode.org/proj/Science_Maurano_Humbert_et_al/
wget http://www.uwencode.org/proj/Science_Maurano_Humbert_et_al/data/all_fdr0.05_hot.tgz
# 2. Unpack.
tar -zxvf all_fdr0.05_hot.tgz
# 3. Make sure all of the files are sorted lexicographically by chrom, then numerically by start.
# This is required for the sweep algorithm.
# Hint: they are sorted correctly, this is just a sanity check.
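The sort requirement in step 3 can be checked programmatically. The sketch below is one way to do it in Python and is not the command the gist uses (the preview ends before that). It assumes whitespace-delimited BED lines and compares chromosome names by code point, which corresponds to a byte-wise (C locale) lexicographic sort. The name `is_sorted_bed` is illustrative.

```python
def is_sorted_bed(lines):
    """True if records are ordered by chrom (lexicographic), then start (numeric)."""
    prev = None
    for line in lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        key = (fields[0], int(fields[1]))
        if prev is not None and key < prev:
            return False
        prev = key
    return True


good = ["chr1\t10\t20", "chr1\t15\t30", "chr10\t5\t9", "chr2\t1\t4"]
bad = ["chr1\t15\t30", "chr1\t10\t20"]
print(is_sorted_bed(good), is_sorted_bed(bad))
```

Note that lexicographic order places `chr10` before `chr2`, so a file sorted in natural chromosome order would fail this check.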