Aaron Quinlan arq5x
@arq5x
arq5x / multi.yaml
Created December 4, 2014 23:42
multiline
attributes:
  name: cpg
  version: 0.1
recipe:
  full:
    recipe_type: bash
    recipe_cmds:
      - >
        mysql --user=genome --host=genome-mysql.cse.ucsc.edu \
arq5x / workflow.sh
Last active August 29, 2015 14:12
big multi-file intersect examples
# 1. Download BED files of 349 DHS experiments from Science, 337, no. 6099, pp. 1190-1195, 7 Sep. 2012
# http://www.uwencode.org/proj/Science_Maurano_Humbert_et_al/
wget http://www.uwencode.org/proj/Science_Maurano_Humbert_et_al/data/all_fdr0.05_hot.tgz
# 2. Unpack.
tar -zxvf all_fdr0.05_hot.tgz
# 3. Make sure all of the files are sorted lexicographically by chrom, then numerically by start.
# This is required for the sweep algorithm.
# Hint: they are sorted correctly, this is just a sanity check.
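
The sanity check described in step 3 could be sketched in Python as follows. This is a minimal illustration, not the gist's own code: the function name `is_bed_sorted` is hypothetical, and it assumes whitespace-delimited BED lines with the chromosome in column 1 and an integer start in column 2. Python compares strings by code point, which should match `sort` run under the C locale.

```python
from typing import Iterable


def is_bed_sorted(lines: Iterable[str]) -> bool:
    """Return True if BED lines are ordered by chrom (lexicographic), then start (numeric)."""
    prev = None
    for line in lines:
        # Skip blank lines and common header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        key = (fields[0], int(fields[1]))
        # Tuples compare chrom first, then start, mirroring the required order.
        if prev is not None and key < prev:
            return False
        prev = key
    return True


if __name__ == "__main__":
    # Invented example records, not taken from the DHS data set.
    sorted_bed = ["chr1\t10\t30", "chr1\t25\t40", "chr10\t5\t9", "chr2\t1\t4"]
    unsorted_bed = ["chr1\t25\t40", "chr1\t10\t30"]
    print(is_bed_sorted(sorted_bed))
    print(is_bed_sorted(unsorted_bed))
```

To check a file on disk, pass an open file handle, for example `is_bed_sorted(open(path))`, where `path` is one of the unpacked BED files.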
arq5x / table_s1.txt
Created January 2, 2015 22:55
Vogelstein Table S1
Cancer_type Lifetime_cancer_incidence Total_cells_tissue Total_Stem_Cells Stem_cell_divisions_per_year Stem_cell_divisions_per_lifetime LCSD
ALL 0.0041 3000000000000 135000000 12 960 129900000000
BCC 0.3 180000000000 5820000000 7.6 608 3550000000000
CLL 0.0052 3000000000000 135000000 12 960 129900000000
Colorectal 0.048 30000000000 200000000 73 5840 1168000000000
Colorectal_FAP 1 30000000000 200000000 73 5840 1168000000000
Colorectal_Lynch 0.5 30000000000 200000000 73 5840 1168000000000
Duodenum_adenocarcinoma 0.0003 680000000 4000000 24 1947 7796000000
Duodenum_adenocarcinoma_with_FAP 0.035 680000000 4000000 24 1947 7796000000
Esophageal_squamous_cell_carcinoma 0.001938 3240000000 846000 17.4 1390 1203000000
arq5x / cl.py
Last active August 29, 2015 14:15
Python simulation of Chutes and Ladders
import sys
import numpy as np
"""
Simulate chutes and ladders.
Reports the number of moves a single player needs to reach the end,
followed by the list of that player's rolls.
Run as follows for 100000 games with 1 player. Report the total
number of moves made by the winning player:
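
A single-player game of this kind could be sketched as below. This is not the gist's implementation: it uses the standard-library `random` module rather than numpy, the jump table is the layout commonly cited for the classic 10x10 board (an assumption to verify against your own board), and the rule that an overshooting roll is forfeited is also assumed. The name `play_game` is hypothetical.

```python
import random

# Assumed jump table for the classic 10x10 board; verify against your board.
LADDERS = {1: 38, 4: 14, 9: 31, 21: 42, 28: 84, 36: 44, 51: 67, 71: 91, 80: 100}
CHUTES = {16: 6, 47: 26, 49: 11, 56: 53, 62: 19, 64: 60, 87: 24, 93: 73, 95: 75, 98: 78}
JUMPS = {**LADDERS, **CHUTES}


def play_game(rng, goal=100):
    """Play one single-player game; return (number of moves, list of rolls)."""
    position, rolls = 0, []
    while position < goal:
        roll = rng.randint(1, 6)  # one fair six-sided die
        rolls.append(roll)
        # Assumed rule: a roll that would overshoot the goal is forfeited.
        if position + roll <= goal:
            landed = position + roll
            # Follow a ladder or chute if the landing square has one.
            position = JUMPS.get(landed, landed)
    return len(rolls), rolls


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    counts = [play_game(rng)[0] for _ in range(1000)]
    print("mean moves:", sum(counts) / len(counts))
```

Passing an explicit `random.Random` instance keeps each simulation reproducible and makes it easy to run many independent games.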
arq5x / example.sh
Created April 4, 2015 20:26
minimum tiling path
cat ivl.bed
chr1 10 30
cat data.bed
chr1 9 20 d1
chr1 12 18 d2
chr1 12 20 d3
chr1 15 16 d4
chr1 25 40 d5
chr1 26 30 d6
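
One way to read "minimum tiling path" is as a greedy interval cover: starting at the left edge of the target interval, repeatedly pick the overlapping record that reaches farthest to the right. The sketch below illustrates that idea on the records shown above. It is an assumption about the intent, not the gist's method: `greedy_tiling` is a hypothetical name, all records are assumed to lie on one chromosome, and stretches that no record spans are skipped rather than treated as an error.

```python
def greedy_tiling(target, intervals):
    """Greedily pick (start, end, name) intervals to cover target = (start, end).

    Stretches that no interval spans are skipped rather than treated as failure.
    """
    t_start, t_end = target
    # Only intervals overlapping the target are candidates.
    pool = sorted(iv for iv in intervals if iv[1] > t_start and iv[0] < t_end)
    chosen, pos = [], t_start
    while pos < t_end:
        # Among intervals covering pos, take the one reaching farthest right.
        reach = [iv for iv in pool if iv[0] <= pos < iv[1]]
        if reach:
            best = max(reach, key=lambda iv: iv[1])
            chosen.append(best)
            pos = best[1]
        else:
            # Nothing covers pos: jump to the next interval start, if any.
            later = [iv[0] for iv in pool if iv[0] > pos]
            if not later:
                break
            pos = min(later)
    return chosen


if __name__ == "__main__":
    data = [(9, 20, "d1"), (12, 18, "d2"), (12, 20, "d3"),
            (15, 16, "d4"), (25, 40, "d5"), (26, 30, "d6")]
    path = greedy_tiling((10, 30), data)
    print([name for _, _, name in path])  # prints ['d1', 'd5']
```

On this data the sketch selects d1 and d5; positions 20 to 25 stay uncovered because no record in data.bed spans them.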
arq5x / example.sh
Created April 24, 2015 16:22
aws s3 CLI
sudo pip install awscli
aws configure
aws s3 ls
aws s3 ls s3://gqt-data
arq5x / ggplot2.tcga_and_1kg_cpv.R
Created January 7, 2011 14:19
Example of using qplot for a bar plot colored by sample or populations
library(ggplot2)
library(gridExtra)
cov <- read.table("/Users/arq5x/Documents/Projects/HallLab/TCGA-1KG/ForKeystone/tcga_and_1kg_span_cov.txt",header=TRUE)
span <- qplot(sample, span_cov, data=cov, fill=factor(type_num), geom="bar",
binwidth=1,
xlab="Sample",
ylab="Spanning coverage") +
opts(axis.ticks = theme_blank(),
axis.text.x = theme_blank(),
axis.title.x = theme_text(size = 18, face = "bold"),
arq5x / rs-exome-pbs.sh
Created February 7, 2011 18:20
RS Exome analysis on the PBS environment
export BATCH1="1094PC0005 1094PC0009 1094PC0012 1094PC0013 "
export BATCH2="1094PC0016 1094PC0017 1094PC0018 1094PC0019 \
1094PC0020 1094PC0021 1094PC0022 1094PC0023 1094PC0025 "
export BATCH3="1478PC0001B 1478PC0002 1478PC0003 1478PC0004 \
1478PC0005 1478PC0006B 1478PC0007B 1478PC0008B \
1478PC0009B 1478PC0010 1478PC0011 1478PC0012 \
1478PC0013B 1478PC0014B 1478PC0015B 1478PC0016 \
1478PC0017B 1478PC0018 1478PC0019 1478PC0020 \
1478PC0021 1478PC0022B 1478PC0023B 1478PC0024B"
export BATCH4="1719PC0001 1719PC0002 1719PC0003 1719PC0004 \
arq5x / t1d-exome.hg19.sh
Last active September 24, 2015 22:47
T1D Loci Exome for all 7 pools on the PBS environment
############################################################
# Pair the alignments.
# Keep proper, on-target (i.e. +/- 500 bp of a probe) pairs.
# Require mapping quality >= 20
############################################################
export DIR=/home/arq5x/cphg-home/projects/t1d/t1d-exome-suna/
export STEPNAME=t1d-ex-bwa-par
export GENOME=/home/arq5x/cphg-home/shared/genomes/hg19/bwa/gatk/hg19_gatk.fa
# Step 1: Get transcripts from UCSC refGene (hg19) into a BED file.
# Notes:
# the awk statement reorders the "raw" columns into BED12 format
# bed12ToBed6 converts the BED12 into discrete BED6 entries for each exon
# - the -n option is new and in the bedtools repository
$ curl -s http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz | \
zcat | \
awk '{OFS="\t"; print $3,$5,$6,$2,$9,$4,$7,$8,"0",$9,$10,$11}' | \
bed12ToBed6 -n \
> refGene.bed
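
For readers who want to see the column mapping spelled out, here is a hedged Python sketch of a refGene-to-BED12 conversion. It assumes the UCSC refGene column order (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...) and the standard BED12 convention that the last two fields hold block sizes and block starts relative to chromStart, so it derives both from exonStarts and exonEnds. The function name `refgene_to_bed12` and the sample row are invented for illustration, and the score column is set to 0 as a placeholder, whereas the awk command above puts the exon count there.

```python
def refgene_to_bed12(row: str) -> str:
    """Convert one tab-delimited refGene.txt row into a BED12 line."""
    f = row.rstrip("\n").split("\t")
    name, chrom, strand = f[1], f[2], f[3]
    tx_start, tx_end = int(f[4]), int(f[5])
    cds_start, cds_end = f[6], f[7]
    exon_count = f[8]
    # exonStarts/exonEnds are comma-separated lists with a trailing comma.
    starts = [int(s) for s in f[9].split(",") if s]
    ends = [int(e) for e in f[10].split(",") if e]
    # BED12 blocks: sizes, plus starts relative to the transcript start.
    block_sizes = ",".join(str(e - s) for s, e in zip(starts, ends))
    block_starts = ",".join(str(s - tx_start) for s in starts)
    return "\t".join([chrom, str(tx_start), str(tx_end), name, "0", strand,
                      cds_start, cds_end, "0", exon_count,
                      block_sizes, block_starts])


if __name__ == "__main__":
    # Invented two-exon transcript, not a real refGene record.
    row = ("0\tNM_TEST\tchr1\t+\t100\t500\t150\t450\t2\t"
           "100,300,\t200,500,\t0\tGENE\tcmpl\tcmpl\t0,1,")
    print(refgene_to_bed12(row))
```

The resulting lines can be written to a file and passed to `bed12ToBed6` in the same way as the output of the awk step.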