danielecook / pubmed pairwise.R
Example of PubMed pairwise searching
# Given two lists of terms, let's see how 'hot' they are together
set1 <- c("ebola", "autoimmune", "Diabetes", "HIV", "Glioblastoma", "Asthma", "Schizophrenia")
set2 <- c("C. elegans", "D. melanogaster", "C. japonica", "M. musculus", "S. cerevisiae")
# Generate all possible pairs
pairs <- expand.grid(set1, set2, stringsAsFactors = FALSE)
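A rough Python equivalent of the pairing step, assuming the goal is to turn each pair into a PubMed query string. The `build_queries` helper and the `"term1" AND "term2"` query format are illustrative assumptions, and no search is actually run here.

```python
from itertools import product

set1 = ["ebola", "autoimmune", "Diabetes", "HIV", "Glioblastoma", "Asthma", "Schizophrenia"]
set2 = ["C. elegans", "D. melanogaster", "C. japonica", "M. musculus", "S. cerevisiae"]


def build_queries(terms_a, terms_b):
    """Return one (term_a, term_b, query) tuple for every pair, like expand.grid."""
    # Hypothetical query format: both terms quoted and joined with AND
    return [(a, b, f'"{a}" AND "{b}"') for a, b in product(terms_a, terms_b)]


queries = build_queries(set1, set2)
print(len(queries))   # 35 (7 x 5 pairs)
print(queries[0][2])  # "ebola" AND "C. elegans"
```

The row order differs from `expand.grid` (which varies the first vector fastest), but the set of pairs is the same.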
danielecook / google_calendar.js
Google Apps Script Calendar Reservations
/**
 * Get a user's name by looking up their contact record.
 * @param {String} email - The user's email address.
 * @returns {String} FullName, or UserID
 * if the record is not found in contacts.
 */
function getUserName(email){
var user = ContactsApp.getContact(email);
// If user in contacts, return their name
danielecook /
Heterozygote Polarization - Polarizes heterozygous calls in a VCF file based on a prior likelihood of observing a heterozygous call. Useful for calling variants in organisms with low levels of heterozygosity, as is frequently the case in hermaphroditic organisms such as C. elegans. #VCF
Heterozygote Polarization Script
bcftools view -M 2 <filename> | python | bcftools view -O b > <filename.het.polarized.bcf>
Tags variants 'pushed' to ref or alt as follows:
AA - Pushed towards reference
AB - Kept as het
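The gist doesn't show how the prior is applied. One plausible approach, sketched here purely as an assumption, is to combine the call's phred-scaled genotype likelihoods (the VCF `PL` field, ordered 0/0, 0/1, 1/1) with a low prior on heterozygosity and keep whichever genotype has the highest posterior. The `het_prior` default and the even split of the remaining prior between the two homozygotes are illustrative choices, not values from the original script.

```python
import math

GENOTYPES = ("0/0", "0/1", "1/1")


def polarize_het(pl, het_prior=0.001):
    """Return the genotype with the highest posterior for a biallelic site.

    pl: phred-scaled likelihoods (PL) for 0/0, 0/1 and 1/1.
    het_prior: assumed prior probability of a heterozygous genotype.
    """
    hom_prior = (1 - het_prior) / 2  # remaining prior split evenly between homozygotes
    priors = (hom_prior, het_prior, hom_prior)
    # log10 posterior (up to a shared constant) = -PL/10 + log10(prior)
    scores = [-p / 10 + math.log10(prior) for p, prior in zip(pl, priors)]
    best = max(range(3), key=lambda i: scores[i])
    return GENOTYPES[best]


print(polarize_het((40, 0, 50)))  # 0/1: strong het evidence survives the prior
print(polarize_het((10, 0, 60)))  # 0/0: weak het evidence, pushed towards reference
print(polarize_het((60, 0, 8)))   # 1/1: weak het evidence, pushed towards alternate
```

In terms of the tags above, a call that stays 0/1 would be tagged AB and one pushed to 0/0 would be tagged AA.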
danielecook / plot_peaks.R
Plot Peaks from Homer
library(data.table)  # provides fread()
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 0) {
setwd("/Users/dancook/Dropbox/Andersen lab/LabFolders/Dan/ForOthers/Maneeshi")
df <- fread("Snail2_Twist_MergedPeaksPS1005kbSummit300bp_Sn_TwEBox.txt")
danielecook /
Generate FASTA sequence lengths
awk '/^>/ { if (name != "") print name "\t" len; name = substr($0, 2); len = 0; next } { len += length($0) } END { if (name != "") print name "\t" len }' file.fa
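The same computation as a Python sketch; `fasta_lengths` is a hypothetical helper name. It takes everything after `>` as the record name and sums the lengths of all following sequence lines, so wrapped sequences are counted correctly.

```python
def fasta_lengths(lines):
    """Yield (header, sequence_length) for each record in FASTA-formatted lines."""
    name, length = None, 0
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record
            if name is not None:
                yield name, length
            name, length = line[1:], 0
        else:
            length += len(line.strip())
    # Emit the final record
    if name is not None:
        yield name, length
```

For example, `for name, n in fasta_lengths(open("file.fa")): print(f"{name}\t{n}")` prints the same tab-separated output as the one-liner.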
danielecook / rprofile.R
My .Rprofile
# Place in /Users/Username/.Rprofile
## Create a new invisible environment for all the functions to go in so it doesn't clutter your workspace.
.env <- new.env()
danielecook /
Generates the masked ranges within a fasta file.
import gzip
import io
import sys
import os
# This script generates a BED file of the masked regions in a FASTA file.
# Reads input from STDIN or from files passed as arguments.
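A minimal sketch of the core step, assuming "masked" means soft-masked (lowercase) bases plus hard-masked `N`/`n` runs; the original script may define masking differently. It reports 0-based, half-open intervals, which is the BED convention, and expects the full concatenated sequence of one record. `masked_ranges` is a hypothetical name.

```python
import re

# Assumption: lowercase letters (soft-masked) and N (hard-masked) count as masked
MASKED_RUN = re.compile(r"[a-zN]+")


def masked_ranges(name, sequence):
    """Yield BED-style (chrom, start, end) tuples for runs of masked bases."""
    for match in MASKED_RUN.finditer(sequence):
        # match.start() is 0-based and match.end() is exclusive, as in BED
        yield name, match.start(), match.end()


for chrom, start, end in masked_ranges("chrI", "ACGTnnnnACGTacgtNNAC"):
    print(f"{chrom}\t{start}\t{end}")
```

Note that a soft-masked run directly followed by `N`s is merged into a single interval here.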
danielecook /
Calculate depth of coverage and breadth of coverage from a BAM file, per chromosome and for the entire genome. Additionally, if mtchr (the mitochondrial chromosome name) is provided, nuclear coverage and the ratio of mtDNA:nuclear DNA are calculated. #bam #stats
# This script calculates the depth of coverage and breadth of coverage for a given bam.
# Outputs a dictionary containing the contig/chromosome names and the depth and breadth of coverage for each
# and for the entire genome.
# If you optionally specify the name of the mitochondrial chromosome (e.g. mtDNA, chrM, chrMT),
# the script will also calculate breadth and depth of coverage for the nuclear genome AND the ratio
# of mtDNA:nuclear DNA, which in some cases can act as a proxy for mitochondrial DNA copy number within an individual.
# Author: Daniel E. Cook
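The statistics described above can be sketched from per-base depths, assuming depths are available for every position, for example from `samtools depth -a` output (contig, 1-based position, depth). Here depth of coverage is the mean depth over all positions, breadth is the fraction of positions covered by at least one read, and the mtDNA:nuclear ratio divides mitochondrial depth by nuclear depth. The function and argument names are hypothetical, not the original script's.

```python
from collections import defaultdict


def coverage_stats(rows, mtchr=None):
    """Compute depth and breadth of coverage per contig and genome-wide.

    rows: iterable of (contig, position, depth) covering every position.
    mtchr: optional name of the mitochondrial contig.
    """
    total = defaultdict(int)    # summed depth per contig
    length = defaultdict(int)   # number of positions per contig
    covered = defaultdict(int)  # positions with depth >= 1
    for contig, _pos, depth in rows:
        total[contig] += depth
        length[contig] += 1
        covered[contig] += int(depth > 0)

    def summarize(contigs):
        n = sum(length[c] for c in contigs)
        return {
            "depth": sum(total[c] for c in contigs) / n,
            "breadth": sum(covered[c] for c in contigs) / n,
        }

    stats = {c: summarize([c]) for c in length}
    stats["genome"] = summarize(list(length))
    if mtchr is not None and mtchr in length:
        nuclear = [c for c in length if c != mtchr]
        stats["nuclear"] = summarize(nuclear)
        stats["mt_nuclear_ratio"] = stats[mtchr]["depth"] / stats["nuclear"]["depth"]
    return stats


rows = [("I", 1, 10), ("I", 2, 0), ("I", 3, 20), ("I", 4, 10),
        ("MtDNA", 1, 400), ("MtDNA", 2, 600)]
stats = coverage_stats(rows, mtchr="MtDNA")
print(stats["I"])                 # {'depth': 10.0, 'breadth': 0.75}
print(stats["mt_nuclear_ratio"])  # 50.0
```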
danielecook / import_homer_ChIP_Seq_motif_data.R
Import Homer ChIP-Seq Motif Data
# Daniel Cook 2014
# Use this function to import ChIP-Seq data generated by Homer. This data is generated using Homer's
# command with the '-find <motif file>' argument. The generated output
# looks like this:
# 1. Peak/Region ID
# 2. Chromosome
# 3. Start
danielecook /
Generate test FASTQs for developing a pipeline.
# Generate tiny FASTQs for quick testing.
function test_set() {
for r in *"$1"*.fq.gz; do
  gunzip -c "$r" | head -n 50000 | gzip > "${r/$1/$2}"
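`head -n 50000` keeps the first 12,500 reads, because each FASTQ record spans four lines. A Python sketch of the same idea, using a hypothetical `subset_fastq` helper, takes a read count rather than a line count so a record can never be split:

```python
import gzip
from itertools import islice


def subset_fastq(src, dst, n_reads=12500):
    """Copy the first n_reads records (4 lines each) of a gzipped FASTQ to dst."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        # islice stops reading once 4 * n_reads lines have been copied
        fout.writelines(islice(fin, 4 * n_reads))
```

For example, `subset_fastq("sample_R1.fq.gz", "test_R1.fq.gz")` (hypothetical file names) writes a 12,500-read test file.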