Daniel E Cook danielecook

@danielecook
danielecook / pubmed pairwise.R
Last active August 29, 2015 14:04
Example of pubmed pairwise searching
library(RISmed)
library(parallel)
library(ggplot2)
# Given two lists of terms, let's see how 'hot' they are together
set1 <- c("ebola","autoimmune","Diabetes","HIV","Glioblastoma","Asthma","Schizophrenia")
set2 <- c("C. elegans","D. Melanogaster","C. japonica", "M. Musculus","S. Cerevisiae")
# Generate all possible pairs
pairs <- expand.grid(set1, set2, stringsAsFactors = FALSE)
danielecook / google_calendar.js
Created August 2, 2014 19:55
Google App Script Calendar Reservations
/**
* Get a user's name, by accessing contacts.
*
* @returns {String} FullName, or UserID
* if record not found in contacts.
*/
function getUserName(email){
var user = ContactsApp.getContact(email);
// If user in contacts, return their name
danielecook / het_polarization.py
Last active August 29, 2015 14:05
Heterozygote Polarization - Polarizes heterozygous calls based on a prior likelihood of identifying a heterozygous call in a VCF file. Useful for calling variants in organisms with low levels of heterozygosity, as is frequently the case in hermaphroditic organisms such as C. elegans. #VCF
#!/usr/bin/python
'''
Heterozygote Polarization Script
usage:
bcftools view -M 2 <filename> | python het_polarization.py | bcftools view -O b > <filename.het.polarized.bcf>
Tags variants 'pushed' to ref or alt as follows:
AA - Pushed towards reference
AB - Kept as het
danielecook / plot_peaks.R
Created August 13, 2014 19:55
Plot Peaks from Homer
library(data.table)
library(stringr)
library(splitstackshape)
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 0) {
setwd("/Users/dancook/Dropbox/Andersen lab/LabFolders/Dan/ForOthers/Maneeshi")
}
df <- fread("Snail2_Twist_MergedPeaksPS1005kbSummit300bp_Sn_TwEBox.txt")
danielecook / fasta_sequence_lengths.sh
Last active February 22, 2017 10:40 — forked from maneeshi/gist:412ef98ab0fba2ac4d0c
Generate Fasta sequence lengths
awk '/^>/ { if (seen) print c; seen = 1; c = 0; printf "%s\t", substr($0, 2, 100); next } { c += length($0) } END { if (seen) print c }' file.fa
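The same idea can be sketched in Python for cases where the lengths are needed inside a script. This is a minimal illustration, not part of the original gist: the function name `fasta_lengths` and the in-memory example input are assumptions, and unlike the awk one-liner it does not truncate header names to 100 characters.

```python
import io


def fasta_lengths(handle):
    """Yield (name, length) for each record read from a FASTA file handle."""
    name, length = None, 0
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            name, length = line[1:], 0
        elif name is not None:
            # Sequence lines are summed until the next header.
            length += len(line)
    if name is not None:
        yield name, length


# Hypothetical two-record input used only for illustration.
example = io.StringIO(">seq1\nACGT\nAC\n>seq2\nGGG\n")
for name, length in fasta_lengths(example):
    print(f"{name}\t{length}")
```

Passing an open file (for example `open("file.fa")`) in place of the `StringIO` object processes a real FASTA file line by line without loading it into memory.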
danielecook / rprofile.R
Created August 21, 2014 18:39
My .Rprofile
# Place in /Users/Username/.Rprofile
"
sys.source('~/Dropbox/appdata/Rprofile.r')
"
## Create a new invisible environment for all the functions to go in so it doesn't clutter your workspace.
.env <- new.env()
danielecook / generate_masked_ranges.py
Last active December 28, 2019 03:00
Generates the masked ranges within a fasta file.
#!/usr/bin/env python
import gzip
import io
import sys
import os
# This file will generate a bedfile of the masked regions of a fasta file.
# STDIN or arguments
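The preview above is cut off before the logic appears, so the following is only a rough sketch of how masked ranges might be extracted, not the gist's actual code. It assumes that "masked" means soft-masked (lowercase) bases or hard-masked `N` bases, and that output intervals follow the BED convention (0-based start, exclusive end). The function name `masked_ranges` is hypothetical.

```python
import io


def masked_ranges(handle):
    """Yield (chrom, start, end) for each run of masked bases in a FASTA handle.

    Assumption: a base is masked if it is lowercase or an uppercase 'N'.
    Coordinates are 0-based and half-open, as in BED files.
    """
    chrom, pos, start = None, 0, None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # Close a run that reaches the end of the previous sequence.
            if start is not None:
                yield chrom, start, pos
            chrom = (line[1:].split() or [""])[0]
            pos, start = 0, None
            continue
        for base in line:
            masked = base.islower() or base == "N"
            if masked and start is None:
                start = pos
            elif not masked and start is not None:
                yield chrom, start, pos
                start = None
            pos += 1
    if start is not None:
        yield chrom, start, pos


# Hypothetical input: runs may span line breaks within a record.
example = io.StringIO(">chr1\nACgtNN\nnnAC\n>chr2\nAAAA\nacgt\n")
for chrom, start, end in masked_ranges(example):
    print(f"{chrom}\t{start}\t{end}")
```

Tracking the run state across lines, rather than per line, keeps a masked region that wraps over a line break as a single interval.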
danielecook / depth_of_coverage.py
Last active March 29, 2021 14:47
Calculate Depth of Coverage and Breadth of Coverage from a bam file. This function calculates coverage by chromosome and for the entire genome. Additionally, if the mtchr (Mitochondrial chromosome name) is provided, nuclear coverage and the ratio of mtDNA:nuclear DNA is calculated. #bam #stats
#
# This script calculates the depth of coverage and breadth of coverage for a given bam.
# Outputs a dictionary containing the contig/chromosome names and the depth and breadth of coverage for each
# and for the entire genome.
#
# If you optionally specify the name of the mitochondrial chromosome (e.g. mtDNA, chrM, chrMT),
# the script will also generate breadth and depth of coverage for the nuclear genome AND the ratio
# of mtDNA:nuclear DNA, which can act as a proxy in some cases for mitochondrial count within an individual.
#
# Author: Daniel E. Cook
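The quantities described in the header can be illustrated with a small, self-contained sketch. It is not the gist's implementation, which reads a bam file. Here the per-base depths are supplied directly as lists, and the function name `coverage_summary`, the output keys, and the definitions used (depth = summed per-base depth / length, breadth = fraction of positions with depth > 0, ratio = mtDNA depth / nuclear depth) are assumptions based on common usage.

```python
def coverage_summary(depths, mtchr=None):
    """Summarize coverage from per-base depths.

    depths: dict mapping contig name -> list of per-base read depths
            (hypothetical input; assumes every contig is non-empty).
    mtchr:  optional name of the mitochondrial contig.
    """
    def stats(arrays):
        length = sum(len(a) for a in arrays)
        total = sum(sum(a) for a in arrays)
        covered = sum(1 for a in arrays for d in a if d > 0)
        return {
            "depth_of_coverage": total / length,
            "breadth_of_coverage": covered / length,
        }

    summary = {"contigs": {name: stats([d]) for name, d in depths.items()}}
    summary["genome"] = stats(list(depths.values()))

    if mtchr is not None and mtchr in depths:
        # Nuclear genome = every contig except the mitochondrial one.
        nuclear = [d for name, d in depths.items() if name != mtchr]
        summary["nuclear"] = stats(nuclear)
        # Assumes the nuclear depth is non-zero.
        summary["mt_nuclear_ratio"] = (
            summary["contigs"][mtchr]["depth_of_coverage"]
            / summary["nuclear"]["depth_of_coverage"]
        )
    return summary


# Hypothetical toy data: one nuclear contig and one mitochondrial contig.
toy = {"I": [2, 2, 0, 0], "MtDNA": [10, 10]}
print(coverage_summary(toy, mtchr="MtDNA"))
```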
danielecook / import_homer_ChIP_Seq_motif_data.R
Created September 22, 2014 15:18
Import Homer ChIP-Seq Motif Data
# Daniel Cook 2014
# Danielecook.com
#
# Use this function to import ChIP-Seq data generated by Homer. This data is generated using Homer's findMotifsGenome.pl
# command with the '-find <motif file>' argument. The generated output
# looks like this:
#
# 1. Peak/Region ID
# 2. Chromosome
# 3. Start
danielecook / test_fastq.sh
Created September 25, 2014 20:21
Generate test FASTQs for developing a pipeline.
# Generate tiny FASTQs for quick testing.
function test_set() {
for r in *"$1"*.fq.gz; do
gunzip -kfc "$r" | head -n 50000 | gzip > "${r/$1/$2}"
done;
}
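Because a FASTQ record spans four lines, `head -n 50000` keeps the first 12,500 reads. The same subsetting can be sketched in Python with the standard library only. The function name `subset_fastq` and its `n_reads` parameter are illustrative assumptions, not part of the original gist.

```python
import gzip
from itertools import islice


def subset_fastq(src, dest, n_reads=12500):
    """Copy the first n_reads records (4 lines each) from one gzipped FASTQ to another."""
    with gzip.open(src, "rt") as fin, gzip.open(dest, "wt") as fout:
        # islice stops after n_reads * 4 lines without reading the rest of the file.
        fout.writelines(islice(fin, n_reads * 4))
```

A call such as `subset_fastq("sample_full.fq.gz", "sample_test.fq.gz")` (hypothetical file names) would mirror one iteration of the shell loop above.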