Daniel E Cook danielecook

danielecook / install_R_packages.R
Last active August 27, 2015 00:08
This script is sourced by setup_bootcamp.sh to set up R.
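# Install the hadleyverse
install.packages(c("dplyr", "tidyr", "stringr", "ggplot2", "reshape2", "httr", "readxl"),
                 repos = 'http://cran.us.r-project.org')
# Other useful R packages
install.packages(c("rio", "data.table", "knitr", "RColorBrewer", "RCurl", "readr"),
                 repos = 'http://cran.us.r-project.org')
# Install Bioconductor. biocLite.R is deprecated: since Bioconductor 3.8 (R >= 3.5),
# Bioconductor is installed through the BiocManager package from CRAN.
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager", repos = 'http://cran.us.r-project.org')
BiocManager::install()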
danielecook / R_helper_functions.R
Last active August 29, 2015 13:56
A list of helper functions in R
# I am trying to make R a little easier by adding a few helper functions. Most of these mimic functionality seen in Stata.
# corder() mimics Stata's order command: it reorders the variables (columns) of a data frame.
# Usage:
#   df <- corder(df, <list of columns>)
corder <- function(df, ...) {
  # Capture the unquoted column names passed through ... as a character vector
  cols <- as.vector(eval(substitute(alist(...))), mode = "character")
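The `corder` preview above is cut off before the reordering step. Assuming it follows the default behaviour of Stata's `order` (the listed variables move to the front, in the order given, and the remaining variables keep their existing order), the column logic can be sketched in Python on a plain list of names. The function name `order_columns` is hypothetical and is not part of the gist.

```python
def order_columns(columns, first):
    """Return a new column order with the names in `first` moved to the front.

    Mimics the default behaviour of Stata's `order` command: the listed
    columns come first (in the order given), followed by the remaining
    columns in their original order.
    """
    missing = [c for c in first if c not in columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    rest = [c for c in columns if c not in first]
    return list(first) + rest


print(order_columns(["id", "age", "sex", "bmi"], ["bmi", "sex"]))
# ['bmi', 'sex', 'id', 'age']
```

With pandas, the returned list could be used to index a DataFrame (`df[order_columns(list(df.columns), ["bmi", "sex"])]`) to reorder it.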
danielecook / orthologs.sh
Last active August 29, 2015 13:58
Generates the pairwise mapping between human <==> C. elegans genes #bash
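wget 'ftp://ftp.ncbi.nih.gov/pub/HomoloGene/current/homologene.data'
# Columns: HID, taxonomy ID, Entrez gene ID, symbol, ... Keep HID/ID/symbol for human (9606) and C. elegans (6239).
awk -F'\t' -v OFS='\t' '$2 == 9606 {print $1, $3, $4}' homologene.data | sort -t $'\t' -k1,1 > human.txt
awk -F'\t' -v OFS='\t' '$2 == 6239 {print $1, $3, $4}' homologene.data | sort -t $'\t' -k1,1 > celegans.txt
# Join on the HomoloGene group ID, drop it, and prepend a header row
{ printf 'Human_Entrez\tHuman_Symbol\tElegans_Entrez\tElegans_Symbol\n'
  join -t $'\t' human.txt celegans.txt | cut -f 2-5 | sort; } > orthologs.txt
rm human.txt celegans.txt homologene.data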
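The same human/worm pairing can be sketched in Python by grouping records on the HomoloGene group ID, which avoids the sorted-input requirement of `join`. This assumes the homologene.data layout used above (tab-separated, with HID, taxonomy ID, Entrez gene ID and gene symbol in the first four columns); `ortholog_pairs` is a hypothetical name.

```python
from collections import defaultdict
from itertools import product

HUMAN, ELEGANS = "9606", "6239"  # NCBI taxonomy IDs


def ortholog_pairs(lines):
    """Yield (human_entrez, human_symbol, elegans_entrez, elegans_symbol) tuples.

    `lines` holds tab-separated homologene.data records whose first four
    columns are: HomoloGene group ID (HID), taxonomy ID, Entrez gene ID, symbol.
    """
    # HID -> {taxonomy ID -> [(entrez, symbol), ...]}
    groups = defaultdict(lambda: {HUMAN: [], ELEGANS: []})
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        hid, taxid, entrez, symbol = fields[:4]
        if taxid in (HUMAN, ELEGANS):
            groups[hid][taxid].append((entrez, symbol))
    # Like `join`, pair every human gene with every worm gene in the same group
    for hid in sorted(groups):
        for (h_id, h_sym), (w_id, w_sym) in product(groups[hid][HUMAN], groups[hid][ELEGANS]):
            yield h_id, h_sym, w_id, w_sym
```

For example, `for row in ortholog_pairs(open("homologene.data")): print("\t".join(row))` prints one tab-separated pair per line. Groups that contain only one of the two species produce no pairs, matching the inner-join behaviour of `join`.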
danielecook / Check_Fastqs.py
Last active August 29, 2015 14:01
Pulls the header information from the first 1,000 lines of every FASTQ file in the directory where it is run, finds the most common index, and outputs a summary for each FASTQ.
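#!/usr/bin/python
import re
from itertools import groupby as g
import subprocess
import sys
from collections import OrderedDict

def most_common(L):
    # Most frequent item in L; ties go to the item that appears first in L.
    # Indexing the (key, group) pair instead of unpacking it in the lambda
    # signature keeps this valid in Python 3 as well as Python 2.
    return max(g(sorted(L)), key=lambda kv: (len(list(kv[1])), -L.index(kv[0])))[0]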
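The core idea, finding the most common index among the read headers, can be sketched with `collections.Counter`, whose `most_common` also breaks ties by first occurrence. This assumes Illumina CASAVA 1.8+ style headers, where the index is the last colon-separated field after the space (e.g. `@M00123:45:000000000-ABCDE:1:1101:15589:1331 1:N:0:ATCACG`); the helper names are hypothetical and the original script's parsing may differ.

```python
import gzip
from collections import Counter
from itertools import islice


def header_index(header):
    """Return the index sequence from a CASAVA 1.8+ FASTQ header line."""
    # '@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index'
    description = header.rstrip("\n").split(" ", 1)[1]
    return description.split(":")[-1]


def most_common_index(lines, max_lines=1000):
    """Return (index, count, n_headers) for the headers in the first max_lines lines."""
    # Each FASTQ record is 4 lines; the header is the first line of each record
    headers = [line for i, line in enumerate(islice(lines, max_lines)) if i % 4 == 0]
    counts = Counter(header_index(h) for h in headers)
    index, n = counts.most_common(1)[0]
    return index, n, len(headers)


def summarize(path, max_lines=1000):
    """Print a one-line summary for a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        index, n, total = most_common_index(fh, max_lines)
    print(f"{path}\t{index}\t{n}/{total} reads")
```

Selecting headers by position (every fourth line) rather than by a leading `@` matters because quality strings can also begin with `@`.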
danielecook / SRX_SRA_download.sh
Last active August 29, 2015 14:02
Downloads all sequencing runs for a given experiment from the Sequence Read Archive (SRA). Requires Entrez Direct (EDirect) and the SRA Toolkit.
function SRX_fetch_fastq() {
  # $1 is an SRA experiment accession (SRX...); download each run (SRR...) it contains
  local sra_set
  sra_set=$(esearch -db sra -query "$1" | efetch -format docsum | xtract -element Run@acc)
  echo "Downloading runs for experiment $1:"
  echo ${sra_set}
  echo "-------"
  for SRA in ${sra_set}; do
    echo "Downloading $SRA"
    fastq-dump "$SRA"
  done
}
danielecook / worm_tracker.R
Created June 18, 2014 21:50
Together with the included bash loop, concatenates the worm tracking files in multiple folders, prefixing each line with its folder and file name.
library(stringr)
library(dplyr)
# Generate the concatenated worm_track data with the following bash loop:
# for folder in */; do
#   for file in "${folder}"worm*; do
#     awk -v file="$file" '{print file "," $1}' "$file" >> worm_track_all.txt
#   done
# done
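For comparison, the bash loop above can be sketched in Python with `pathlib`. This assumes the same layout (one level of subfolders, each holding files whose names start with `worm`) and, like the awk call, keeps only the first whitespace-separated field of each line; `concat_worm_tracks` is a hypothetical name.

```python
from pathlib import Path


def concat_worm_tracks(root, out_name="worm_track_all.txt"):
    """Append 'folder/file,first_field' for each line of every worm* file in root's subfolders."""
    root = Path(root)
    # Open in append mode, like `>>` in the bash loop
    with open(root / out_name, "a") as out:
        for folder in sorted(p for p in root.iterdir() if p.is_dir()):
            for path in sorted(folder.glob("worm*")):
                if not path.is_file():
                    continue
                label = f"{folder.name}/{path.name}"
                with open(path) as fh:
                    for line in fh:
                        fields = line.split()
                        # awk's $1: first whitespace-separated field, empty on blank lines
                        out.write(f"{label},{fields[0] if fields else ''}\n")
```

Because the output is opened in append mode, running it twice duplicates the data, exactly as re-running the `>>` loop would.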
danielecook / bcftools wrapper.py
Last active August 29, 2015 14:03
A lightweight wrapper for bcftools written in Python (a work in progress)
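import os, subprocess, uuid, re
import vcf.filters  # PyVCF

# Note: `file` is a Python 2 built-in; under Python 3 this base class must be replaced (e.g. with `object`).
class bcf(file):
    def __init__(self, file):
        # Start by storing basic information about the vcf/bcf
        self.file = file
        self.ops = []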
danielecook / pubmed pairwise.R
Last active August 29, 2015 14:04
Example of pairwise PubMed searching
library(RISmed)
library(parallel)
library(ggplot2)
# Given two lists of terms, let's see how 'hot' they are together
set1 <- c("ebola","autoimmune","Diabetes","HIV","Glioblastoma","Asthma","Schizophrenia")
set2 <- c("C. elegans", "D. melanogaster", "C. japonica", "M. musculus", "S. cerevisiae")
# Generate all possible pairs
pairs <- expand.grid(set1, set2, stringsAsFactors = FALSE)
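`expand.grid` builds every combination of the two term lists, varying its first argument fastest. A Python equivalent uses `itertools.product` with the lists swapped to keep that ordering. The sketch below also forms one query string per pair; quoting each phrase and joining with `AND` is an assumption about how the R script builds its PubMed queries, and `pairwise_queries` is a hypothetical name.

```python
from itertools import product


def pairwise_queries(set1, set2):
    """Return (term1, term2, query) for every pair, in expand.grid order.

    expand.grid varies its first argument fastest, so iterate over set2 in
    the outer loop and set1 in the inner loop.
    """
    return [(a, b, f'"{a}" AND "{b}"') for b, a in product(set2, set1)]


set1 = ["ebola", "autoimmune", "Diabetes", "HIV", "Glioblastoma", "Asthma", "Schizophrenia"]
set2 = ["C. elegans", "D. melanogaster", "C. japonica", "M. musculus", "S. cerevisiae"]
pairs = pairwise_queries(set1, set2)
print(len(pairs))   # 35
print(pairs[1][2])  # "autoimmune" AND "C. elegans"
```

Each query could then be submitted independently, which is what makes the search easy to parallelize.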
danielecook / google_calendar.js
Created August 2, 2014 19:55
Google Apps Script Calendar Reservations
/**
* Get a user's name, by accessing contacts.
*
* @returns {String} FullName, or UserID
* if record not found in contacts.
*/
function getUserName(email){
  var user = ContactsApp.getContact(email);
  // If user in contacts, return their name
danielecook / het_polarization.py
Last active August 29, 2015 14:05
Heterozygote Polarization - Polarizes heterozygous calls in a VCF file based on a prior likelihood of observing a heterozygous call. Useful for calling variants in organisms with low levels of heterozygosity, which is often the case in hermaphroditic organisms such as C. elegans. #VCF
#!/usr/bin/python
'''
Heterozygote Polarization Script
usage:
bcftools view -M 2 <filename> | python het_polarization.py | bcftools view -O b > <filename.het.polarized.bcf>
Tags variants 'pushed' to ref or alt as follows:
AA - Pushed towards reference
AB - Kept as het