Daniel E Cook danielecook

@danielecook
danielecook / lafitness.py
Last active October 11, 2022 18:20
Generates an ICS (iCalendar) file for an LA Fitness club. You must know the club ID, which you can get by going through their website, finding your club, and opening the fitness classes page. The ID is at the end of the URL (clubid=...).
from pyquery import PyQuery as pq
from icalendar import Calendar, Event
import datetime
from datetime import date, timedelta
from dateutil.relativedelta import *
from dateutil.parser import *
from pprint import pprint as pp
clubid = 722
url = "https://www.lafitness.com/Pages/ClassSchedulePrintVersion.aspx?clubid=%s" % clubid
@danielecook
danielecook / R_helper_functions.R
Last active August 29, 2015 13:56
A list of helper functions in R
# I am trying to make R a little easier by adding a few helper functions. Most of these mimic functionality seen in Stata.
# This function attempts to mimic the order command in Stata;
# Usage:
# df <- corder(df,<list of columns>)
# Order variables in a data frame.
corder <- function(df, ...) {
  cols <- as.vector(eval(substitute((alist(...)))), mode="character")
@danielecook
danielecook / orthologs.sh
Last active August 29, 2015 13:58
Generates the pairwise mapping between human <==> C. elegans genes #bash
wget 'ftp://ftp.ncbi.nih.gov/pub/HomoloGene/current/homologene.data'
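# Taxonomy IDs: 9606 = human, 6239 = C. elegans.
# $'\t' makes bash pass a literal tab; a plain "\t" inside double quotes is not reliably read as a tab by grep.
grep $'\t9606\t' homologene.data | sort | cut -f 1,3,4 > human.txt
grep $'\t6239\t' homologene.data | sort | cut -f 1,3,4 > celegans.txt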
join -1 1 -2 1 -t $'\t' human.txt celegans.txt | cut -f 2,3,4,5 | sort | echo -e "Human_Entrez\tHuman_Symbol\tElegans_Entrez\tElegans_Symbol\n$(cat -)" > orthologs.txt
rm human.txt celegans.txt homologene.data
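The same pairwise mapping can be sketched in Python for readers who prefer to see the join spelled out. This is only an illustration, not part of the gist: the function name `pair_orthologs` and the sample rows are made up, and it assumes the tab-separated column order the shell pipeline relies on (group ID, taxonomy ID, Entrez gene ID, symbol).

```python
from collections import defaultdict

HUMAN_TAXID = "9606"    # Homo sapiens
ELEGANS_TAXID = "6239"  # Caenorhabditis elegans


def pair_orthologs(lines):
    """Pair human and C. elegans genes that share a homology group ID.

    Hypothetical helper: returns sorted tuples of
    (human_entrez, human_symbol, elegans_entrez, elegans_symbol).
    """
    human = defaultdict(list)
    worm = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        group_id, taxid, gene_id, symbol = fields[:4]
        if taxid == HUMAN_TAXID:
            human[group_id].append((gene_id, symbol))
        elif taxid == ELEGANS_TAXID:
            worm[group_id].append((gene_id, symbol))

    rows = []
    for group_id, human_genes in human.items():
        for human_gene in human_genes:
            # Every human gene is paired with every worm gene in the same group,
            # which is what `join` does for repeated keys.
            for worm_gene in worm.get(group_id, []):
                rows.append(human_gene + worm_gene)
    return sorted(rows)


# Invented sample rows, not real HomoloGene records.
sample = [
    "1\t9606\t111\tGENE_A\t0\tNP_X",
    "1\t6239\t222\tgene-a\t0\tNP_Y",
    "2\t9606\t333\tGENE_B\t0\tNP_Z",
]
pairs = pair_orthologs(sample)
```

Group 2 has no C. elegans member in the sample, so it produces no row, matching the inner-join behaviour of the shell version.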
@danielecook
danielecook / Check_Fastqs.py
Last active August 29, 2015 14:01
This code pulls the header information from the first 1000 lines of every FASTQ file in the folder where it is executed. It then takes the most commonly found index and outputs a summary for each FASTQ file.
#!/usr/bin/python
import re
from itertools import groupby as g
import subprocess
import sys
from collections import OrderedDict
def most_common(L):
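    # Highest count wins; ties go to the value that appears first in L.
    # A single-argument lambda replaces the Python 2-only tuple-unpacking form.
    return max(g(sorted(L)), key=lambda kv: (len(list(kv[1])), -L.index(kv[0])))[0]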
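For comparison, the "most commonly found index" step can also be written with `collections.Counter`. This is a sketch rather than the gist's code; it keeps the same tie-breaking rule (the value seen first wins) and assumes a non-empty list of hashable items. The sample barcodes are invented.

```python
from collections import Counter


def most_common(items):
    """Return the most frequent item; ties go to the earliest occurrence."""
    counts = Counter(items)
    first_seen = {}
    for position, item in enumerate(items):
        # Remember only the first position at which each item appears.
        first_seen.setdefault(item, position)
    return max(counts, key=lambda item: (counts[item], -first_seen[item]))


# Invented index sequences for illustration.
indexes = ["ACGT", "TTAG", "ACGT", "TTAG", "GGCC"]
winner = most_common(indexes)
```

Recording first positions up front avoids calling `L.index` once per candidate, which matters little for 1000 header lines but keeps the helper linear in the input size.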
@danielecook
danielecook / plot_runkeeper.R
Last active January 16, 2017 17:47
This R Script will plot all of your runkeeper data. It uses cluster analysis to group activities by location as needed, and outputs a graph for each location. For example - I have run in Iowa City, Boston, and Chicago - and this script is able to identify those locations and output separately.
# Special thanks for insights from flowingdata.com regarding this.
library(plotKML)
library(plyr)
library(dplyr)
library(fpc)
num_locations <- 5
# Usage: Place this script in the directory containing your runkeeper data. You can run from terminal using 'Rscript map_runkeeper.R', or
@danielecook
danielecook / SRX_SRA_download.sh
Last active August 29, 2015 14:02
Download all of the sequence Runs for a given experiment from the sequence read archive (SRA); Requires edirect and the sra-toolkit.
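function SRX_fetch_fastq() {
    # Look up every Run accession that belongs to the experiment passed as $1
    sra_set=$(esearch -db sra -query "$1" | efetch -format docsum | xtract -element Run@acc)
    echo "Downloading Run $1:"
    echo ${sra_set}
    echo "-------"
    # Left unquoted on purpose so the accession list splits into words
    for SRA in $sra_set; do
        echo "Downloading $SRA"
        fastq-dump "$SRA"
    done
}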
@danielecook
danielecook / worm_tracker.R
Created June 18, 2014 21:50
In conjunction with included bash, concatenates multiple files within folders (with foldername and filename)
library(stringr)
library(dplyr)
# Generate the concatenated worm_track data by running the following in bash:
#
#   for folder in `ls -d *\/`; do
#     for file in `ls $folder/worm*`; do
#       cat $file | awk -v file=$file '{print file","$1}' >> worm_track_all.txt
#     done;
#   done;
@danielecook
danielecook / LCR_region.sh
Last active March 7, 2017 08:24
Generate Low Complexity Region (LCR) bedfile of masked regions from UCSC repeatmasker data and its complement for use with bcftools
#!/bin/bash
wget 'http://hgdownload.soe.ucsc.edu/goldenPath/ce10/database/rmsk.txt.gz' -O LCR_rmsk.txt.gz
gunzip -kfc LCR_rmsk.txt.gz | grep 'Low_complexity' | cut -f 6,7,8 > LCR_ce10_rmsk.bed
rm LCR_rmsk.txt.gz
# Generate the set of complementary regions (i.e., NOT low complexity)
# Download C. elegans chromosome information
# -N suppresses the column-name header row so the genome file contains only chrom/size pairs
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e "select chrom, size from ce10.chromInfo" > ce10.genome
bedtools complement -i LCR_ce10_rmsk.bed -g ce10.genome | sort -k 1,1 -k2,2n > LCR_complement_ce10.bed
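What "complement" means here can be shown with a small Python sketch. It is an illustration of the idea only, not how bedtools is implemented: the function name `complement` and the toy coordinates are made up, and intervals are assumed to be 0-based and half-open, as in BED files.

```python
def complement(intervals, chrom_sizes):
    """Return the regions of each chromosome NOT covered by `intervals`.

    intervals:   iterable of (chrom, start, end), 0-based half-open
    chrom_sizes: dict mapping chrom name to its length
    """
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))

    result = []
    for chrom in sorted(chrom_sizes):
        size = chrom_sizes[chrom]
        cursor = 0  # first position not yet accounted for
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > cursor:
                result.append((chrom, cursor, start))  # gap before this interval
            cursor = max(cursor, end)  # max() handles overlapping intervals
        if cursor < size:
            result.append((chrom, cursor, size))  # tail after the last interval
    return result


# Toy coordinates, not real ce10 data.
masked = [("chrI", 10, 20), ("chrI", 15, 30)]
sizes = {"chrI": 50, "chrII": 5}
unmasked = complement(masked, sizes)
```

A chromosome with no masked intervals comes back as a single region spanning its full length, which is why the chromosome sizes file is needed alongside the BED file.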
@danielecook
danielecook / bcftools wrapper.py
Last active August 29, 2015 14:03
A lightweight wrapper for bcftools written in Python (a work in progress)
import os, subprocess, uuid, re
import vcf.filters
class bcf(file):
    def __init__(self, file):
        # Start by storing basic information about the vcf/bcf
        self.file = file
        self.ops = []
@danielecook
danielecook / .bash_profile
Last active September 22, 2020 17:01
My Bash Profile
echo "Sync Profile Loaded"
export PS1="\w 🍔 "
alias refresh="source ~/.bash_profile"
# An alias name cannot contain a space, so the "git log" shortcut is named gitlog here.
alias gitlog="git log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit"
export PATH=/usr/local/bin:$PATH
# Get working directory of frontmost finder window.
cdf() {