Skip to content

Instantly share code, notes, and snippets.

View willtownes's full-sized avatar

Will Townes willtownes

View GitHub Profile
@willtownes
willtownes / pythonprimes
Created December 6, 2011 01:43
Faster way to list primes in Python
def listprimes2(m):
'''another attempt to list all primes below m'''
values = range(m+1) #note that in this list the key and the value are the same.
primes = values[:]
primes[1] = 0 #1 doesn't count as a prime
for i in values:
if primes[i] == 0:
pass
else:
for j in values[i+1:]:
@willtownes
willtownes / newton_recursive.R
Created November 7, 2012 21:46
Example of recursion in R
#Example of using recursion to apply newton's method to an arbitrary function in order to find the zeros of the function. If no zeros are found after some number of iterations, NA is returned
newton.recursive<-function(func,deriv,counter=100,x=1,relchange=1){
diff = func(x)/deriv(x)
relchange = abs(diff/x)
if(counter==0){
print("Failed to converge")
return(NA)
}
else if(relchange <= 1e-10){
@willtownes
willtownes / py27topy24.py
Created February 5, 2013 19:08
Checklist for making python 2.7 scripts run in python 2.4
'''I recently had to deploy a test script I had written in python 2.7 to a server where only python 2.4 was allowed. In the process of making my code backwards compatible, I made a brief checklist of things to watch out for:
* any and all functions can still be used but you have to manually code them
* with-as blocks must be replaced with try-finally
* try-except-finally must be converted to try-except-else
* string formatting: "".format() must be replaced with "%s"%(args) approach.
* Replace all instances of "except Exception as err:" with "except Exception, e:"
* may need to manually convert some unicode strings into str, especially when loading json
* Some modules have different names, eg: "email.MIMEText" versus "email.mime.text.MIMEText"
* The datetime module does not have strptime() in 2.4. However, you can import it with a different name and then make a subclass with the desired behavior by calling strptime() from the time module instead:'''
@willtownes
willtownes / GtownMath611hw11prob1bBasicFilter.R
Created November 17, 2013 15:07
Hamilton's "basic filter" algorithm implemented in R
index2slag<-function(index){
# Takes an index between 1 and 32 and converts it into a binary vector. This is used to iterate all of the possible values of (s_t,s_t1,s_t2,s_t3,s_t4)
num<-paste(rev(intToBits(index-1)),"")
return(as.logical(as.numeric(num[28:32])))
}
ref.s<-t(sapply(1:32,index2slag)) #table of all S_t1,... combinations
ref.s #structure of this reference table is crucial in step5!
slag2index<-function(slag){
@willtownes
willtownes / paintball.r
Last active August 29, 2015 14:05
Paintball Simulation
# Simulation of a Paintball Game.
# The two players start off in random positions. They move one square each turn. Then they shoot at each other.
# Probability of who shoots first is 50/50.
# Probability of a hit is proportional to the distance between the players.
# Tracks history of last 3 moves.
# Prints out graphs to show the movement of the players and the shots.
# Game over when someone is hit.
shoot<-function(a.pos,b.pos){
# returns TRUE if the shot was a hit, otherwise FALSE
@willtownes
willtownes / research_interests.md
Last active August 29, 2015 14:06
Research Interests

My background

I am interested in biological problems that require both computational and statistical methods. I started out doing tropical ecology. Then I was a software tester for a while, where I learned programming. I took a bunch of statistics classes at night (two of my favorites were Bayesian Statistics and Stochastic Simulation). I am a fan of open source collaboration and plan on using software engineering techniques like automated testing, version control and continuous integration in my future research. Here are some things I want to learn more about:

Statistics

  • Bayesian hierarchical modeling and probabilistic graphical models
  • Bayesian nonparametrics, including all the stochastic processes theory beneath it (Dirichlet processes, etc)
  • Nonparametric and semiparametric statistics in general
  • Variational Inference- I find this an intriguing alternative to MCMC
  • Spatial Statistics (I love looking at maps)
  • How does the choice of projection affect things? Is there a way to do 3-D spatial
@willtownes
willtownes / enet_cp.r
Created January 3, 2015 18:42
Elastic Net Cp function
#sigma2, df, and Cp may not be reliable in default elasticnet function enet
#provide replacements
#S2L is the estimate for sigma2 based on maximal model (full_ols_model from lm())
S2L<- sum(residuals(full_ols_model)^2)/full_ols_model$df.residual
Cp_enet<-function(enet_obj,y,X,S2L){
#returns the Mallows Cp statistics for a given elastic net object
betamat<-enet_obj$beta.pure
#create a matrix where each column is prediction at one of the df levels
yhats<-X%*%t(betamat) + mean(y)
@willtownes
willtownes / compbio_hw2_rnaseq_template.sh
Created March 6, 2015 20:41
shell script template for RNA-seq processing on Odyssey cluster
#!/bin/bash
#SBATCH -n 1
#SBATCH -t 400
#SBATCH -p serial_requeue
#SBATCH --mem=35000
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=my_username@mail.harvard.edu
#take filename from command line argument
FILENAME=$1
@willtownes
willtownes / compbio_hw2_rnaseq_master.sh
Created March 6, 2015 20:43
master shell script to run multiple RNA-seq samples simultaneously, each on a separate node
#!/bin/bash
LOGFILE="master_job_ids.log"
echo "======Starting new analysis run========" >> $LOGFILE
names=( SRR1592185 SRR1592187 SRR1592189 SRR1592186 SRR1592188 SRR1592190 )
for ((i=0; i<${#names[@]} ; i++)) do
echo "Running Batch Job for ${names[i]}. SLURM Job ID shown below:" >> $LOGFILE
#make results directory
mkdir -p ${names[i]}_results
@willtownes
willtownes / refgene2bed.py
Last active January 30, 2024 16:31
Splits a refGene.txt file into multiple .bed files for various genome features (exon,intron, etc), suitable for input to bedtools coverage
"""
Python Script to read in a reference genome of refseq IDs and output several tab-delimited BED (text) files suitable for use with bedtools coverage for counting ChIP-seq reads that map to various gene features.
All output files have the structure expected by bedtools, namely,
CHROM POSITION1 POSITION2 REFSEQ_ID
Possible output files include:
1. distal promoter (transcription start [-5KB,-1KB]) KB means kilobase pairs, not kilobyte
2. proximal promoter (transcription start [-1KB,1KB])
3. gene body (anywhere between transcription start and transcription end)
4. transcript (anywhere in an exon)- outputs each exon as a separate line
5. first 1/3 transcript- outputs each exon as a separate line