Skip to content

Instantly share code, notes, and snippets.

View willtownes's full-sized avatar

Will Townes willtownes

View GitHub Profile
@willtownes
willtownes / refgene2bed.py
Last active January 30, 2024 16:31
Splits a refGene.txt file into multiple .bed files for various genome features (exon,intron, etc), suitable for input to bedtools coverage
"""
Python Script to read in a reference genome of refseq IDs and output several tab-delimited BED (text) files suitable for use with bedtools coverage for counting ChIP-seq reads that map to various gene features.
All output files have the structure expected by bedtools, namely,
CHROM POSITION1 POSITION2 REFSEQ_ID
Possible output files include:
1. distal promoter (transcription start [-5KB,-1KB]) KB means kilobase pairs, not kilobyte
2. proximal promoter (transcription start [-1KB,1KB])
3. gene body (anywhere between transcription start and transcription end)
4. transcript (anywhere in an exon)- outputs each exon as a separate line
5. first 1/3 transcript- outputs each exon as a separate line
@willtownes
willtownes / mp_benchmark.py
Created July 12, 2016 17:01
python multiprocessing benchmarks
"""
Testing python multiprocessing speed. Based on
http://blogs.warwick.ac.uk/dwatkins/entry/benchmarking_parallel_python_1_2/
"""
import time
import math
import multiprocessing as mp
def isprime(n):
@willtownes
willtownes / poisson_prediction_interval.Rmd
Created November 21, 2023 21:34
Prediction interval for Poisson regression
---
title: "Poisson prediction interval"
author: "Will Townes"
output: html_document
---
Poisson prediction interval based on [Kim et al 2022](https://doi.org/10.1002/wics.1568)
```{r}
n<-100
@willtownes
willtownes / loess_vs_gp.R
Last active February 17, 2022 16:23
Is LOESS a special case of a Gaussian process?
library(pdist)
n<-200
X<-matrix(10*runif(n),ncol=1)
y<-sin(X[,1])#+rnorm(n,sd=.2)
#plot(X[,1],y)
#xnew<-3
#span<-1
my_loess<-function(xnew,X,y,span=.75){
@willtownes
willtownes / monotone_spline.R
Created February 2, 2017 19:56
monotone splines using package mgcv
library(mgcv)
#library(modules) #devtools::install_github(klmr/modules)
#mgcv<-import_package("mgcv")
mspline<-function(x,y,k=10,lower=NA,upper=NA){
#fits a monotonic spline to data
#small values of k= more smoothing (flatter curves)
#large values of k= more flexible (wiggly curves)
#k is related to effective degrees of freedom and number of knots
#use unconstrained gam to get rough parameter estimates
@willtownes
willtownes / grigorev_twitter_question.py
Created February 5, 2021 16:46
Grigorev twitter interview question
"""
From https://twitter.com/Al_Grigor/status/1357028887209902088
Most candidates cannot solve this interview problem:
* Input: "aaaabbbcca"
* Output: [("a", 4), ("b", 3), ("c", 2), ("a", 1)]
Write a function that converts the input to the output
I ask it in the screening interview and give it 25 minutes
How would you solve it?
"""
@willtownes
willtownes / gpflow_multioutput.py
Created September 21, 2020 15:04
GPflow multioutput
from gpflow.conditionals import conditional
from gpflow.inducing_variables import SeparateIndependentInducingVariables
from gpflow.kernels import SeparateIndependent
#note: object 'm' is of type gpflow.models.svgp.SVGP
ind_conditional = conditional.dispatch(
object, SeparateIndependentInducingVariables, SeparateIndependent, object)
gmu, gvar = ind_conditional(
X,
@willtownes
willtownes / pythonprimes
Created December 6, 2011 01:43
Faster way to list primes in Python
def listprimes2(m):
'''another attempt to list all primes below m'''
values = range(m+1) #note that in this list the key and the value are the same.
primes = values[:]
primes[1] = 0 #1 doesn't count as a prime
for i in values:
if primes[i] == 0:
pass
else:
for j in values[i+1:]:
@willtownes
willtownes / omptest.cpp
Created October 5, 2018 21:32
Test whether clang++ recognizes openmp
#include <omp.h>
#include <stdio.h>
//clang++ -Xpreprocessor -fopenmp -lomp omptest.cpp -o omptest
int main() {
#pragma omp parallel
printf("Hello from thread %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads());
}
@willtownes
willtownes / fuzzy_clustering_eval.R
Created August 22, 2018 15:53
Fuzzy Clustering Jaccard and Rand Indices
#Fuzzy version of Jaccard and Rand Indices
#based on Suleman: "Assessing a Fuzzy Extension of Rand Index and Related Measures"
L<-4 #number of clusters
N<-50 #number of objects
X<-gtools::rdirichlet(N,rep(.01,L)) #NxL soft random clustering
y<-apply(X,1,which.max) #hard cluster version of X
table(y)
Y<-model.matrix(~factor(y)-1) #hard cluster version of X
Z<-matrix(1/L,nrow=N,ncol=L) #perfect uncertainty soft clustering