Wesley Tansey (tansey)

tansey / aicc_select.py
Created Jul 6, 2021
Post-selection inference example for AICc-based model screening
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm

def generalized_liang_sim_xy(N=500, P=500, S=100):
    '''Generates data from a simple linear model'''
    X = (np.random.normal(size=(N,1)) + np.random.normal(size=(N,P))) / 2.
    w0 = np.random.normal(1, size=S//4)
    w1 = np.random.normal(2, size=S//4)
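The preview cuts off before the screening step. As a hedged illustration of AICc-based screening with statsmodels (not the gist's code; the helper names aicc and screen_by_aicc and the candidate_sets argument are assumptions), a minimal sketch:

# A minimal sketch, not the gist's code: AICc scoring and screening of OLS
# models fit with statsmodels. Helper names and arguments are assumptions.
import numpy as np
import statsmodels.api as sm

def aicc(fit):
    # AICc = AIC + 2k(k+1) / (n - k - 1), the small-sample correction to AIC.
    k = fit.df_model + 1        # number of parameters, counting the intercept
    n = fit.nobs
    return fit.aic + 2 * k * (k + 1) / (n - k - 1)

def screen_by_aicc(X, y, candidate_sets):
    # Fit one OLS model per candidate feature subset and keep the AICc winner.
    scores = [aicc(sm.OLS(y, sm.add_constant(X[:, cols])).fit())
              for cols in candidate_sets]
    return candidate_sets[int(np.argmin(scores))], scores

The correction term 2k(k+1)/(n-k-1) penalizes model size more heavily when n is small, which is why AICc is preferred over plain AIC when screening many candidate models.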
tansey / binary_matrix_factorization.py
Last active Jun 1, 2020
Quick and dirty binary matrix factorization via alternating logistic regression
import numpy as np
from functools import partial
from scipy.optimize import fmin_l_bfgs_b
from sklearn.linear_model import LogisticRegression

def binary_mf(Y, nembeds, lam=None, lams=30, cv=5, max_steps=30, tol=1e-4, verbose=False):
    # Convert to a log-space grid
    if lam is None and isinstance(lams, int):
        lams = np.exp(np.linspace(np.log(1e-2), np.log(1), lams))
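The preview stops at the regularization grid. A minimal sketch of the alternating-logistic-regression idea behind the gist (not its implementation; the fixed C, the class-presence guard, and the function name are assumptions):

# Hedged sketch of alternating logistic regression for binary MF, assuming
# Y is a 0/1 numpy array of shape (N, M). Not the gist's implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

def alternating_binary_mf(Y, nembeds, n_iters=20, C=1.0, seed=0):
    rng = np.random.default_rng(seed)
    N, M = Y.shape
    U = rng.normal(scale=0.1, size=(N, nembeds))
    V = rng.normal(scale=0.1, size=(M, nembeds))
    for _ in range(n_iters):
        # Fix V, refit every row embedding via logistic regression on V.
        for i in range(N):
            if 0 < Y[i].sum() < M:  # sklearn needs both classes present
                clf = LogisticRegression(C=C, fit_intercept=False).fit(V, Y[i])
                U[i] = clf.coef_[0]
        # Fix U, refit every column embedding via logistic regression on U.
        for j in range(M):
            if 0 < Y[:, j].sum() < N:
                clf = LogisticRegression(C=C, fit_intercept=False).fit(U, Y[:, j])
                V[j] = clf.coef_[0]
    return U, V

Each sweep holds one factor fixed and refits the other a row (or column) at a time, so the reconstruction logit is U @ V.T.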
tansey / knockoffs.py
Last active Mar 28, 2020
A fast (nlogn time) implementation of the knockoff filter
'''
An O(n log n) time implementation of the knockoff filter.
Author: Wesley Tansey
Date: 3/27/2020
'''
import numpy as np

def knockoff_filter(knockoff_stats, alpha, offset=1.0, is_sorted=False):
    '''Perform the knockoffs selection procedure at the target FDR threshold.
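The docstring is cut off in this preview. For reference, a hedged sketch of the standard knockoff(+) selection rule that the gist implements, written in the direct quadratic form rather than the sorted O(n log n) form; the function name and the returned index array are assumptions:

# Hedged reference sketch of the knockoff(+) selection rule (O(n^2) form).
import numpy as np

def knockoff_select(W, alpha, offset=1.0):
    # Candidate thresholds are the magnitudes of the nonzero statistics.
    ts = np.sort(np.abs(W[W != 0]))
    for t in ts:
        # Estimated FDP at threshold t: negatives count against positives.
        fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= alpha:
            return np.where(W >= t)[0]    # selected feature indices
    return np.array([], dtype=int)        # nothing passes the threshold

With offset=1 this is the knockoff+ rule, which controls the FDR at level alpha; offset=0 gives the slightly more liberal variant that controls a modified FDR. Sorting W first lets the same rule run in O(n log n), which is the point of the gist.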
tansey / multifactor.py
Created Aug 9, 2019
Heterogeneous (AKA multi-view) factor modeling in pytorch.
'''
Heterogeneous factor modeling.
This code fits a heterogeneous factor model where each column may be:
1) Binary
2) Categorical
3) Gaussian
Everything is fit via alternating minimization and stochastic gradient descent.
The code relies on pytorch for SGD and a demo is included.
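A minimal hedged sketch of a mixed-likelihood factor model in pytorch (not the gist's code: the gist alternates minimization and also handles categorical columns, while this sketch optimizes a single joint objective with Adam and covers only binary and Gaussian columns; all names are assumptions):

# Hedged sketch: shared factors Z, loadings W, per-column-type losses,
# fit jointly with Adam. Y is assumed to be a float tensor with 0/1 entries
# in the binary columns.
import torch

def fit_hetero_factors(Y, binary_cols, gaussian_cols, nfactors=5, steps=2000, lr=1e-2):
    N, P = Y.shape
    Z = torch.randn(N, nfactors, requires_grad=True)   # per-row factors
    W = torch.randn(P, nfactors, requires_grad=True)   # per-column loadings
    opt = torch.optim.Adam([Z, W], lr=lr)
    bce = torch.nn.BCEWithLogitsLoss()
    for _ in range(steps):
        opt.zero_grad()
        recon = Z @ W.T
        # Bernoulli likelihood for binary columns, Gaussian (squared error)
        # for continuous columns.
        loss = bce(recon[:, binary_cols], Y[:, binary_cols])
        loss = loss + ((recon[:, gaussian_cols] - Y[:, gaussian_cols]) ** 2).mean()
        loss.backward()
        opt.step()
    return Z.detach(), W.detach()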
tansey / factor_pav.py
Last active May 11, 2019
Pool adjacent violators algorithm for monotone matrix factorization
'''Pool adjacent violators algorithm for (column-)monotone matrix factorization.
Applies the PAV algorithm to column factors of a matrix factorization:
Given: M = W.V'
Returns: V_proj, a projected version of V such that M[i] is monotone decreasing
for all i.
Author: Wesley Tansey
Date: May 2019
'''
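A hedged sketch of the PAV step itself: the gist projects the column factor V, while this simpler version only applies the isotonic (decreasing) projection to each reconstructed row of M = W V', using sklearn's IsotonicRegression, which implements PAV:

# Hedged sketch, not the gist's projection of V: make each row of the
# reconstruction monotone decreasing via isotonic regression (PAV).
import numpy as np
from sklearn.isotonic import IsotonicRegression

def monotone_rows(W, V):
    M = W @ V.T
    iso = IsotonicRegression(increasing=False)   # PAV under the hood
    cols = np.arange(M.shape[1])
    return np.array([iso.fit_transform(cols, row) for row in M])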
tansey / fast_mvn.py
Last active May 8, 2019
Fast multivariate normal sampling for some common cases
'''Fast sampling from a multivariate normal with covariance or precision
parameterization. Supports sparse arrays. Params:
    - mu: If provided, assumes the model is N(mu, Q)
    - mu_part: If provided, assumes the model is N(Q mu_part, Q).
          This is common in many conjugate Gibbs steps.
    - sparse: If true, assumes we are working with a sparse Q
    - precision: If true, assumes Q is a precision matrix (inverse covariance)
    - chol_factor: If true, assumes Q is a (lower triangular) Cholesky
          decomposition of the covariance matrix
          (or of the precision matrix if precision=True).
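The parameter list above is truncated. A hedged sketch of the two dense cases (covariance vs. precision parameterization); the gist's full version also handles sparse Q, the mu_part parameterization, and pre-computed Cholesky factors:

# Hedged sketch of the two dense cases only; the function name is an assumption.
import numpy as np
from scipy.linalg import solve_triangular

def sample_mvn(mu, Q, precision=False):
    z = np.random.standard_normal(mu.shape[0])
    if precision:
        # N(mu, Q^{-1}): with Q = L L^T, x = mu + L^{-T} z has covariance Q^{-1}.
        L = np.linalg.cholesky(Q)
        return mu + solve_triangular(L.T, z, lower=False)
    # N(mu, Q): with Q = L L^T, x = mu + L z has covariance Q.
    return mu + np.linalg.cholesky(Q) @ z

In the precision case the sample costs one Cholesky plus a triangular solve, with no explicit inversion, which is what makes the precision parameterization attractive inside Gibbs samplers.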
tansey / nurse_schedules.py
Last active Apr 19, 2021
Solver for a nurse scheduling problem
'''
Program to generate valid time allocations for mental ward staff.
Given:
Two staff lists, each covering a specific window of time. The lists may
overlap, i.e. share some employees.
Each employee is designated as either an RMN or an HCA.
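A hedged sketch of one way constraints like these could be encoded with OR-tools CP-SAT (an assumption; the gist may use a different solver, and the specific constraints below are illustrative rather than the gist's):

# Hedged sketch only: boolean assignment variables with a couple of
# illustrative staffing constraints, solved with CP-SAT.
from ortools.sat.python import cp_model

def assign_shifts(staff_roles, shifts, min_rmn_per_shift=1):
    # staff_roles: dict name -> 'RMN' or 'HCA'; shifts: list of shift labels.
    model = cp_model.CpModel()
    x = {(n, s): model.NewBoolVar(f'{n}_{s}') for n in staff_roles for s in shifts}
    for s in shifts:
        # Every shift needs at least one RMN on duty.
        model.Add(sum(x[n, s] for n in staff_roles if staff_roles[n] == 'RMN')
                  >= min_rmn_per_shift)
    for n in staff_roles:
        # Nobody is assigned more than one of these shifts.
        model.Add(sum(x[n, s] for s in shifts) <= 1)
    solver = cp_model.CpSolver()
    if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        return [(n, s) for (n, s) in x if solver.Value(x[n, s])]
    return None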
tansey / fitWeightedNegativeBinomial.R
Created Jul 2, 2017
Fit negative binomial with weighted observations in R
# Fit using a simple EM algorithm
# observations are x
# weights are w (must be same length as x)
# returns (r, p)
#   r - dispersion parameter
#   p - probability of success
weightedNegBinomFit <- function(x, w, maxsteps=30)
{
    sum.wx = sum(x*w)
    sum.w = sum(w)
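For comparison, a hedged Python sketch that fits the same weighted negative binomial by direct maximum likelihood with scipy instead of the gist's EM updates (a swapped-in approach, for illustration only; the parameterization matches the comments above, r as dispersion and p as success probability):

# Hedged sketch: weighted negative binomial fit by direct MLE, not EM.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import nbinom

def weighted_negbinom_fit(x, w):
    def neg_loglik(params):
        log_r, logit_p = params
        r = np.exp(log_r)                      # keep r positive
        p = 1.0 / (1.0 + np.exp(-logit_p))     # keep p in (0, 1)
        return -np.sum(w * nbinom.logpmf(x, r, p))
    res = minimize(neg_loglik, x0=np.array([0.0, 0.0]), method='Nelder-Mead')
    return np.exp(res.x[0]), 1.0 / (1.0 + np.exp(-res.x[1]))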
tansey / chol_unpack.py
Created Feb 10, 2017
Unpack a lower triangular Cholesky matrix from a neural network output in Tensorflow
import tensorflow as tf

def unpack_cholesky(q, ndims):
    # Build the lower-triangular Cholesky from the flat buffer (assumes q shape is [batchsize, cholsize])
    chol_diag = tf.nn.softplus(q[:,:ndims])
    chol_offdiag = q[:,ndims:]
    chol_rows = []
    chol_start = 0
    chol_end = 1
    for i in xrange(ndims):
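The loop body is cut off above. A hedged NumPy sketch of the same unpacking for a single flat vector (the row-major fill order of the off-diagonal entries is an assumption):

# Hedged sketch: softplus the first ndims entries onto the diagonal, then
# scatter the remaining entries into the strictly lower triangle.
import numpy as np

def unpack_cholesky_np(q, ndims):
    L = np.zeros((ndims, ndims))
    L[np.diag_indices(ndims)] = np.logaddexp(0.0, q[:ndims])          # softplus
    L[np.tril_indices(ndims, k=-1)] = q[ndims:ndims * (ndims + 1) // 2]
    return L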
tansey / wtfset.cpp
Created Nov 9, 2016
terrible STL set iterator behavior
#include <set>
#include <iostream>

class A
{
    std::set<int> s;
public:
    A() { s.insert(0); s.insert(1); }
    // getSet() returns the set by value, so each call yields a distinct
    // temporary copy; mixing iterators from different calls (e.g. looping
    // from a.getSet().begin() to a.getSet().end()) is undefined behavior.
    std::set<int> getSet() { return s; }