Skip to content

Instantly share code, notes, and snippets.

@ngreifer
Last active May 29, 2021 23:36
Show Gist options
  • Save ngreifer/21c85ffd02d951de6600ef78d662382e to your computer and use it in GitHub Desktop.
Save ngreifer/21c85ffd02d951de6600ef78d662382e to your computer and use it in GitHub Desktop.
Implements sampling with balance constraints using mixed integer programming
constrained_sample <- function(X, ns = .5*nrow(X), tols = .01, targets = colMeans(X), time = 2*60, solver = "glpk") {
#Arguments
#X - dataset (matrix) from which sample is to be drawn
#ns - maximum size of the resulting sample
#tols - maximum distance between resulting sample means and the targets
#targets - target means for sample means to pursue
#time - number of seconds before aborting optimizer
#solver - which solver to use; "glpk" or "gurobi" (gurobi is better)
#
#Output: a vector of indices of X to retain in the sample
n <- nrow(X)
#Objective function: maximize number of units
O <- c(rep(1, n), 0)
#Constraint matrix
C <- rbind(
c(rep(1, n), -1), #Establish slack variable for number of units
t(rbind(X, -targets-tols)), #Imbalance upper bound
t(rbind(X, -targets+tols)), #Imbalance lower bound
c(rep(0, n), 1) #Bound on ns
)
#Constraint RHS and direction
Crhs <- c(0, rep(0, ncol(X)), rep(0, ncol(X)), ns)
Cdir <- c("==", rep("<", ncol(X)), rep(">", ncol(X)), "<")
#Variable types
types <- c(rep("B", n), "I")
#Optimize!
if (solver == "glpk") {
opt.out <- Rglpk::Rglpk_solve_LP(obj = O, mat = C, dir = Cdir, rhs = Crhs, max = TRUE,
types = types, control = list(tm_limit = time*1000))
x <- opt.out$solution[seq_len(n)]
}
else if (solver == "gurobi") {
Cdir[Cdir == "=="] <- "="
opt.out <- gurobi::gurobi(list(obj = O, A = C, sense = Cdir, rhs = Crhs, vtype = types,
modelsense = "max"),
params = list(OutputFlag = 0, TimeLimit = time))
x <- opt.out$x[seq_len(n)]
}
return(which(x>0))
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment