Alrecenk
@Alrecenk
Alrecenk / LogisticRegressionSimple.java
Last active August 29, 2015 14:01
A logistic regression algorithm for binary classification implemented using Newton's method and a Wolfe condition based inexact line-search.
/* A logistic regression algorithm for binary classification implemented using Newton's method and
 * a Wolfe condition based inexact line-search.
 * created by Alrecenk for inductivebias.com May 2014
 */
public class LogisticRegressionSimple {
    double w[]; // the weights for the logistic regression
    int degree; // degree of polynomial used for preprocessing
    // preprocessed list of input/output used for calculating error and its gradients
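The preview cuts off before the optimization loop. As a rough illustration of the Newton-step core (a one-dimensional sketch, not the gist's own code, and without the Wolfe line search the description mentions):

```java
// Minimal sketch of one Newton step for 1-D logistic regression.
// Model: p(y=1|x) = sigmoid(w*x); we minimize the negative log-likelihood.
public class NewtonLogistic {
    static double sigmoid(double z){ return 1.0 / (1.0 + Math.exp(-z)); }

    // One Newton update on a single weight: w <- w - gradient/Hessian
    static double newtonStep(double w, double[] x, int[] y){
        double g = 0, h = 0;
        for (int k = 0; k < x.length; k++){
            double p = sigmoid(w * x[k]);
            g += (p - y[k]) * x[k];         // gradient of the negative log-likelihood
            h += p * (1 - p) * x[k] * x[k]; // Hessian (always positive here)
        }
        return w - g / h;
    }
}
```

In the multivariate case the division becomes a linear solve against the Hessian matrix, which is where a line search on the step length earns its keep.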
@Alrecenk
Alrecenk / RotationForestsplit.java
Created November 5, 2013 04:43
The core learning algorithm for the rotation forest that calculates the best split based on approximate information gain.
// splits this node if it should and returns whether it did
// data is assumed to be a set of presorted lists where data[k][j] is the jth element of data when sorted by axis[k]
public boolean split(int minpoints){
    // if already split, or one class, or not enough points remaining, then don't split
    if (branchnode || totalpositive == 0 || totalnegative == 0 || totalpositive + totalnegative < minpoints){
        return false;
    }else{
        int bestaxis = -1, splitafter = -1;
        double bestscore = Double.MAX_VALUE; // any valid split will beat no split
        int bestLp = 0, bestLn = 0;
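The preview ends before the scoring loop. A sketch of the kind of split score the `bestscore` comparison implies (an entropy-style weighted impurity where lower is better; the gist's exact "approximate information gain" may differ, and the method names here are mine):

```java
// Weighted impurity of a candidate split over binary class counts.
public class SplitScore {
    // Shannon entropy of a (positive, negative) count pair, in nats
    static double entropy(int pos, int neg){
        int n = pos + neg;
        if (n == 0 || pos == 0 || neg == 0) return 0; // pure sets have zero entropy
        double p = pos / (double) n;
        return -(p * Math.log(p) + (1 - p) * Math.log(1 - p));
    }

    // Size-weighted entropy of the two sides; minimizing this
    // is equivalent to maximizing information gain.
    static double score(int Lp, int Ln, int totalPos, int totalNeg){
        int Rp = totalPos - Lp, Rn = totalNeg - Ln;
        int total = totalPos + totalNeg;
        return ((Lp + Ln) * entropy(Lp, Ln) + (Rp + Rn) * entropy(Rp, Rn)) / total;
    }
}
```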
@Alrecenk
Alrecenk / normalize.java
Last active December 26, 2015 22:09
Normalizing a vector to length one, normalizing a data point into a distribution with mean zero and standard deviation one, and generating a vector from a normal distribution. Different operations that are named similarly and might be confusing.
// makes a vector of length one
public static void normalize(double a[]){
    double scale = 0;
    for(int k = 0; k < a.length; k++){
        scale += a[k] * a[k];
    }
    scale = 1 / Math.sqrt(scale);
    for(int k = 0; k < a.length; k++){
        a[k] *= scale;
    }
}
@Alrecenk
Alrecenk / pseudobootstrap.java
Last active December 26, 2015 04:39
Bootstrap aggregation for a random forest algorithm.
// bootstrap aggregating of training data for a random forest
Random rand = new Random(seed);
treenode tree[] = new treenode[trees];
for(int k = 0; k < trees; k++){
    ArrayList<Datapoint> treedata = new ArrayList<Datapoint>();
    for (int j = 0; j < datapermodel; j++){
        // add a random data point (sampled with replacement) to the training data for this tree
        int nj = rand.nextInt(data.size());
        treedata.add(data.get(nj));
    }
@Alrecenk
Alrecenk / pseudonaivetreelearn.java
Last active December 26, 2015 01:59
Pseudocode for a naive implementation of a decision tree learning algorithm.
int splitvariable = -1; // split on this variable
double splitvalue; // split at this value
// total positives and negatives used for leaf node probabilities
int totalpositives, totalnegatives;
Datapoint trainingdata[]; // the training data in this node
treenode leftnode, rightnode; // this node's children if it's a branch

// splits this node greedily using approximate information gain
public void split(){
    double bestscore = Double.MAX_VALUE; // lower is better, so default is a very high number
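The fields above describe the whole trained structure, so it is worth sketching how such a node is evaluated once learning is done (a hypothetical `probability` method consistent with these fields, not the gist's own):

```java
public class TreeEval {
    int splitvariable = -1;      // split on this variable
    double splitvalue;           // split at this value
    int totalpositives, totalnegatives; // class counts for leaf probabilities
    TreeEval leftnode, rightnode;       // children if this is a branch

    // Descend until a leaf, then return the positive fraction of
    // the training data that landed there.
    double probability(double[] input){
        if (leftnode == null)
            return totalpositives / (double)(totalpositives + totalnegatives);
        return input[splitvariable] <= splitvalue
            ? leftnode.probability(input)
            : rightnode.probability(input);
    }
}
```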
@Alrecenk
Alrecenk / RotationForestSimple.java
Last active December 25, 2015 17:49
An optimized rotation forest algorithm for binary classification on fixed length feature vectors.
/* A rotation forest algorithm for binary classification with fixed length feature vectors.
 * created by Alrecenk for inductivebias.com Oct 2013
 */
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Random;
public class RotationForestSimple{
    double mean[]; // the mean of each axis for normalization
@Alrecenk
Alrecenk / basicLDLsolve.java
Last active December 23, 2015 17:09
Solves for C given an LDL decomposition in the form LDL^T C = X^T Y.
public double[] solvesystem(double L[][], double D[], double XTY[]){
    // forward substitution with L (top-down, since L is lower triangular)
    double p[] = new double[XTY.length];
    for (int j = 0; j < XTY.length; j++){
        p[j] = XTY[j];
        for (int i = 0; i < j; i++){
            p[j] -= L[j][i] * p[i];
        }
    }
    // multiply by the inverse of the D matrix
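The preview stops at the D step. A sketch of the full three-stage solve for LDL^T C = X^T Y — forward substitution with L, division by the diagonal D, then back substitution with L^T (the gist's `solvesystem` may arrange this differently):

```java
public class LDLSolve {
    static double[] solve(double[][] L, double[] D, double[] XTY){
        int n = XTY.length;
        // stage 1: forward substitution, L p = XTY
        double[] p = new double[n];
        for (int j = 0; j < n; j++){
            p[j] = XTY[j];
            for (int i = 0; i < j; i++) p[j] -= L[j][i] * p[i];
        }
        // stage 2: divide by the diagonal, q = D^-1 p (reusing p)
        for (int j = 0; j < n; j++) p[j] /= D[j];
        // stage 3: back substitution, L^T c = q (bottom-up)
        double[] c = new double[n];
        for (int j = n - 1; j >= 0; j--){
            c[j] = p[j];
            for (int i = j + 1; i < n; i++) c[j] -= L[i][j] * c[i];
        }
        return c;
    }
}
```

Both triangular solves are O(n^2), so once the decomposition exists, new right-hand sides are cheap.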
@Alrecenk
Alrecenk / basicLDL.java
Last active December 23, 2015 17:09
A basic LDL decomposition of a matrix X times its transpose.
double[][] L = new double[inputs][inputs];
double D[] = new double[inputs];
// for each column j
for (int j = 0; j < inputs; j++){
    D[j] = XTX[j][j]; // calculate D[j]
    for (int k = 0; k < j; k++){
        D[j] -= L[j][k] * L[j][k] * D[k];
    }
    // calculate the jth column of L
    L[j][j] = 1; // don't really need to save this, but it's a 1
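The preview truncates before the below-diagonal entries of L are filled in. A sketch of the complete loop, with the missing column update (wrapped in a method here for testability; the gist computes this inline):

```java
public class LDL {
    // In-place LDL^T decomposition of a symmetric positive-definite matrix:
    // XTX = L * diag(D) * L^T, with L unit lower triangular.
    static void decompose(double[][] XTX, double[][] L, double[] D){
        int n = D.length;
        for (int j = 0; j < n; j++){
            // diagonal entry D[j], subtracting contributions of earlier columns
            D[j] = XTX[j][j];
            for (int k = 0; k < j; k++) D[j] -= L[j][k] * L[j][k] * D[k];
            L[j][j] = 1;
            // column j of L below the diagonal
            for (int i = j + 1; i < n; i++){
                L[i][j] = XTX[i][j];
                for (int k = 0; k < j; k++) L[i][j] -= L[i][k] * L[j][k] * D[k];
                L[i][j] /= D[j];
            }
        }
    }
}
```

Unlike plain Cholesky, the LDL^T form needs no square roots, which is why it pairs naturally with the solver in the previous gist.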
@Alrecenk
Alrecenk / LeastSquaresTrain.java
Last active August 1, 2016 04:16
This code provides all functions necessary to perform and apply a least squares fit of a polynomial from multiple inputs to multiple outputs. The fit is performed using an in-place LDL Cholesky decomposition based on the Cholesky–Banachiewicz algorithm.
// performs a least squares fit of a polynomial function of the given degree
// mapping each input[k] vector to each output[k] vector
// returns the coefficients in a matrix
public static double[][] fitpolynomial(double input[][], double output[][], int degree){
    double[][] X = new double[input.length][];
    // run the input through the polynomialization and add the bias term
    for (int k = 0; k < input.length; k++){
        X[k] = polynomial(input[k], degree);
    }
    int inputs = X[0].length; // number of inputs after the polynomial expansion
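A plausible form of the `polynomial(...)` helper referenced above: raise each input dimension to powers 1 through `degree` and append a constant bias term. The gist's actual expansion may differ (it could include cross terms, for instance), so treat this as a sketch:

```java
public class PolyFeatures {
    // Expand an input vector into per-dimension powers plus a bias term:
    // {x1, x1^2, ..., x1^d, x2, ..., x2^d, ..., 1}
    static double[] polynomial(double[] in, int degree){
        double[] out = new double[in.length * degree + 1];
        int i = 0;
        for (int k = 0; k < in.length; k++){
            double p = 1;
            for (int d = 1; d <= degree; d++){
                p *= in[k];
                out[i++] = p; // in[k]^d
            }
        }
        out[i] = 1; // bias term makes the fit affine rather than strictly linear
        return out;
    }
}
```

After this expansion the fit itself is ordinary linear least squares on the expanded features, which is why the LDL^T machinery from the previous gists applies directly.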
@Alrecenk
Alrecenk / NaiveBayesApplication.java
Created September 8, 2013 07:42
Calculate the output for a Naive Bayes classifier.
// calculate the probability that the given input is in the positive class
public double probability(double in[]){
    double relativepositive = 0, relativenegative = 0;
    for(int j = 0; j < in.length; j++){
        relativepositive += (in[j] - posmean[j]) * (in[j] - posmean[j]) / posvariance[j];
        relativenegative += (in[j] - negmean[j]) * (in[j] - negmean[j]) / negvariance[j];
    }
    // Gaussian likelihoods need a negative exponent: exp(-0.5*...), not exp(0.5*...)
    relativepositive = positives * Math.exp(-0.5 * relativepositive);
    relativenegative = negatives * Math.exp(-0.5 * relativenegative);
    return relativepositive / (relativepositive + relativenegative);
}
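The snippet shows only the application step; it assumes per-class means, variances, and counts have already been estimated. A sketch of that training step (helper names are mine, not the gist's):

```java
public class NaiveBayesTrain {
    // per-feature mean over the rows of one class's data
    static double[] mean(double[][] data){
        double[] m = new double[data[0].length];
        for (double[] row : data)
            for (int j = 0; j < m.length; j++) m[j] += row[j];
        for (int j = 0; j < m.length; j++) m[j] /= data.length;
        return m;
    }

    // per-feature variance about the given mean (population variance)
    static double[] variance(double[][] data, double[] m){
        double[] v = new double[m.length];
        for (double[] row : data)
            for (int j = 0; j < m.length; j++)
                v[j] += (row[j] - m[j]) * (row[j] - m[j]);
        for (int j = 0; j < m.length; j++) v[j] /= data.length;
        return v;
    }
}
```

The class counts (`positives`, `negatives` in the gist) act as unnormalized priors, and the independence assumption is what lets the per-feature terms simply be summed in the exponent.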