@Xachriel
Created February 5, 2014 06:57
# Example for the Kaggle forums.
library(FNN)
library(stats)
train <- read.csv("data/train.csv", header=TRUE, comment.char="")
test <- read.csv("data/test.csv", header=TRUE, comment.char="")
# Random split: N rows for training, all remaining rows for validation.
N <- 30000
set.seed(20140202)
trainingSet <- train[sample(1:nrow(train), N), ]
validationSet <- train[!(rownames(train) %in% rownames(trainingSet)), ]
trainingLabel <- trainingSet[, 1]
trainingSet <- trainingSet[, -1]
validationLabel <- validationSet[, 1]
validationSet <- validationSet[, -1]
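# PCA on the training features; prcomp() centers the columns by default.
principalComps <- prcomp( ~. , data = trainingSet)
# Project both sets with the rotation only, so no centering is applied here.
trainingPRC <- as.matrix(trainingSet) %*% principalComps$rotation
validationPRC <- as.matrix(validationSet) %*% principalComps$rotation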
prc <- proc.time()
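# knn() returns a factor; indexing 0:9 by it maps the level codes back to the
# labels (this assumes all ten labels 0-9 occur in trainingLabel).
resultPRC <- (0:9)[knn(trainingPRC[, 1:36], validationPRC[, 1:36], trainingLabel, k = 10, algorithm="cover_tree")]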
print(proc.time() - prc)
sum(resultPRC == validationLabel)/length(validationLabel)
# Validation accuracy: 0.9714167
# Best with 36 components; the good region is roughly 31-41.
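# Same test, but the training side now uses principalComps$x (the centered
# scores) while validationPRC is still uncentered.
prc <- proc.time()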
resultPRC <- (0:9)[knn(principalComps$x[, 1:36], validationPRC[, 1:36], trainingLabel, k = 10, algorithm="cover_tree")]
print(proc.time() - prc)
sum(resultPRC == validationLabel)/length(validationLabel)
# Validation accuracy: 0.9066667
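# Sketch, not part of the original run: the lower score above is probably due
# to a centering mismatch. principalComps$x is mean-centered by prcomp(), but
# validationPRC was projected without centering. Assuming that is the cause,
# predict() applies the training centering to the validation rows so both sets
# share one coordinate system. The names validationCentered and resultCentered
# are introduced here for illustration; no accuracy is quoted because this
# variant was not timed or scored in the original.
validationCentered <- predict(principalComps, newdata = validationSet)
resultCentered <- (0:9)[knn(principalComps$x[, 1:36], validationCentered[, 1:36], trainingLabel, k = 10, algorithm="cover_tree")]
sum(resultCentered == validationLabel)/length(validationLabel)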