@tcibinan
Created March 19, 2018 09:49
Classify with kknn method with and without cross-validation
library(kknn)
library(dplyr)
library(stringr)
set.seed(2342)
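# Data preprocessing
data <-
  read.csv('lab8_data.csv', sep=',') %>%
  # Strip spaces from the class labels, then encode them as a factor
  mutate(Diagnosis = str_replace_all(Diagnosis, " ", "")) %>%
  mutate(Diagnosis = factor(Diagnosis)) %>%
  mutate(Sex = factor(Sex)) %>%
  mutate(Left_right = factor(Left_right)) %>%
  # Drop the ".mm" columns and the X column (presumably a row index from the CSV export)
  select(-contains(".mm")) %>%
  select(-X) %>%
  # Recode the categorical predictors as integer codes
  mutate(Sex = as.numeric(Sex)) %>%
  mutate(Left_right = as.numeric(Left_right))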
# Random train/test split with a 5:1 ratio
learn_to_test <- 5/1
observations_count <- nrow(data)
learn_idx <- sample(seq_len(observations_count),
                    floor(observations_count*learn_to_test/(learn_to_test+1)))
learn_data <- data[learn_idx, ]
test_data <- data[-learn_idx, ]
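# Computations
# Weighted k-NN on the held-out split: triangular kernel, k = 7,
# Minkowski distance with parameter 1 (Manhattan)
kknn(Diagnosis ~ .,
     learn_data,
     test_data,
     kernel = "triangular",
     k = 7,
     distance = 1) %>%
  summary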
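# A minimal sketch, not part of the original lab: as far as I can tell, the
# summary above lists predictions but does not compare them with the true
# labels. Assuming the built-in iris data as a stand-in for lab8_data.csv,
# the same kind of kknn() fit can be scored with a confusion matrix and an
# accuracy figure. All iris_* names are hypothetical.
library(kknn)
iris_idx <- sample(seq_len(nrow(iris)), floor(nrow(iris) * 5 / 6))
iris_learn <- iris[iris_idx, ]
iris_test <- iris[-iris_idx, ]
iris_fit <- kknn(Species ~ ., iris_learn, iris_test,
                 kernel = "triangular", k = 7, distance = 1)
# fitted() returns the predicted class for each row of the test set
iris_pred <- fitted(iris_fit)
iris_confusion <- table(actual = iris_test$Species, predicted = iris_pred)
# Share of test rows on the diagonal, i.e. classified correctly
iris_accuracy <- sum(diag(iris_confusion)) / sum(iris_confusion)
print(iris_confusion)
print(iris_accuracy)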
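# Leave-one-out cross-validation over k = 1..16 and three kernels,
# Minkowski distance with parameter 2 (Euclidean)
train.kknn(Diagnosis ~ .,
           data,
           kmax = 16,
           kernel = c("rectangular", "triangular", "epanechnikov"),
           distance = 2) %>%
  plot(.,
       main = paste("k-NN classification, leave-one-out CV\n",
                    "Best kernel:", .$best.parameters$kernel, ".",
                    "Best k:", .$best.parameters$k, "\n"))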
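# A minimal sketch, not part of the original lab: the cross-validation result
# can also be inspected numerically instead of only being plotted. Assumes the
# built-in iris data as a stand-in for lab8_data.csv; iris_cv, best_kernel and
# best_k are hypothetical names.
library(kknn)
iris_cv <- train.kknn(Species ~ ., iris,
                      kmax = 16,
                      kernel = c("rectangular", "triangular", "epanechnikov"),
                      distance = 2)
# MISCLASS holds the leave-one-out misclassification rate,
# one row per k and one column per kernel
print(iris_cv$MISCLASS)
best_kernel <- iris_cv$best.parameters$kernel
best_k <- iris_cv$best.parameters$k
cat("Best kernel:", best_kernel, "Best k:", best_k, "\n")
cat("Lowest misclassification rate:", min(iris_cv$MISCLASS), "\n")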