Last active
May 26, 2018 22:15
Analysis of Wine Quality KNN (k nearest neighbour)
---
date: "April 2, 2018"
output:
  word_document: default
  pdf_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
```{r}
# Install the required packages once, outside of knitting:
# install.packages(c("class", "gmodels", "corrplot"))
library(class)    # knn()
library(gmodels)  # CrossTable()

# Import the white wine data from the UCI repository (semicolon-separated)
dataurl <- "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"
wine <- read.csv(dataurl, header = TRUE, sep = ";")
```
1. Check data characteristics. Is there missing data?
```{r}
str(wine)
# Positions of missing values; an empty result means nothing is missing
which(is.na(wine))
```
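`which(is.na(wine))` returns an empty vector when nothing is missing, which can be hard to read at a glance. A per-column count is often easier to interpret. The following is a minimal sketch on a made-up data frame; `toy_na` and its values are hypothetical and not part of the wine data.

```{r}
# Hypothetical toy data with two missing values
toy_na <- data.frame(
  alcohol = c(9.5, NA, 11.2),
  pH      = c(3.1, 3.3, NA)
)

anyNA(toy_na)           # TRUE if at least one value is missing
colSums(is.na(toy_na))  # number of missing values in each column
```

Applied to the real data, the same two calls would be `anyNA(wine)` and `colSums(is.na(wine))`.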
2. What is the correlation between the attributes other than wine quality?
```{r}
library(corrplot)
# Wine data without the quality column (column 12)
winewoq <- wine[, -12]
a <- cor(winewoq)
corrplot.mixed(a, lower.col = "black", number.cex = 0.85)
```
3. Graph the frequency distribution of wine quality.
```{r}
hist(wine$quality)
```
4. Reduce the quality ratings to three levels: high, medium and low
```{r}
# Bin the numeric scores first: below 5 is low, 5 to 6 is medium, 7 and above is high
wine$rating <- ifelse(wine$quality < 5, "low",
                      ifelse(wine$quality < 7, "medium", "high"))
# Then store quality as an ordered factor
wine$quality <- factor(wine$quality, ordered = TRUE)
round(prop.table(table(wine$rating)) * 100, digits = 1)
head(wine)
```
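The nested `ifelse()` above can also be written with `cut()`, which returns a factor directly. The following is a sketch on a hypothetical vector of scores (`toy_quality` is made up), assuming the same thresholds: below 5 is low, 5 to 6 is medium, 7 and above is high.

```{r}
toy_quality <- c(3, 4, 5, 6, 7, 9)  # hypothetical quality scores

# right = FALSE closes each interval on the left: [-Inf, 5), [5, 7), [7, Inf)
toy_rating <- cut(toy_quality,
                  breaks = c(-Inf, 5, 7, Inf),
                  labels = c("low", "medium", "high"),
                  right = FALSE)
table(toy_rating)
```

One possible advantage of `cut()` is that the level order (low, medium, high) is fixed by `labels`, rather than being sorted alphabetically as it would be for a character vector.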
5. Normalize the data set
```{r}
# Min-max scaling: rescale a numeric vector to the range [0, 1]
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
# Apply to the 11 attribute columns only
winenew <- as.data.frame(lapply(wine[1:11], normalize))
```
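Min-max normalization maps each column onto the range 0 to 1, so that attributes measured on large scales do not dominate the distance calculation in KNN. A small worked example follows; the function is redefined under the hypothetical name `toy_normalize` so that the chunk is self-contained.

```{r}
toy_normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# The minimum maps to 0, the maximum to 1, and 20 lands halfway between
toy_normalize(c(10, 20, 30))
```

Note that a constant column would give `0 / 0`, that is `NaN`, so this form assumes every attribute takes at least two distinct values.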
6. Divide the data into training and testing groups
```{r}
set.seed(123)  # set the seed before sampling so the split is reproducible
smp_size <- floor(0.75 * nrow(winenew))
train_ind <- sample(seq_len(nrow(winenew)), size = smp_size)
# Use the same row indices for features and labels so they stay aligned
wine_train <- winenew[train_ind, ]
wine_test <- winenew[-train_ind, ]
wine_train_labels <- wine[train_ind, "rating"]
wine_test_labels <- wine[-train_ind, "rating"]
```
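KNN needs the label vector to line up row for row with the training features, so the same index vector has to be used for both. The following is a sketch on hypothetical toy data; `toy_x`, `toy_y` and `toy_idx` are made-up names.

```{r}
set.seed(1)
toy_x <- data.frame(a = runif(8), b = runif(8))  # hypothetical features
toy_y <- factor(rep(c("low", "high"), 4))        # hypothetical labels

# One index vector, drawn once, drives both splits
toy_idx <- sample(seq_len(nrow(toy_x)), size = floor(0.75 * nrow(toy_x)))

toy_train   <- toy_x[toy_idx, ]
toy_test    <- toy_x[-toy_idx, ]
toy_train_y <- toy_y[toy_idx]
toy_test_y  <- toy_y[-toy_idx]

# Sanity check: features and labels have matching lengths
stopifnot(nrow(toy_train) == length(toy_train_y),
          nrow(toy_test)  == length(toy_test_y))
```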
7. Use the KNN algorithm to predict the quality of wine using its attributes.
```{r}
wine_test_pred <- knn(train = wine_train, test = wine_test, cl = wine_train_labels, k = 67)
```
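`knn()` from the `class` package classifies each test row by a majority vote among its `k` nearest training rows, measured by Euclidean distance. The following is a tiny sketch on hypothetical two-dimensional points; all `pts_*` names and values are made up for illustration.

```{r}
library(class)

# Hypothetical training points: two well-separated clusters
pts_train <- data.frame(x = c(0, 0.1, 0.2, 1, 0.9, 0.8),
                        y = c(0, 0.1, 0.0, 1, 0.9, 1.0))
pts_cl  <- factor(c("low", "low", "low", "high", "high", "high"))
pts_new <- data.frame(x = c(0.05, 0.95), y = c(0.05, 0.95))

# With k = 3, each new point is labelled by its three nearest neighbours
pts_pred <- knn(train = pts_train, test = pts_new, cl = pts_cl, k = 3)
pts_pred
```

Here the first new point sits inside the "low" cluster and the second inside the "high" cluster, so the three nearest neighbours agree in both cases.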
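8. Evaluate the model performance
```{r}
table(wine_train_labels)  # class balance in the training set
class(wine_test_pred)     # knn() returns a factor
# Cross-tabulate actual against predicted ratings; CrossTable() prints the table itself
CrossTable(x = wine_test_labels, y = wine_test_pred, prop.chisq = FALSE)
```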
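Overall accuracy can be read off a confusion matrix as the share of predictions that fall on its diagonal. The following is a sketch with hypothetical label vectors; `actual` and `predicted` are made up and are not results from the wine model.

```{r}
lv <- c("low", "medium", "high")
actual    <- factor(c("low", "medium", "medium", "high", "high", "medium"), levels = lv)
predicted <- factor(c("low", "medium", "high", "high", "medium", "medium"), levels = lv)

# Rows are actual classes, columns are predicted classes
conf_mat <- table(actual, predicted)
conf_mat

# Accuracy = correct predictions / all predictions
toy_accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
toy_accuracy
```

For the wine model, the equivalent one-liner would be `mean(wine_test_pred == wine_test_labels)`.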
Author
amitbhsingh