@amitbhsingh
Last active May 26, 2018 22:15
Analysis of Wine Quality KNN (k nearest neighbour)
---
date: "April 2, 2018"
output:
  word_document: default
  pdf_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
```{r}
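# Install once from the console rather than on every knit:
# install.packages(c("class", "gmodels", "corrplot"))
library(class)    # knn()
library(gmodels)  # CrossTable()
# Import the white wine data from the UCI repository (semicolon-separated)
dataurl <- "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"
wine <- read.csv(dataurl, header = TRUE, sep = ";")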
```
1. Check data characteristics. Is there missing data?
```{r}
str(wine)
which(is.na(wine))
```
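`which(is.na(wine))` returns positions within the flattened data, which is hard to read if anything is missing. A minimal sketch of a per-column count follows, run on a small made-up data frame (`toy` is a hypothetical stand-in, not part of the wine data) so the chunk works on its own:

```{r}
# Hypothetical toy data frame with two missing values, for illustration only
toy <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
# Count missing values per column
na_per_column <- colSums(is.na(toy))
na_per_column
# TRUE if the data frame contains any missing value at all
anyNA(toy)
```

The same two calls can be applied to `wine` directly.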
2. What is the correlation between the attributes other than wine quality?
```{r}
# Correlation matrix of the 11 attributes, with quality (column 12) dropped
library(corrplot)
winewoq <- wine[, -12]
a <- cor(winewoq)
corrplot.mixed(a, lower.col = "black", number.cex = 0.85)
```
3. Graph the frequency distribution of wine quality.
```{r}
hist(wine$quality)
```
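Because quality is an integer score, a bar plot of the count per score is an alternative view of the same distribution. A small sketch, using a made-up vector (`scores` is hypothetical and stands in for `wine$quality`):

```{r}
# Hypothetical vector of integer scores, standing in for wine$quality
scores <- c(5, 6, 6, 7, 5, 6, 8, 6, 5, 7)
# Count how often each distinct score occurs
score_counts <- table(scores)
score_counts
# One bar per distinct score
barplot(score_counts, xlab = "Quality score", ylab = "Count")
```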
4. Reduce the quality ratings to three levels: high, medium, and low.
```{r}
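# Bin the numeric score first: below 5 is low, 5 to 6 is medium, 7 and above is high
wine$rating <- ifelse(wine$quality < 5, 'low', ifelse(
  wine$quality < 7, 'medium', 'high'))
# Then store quality as an ordered factor
wine$quality <- factor(wine$quality, ordered = TRUE)
round(prop.table(table(wine$rating)) * 100, digits = 1)
head(wine)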
```
5. Normalize the data set
```{r}
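# Min-max scaling: rescale each attribute to the range [0, 1]
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
# Apply to the 11 physicochemical attributes (columns 1 to 11)
winenew <- as.data.frame(lapply(wine[1:11], normalize))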
```
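As a worked example of the min-max formula, the sketch below applies it to a short made-up vector. The input values are hypothetical and chosen only to keep the arithmetic easy to follow: the minimum is 2 and the range is 8, so 4 maps to (4 - 2) / 8 = 0.25.

```{r}
# Same min-max formula, repeated so this chunk runs on its own
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
# Hypothetical input values
scaled <- normalize(c(2, 4, 6, 10))
scaled
# returns 0, 0.25, 0.5, 1
```

Note that a constant column would give 0 / 0, so the result would be `NaN` for every row.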
6. Divide the data into training and testing groups
```{r}
set.seed(123)  # set the seed before sampling so the split is reproducible
smp_size <- floor(0.75 * nrow(winenew))
train_ind <- sample(seq_len(nrow(winenew)), size = smp_size)
wine_train <- winenew[train_ind, ]
wine_test <- winenew[-train_ind, ]
# Reuse the same indices so that each label matches its feature row
wine_train_labels <- wine[train_ind, "rating"]
wine_test_labels <- wine[-train_ind, "rating"]
```
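KNN needs row i of the training features and element i of the label vector to describe the same observation. One way to guarantee that is to draw a single index vector and use it for both. A minimal sketch on made-up data (`toy_features`, `toy_labels`, and the seed are all hypothetical):

```{r}
set.seed(1)  # arbitrary seed, only to make this sketch reproducible
# Hypothetical features and labels; row i of toy_features belongs to toy_labels[i]
toy_features <- data.frame(x = 1:8, y = (1:8)^2)
toy_labels <- factor(rep(c("low", "high"), times = 4))
# Draw one set of row indices for a 75/25 split
idx <- sample(seq_len(nrow(toy_features)), size = floor(0.75 * nrow(toy_features)))
train_x <- toy_features[idx, ]
train_y <- toy_labels[idx]
test_x <- toy_features[-idx, ]
test_y <- toy_labels[-idx]
# Features and labels stay paired because both were subset with idx
c(train = nrow(train_x), test = nrow(test_x))
```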
7. Use the KNN algorithm to predict the quality of wine using its attributes.
```{r}
wine_test_pred <- knn(train = wine_train, test = wine_test, cl = wine_train_labels, k = 67)
```
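The value of `k` affects the predictions, so it can help to compare a few candidates. The sketch below is one possible way to do that, not the method used in this analysis. It uses R's built-in `iris` data purely as a stand-in so the chunk runs on its own, and the seed, split size, and candidate values of `k` are arbitrary assumptions.

```{r}
library(class)
set.seed(42)  # arbitrary seed for this sketch
# iris is a stand-in data set, not the wine data
idx <- sample(seq_len(nrow(iris)), size = 100)
train_x <- iris[idx, 1:4]
test_x <- iris[-idx, 1:4]
train_y <- iris$Species[idx]
test_y <- iris$Species[-idx]
# Hypothetical candidate values of k
k_values <- c(1, 5, 11, 21)
# Share of held-out rows predicted correctly for each k
k_accuracy <- sapply(k_values, function(k) {
  pred <- knn(train = train_x, test = test_x, cl = train_y, k = k)
  mean(pred == test_y)
})
data.frame(k = k_values, accuracy = k_accuracy)
```

In a fuller analysis, `k` would usually be chosen on a separate validation set or by cross-validation, so that the test set stays untouched for the final evaluation.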
8. Evaluate the model performance
```{r}
table(wine_train_labels)
class(wine_test_pred)
# Cross-tabulate actual against predicted ratings
CrossTable(x = wine_test_labels, y = wine_test_pred, prop.chisq = FALSE)
```
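A cross table shows where the predictions agree with the true labels. Overall accuracy is the sum of its diagonal divided by the total count. A small sketch with made-up labels (`actual` and `predicted` are hypothetical, not results from the wine model):

```{r}
# Hypothetical true and predicted labels, for illustration only
lv <- c("high", "low", "medium")
actual <- factor(c("low", "medium", "medium", "high", "medium", "high"), levels = lv)
predicted <- factor(c("medium", "medium", "medium", "high", "high", "high"), levels = lv)
# Rows are actual classes, columns are predicted classes
confusion <- table(actual, predicted)
confusion
# Overall accuracy: correct predictions (the diagonal) over all predictions
overall_accuracy <- sum(diag(confusion)) / sum(confusion)
overall_accuracy
```

Here 4 of the 6 made-up predictions match, so the accuracy is 4/6. The same calculation applies to `table(wine_test_labels, wine_test_pred)`, provided both vectors share the same factor levels.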