@tbobin
Created April 27, 2017 11:18
Example of how to handle outliers in a dataset by setting them to the 5% or 95% quantile
#quick and dirty
library(tidyverse)
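# some data: a few low values, a bulk around 1000-1050, a few high values
set.seed(42) # make the random sample reproducible
x <- c(12,8,80,56,round(runif(100,1000,1050)),1100,1150,1222,1180,1200,1190)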
hist_data <- data.frame(x)
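# quartiles, used to build the outlier fences
qnt <- quantile(x, probs = c(0.25, 0.75))
# replacement values for outliers: the 5% and 95% quantiles
caps <- quantile(x, probs = c(0.05, 0.95))
# fence distance: 1.5 times the interquartile range
H <- 1.5*IQR(x)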
# left outliers: values below Q1 - H are set to the 5% quantile
x[x < (qnt[1] - H)] <- caps[1]
# right outliers: values above Q3 + H are set to the 95% quantile
x[x > (qnt[2] + H)] <- caps[2]
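# keep the original values (x) and the capped values (x2) side by side
hist_data <- cbind(hist_data, x2=x)
# histograms before (x) and after (x2) capping
hist_data %>% ggplot(aes(x)) + geom_histogram(binwidth = 1)
hist_data %>% ggplot(aes(x2)) + geom_histogram(binwidth = 1)
# boxplots before (x) and after (x2) capping
hist_data %>% ggplot(aes(x="",y=x)) + geom_boxplot()
hist_data %>% ggplot(aes(x="",y=x2)) + geom_boxplot()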
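
# A minimal sketch, not part of the original script: the same capping steps
# wrapped in a function so they can be reused on other numeric vectors.
# The name cap_outliers and its arguments (cap_probs, k) are assumptions
# made for illustration; NA handling is also an addition.
cap_outliers <- function(v, cap_probs = c(0.05, 0.95), k = 1.5) {
  # quartiles and replacement quantiles, ignoring missing values
  qnt  <- quantile(v, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  caps <- quantile(v, probs = cap_probs, na.rm = TRUE, names = FALSE)
  H    <- k * IQR(v, na.rm = TRUE)
  # flag values outside the fences; NA values are never flagged
  low  <- !is.na(v) & v < qnt[1] - H
  high <- !is.na(v) & v > qnt[2] + H
  v[low]  <- caps[1]
  v[high] <- caps[2]
  v
}
# Usage on a small fixed vector with one low and one high outlier.
# Note that a capped value can still lie outside the fences, because the
# replacement quantiles are computed from data that includes the outliers.
y <- c(1, 50:60, 200)
y_capped <- cap_outliers(y)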