@SwampThingPaul
Created October 6, 2019 20:48
Weekend of a Data Scientist in R
# R version of https://medium.com/cindicator/must-have-statistical-tests-for-any-data-scientist-weekend-of-a-data-scientist-4543f2c393cd
# Medium Article written by Alexander Osipenko (https://medium.com/@subpath)
# Fake Data (not the same data as article)
set.seed(123)
groupA=rnorm(1000)
groupB=rnorm(1100,mean=0.001)
# Quick Density plot
plot(0:1,type="n",xlim=c(-5,5),ylim=c(0,0.6),ylab="Density",xlab="Value",las=1)
with(density(groupA),lines(x,y,col="blue"))
with(density(groupB),lines(x,y,col="red"))
legend("topright",legend=c("Group A","Group B"),col=c("blue","red"),lwd=1,lty=1,
ncol=1,cex=0.8,bty="n",y.intersp=1.75,x.intersp=0.75,xpd=NA,xjust=0.5,yjust=0.5)
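# Student’s t-test
# Note: t.test() defaults to Welch's unequal-variance form; var.equal=TRUE gives the classic Student's test
shapiro.test(groupA) # normally distributed
shapiro.test(groupB) # normally distributed
t.test(groupA,groupB) # not significantly different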
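# Added sketch (not in the original gist): compare the pooled-variance
# (classic Student's) and Welch forms of the t-test side by side.
# `student_vs_welch` is a hypothetical helper name; the pooled form assumes
# both groups share one variance.
student_vs_welch=function(x,y){
  c(student=t.test(x,y,var.equal=TRUE)$p.value, # pooled variance
    welch=t.test(x,y)$p.value)                  # unequal variances allowed
}
# Usage: student_vs_welch(groupA,groupB)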
# Mann-Whitney’s U-test
groupA2=rnorm(100,sd=1.25)*runif(100)
groupB2=rnorm(100,mean=1.5)*runif(100)
# Quick Density plot
plot(0:1,type="n",xlim=c(-5,5),ylim=c(0,1),ylab="Density",xlab="Value",las=1)
with(density(groupA2),lines(x,y,col="blue"))
with(density(groupB2),lines(x,y,col="red"))
legend("topright",legend=c("Group A","Group B"),col=c("blue","red"),lwd=1,lty=1,
ncol=1,cex=0.8,bty="n",y.intersp=1.75,x.intersp=0.75,xpd=NA,xjust=0.5,yjust=0.5)
shapiro.test(groupA2) # not normally distributed
shapiro.test(groupB2) # not normally distributed
wilcox.test(groupA2,groupB2) # significantly different
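# Added sketch (not in the original gist): the workflow used so far (check
# normality with Shapiro-Wilk, then pick a t-test or a Mann-Whitney U-test)
# wrapped in one hypothetical helper, `compare_groups`. The alpha=0.05 cutoff
# is an assumption, and shapiro.test() only accepts 3 to 5000 observations.
compare_groups=function(x,y,alpha=0.05){
  # treat the data as normal only if neither group rejects normality
  normal=shapiro.test(x)$p.value>alpha && shapiro.test(y)$p.value>alpha
  if(normal) t.test(x,y) else wilcox.test(x,y)
}
# Usage: compare_groups(groupA2,groupB2)$method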
# Fisher’s F-test
#using data from first example
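# groups is stored as a factor so the result does not depend on the stringsAsFactors default (FALSE since R 4.0)
dat=data.frame(val=c(groupA,groupB),groups=factor(c(rep("A",length(groupA)),rep("B",length(groupB)))))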
var.test(val~groups,dat)
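# Added sketch (not in the original gist): the F-test statistic is the ratio
# of the two sample variances, compared against an F distribution with
# (n1-1, n2-1) degrees of freedom. `f_test_manual` is a hypothetical helper
# that should reproduce the two-sided result of var.test(x,y).
f_test_manual=function(x,y){
  f=var(x)/var(y)                   # first group in the numerator
  df1=length(x)-1; df2=length(y)-1
  lower=pf(f,df1,df2)
  c(F=f,p.value=2*min(lower,1-lower)) # two-sided p-value
}
# Usage: f_test_manual(groupA,groupB)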
# Bartlett’s test
# using data from the second example (dat2 was built here but the test was run on dat)
dat2=data.frame(val=c(groupA2,groupB2),groups=factor(c(rep("A",length(groupA2)),rep("B",length(groupB2)))))
bartlett.test(val~groups,dat2)
# Levene’s test in R
# requires the car package; leveneTest() centers on the group median by default
library(car)
leveneTest(val~groups,dat)
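# Added sketch (not in the original gist): Levene's test is a one-way ANOVA on
# absolute deviations from each group's center, so it can be approximated in
# base R without car. `levene_base` is a hypothetical helper; with
# center=median it should give the same F statistic as car's default.
levene_base=function(val,groups,center=median){
  groups=factor(groups)
  dev=abs(val-ave(val,groups,FUN=center)) # distance from own group's center
  anova(lm(dev~groups))
}
# Usage: levene_base(dat$val,dat$groups)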