@coppeliaMLA
Created June 26, 2014 16:28
Bagging algorithm for hclust
library(reshape2)
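# Bagged hierarchical clustering.
#   data       numeric data frame or matrix with row names
#   n          number of subsamples to cluster
#   k          number of clusters cut from each subsample tree
#   size       rows per subsample (drawn without replacement)
#   outlier.th rows whose highest co-clustering proportion falls below
#              this value are reported as potential outliers
bagHClust <- function(data, n, k, size, outlier.th = 0.75) {
  n.obs <- nrow(data)  # kept separate from n, the iteration count
  clus.bs <- NULL
  for (i in seq_len(n)) {
    bs.ind <- sample.int(n.obs, size, replace = FALSE)
    hc <- hclust(dist(data[bs.ind, ]), "average")
    ct <- cutree(hc, k)
    add <- data.frame(iter = i, ind = names(ct), cluster = ct)
    clus.bs <- rbind(clus.bs, add)
  }
  # Cartesian products: pair each row with every row that shared its
  # cluster in the same iteration
  m <- merge(clus.bs, clus.bs, by = c("iter", "cluster"))
  # Co-clustering counts; the diagonal holds how often each row was sampled
  d <- dcast(m, ind.x ~ ind.y, length, value.var = "iter")
  counts <- as.matrix(d[, -1])
  rownames(counts) <- d$ind.x
  # Divide row i by the number of subsamples that contained row i
  dm <- counts / diag(counts)
  dm.rep <- dm
  diag(dm.rep) <- 0
  most <- apply(dm.rep, 2, max)
  # Report rows that never co-cluster strongly with any other row
  outliers <- most[most < outlier.th]
  if (length(outliers) > 0) {
    message("Potential outliers: ", paste(names(outliers), collapse = ", "))
  }
  disim <- as.dist(1 - dm)
  h <- hclust(disim, "average")
  return(h)
}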
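# Bagged clustering of USArrests: 100 subsamples of 40 states, each cut into 8 clusters
bhc <- bagHClust(USArrests, 100, 8, 40)
plot(bhc)
# Ordinary average-linkage clustering of the full data, for comparison
hc <- hclust(dist(USArrests), "ave")
plot(hc)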
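
The function above divides each row of the co-clustering counts by how often that row was sampled, so the resulting matrix is not symmetric. As a rough sketch of one alternative (not the gist's method), the proportions can instead be divided by how often each pair was drawn into the same subsample, which gives a symmetric matrix and needs only base R. The names coAssoc and n.iter are made up for this illustration.

```r
# Hypothetical helper, not part of the gist: co-clustering proportions
# normalised by the number of subsamples in which both rows were drawn.
coAssoc <- function(data, n.iter = 100, k = 8, size = 40) {
  N <- nrow(data)
  together <- matrix(0, N, N, dimnames = list(rownames(data), rownames(data)))
  sampled <- together
  for (i in seq_len(n.iter)) {
    idx <- sample.int(N, size)                  # subsample without replacement
    ct <- cutree(hclust(dist(data[idx, ]), "average"), k)
    same <- outer(ct, ct, "==")                 # TRUE where a pair shares a cluster
    together[idx, idx] <- together[idx, idx] + same
    sampled[idx, idx] <- sampled[idx, idx] + 1
  }
  together / sampled                            # NaN for pairs never drawn together
}

set.seed(1)
ca <- coAssoc(USArrests)
h2 <- hclust(as.dist(1 - ca), "average")
```

Calling plot(h2) draws the resulting tree for comparison with the two plots above. If some pair is never drawn together (possible with few iterations or small subsamples), the matrix contains NaN and hclust will stop with an error, so n.iter and size need to be large enough for the data.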