Aug 25, 2016

looking to see if t-SNE replicates linear separation of groups
    n <- 50
    m <- 40
    m_inform <- 10
    set.seed(1)
    niter <- 200
    intradist <- numeric(niter)
    interdist <- numeric(niter)
    mus <- seq(from=0, to=3, length=niter)
    library(Rtsne)
    cols <- rep(1:2, each=n/2)
    for (i in seq_len(niter)) {
      mu <- mus[i]
      cat(i,"")
      x <- cbind(rbind(matrix(rnorm(n/2 * m_inform, -mu/2), ncol=m_inform),
                       matrix(rnorm(n/2 * m_inform, mu/2), ncol=m_inform)),
                 matrix(rnorm(n * (m - m_inform)), nrow=n))
      # see comment below on raising perplexity to 16 for n=50 and 30 for n=100
      res <- Rtsne(x, perplexity=10)
      #plot(res$Y, col=cols, pch=20, xlab="", ylab="")
      mid1 <- colMeans(res$Y[cols==1,])
      mid2 <- colMeans(res$Y[cols==2,])
      intradist[i] <- mean(c(sqrt(colSums((t(res$Y[cols==1,]) - mid1)^2)),
                             sqrt(colSums((t(res$Y[cols==2,]) - mid2)^2))))
      interdist[i] <- sqrt(sum((mid1 - mid2)^2))
    }

    # make plot
    dat <- data.frame(mu=sqrt(m_inform)*rep(mus,2),
                      dist=c(intradist, interdist),
                      type=rep(c("intra","inter"),each=niter))
    library(ggplot2)
    print(
      ggplot(dat, aes(x=mu,y=dist,col=type)) +
        geom_point() + geom_smooth() +
        xlab("distance between sub-population centers") +
        ylab("distance recovered by t-SNE") +
        ggtitle(paste(n,"points"))
    )
### mikelove commented Aug 25, 2016

I had previously lowered perplexity until I no longer got an error, but then Michael Schubert asked what happens with higher values: https://twitter.com/_ms03/status/768827491536502785

I tried again, using the default value (perplexity=30) for n=100: https://twitter.com/mikelove/status/768830108761255937

And using perplexity=16 (the highest value that runs without error) for n=50: https://twitter.com/mikelove/status/768830491042652160

With these values the plots look more linear on the left side, up to a breakpoint beyond which the two populations are spread apart farther than their actual distance. This is more or less what the method advertises.
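
The perplexity ceiling mentioned above (16 for n=50) is consistent with Rtsne rejecting any perplexity for which 3 * perplexity exceeds n - 1. Here is a minimal sketch under that assumption. `max_perplexity` is a hypothetical helper name, not part of Rtsne, and the bound is inferred from the behavior described in this comment rather than taken from the package documentation.

```r
# Hypothetical helper: largest integer perplexity that should pass the
# assumed Rtsne check 3 * perplexity <= n - 1, where n is the number of rows.
max_perplexity <- function(n) {
  floor((n - 1) / 3)
}

max_perplexity(50)   # 16, matching the highest value that ran for n=50
max_perplexity(100)  # 33, so the default of 30 is allowed for n=100
```

In the simulation loop, something like `Rtsne(x, perplexity=min(30, max_perplexity(nrow(x))))` would then use the default where it is allowed and otherwise fall back to the largest value permitted for small n.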