# Check whether t-SNE replicates linear separation of groups
library(Rtsne)
library(ggplot2)
n <- 50
m <- 40
m_inform <- 10
niter <- 200
intradist <- numeric(niter)
interdist <- numeric(niter)
mus <- seq(from=0, to=3, length=niter)
cols <- rep(1:2, each=n/2)
for (i in seq_len(niter)) {
  mu <- mus[i]
  # first m_inform columns separate the two groups by mu; the rest are pure noise
  x <- cbind(rbind(matrix(rnorm(n/2 * m_inform, -mu/2), ncol=m_inform),
                   matrix(rnorm(n/2 * m_inform, mu/2), ncol=m_inform)),
             matrix(rnorm(n * (m - m_inform)), nrow=n))
  # see comment below on raising perplexity to 16 for n=50 and 30 for n=100
  res <- Rtsne(x, perplexity=10)
  #plot(res$Y, col=cols, pch=20, xlab="", ylab="")
  # group centers in the 2-D embedding
  mid1 <- colMeans(res$Y[cols==1,])
  mid2 <- colMeans(res$Y[cols==2,])
  # mean distance of each point to its own group center
  intradist[i] <- mean(c(sqrt(colSums((t(res$Y[cols==1,]) - mid1)^2)),
                         sqrt(colSums((t(res$Y[cols==2,]) - mid2)^2))))
  # distance between the two group centers
  interdist[i] <- sqrt(sum((mid1 - mid2)^2))
}
# make plot
dat <- data.frame(mu=sqrt(m_inform)*rep(mus,2),
                  dist=c(intradist, interdist),
                  type=rep(c("intra","inter"), each=niter))
ggplot(dat, aes(x=mu,y=dist,col=type)) + geom_point() + geom_smooth() +
  xlab("distance between sub-population centers") +
  ylab("distance recovered by t-SNE") + ggtitle(paste(n,"points"))
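
The x-axis is scaled by sqrt(m_inform) because of how the data are simulated: the two group centers differ by mu in each of the m_inform informative columns and by zero in the noise columns, so their Euclidean distance is sqrt(m_inform) * mu. A minimal base-R check of that identity follows; `center_distance` is a hypothetical helper name that does not appear in the script above.

```r
# Hypothetical helper (not part of the original script): Euclidean distance
# between the two population centers used in the simulation. The informative
# columns have means -mu/2 and +mu/2; the remaining noise columns have mean 0.
center_distance <- function(mu, m = 40, m_inform = 10) {
  center1 <- c(rep(-mu / 2, m_inform), rep(0, m - m_inform))
  center2 <- c(rep(mu / 2, m_inform), rep(0, m - m_inform))
  sqrt(sum((center1 - center2)^2))
}

mu <- 3
# Compare against the closed form used for the x-axis of the plot
all.equal(center_distance(mu), sqrt(10) * mu)
```

This is the distance between the population means, not between the sample means of any one simulated data set, which will differ slightly because of noise.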

Aug 25, 2016

I had previously lowered the perplexity until I no longer got an error, but then Michael Schubert asked what happens if I use higher values:

I tried again, using default value (perplexity=30) for n=100:

And using perplexity=16 (highest without error) for n=50:

With these values, the recovered distance grows roughly linearly with the true distance on the left side of the plots, up to a breakpoint beyond which t-SNE spreads the two populations farther apart than their actual distance. This is more or less what the method advertises.
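
The ceiling of 16 for n=50 is consistent with Rtsne rejecting a perplexity once 3 * perplexity exceeds n - 1. Assuming that rule holds (it matches the values reported here, but it is worth checking against the installed Rtsne version), a hypothetical helper `max_perplexity` gives the largest integer perplexity for a given number of points:

```r
# Hypothetical helper, assuming Rtsne requires 3 * perplexity <= n - 1.
# This rule is inferred from the error behaviour described in this post.
max_perplexity <- function(n) {
  floor((n - 1) / 3)
}

max_perplexity(50)   # 16
max_perplexity(100)  # 33
```

Under that assumption the default perplexity of 30 is within the limit for n=100 but not for n=50, which is why the smaller data set needed a lower value.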

