@mikelove mikelove/tsne_snapping.R
Last active Aug 25, 2016

looking to see if t-SNE replicates linear separation of groups
library(Rtsne)
library(ggplot2)

n <- 50          # total number of points (two groups of n/2)
m <- 40          # total number of dimensions
m_inform <- 10   # dimensions in which the two group means differ
niter <- 200     # number of simulated datasets, one per value of mu
intradist <- numeric(niter)
interdist <- numeric(niter)
mus <- seq(from=0, to=3, length.out=niter)
cols <- rep(1:2, each=n/2)
for (i in seq_len(niter)) {
  mu <- mus[i]
  # group means are -mu/2 and +mu/2 in the informative dimensions;
  # the remaining m - m_inform dimensions are pure noise
  x <- cbind(rbind(matrix(rnorm(n/2 * m_inform, -mu/2), ncol=m_inform),
                   matrix(rnorm(n/2 * m_inform, mu/2), ncol=m_inform)),
             matrix(rnorm(n * (m - m_inform)), nrow=n))
  # see comment below on raising perplexity to 16 for n=50 and 30 for n=100
  res <- Rtsne(x, perplexity=10)
  #plot(res$Y, col=cols, pch=20, xlab="", ylab="")
  mid1 <- colMeans(res$Y[cols==1,])
  mid2 <- colMeans(res$Y[cols==2,])
  # mean distance of each point to its own group centroid in the embedding
  intradist[i] <- mean(c(sqrt(colSums((t(res$Y[cols==1,]) - mid1)^2)),
                         sqrt(colSums((t(res$Y[cols==2,]) - mid2)^2))))
  # distance between the two group centroids in the embedding
  interdist[i] <- sqrt(sum((mid1 - mid2)^2))
}
# make plot
dat <- data.frame(mu=sqrt(m_inform)*rep(mus,2),
                  dist=c(intradist, interdist),
                  type=rep(c("intra","inter"), each=niter))
ggplot(dat, aes(x=mu,y=dist,col=type)) + geom_point() + geom_smooth() +
  xlab("distance between sub-population centers") +
  ylab("distance recovered by t-SNE") + ggtitle(paste(n,"points"))
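
The x-axis of the plot uses sqrt(m_inform)*mu as the distance between sub-population centers. A minimal sketch of why that scaling is right, assuming the centers sit at -mu/2 and +mu/2 in each informative dimension as in the simulation above. The helper `center_dist` is hypothetical and not part of the original script:

```r
# Hypothetical helper (not in the original script): Euclidean distance
# between the two population means used in the simulation.
center_dist <- function(mu, m_inform) {
  c1 <- rep(-mu/2, m_inform)  # center of group 1 in the informative dimensions
  c2 <- rep( mu/2, m_inform)  # center of group 2 in the informative dimensions
  sqrt(sum((c1 - c2)^2))
}

m_inform <- 10
mus <- seq(from=0, to=3, length.out=5)
d <- sapply(mus, center_dist, m_inform=m_inform)

# compare against the closed form used for the plot's x-axis
all.equal(d, sqrt(m_inform) * mus)
```

The noise dimensions do not enter here because both groups have mean zero in them.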

mikelove (owner, author) commented Aug 25, 2016

I had previously lowered perplexity until I no longer got an error, but then Michael Schubert asked what would happen if I used higher values.

I tried again, using the default value (perplexity=30) for n=100, and using perplexity=16 (the highest value that runs without error) for n=50.
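
A rough way to find the highest usable perplexity without trial and error, assuming Rtsne rejects inputs unless 3 * perplexity <= n - 1. That rule is my reading of its input check and may differ between package versions. The helper `max_perplexity` is hypothetical:

```r
# Hypothetical helper: largest integer perplexity accepted for n points,
# ASSUMING Rtsne requires 3 * perplexity <= n - 1.
max_perplexity <- function(n) floor((n - 1) / 3)

max_perplexity(50)   # perplexity ceiling for the n=50 run
max_perplexity(100)  # perplexity ceiling for the n=100 run
```

Under that assumption the n=50 ceiling matches the value of 16 found by trial, and the default of 30 fits under the ceiling for n=100.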

With these values, the plots look more linear on the left side, up to a breakpoint beyond which the two populations are spread further apart than their actual distance. This is more or less what the method advertises.
