@tdunning
Last active August 29, 2015 14:25
Extremely ordered data appears random when you look at it from a random direction

When you look at extremely ordered data from a random direction, you may well not see any order at all.

As an example, let's take 10,000 samples placed at random on the corners of a 100-dimensional hypercube. In R, this is amazingly easy to do:

x = matrix(runif(1000000)>0.5, ncol=100) - 0.5

This matrix originally contains boolean values, but these values turn into 0s and 1s as soon as we do any math. Subtracting 0.5 centers the hypercube on the origin rather than leaving all the values in the first quadrant (well, the positive 100-dimensional orthant).
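As a quick sanity check on that construction, the sketch below rebuilds the matrix and confirms that every row really is a corner of the centered hypercube. The seed is an assumption added only so the run is repeatable; it is not part of the original recipe.

```r
set.seed(1)  # assumed seed, only for repeatability

# 10,000 random corners of the centered 100-dimensional hypercube
x = matrix(runif(1000000) > 0.5, ncol=100) - 0.5

dim(x)                         # 10000 100
all(abs(x) == 0.5)             # TRUE: every coordinate is -0.5 or +0.5
unique(sqrt(rowSums(x^2)))     # 5: each corner is sqrt(100 * 0.25) from the origin
```

Every sample sits at exactly the same distance from the origin, which is about as ordered as a point cloud can get.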

We can also pick a random projection of this data into 2 dimensions:

projection = matrix(rnorm(200), ncol=2)
projection = qr.Q(qr(projection))

This works by drawing random values from a normal distribution and then using a QR decomposition to orthogonalize and normalize the two resulting vectors. The result is a uniformly chosen random projection with no bias in terms of direction or rotation.
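To see that the QR step leaves two orthonormal directions, one can check that the cross-product of the projection with itself is the 2x2 identity. This is only a sketch; the seed is an assumption for repeatability.

```r
set.seed(1)  # assumed seed, only for repeatability

projection = matrix(rnorm(200), ncol=2)
projection = qr.Q(qr(projection))

dim(projection)                    # 100 2
round(crossprod(projection), 10)   # 2x2 identity, up to rounding
```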

Once we have these, we can project the data and plot it. Projecting is just a matter of multiplying the data by our projection matrix. Plotting is done with semi-transparent disks so that we get a good feel for the density of the distribution.

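> color = rgb(0, 0, 0, alpha=0.1)  # assumed value: color is not defined in the gist; any semi-transparent color will do
> dim(x %*% projection)
[1] 10000     2
> plot(x %*% projection, cex=0.7, pch=21, bg=color, col=color)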

You can see what this looks like in the image attached to this gist called fig-1-random-projection.png.
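One way to understand why the structure disappears: each projected coordinate is a weighted sum of 100 independent values of ±0.5, and the squared weights sum to 1, so each coordinate has variance 0.25 and, by the central limit theorem, looks roughly normal. The sketch below checks the spread and the lack of correlation numerically. The seed and the name `y` are assumptions introduced for illustration.

```r
set.seed(1)  # assumed seed, only for repeatability

x = matrix(runif(1000000) > 0.5, ncol=100) - 0.5
projection = qr.Q(qr(matrix(rnorm(200), ncol=2)))

y = x %*% projection     # y is a hypothetical name for the projected data
apply(y, 2, sd)          # both values should be close to 0.5
cor(y[,1], y[,2])        # should be close to 0
```

A round, featureless blob with standard deviation near 0.5 in every direction is what a two-dimensional Gaussian would give, which is why no trace of the hypercube is visible.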

On the other hand, if you project the data using a projection aligned with the axes of the original data, or with just a few of those axes, you get a picture that is much more organized. Here we pick a projection that involves only a few axes and then rotate it for interest:

projection.2 = qr.Q(qr(matrix(runif(2*100)<0.05, ncol=2))) %*% qr.Q(qr(matrix(rnorm(4), ncol=2)))

Then we can plot the data again with this new projection. I add a tiny bit of noise to the data so we can see that multiple data points get projected to the same place:

small.noise = matrix(rnorm(10000 * 100, 0, 2e-2), ncol=100)
plot((x + small.noise) %*% projection.2, cex=0.7, pch=21, bg=color, col=color)

You can see the result in the image called fig-2-aligned-projection.png.
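The organization in that picture can be quantified. If the projection touches only k of the 100 axes, the projected points can land on at most 2^k distinct positions, however many samples there are. The sketch below counts them; the seed and the names `mask`, `k`, and `n.distinct` are assumptions introduced for illustration, and the mask is converted to numeric explicitly before the QR step.

```r
set.seed(1)  # assumed seed, only for repeatability

x = matrix(runif(1000000) > 0.5, ncol=100) - 0.5

# Sparse 0/1 matrix: each axis takes part in a column with probability 0.05
mask = matrix(runif(2*100) < 0.05, ncol=2)
projection.2 = qr.Q(qr(mask + 0)) %*% qr.Q(qr(matrix(rnorm(4), ncol=2)))

# Axes that actually contribute to the projection (rows that are not numerically zero)
k = sum(rowSums(abs(projection.2)) > 1e-12)

# Distinct landing spots, rounded to absorb floating-point fuzz
n.distinct = nrow(unique(round(x %*% projection.2, 8)))

c(k=k, n.distinct=n.distinct, upper.bound=2^k)
```

With only a handful of participating axes, 10,000 samples pile up on a small lattice of positions, which is exactly the regularity the random projection hides.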
