tdunning/fig-1-random-projection.png

## fig-1-random-projection.png

      
## fig-2-aligned-projection.png

      
## hypercube-random-projection.md

      
When you look at extremely ordered data from a random direction, you may well not see any order at all.

As an example, let's take 10,000 samples placed at random on the corners of a 100-dimensional hypercube. In R, this is amazingly easy to do:

```r
# 1,000,000 fair coin flips arranged as 10,000 rows of 100 coordinates, each -0.5 or +0.5
x = matrix(runif(1000000)>0.5, ncol=100) - 0.5
```

This matrix starts out holding logical (boolean) values, but R turns them into 0's and 1's as soon as we do any arithmetic. By subtracting 0.5, I center the hypercube on the origin, so every coordinate is either -0.5 or +0.5, rather than having all the values in the first quadrant (well, the positive 100-dimensional orthant).

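
A quick check confirms that the subtraction leaves a numeric matrix whose entries are all -0.5 or +0.5, with column means close to zero. This is a sketch in base R; the seed is an arbitrary choice added only so the output is reproducible.

```r
set.seed(1)  # arbitrary seed, added only so the output is reproducible
x = matrix(runif(1000000) > 0.5, ncol = 100) - 0.5

# Subtracting 0.5 coerced the logical matrix to doubles: 10,000 rows, 100 columns
stopifnot(is.double(x), nrow(x) == 10000, ncol(x) == 100)

# Every coordinate is one of the two hypercube values
print(table(x))

# Each column mean should be near 0; its standard error is 0.5 / sqrt(10000) = 0.005
print(summary(colMeans(x)))
```
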
We can also pick a random projection of this data into 2 dimensions:

```r
projection = matrix(rnorm(200), ncol=2)   # two Gaussian vectors in 100 dimensions
projection = qr.Q(qr(projection))         # orthonormalize them via a QR decomposition
```

This works by picking random values from a normal distribution and then normalizing and orthogonalizing the resulting vectors. Because the multivariate normal distribution is rotationally symmetric, the result is a uniformly chosen random projection with no bias toward any particular direction.

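
One way to confirm that `qr.Q` really produced an orthonormal projection is to check that `crossprod(projection)`, i.e. `t(projection) %*% projection`, is the 2x2 identity matrix. A minimal sketch, again with an arbitrary seed chosen only for reproducibility:

```r
set.seed(2)  # arbitrary seed for reproducibility
projection = matrix(rnorm(200), ncol = 2)   # two Gaussian vectors in 100 dimensions
projection = qr.Q(qr(projection))           # orthonormal basis for the plane they span

# Unit-length, mutually orthogonal columns give t(P) %*% P == I (up to rounding)
print(round(crossprod(projection), 12))
stopifnot(isTRUE(all.equal(crossprod(projection), diag(2))))
```
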
Once we have these, we can project the data and plot it. Projecting is just a matrix multiplication by `projection`, and plotting uses semi-transparent disks so that we get a good feel for the density of the distribution.
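
```r
> # assumed: the original does not show how color was set; any semi-transparent color works
> color = rgb(0, 0, 0, alpha=0.1)
> dim(x %*% projection)
[1] 10000     2
> plot(x %*% projection, cex=0.7, pch=21, bg=color, col=color)
```
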
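You can see the result in fig-1-random-projection.png, attached to this gist. Each projected coordinate is a weighted sum of 100 independent ±0.5 values, so by the central limit theorem the cloud looks like a roughly Gaussian blob that shows none of the hypercube's structure.
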
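
The central-limit argument can be checked numerically. In this sketch (arbitrary seed; `p` is just an illustrative name for the projected points), each projected coordinate should have variance close to 0.5² = 0.25, the two coordinates should be nearly uncorrelated, and all 10,000 samples should land at distinct points, which is why no structure shows up:

```r
set.seed(3)  # arbitrary seed for reproducibility
x = matrix(runif(1000000) > 0.5, ncol = 100) - 0.5
projection = qr.Q(qr(matrix(rnorm(200), ncol = 2)))
p = x %*% projection

# Var(sum_i w_i x_i) = 0.25 * sum_i w_i^2 = 0.25 for a unit-length weight vector w
print(apply(p, 2, var))

# Orthogonal weight vectors give (nearly) uncorrelated coordinates
print(cor(p[, 1], p[, 2]))

# Distinct corners almost surely project to distinct points
print(nrow(unique(p)))
```
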
On the other hand, if you project the data using a projection aligned with the original axes, or with just a few of them, you get a much more organized picture. Here we pick a projection that involves only a few of the original directions and then rotate it for interest:

```r
# ~5% of each 0/1 indicator column is 1, so the plane uses few axes; the 2x2 factor rotates it
projection.2 = qr.Q(qr(matrix(runif(2*100)<0.05, ncol=2))) %*% qr.Q(qr(matrix(rnorm(4), ncol=2)))
```

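
To see how few of the original axes `projection.2` actually uses, we can count its nonzero rows; with each indicator column about 5% ones, about 10 axes are expected. In this sketch the seed and the `1e-12` threshold (which absorbs rounding residue from the QR step) are arbitrary choices, and `used.axes` is a hypothetical helper name:

```r
set.seed(4)  # arbitrary seed for reproducibility
projection.2 = qr.Q(qr(matrix(runif(2*100) < 0.05, ncol = 2))) %*%
  qr.Q(qr(matrix(rnorm(4), ncol = 2)))

# Rows with nonzero weight are the original axes that influence the projection;
# 1e-12 absorbs the tiny rounding residue that QR can leave in zero rows
used.axes = which(rowSums(abs(projection.2)) > 1e-12)
print(used.axes)
cat("axes used:", length(used.axes), "of", nrow(projection.2), "\n")

# It is still an orthonormal 2-D projection
stopifnot(isTRUE(all.equal(crossprod(projection.2), diag(2))))
```
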
Then we can plot the data again with this new projection. I add a tiny bit of noise to the data so that we can see where multiple data points get projected to the same place:

```r
small.noise = matrix(rnorm(10000 * 100, 0, 2e-2), ncol=100)
plot((x + small.noise) %*% projection.2, cex=0.7, pch=21, bg=color, col=color)
```

The result is shown in fig-2-aligned-projection.png.
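
Without the noise, the clumping is easy to count. A point's position depends only on its coordinates along the axes that `projection.2` uses, so k such axes allow at most 2^k distinct locations; and because the first factor is built from 0/1 indicator columns, many axes share exactly the same weight, so different corners often land on the same spot and the actual count is usually far below that bound. A sketch (arbitrary seed; `k`, `p`, and `n.locations` are illustrative names; rounding to 8 decimals merges points that differ only by floating-point error):

```r
set.seed(5)  # arbitrary seed for reproducibility
x = matrix(runif(1000000) > 0.5, ncol = 100) - 0.5
projection.2 = qr.Q(qr(matrix(runif(2*100) < 0.05, ncol = 2))) %*%
  qr.Q(qr(matrix(rnorm(4), ncol = 2)))

# Number of original axes with nonzero weight in the projection
k = sum(rowSums(abs(projection.2)) > 1e-12)

# Project without noise; rounding merges points that differ only by rounding error
p = round(x %*% projection.2, 8)
n.locations = nrow(unique(p))

cat("axes used:", k, " distinct locations:", n.locations,
    " upper bound 2^k:", 2^k, "\n")
```
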