{{ message }}

Instantly share code, notes, and snippets.

# CerebralMastication/pca example.R

Created Sep 16, 2010
 set.seed(2) x <- 1:100 y <- 20 + 3 * x e <- rnorm(100, 0, 60) y <- 20 + 3 * x + e plot(x,y) yx.lm <- lm(y ~ x) lines(x, predict(yx.lm), col="red") xy.lm <- lm(x ~ y) lines(predict(xy.lm), y, col="blue") # so lm() depends on which variable is x and wich is y # lm minimizes y distance (the error term is y-yhat) #normalize means and cbind together xyNorm <- cbind(x=x-mean(x), y=y-mean(y)) plot(xyNorm) #covariance xyCov <- cov(xyNorm) eigenValues <- eigen(xyCov)\$values eigenVectors <- eigen(xyCov)\$vectors eigenValues eigenVectors plot(xyNorm, ylim=c(-200,200), xlim=c(-200,200)) lines(xyNorm[x], eigenVectors[2,1]/eigenVectors[1,1] * xyNorm[x]) lines(xyNorm[x], eigenVectors[2,2]/eigenVectors[1,2] * xyNorm[x]) # the largest eigenValue is the first one # so that's our principal component. # but the principal component is in normalized terms (mean=0) # and we want it back in real terms like our starting data # so let's denormalize it plot(x,y) lines(x, (eigenVectors[2,1]/eigenVectors[1,1] * xyNorm[x]) + mean(y)) # that looks right. line through the middle as expected # what if we bring back our other two regressions? lines(x, predict(yx.lm), col="red") lines(predict(xy.lm), y, col="blue")

### rubgb commented Sep 17, 2010

 Hi there, this is a great post. I have always wondered about the intuition behind PCA... I'm curious about two of your graphs posted in the blog illustrating the difference between errors in yx and xy. Would it be possible to have access to the R code to generate them? Many thanks in advance, Ruben

### CerebralMastication commented Sep 17, 2010

 It's in the first 13 lines. Plot the lines() one at a time to recreate.

### rubgb commented Sep 17, 2010

 I meant the code for generating: OLS1.png OLS2.png pca.png Many thanks in advance, Ruben

### CerebralMastication commented Sep 17, 2010

 Ahh. I just had a revelation in what you were asking. The yellow lines in all 3 of those were drawn by hand. I did the base plots (which I think you can see from the code) and then I drew 2 yellow lines on each one for illustration. Does that answer your question? Can you pick out of the code where the base plots are? plot(x,y) yx.lm <- lm(y ~ x) lines(x, predict(yx.lm), col="red") # <- that's OLS1 plot(x,y) xy.lm <- lm(x ~ y) lines(predict(xy.lm), y, col="blue") # <- that's OLS2

### rubgb commented Sep 17, 2010

 Nice trick ha,ha. Very illustrative! Now I'm curious. Would it be possible to do something like that with R?

### CerebralMastication commented Sep 17, 2010

 Let's ask the hive mind: http://stackoverflow.com/questions/3737165/drop-lines-from-actual-to-modeled-points-in-r -JD

### CerebralMastication commented Sep 17, 2010

 And Josh Ulrich provided an answer in under 20 minutes. Hive mind FTW! http://stackoverflow.com/questions/3737165/drop-lines-from-actual-to-modeled-points-in-r/3737183#3737183 -JD

### rubgb commented Sep 18, 2010

 Thanks a lot!

### SwampThingPaul commented Jan 3, 2019

 I really enjoyed the blog post and thanks for sharing the code. I have been trying to figure out how to generate the orange (drop) lines on this plot(and below). I tried the segments() function as suggested by Josh Ulrich's posts...but keep going in circles. Any help is appreciated. 