
@plpxsk
Last active August 29, 2015 13:57
very quick summary of PCA
## I don't know if all of this will make sense,
## but if you can't wait to get started with PCA, take a look at the code below.
## The bare R code comes first; the annotated version follows it.
dataset.PCAcor <- princomp(dataset, cor = TRUE)
summary(dataset.PCAcor)
loadings(dataset.PCAcor)
biplot(dataset.PCAcor)
biplot(dataset.PCAcor,col=c("azure4","black"), cex=c(0.8,1), expand=0.9)
title("Biplot based on correlation matrix")
## I'd look at bivariate correlations first (i.e., pairwise correlations between variables)
## your data frame should contain only the variables you want to summarize, one per column
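## A minimal sketch of that first look, assuming the built-in mtcars data as a
## stand-in for your own data (the name ex_data is hypothetical). It keeps only
## the columns to summarize, then uses cor() and pairs() for the pairwise view.

```r
## hypothetical stand-in for your dataset: four mtcars columns in different units
ex_data <- mtcars[, c("mpg", "hp", "wt", "qsec")]

## pairwise (bivariate) correlations, rounded for reading
round(cor(ex_data), 2)

## scatterplot matrix: one panel per pair of variables
pairs(ex_data)
```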
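## to run the PCA:
## if the variables are already standardized (or all share the same units),
## then cor = FALSE (PCA on the covariance matrix) is fine
## if the variables have different units (like lbs, miles, hrs, etc.),
## then use cor = TRUE (PCA on the correlation matrix, i.e., on standardized variables)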
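## A sketch of why the units matter, again assuming mtcars as a hypothetical
## stand-in. cor = TRUE on the raw data should give the same proportions of
## variance as cor = FALSE on data standardized with scale(), while cor = FALSE
## on the raw data lets the variable with the largest variance (hp) dominate.

```r
## hypothetical example data: columns measured in different units
ex_data <- mtcars[, c("mpg", "hp", "wt", "qsec")]

## proportion of variance explained by each component of a princomp fit
prop <- function(pc) pc$sdev^2 / sum(pc$sdev^2)

pc_cor <- princomp(ex_data, cor = TRUE)            # correlation matrix
pc_std <- princomp(scale(ex_data), cor = FALSE)    # standardize first, then covariance
pc_raw <- princomp(ex_data, cor = FALSE)           # raw covariance: units matter

round(prop(pc_cor), 3)   # same as the next line
round(prop(pc_std), 3)
round(prop(pc_raw), 3)   # first component is essentially just hp
```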
dataset.PCAcor <- princomp(dataset, cor = TRUE)
summary(dataset.PCAcor)
loadings(dataset.PCAcor)
### summary() shows the "Proportion of Variance": the share of the total variance explained by each principal component
### loadings() shows how much each variable contributes to each component (each component is a weighted combination of the variables; when printed, loadings smaller than 0.1 in absolute value are left blank by default)
### the maximum number of components equals the number of variables
### you want a small number of components to explain a large share of the variance (look at the "Cumulative Proportion" row)
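## A sketch of reading those numbers directly, assuming mtcars as a hypothetical
## stand-in. The proportions are the squared sdev values divided by their sum;
## the 80% cutoff below is an arbitrary assumption, not a rule.

```r
ex_data <- mtcars[, c("mpg", "hp", "wt", "qsec")]   # hypothetical example data
pc <- princomp(ex_data, cor = TRUE)

## the "Proportion of Variance" and "Cumulative Proportion" rows of summary(pc)
prop_var <- pc$sdev^2 / sum(pc$sdev^2)
cum_var  <- cumsum(prop_var)
round(rbind(prop_var, cum_var), 3)

## at most one component per variable
length(pc$sdev)   # 4 here

## smallest number of components whose cumulative share reaches 80% (assumed cutoff)
n_keep <- which(cum_var >= 0.80)[1]
n_keep

## full loadings matrix, including the small values print() would blank out
round(unclass(loadings(pc)), 2)

## scree plot: variance of each component, to eyeball where it levels off
screeplot(pc, type = "lines")
```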
# biplot: a 2-D projection of the observations onto the first two components, with the variables drawn as arrows
biplot(dataset.PCAcor)
# biplot(dataset.PCAcor,col=c("azure4","black"), cex=c(0.8,1), expand=0.9)
title("Biplot based on correlation matrix")
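## A sketch of the pieces behind the biplot, assuming mtcars as a hypothetical
## stand-in. pc$scores holds each observation's coordinates on every component;
## the choices argument of biplot() picks which two components to draw.

```r
ex_data <- mtcars[, c("mpg", "hp", "wt", "qsec")]   # hypothetical example data
pc <- princomp(ex_data, cor = TRUE)

## scores: one row per observation, one column per component
head(round(pc$scores, 2))

## observations only, on the first two components, labelled by row name
plot(pc$scores[, 1:2], xlab = "Comp.1", ylab = "Comp.2")
text(pc$scores[, 1:2], labels = rownames(ex_data), cex = 0.6, pos = 3)

## biplot of components 1 and 3 instead of the default 1 and 2
biplot(pc, choices = c(1, 3))
```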