Created
March 28, 2023 19:59
-
-
Save bvancil/85ef5d28ce7cd23519462158219918cb to your computer and use it in GitHub Desktop.
Compositional PCA
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: "Compositional PCA" | |
--- | |
## Problem | |
From [https://sciencemastodon.com/\@Alex_Koiter/109909545205270622](https://sciencemastodon.com/@Alex_Koiter/109909545205270622), "Question for the [#soil](https://fosstodon.org/tags/soil) and [#stats](https://fosstodon.org/tags/stats) peeps. Can you include the percent sand, silt, and clay in to a PCA? The reason I am asking is that the three variables always sum to 100% My lurking on stack exchange etc seems to suggest that it is ok, but some how it doesn't feel right." | |
Let's check this for three variables. We don't have a proof but suspect different eigenvalues to $X^TX$ . Let's try to shoot for a negative case. | |
## Trying for a counterexample | |
First, we'll create some sample data. | |
```{r} | |
library(withr) | |
get_sample_data <- function(n = 100L) { | |
withr::with_seed(20230320L, { | |
x1 <- runif(n, 0, 1) | |
x2 <- runif(n, 0, 1-x1) | |
x3 <- 1 - x1 - x2 | |
X <- cbind(x1, x2, x3) | |
X | |
}) | |
} | |
X <- get_sample_data() | |
``` | |
Next let's do the PCA with all three variables. | |
```{r} | |
pca_X <- princomp(X, cor = TRUE) | |
print(summary(pca_X)) | |
print(loadings(pca_X)) | |
biplot(pca_X) | |
``` | |
We compare this to the PCA with two variables. | |
```{r} | |
pca_X12 <- princomp(X[,1:2], cor = TRUE) | |
print(summary(pca_X12)) | |
print(loadings(pca_X12)) | |
biplot(pca_X12) | |
``` | |
Are the PCA-coded loadings the same? | |
```{r} | |
X_prime <- predict(pca_X, X) | |
X12_prime <- predict(pca_X12, X[,1:2]) | |
head(X_prime) | |
head(X12_prime) | |
cor(t(X_prime[,1:2])) - cor(t(X12_prime)) | |
``` | |
## Compositional approach | |
```{r} | |
library(compositions) | |
``` | |
```{r} | |
X_comp <- acomp(X) | |
pca_X_comp <- princomp(X_comp) | |
summary(pca_X_comp) | |
plot(pca_X_comp, type="biplot") | |
plot(X_comp) | |
``` |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment