Created
April 13, 2021 12:47
bivariate binary correlation
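# inverse logit: maps a real number to (0, 1)
invlogit <- function(x) {
  exp(x) / (1 + exp(x))
}

# build a valid 2x2 correlation matrix from a lower-triangular base
corbase <- matrix(c(1, 2, 0, 1), 2, 2)
mcor <- cov2cor(corbase %*% t(corbase))
print(mcor)

# lower Cholesky factor, used to correlate independent normal draws
corchol <- t(chol(mcor))
r <- matrix(rnorm(2000000), ncol = 2)  # 1e6 rows of independent N(0, 1) pairs
cdat <- t(corchol %*% t(r))            # correlated Gaussian data

cor(cdat)  # Pearson correlation in the Gaussian data

# turn continuous into binary data: rounding invlogit(x) gives 1 when x > 0, else 0
bdat <- matrix(round(invlogit(cdat), 0), ncol = 2)
cor(bdat)  # Pearson correlation in the binarised data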
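The size of the drop from `cor(cdat)` to `cor(bdat)` can be worked out rather than only simulated. The following is a minimal sketch, not part of the original gist: it relies on the standard result that for a bivariate normal with correlation rho, split at zero on both variables, the Pearson (phi) correlation of the resulting 0/1 variables is (2/pi) * asin(rho). The names `rho`, `phi_expected`, `phi_sim`, the seed and the sample size are illustrative assumptions.

```r
# Latent correlation implied by the gist's corbase matrix: 2 / sqrt(5)
corbase <- matrix(c(1, 2, 0, 1), 2, 2)
mcor <- cov2cor(corbase %*% t(corbase))
rho <- mcor[1, 2]

# Expected Pearson (phi) correlation after splitting both variables at 0
phi_expected <- 2 / pi * asin(rho)   # approx 0.705

# Simulation check (seed and n are arbitrary choices, not from the gist)
set.seed(1)
n <- 2e5
z <- matrix(rnorm(2 * n), ncol = 2) %*% chol(mcor)  # correlated Gaussian data
b <- (z > 0) * 1                                    # same split as round(invlogit(z))
phi_sim <- cor(b)[1, 2]

print(c(rho = rho, phi_expected = phi_expected, phi_sim = phi_sim))
```

So with a latent correlation of about 0.894, the binarised data should show a Pearson correlation of roughly 0.705, which is what the simulation in the gist illustrates.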
Well, that may be, and I'd be happy to see such a case. Your example on Stack Overflow also gave the correct result, though: the variables are unrelated, even though they may have similar proportions.
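The point about unrelated variables with similar proportions can be checked with a small simulation. This is a rough sketch under stated assumptions, not the Stack Overflow example referred to above: two independent 0/1 variables are drawn with the same hypothetical proportion of 0.3, and the seed and sample size are arbitrary. It also shows that the Pearson correlation of 0/1 data is the same number as the phi coefficient computed from the 2x2 table.

```r
set.seed(1)               # arbitrary seed for reproducibility
n <- 1e5
x <- rbinom(n, 1, 0.3)    # hypothetical proportion, chosen for illustration
y <- rbinom(n, 1, 0.3)    # drawn independently of x, same proportion

# Pearson correlation applied directly to the 0/1 data
r_xy <- cor(x, y)

# phi coefficient from the 2x2 contingency table (stored as doubles)
tab <- matrix(as.numeric(table(x, y)), 2, 2)
phi <- (tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
  sqrt(prod(rowSums(tab)) * prod(colSums(tab)))

print(c(pearson = r_xy, phi = phi))
```

Both values should be close to zero and equal to each other up to floating-point error: equal proportions alone do not produce a correlation.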
This is one example where both statistics give the same result. That might well be the case. My point, however, is that there are cases where they give rather opposing results, and in these the binary correlation coefficient gives the intuitively right answer, rather than the correlation coefficient.
Do not get me wrong: I often agree that the correlation coefficient can be used for binary data. I have again and again used linear regression for binary outcomes, because there are only a few cases in which the results differ from those of, for example, probit/logit. But for me this does not imply that you should generally use the Pearson correlation coefficient for binary data (nor linear regression for binary outcomes),