@cdriveraus
Created April 13, 2021 12:47
bivariate binary correlation
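# Inverse logit: maps the real line to (0, 1); equivalent to plogis(x)
invlogit <- function(x) {
  exp(x) / (1 + exp(x))
}

# Build a valid correlation matrix from an arbitrary base matrix
corbase <- matrix(c(1, 2, 0, 1), 2, 2)
mcor <- cov2cor(corbase %*% t(corbase))
print(mcor)

# Lower-triangular Cholesky factor, used to induce the correlation
corchol <- t(chol(mcor))
r <- matrix(rnorm(2000000), ncol = 2) # 1e6 independent standard normal pairs
cdat <- t(corchol %*% t(r)) # correlated Gaussian data
cor(cdat) # Pearson correlation in the Gaussian data

# Turn continuous into binary data: invlogit(x) > 0.5 exactly when x > 0
bdat <- matrix(round(invlogit(cdat), 0), ncol = 2)
cor(bdat) # Pearson correlation in the binarised data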
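As a rough check on what the two `cor()` calls should print, here is a sketch of the expected values. It is not part of the original script. It assumes that rounding `invlogit(x)` amounts to thresholding each zero-mean Gaussian column at 0, so each binary variable has proportion 1/2. The symbols $\rho$, $X$, $Y$ and $\phi$ are introduced here for illustration.

```latex
% corbase %*% t(corbase) = [[1, 2], [2, 5]], so the latent Gaussian correlation is
\rho = \frac{2}{\sqrt{1}\,\sqrt{5}} = \frac{2}{\sqrt{5}} \approx 0.894

% Orthant probability for a zero-mean bivariate normal pair (X, Y):
P(X > 0,\, Y > 0) = \frac{1}{4} + \frac{\arcsin \rho}{2\pi}

% Pearson correlation of the two binarised variables (the phi coefficient),
% with both marginal proportions equal to 1/2:
\phi = \frac{P(X > 0,\, Y > 0) - \tfrac{1}{4}}{\tfrac{1}{4}}
     = \frac{2}{\pi} \arcsin \rho \approx 0.705
```

Under these assumptions `cor(cdat)` should come out near 0.894 and `cor(bdat)` near 0.705, up to simulation noise. Binarising weakens the Pearson correlation but keeps its sign and its ordering.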
@AJWarnke

This is one example where both statistics give the same result, and that may well be the case here. My point, however, is that there are cases where they give rather opposing results, and in those cases the binary correlation coefficient gives the intuitively right answer and the Pearson correlation coefficient does not.

Don't get me wrong: I agree that the correlation coefficient can often be used for binary data. I have repeatedly used linear regression for binary outcomes, because there are only a few cases in which the results differ from those of, for example, probit or logit. But for me this does not imply that you should generally use the Pearson correlation coefficient for binary data (nor linear regression for binary outcomes).

@cdriveraus
Author

Well, that may be, and I'd be happy to see such a case. Your example on Stack Overflow also gave the correct result, though: the variables are unrelated, even though they may have similar proportions.