Adrian Olszewski adrianolszewski
adrianolszewski / Olszewski_quartet.md
Created October 16, 2023 20:07
My own version of Anscombe's quartet :-)

Olszewski's vs. Anscombe's quartet

You have probably heard of Anscombe's quartet. It is almost a textbook justification for looking at the data first rather than trusting descriptive statistics alone!

I decided to make my own: Olszewski's quartet! It shows 4 faces in different moods. The mean and variance of the Y coordinate are exactly (NOT approximately!) the same for all 4 faces. Also, Pearson's correlation is almost 0 for each of them.

[Figure: Olszewski's quartet, 4 faces in different moods]
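
One generic way to force several point sets to share the same mean and variance of Y is a linear rescaling of each set. The sketch below is only an illustration of that idea, not necessarily how the quartet was built: `rescale_to()` is a hypothetical helper, and the targets (mean 10, variance 4) and the random inputs are made up. The results agree up to floating-point precision.

```r
# Hypothetical helper: linearly rescale y so that its sample mean and
# sample variance match the requested targets.
rescale_to <- function(y, target_mean, target_var) {
  z <- (y - mean(y)) / sd(y)          # standardise: mean 0, variance 1
  z * sqrt(target_var) + target_mean  # stretch and shift to the targets
}

set.seed(1)
y1 <- rescale_to(rnorm(50), target_mean = 10, target_var = 4)  # symmetric input
y2 <- rescale_to(rexp(50),  target_mean = 10, target_var = 4)  # skewed input

# Very different shapes, identical first two moments
c(mean(y1), mean(y2))
c(var(y1), var(y2))
```

Because the transformation is linear, it changes neither the shape of each point set nor its Pearson correlation with X.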

How did I make it?

adrianolszewski / logistic_regression_testing_hypotheses.md
Last active February 18, 2024 23:38
Logistic regression is often used for testing hypotheses, replacing a variety of common classic tests

Despite the widespread and nonsensical claim that "logistic regression is not a regression", it is one of the key regression and hypothesis-testing tools used in experimental research (such as clinical trials).

Let me show you how logistic regression (with a few extensions) can be used to test hypotheses about fractions (%) of successes, replacing the classic "tests for proportions". Namely, it can replicate the results of:

  1. Wald's (normal approximation) z test for 2 proportions with non-pooled standard errors (common in clinical trials), via LS-means on the prediction scale or the AME (average marginal effect)
  2. Rao's score (normal approximation) z test for 2 proportions with pooled standard errors (just what prop.test() does in R)
  3. the z test for multiple (2+) proportions
  4. an ANOVA-like (joint) test for multiple categorical predictors (n-way ANOVA), or (n-way) ANCOVA if you include numerical covariates
  5. [the **Cochran-Mantel-Haenszel
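
As a quick check of the second item above, the sketch below compares `prop.test()` (without the continuity correction) against Rao's score test for a logistic regression fitted to the same grouped data. The counts are made up for illustration, and the variable names are assumptions rather than anything taken from the original gist.

```r
# Made-up data: successes out of trials in two groups
successes <- c(35, 50)
trials    <- c(100, 100)
group     <- factor(c("A", "B"))

# Classic test for 2 proportions (pooled SE), continuity correction off
classic <- prop.test(successes, trials, correct = FALSE)

# Logistic regression on the grouped counts, group effect tested with Rao's score test
fit   <- glm(cbind(successes, trials - successes) ~ group, family = binomial)
score <- anova(fit, test = "Rao")

# The test statistics and p-values should agree
unname(classic$statistic)   # X-squared from prop.test()
score$Rao[2]                # Rao score statistic for the group term
classic$p.value
score$`Pr(>Chi)`[2]
```

The statistic reported by `prop.test()` is the square of the z statistic, so it is compared directly with the chi-squared form of the score test.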
adrianolszewski / quantreg_mww.r
Last active April 16, 2024 19:22
Mann-Whitney (Wilcoxon) and Kruskal-Wallis FAIL to compare medians in general. Quantile regression should be used to compare medians instead
# Let's make some data to play with
set.seed(1234)
v1 <- rexp(500)
v2 <- rnorm(500) + log(2)
v3 <- -rgamma(500, 2.5, 3)
v4 <- runif(500, -2,4)
# Look at the data
layout(matrix(c(1:4), nrow = 2))