ANOVA (analysis of variance)
- testing whether the means of two or more populations are equal
- must have a continuous response and at least one categorical factor
one-way ANOVA
- one fixed factor
model
source | degrees of freedom | sum of squares | mean ss | f-statistic | p-value |
---|---|---|---|---|---|
group | df_g = k - 1 | ssb (between sample means) | msb = ssb / df_g | f = msb / msw | f in F_{df_g, df_r} distribution |
residuals | df_r = k (n - 1) | ssw (within each sample) | msw = ssw / df_r | | |
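The table entries can be computed by hand for a small example. The following is a minimal sketch using only the Python standard library; the data and the function name `one_way_anova` are made up for illustration, and it assumes k groups with the same number n of observations each, so that df_r = k (n - 1) holds. It stops at the f-statistic and does not compute the p-value.

```python
# Hypothetical data: k = 3 groups, n = 4 observations each (balanced).
groups = {
    "a": [4.0, 5.0, 6.0, 5.0],
    "b": [6.0, 7.0, 8.0, 7.0],
    "c": [9.0, 8.0, 10.0, 9.0],
}


def one_way_anova(samples):
    """Return the one-way ANOVA table entries for equal-size samples."""
    k = len(samples)
    n = len(samples[0])
    assert all(len(s) == n for s in samples), "sketch assumes equal group sizes"

    grand_mean = sum(sum(s) for s in samples) / (k * n)
    means = [sum(s) / n for s in samples]

    # ssb: variation of the sample means around the grand mean.
    ssb = sum(n * (m - grand_mean) ** 2 for m in means)
    # ssw: variation of the observations around their own sample mean.
    ssw = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

    df_g = k - 1
    df_r = k * (n - 1)
    msb = ssb / df_g
    msw = ssw / df_r
    return {
        "df_g": df_g, "df_r": df_r,
        "ssb": ssb, "ssw": ssw,
        "msb": msb, "msw": msw,
        "f": msb / msw,
    }


table = one_way_anova(list(groups.values()))
for name, value in table.items():
    print(f"{name}: {value:.4g}")
```

A large f means the sample means vary much more than the within-sample noise would suggest; the p-value is then read from the F distribution with (df_g, df_r) degrees of freedom.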
assumptions
- equal variance
- normal distribution
hypothesis test
- examine the presence of the treatment effect
- if the p-value is less than the significance level, at least one of the means is different.
- to find out which means differ, follow up with Tukey's test
coefficients
Tukey’s pair-wise comparison
- check if any interval does not contain zero; a pair whose interval excludes zero has significantly different means
two-way ANOVA
- two factors on a response; the design is balanced when every combination of factor levels has the same number of observations
model y_ijk (observation) = mu (grand mean) + alpha_i (treatment A effect) + beta_j (treatment B effect) + gamma_ij (interaction effect) + epsilon_ijk (residual)
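The terms of this model can be estimated from cell means. The following is a minimal sketch using only the Python standard library; the data and the function name `two_way_effects` are hypothetical, and it assumes a balanced design (same number of replicates in every cell), which is what lets the grand mean be taken as the mean of the cell means.

```python
# Hypothetical balanced data: y[i][j] holds the replicates observed at
# level i of factor A and level j of factor B.
y = [
    [[10.0, 12.0], [14.0, 16.0]],
    [[20.0, 22.0], [30.0, 32.0]],
]


def mean(values):
    return sum(values) / len(values)


def two_way_effects(y):
    """Estimate mu, alpha_i, beta_j and gamma_ij for a balanced design."""
    a, b = len(y), len(y[0])
    cell = [[mean(y[i][j]) for j in range(b)] for i in range(a)]

    mu = mean([cell[i][j] for i in range(a) for j in range(b)])
    alpha = [mean(cell[i]) - mu for i in range(a)]
    beta = [mean([cell[i][j] for i in range(a)]) - mu for j in range(b)]
    # Interaction: what is left of the cell mean after the additive part.
    gamma = [
        [cell[i][j] - mu - alpha[i] - beta[j] for j in range(b)]
        for i in range(a)
    ]
    return mu, alpha, beta, gamma


mu, alpha, beta, gamma = two_way_effects(y)
print("mu:", mu)
print("alpha:", alpha)
print("beta:", beta)
print("gamma:", gamma)

# epsilon_ijk is the observation minus all fitted terms.
epsilon_000 = y[0][0][0] - (mu + alpha[0] + beta[0] + gamma[0][0])
print("epsilon_000:", epsilon_000)
```

If all gamma_ij were zero, the cell means would be fully explained by the additive part mu + alpha_i + beta_j.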
hypothesis test
- test in reverse order of the model terms: interaction first, then main effects
- examine the presence of interaction effect
  - H_0: all the gammas are zero
- examine the presence of main effects (either alpha or beta)
  - H_0: all the betas are zero
  - H_0: all the alphas are zero
Kruskal-Wallis test and Dunn test
Linear Regression
simple linear model: y_i = b_0 (intercept) + b_1 (slope) * x_i + epsilon_i (random error)
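A minimal sketch of fitting this model by ordinary least squares, using only the Python standard library. The data and the function name `fit_simple_linear` are hypothetical; the notes do not say how b_0 and b_1 are estimated, so ordinary least squares is an assumption here.

```python
# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_simple_linear(x, y):
    """Return (b_0, b_1) minimising the sum of squared errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = fit_simple_linear(x, y)
print(f"y = {b0:.2f} + {b1:.2f} * x")
```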
assumptions
- validity: the data should be appropriate for your research question
- additivity and linearity (residuals-fitted)
- independence of errors: each error should be independent of every other error
- equal variance of errors: if the variances are not equal, use weighted least squares (residuals-fitted, scale-location)
- normality of errors: errors are normally distributed (QQ; residuals-leverage to examine outliers)
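The quantities behind the residuals-fitted and residuals-leverage checks named above can be computed directly. The following is a minimal sketch using only the Python standard library; the data and the function name `diagnostics` are hypothetical, the fit is assumed to be ordinary least squares, and the leverage formula is the one for a simple linear model with a single predictor.

```python
# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def diagnostics(x, y):
    """Return fitted values, residuals and leverages for a simple linear fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar

    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    # Leverage grows with the distance of x_i from the mean of x.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return fitted, residuals, leverage


fitted, residuals, leverage = diagnostics(x, y)
for f, r, h in zip(fitted, residuals, leverage):
    print(f"fitted={f:.2f}  residual={r:+.2f}  leverage={h:.2f}")
```

Plotting `residuals` against `fitted` should show no trend (linearity) and a roughly constant spread (equal variance); points with both a large residual and a high leverage are the ones to examine as outliers.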
diagnostic plots