romanegloo/sta580_final_review.md

STA580 final summary

ANOVA (analysis of variance)

testing whether the means of two or more populations are equal
must have a continuous response and at least one categorical factor

one-way ANOVA

one fixed factor

model
$$y_{ij}\ (\text{observation}) = \mu\ (\text{grand mean}) + \tau_i\ (\text{treatment effect}) + \epsilon_{ij}\ (\text{residual})$$
statistics


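| source | degrees of freedom | sum of squares | mean square | F-statistic | p-value |
|---|---|---|---|---|---|
| group | df_g = k - 1 | ssb (between sample means) | msb = ssb / df_g | f = msb / msw | P(F > f), with F following the F_{df_g, df_r} distribution |
| residuals | df_r = k (n - 1) | ssw (within each sample) | msw = ssw / df_r | | |

here k is the number of groups and n is the number of observations in each group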
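
The entries of the table can be computed by hand. The following is a minimal sketch, assuming a balanced design (k groups of n observations each) and using only the Python standard library. The function name `one_way_anova` and the data are made up for illustration, and the p-value is left out because the standard library has no F distribution.

```python
from statistics import mean

def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a balanced design.

    groups: list of k lists, each holding n observations (hypothetical data).
    """
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")

    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Between-group sum of squares: spread of the sample means around the grand mean.
    ssb = sum(n * (m - grand_mean) ** 2 for m in group_means)
    # Within-group sum of squares: spread of observations around their own group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_g = k - 1
    df_r = k * (n - 1)
    msb = ssb / df_g
    msw = ssw / df_r
    return {"df_g": df_g, "df_r": df_r, "ssb": ssb, "ssw": ssw,
            "msb": msb, "msw": msw, "f": msb / msw}

# Made-up example: three treatments, four observations each.
data = [[1, 2, 3, 4], [2, 3, 4, 5], [5, 6, 7, 8]]
table = one_way_anova(data)
print(table)
```

A large f relative to the F_{df_g, df_r} distribution is evidence of a treatment effect; a statistics package is needed to turn f into a p-value.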


assumptions

equal variance across groups
normally distributed residuals

hypothesis test

examine the presence of the treatment effect
if the p-value is less than the significance level, reject H_0 (all means are equal) and conclude that at least one mean differs from the others
to find out which means differ, follow up with Tukey's test

coefficients
Tukey’s pairwise comparison

a pair of group means differs significantly when the confidence interval for their difference does not contain zero
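
As a sketch of the interval behind this check, assuming a balanced design with n observations per group, and writing q for the upper-alpha critical value of the studentized range distribution (notation introduced here, not taken from the notes):

$$(\bar{y}_i - \bar{y}_j) \pm q_{\alpha;\,k,\,df_r}\sqrt{\frac{msw}{n}}$$

Here msw and df_r are the residual mean square and degrees of freedom from the one-way ANOVA table, and k is the number of groups.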

two-way ANOVA

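two categorical factors acting on one response; the design is balanced (balanced ANOVA) when every combination of factor levels has the same number of observations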

model
$$y_{ijk}\ (\text{observation}) = \mu\ (\text{grand mean}) + \alpha_i\ (\text{treatment A effect}) + \beta_j\ (\text{treatment B effect}) + \gamma_{ij}\ (\text{interaction effect}) + \epsilon_{ijk}\ (\text{residual})$$
hypothesis test

tests are run in reverse order of the model terms: the interaction first, then the main effects

first, examine the presence of an interaction effect
H_0: all the gamma_ij are zero

then, examine the presence of main effects (alpha or beta)
H_0: all the beta_j are zero
H_0: all the alpha_i are zero
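
The F-statistics for these three tests can be sketched as follows, assuming a balanced design with r replicates in every cell and using only the Python standard library. The function name `two_way_anova`, the dictionary keys, and the data are hypothetical; p-values are omitted because the standard library has no F distribution.

```python
from statistics import mean

def two_way_anova(cells):
    """F-statistics for a balanced two-way ANOVA.

    cells[i][j] is the list of r replicate observations for level i of
    factor A and level j of factor B (hypothetical layout).
    """
    a = len(cells)          # levels of factor A
    b = len(cells[0])       # levels of factor B
    r = len(cells[0][0])    # replicates per cell

    grand = mean(y for row in cells for cell in row for y in cell)
    mean_a = [mean(y for cell in row for y in cell) for row in cells]
    mean_b = [mean(y for row in cells for y in row[j]) for j in range(b)]
    mean_ab = [[mean(cell) for cell in row] for row in cells]

    # Sums of squares for main effects, interaction, and residuals.
    ss_a = b * r * sum((m - grand) ** 2 for m in mean_a)
    ss_b = a * r * sum((m - grand) ** 2 for m in mean_b)
    ss_ab = r * sum((mean_ab[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_e = sum((y - mean_ab[i][j]) ** 2
               for i in range(a) for j in range(b) for y in cells[i][j])

    df_a, df_b = a - 1, b - 1
    df_ab, df_e = (a - 1) * (b - 1), a * b * (r - 1)
    ms_e = ss_e / df_e

    # Listed in the order the tests are read: interaction first.
    return {
        "interaction": (ss_ab / df_ab) / ms_e,
        "factor_B": (ss_b / df_b) / ms_e,
        "factor_A": (ss_a / df_a) / ms_e,
    }

# Made-up 2 x 2 example with two replicates per cell.
cells = [[[1, 3], [5, 7]],
         [[2, 4], [10, 12]]]
f_stats = two_way_anova(cells)
print(f_stats)
```

Each statistic is compared against an F distribution with the matching numerator degrees of freedom and df_e in the denominator.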


Kruskal-Wallis test and Dunn test
Linear Regression
simple linear model:
$$y_i = b_0\ (\text{intercept}) + b_1\ (\text{slope}) \cdot x_i + \epsilon_i\ (\text{random error})$$
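
A minimal sketch of how b_0 and b_1 can be estimated, assuming ordinary least squares as the fitting method and using only the Python standard library. The function name `fit_line` and the data are made up for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares estimates (b0, b1) for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx             # slope
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Made-up data that roughly follow y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(x, y)

# Residuals estimate the random errors epsilon_i.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1, residuals)
```

With an intercept in the model, the least squares residuals always sum to zero (up to rounding), which is a quick sanity check on the fit.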
assumptions

validity: the data should be appropriate for answering the research question
additivity and linearity: the response is a linear, additive function of the predictors (check with the residuals vs. fitted plot)
independence of errors: the errors should be independent of one another
equal variance of errors: if the variances are not equal, use weighted least squares (check with the residuals vs. fitted and scale-location plots)
normality of errors: the errors are normally distributed (check with the normal Q-Q plot; use the residuals vs. leverage plot to examine outliers)

diagnostic plots
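
The quantities that the usual diagnostic plots display can be computed directly. The following is a rough sketch for a simple linear model fitted by ordinary least squares, using only the Python standard library. The function name `diagnostics` and the data are hypothetical, and the plotting itself is left out.

```python
from math import sqrt
from statistics import mean

def diagnostics(x, y):
    """Fitted values, residuals, leverages, and standardized residuals
    for a simple linear regression of y on x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar

    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Residual standard error with n - 2 degrees of freedom.
    s = sqrt(sum(e ** 2 for e in resid) / (n - 2))
    # Leverage of each point in simple linear regression.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Standardized residuals, used in Q-Q, scale-location, and leverage plots.
    std_resid = [e / (s * sqrt(1 - h)) for e, h in zip(resid, leverage)]
    return fitted, resid, leverage, std_resid

# Made-up data; the last point lies far from the others in x.
x = [1, 2, 3, 4, 10]
y = [1.2, 1.9, 3.2, 3.8, 10.3]
fitted, resid, leverage, std_resid = diagnostics(x, y)
print(leverage)
```

Plotting `resid` against `fitted` gives the residuals vs. fitted plot, and plotting `std_resid` against `leverage` gives the residuals vs. leverage plot. In this example the point at x = 10 has by far the highest leverage.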