edsko/985de631c0077561359fdbe5536e012e, last active August 5, 2021 17:17
Mean and standard deviation corrected for a baseline
{-------------------------------------------------------------------------------
Mean and standard deviation, corrected for the baseline

By a slight abuse of notation, we'll let X and Y range over two random
variables, as well as over two samples /drawn/ from those random variables.
We'll assume the sample size (for both variables) is N.

* Basic definitions:

> mean X = sum X_i / N                          (mean aka average)
> var X  = sum ((X_i - mean X) ** 2) / (N - 1)  (sample variance)
> std X  = sqrt (var X)                         (standard deviation)

See <https://en.wikipedia.org/wiki/Standard_deviation>
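
The three definitions translate directly into Haskell. This is a minimal sketch over plain lists; the names `meanOf`, `varOf` and `stdOf` are hypothetical (chosen so they do not clash with the `mean` field defined below), and `varOf` divides by N - 1 exactly like `var` above:

```haskell
-- Sample mean: sum X_i / N
meanOf :: [Double] -> Double
meanOf xs = sum xs / fromIntegral (length xs)

-- Sample variance: squared deviations from the mean, divided by N - 1
varOf :: [Double] -> Double
varOf xs = sum [(x - m) ^ (2 :: Int) | x <- xs] / fromIntegral (length xs - 1)
  where
    m = meanOf xs

-- Standard deviation: square root of the sample variance
stdOf :: [Double] -> Double
stdOf = sqrt . varOf
```

For example, `meanOf [2, 4, 4, 4, 5, 5, 7, 9]` is 5.0 and `varOf` of the same list is 32 / 7 (about 4.571); dividing by N instead of N - 1 would give exactly 4.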

* Covariance between two variables:

> cov X Y = sum ((X_i - mean X) * (Y_i - mean Y)) / (N - 1)

For samples, this requires X_i and Y_i to be paired (measured together),
which is why both samples must have the same size N. Notice that

> cov X X = sum ((X_i - mean X) * (X_i - mean X)) / (N - 1)
>         = sum ((X_i - mean X) ** 2) / (N - 1)
>         = var X

See <https://en.wikipedia.org/wiki/Covariance#Covariance_with_itself>
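
A minimal sketch of the covariance in the same style, assuming two paired samples of equal length N (at least 2); `covOf` and its helpers are hypothetical names. Writing `varOf` out independently shows that `covOf xs xs` computes exactly the same sum as `varOf xs`:

```haskell
-- Sample mean: sum X_i / N
meanOf :: [Double] -> Double
meanOf xs = sum xs / fromIntegral (length xs)

-- Sample covariance of two paired samples, divided by N - 1.
-- Assumes equal lengths: zipWith silently drops any extra elements.
covOf :: [Double] -> [Double] -> Double
covOf xs ys =
  sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys) / fromIntegral (length xs - 1)
  where
    mx = meanOf xs
    my = meanOf ys

-- Sample variance written out independently of covOf
varOf :: [Double] -> Double
varOf xs = sum [(x - m) * (x - m) | x <- xs] / fromIntegral (length xs - 1)
  where
    m = meanOf xs
```

For example, `covOf [1, 2, 3] [2, 4, 6]` is 2.0, and `covOf xs xs` agrees with `varOf xs` for any sample `xs` of at least two elements.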

* Combining and scaling variables

For two random variables X and Y, we have

> mean (X + Y) = mean X + mean Y
> var (X + Y)  = var X + var Y + 2 * cov X Y

Moreover, for constants a and b,

> mean (a * X)        = a * mean X
> var (a * X)         = (a ** 2) * var X
> cov (a * X) (b * Y) = (a * b) * cov X Y

See <https://en.wikipedia.org/wiki/Covariance#Covariance_of_linear_combinations>
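
These rules hold exactly for sample statistics as well, because the deviations of X_i + Y_i from mean (X + Y) are just the sums of the deviations of X_i and Y_i. A minimal sketch that checks them numerically on a pair of samples; the helper names and the tolerance used by `approxEq` are assumptions made for illustration. The sum rule checked here is var (X + Y) = var X + var Y + 2 * cov X Y:

```haskell
-- Sample mean
meanOf :: [Double] -> Double
meanOf xs = sum xs / fromIntegral (length xs)

-- Sample covariance (N - 1 denominator) of paired samples of equal length
covOf :: [Double] -> [Double] -> Double
covOf xs ys =
  sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys) / fromIntegral (length xs - 1)
  where
    mx = meanOf xs
    my = meanOf ys

-- Sample variance as the covariance of a sample with itself
varOf :: [Double] -> Double
varOf xs = covOf xs xs

-- Compare two Doubles up to a small (assumed) relative tolerance
approxEq :: Double -> Double -> Bool
approxEq a b = abs (a - b) <= 1e-9 * max 1 (max (abs a) (abs b))

-- Check each mean/variance/covariance rule on concrete samples xs, ys
-- and concrete scale factors a, b
linearityHolds :: [Double] -> [Double] -> Double -> Double -> Bool
linearityHolds xs ys a b = and
  [ approxEq (meanOf (zipWith (+) xs ys)) (meanOf xs + meanOf ys)
  , approxEq (varOf (zipWith (+) xs ys)) (varOf xs + varOf ys + 2 * covOf xs ys)
  , approxEq (meanOf (map (a *) xs)) (a * meanOf xs)
  , approxEq (varOf (map (a *) xs)) (a ^ (2 :: Int) * varOf xs)
  , approxEq (covOf (map (a *) xs) (map (b *) ys)) (a * b * covOf xs ys)
  ]
```

For example, `linearityHolds [1, 2, 3, 4, 5] [2, 1, 4, 3, 7] 3 (-2)` should be `True`. These two samples have covariance 3.0, so writing `- 2 * covOf xs ys` in the sum rule instead would make the check fail.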

* Correcting for the baseline

To correct a sample X for a baseline Y, we are interested in

> X - Y = X + (-1 * Y)

Applying the rules above, we get

> mean (X - Y)
>   = mean (X + (-1 * Y))
>   = mean X + mean (-1 * Y)
>   = mean X + (-1 * mean Y)
>   = mean X - mean Y

and

> var (X - Y)
>   = var (X + (-1 * Y))
>   = var X + var (-1 * Y) + 2 * cov X (-1 * Y)
>   = var X + ((-1) ** 2) * var Y + 2 * (-1 * cov X Y)
>   = var X + var Y - 2 * cov X Y
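
The last line is what `summarise` relies on: it obtains the spread of the differences X_i - Y_i from var X, var Y and cov X Y alone. A minimal sketch comparing that formula with the variance of the pointwise differences, and with the result of dropping the covariance term (only valid when X and Y are uncorrelated); all function names are hypothetical:

```haskell
-- Sample mean
meanOf :: [Double] -> Double
meanOf xs = sum xs / fromIntegral (length xs)

-- Sample covariance (N - 1 denominator) of paired samples of equal length
covOf :: [Double] -> [Double] -> Double
covOf xs ys =
  sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys) / fromIntegral (length xs - 1)
  where
    mx = meanOf xs
    my = meanOf ys

-- Sample variance as the covariance of a sample with itself
varOf :: [Double] -> Double
varOf xs = covOf xs xs

-- var (X - Y) computed directly from the pointwise differences X_i - Y_i
varDiffDirect :: [Double] -> [Double] -> Double
varDiffDirect xs ys = varOf (zipWith (-) xs ys)

-- var (X - Y) computed from summary statistics, as in the derivation
varDiffFormula :: [Double] -> [Double] -> Double
varDiffFormula xs ys = varOf xs + varOf ys - 2 * covOf xs ys

-- What you get by ignoring the covariance term
varDiffIgnoringCov :: [Double] -> [Double] -> Double
varDiffIgnoringCov xs ys = varOf xs + varOf ys
```

With `xs = [10, 12, 14, 16]` and `ys = [9, 11, 13, 15]` every difference is 1, so `varDiffDirect xs ys` and `varDiffFormula xs ys` are both 0.0, while `varDiffIgnoringCov xs ys` is about 13.33: ignoring the covariance badly overstates the spread of strongly correlated paired measurements.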

Notice that if we let Y = X, then

> mean (X - X) = mean X - mean X = 0
> var (X - X)  = var X + var X - 2 * cov X X = var X + var X - 2 * var X = 0

as expected: if we take X itself as the "baseline", then correcting X for
itself yields an average of zero and no variance.

With many thanks to Lars Brünjes for explaining all this :)
-------------------------------------------------------------------------------}

{-# LANGUAGE DeriveFunctor  #-}
{-# LANGUAGE NamedFieldPuns #-}

-- | Mean, standard deviation and standard error, all corrected for the baseline
data Summary a = Summary {
      -- | Mean (average)
      mean :: a

      -- | Standard deviation
      --
      -- The standard deviation indicates how far a new measurement is likely
      -- to be from the mean.
    , stdDev :: a

      -- | Standard error
      --
      -- The standard error indicates how far the TRUE mean is likely to be
      -- from your experimental mean. It equals the standard deviation divided
      -- by the square root of the sample size N.
      --
      -- See <https://www.investopedia.com/ask/answers/042415/what-difference-between-standard-error-means-and-standard-deviation.asp>
    , stdErr :: a
    }
  deriving (Functor)
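
-- | Summarise a sample, corrected for a baseline
--
-- Computes the mean and spread of the differences @X_i - Y_i@ between the
-- sample X and the baseline Y, using the formulas derived above. Both lists
-- must have the same length N, with element @i@ of the sample paired with
-- element @i@ of the baseline; N should be at least 2, since the variances
-- divide by @N - 1@.
summarise ::
     [Double] -- ^ Baseline (Y)
  -> [Double] -- ^ Sample (X)
  -> Summary Double
summarise ys xs = Summary {mean, stdDev, stdErr}
  where
    n :: Double
    n = if length xs == length ys
          then fromIntegral (length xs)
          else error "summarise: baseline and sample differ in size"

    mean, stdDev, stdErr :: Double
    mean   = mean_X - mean_Y
    -- Clamp at 0: rounding can make the (mathematically non-negative)
    -- variance of the difference slightly negative, and sqrt would give NaN
    stdDev = sqrt (max 0 (var_X + var_Y - 2 * cov_X_Y))
    stdErr = stdDev / sqrt n

    mean_X, mean_Y :: Double
    mean_X = sum xs / n
    mean_Y = sum ys / n

    var_X, var_Y :: Double
    var_X = sumOver (\x -> (x - mean_X) ** 2) xs
          / (n - 1)
    var_Y = sumOver (\y -> (y - mean_Y) ** 2) ys
          / (n - 1)

    cov_X_Y :: Double
    cov_X_Y = sumOver (\(x, y) -> (x - mean_X) * (y - mean_Y)) (zip xs ys)
            / (n - 1)

    sumOver :: Num b => (a -> b) -> [a] -> b
    sumOver f = sum . map f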
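
-- | Like 'summarise', but for integer measurements; all results are rounded
summarise' ::
     [Int] -- ^ Baseline (Y)
  -> [Int] -- ^ Sample (X)
  -> Summary Int
summarise' ys xs =
    round <$> summarise (map fromIntegral ys) (map fromIntegral xs)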
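
{-------------------------------------------------------------------------------
As a cross-check of 'summarise', the same three numbers can be computed by
forming the paired differences X_i - Y_i first and summarising that single
sample directly, since mean (X - Y) and var (X - Y) are the mean and variance of
those differences. A minimal sketch; `DiffSummary` and `summariseDirect` are
hypothetical names, and like 'summarise' it assumes both samples have the same
length N >= 2. Up to floating-point rounding it should agree with 'summarise'.

```haskell
-- Hypothetical result type with the same three fields as Summary Double
data DiffSummary = DiffSummary
  { dsMean   :: Double  -- mean of the differences
  , dsStdDev :: Double  -- sample standard deviation of the differences
  , dsStdErr :: Double  -- standard error: dsStdDev / sqrt N
  } deriving (Show)

-- Summarise sample - baseline by forming the paired differences first.
-- Argument order follows summarise: baseline first, then sample.
summariseDirect :: [Double] -> [Double] -> DiffSummary
summariseDirect baseline sample
  | length baseline /= length sample =
      error "summariseDirect: baseline and sample differ in size"
  | otherwise = DiffSummary m s (s / sqrt n)
  where
    ds = zipWith (-) sample baseline  -- the differences X_i - Y_i
    n  = fromIntegral (length ds) :: Double
    m  = sum ds / n
    s  = sqrt (sum [(d - m) ^ (2 :: Int) | d <- ds] / (n - 1))
```

For example, `summariseDirect [1, 2, 3] [2, 4, 9]` (baseline first) has
differences [1, 2, 6], so its mean is 3.0, its standard deviation is sqrt 7
(about 2.646) and its standard error is sqrt 7 / sqrt 3 (about 1.528).
-------------------------------------------------------------------------------}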