{-------------------------------------------------------------------------------
Mean and standard deviation, corrected for the baseline
By a slight abuse of notation we'll let X and Y range over two random
variables, as well as over two samples /drawn/ from those random variables.
We'll assume the sample size (for both variables) is N.
* Basic definitions:
> mean X = sum X_i / N (mean aka average)
> var X = sum ((X_i - mean X) ** 2) / (N - 1) (variance)
> std X = sqrt (var X) (standard deviation)
See <https://en.wikipedia.org/wiki/Standard_deviation>
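For example, for an illustrative sample X = [1, 2, 3] (so N = 3):
> mean X = (1 + 2 + 3) / 3 = 2
> var X = ((1 - 2) ** 2 + (2 - 2) ** 2 + (3 - 2) ** 2) / (3 - 1) = 1
> std X = sqrt 1 = 1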
* Correlation between two variables:
> cov X Y = sum ((X_i - mean X) * (Y_i - mean Y)) / (N - 1)
Notice that
> cov X X = sum ((X_i - mean X) * (X_i - mean X)) / (N - 1)
> = sum ((X_i - mean X) ** 2) / (N - 1)
> = var X
See <https://en.wikipedia.org/wiki/Covariance#Covariance_with_itself>
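For example, with X = [1, 2, 3] and Y = [2, 4, 6] (so mean X = 2, mean Y = 4):
> cov X Y = ((1 - 2) * (2 - 4) + (2 - 2) * (4 - 4) + (3 - 2) * (6 - 4)) / 2
>         = (2 + 0 + 2) / 2
>         = 2
and indeed cov X X = (1 + 0 + 1) / 2 = 1 = var X.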
* Combining and scaling variables
For two random variables X and Y, we have
> mean (X + Y) = mean X + mean Y
> var (X + Y) = var X + var Y + 2 * cov X Y
Moreover
> mean (a * X) = a * mean X
> var (a * X) = (a ** 2) * var X
> cov (a * X) (b * Y) = (a * b) * cov X Y
See <https://en.wikipedia.org/wiki/Covariance#Covariance_of_linear_combinations>
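These identities hold exactly for the sample statistics when X and Y are
paired samples. Continuing the example (var X = 1, var Y = 4, cov X Y = 2):
> X + Y = [3, 6, 9]
> var (X + Y) = ((3 - 6) ** 2 + (6 - 6) ** 2 + (9 - 6) ** 2) / 2 = 9
>             = var X + var Y + 2 * cov X Y = 1 + 4 + 2 * 2 = 9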
* Correcting for the baseline
To correct for the baseline, we are interested in
> X - Y = X + (-1 * Y)
Applying the above definitions, we get
> mean (X - Y)
> = mean (X + (-1 * Y))
> = mean X + mean (-1 * Y)
> = mean X + (-1 * mean Y)
> = mean X - mean Y
> var (X - Y)
> = var (X + (-1 * Y))
> = var X + var (-1 * Y) + 2 * cov X (-1 * Y)
> = var X + var Y + 2 * (-1 * cov X Y)
> = var X + var Y - 2 * cov X Y
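For the running example:
> X - Y = [-1, -2, -3]
> mean (X - Y) = -2 = mean X - mean Y
> var (X - Y) = ((-1 + 2) ** 2 + (-2 + 2) ** 2 + (-3 + 2) ** 2) / 2 = 1
>             = var X + var Y - 2 * cov X Y = 1 + 4 - 2 * 2 = 1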
Notice that if we let Y = X, then
> mean (X - X) = mean X - mean X = 0
> var (X - X) = var X + var X - 2 * cov X X = var X + var X - 2 * var X = 0
as expected: if the sample X coincides with the baseline Y, then correcting
the sample for the baseline yields a mean of zero and no variance.
With many thanks to Lars Brünjes for explaining all this :)
-------------------------------------------------------------------------------}
{-# LANGUAGE DeriveFunctor  #-}
{-# LANGUAGE NamedFieldPuns #-}

-- | Mean value and standard deviation, both corrected for the baseline
data Summary a = Summary {
      -- | Mean (average)
      mean :: a

      -- | Standard deviation
      --
      -- The standard deviation indicates how much a new measurement is likely
      -- to diverge from the mean.
    , stdDev :: a

      -- | Standard error
      --
      -- The standard error indicates how far the TRUE mean is likely to be
      -- from your experimental mean.
      --
      -- See <https://www.investopedia.com/ask/answers/042415/what-difference-between-standard-error-means-and-standard-deviation.asp>
    , stdErr :: a
    }
  deriving (Show, Functor)
summarise ::
     [Double]  -- ^ Baseline (Y)
  -> [Double]  -- ^ Sample (X)
  -> Summary Double
summarise ys xs = Summary {mean, stdDev, stdErr}
  where
    n :: Double
    n = if length xs == length ys
          then fromIntegral (length xs)
          else error "summarise: samples must have the same size"

    mean, stdDev, stdErr :: Double
    mean   = mean_X - mean_Y
    stdDev = sqrt (var_X + var_Y - 2 * cov_X_Y)
    stdErr = stdDev / sqrt n

    mean_X, mean_Y :: Double
    mean_X = sum xs / n
    mean_Y = sum ys / n

    var_X, var_Y :: Double
    var_X = sumOver (\x -> (x - mean_X) ** 2) xs
          / (n - 1)
    var_Y = sumOver (\y -> (y - mean_Y) ** 2) ys
          / (n - 1)

    cov_X_Y :: Double
    cov_X_Y = sumOver (\(x, y) -> (x - mean_X) * (y - mean_Y)) (zip xs ys)
            / (n - 1)

sumOver :: Num b => (a -> b) -> [a] -> b
sumOver f = sum . map f
summarise' :: [Int] -> [Int] -> Summary Int
summarise' baseline sample =
    round <$> summarise (map fromIntegral baseline) (map fromIntegral sample)
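
-- A quick usage sketch (output shown approximately, assuming the derived
-- Show instance), reusing the illustrative samples from the comment above,
-- with [2, 4, 6] as the baseline:
--
-- > summarise [2, 4, 6] [1, 2, 3]
-- > Summary {mean = -2.0, stdDev = 1.0, stdErr = 0.577}
--
-- (stdErr = stdDev / sqrt n = 1 / sqrt 3 ~ 0.577)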