@johnmyleswhite
Last active August 29, 2015 13:55
Counterexamples in Statistics

The sample mean is never exactly equal to the true mean when the true mean is irrational

Let D be any distribution over the integers. Suppose that the first moment exists for D and is an irrational number.

In this case, the sample mean is never exactly equal to the true mean because the sample mean is always a rational number.

This is simple to prove: the sample mean is a sum of integers divided by the number of samples, which is itself a positive integer, and a ratio of two integers is always rational.
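The argument can be checked numerically with exact rational arithmetic. The sketch below is an illustration, not part of the original note: it assumes a Bernoulli distribution on {0, 1} with success probability 1/sqrt(2), so the true mean is irrational (the float `p` is only an approximation of it), and the helper name `exact_sample_mean` is hypothetical.

```python
import math
import random
from fractions import Fraction


def exact_sample_mean(draws):
    """Return the sample mean of integer draws as an exact fraction."""
    return Fraction(sum(draws), len(draws))


rng = random.Random(0)   # fixed seed, chosen arbitrarily for reproducibility
p = 1 / math.sqrt(2)     # float approximation of the irrational true mean
draws = [1 if rng.random() < p else 0 for _ in range(1000)]

mean = exact_sample_mean(draws)
print(mean.numerator, "/", mean.denominator)

# The true mean m satisfies 2 * m**2 == 1. No rational number does,
# so this comparison is True for every possible sample.
print(2 * mean**2 != 1)
```

Using `Fraction` rather than floats keeps the sample mean exact, which is the point of the argument: it is always a ratio of two integers.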

The median is not a function of all parameters of the data-generating process

Consider the following mixture model:

  • We draw 99 values from a uniform distribution.
  • We draw 1 value from a right-shifted exponential defined as x ~ Exp(t) + 10, where t > 0.

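In this case, the distribution of the sample median does not depend on t, as long as the uniform distribution's support lies below 10 (for example, Uniform(0, 1)). The shifted exponential draw is then always the largest of the 100 values, so the median is computed entirely from the uniform draws.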
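A small simulation makes this concrete. The sketch below assumes the uniform distribution is Uniform(0, 1) and treats t as the rate of the exponential; both are assumptions, since the note does not pin them down. The function name `mixture_median` is hypothetical.

```python
import random
import statistics


def mixture_median(t, seed):
    """Median of 99 Uniform(0, 1) draws plus one draw from Exp(t) + 10."""
    rng = random.Random(seed)
    uniforms = [rng.random() for _ in range(99)]   # all values in [0, 1)
    shifted = rng.expovariate(t) + 10              # always at least 10
    return statistics.median(uniforms + [shifted])


# With the same seed the uniform draws are identical, so only t differs.
for seed in range(3):
    print(mixture_median(0.01, seed), mixture_median(100.0, seed))
```

Each printed pair should be identical: with 100 values, the median averages the 50th and 51st smallest, and both are uniform draws regardless of t.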

The median gives no insight into binary outcome data

Let x_1, ..., x_N be IID Bernoulli variables with parameter p, where N is odd.

In this case, the median is always 0 or 1, regardless of the value of p. It is therefore not a smooth function of p, although its distribution does vary non-trivially as p changes.
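To see how the distribution of the median depends on p, note that for odd N the median equals 1 exactly when at least (N + 1) / 2 of the draws are 1, which is a binomial tail probability. The following is a minimal sketch of that calculation; the function names are hypothetical and not from the original note.

```python
import math
import random
import statistics


def prob_median_is_one(n, p):
    """P(sample median == 1) for n IID Bernoulli(p) draws, n odd."""
    if n % 2 == 0:
        raise ValueError("n must be odd")
    k_min = (n + 1) // 2   # smallest number of ones that forms a majority
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


def simulated_median(n, p, rng):
    """Sample median of n Bernoulli(p) draws; always 0 or 1 when n is odd."""
    draws = [1 if rng.random() < p else 0 for _ in range(n)]
    return statistics.median(draws)


rng = random.Random(0)   # fixed seed, chosen arbitrarily
print(simulated_median(11, 0.3, rng))
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, prob_median_is_one(11, p))
```

The median itself only ever reports which outcome is in the majority, while the probability of that report changes continuously with p.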

The median has very high variance for bimodal distributions with equally probable modes

Let D be a distribution defined as a 50/50 mixture of two well-separated unimodal distributions with modes M1 and M2. Then the sample median is highly erratic: across repeated samples it lands near M1 when slightly more than half of the draws come from the first component, and near M2 otherwise.
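The behavior can be illustrated with a simulation. The sketch below assumes a 50/50 mixture of two normal distributions centered at -10 and 10 with standard deviation 0.5; these components and the sample size of 101 are illustrative choices, not taken from the original note.

```python
import random
import statistics

M1, M2 = -10.0, 10.0   # assumed modes of the two components
SIGMA = 0.5            # assumed common standard deviation


def mixture_sample_median(n, rng):
    """Median of n draws from a 50/50 mixture of N(M1, SIGMA) and N(M2, SIGMA)."""
    draws = [
        rng.gauss(M1 if rng.random() < 0.5 else M2, SIGMA)
        for _ in range(n)
    ]
    return statistics.median(draws)


rng = random.Random(0)   # fixed seed, chosen arbitrarily
medians = [mixture_sample_median(101, rng) for _ in range(200)]

near_m1 = sum(m < 0 for m in medians)
print("medians near M1:", near_m1, "near M2:", len(medians) - near_m1)
print("variance of the median:", statistics.pvariance(medians))
```

Because the sample size is odd, each median is an actual draw from whichever component happens to hold the majority, so the medians cluster around the two modes rather than between them.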
