Sums of squares can be rather non-intuitive

Signal 98 % + Noise 20 % = 100 % ?

Data is often split into "signal" + "noise", with

|data|^2 = |signal|^2 + |noise|^2
where |x|^2 = x1^2 + x2^2 + ... , the sum of squares.

Sums of squares can be rather non-intuitive. For example,

101^2    = 99^2       + 20^2
|data|^2 = |signal|^2 + |noise|^2
|signal| / |data|  = 99 / 101 = 98 %  -- sounds good
|noise|  / |data|  = 20 / 101 = 20 %  -- not so good
|signal| / |noise| = 99 /  20 = 5     -- ?
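
A minimal sketch of that arithmetic in Python (only numpy assumed, numbers as above):

    import numpy as np

    # 99-20-101 example: signal and noise orthogonal, so squared norms add exactly
    signal_norm, noise_norm = 99.0, 20.0
    data_norm = np.hypot(signal_norm, noise_norm)          # sqrt(99^2 + 20^2) = 101

    print("|signal| / |data| :", signal_norm / data_norm)  # ~ 0.98
    print("|noise|  / |data| :", noise_norm / data_norm)   # ~ 0.20
    print("|signal| / |noise|:", signal_norm / noise_norm) # ~ 5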

Giving only one of these ratios -- 98 %, 20 %, 5 -- is misleading. Giving all 3 numbers, though, can be confusing.

What to do ? I like to print / plot the "signal" unsquared, and perhaps squared too, e.g.

PCA eigenvalues %: [  2   4   6   8   9  11  12  13  15  16 ...
PCA variance %:    [  9  17  23  28  32  36  39  42  45  47 ...
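
The two lines above were presumably printed from the unsquared singular values and from their squares, the variances. A sketch of how one might produce both scales, with made-up data and assuming cumulative percentages:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 20))           # toy data, rows = samples
    X -= X.mean(axis=0)                          # center before PCA

    sigma = np.linalg.svd(X, compute_uv=False)   # singular values, descending (~ |signal|)
    var = sigma ** 2                             # their squares = variances

    # cumulative percentages on both scales (an assumption about the printout above)
    sing_pct = 100 * np.cumsum(sigma) / sigma.sum()
    var_pct  = 100 * np.cumsum(var)   / var.sum()
    print("PCA singular values %:", sing_pct.round().astype(int))
    print("PCA variance %:       ", var_pct.round().astype(int))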

R squared

Statisticians use a ratio called "R squared", which in this context is |signal|^2 / |data|^2, e.g. 99^2 / 101^2 = 96 % -- impressive ?
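
A quick sketch with made-up vectors: a random "signal" plus ~20 % independent "noise" is nearly orthogonal, so the squared norms nearly add and R squared comes out around 96 %:

    import numpy as np

    rng = np.random.default_rng(1)
    signal = rng.standard_normal(10_000)
    noise  = 0.2 * rng.standard_normal(10_000)   # ~ 20 % noise
    data   = signal + noise

    R2 = np.sum(signal**2) / np.sum(data**2)     # |signal|^2 / |data|^2
    print("R squared ~", round(R2, 3))           # about 0.96, i.e. 99^2 / 101^2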

R^2 gives the 'percentage of variance explained' by the regression, an expression that, for most social scientists, is of doubtful meaning but great rhetorical value.

-- Wikipedia, Explained variation

Least squares

For a lovely picture of the squares that least-squares minimizes, see the Wikipedia article Coefficient of determination.
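
A toy least-squares fit (made-up data, numpy's lstsq) shows where the squares live: the fit minimizes |noise|^2, the sum of squared residuals, and with an intercept the usual R^2 = 1 - |residual|^2 / |centered data|^2 is the same ratio as |signal|^2 / |data|^2 above:

    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(0, 1, 50)
    y = 2 * x + 1 + 0.1 * rng.standard_normal(50)      # hypothetical line + noise

    A = np.column_stack([x, np.ones_like(x)])          # design matrix [x, 1]
    coef, rss, *_ = np.linalg.lstsq(A, y, rcond=None)  # rss = sum of squared residuals

    R2 = 1 - rss[0] / np.sum((y - y.mean())**2)        # coefficient of determination
    print("slope, intercept:", coef.round(3), " R^2:", round(R2, 3))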

("Why sums of squares ?" I don't know of a brief answer for laymen, beyond "nice math", "commonly used"; comments welcome.)

cheers
-- denis
