---------- Forwarded message ----------
From: chris wiggins <chris.wiggins@[YYY].edu>
Date: Wed, Aug 1, 2012 at 7:26 PM
Subject: stats history
To: hadley@[XXX].edu
Cc: chris wiggins <chris.wiggins@[YYY].edu>
Dear Hadley:
I'd like to try to address your tweeted inquiry
( https://twitter.com/hadleywickham/status/229402238404153344 )
as to why stats is so mathy.
like all good academics, let me start by saying how woefully
unqualified I am even to ask this question, having a PhD in
neither stats, math, nor history. Instead my PhD is in Physics
which, as we discussed, means I rush in to other people's fields
whereas more qualified and wise people fear to tread.
my brief understanding of statistics is this:
- 19th c, people already looked at data in mathematical
ways, including the great ones (Gauss (1822), Legendre (1805), etc)
- pre WWII: applied problems
such as genetics and Guinness (Fisher, Student)
- also pre-WWII, Jerzy Neyman passed through London
to hang with Karl Pearson and then moved to Berkeley
to found what would become the world's strongest stats dept,
a mathematical and Frequentist [edit] place.
- during WWII EE, physics, and statistics were badly needed.
the role of physics is well-celebrated. less so is the role
of EE (e.g., in developing radar) and stats, each of which
had small Manhattan-project-esque developments, i.e., federal
$ were used to bring into proximity many experts in the field
working very, very hard to solve very, very applied (martial)
problems. In the case of EE there were the MIT 'radlab' and Harvard's
Radio Research Laboratory, the latter led by Fred Terman, who went on
to be the father of silicon valley. in the
case of stats there were the Columbia statistical research group,
a like-minded group at Princeton (both funded via warren weaver),
and likely others I don't know.
- that said there were hardly any stats 'departments' (nor, I
presume, were there many applied math departments -- warren weaver
also seems to have seeded this field during WWII, which led to
postwar department building) and the departments being created
all seem to have grown out of math departments (similarly with applied math).
- statistics seems to have existential issues both in character
and stability at this time.
* stability: the stats efforts at Columbia and Princeton were
* at good schools and
* funded by warren weaver to do stats during and, i presume, after WWII,
yet both had their departments implode or cut back. Columbia hemorrhaged
people post WWII. Princeton lost stats in the 1980s.
* character: because stats departments grew out of math departments there
was lots of snobbery. Lehmann's book [1] has many quotes about mathematicians
looking down on stats. i think that made stats have issues from WWII until
today, always trying to be "real" math.
- except for singularities like Tukey. Tukey had more than enough
math credentials, having a great PhD thesis in topology. Moreover he was
eccentric, home-schooled, and in general seemed not to give a damn
how other people did it. he was also apparently incredibly smart,
and applied his math at Princeton, bell, ETS (Princeton), and in
a variety of consulting work in industrial and government contexts.
- Tukey also bucked the Berkeley math envy by pushing computation
and exploratory data analysis. If you look for example at the 1998
Neyman lecture by Chambers about computation and statistics he's quite
explicit about how Tukey's thinking in 1964-65 influenced his and thus
S and thus R.
- My opinion is that this attention to being "real" (i.e., mathy) caused stats
to miss the data boat. the clear innovative thinker here was Tukey
who influenced
- - Tufte, with whom he taught a class at Princeton
- - Cleveland, who wrote the 'data science' article in 2001
- - Chambers, who created S (the precursor of R)
- - he also coauthored with and influenced J H Friedman at Stanford,
who presumably influenced Hastie and Tibshirani.
- - probably a whole mess of other people I don't know about owing to illiteracy
- something i don't know is the history of departments which don't feature
in Lehmann's autobiography, including
- - Harvard (who made that place?)
- - Wisconsin (seems to have a lively history/department)
- the most amazing punchline here seems to me to be how WWII turned
Tukey into a statistician who spawned what is now data science and
data visualization. he sprang fully-formed from WWII a statistician
without, as far as i can tell, inheriting the DNA from Berkeley (which
in turn came from the UK). it's as though two separate species had
spontaneously formed and then interbred. am i wrong?
- cf https://twitter.com/mshron/status/229899515690364928
- cf https://twitter.com/mshron/status/229961814685908993
==
APPENDIX: gems from Lehmann's book:
==
"The basic difference between the roles of mathematical probability in 1946
and 1988 is that the subject is now accepted as mathematics, whereas in
1946, to most mathematicians, mathematical probability was to mathematics
as black marketing to marketing . . . . And the fact that probability was
intrinsically related to statistics did not improve either subject’s
standing in the eyes of pure mathematicians."
@Stanford: "The mathematics department received me with a certain detachment,” Bowker
says. “Although he became a great supporter of statistics, Gabor Szegö was
then chairman of the mathematics department, and explained to me very nicely
that while what I did was very interesting, it wasn't mathematics. So we
moved rather quickly to a separate department.”
@Stanford: And thus it came about that Al Bowker, formally still a graduate
student at Columbia (although by then his thesis had been completed), in
1948 became chairman of the fledgling statistics department,
@Princeton: "He also did not try to build up a group; however, he acquired a
colleague fortuitously. This was John Tukey, a topologist in the mathematics
department since 1939. During the war Tukey became involved in statistics,
and by 1945 considered himself a statistician rather than a topologist."
@Berkeley, circa 1946, "Evans [the chair] argued forcefully against a B.A.
degree in statistics, since it would ... would be essentially nothing but an
undergraduate professional degree.”
@Berkeley: "No course on Bayesian statistics was introduced until 1969."
le cam, 1950: "at Berkeley everything was full of measure theory and other
fanciful mathematics"
crazy: "Because of her interest in the history of probability, the Berkeley
statistics department in 1970 asked David to give a course in this
subject. The course, which met for two hours on Fridays, was given by her
regularly for a number of years. It was one way to satisfy a statistics
requirement and it soon became very popular, with a steadily increasing
enrollment that eventually rose to five hundred students. There were two
reasons for this popularity. One was that David was a lively and
entertaining lecturer; the other, which I am afraid was an even more
important reason, was that she demanded very little of the students. She
assigned no homework and there were no exams. The only requirement was the
final, an essay written at home on any topic of some relevance. Toward the
end of my term as chair, I began to hear rumors that a brisk market had
developed in essays recycled from previous years. As a result, we decided
soon after to discontinue the course."
@Tukey: "By the end of late 1945, I was a statistician rather than a
topologist."
@von mises: "All of von Mises’ work was infused by his view that the task of
applied mathematics is to build mathematical models of some aspects of the
real world,"
moment of clarity: "At this point, classical statistics splits into three
branches: point estimation, which tries to pinpoint the unknown parameter;
confidence sets, which provides a set in which [the parameter] can be stated
to lie with a certain guaranteed probability; and hypothesis testing, where a
hypothesis about [the parameter] is either accepted or rejected"
!: "Larry’s Los Angeles draft board considered mathematics as a deferrable
subject but not statistics. It therefore became essential for him to be in a
mathematics rather than a statistics department. So Larry contacted Jack
Kiefer, who arranged a position for him at Cornell, where statistics was in
the mathematics department"
"Tukey went in the opposite direction: he argued that much statistical
activity should take place without the use of any models....Tukey stressed
the primacy of the data"
"The Bayesian approach, however, was violently opposed as unscientific by
both Fisher and Neyman in the 1920s and 1930s, and as a result fell into
disuse. The person bringing it to life again was Leonard J. Savage"
- "Tukey stressed exploratory data analysis without any probability or
mathematics." -E.L. on JWT
- Huber (1997) writes:
Very few people will have realized at that time (I certainly was not among
them) that Tukey, while ostensibly speaking about his personal
predilections, was in fact redefining statistics.
@Berkeley, hard to picture: "as one of her courses, she started a
statistical consulting service, staffed by the graduate students taking the
course. During the ten years that she was in charge of this course, the
service provided statistical advice to about two thousand clients,
mostly—but not exclusively—from within the university. To head this service,
Julie was appointed lecturer in 1977 and senior lecturer in 1981, a position
in which she remained until her retirement in 1994."
That others too found it difficult is illustrated by a 1987 paper by
Speed, “What Is An Analysis of Variance?” which is followed by the comments
of eleven discussants, no two of whom quite agree on its meaning.
# References
[1] Citation to Lehmann's book:
Lehmann, Erich L. Reminiscences of a statistician: The company I kept.
Springer Science & Business Media, 2007.
https://goo.gl/9I9pt1
@chrishwiggins

regarding "harvard -- who made that place" ( https://gist.github.com/chrishwiggins/1594c8b72a4c74bdb369#file-letter-cw-hw-on-stats-and-data-science-history-L91 ) I later found out it was Tukey's student Mosteller, continuing the Tukey lineage.

@chrishwiggins

An authority/former advisee of JWT corrects me, saying "I'm pretty sure Fred Mosteller
... wasn't really John Tukey's Ph.D. student but more like buddy."

@chrishwiggins

chrishwiggins commented Apr 29, 2016

according to a Special Collections Assistant at Princeton:

Upon looking at the dissertation for Mr. Mosteller i was able to find a mention of thanks in the conclusion to his work. There he stated that he thanked S.S. Wilkes who he was under the direction of, as well as J.W. Tukey for his suggestions and constructive criticisms. Mosteller also sweetly thanked his wife Virginia.
