“The answers you get depend on the questions you ask” (Kuhn)

I have been reading about the history of statistics. It is fascinating, and in fact very controversial. I first used Google Ngram to examine the patterns of usage of statistical terms. The term “statistics” saw limited usage up until the mid 1900s, then an acceleration beginning around 1960, with a peak around 2000, and then a 40% decline. There appear to be three peaks when we look at individual terms. The word “median” has been in relatively continuous usage. Its first peak was in the 1890s, then it rose again from the late 1930s until 2000, followed by a decline. The term ANOVA sprang up in the late 1930s and continued a modest rise, but then took off faster than any other statistical technique. Of the terms I examined (including chi-square, confidence interval, and linear regression), it is the only one that continued to rise.

The whole epidemiologic pattern is fascinating. It appears that the concept of a median has been around a long time. The sharp rise in the late thirties for ANOVA and standard deviation was amazing. This was most likely due to R. A. Fisher's classic books of 1925 and 1935. What was extremely amazing was that in a very short period of time (less than 10 years), statistics took over almost all disciplines of science. It is not really clear why psychology, medicine, physics, and sociology all grasped onto classical statistics, or why classical statistics spread worldwide like the flu. Almost every scientist agrees on the importance of statistics. Science is where it is today because of this.

The third rise was the most dramatic. In the 40s and 50s the mathematical concepts and tools that formed the core of science were available. However, scientists were drowning in data. Many of us have done ANOVA by hand, and it is not pretty. Inverting a 12x12 matrix would take weeks, and one’s arm would get very sore pulling the crank like a slot machine. Instead of money coming out, F tests were calculated.

Paradigm Shift: a fundamental change in approach or underlying assumptions.

The early 1970s brought about the third paradigm shift of statistics. It was here that computers became available, and right behind the computers came statistical programs such as BMDP, SPSS, and SAS. An analysis that previously would have taken days could be done in seconds. With this came a proliferation of analyses. The final nail in the coffin for not using statistics was that journals would not accept articles unless statistical tests were done.

What I find most intriguing is not the third phase, the rapid proliferation. Rather, it is the period in the early 1940s when statistical analysis swept across all of science. I have never seen anything like that. It is indeed a paradigm shift for all of science. Amazing.

In David Salsburg’s book “The Lady Tasting Tea”, he wrote “During the 1940…The statistical revolution had completely taken over the scientific laboratory…” How and why did this occur so fast? This was a tsunami of statistical methods, with few scientists disagreeing about their importance.

The fourth transition is not as clear: almost all the statistical terms show a flattening, or in fact a 30-40% decline, from the early 90s until now. Any ideas as to why we do not use statistical terms as much as we did three decades ago?

It is exciting that once the BA Serageldin Euclid Library is finished, we can explore the epidemiology of statistical methods in much more depth. If you think about other paradigm shifts, such as Darwin’s, it took years for people to believe them, and there are still many skeptics. However, I do not know any scientist who does not believe in statistics.

“The resolution of revolutions is selection by conflict within the scientific community of the fittest way to produce future science. The net result of a sequence of such revolutionary selections, separated by periods of normal research, is the wonderfully adapted set of instruments we call modern scientific knowledge.”