[Dennis L. Edinger, Ph.D.] -----Original Message-----
From: Concerned with the initial learning and teaching of statistics [mailto:[log in to unmask]] On Behalf Of Saville, Dave
Sent: Monday, May 28, 2001 10:13 PM
To: [log in to unmask]
Subject: R A Fisher 'discovered' degrees of freedom !

Hi Pedro, Erich and all! My understanding is that the concept of 'degrees of freedom' became necessary in the 1915-1935 period when statistical methods were being introduced which took account of the effect of sample size (methods such as paired and independent samples t tests, regression and analysis of variance). Prior to that, methods were "large sample size" methods.

The first "small sample size" method was Student's t test, introduced by Gosset (pseudonym Student). Sir Ronald Fisher produced an elegant proof for this (paired samples) t test using n-dimensional geometry. In n-space, one direction (the direction of the vector of ones) is associated with the mean, and the remaining (n-1) perpendicular directions are associated with the variance. To estimate the mean, you project the "data vector" onto the first direction. To estimate the variance, you project the "data vector" onto each of these (n-1) directions, square the projection lengths, and average (that is, sum and divide by n-1). This is the concept - the arithmetic can be rearranged to the formula given by Erich. Anyway, the (n-1) is EXACT, not an approximation: it is simply the number of directions perpendicular to the mean direction. This is very obvious when you look at the geometry, but much less obvious from the algebra.
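For anyone who would like to see the projections as arithmetic, here is a minimal sketch in Python. The data values are made up, and the particular choice of (n-1) perpendicular directions (Helmert-style contrasts) and the function names are my own assumptions for illustration; any orthonormal set perpendicular to the mean direction gives the same sum of squared projection lengths.

```python
import math
import statistics

def dot(a, b):
    """Ordinary dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def perpendicular_directions(n):
    """Return n-1 orthonormal directions, each perpendicular to (1, ..., 1).

    Helmert-style contrasts: direction k compares the average of the
    first k coordinates with coordinate k+1. This is one convenient
    choice among many, not the only valid one.
    """
    directions = []
    for k in range(1, n):
        scale = math.sqrt(k * (k + 1))
        directions.append([1.0 / scale] * k + [-k / scale] + [0.0] * (n - k - 1))
    return directions

def geometric_estimates(data):
    """Estimate the mean and variance by projecting the data vector."""
    n = len(data)
    mean_direction = [1.0 / math.sqrt(n)] * n
    # The projection length onto the mean direction is sqrt(n) * xbar.
    mean = dot(data, mean_direction) / math.sqrt(n)
    # Project onto each of the n-1 perpendicular directions, square, average.
    projections = [dot(data, u) for u in perpendicular_directions(n)]
    variance = sum(p * p for p in projections) / (n - 1)
    return mean, variance

data = [2.0, 4.0, 4.0, 6.0]  # made-up sample, n = 4
mean, variance = geometric_estimates(data)
print(mean, variance)
print(statistics.variance(data))  # the usual n-1 formula, for comparison
```

The variance from the squared projection lengths agrees with the usual formula (up to floating-point rounding), which is the sense in which the arithmetic can be rearranged.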

Fisher found that his geometric proofs were not universally understood, so he introduced the words "degrees of freedom," "sums of squares" and so on, and gave algebraic formulae for entities which were really dimensions of subspaces, sums of squared projection lengths, and so on.

The idea of thinking in n-space may sound daunting, but it's not too bad really. I start off in 2-space, then 3-space, to get the basic ideas, then n-space follows OK. All the geometry you need is taught in the first few weeks of linear algebra at first-year university level (or I teach it to agriculturalists in one 50-minute session).

Each year I run a non-geometric, heuristically-based introductory "Basic Stats" workshop which lasts for 3 whole days, for workers in agricultural research. When I cover estimation of the variance, I too use the idea that Erich mentioned, that:

"sum(xi-xbar)^2 < sum(xi-m)^2 except when xbar = m"
and hence the left side needs a smaller divisor (n-1 instead of n). However, I also mention that the n-1 comes from the geometry, and draw a right-angled triangle depicting the data vector (hypotenuse), mean vector and error vector. I mention that Pythagoras' Theorem a^2 = b^2 + c^2 gives the various "sums of squares," and that the n-1 is the dimension of the subspace in which the "error vector" can vary. I say that further explanation is outside the scope of the workshop. Nevertheless, people really like to know that there is some decent maths behind all the methods that I teach them in an intuitive sort of way, and I always get good feedback on this aspect.
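The right-angled triangle and the inequality can both be checked with a few lines of arithmetic. The following is only a sketch in Python with made-up data; the comparison value m = 5 and the helper names are assumptions chosen for illustration.

```python
def sum_of_squares(vector):
    """Squared length of a vector."""
    return sum(x * x for x in vector)

data = [2.0, 4.0, 4.0, 6.0]  # made-up sample, n = 4
n = len(data)
xbar = sum(data) / n

mean_vector = [xbar] * n                    # one side of the triangle
error_vector = [x - xbar for x in data]     # the other side
total_ss = sum_of_squares(data)             # hypotenuse squared
mean_ss = sum_of_squares(mean_vector)
error_ss = sum_of_squares(error_vector)

# Pythagoras: the data vector is the hypotenuse, a^2 = b^2 + c^2.
print(total_ss, mean_ss + error_ss)

def ss_about(m):
    """Sum of squared deviations of the data from an arbitrary value m."""
    return sum((x - m) ** 2 for x in data)

m = 5.0  # hypothetical alternative centre, not equal to xbar
# The same theorem applied to deviations from m:
# sum(xi - m)^2 = sum(xi - xbar)^2 + n * (xbar - m)^2
print(ss_about(m), error_ss + n * (xbar - m) ** 2)
```

The second identity shows why the left side of the inequality is smaller: the sum of squares about any m other than xbar exceeds the sum about xbar by n(xbar - m)^2, which is zero only when xbar = m.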

In my view, the basic maths to which I refer has not been properly described in a down-to-earth, practical manner. So my friend Graham Wood and I wrote a couple of books about it. The introductory one is as follows:

It describes the maths behind paired and independent samples t tests, regression and analysis of variance, starting in 2-space and building up. If anyone is interested further, email me if you like.

Dave Saville
Biometrician        Phone: +64-3-983 3978
AgResearch        Fax:    +64-3-983 3946
Gerald Street      Email:   [log in to unmask]
P O Box 60
Lincoln 8152, Canterbury
New Zealand