Dave,

This is very interesting.  I'm an experimental psychologist by training and
came to statistics as a major academic interest rather late in my career (age
36).  I would be most interested in any more of these "historical notes" you
may have.  My colleague P. Michial Politano and I are writing a beginner's
text, "Statistics and Experimental Design," for Allyn and Bacon.  I have
found in over 35 years of teaching that students remember more when they know
the historical background of a discovery that is now a "point and click".  I
would greatly appreciate any reference you have to this historical stuff,
even if that reference is you!

However, for my personal interest, I wonder if you could tell me why 6 and
(less often) 12 show up so often as constants in non-parametric statistics?

Looking Forward,

Dennis
[In the beginning there was the Sponse...

  [Dennis L. Edinger, Ph.D.]

  -----Original Message-----
  From: Concerned with the initial learning and teaching of statistics
[mailto:[log in to unmask]] On Behalf Of Saville, Dave
  Sent: Monday, May 28, 2001 10:13 PM
  To: [log in to unmask]
  Subject: R A Fisher 'discovered' degrees of freedom !


  Hi Pedro, Erich and all!  My understanding is that the concept of 'degrees
of freedom' became necessary in the 1915-1935 period when statistical
methods were being introduced which took account of the effect of sample
size (methods such as paired and independent samples t tests, regression and
analysis of variance).  Prior to that, methods were "large sample size"
methods.

  The first "small sample size" method was Student's t test, introduced by
Gosset (pseudonym Student).  Sir Ronald Fisher produced an elegant proof for
this (paired samples) t test using n-dimensional geometry.  In n-space, one
direction is associated with the mean, and the remaining (n-1) perpendicular
directions are associated with the variance.  To estimate the mean, you
project the "data vector" onto the first direction.  To estimate the
variance, you project the "data vector" onto each of these (n-1) directions,
square the projection lengths, and average (or sum and divide by n-1).  This
is the concept - the arithmetic can be rearranged to the formula given by
Erich.  Anyway, the (n-1) is EXACT, not an approximate quantity.  This is
very obvious when you look at the geometry, but not obvious in other ways.
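
  As a quick numerical check of this decomposition (a sketch of my own in
Python with NumPy; the numbers and variable names are purely illustrative):

    import numpy as np

    # Sketch of the geometric decomposition described above (illustrative only).
    rng = np.random.default_rng(0)
    n = 5
    y = rng.normal(10.0, 2.0, size=n)        # the "data vector" in n-space

    # One direction is associated with the mean: the equiangular unit vector.
    u1 = np.ones(n) / np.sqrt(n)

    # Complete u1 to an orthonormal basis of n-space; a QR decomposition is
    # one convenient way to get the remaining (n-1) perpendicular directions.
    Q, _ = np.linalg.qr(np.column_stack([u1, np.eye(n)[:, 1:]]))
    error_dirs = Q[:, 1:]                    # n x (n-1) matrix of directions

    # Project the data vector onto each direction and square the lengths.
    error_proj_sq = (y @ error_dirs) ** 2    # (n-1) squared projection lengths

    ybar = y.mean()
    print(np.isclose(error_proj_sq.sum(), ((y - ybar) ** 2).sum()))   # True
    print(np.isclose(error_proj_sq.sum() / (n - 1), y.var(ddof=1)))   # True

  The n-1 divisor is simply the number of perpendicular directions the error
vector can vary in - the dimension of the subspace, not a correction factor.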

  Fisher found that his geometric proofs were not universally understood, so
he introduced the words "degrees of freedom," "sums of squares" and so on,
and gave algebraic formulae for entities which were really dimensions of
subspaces, sums of squared projection lengths, and so on.

  The idea of thinking in n-space may sound daunting, but it's not too bad
really.  I start off in 2-space, then 3-space, to get the basic ideas, then
n-space follows OK.  All the geometry you need is taught in the first few
weeks of linear algebra at university first year level (or I teach it to
agriculturalists in one 50-minute session).

  Each year I run a non-geometric, heuristically-based introductory "Basic
Stats" workshop which lasts for 3 whole days, for workers in agricultural
research.  When I cover estimation of the variance, I too use the idea that
Erich mentioned, that:
  "sum(xi-xbar)^2 < sum(xi-m)^2 except when xbar = m"
  and hence the left side needs a smaller divisor (n-1 instead of n).
However, I also mention that the n-1 comes from the geometry, and draw a
right-angled triangle depicting data vector (hypotenuse), mean vector and
error vector.  I mention that Pythagoras' Theorem a^2 = b^2 + c^2 gives the
various "sums of squares," and the fact that the n-1 is the dimension of the
subspace in which the "error vector" can vary.  I say that further
explanation is outside the scope of the workshop - nevertheless, people
really like to know that there is some decent maths behind all the methods
that I teach them in an intuitive sort of way, and I always get good
feedback on this aspect.
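
  And a matching sketch of the triangle itself (again just illustrative
Python, with made-up numbers):

    import numpy as np

    # The right-angled triangle: data vector (hypotenuse) = mean vector +
    # error vector, with Pythagoras giving the "sums of squares".
    x = np.array([4.0, 7.0, 5.0, 9.0])
    xbar = x.mean()

    mean_vec = np.full_like(x, xbar)   # projection onto the equiangular direction
    error_vec = x - mean_vec           # lies in the (n-1)-dimensional subspace

    print(np.isclose(mean_vec @ error_vec, 0.0))    # the legs are perpendicular
    print(np.isclose((x ** 2).sum(),
                     (mean_vec ** 2).sum() + (error_vec ** 2).sum()))  # a^2 = b^2 + c^2

    # Erich's inequality: xbar minimises the sum of squared deviations.
    m = 6.0   # any value other than xbar (= 6.25 here)
    print(((x - xbar) ** 2).sum() < ((x - m) ** 2).sum())              # True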

  In my view, the basic maths to which I refer has not been properly
described in a down-to-earth, practical manner.  So my friend Graham Wood and
I wrote a couple of books about it.  The introductory one is as follows:
  Saville, D. J. & Wood, G. R. (1996).  Statistical Methods: A Geometric
Primer.  New York: Springer-Verlag.  268 pp.  ISBN 0-387-94705-1.
  It describes the maths behind paired and independent samples t tests,
regression and analysis of variance, starting in 2-space and building up.  If
anyone is interested further, feel free to email me.
  Dave Saville
  Biometrician        Phone: +64-3-983 3978
  AgResearch        Fax:     +64-3-983 3946
  Gerald Street      Email:   [log in to unmask]
  P O Box 60
  Lincoln 8152, Canterbury
  New Zealand