I want to sincerely thank everyone who replied to my query. Each of
your answers (below) provided me with a better understanding of the
concept and the application. Thank you so very much.
I am researching what I thought would be a simple concept: degrees of
freedom. While reading the archives, I noticed that a number of people
found errors in their data that are directly related to this topic. I'm
beginning to feel as if degrees of freedom are blindly applied by people
like me, but never fully understood. Moreover, the somewhat mysterious
concept of degrees of freedom, (typically defined as n-1 and used to
determine mean-squares in ANOVA's) is not explained very well in any of
the stats books that I've looked at. I would sincerely appreciate
everyone's input and I will compile the answers.
1. Where does it come from?
2. Why is it always 1 less than n?
3. What, exactly, is degrees of freedom?
[log in to unmask]
I can't give you the math derivations, but rather a gut feel for what it
does. See below.
Take a pen, with a pocket clip on it, and toss it in the air. It can
spin about the center in three axes - rotate about the long axis, about
a line at right angles to it through the clip, and about another line at
right angles to both of those. It can fly up in the air in 3 dimensions
- up down, left right, forward & back. In mechanics, we call those the 6
degrees of freedom of a free body.
Now hold the pen so that it can only rotate about the long axis. It can
still move up down, right left, forward & back, but it cannot rotate
through the clip (you hold it that way!). You have reduced the degrees
of freedom. You can then hold it so that it slides along a line, in the
direction of the long axis. Fewer degrees of freedom. Eventually, you
will have pinned the pen down completely, and it cannot move at all. Do
it yourself, so you can feel the pen being tied down.
Now for the numbers. If I tell you that I have 4 measurement numbers,
those four could be anything. If I tell you the average value of 4
numbers, three of those 4 numbers can individually be anything. The
fourth one, however, will depend on the first 3. Those first 3 numbers
are 'free,' or have degrees of freedom. If I give you the value of 3
numbers, and give you the average of the 4, then you can dope out the
fourth value. It is not 'loose,' or free. Thus, 4 numbers have 4
degrees of freedom. When I state the average, I have only 3 (4-1)
degrees of freedom.
If I then tell you the estimated standard deviation, I will use up one
more degree of freedom. I know this because when I know 2 of the 4
values, plus an average and an est. stdev, I can dope out the other two
measurements.
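To see this arithmetic in action, here is a small Python sketch with made-up numbers: it recovers the fourth value from the mean, and then recovers two values from the mean plus the sample standard deviation (up to order, by solving a quadratic):

```python
import math
import statistics

# Four made-up measurements.
data = [2.0, 5.0, 7.0, 10.0]
m = statistics.mean(data)    # 6.0
s = statistics.stdev(data)   # sample stdev (n-1 divisor)

# Given the mean and any 3 values, the 4th is forced:
x4 = 4 * m - sum(data[:3])
assert x4 == data[3]

# Given the mean AND stdev plus 2 values, the other 2 are forced
# (up to order): they are the roots of a quadratic.
p = 4 * m - data[0] - data[1]                       # x3 + x4
q = 3 * s**2 - (data[0] - m)**2 - (data[1] - m)**2  # (x3-m)^2 + (x4-m)^2
sum_dev = p - 2 * m                                 # (x3-m) + (x4-m)
prod_dev = (sum_dev**2 - q) / 2                     # (x3-m) * (x4-m)
disc = math.sqrt(sum_dev**2 - 4 * prod_dev)
roots = sorted([m + (sum_dev + disc) / 2, m + (sum_dev - disc) / 2])
assert all(math.isclose(r, t) for r, t in zip(roots, sorted(data[2:])))
```

Stating the mean leaves 3 free values; stating the stdev as well leaves only 2.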
In the Anova, which you mentioned, you will see that each time we
provide a summary calculation of some sort, the degrees of freedom
(number of 'loose' measurements) is reduced. In a linear regression,
each coefficient reduces the df again.
Now for your specific questions:
> 1. Where does it come from?
Either that question does not compute, or it is demonstrated above.
> 2. Why is it always 1 less than n?
'always'? When one summary value is calculated, then df = n - 1. It
could be n - k, too.
> 3. What, exactly, is degrees of freedom?
Did I get that?
If you get a good mathematical demonstration, please send it on to me.
BTW, there was a big discussion in the 1930's on whether the stdev
divisor should be n or n-1. So you are hardly alone. See my web site
for a discussion of which to use in the comparison of 2 groups (Student
't')
Warner Consulting, Inc.
4444 North Green Bay Road
Racine, WI 53404-1216
Ph: (414) 634-9100
FAX: (414) 681-1133
email: [log in to unmask]
Power to the data!
"Nigel Griggs" <[log in to unmask]>
Here are a couple of messages which went out a while ago via the
teaching-stats mailing list. They might be of use to you, as they were
me, in terms of thinking about the concept of DoF.
I let the class provide a small population, say of size 5. Then I let
them sample randomly from the above population. Fix the sample size to
be three. Then I let them calculate the mean of the population, then the
mean of each of the samples. The average of the sample means is
demonstrated to be equal to the population mean. The population mean is
the number that forces the average of sample means to be fixed. In turn,
the mean of each sample forces one of the members of the sample not to
be free.
Therefore when one population parameter is not known one degree of
freedom is lost. Similarly, if two parameters are unknown two degrees
of freedom are lost, etc. I would like to demonstrate the link between
the number of unknown parameters and the degree of freedom without
having to go through the extensive demonstration in class.
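That classroom demonstration can be automated. A short Python sketch with a made-up population of size 5: enumerating every possible sample of size 3 shows the average of the sample means equalling the population mean.

```python
import itertools
import math
import statistics

pop = [3, 7, 8, 12, 15]    # a small, made-up population of size 5
mu = statistics.mean(pop)  # population mean = 9

# Every possible sample of size 3 (without replacement):
sample_means = [statistics.mean(s)
                for s in itertools.combinations(pop, 3)]

# The average of the sample means equals the population mean.
assert math.isclose(statistics.mean(sample_means), mu)
```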
At 2:35 PM 6/24/97, Dr. Shahdad Naghshpour wrote:
>What is a good way of explaining the concept of degrees of freedom to
>beginning statistics students?
You have three numbers that must equal ten. Two are free to vary, but
once you know the first two, the remaining number is no longer free to
vary. So, you have two degrees of freedom in this situation. E.g., You
know you have 5 and 3 as your two numbers that are free to vary. What
must the remaining number be? 2. The last number is not free to vary
but the first two are.
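In code form, with the same made-up numbers:

```python
total = 10        # the three numbers must sum to 10
x1, x2 = 5, 3     # two numbers are free to vary
x3 = total - x1 - x2  # the third is forced
assert x3 == 2
df = 3 - 1        # three numbers, one constraint -> 2 degrees of freedom
```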
Hope this helps,
"Nick Cox" <[log in to unmask]>
University of Durham
I think you are right that the idea is sometimes not well explained.
The term `degrees of freedom' comes from an analogy with degrees of
freedom in classical mechanics, and refers to the number of ways in
which a body can move. This isn't likely to be of much help to anyone who knows
less mechanics than statistics. I suspect that now few people learning
the idea of df will have previously encountered it in mechanics, even if
they are specialising in mathematics and statistics. The reverse was
possibly true in, say, the early decades of this century.
The number of df is not always n - 1 by any means, but depends on the
number of constraints that must be satisfied.
In general, at least in the simplest situations,
number of degrees of freedom = number of data - number of linear constraints.
In one situation, the requirement that a set of n numbers add up to
a given total imposes one constraint. The numbers then have n - 1
degrees of freedom, because any n - 1 can vary freely but the last is
then fixed by the constraint.
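When there are several linear constraints, what counts is how many of them are independent. A hypothetical sketch of that bookkeeping, assuming NumPy is available (`matrix_rank` counts the independent rows):

```python
import numpy as np

n = 5
# Three made-up linear constraints on 5 numbers; the third row is the
# sum of the first two, so only two constraints are independent.
A = np.array([
    [1,  1, 1, 1, 1],  # x1 + x2 + x3 + x4 + x5 fixed
    [1, -1, 0, 0, 0],  # x1 - x2 fixed
    [2,  0, 1, 1, 1],  # dependent: row 1 + row 2
])
df = n - np.linalg.matrix_rank(A)  # 5 - 2 = 3 degrees of freedom
```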
There is a good expository article by Helen Walker in Journal of
Educational Psychology 31, 253-69 (1940). There is good material in
George W. Cobb, Introduction to design and analysis of experiments.
A fuller answer would have to explain why there are situations in which
the degrees of freedom is not an integer. Here the original analogy is
of little help.
[log in to unmask] (John Whittington)
The concept is all about how many of the items of data in your sample
are 'free to vary' for any particular value of the parameter(s) you are
estimating/testing. Hence, if you have a sample of N, and wish to
estimate the mean, N-1 of them are 'free' to take *any* values they
might like - and the mean could still take literally ANY value,
depending upon the value of the Nth one. In other words, you could give
me values for N-1 items, and I could still make the mean ANYTHING, by
appropriately assigning the value of the last item. The number of DF
is therefore N-1 in that situation.
In the case of an unpaired t-test, say with groups of N1 and N2, we are
estimating/testing the difference between the two means. By the above
logic, N1-1 and N2-1 respectively will be 'free to vary' without
constraining the two means (hence the difference between them), so that
the total DF will be (N1-1) + (N2-1) = (N1 + N2 - 2).
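The pooled variance used by the unpaired t-test divides by exactly those degrees of freedom. A small sketch with two made-up groups:

```python
import statistics

# Two made-up groups for an unpaired t-test.
g1 = [4.1, 5.0, 5.6, 6.2, 4.8]
g2 = [5.9, 6.4, 7.1, 6.8]
n1, n2 = len(g1), len(g2)

df = (n1 - 1) + (n2 - 1)   # = n1 + n2 - 2 = 7

# The pooled variance estimate divides by exactly those df:
pooled = ((n1 - 1) * statistics.variance(g1)
          + (n2 - 1) * statistics.variance(g2)) / df
```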
If you were wishing to estimate *two* parameters (say mean and variance)
simultaneously, then all but TWO of the variables would be 'free to
vary' (take any values they wished). If N-1 were allowed to vary, then
the choice of the last value would allow either the mean or the variance
(but not both) to be made equal to ANY given value. If only N-2 were free to
vary, then both variance and mean could take any given values, according
to the values of the last two items. In this situation, DF would be N-2.
Does that help at all?
"Philippe.NIVLET" <[log in to unmask]>
The number of degrees of freedom is exactly the number of independent
parameters of a system with n parameters. This means that it equals
the number of parameters minus the number of constraints between them.
In chemistry, e.g., the number of degrees of freedom is explicitly
given by Gibbs' phase rule. More generally, you can understand this
concept geometrically:
If you study a system with n parameters x_i, i = 1...n, you can
represent it in an n-dimensional space. Any point of this space
represents a potential state of your system. If your n parameters could
vary independently, then your system would be fully described in an
n-dimensional hyper-volume. Now, imagine you've got one constraint
between the parameters (an equation relating your n parameters); then
your system would be described by an (n-1)-dimensional hyper-surface.
In statistics, your n parameters are your n data. To evaluate the
variance, you first need to infer the mean E(X). So when you evaluate
the variance, you've got one constraint on your system (which is the
expression of the mean), and only (n-1) degrees of freedom remain.
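That is precisely why the sample variance divides by n - 1 while the population variance divides by n. A short Python sketch with made-up data, checked against the standard library:

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample
n = len(data)
m = statistics.mean(data)
ss = sum((x - m) ** 2 for x in data)

# Dividing by n gives the population variance; dividing by n - 1
# gives the sample variance, because one df was spent on the mean.
assert math.isclose(ss / n, statistics.pvariance(data))
assert math.isclose(ss / (n - 1), statistics.variance(data))
```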
I hope this can be helpful to you,
Institut Francais du Petrole
Division Geophysique et Instrumentation
tel : 01-47-52-60-00 (poste 8824)
mail : [log in to unmask]
"Miland A I Joshi" <[log in to unmask]>
[log in to unmask]
University of Manchester, UK
Greetings - I am a Medical, i.e. Applied, Statistician, so my
strength is not theoretical. 'Degrees of freedom' is indeed a
concept that is easy to teach 'slickly' in standard situations, but
its mathematical definition is actually very difficult. However a
simple 'working definition' is 'sample size minus the number of
estimated parameters' - so it is not always n-1. I suspect
that the best way to deal with this problem for you would be to look
at a number of good worked examples, and you will find some in D.
Altman's Practical Statistics for Medical Research (Chapman and
Hall, ISBN 0412276305).
I hope this helps.