Dear all,
Some final words. Here is some extra feedback which I
thought might be of interest. In Jean's and my defence, I
asked for a rule of thumb and I got a rule of thumb and a
reference, so I am happy. You may rest assured that I am
aware of its limitations as a guideline, and it won't be
appearing in a protocol near you in the future!
Thanks
Paul
******
Actually, Paul, there is a sense in which, if it is true
for regression, it is true for ANCOVA; you just need to
know how to count the number of independent variables.
Regression and ANOVA/ANCOVA are all just slightly
different formulations of the same kind of analysis. They
are all covered under the title General Linear Models (not
Generalized Linear Models, which is a wider class). If you
want to think of ANCOVA as regression, then you have to
imagine setting up dummy variables for your factors.
Thus in this case I would take the degrees of freedom due
to the model, including covariates, and use these in place
of the number of independent variables when calculating the
sample size. This also gives you insight into when it is
safe to move towards the 5 mark rather than the 20: it
depends on the evenness of spread along the independent
variables, which is the equivalent of balance in ANOVA.
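Jean's counting rule can be sketched in code. This is my own minimal illustration, not hers: the factor levels, the single covariate, and the 20-per-variable multiplier are made-up examples of the idea.

```python
# Illustrative sketch (not from the original post): counting model
# degrees of freedom when an ANCOVA is written as a regression with
# dummy variables.  The factors and levels below are invented examples.

def model_df(factor_levels, n_covariates):
    """Model degrees of freedom, excluding the intercept: a k-level
    factor contributes k - 1 dummy variables; each covariate adds 1."""
    return sum(k - 1 for k in factor_levels) + n_covariates

# e.g. a 3-level treatment factor, a 2-level sex factor, one covariate
df_model = model_df([3, 2], 1)   # (3-1) + (2-1) + 1 = 4
n_min = 20 * df_model            # the "20 observations per variable" rule
```

With a 3-level and a 2-level factor plus one covariate this gives 4 model degrees of freedom, so the cautious end of the rule of thumb would ask for 80 observations.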
Again, I stress that all these rules ensure is that the
tests are valid; they give no clue at all about the outcome.
Jean M. Russell
******
Dear Paul,
I thought I might follow up the discussion of sample sizes
in multiple regression, since I didn't personally agree
with the response you quoted from Jean Russell.
I am not personally familiar with the two books quoted by
Jean, but I find it hard to believe that anyone could
scientifically argue the case that the number of
observations needs to be 4 (or 20, or whatever) times
the number of regressors.
My answer would be in two parts:
1. The formal and theoretical answer is that it depends on
the power of the test. Let's suppose that the final outcome
of the regression analysis will be some linear test about
the parameters of the regression model. The appropriate
form of test is an F test, as known from standard theory.
When the null hypothesis is false, it's possible to
calculate the power of the test using the noncentral F
distribution. If one starts with a test of a given type I
error (say, .05) and then demands that for some
specific alternative hypothesis, the power has to be at
least a certain number (0.9, maybe), then this provides a
definite criterion by which to decide whether the sample
size is large enough. The theory of this was given
by Scheffé in his classic 1959 book, "The Analysis of
Variance" - many more recent books also cover the material,
but the basic theory of this has not changed since
Scheffé's time.
The practical difficulties of this are (a) determining a
definite alternative hypothesis at which to evaluate the
power - this is more a subject-matter consideration (e.g.
what improvement in treatment you as a doctor would really
care about) than anything statistical - and (b) applying
the method in practice still requires making assumptions
about some of the parameters (in particular sigma, the
standard deviation of the residuals); typically this would
not be known in advance, so you either have to make a guess
based on past experience or do some preliminary sampling to
obtain an estimate.
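The calculation Richard describes can be sketched with scipy's noncentral F distribution. The numbers below (4 model degrees of freedom, and a noncentrality parameter growing as n times an assumed effect size f2) are illustrative assumptions, not anything from the post.

```python
# Sketch of power for the overall F test via the noncentral F
# distribution.  The effect size f2, and hence the noncentrality
# parameter, is an assumed value for illustration only.
from scipy.stats import f, ncf

def f_test_power(df_model, n, nc, alpha=0.05):
    """Power of the overall F test: df_model numerator degrees of
    freedom, n observations (model fitted with an intercept), and
    noncentrality parameter nc under the alternative."""
    df_error = n - df_model - 1
    f_crit = f.ppf(1 - alpha, df_model, df_error)   # type I error alpha
    return 1 - ncf.cdf(f_crit, df_model, df_error, nc)

# smallest n giving power >= 0.9, assuming nc grows as n * f2,
# where f2 = R^2 / (1 - R^2) under one common parameterisation
f2 = 0.15
for n in range(10, 500):
    if f_test_power(4, n, nc=f2 * n) >= 0.9:
        break
```

The loop makes the point of the "formal" approach concrete: once a type I error, a target power, and an assumed alternative are fixed, the minimum sample size follows mechanically.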
2. A more ad hoc and intuitive criterion is simply to
calculate the estimates and standard errors of whatever
parameters you are interested in - if the standard errors
are too large for you to determine the desired parameters
with the required margin of accuracy, then you need
more data. For example, if the parameter being estimated
were the difference in efficacy of two drugs, and if the
standard error of that parameter were of the same order of
magnitude as the improvement in efficacy you are trying to
demonstrate, then clearly you need more data.
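This ad hoc check is easy to carry out directly. A minimal sketch follows; the simulated data, the 0.3 "true" group effect, and the sample size of 40 are all illustrative assumptions.

```python
# Sketch of the ad hoc criterion: fit the regression and compare the
# standard error of the parameter of interest with the effect you are
# hoping to demonstrate.  The data below are simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 40
x = rng.normal(size=n)                    # a continuous covariate
group = rng.integers(0, 2, size=n)        # dummy variable: drug A vs drug B
y = 1.0 + 0.5 * x + 0.3 * group + rng.normal(size=n)

X = np.column_stack([np.ones(n), x, group])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
sigma2 = (resid ** 2).sum() / (n - X.shape[1])          # residual variance
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))  # standard errors

# If se[2], the standard error of the group effect, is of the same
# order as the improvement we care about (0.3 here), the data cannot
# settle the question and more observations are needed.
```

The decision is then a comparison of se[2] against the clinically meaningful difference, exactly as in the drug-efficacy example above.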
I hope that's of some help, if unfortunately not so
clear-cut as Jean Russell's answer. In accordance with
standard allstat practices, I'm replying to you rather than
the list, but feel free to broadcast this if other people
don't point out the same thing.
Best regards,
Richard Smith
Department of Statistics
University of North Carolina
-----------------------------------
Paul Wicks
Senior Statistician
St. Georges Hospital Medical School