1) To fit a polynomial model, you need _at least_ as many points as
you have coefficients in the model, including the constant. With
exactly that many points you get no estimate of statistical error,
only a fitted curve that passes through every point.
2) How many more points are needed to make "an estimate of
uncertainty"? Minimum: 1. But the uncertainty is then pretty big:
there is not a lot of information in the fit.
3) OK, how many to make a reasonable estimate? How close is
'reasonable'? Alternatively, what are you going to do with the fit?
If you want to predict an outcome from new inputs, you will want more
points than if you simply want to express the 'best fit' as an
analytical result. Clearly, if the measurement uncertainty is small,
your linear fit may be happy with 10 well-selected points. If you
really don't or can't control the conditions as well as you would
like, then I recommend you increase the number of points. Or soften
your conclusions appropriately.
4) Oh, and for a linear fit (at least), where those other points are
located depends on what you are looking for and what you assume. If
you _believe strongly_ that the fit is linear, then put half at one
end and half at the other for best results (the most precise slope).
If you are not _positive_ about linearity, spread them out and try to
test for linearity (but this requires some duplicate x's).
5) Clearly, I'm getting off the original 'simple' question of how
many measurements make a 'good' fit. But a lot depends on what
you are doing with, and what you want from, your data. Sorry 'bout
that.
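Point 1 above can be checked numerically. The sketch below uses Python with NumPy's `polyfit` and `polyval` (my choice of tool, not something from the original discussion) to fit a quadratic, which has three coefficients including the constant, to exactly three made-up points. The residuals come out at essentially zero and the residual degrees of freedom (points minus coefficients) are zero, so nothing is left over from which to estimate error.

```python
import numpy as np


def polyfit_with_df(x, y, degree):
    """Least-squares polynomial fit; returns coefficients, residuals, residual df."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)      # degree + 1 coefficients, constant included
    residuals = y - np.polyval(coeffs, x)  # observed minus fitted
    df = len(x) - (degree + 1)             # points minus coefficients
    return coeffs, residuals, df


# Illustrative (invented) data: three points, quadratic model (three coefficients).
coeffs, residuals, df = polyfit_with_df([0.0, 1.0, 2.0], [1.0, 3.0, 2.0], degree=2)
print("residual df:", df)                          # 0: no error estimate possible
print("max |residual|:", np.abs(residuals).max())  # ~0: the curve hits every point
```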
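To see how rough the uncertainty from point 2 is, here is a small Python sketch of the usual least-squares 95% confidence interval for the slope of a straight line. With n points the residual degrees of freedom are n - 2, and the interval uses a Student-t multiplier; the `T975` table holds standard two-sided 95% t values for 1 to 8 degrees of freedom. With three points (one spare) the multiplier is about 12.7, versus about 2.3 with ten points, so the interval is several times wider before the standard error itself is even considered. The function name and example data are hypothetical, chosen only for illustration.

```python
import math

# Two-sided 95% Student-t critical values t(0.975, df), from standard tables.
T975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306}


def slope_ci_95(x, y):
    """Ordinary least-squares slope and 95% CI half-width for a straight line."""
    n = len(x)
    df = n - 2                       # two coefficients: intercept and slope
    if df < 1:
        raise ValueError("need at least 3 points for an error estimate")
    if df not in T975:
        raise ValueError("the t table here only covers 3 to 10 points")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / df)          # residual standard error
    se_slope = s / math.sqrt(sxx)    # standard error of the slope
    return slope, T975[df] * se_slope, df


# Invented data: three nearly collinear points, so only one residual df.
slope, half_width, df = slope_ci_95([0.0, 1.0, 2.0], [0.1, 1.1, 1.9])
print(f"slope = {slope:.2f} +/- {half_width:.2f} (df = {df})")
```

Even though these three points lie almost on a line, the 95% interval for the slope is roughly plus or minus 80% of the slope itself.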
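The difference in point 3 between predicting a new outcome and stating the 'best fit' can be made concrete with the standard straight-line formulas. In units of the residual standard error s, the standard error of the fitted mean at x0 is sqrt(1/n + (x0 - xbar)^2 / Sxx), while that of a single new observation is sqrt(1 + 1/n + (x0 - xbar)^2 / Sxx). The Python sketch below assumes n equally spaced x values on [0, 1], purely for illustration, and shows the first factor shrinking toward zero as n grows while the second never drops below 1.

```python
import math


def se_factors(n, x0):
    """Standard-error factors (in units of s) for a straight-line fit at x0.

    Assumes n equally spaced design points on [0, 1], for illustration only.
    Returns (factor for the fitted mean, factor for one new observation).
    """
    x = [i / (n - 1) for i in range(n)]
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    leverage = 1 / n + (x0 - xbar) ** 2 / sxx
    return math.sqrt(leverage), math.sqrt(1 + leverage)


for n in (3, 5, 10, 100):
    mean_f, pred_f = se_factors(n, x0=0.5)
    print(f"n={n:3d}  fitted mean: {mean_f:.3f}  new observation: {pred_f:.3f}")
```

More points sharpen the fitted line, but they cannot remove the scatter of the next individual measurement, which is why prediction needs both more data and wider intervals.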
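Point 4 can be illustrated with two standard results, sketched here in Python. First, the variance of the least-squares slope is sigma^2 / Sxx, so the design that maximizes Sxx (half the points at each end) gives the most precise slope: for 10 points on [0, 1], Sxx is 2.5 for the two-ends design versus about 1.02 for even spacing. Second, the classic lack-of-fit F test splits the residual sum of squares into pure error (scatter among duplicate x's) and lack of fit. It needs duplicates and at least three distinct x values, which is why the two-ends design cannot test linearity at all. The helper names and the curved example data are hypothetical.

```python
from collections import defaultdict


def sxx(x):
    """Sum of squared deviations of x; slope variance is sigma^2 / sxx."""
    xbar = sum(x) / len(x)
    return sum((xi - xbar) ** 2 for xi in x)


def lack_of_fit_f(x, y):
    """Lack-of-fit F statistic for a straight line, using duplicate x values.

    Returns (F, df_lack_of_fit, df_pure_error), or None when the design
    cannot support the test (fewer than 3 distinct x's, or no duplicates).
    """
    n = len(x)
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    m = len(groups)                   # number of distinct x values
    df_lof, df_pe = m - 2, n - m
    if df_lof < 1 or df_pe < 1:
        return None
    # Ordinary least-squares straight line.
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx(x)
    intercept = ybar - slope * xbar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Pure error: scatter of duplicates around their own group mean.
    ss_pe = sum(sum((yi - sum(ys) / len(ys)) ** 2 for yi in ys)
                for ys in groups.values())
    ss_lof = sse - ss_pe              # what the straight line fails to explain
    return (ss_lof / df_lof) / (ss_pe / df_pe), df_lof, df_pe


ends = [0.0] * 5 + [1.0] * 5
spread = [i / 9 for i in range(10)]
print("Sxx, two ends:", sxx(ends), " even spacing:", round(sxx(spread), 3))

# Invented, clearly curved data with duplicate x's at three levels.
x = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
y = [0.0, 0.2, 1.0, 1.2, 0.1, 0.3]
print("lack-of-fit (F, df1, df2):", lack_of_fit_f(x, y))
print("two-ends design:", lack_of_fit_f(ends, [0.0] * 5 + [1.0] * 5))
```

The trade-off is the one described in point 4: the two-ends design buys slope precision at the cost of any ability to check the linearity assumption.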
Cheers,
Jay
On Oct 12, 2007, at 9:26 AM, Mike Lonergan wrote:
> Apologies for my ignorance, but:
>
> How many points are really needed to fit a regression line?
>
> I know that a line can be put through any 2 distinct points, and that 3
> points will usually give an estimate of uncertainty. I believe 7 monotonic
> points is the minimum for a significant non-parametric correlation. I also
> have lodged in my head the notion that 5 points is about the sensible
> minimum for simple regression, though I have no proper justification for
> this number.
>
> The background to this is that I have fitted GLMs to very short time series
> (<10 points) of counts of animal populations. I know that this is not ideal,
> but it was what we had. The data seemed overdispersed, so I used negative
> binomial errors and auxiliary data from areas with more information to
> estimate theta for the areas with fewest observations. I did not worry about
> contamination that might require robust regression, though I guess I could
> have done some sort of bootstrap. I believe all this was reasonable, and the
> referees found it acceptable.
>
> I am now wondering about the general case. Is there any definitive minimum?
> Have I missed a standard reference on dealing with tiny datasets? Would you
> accept a simple regression based on three datapoints?
>
> Thanks,
>
> Mike Lonergan.
>
Jay Warner on the road,
Working out of Racine, WI, USA
[that's letter 'a' - number '2' - letter 'q']
http://www.a2q.com
the A2Q Method (tm) -- What do you want to improve today?