Apologies for my ignorance, but:
How many points are really needed to fit a regression line?
I know that a line can be put through any 2 distinct points, and that 3
points leave one residual degree of freedom and so will usually give an
estimate of uncertainty. I believe 7 monotonic points is the minimum for a
significant non-parametric (rank) correlation. I also
have lodged in my head the notion that 5 points is about the sensible
minimum for simple regression, though I have no proper justification for
this number.
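
To make the counting behind the 2-, 3- and 7-point figures concrete, here
is a minimal Python sketch. The function names are my own, invented for
illustration. It assumes a straight line with an intercept and a slope, no
tied values, and an exact permutation test on ranks; it says nothing about
what any particular package would report.

```python
from math import factorial

def residual_df(n_points, n_params=2):
    """Residual degrees of freedom for a fitted line (intercept + slope)."""
    return n_points - n_params

def min_two_sided_p(n_points):
    """Smallest two-sided exact permutation p-value for a rank correlation
    when n distinct points are perfectly monotonic: only 2 of the n!
    orderings (fully increasing, fully decreasing) are as extreme."""
    return 2 / factorial(n_points)

# Tabulate both quantities for very small samples.
for n in range(2, 8):
    print(n, residual_df(n), round(min_two_sided_p(n), 5))
```

The smallest attainable p-value is 2/n!, so the minimum number of points
depends entirely on the significance level chosen: 2/5! is about 0.017,
while 2/7! is about 0.0004.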
The background to this is that I have fitted GLMs to very short time series
(fewer than 10 points) of counts of animal populations. I know that this is not ideal,
but it was what we had. The data seemed overdispersed, so I used negative
binomial errors and auxiliary data from areas with more information to
estimate theta for the areas with fewest observations. I did not worry about
contamination that might require robust regression, though I guess I could
have done some sort of bootstrap. I believe all this was reasonable, and the
referees found it acceptable.
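
For anyone unfamiliar with theta, here is a rough Python sketch of the
negative binomial variance function, Var(y) = mu + mu^2/theta, together
with a crude method-of-moments estimate of theta. This is an illustration
only, not the likelihood-based fit I actually used: it ignores any trend
in the mean, the function names are my own, and the counts are invented.

```python
from statistics import mean, variance

def nb_variance(mu, theta):
    """Negative binomial variance in the mu/theta parameterisation."""
    return mu + mu**2 / theta

def moment_theta(counts):
    """Crude method-of-moments estimate of theta from raw counts.
    Returns None when the sample shows no overdispersion
    (sample variance <= sample mean)."""
    m, v = mean(counts), variance(counts)
    if v <= m:
        return None
    return m**2 / (v - m)

# Hypothetical counts, invented purely for illustration.
counts = [12, 30, 7, 45, 19, 3, 28]
theta_hat = moment_theta(counts)
print(theta_hat, nb_variance(mean(counts), theta_hat))
```

A small theta means strong overdispersion; as theta grows the variance
approaches the Poisson value mu. With only a handful of counts such an
estimate is very unstable, which is why borrowing theta from better
observed areas seemed attractive.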
I am now wondering about the general case. Is there any definitive minimum?
Have I missed a standard reference on dealing with tiny datasets? Would you
accept a simple regression based on three datapoints?
Thanks,
Mike Lonergan.