Dear Torben and Alexandre,
> I found an error in my previous mail, sorry. So once more:
>
> I have been looking at how similar columns in a multiple regression
> model need to be, in order not to be estimable.
>
> The design matrix I have looked at is: X=[c1 c2 c3 mu]
>
> c1 and c2 were regressors with std=1 and mean=0, c3=c2+e*r (r is a
> random vector with mean=0 and std=1), and I then
> looked at the degrees of freedom as a function of "e". When "e" gets
> below ~ 10^-13 the columns c2 and c3 become inestimable, and the
> degrees of freedom increase by one.
> That is a bit late, I think, especially because multiple regression could
> be used for designs having low degrees of freedom. Wouldn't one expect
> that the degrees of freedom available for estimating the contrast
> belonging to c1 should be the same no matter if e=0 or e=10^-13 ?
>
My understanding is that you would like to see some "soft transition" from
df=n-4 to df=n-3 as e goes toward zero, right?
I think the crucial point is that the space that is spanned by the design
matrix will be given by [c1 c2 r mu] for any design of the form [c1 c2 c2+e*r
mu] for as long as e is above the floating point tolerance of Matlab. It
really doesn't matter how "large" (or small) a regressor is; it will still
contribute one dimension to the design space. Not until the regressor
"vanishes completely" will the dimensionality of the design space decrease.
However, when the regressor (or its variance) gets very small, the
corresponding parameter estimates will get very large.
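
As a rough illustration of the point about dimensionality, here is a small
sketch in plain Python (not Matlab, and not what SPM does internally).
Everything in it is an assumption made for the example: the four
deterministic 8-point vectors stand in for c1, c2, r and mu,
`numerical_rank` is a hypothetical helper based on Gram-Schmidt, and `TOL`
is an arbitrary threshold, not the tolerance that Matlab or SPM actually uses.

```python
# Hypothetical stand-ins for the regressors: mutually orthogonal,
# mean 0 and (population) std 1, plus a constant column mu.
c1 = [1, -1, 1, -1, 1, -1, 1, -1]
c2 = [1, 1, -1, -1, 1, 1, -1, -1]
r = [1, 1, 1, 1, -1, -1, -1, -1]
mu = [1] * 8
n = len(mu)

TOL = 1e-14  # arbitrary illustrative threshold (an assumption)


def numerical_rank(columns, tol):
    """Count columns whose residual norm stays above tol after
    modified Gram-Schmidt against the columns already accepted."""
    basis = []
    for col in columns:
        v = [float(x) for x in col]
        for q in basis:
            coef = sum(a * b for a, b in zip(v, q))
            v = [a - coef * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > tol:
            basis.append([a / norm for a in v])
    return len(basis)


def degrees_of_freedom(e):
    """Residual df = n - rank(X) for X = [c1 c2 c2+e*r mu]."""
    c3 = [a + e * b for a, b in zip(c2, r)]
    return n - numerical_rank([c1, c2, c3, mu], TOL)


for e in (1e-3, 1e-10, 1e-16, 0.0):
    print(f"e = {e:g}: df = {degrees_of_freedom(e)}")
```

With these stand-ins the degrees of freedom stay at n-4 for as long as e*r
survives the threshold, and jump to n-3 only once it is lost in rounding.
There is no intermediate value.
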
If you look at the error with which the corresponding parameters are
estimated, you will see that as c2 and c3 get more and more collinear the
corresponding errors will increase (as seen in the second and third elements
on the diagonal of inv(X'*X)). When e (in your example) is very small, the
covariance matrix of your parameter estimates will look something like
[sn sn sn sn; sn BN -BN sn; sn -BN BN sn; sn sn sn sn]
where sn denotes small number and BN denotes BIG NUMBER, i.e. the error in
the estimates of the parameters corresponding to c2 and c3 will be very
large, and negatively correlated. So, although they are "in principle"
estimable, they will have very little meaning separately.
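
The sn/BN pattern can be checked numerically with a short sketch, again in
plain Python rather than Matlab. The vectors are hypothetical, deterministic
stand-ins for c1, c2, r and mu (mutually orthogonal, n=8), `inverse` is a
small Gauss-Jordan helper written for this example, and the result is
inv(X'*X), i.e. the parameter covariance up to the noise variance. Under
these assumptions sn works out as 1/n and BN as roughly 1/(n*e^2).

```python
def gram(X):
    """Return X'X for X given as a list of rows."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)]
            for i in range(p)]


def inverse(A):
    """Invert a small square matrix by Gauss-Jordan elimination
    with partial pivoting (illustrative helper, not optimised)."""
    n = len(A)
    M = [[float(x) for x in A[i]] + [1.0 if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [v / d for v in M[col]]
        for k in range(n):
            if k != col:
                f = M[k][col]
                M[k] = [a - f * b for a, b in zip(M[k], M[col])]
    return [row[n:] for row in M]


# Hypothetical stand-ins for the regressors (assumptions for this sketch).
c1 = [1, -1, 1, -1, 1, -1, 1, -1]
c2 = [1, 1, -1, -1, 1, 1, -1, -1]
r = [1, 1, 1, 1, -1, -1, -1, -1]
mu = [1] * 8


def param_cov(e):
    """inv(X'X) for X = [c1 c2 c2+e*r mu]."""
    c3 = [a + e * b for a, b in zip(c2, r)]
    X = [list(row) for row in zip(c1, c2, c3, mu)]
    return inverse(gram(X))


for e in (1e-1, 1e-2, 1e-3):
    C = param_cov(e)
    corr = C[1][2] / (C[1][1] * C[2][2]) ** 0.5
    print(f"e = {e:g}: var(c1) = {C[0][0]:.3g}, var(c2) = {C[1][1]:.3g}, "
          f"var(c3) = {C[2][2]:.3g}, corr(c2, c3) = {corr:.6f}")
```

As e shrinks, the variances for c2 and c3 grow like 1/e^2 and their
correlation approaches -1, while the entries for c1 and mu stay put.
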
I guess you are right though, it would be nice to have some sort of "warning"
not only when columns are inestimable, but also when the error of the
estimate of the corresponding parameter is so large as to render it "silly"
or "meaningless". Still, that would also have to involve some "secret number"
as a threshold for when the warning is to be issued. I guess common sense
will still be the best guard against silly designs.
Good luck,
Jesper