Dear all,
I have a question about testing model coefficients in a multiple
regression. According to Montgomery and Peck (1992, p. 138) there are two
methods of assessing significance, and I would like to know the difference
between them. Can anyone shed some light? I have provided an example below.
1) M & P first use the p-values associated with the T statistics. Here we
are testing the significance of an individual regression coefficient B_j,
i.e. H_0: B_j = 0 vs H_1: B_j ne 0.
M & P say that "this is a partial or marginal test because the regression
coefficient B_j depends on all the other regressor variables in the model.
Thus this is a test of the contribution of x_j given the other regressors in
the model." So in my example below we could say that the coefficient of my
variable 'radio' is non-significant (as is that of 'magazine'), and we could
delete 'radio' from the model as it has the largest non-significant p-value.
We could then refit the model using 'TV' and 'magazine' and assess their
significance again. This procedure is like 'backward elimination'.
2) M & P also go on to talk about the 'extra sums of squares' method, which
is "determining the contribution to the regression sums of squares of x_j
given the other regressors are included in the model" (this seems similar to
what they say in 1) above?). They test the contribution of an additional
variable using:
   F = SeqSS for that variable / MSE of the full regression model
In the example below, for the 'radio' variable, we would thus have
28.92/19.52 = 1.48, tested against the F distribution with 1 and 6 degrees
of freedom. This is actually equivalent to the T statistic for 'radio'
(T^2 = F; here 1.22^2 = 1.49, equal up to rounding), and it assesses the
significance of 'radio' given that 'TV' and 'magazine' are in the model.
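The arithmetic in 2) can be checked with a short Python sketch. It uses only the figures quoted in this post (Seq SS = 28.92, MSE = 19.52, T = 1.22). The helper `t_two_sided_p` is a hypothetical name introduced here; it relies on the closed-form Student's t tail that holds for even degrees of freedom, as a stand-in for a statistics library call, and on the fact that an F statistic with 1 and v degrees of freedom has the same p-value as a two-sided t test with v degrees of freedom.

```python
import math

def t_two_sided_p(t, df):
    """Two-sided p-value for Student's t (closed form, even df only).

    Hypothetical helper for illustration; a statistics library would
    normally be used instead.
    """
    if df < 2 or df % 2 != 0:
        raise ValueError("this closed form needs an even df >= 2")
    theta = math.atan(abs(t) / math.sqrt(df))
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= (2 * k - 1) / (2 * k) * math.cos(theta) ** 2
        total += term
    # sin(theta) * total is P(|T| <= t); the p-value is its complement.
    return 1.0 - math.sin(theta) * total

seq_ss_radio = 28.92   # Seq SS for 'radio' (entered last)
mse_full = 19.52       # MSE of the full model, 6 residual df
t_radio = 1.22         # T statistic for 'radio'

f_radio = seq_ss_radio / mse_full
p_from_f = t_two_sided_p(math.sqrt(f_radio), 6)  # F(1, 6) tail via t(6)

print(f"F = {f_radio:.2f}, T^2 = {t_radio ** 2:.2f}, p = {p_from_f:.3f}")
```

The small gap between F and T^2 comes from the rounding of the printed output, and the p-value agrees with the 0.269 reported for 'radio'.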
My questions are:
A) For method 2, could we also use the SeqSS from the *same output*
corresponding to 'TV' and the sequential SS from the *same output*
corresponding to 'magazine' to assess the significance of these two
variables...so we would have:
H_0: B_TV = 0 vs H_1: B_TV ne 0 (given magazine and radio already in the
model)
991.57/19.52 = 50.8 tested against F_1,6
and
H_0: B_magazine = 0 vs H_1: B_magazine ne 0 (given TV and radio already in
the model)
174.04/19.52 = 8.92 tested against F_1,6
If we can do the above, then both 'TV' and 'magazine' are significant
(whereas 'magazine' is non-significant using the T statistics; why is
this?). Also, if we can do this, which is better to use: the T statistics
or the SeqSS procedure?
If we construct a model so that the 3 variables enter in a different order,
the T statistics (and associated p-values) do not change, but the Seq SS
will, of course, change. So if we can use this Seq SS method to evaluate the
significance of all 3 coefficients, wouldn't the significance of the 3
coefficients change depending on the order in which they entered the model?
or
B) Do we only use method 2 to assess the significance of the *final*
variable in the model (in my case 'radio')? In which case the SeqSS/MSE
gives a result which is equivalent to the T statistic?
(M & P only provide an example which deals with the scenario in B above.)
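To make the order question concrete, here is a minimal Python sketch of how sequential SS behave when the entry order changes. The raw data behind my output are not reproduced in this post, so the sketch uses a small SYNTHETIC data set; the names `x1`, `x2`, `x3`, `solve`, `rss` and `seq_ss` are all hypothetical, chosen only for illustration. Each sequential SS is computed as the drop in residual SS when a variable is added to the variables already entered.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def rss(predictors, y):
    """Residual SS from least squares of y on an intercept plus predictors."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)  # normal equations
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def seq_ss(order, data, y):
    """Sequential SS of each variable for the given entry order."""
    out, entered, prev = {}, [], rss([], y)
    for name in order:
        entered.append(data[name])
        cur = rss(entered, y)
        out[name] = prev - cur  # reduction in residual SS
        prev = cur
    return out

# SYNTHETIC data (assumption): three correlated predictors, one response.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7, 10, 9],
    "x3": [1, 3, 2, 5, 4, 4, 6, 8, 7, 9],
}
y = [12, 11, 22, 25, 33, 31, 45, 47, 55, 58]

first = seq_ss(["x1", "x2", "x3"], data, y)
second = seq_ss(["x3", "x2", "x1"], data, y)
for name in data:
    print(name, round(first[name], 2), round(second[name], 2))
```

In both orders the sequential SS add up to the same regression SS, but the amount credited to each variable differs. Only the variable entered last has a sequential SS equal to its extra SS given all the other regressors.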
I have always only used T-statistics as a quick way of evaluating
significance of coefficients; the SeqSS technique is puzzling me!
Many thanks for your help,
Kim.
The regression equation is
y = 266 + 6.73 TV + 3.26 magazines + 4.51 radio
Predictor     Coef    StDev      T      P
Constant    266.23    16.34  16.29  0.000
TV           6.727    1.344   5.01  0.002
magazine     3.257    1.642   1.98  0.095
radio        4.507    3.703   1.22  0.269
S = 4.418 R-Sq = 91.1% R-Sq(adj) = 86.6%
Analysis of Variance
Source          DF       SS      MS      F      P
Regression       3  1194.53  398.18  20.40  0.002
Residual Error   6   117.11   19.52
Total            9  1311.64
Source    DF  Seq SS
TV         1  991.57
magazine   1  174.04
radio      1   28.92
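As a cross-check on the output above, the following Python sketch recomputes the derived quantities from the printed sums of squares and sets SeqSS/MSE next to T^2 for each variable. All figures are copied from the output; the variable names are hypothetical, and the sample size of 10 is inferred from the Total DF of 9.

```python
import math

# Figures copied from the regression output in this post.
n_obs, n_pred = 10, 3  # n_obs inferred from Total DF = 9
ss_reg, ss_res, ss_tot = 1194.53, 117.11, 1311.64
seq_ss = {"TV": 991.57, "magazine": 174.04, "radio": 28.92}
t_stat = {"TV": 5.01, "magazine": 1.98, "radio": 1.22}

df_res = n_obs - n_pred - 1
mse = ss_res / df_res                  # residual mean square
overall_f = (ss_reg / n_pred) / mse    # overall regression F
s = math.sqrt(mse)                     # residual standard deviation
r_sq = ss_reg / ss_tot
r_sq_adj = 1 - mse / (ss_tot / (n_obs - 1))

print(f"MSE = {mse:.2f}, S = {s:.3f}, overall F = {overall_f:.2f}")
print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"Sum of Seq SS = {sum(seq_ss.values()):.2f} (Regression SS = {ss_reg})")

# Compare the sequential ratio with the squared T statistic per variable.
for name in seq_ss:
    ratio = seq_ss[name] / mse
    print(f"{name:9s} SeqSS/MSE = {ratio:6.2f}   T^2 = {t_stat[name] ** 2:6.2f}")
```

The sequential SS sum to the Regression SS, and SeqSS/MSE agrees with T^2 (up to rounding) only for 'radio', the variable entered last.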