Hi everyone,
Many thanks to those who replied to my email last week. I have listed the replies I received below, after my original message.
Many thanks again,
Kim
Dr Kim Pearce PhD, CStat, Fellow HEA
Senior Statistician
Haematological Sciences
Room MG261
Institute of Cellular Medicine
William Leech Building
Medical School
Newcastle University
Framlington Place
Newcastle upon Tyne
NE2 4HH
Tel: (0044) (0)191 208 8142
-----Original Message-----
From: Kim Pearce
Sent: 05 June 2017 14:58
To: [log in to unmask] ([log in to unmask]) <[log in to unmask]>
Subject: ANCOVA vs multiple regression : homogeneity of regression slopes
Hi folks,
I have a quick query and hope that someone can shed some light.
For an ANCOVA with one continuous dependent variable, one continuous covariate and a q-level categorical independent variable (where each level represents a 'group'), we can generate the group means adjusted for the covariate and test the difference between those adjusted group means.
ANCOVA is basically a regression model. Say we have a three-level 'group' variable (B), with level 3 as the reference group, and a covariate C; then we will have:
Y = constant + coeffc*C + coeffb1*B_level1 + coeffb2*B_level2
...and to calculate the adjusted mean for each group, we substitute the average value of C.
As I understand it, we are assuming (above) that the relationship between Y and C is the same for all groups - that is why there is only one coefficient for C. Thus one of the major assumptions of ANCOVA is 'homogeneity of regression slopes', and this is tested via the B*C interaction.
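(For concreteness, here is a minimal numerical sketch of the dummy-coded model and the adjusted means, using simulated data and plain least squares rather than any particular package - the data and true coefficients are entirely hypothetical:)

```python
# Hypothetical sketch: ANCOVA as a regression with dummy coding.
# q = 3 groups, covariate C, group 3 as the reference level.
import numpy as np

rng = np.random.default_rng(0)
n = 300
group = rng.integers(1, 4, n)     # group labels 1..3
C = rng.normal(10, 2, n)          # continuous covariate
# True model: common slope 1.5; group offsets 2, -1, 0 (group 3 = reference)
offsets = np.array([2.0, -1.0, 0.0])
Y = 5 + 1.5 * C + offsets[group - 1] + rng.normal(0, 0.5, n)

# Design matrix: constant, C, and q-1 dummy variables
D1 = (group == 1).astype(float)
D2 = (group == 2).astype(float)
X = np.column_stack([np.ones(n), C, D1, D2])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
const, coeffc, coeffb1, coeffb2 = coef

# Adjusted group means: evaluate each group's fitted line at mean(C)
Cbar = C.mean()
adj = {1: const + coeffc * Cbar + coeffb1,
       2: const + coeffc * Cbar + coeffb2,
       3: const + coeffc * Cbar}
```

Because there is a single coefficient for C, the three fitted lines are parallel by construction; the adjusted means differ only through the dummy-variable coefficients.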
If we considered this as a multiple regression (with a q-level categorical predictor B coded as q-1 binary dummy variables and a continuous predictor C) rather than an ANCOVA, we would not do a test of homogeneity of regression slopes. Why is this so?
Many thanks for your views.
Kind Regards,
Kim
*****************************************
Reply A
Dear Kim
This is a cultural difference between researchers who use ANCOVA and those who use regression. As a rule, those using regression modelling tend to explain variance through multiple variables and steer clear of interactions, while those doing ANCOVA tend to focus very strongly on factorial designs with fewer variables. Thus the explanation that regression modellers would seek is whether there are other variables that should be in the model.
Consider a simple model that predicts height for girls and boys, with age as a covariate. ANCOVA seeks to explain the remaining variance by looking at the interaction, while regression wants to look at additional predictors such as father's height.
SPSS does not help with this difference, as it makes it hard to fit interactions in simple regression and by default tries to fit full factorial models in ANOVA.
******************************************
My Reply To A
So, if I'm understanding you correctly, those using multiple regression modelling would, in the example you used, tend to try to get the 'best fit' to the data (when modelling 'children's height') using:
Children's height = constant + coeff1*gender + coeff2*age + coeff3*father's_height
Whereas the ANCOVA would tend to focus on a model which included the interaction term to try to get the 'best fit' to the data i.e.
Children's height = constant + coeff1*gender + coeff2*age + coeff3*gender*age (1)
Have I correctly grasped your meaning?
Of course, if we consider ANCOVA in a regression modelling sense, and we find that coeff3 in (1) above is not significantly different from zero (i.e. we have homogeneity of slopes), then we eliminate the interaction term; the following model, which excludes the interaction term, is then a better fit to the data than the model which includes it.
Children's height = constant + coeff1*gender + coeff2*age
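(As an aside, the comparison of model (1) against the parallel-slopes model can be sketched numerically with an extra-sum-of-squares F test - again a hypothetical simulation, here with slopes that genuinely differ between the sexes, so the interaction should be detected:)

```python
# Hypothetical sketch: homogeneity-of-slopes test as a comparison of
# the interaction model (1) against the parallel-slopes model,
# via an extra-sum-of-squares F test.
import numpy as np

rng = np.random.default_rng(1)
n = 200
gender = rng.integers(0, 2, n).astype(float)   # 0 = girls, 1 = boys
age = rng.uniform(5, 15, n)
# True model: the age slope DIFFERS by 0.8 cm/year between the sexes
height = 80 + 5 * gender + 6 * age + 0.8 * gender * age + rng.normal(0, 2, n)

def fit_rss(X, y):
    """Least-squares fit; return coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid @ resid

ones = np.ones(n)
X_full = np.column_stack([ones, gender, age, gender * age])  # model (1)
X_red = np.column_stack([ones, gender, age])                 # parallel slopes

beta_full, rss_full = fit_rss(X_full, height)
_, rss_red = fit_rss(X_red, height)

# F statistic for the single interaction coefficient (coeff3)
F = (rss_red - rss_full) / (rss_full / (n - X_full.shape[1]))
```

A large F (relative to F(1, n-4)) would lead us to keep the interaction; a small one would justify dropping it and fitting parallel slopes.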
********************************************
A's Reply:
Yes, that is basically it. Regression modellers tend to use more variables and few interactions, while covariance analysts draw on the ideas of ANOVA and use interactions. Basically, it is a question of where you look for your explanation.
---------------------------------------------------------------------------
Reply B
Hi Kim
Actually, I WOULD always do a test for homogeneity of slopes, on the basis that, if the slopes are not significantly different, we should choose the simpler regression model to avoid overfitting and provide more precision in the remaining effect estimates.
To my mind, the only difference between ANCOVA and a parallel slopes regression model is that the former arises as a generalisation of ANOVA, while the latter arises as a generalisation of multiple regression. Regardless of where you start from (ANOVA or regression) you should always settle on the simplest model that adequately represents the sources of variability in the data.
Hope this helps. I'd be interested in hearing what others have to say.
---------------------------------------------------------------------------
My Reply to B
So, say we had 'children's height' as our dependent variable. Then, if I'm understanding you correctly, treating this as a multiple regression problem (with potential predictors gender and age), we should initially try to find the 'best fit' to the data using:
Children's height = constant + coeff1*gender + coeff2*age + coeff3*gender*age (1)
And, of course, if we found that coeff3 (in (1) above) is not significantly different from zero then we eliminate the interaction term.
This is exactly what we do in ANCOVA, where the elimination of a non-significant interaction term signifies homogeneity of slopes, and the following model (which excludes the interaction term) is a better fit to the data than a model which includes it.
Children's height = constant + coeff1*gender + coeff2*age
Have I correctly grasped your meaning?
------------------------------------------------------------------------
B's Reply:
Yes. Pretty much.
For me, the pragmatic question is always, "how well does my model describe/fit the data?". To a degree, the starting point is driven by whether you are chiefly interested in the grouping factor (ANOVA & hypothesis testing, complicated by the presence of a covariate) or the continuous predictor (regression modelling, complicated by the presence of differences between subgroups). Whether you call it an "ANCOVA model" or a "parallel slopes regression model" is largely a historical convention.
As a rule, I would also want to try fitting a quadratic term to test the assumption of linearity (or the presence of curvature, depending on your perspective), in which case, you might also want to check for interactions between the grouping factor and both the linear and quadratic terms. In your particular example of height vs. age, we know that this relationship is not linear (at least over a longer time interval) so some sort of non-linear growth curve may be a better option. However, the same principle still applies - we can test for differences between the groups by fitting an interaction between each curve parameter and the grouping factor ... provided you have sufficient degrees of freedom. Linearising transformations of either the x or y variables are also an option, keeping an eye on the assumptions of normality & homoscedasticity.
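(The quadratic check described above can be sketched in the same least-squares style - a hypothetical simulation with genuinely decelerating growth, so the curvature term should be needed:)

```python
# Hypothetical sketch: adding a quadratic age term to check the
# linearity assumption, compared against the linear model by an
# extra-sum-of-squares F test.
import numpy as np

rng = np.random.default_rng(2)
n = 250
group = rng.integers(0, 2, n).astype(float)
age = rng.uniform(2, 16, n)
# True relationship is curved: growth decelerates with age
height = 70 + 4 * group + 9 * age - 0.25 * age**2 + rng.normal(0, 2, n)

ones = np.ones(n)
X_lin = np.column_stack([ones, group, age])
X_quad = np.column_stack([ones, group, age, age**2])
# X_quad could be extended with group*age and group*age**2 columns to
# test group differences in each curve parameter, df permitting.

def fit_rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid @ resid

beta_q, rss_q = fit_rss(X_quad, height)
_, rss_lin = fit_rss(X_lin, height)
F_curv = (rss_lin - rss_q) / (rss_q / (n - X_quad.shape[1]))
```

A large F_curv signals curvature, in which case a quadratic (or a proper growth curve) is preferable to the straight-line model.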
Finally, if you have replication at any of the points on the continuous variable scale, you can test for goodness of fit. This is usually only possible in designed experiments, where the points on the scale have been set by the experimenter.
---------------------------------------------------------------------------
You may leave the list at any time by sending the command
SIGNOFF allstat
to [log in to unmask], leaving the subject line blank.