Many thanks to all those who replied to my question re. ANCOVA assumptions. I have summarised the replies below.
Thanks again and kindest regards,
This is just a small question about why we use ANCOVA, as the texts I have read seem to differ in their explanations.
Say our dependent variable was 'test score' and our factor was 'gender'. We want to see if test score differs between genders.
(A) Now some (i.e. most!) texts say that if age varies from person to person in our study (regardless of gender), then, because age is likely to be (linearly) related to test score, we use ANCOVA with age as a covariate; that is, we are saying that, in effect, males and females have the same age range in our study.
(B) Another text (Bruning and Kintz, 1987, "Computational Handbook of Statistics") uses an ANCOVA example along the following lines: age is still related to test score, but the sample of males in the study is younger than the sample of females (this could occur, e.g., if one of our groups consisted of primary school males and the other of secondary school females).
I would say that the criteria for ANCOVA are those of (A) above, as texts often specifically say that for ANCOVA "the covariate is not related to treatment [i.e. group/factor]"... but I just thought I'd ask for a second opinion.
Summary of replies
Both (A) and (B) are situations where ANCOVA would be valid.
In an experimental setting you would include a covariate in a model because it is predictive of outcome, *provided that the covariate is not itself affected by the treatment*. Say our dependent variable was "hypertension measured at three months", the factor was "treatment" and "blood pressure" was the covariate: you would not use as the covariate a blood pressure reading taken two weeks after treatment had started, as blood pressure itself could be affected by the treatment.
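To illustrate why a covariate that predicts the outcome is worth including even in a randomized trial, here is a minimal simulation sketch. It assumes a continuous outcome (blood pressure at three months rather than a yes/no "hypertension" label), a baseline blood pressure measured *before* treatment, and made-up effect sizes; `ols` is a hypothetical helper that fits ordinary least squares with NumPy. The point is that adjusting for the baseline value leaves the treatment estimate centred on the same value but shrinks its standard error.

```python
import numpy as np

def ols(X, y):
    """Least squares for y ~ 1 + X; returns coefficients and standard errors."""
    X = np.column_stack([np.ones(len(y)), X])           # prepend intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])      # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

rng = np.random.default_rng(0)
n = 200
treat = rng.permutation(np.repeat([0, 1], n // 2))     # randomized allocation
bp_baseline = rng.normal(150, 15, n)                   # measured BEFORE treatment
true_effect = -8.0                                     # hypothetical treatment effect
bp_3m = 20 + 0.8 * bp_baseline + true_effect * treat + rng.normal(0, 8, n)

b_unadj, se_unadj = ols(treat, bp_3m)                                # outcome ~ treatment
b_adj, se_adj = ols(np.column_stack([treat, bp_baseline]), bp_3m)   # + baseline covariate
print(f"unadjusted effect {b_unadj[1]:.2f} (SE {se_unadj[1]:.2f})")
print(f"adjusted effect   {b_adj[1]:.2f} (SE {se_adj[1]:.2f})")
```

Because allocation is randomized, both estimates target the same effect; the baseline covariate simply absorbs outcome variation that would otherwise sit in the residual.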
In another example, say you have a randomized trial of two treatments for some illness, where the chosen outcome variable is some measure of quality of life, and you also intend to measure anxiety. Anxiety is the covariate, "treatment" is the factor and "quality of life" is the dependent variable. If you measure anxiety *after* the onset of treatment, and the two treatments differentially affect anxiety levels, then by adjusting for anxiety you may 'adjust out' of the outcome variable some of the difference between the two treatments, which biases the estimated treatment effect. If, however, you measure anxiety *before* instituting treatment, the measurement cannot be influenced by the treatments themselves, and including anxiety as a covariate will simply adjust for any chance between-group differences in anxiety.
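The 'adjusting out' problem can be seen in a small simulation. This is a sketch under invented assumptions: one treatment lowers anxiety by 5 points, each point of anxiety costs 2 points of quality of life, and the treatment also has a direct benefit of 3 points, so its total effect on quality of life is 3 + 10 = 13. `coefs` is a hypothetical least-squares helper. Adjusting for pre-treatment anxiety recovers the total effect; adjusting for post-treatment anxiety strips out the part that works through anxiety.

```python
import numpy as np

def coefs(X, y):
    """Least-squares coefficients for y ~ 1 + X (intercept first)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(42)
n = 400
treat = rng.permutation(np.repeat([0, 1], n // 2))       # randomized allocation
anx_pre = rng.normal(20, 5, n)                           # measured before treatment
anx_post = anx_pre - 5.0 * treat + rng.normal(0, 2, n)   # treatment lowers anxiety
direct = 3.0                                             # hypothetical direct benefit
qol = 50 + direct * treat - 2.0 * anx_post + rng.normal(0, 5, n)
total_effect = direct + (-2.0) * (-5.0)                  # 3 direct + 10 via anxiety = 13

est_pre = coefs(np.column_stack([treat, anx_pre]), qol)[1]    # valid adjustment
est_post = coefs(np.column_stack([treat, anx_post]), qol)[1]  # adjusts out part of the effect
print(f"true total effect         {total_effect:.1f}")
print(f"adjusted for pre-anxiety  {est_pre:.2f}")
print(f"adjusted for post-anxiety {est_post:.2f}")
```

With post-treatment anxiety in the model, the estimate falls towards the direct effect alone, understating how much better the treatment leaves patients.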
In my example (B) above, where the groups differ on the covariate (age) and the groups (sexes) are pre-existing rather than formed by randomization, more caution is needed, because here the 'adjusting out' problem may arise.
There are two possible reasons to include a covariate (or, more generally, to include an additional variate in a linear regression analysis, which is the generalisation of ANOVA/ANCOVA):
1. The first is to explain some of the extraneous variation, and so reduce the unexplained residual variation (and hence the standard errors of the effects you are interested in). This can work even when the covariate distribution is *the same* in your two groups.
2. The second is where the distributions of the covariate *differ* between groups, and you need to adjust your comparisons to a common level of the covariate. In this case the covariate is a confounder and you need to interpret your adjusted results carefully. To be a confounder, a variable must be related both to the outcome being studied (test score in my example) and to one or more predictor variables (sex in my example). A potential confounder is a real confounder if including it in the model changes the relationship between the outcome and the predictor to which it is related.
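The change-in-estimate check in point 2 can be sketched by comparing the crude sex difference (test score ~ sex) with the age-adjusted one (test score ~ sex + age) under scenarios (A) and (B). The data are simulated with a hypothetical true sex difference of 5 points at equal age and a slope of 2 points per year of age; `fit`, `crude_vs_adjusted` and `simulate` are illustrative helpers, not part of any library. In (A) the two estimates roughly agree, so age is not acting as a confounder (though it can still reduce standard errors, as in point 1); in (B), where males are younger, adjusting for age changes the estimate substantially.

```python
import numpy as np

def fit(X, y):
    """Least-squares coefficients for y ~ 1 + X (intercept first)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def crude_vs_adjusted(male, age, score):
    crude = fit(male, score)[1]                              # score ~ sex
    adjusted = fit(np.column_stack([male, age]), score)[1]   # score ~ sex + age
    return crude, adjusted

rng = np.random.default_rng(1)
n = 400
male = np.repeat([1, 0], n // 2)       # 1 = male, 0 = female
sex_effect = 5.0                       # hypothetical difference at equal age

def simulate(male_mean_age, female_mean_age):
    age = np.where(male == 1,
                   rng.normal(male_mean_age, 2, n),
                   rng.normal(female_mean_age, 2, n))
    score = 40 + sex_effect * male + 2.0 * age + rng.normal(0, 4, n)
    return age, score

age_a, score_a = simulate(13.5, 13.5)  # (A): same age distribution in both sexes
crude_a, adj_a = crude_vs_adjusted(male, age_a, score_a)
age_b, score_b = simulate(12.0, 15.0)  # (B): males younger than females
crude_b, adj_b = crude_vs_adjusted(male, age_b, score_b)

for label, c, a in [("(A)", crude_a, adj_a), ("(B)", crude_b, adj_b)]:
    print(f"{label} crude {c:6.2f}  age-adjusted {a:6.2f}  change {c - a:6.2f}")
```

In (B) the crude comparison mixes the sex difference with the three-year age gap; the adjusted estimate compares the sexes at a common age, which is only trustworthy if the linear age relationship holds across both groups' age ranges.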