Thank you to everyone who gave an opinion as regards my question on ANCOVA and the issue of independence of the covariate and independent variable (see foot of this email for my original queries).
I think the document at https://www.theanalysisfactor.com/assumptions-of-ancova/ sums up the views of those who replied. In particular, the author quotes "Design and Analysis: A Researcher's Handbook" (Keppel & Wickens), which, with reference to an *experimental* setting, says:
"ANCOVA is used to accomplish two important adjustments: (1) to refine estimates of experimental error and (2) to adjust treatment effects for any differences between the treatment groups that existed before the experimental treatments were administered. Because subjects were randomly assigned to the treatment conditions, we would expect to find relatively small differences among the treatments on the covariate and considerably larger differences on the covariate among the subjects within the different treatment conditions. Thus the analysis of covariance is expected to achieve its greatest benefits by reducing the size of the error term; any correction for pre-existing differences produced by random assignment will be small by comparison."
This is similar to what I said in my initial mail, i.e. considering an RCT design (with Y as end-of-trial outcome), by including baseline Y as a covariate in ANCOVA we (i) reduce error variance (making the test on "treatment" more powerful) and (ii) "adjust" our treatment comparisons to a common value of baseline Y. However, after randomisation, we usually find only slight imbalance of baseline values of the covariate in the treatment groups and, hence, the "greatest benefit" of ANCOVA in this situation is to reduce error variance. Thus, in answer to my question, the association between a covariate (baseline Y) and the independent variable (treatment) in an RCT scenario is likely to be weak (if randomisation has been done properly).
In addition the author quotes Keppel and Wickens as saying "The main criterion for a covariate is a substantial linear correlation with the dependent variable, Y. In most cases, the scores on the covariate are obtained before the initiation of the experimental treatment.... Occasionally the scores are gathered after the experiment is completed. Such a procedure is defensible only when it is certain that the experimental treatment did not influence the covariate....The analysis of covariance is predicated on the assumption that the covariate is independent of the experimental treatments."
As regards my question about the application of ANCOVA in a non-RCT situation, I gave an example where we have "achievement score" as the dependent variable, "educational establishment (primary school, secondary school)" as the independent variable and "age" as the covariate. The author of the link (in relation to their own similar example) says "If however, as in our example, the main categorical independent variable is observed and not (experimentally) manipulated, the independence assumption between the covariate and the independent variable is irrelevant."
If anyone has any further views on these issues, I'd be only too happy to receive them.
Kindest Regards,
Kim
Dr Kim Pearce PhD, CStat, Fellow HEA
Senior Statistician
Faculty of Medical Sciences Graduate School
Room 3.14
3rd Floor
Ridley Building 1
Newcastle University
Queen Victoria Road
Newcastle Upon Tyne
NE1 7RU
Tel: (0044) (0)191 208 8142
-----Original Message-----
From: A UK-based worldwide e-mail broadcast system mailing list [mailto:[log in to unmask]] On Behalf Of Kim Pearce
Sent: 02 August 2019 14:32
To: [log in to unmask]
Subject: ANCOVA and its correct application: your views
Hello everyone,
If I may, I would like to discuss the application of ANCOVA.
Say we are considering ANCOVA in an experimental setting, i.e. in an RCT scenario. We randomly allocate n1 patients to group A and n2 patients to group B and take a measure (Y) at baseline. We then administer the treatment and placebo to groups A and B respectively and measure Y again six months later.
We wish to evaluate the effect of treatment on end of trial outcome.
An ANCOVA model is essentially a regression model and, in our simple case above, we have "6th month Y value"(i.e. end-of-trial outcome) as the dependent variable, "treatment type" as the independent variable and "baseline Y" as the covariate.
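As a minimal sketch of the regression form described above (the symbols are assumptions introduced here for illustration, not notation from the original message), the model for patient i could be written as:

```latex
Y_{6,i} = \beta_0 + \beta_1 T_i + \beta_2 Y_{0,i} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)
```

Here Y_{6,i} is the 6th month Y value, Y_{0,i} is baseline Y, and T_i is 1 for group A (treatment) and 0 for group B (placebo). Under this sketch, \beta_1 is the treatment effect at a common value of baseline Y and \beta_2 is the slope on baseline Y, assumed to be the same in both groups.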
We can think of ANCOVA in two ways:
(i) As baseline Y is likely to be associated with end-of-trial outcome, we can view it as acting like a "blocking factor". Thus, after conducting an ANCOVA (i.e. including baseline Y as a covariate), error variance is reduced and the test on "treatment" is more powerful (smaller P-value and narrower confidence interval).
(ii) Randomisation is expected to balance treatment groups as regards baseline Y but, in practice, it is not unusual to observe imbalances after randomisation. Thus we can also view ANCOVA as "adjusting" our treatment comparisons to a common value of baseline Y i.e. holding baseline Y constant across our two treatment groups and comparing the two "adjusted" means.
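The two views above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the sample size, baseline distribution, slope of 0.8, true treatment effect of 2 and noise level are all hypothetical, and the function names are invented for illustration. It uses the textbook ANCOVA formulas for two groups with a common (pooled within-group) slope.

```python
import math
import random
from statistics import mean


def simulate_trial(n_per_arm=200, effect=2.0, seed=1):
    """Simulate baseline and 6-month Y for two randomised arms (hypothetical numbers)."""
    rng = random.Random(seed)
    arms = {}
    for arm, shift in (("A", effect), ("B", 0.0)):
        baseline = [rng.gauss(50, 10) for _ in range(n_per_arm)]
        # End-of-trial Y depends on baseline Y, plus the treatment shift and noise.
        outcome = [10 + 0.8 * b + shift + rng.gauss(0, 5) for b in baseline]
        arms[arm] = (baseline, outcome)
    return arms


def unadjusted(arms):
    """Difference in mean outcome (A minus B) with its pooled standard error."""
    ya, yb = arms["A"][1], arms["B"][1]
    ma, mb = mean(ya), mean(yb)
    ss = sum((y - ma) ** 2 for y in ya) + sum((y - mb) ** 2 for y in yb)
    mse = ss / (len(ya) + len(yb) - 2)
    return ma - mb, math.sqrt(mse * (1 / len(ya) + 1 / len(yb)))


def ancova(arms):
    """Baseline-adjusted difference (A minus B) using the pooled within-group slope."""
    sxx = sxy = syy = 0.0
    means = {}
    for arm, (x, y) in arms.items():
        mx, my = mean(x), mean(y)
        means[arm] = (mx, my, len(x))
        sxx += sum((xi - mx) ** 2 for xi in x)
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        syy += sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    (mxa, mya, na), (mxb, myb, nb) = means["A"], means["B"]
    # View (i): the residual variance shrinks once baseline Y is in the model.
    mse = (syy - slope * sxy) / (na + nb - 3)
    # View (ii): the raw difference is corrected for any baseline imbalance.
    diff = (mya - myb) - slope * (mxa - mxb)
    se = math.sqrt(mse * (1 / na + 1 / nb + (mxa - mxb) ** 2 / sxx))
    return diff, se


trial = simulate_trial()
raw_diff, raw_se = unadjusted(trial)
adj_diff, adj_se = ancova(trial)
print(f"unadjusted: {raw_diff:.2f} (SE {raw_se:.2f})")
print(f"ANCOVA:     {adj_diff:.2f} (SE {adj_se:.2f})")
```

With a strongly prognostic baseline, as assumed here, the adjusted standard error should be clearly smaller than the unadjusted one, while the correction for baseline imbalance is typically small because the arms were randomised.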
Now it is often said that the covariate should be independent of the treatment effect. To use an example I have seen previously, say QoL is our dependent variable, "treatment type" is the independent variable and "anxiety level" is the covariate. We will take it that anxiety is associated with QoL. If anxiety is measured after administering treatment and the treatment affects anxiety levels, then the adjustment for anxiety may hide or exaggerate the treatment effect, making the treatment effect difficult to interpret. My question is: doesn't this problem also occur in situation (ii) above when we have imbalance (albeit slight imbalance) as regards baseline Y in our two treatment groups, i.e. where there is dependence between the covariate and the treatment? I guess the reason why (ii) is acceptable is that *prior to randomisation* we are saying that there is an *assumption* of independence of the covariate (baseline Y) and treatment (but, of course, after randomisation we usually find that baseline values of the covariate are not identically distributed in all treatment groups)?
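The QoL and anxiety scenario can be sketched with a short simulation. All numbers are hypothetical assumptions chosen for illustration: treatment lowers post-treatment anxiety by 5 points, each point of anxiety lowers QoL by 1, and treatment has an additional direct effect of +1 on QoL, so the total treatment effect is 6. The function names are likewise invented.

```python
import random
from statistics import mean


def simulate(n_per_arm=500, seed=7):
    """Simulate post-treatment anxiety and QoL for two arms (hypothetical numbers)."""
    rng = random.Random(seed)
    data = {}
    for arm, treated in (("treatment", 1), ("placebo", 0)):
        # Anxiety is measured AFTER treatment, and treatment lowers it by 5 points.
        anxiety = [30 - 5 * treated + rng.gauss(0, 4) for _ in range(n_per_arm)]
        # QoL falls with anxiety; treatment also has a direct effect of +1.
        qol = [80 - 1.0 * a + 1.0 * treated + rng.gauss(0, 3) for a in anxiety]
        data[arm] = (anxiety, qol)
    return data


def adjusted_difference(data):
    """Treatment minus placebo QoL, adjusted for anxiety via the pooled within-group slope."""
    sxx = sxy = 0.0
    for x, y in data.values():
        mx, my = mean(x), mean(y)
        sxx += sum((xi - mx) ** 2 for xi in x)
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    (xt, yt), (xp, yp) = data["treatment"], data["placebo"]
    return (mean(yt) - mean(yp)) - slope * (mean(xt) - mean(xp))


data = simulate()
total_effect = mean(data["treatment"][1]) - mean(data["placebo"][1])
adjusted_effect = adjusted_difference(data)
print(f"unadjusted difference in QoL:       {total_effect:.2f}")
print(f"difference adjusted for anxiety:    {adjusted_effect:.2f}")
```

In this sketch the unadjusted difference estimates the total effect (about 6), whereas adjusting for post-treatment anxiety removes the part of the effect that operates through anxiety and leaves only the direct part (about 1). A slight chance imbalance in a baseline covariate differs in kind: baseline Y is measured before treatment, so treatment cannot have caused it.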
Say we consider another hypothetical example (non-RCT this time) where we have "achievement score" as the dependent variable, "educational establishment (primary school, secondary school)" as the independent variable and "age" as the covariate. Here, the assumption of "independence of the covariate and treatment effect (independent variable)" is clearly violated, yet I have still seen many examples which use ANCOVA in this sort of scenario, where (in this case) they would use ANCOVA to compare the two educational establishments as regards (average) achievement score at a common value of age.
I would greatly appreciate your views on these issues.
Kindest Regards,
Kim
You may leave the list at any time by sending the command
SIGNOFF allstat
to [log in to unmask], leaving the subject line blank.