Hello to everyone,
Thank you to those who replied to my question (listed again at the foot of this email).
For those who are interested, interaction and stratification are mathematically equivalent in the context I described, i.e. say we are looking at the binary outcome 1=women over 40 years old with breast cancer, 0=women over 40 years old with no breast cancer. We have two binary independent variables: age 41-49 = 0, age >50 = 1, and the number of times a woman has carried a pregnancy to a viable gestational age (parity): 0=none, 1=one or more.
We will call the age variable A and the parity variable P.
Say the model includes an interaction and is built like this
Alpha + beta1 * A + beta2*P + beta3*A*P
For A=1
Odds of breast cancer when P=1 / Odds of breast cancer when P=0 = exp(Alpha + beta1 + beta2 + beta3) / exp(Alpha + beta1) = exp(beta2 + beta3) (1)
For A=0
Odds of breast cancer when P=1 / Odds of breast cancer when P=0 = exp(Alpha + beta2) / exp(Alpha) = exp(beta2) (2)
Therefore the ratio of the parity odds ratio (P=1 compared to P=0) for A=1 to the parity odds ratio for A=0 is exp(beta2 + beta3) / exp(beta2) = exp(beta3)
Now if we stratify by age we get:
Model 1 for women aged between 40-49 (A=0):
Alpha1 + beta4*P
Model 2 for women aged > 50 years (A=1):
Alpha2 + beta5*P
Then beta4 will be equal to beta2 in (2) and beta5 will be equal to beta2 + beta3 in (1).
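The equivalence can be checked numerically. What follows is only a minimal sketch under stated assumptions: the cell counts are hypothetical (not real data), and the helper names solve and fit_logistic are made up for illustration. It fits the interaction model and the two stratified models by plain Newton-Raphson on grouped data and compares the estimates.

```python
import math

def solve(M, v):
    """Solve M x = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_logistic(rows, n_iter=50):
    """Maximum likelihood fit by Newton-Raphson for grouped binary data.
    rows: list of (covariate_vector, cases, total)."""
    k = len(rows[0][0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, cases, total in rows:
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1 / (1 + math.exp(-eta))
            for i in range(k):
                grad[i] += x[i] * (cases - total * p)
                for j in range(k):
                    hess[i][j] += x[i] * x[j] * total * p * (1 - p)
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical counts: (A, P) -> (cases, total)
counts = {(0, 0): (30, 200), (0, 1): (45, 400),
          (1, 0): (60, 250), (1, 1): (90, 500)}

# Single model: Alpha + beta1*A + beta2*P + beta3*A*P
combined = [([1, a, p, a * p], c, n) for (a, p), (c, n) in counts.items()]
alpha, beta1, beta2, beta3 = fit_logistic(combined)

def stratum(a):
    """Rows for one age stratum, with linear predictor intercept + slope*P."""
    return [([1, p], c, n) for (s, p), (c, n) in counts.items() if s == a]

alpha1, beta4 = fit_logistic(stratum(0))  # Model 1 (A=0)
alpha2, beta5 = fit_logistic(stratum(1))  # Model 2 (A=1)

print(f"beta2         = {beta2:.6f}   beta4 = {beta4:.6f}")
print(f"beta2 + beta3 = {beta2 + beta3:.6f}   beta5 = {beta5:.6f}")
```

The agreement arises because, with two binary covariates and their interaction, the single model is saturated: it reproduces the observed proportion in each of the four cells, exactly as the two separate models do. The same reasoning gives Alpha1 = Alpha and Alpha2 = Alpha + beta1.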
I also received an excellent email from one respondent who provided a thorough discussion of the 'single' versus 'separate' model approach. I have pasted this at the foot of this email for your perusal.
Kind Regards,
Kim
-----Original Message-----
From: Kim Pearce
Sent: 10 January 2018 09:51
To: [log in to unmask] ([log in to unmask]) <[log in to unmask]>
Subject: Significant interaction and stratification : your views
Hi everyone,
Can anyone throw some light on the following?
Say we are considering a binary logistic model. For argument's sake, imagine we are looking at the binary outcome 1=women over 40 years old with breast cancer, 0=women over 40 years old with no breast cancer. We have two binary independent variables: age 41-49 = 0, age >50 = 1, and the number of times a woman has carried a pregnancy to a viable gestational age (parity): 0=none, 1=one or more.
We will call the age variable A and the parity variable P.
Say the model includes an interaction and is built like this
Alpha + beta1 * A + beta2*P + beta3*A*P
If beta3 is statistically significant then this means that the odds of having breast cancer for women who have carried at least one pregnancy to viable gestational age compared to the odds of having breast cancer for women who have not carried at least one pregnancy to viable gestational age is significantly different for those women aged between 40-49 years compared to those aged > 50 years.
Now, my question is, if we do establish that beta3 is statistically significant would this justify a stratification of the data, so that we could (for example) have two binary logistic models (one for women aged between 40-49 and one for women aged > 50 years) where the linear predictors are:
Model 1 for women aged between 40-49:
Alpha1 + beta4*P
Model 2 for women aged > 50 years:
Alpha2 + beta5*P
Hence, in the above, we could determine if there was a statistical difference between the levels of parity at each age level separately.
Many thanks for your opinion on this.
Kindest Regards,
Kim
___________________________
Hi,
You could choose to stratify by age group regardless of the result of your test of beta3. But you don't need to. Using the combined model, you can estimate an OR summarizing the effect of parity on breast cancer risk for each age stratum separately, using appropriate linear combinations of the model parameter estimates. Depending on how you parametrize the model, these might be, e.g., exp(beta2) and exp(beta2 + beta3). That is what I usually would do. Some people call these "simple effects", although I've never found that a particularly useful or intuitive name.
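As a rough illustration of the linear-combination approach, here is a minimal sketch that assumes hypothetical cell counts and a made-up helper named odds_ratio_ci. Because the model with A, P and A*P is saturated for two binary covariates, the estimates and their covariance matrix can be written in closed form from the four cell logits; with a fitted model from a statistics package you would take the coefficient vector and covariance matrix from the fit instead. The intervals are ordinary Wald intervals.

```python
import math

# Hypothetical counts: (A, P) -> (cases, total)
counts = {(0, 0): (30, 200), (1, 0): (60, 250),
          (0, 1): (45, 400), (1, 1): (90, 500)}
cells = [(0, 0), (1, 0), (0, 1), (1, 1)]

# Observed log odds in each cell and its large-sample variance
logit = [math.log(counts[c][0] / (counts[c][1] - counts[c][0])) for c in cells]
var = [1 / counts[c][0] + 1 / (counts[c][1] - counts[c][0]) for c in cells]

# Rows map the cell logits to (Alpha, beta1, beta2, beta3)
T = [[1, 0, 0, 0],
     [-1, 1, 0, 0],
     [-1, 0, 1, 0],
     [1, -1, -1, 1]]
beta = [sum(t * l for t, l in zip(row, logit)) for row in T]
cov = [[sum(T[i][k] * var[k] * T[j][k] for k in range(4))
        for j in range(4)] for i in range(4)]

def odds_ratio_ci(weights, z=1.96):
    """exp of a linear combination of the coefficients, with a Wald interval."""
    est = sum(w * b for w, b in zip(weights, beta))
    v = sum(weights[i] * cov[i][j] * weights[j]
            for i in range(4) for j in range(4))
    se = math.sqrt(v)
    return math.exp(est), math.exp(est - z * se), math.exp(est + z * se)

or_young = odds_ratio_ci([0, 0, 1, 0])  # exp(beta2): parity OR for A=0
or_old = odds_ratio_ci([0, 0, 1, 1])    # exp(beta2 + beta3): parity OR for A=1
print("A=0: OR %.3f (%.3f, %.3f)" % or_young)
print("A=1: OR %.3f (%.3f, %.3f)" % or_old)
```

Note that the variance of beta2 + beta3 includes the covariance between the two estimates; adding the two variances alone would overstate the uncertainty here, since that covariance is negative.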
The choice of whether or not to stratify, i.e., to fit 2 separate models or fit a single model with an interaction term, comes down to a classic bias-variance trade-off.
If you fit separate models, your estimators will have larger variance. But the estimator for one stratum will not be influenced by potential model-misspecification bias in the other stratum. In this simple case, that misspecification could arise from extra-binomial variation (overdispersion), dependence among subjects, or non-constant breast-cancer probability within a stratum (perhaps due to, e.g., missing important covariates). This last one is not necessarily a big issue if you are happy to accept that you are targeting a probability averaged over heterogeneous subjects. But, if you have missing important covariates and they are distributed differently for the 2 age strata, that still perhaps could be a problem.
If you fit a single model, your estimators will have smaller variance, because of the larger sample size. But now model misspecification in one stratum can bias estimators, or at least influence inference using those estimators, for the other stratum.
Most often, I prefer fitting a single model to estimate and test the interaction parameter and the individual ORs of interest. If it looks like the interaction is unimportant (based on magnitude of both p-value and effect estimate!), I might even estimate the OR averaged over the 2 age groups, again using an appropriate linear combination of the model parameter estimates. There, you need to decide how you want to weight the two groups when you average: give them equal weight, or weight by the sample proportions, or weight by some group proportions estimated from external data. That judgement depends on the purpose of your analysis.
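To make the weighting choice concrete, here is a small sketch. It assumes the averaging is done on the log-odds scale, and the coefficient values, stratum sizes and the function name averaged_or are all hypothetical.

```python
import math

def averaged_or(beta2, beta3, w_older):
    """Parity OR averaged over the two age groups on the log-odds scale.
    w_older is the weight given to the A=1 stratum (between 0 and 1)."""
    log_or = (1 - w_older) * beta2 + w_older * (beta2 + beta3)
    return math.exp(log_or)

beta2, beta3 = -0.33, -0.03    # hypothetical estimates
n_younger, n_older = 600, 750  # hypothetical stratum sizes

or_equal = averaged_or(beta2, beta3, 0.5)
or_sample = averaged_or(beta2, beta3, n_older / (n_younger + n_older))
print(f"equal weights: {or_equal:.4f}   sample weights: {or_sample:.4f}")
```

The linear combination simplifies to beta2 + w_older * beta3, so the choice of weights matters only to the extent that beta3 is far from zero.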
Why prefer a single model? Well, it is simpler to describe, and possibly simpler to implement. In some contexts, it is faster to execute, e.g., if I am fitting such models using thousands of different variables one at a time. More importantly, it is much easier to adjust for testing of multiple hypotheses from a single model, than to try to adjust for the model selection process you described, in which you fit separate models only if your interaction test p-value is below some arbitrary threshold. And this adjustment is important. You should not do the model selection, decide to fit 2 separate models, and then do inference with each of those models as if they were pre-specified. That easily can lead to actual Type I error rates larger than your nominal rate.
Similarly, I hate the approach, which alas seems to continue to be taught frequently, of fitting the model including the interaction term, and then refitting without the interaction if the p-value from the interaction test is sufficiently large. Why? Because (1) usually people fail to account for the model selection process in subsequent inferences; and (2) the interaction test does not address the right question for making model specification decisions. You might have a large p-value because your test lacks power but have an interaction that influences the result. Or you might have a small p-value but an interaction magnitude that has no meaningful influence on the result. Fitting the single model, including the interaction term, avoids all that. Yes, you do take some risk of loss of power and precision if you include a term that turns out to be unimportant. But generally that loss is not too bad, and I prefer that risk to the risks entailed by the other options considered.
By the way, the statistician code of conduct requires me to answer at least one unasked question. Although you don't ask this, I almost always would use in my model age as a continuous variable, not some arbitrary binary version.