Email discussion lists for the UK Education and Research communities



ALLSTAT Archives



allstat@JISCMAIL.AC.UK



ALLSTAT  January 2018


Subject:

Replies: Significant interaction and stratification : your views

From:

Kim Pearce <[log in to unmask]>

Reply-To:

Kim Pearce <[log in to unmask]>

Date:

Fri, 12 Jan 2018 10:11:47 +0000


Hello to everyone,

Thank you to those who replied to my question (listed again at the foot of this email).

For those who are interested, interaction and stratification are mathematically equivalent in the context I described. That is, say we are looking at the binary outcome 1 = women over 40 years old with breast cancer, 0 = women over 40 years old with no breast cancer. We have two binary independent variables: age (41-49 = 0, >50 = 1) and parity, the number of times a woman has carried a pregnancy to a viable gestational age (0 = none, 1 = one or more).
We will call the age variable A and the parity variable P.

Say the model includes an interaction and its linear predictor (the log odds of breast cancer) is

Alpha + beta1 * A  + beta2*P + beta3*A*P

For A=1

Odds of breast cancer when P=1 / Odds of breast cancer when P=0 = exp(Alpha + beta1 + beta2 + beta3) / exp(Alpha + beta1) = exp(beta2 + beta3)    (1)

For A=0

Odds of breast cancer when P=1 / Odds of breast cancer when P=0 = exp(Alpha + beta2) / exp(Alpha) = exp(beta2)    (2)

Therefore the odds ratio for P=1 versus P=0 among women with A=1, relative to the same odds ratio among women with A=0, is exp(beta2 + beta3) / exp(beta2) = exp(beta3).
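As a quick numerical check of the algebra above, here is a minimal sketch in Python. The coefficient values are hypothetical, chosen only for illustration, and are not estimates from any data set.

```python
import math

# Hypothetical coefficients on the log-odds scale (not estimates from real data).
alpha, beta1, beta2, beta3 = -1.0, 0.4, -0.5, -0.3

def odds(a, p):
    """Odds of breast cancer implied by the interaction model for given A and P."""
    return math.exp(alpha + beta1 * a + beta2 * p + beta3 * a * p)

or_a1 = odds(1, 1) / odds(1, 0)  # parity odds ratio among A=1, equation (1)
or_a0 = odds(0, 1) / odds(0, 0)  # parity odds ratio among A=0, equation (2)

# Each pair printed below should agree up to floating-point rounding.
print(or_a1, math.exp(beta2 + beta3))
print(or_a0, math.exp(beta2))
print(or_a1 / or_a0, math.exp(beta3))
```

Alpha and beta1 cancel in each ratio, which is why only beta2 and beta3 appear in the odds ratios.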

Now if we stratify by age we get:

Model 1 for women aged between 40-49 (A=0):

Alpha1 + beta4*P

Model 2 for women aged > 50 years (A=1):

Alpha2 + beta5*P

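Then beta4 will be equal to beta2 in (2), and beta5 will be equal to beta2 + beta3 in (1). This holds for the fitted estimates too, because with two binary covariates the interaction model is saturated: both approaches reproduce the observed odds in each of the four age-by-parity groups.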
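The equivalence can also be checked by fitting the models. The sketch below is illustrative only: the counts are made up, and the small Newton-Raphson fitter (fit_logistic) and linear solver (solve) are helper functions written for this example, not part of any statistics package. In practice one would use standard logistic regression software.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(rows, cases, totals, iters=25):
    """Maximum likelihood fit of a logistic model to grouped binomial data (Newton-Raphson)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, y, n in zip(rows, cases, totals):
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(k):
                score[i] += x[i] * (y - n * p)
                for j in range(k):
                    info[i][j] += x[i] * x[j] * n * p * (1 - p)
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    return beta

# Hypothetical counts: (A, P) -> (women with breast cancer, total women).
cells = {(0, 0): (30, 100), (0, 1): (20, 100), (1, 0): (50, 100), (1, 1): (25, 100)}

def fit(keys, design):
    """Fit a model to the listed (A, P) groups using the given design-row function."""
    rows = [design(a, p) for a, p in keys]
    cases = [cells[key][0] for key in keys]
    totals = [cells[key][1] for key in keys]
    return fit_logistic(rows, cases, totals)

# Single model with interaction: Alpha + beta1*A + beta2*P + beta3*A*P
alpha, beta1, beta2, beta3 = fit(list(cells), lambda a, p: [1, a, p, a * p])
# Stratified models: Alpha1 + beta4*P for A=0, Alpha2 + beta5*P for A=1
alpha1, beta4 = fit([(0, 0), (0, 1)], lambda a, p: [1, p])
alpha2, beta5 = fit([(1, 0), (1, 1)], lambda a, p: [1, p])

print(beta4, beta2)          # agree up to numerical precision
print(beta5, beta2 + beta3)  # agree up to numerical precision
```

The intercepts line up in the same way: Alpha1 matches Alpha, and Alpha2 matches Alpha + beta1.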

I also received an email from one respondent with an excellent discussion of the 'single' versus 'separate' model approach. I have pasted it at the foot of this email for your perusal.

Kind Regards,
Kim

-----Original Message-----
From: Kim Pearce 
Sent: 10 January 2018 09:51
To: [log in to unmask] ([log in to unmask]) <[log in to unmask]>
Subject: Significant interaction and stratification : your views

Hi everyone,

Can anyone throw some light on the following?

Say we are considering a binary logistic model. For argument's sake, imagine we are looking at the binary outcome 1 = women over 40 years old with breast cancer, 0 = women over 40 years old with no breast cancer. We have two binary independent variables: age (41-49 = 0, >50 = 1) and parity, the number of times a woman has carried a pregnancy to a viable gestational age (0 = none, 1 = one or more).
We will call the age variable A and the parity variable P.

Say the model includes an interaction and is built like this

Alpha + beta1 * A  + beta2*P + beta3*A*P

If beta3 is statistically significant, this means that the odds ratio of breast cancer for women who have carried at least one pregnancy to a viable gestational age, relative to women who have not, differs significantly between women aged 40-49 years and those aged > 50 years.

Now, my question is, if we do establish that beta3 is statistically significant would this justify a stratification of the data, so that we could (for example) have two binary logistic models (one for women aged between 40-49 and one for women aged > 50 years) where the linear predictors are:

Model 1 for women aged between 40-49:

Alpha1 + beta4*P

Model 2 for women aged > 50 years:

Alpha2 + beta5*P

Hence, in the above, we could determine if there was a statistical difference between the levels of parity at each age level separately.

Many thanks for your opinion on this.

Kindest Regards,
Kim

___________________________

Hi,

You could choose to stratify by age group regardless of the result of your test of beta3. But you don't need to. Using the combined model, you can estimate an OR summarizing the effect of parity on breast cancer risk for each age stratum separately, using appropriate linear combinations of the model parameter estimates. Depending on how you parametrize the model, these might be, e.g.,  exp(beta2) and exp(beta2 + beta3). That is what I usually would do. Some people call these "simple effects", although I've never found that a particularly useful or intuitive name.
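To illustrate the linear-combination approach described above, here is a minimal sketch in Python. The coefficient estimates and covariance values are hypothetical, the function name odds_ratio_ci is made up for this example, and the interval is a standard Wald interval using a normal approximation, which is an assumption on my part rather than something the paragraph specifies.

```python
import math

def odds_ratio_ci(coefs, cov, weights, z=1.96):
    """Odds ratio and Wald interval for the linear combination sum(w_i * coef_i)."""
    est = sum(w * b for w, b in zip(weights, coefs))
    # Variance of a linear combination: w' * cov * w
    var = sum(wi * wj * cov[i][j]
              for i, wi in enumerate(weights)
              for j, wj in enumerate(weights))
    half = z * math.sqrt(var)
    return math.exp(est), math.exp(est - half), math.exp(est + half)

# Hypothetical estimates of (beta2, beta3) and their covariance matrix.
coefs = [-0.539, -0.560]
cov = [[0.110, -0.110],
       [-0.110, 0.203]]

or_a0 = odds_ratio_ci(coefs, cov, [1, 0])  # exp(beta2): parity OR for A=0
or_a1 = odds_ratio_ci(coefs, cov, [1, 1])  # exp(beta2 + beta3): parity OR for A=1
print(or_a0)
print(or_a1)
```

The covariance between the beta2 and beta3 estimates matters here: ignoring it would give the wrong standard error for beta2 + beta3.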

The choice of whether or not to stratify, i.e., to fit 2 separate models or fit a single model with an interaction term, comes down to a classic bias-variance trade-off.

If you fit separate models, your estimators will have larger variance. But the estimator for one stratum will not be influenced by potential model-misspecification bias in the other stratum. In this simple case, that misspecification could arise from extra-binomial variation (overdispersion), dependence among subjects, or non-constant breast-cancer probability within a stratum (perhaps due to, e.g., missing important covariates). This last one is not necessarily a big issue if you are happy to accept that you are targeting a probability averaged over heterogeneous subjects. But, if you have missing important covariates and they are distributed differently for the 2 age strata, that still perhaps could be a problem.

If you fit a single model, your estimators will have smaller variance, because of the larger sample size. But now model misspecification in one stratum can bias estimators, or at least influence inference using those estimators, for the other stratum.

Most often, I prefer fitting a single model to estimate and test the interaction parameter and the individual ORs of interest. If it looks like the interaction is unimportant (based on magnitude of both p-value and effect estimate!), I might even estimate the OR averaged over the 2 age groups, again using an appropriate linear combination of the model parameter estimates. There, you need to decide how you want to weight the two groups when you average: give them equal weight, or weight by the sample proportions, or weight by some group proportions estimated from external data. That judgement depends on the purpose of your analysis.
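The weighting choices mentioned above can be sketched as follows. This assumes the averaging is done on the log odds ratio scale, which is one reading of "linear combination of the model parameter estimates"; the estimates, sample sizes, and the function name averaged_log_or are all hypothetical.

```python
import math

def averaged_log_or(beta2, beta3, w1):
    """Weighted average of the stratum log odds ratios beta2 (A=0) and beta2 + beta3 (A=1).

    w1 is the weight given to the A=1 stratum; the A=0 stratum gets 1 - w1.
    """
    return (1 - w1) * beta2 + w1 * (beta2 + beta3)

beta2, beta3 = -0.539, -0.560  # hypothetical estimates
n_a0, n_a1 = 120, 80           # hypothetical stratum sample sizes

equal = averaged_log_or(beta2, beta3, 0.5)                        # equal weights
by_sample = averaged_log_or(beta2, beta3, n_a1 / (n_a0 + n_a1))   # sample proportions
print(math.exp(equal), math.exp(by_sample))
```

In every case the combination simplifies to beta2 + w1*beta3, so the choice of weights only matters to the extent that beta3 is far from zero.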

Why prefer a single model? Well, it is simpler to describe, and possibly simpler to implement. In some contexts, it is faster to execute, e.g., if I am fitting such models using thousands of different variables one at a time. More importantly, it is much easier to adjust for testing of multiple hypotheses from a single model, than to try to adjust for the model selection process you described, in which you fit separate models only if your interaction test p-value is below some arbitrary threshold. And this adjustment is important. You should not do the model selection, decide to fit 2 separate models, and then do inference with each of those models as if they were pre-specified. That easily can lead to actual Type I error rates larger than your nominal rate.

Similarly, I hate the approach, which alas seems to continue to be taught frequently, of fitting the model including the interaction term, and then refitting without the interaction if the p-value from the interaction test is sufficiently large. Why? Because (1) usually people fail to account for the model selection process in subsequent inferences; and (2) the interaction test does not address the right question for making model specification decisions. You might have a large p-value because your test lacks power but have an interaction that influences the result. Or you might have a small p-value but an interaction magnitude that has no meaningful influence on the result. Fitting the single model, including the interaction term, avoids all that. Yes, you do take some risk of loss of power and precision if you include a term that turns out to be unimportant. But generally that loss is not too bad, and I prefer that risk to the risks entailed by the other options considered.

By the way, the statistician code of conduct requires me to answer at least one unasked question. Although you don't ask this, I almost always would include age in my model as a continuous variable, not some arbitrary binary version.
