Hello Everyone,
If I may, I’d like to ask your opinion about testing for linearity in the logit for a binary logistic regression.
Texts (e.g. Agresti “Categorical Data Analysis”, 1990) often quote the scenario when we have a single continuous (or discrete) explanatory variable (e.g. “age”). The data is expressed like so (hypothetical example when we are looking at number of “remissions” out of number of “cases” at each age level):
Age=8, Number of Cases = 2, Number of Remissions =0
Age=10, Number of Cases = 3, Number of Remissions =0
.
Age=20, Number of Cases = 3, Number of Remissions =2
Etc.
If there are only a few observations at each age level then we may group age levels, for example:
Age 8-12, Number of Cases = 7, No. of remissions = 0
Age 14-18, Number of Cases = 7, No. of remissions = 1
Etc.
Say there are ni cases at the ith setting of x (where x = age or age group in my example) and yi is the number of events at the ith setting of x (where events = remissions in my example). As the binary logistic regression model takes the form logit(pi) = alpha + beta*xi, we may check whether the logit is linear in x by calculating yi/ni at the ith setting of x (yi/ni can be thought of as the “actual” pi) and then plotting logit(yi/ni) = loge((yi/ni)/(1-yi/ni)) against xi (i = 1, ..., number of settings of x, where xi = age or the midpoint of the age group as applicable). As logit(yi/ni) is not defined when yi = 0 or yi = ni, Agresti suggests plotting “empirical logits” against xi, where an empirical logit is defined as loge((yi+0.5)/(ni-yi+0.5)).
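As a minimal sketch of the empirical-logit calculation described above (Python, standard library only): the first two age groups use the counts quoted earlier, with midpoints 10 and 16, while the last two rows and the function name empirical_logit are made up purely for illustration.

```python
import math

def empirical_logit(y, n):
    """Empirical logit loge((y + 0.5) / (n - y + 0.5)); finite even when y = 0 or y = n."""
    return math.log((y + 0.5) / (n - y + 0.5))

# (age-group midpoint x, cases n, remissions y)
# Rows 1-2 follow the grouped example above; rows 3-4 are invented for illustration.
groups = [(10, 7, 0), (16, 7, 1), (22, 7, 3), (28, 7, 5)]

points = [(x, empirical_logit(y, n)) for x, n, y in groups]
for x, el in points:
    print(f"x = {x:>2}   empirical logit = {el:.3f}")
```

Plotting the second element of each pair against the first gives the empirical-logit plot; a roughly straight trend is what linearity in the logit would look like.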
Copas (1982, “Plotting p against x”) suggests another method to assess whether there is a linear effect of x in the binary logistic model. Here, a binary logistic model is fitted to the data (i.e. alpha and beta are estimated in alpha + beta*xi) and the observed logit(pi) is again calculated at each (ith) setting of x. The observed logit(pi) is then plotted against the fitted values of alpha + beta*xi (this is like plotting actual value versus fitted value). If a linear trend in x is correct, then we should obtain a straight line at 45° through the origin.
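The observed-versus-fitted comparison described above could be sketched as follows. This is only an illustration under stated assumptions: the data are made up, the model is fitted with a hand-written Newton-Raphson routine (fit_logistic is a hypothetical helper, not a library function), and empirical logits stand in for the observed logits so that yi = 0 causes no problem.

```python
import math

# Made-up grouped data: (age-group midpoint x, cases n, remissions y)
data = [(10, 7, 0), (16, 7, 1), (22, 7, 3), (28, 7, 5)]

def fit_logistic(data, iterations=25):
    """Maximum-likelihood fit of logit(p) = alpha + beta*x by Newton-Raphson."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        u_a = u_b = i_aa = i_ab = i_bb = 0.0
        for x, n, y in data:
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
            w = n * p * (1.0 - p)       # binomial variance at this setting of x
            u_a += y - n * p            # score for alpha
            u_b += x * (y - n * p)      # score for beta
            i_aa += w                   # information matrix entries
            i_ab += w * x
            i_bb += w * x * x
        det = i_aa * i_bb - i_ab * i_ab
        alpha += (i_bb * u_a - i_ab * u_b) / det
        beta += (i_aa * u_b - i_ab * u_a) / det
    return alpha, beta

alpha, beta = fit_logistic(data)

# One (fitted logit, empirical logit) pair per setting of x
pairs = [(alpha + beta * x, math.log((y + 0.5) / (n - y + 0.5))) for x, n, y in data]
for fitted, observed in pairs:
    print(f"fitted logit = {fitted:7.3f}   empirical logit = {observed:7.3f}")
```

Plotting the second element of each pair against the first and comparing with the 45° line through the origin gives the picture described above. In practice the fit would normally come from a statistics package rather than a hand-rolled routine.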
In the real world, of course, we don’t work with a single continuous (or discrete) variable. Binary logistic models may contain multiple continuous/discrete explanatory variables and/or multiple categorical explanatory variables (of a binary or nominal nature), e.g. “gender” and/or “ethnicity”. To my way of thinking, when we are considering several different types of explanatory variables in a binary logistic model, we would follow the method suggested by Copas. For example (hypothetical), suppose we have one continuous explanatory variable, A (age), which has many observations at each age level and so does not need to be grouped; one binary explanatory variable, B, with 2 levels (I label them here B=1, B=2); and one nominal categorical variable, C, with 3 levels (I label them here C=1, C=2, C=3). Then we would have z combinations (covariate patterns), e.g. first pattern: A=age 8, B=1, C=1; second pattern: A=age 8, B=2, C=1; etc.
I would fit a model to the data (i.e. alpha and the 4 betas [corresponding to one variable for A, one binary variable for B and 2 dummy binary variables for C] would be estimated) and logit(pi) would also be calculated at each of the z combinations. Logit(pi) would then be plotted against the fitted value for each of the combinations (again, this is like plotting actual value versus fitted value). If the model were satisfactory then we should obtain a straight line at 45° through the origin.
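To make the covariate-pattern bookkeeping concrete, here is a small sketch that collapses individual records into patterns, counts cases and events per pattern, and builds the dummy-coded design row (intercept plus 4 coefficients) for each. The records, the choice of B=1 and C=1 as reference levels, and the helper name design_row are all assumptions for illustration; the model fit itself is left to whatever software is used.

```python
import math
from collections import defaultdict

# Hypothetical individual records: (A = age, B in {1, 2}, C in {1, 2, 3}, remission 0/1)
records = [
    (8, 1, 1, 0), (8, 1, 1, 0), (8, 2, 1, 1), (8, 2, 1, 0),
    (10, 1, 2, 0), (10, 1, 2, 1), (10, 2, 3, 1), (10, 2, 3, 1),
]

# Collapse to covariate patterns: pattern -> [number of cases, number of events]
counts = defaultdict(lambda: [0, 0])
for a, b, c, event in records:
    counts[(a, b, c)][0] += 1
    counts[(a, b, c)][1] += event

def design_row(a, b, c):
    """Intercept, A, indicator for B=2, indicators for C=2 and C=3 (reference: B=1, C=1)."""
    return [1, a, int(b == 2), int(c == 2), int(c == 3)]

patterns = []
for (a, b, c), (n, y) in sorted(counts.items()):
    emp_logit = math.log((y + 0.5) / (n - y + 0.5))  # finite even if y = 0 or y = n
    patterns.append(((a, b, c), n, y, design_row(a, b, c), emp_logit))

for pattern, n, y, row, emp_logit in patterns:
    print(pattern, n, y, row, round(emp_logit, 3))
```

Given estimated coefficients, the fitted value for a pattern is the dot product of its design row with the coefficient vector, and that is what the empirical logit would be plotted against. With few cases per pattern the empirical logits are very noisy, so scatter around the 45° line is expected even when the model is adequate.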
Of course, we would also generate model diagnostics (e.g. Pearson’s residuals and deviance residuals) to assess the fit of the model, so this also gives us an idea of whether the model is “satisfactory”.
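For reference, the standard Pearson and deviance residuals for grouped binomial data can be computed per setting (or covariate pattern) as in the sketch below. The fitted probabilities and counts in the example list are invented for illustration, not the output of any actual fit.

```python
import math

def pearson_residual(y, n, p):
    """(observed - expected) / binomial standard deviation, for y events out of n."""
    return (y - n * p) / math.sqrt(n * p * (1.0 - p))

def deviance_residual(y, n, p):
    """Signed square root of this observation's contribution to the deviance."""
    def term(obs, expected):
        # Convention: 0 * log(0) = 0
        return obs * math.log(obs / expected) if obs > 0 else 0.0
    d2 = 2.0 * (term(y, n * p) + term(n - y, n * (1.0 - p)))
    return math.copysign(math.sqrt(max(d2, 0.0)), y - n * p)

# Hypothetical (events y, cases n, fitted probability p) triples
fitted = [(0, 7, 0.05), (1, 7, 0.18), (3, 7, 0.45), (5, 7, 0.70)]
for y, n, p in fitted:
    print(y, n, round(pearson_residual(y, n, p), 3), round(deviance_residual(y, n, p), 3))
```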
Many thanks for your views on the above. I appreciate your time greatly.
Kind Regards,
Kim
Dr Kim Pearce PhD, CStat, Fellow HEA
Senior Statistician
Faculty of Medical Sciences Graduate School
Room 3.14
3rd Floor
Ridley Building 1
Newcastle University
Queen Victoria Road
Newcastle Upon Tyne
NE1 7RU
Tel: (0044) (0)191 208 8142