Hello everyone,


I am conducting a binary logistic regression using a categorical variable with 5 age categories. The oldest age category is taken as the reference group. Say, for argument’s sake, that our binary outcome variable is “disease present (1) or disease absent (0)”, and that we get the following output:



Age group        Odds ratio   (95% CI)       p-value
<=20-25 years    0.23         (0.07-0.76)    0.017
26-30 years      0.14         (0.03-0.65)    0.012
31-35 years      0.57         (0.19-1.69)    0.310
36-40 years      0.54         (0.19-1.53)    0.244
41-45 years      1            (reference)



I know that the above means, for example, that the odds of getting the disease are lower for the ‘<=20-25 years’ group than for the 41-45 years group, that the odds are likewise lower for the ‘26-30 years’ group than for the 41-45 years group, and so on.


Now it is the interpretation of the tests associated with the categorical variable that I am querying. Texts often say that ‘we are comparing the significance of the effect of being in one category rather than the reference category’, so in the above we can say that the ‘<=20-25 years’ category is significantly different from the 41-45 years category. But does this actually mean that the *odds* of having the disease in the ‘<=20-25 years’ category are significantly different from the odds in the 41-45 years category? The book ‘Regression Analysis’ by Lewis-Beck (vol 2), page 145, implies this, but I am just double checking as it is not too clear.
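To make the question concrete, here is a minimal sketch of how I understand the p-values to relate to the odds ratios. It rests on assumptions of mine that the output itself does not confirm: that the tests are Wald tests of the null hypothesis "odds ratio = 1" (i.e. log odds ratio = 0), and that the 95% confidence intervals are symmetric on the log-odds scale. The function name `wald_from_ci` is just an illustrative helper, not part of any package. Under those assumptions the standard error can be back-calculated from the interval, and the p-value follows from the z statistic:

```python
import math

# Odds ratios and 95% CIs copied from the table above: (OR, lower, upper).
reported = {
    "<=20-25 years": (0.23, 0.07, 0.76),
    "26-30 years": (0.14, 0.03, 0.65),
    "31-35 years": (0.57, 0.19, 1.69),
    "36-40 years": (0.54, 0.19, 1.53),
}


def wald_from_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Back-calculate a Wald z and two-sided p-value for H0: OR = 1.

    Assumes (hypothetically) that the CI was built as
    exp(log(OR) +/- z_crit * SE), so SE can be recovered from its width.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_or / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


results = {group: wald_from_ci(*values) for group, values in reported.items()}

for group, (z, p) in results.items():
    print(f"{group}: z = {z:.2f}, p = {p:.3f}")
```

If the p-values recovered this way come out close to the reported ones (allowing for rounding in the printed odds ratios and intervals), that would be consistent with each test comparing the odds in that age category against the odds in the 41-45 years reference category.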


Many thanks for your help on this issue in advance.


Kind Regards,




Dr Kim Pearce PhD, CStat, Assoc. Fellow HEA

Senior Statistician

Haematological Sciences

Room MG261

Institute of Cellular Medicine

William Leech Building

Medical School

Newcastle University

Framlington Place

Newcastle upon Tyne



Tel: (0044) (0)191 208 8142

