Hello all,
A few weeks ago I posted a request re: logistic regression to the list and
I've attached the responses below (apologies if I've left anyone out).
The responses include using survival analysis instead of logistic
regression, suggestions to fine-tune the objective of the work, and
using a multilevel modelling approach when undertaking logistic regression.
Having just visited the BMJ site (http://www.bmj.com) and read the paper by
Spiegelhalter et al., I think a predictive model using a Bayesian approach may
also prove useful, since knowing whether someone has already had a heart
attack could serve as useful prior information.
Thanks to all those who replied.
Simon
Simon Williams
Research Fellow
Department of Anaesthesia
Level 7
Bristol Royal Infirmary
BS2 8HW
Tel. 0117 9283169
Fax. 0117 9282098
<<<<<Original request>>>>>>>>>>>
Hello all,
A colleague is investigating the possibility of
using a logistic regression model to predict whether a
patient will have a heart attack in the future. Some of the predictive
variables are age, sex, current cholesterol level, etc.
However, he would also like to include the number of previous heart
attacks as a predictive variable. Although I think it is right to
use such information, would there not be a problem with the fact that
the outcome and predictive variables are 'related'?
If you have any thoughts or suggestions for papers/books dealing with
this subject, I'd be very grateful if you would post them directly to me.
I will post a summary of the replies to the list at a later date.
Thanks,
Simon
<<<<<<<<<<<replies>>>>>>>>>>>>>>>>>>>>
From: Linda Hunt
It may be OK to use logistic regression if you are talking about the
risk over a specified period, say for example over the first year after
some surgical procedure; otherwise the risk will be related to the
length of time the patient is being followed.
Have you considered using survival techniques, e.g. Cox proportional
hazards regression? You could perhaps stratify by previous heart attacks
(any vs none, or by number), since the hazard rates may be different.
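Linda's point about a specified period can be made concrete. The sketch below is a minimal illustration, not taken from any reply; `fixed_horizon_outcome` and its parameters are hypothetical names. It turns follow-up time and an event flag into a binary "heart attack within the horizon" outcome, and returns None for patients censored before the horizon, whose outcome is genuinely unknown.

```python
from typing import Optional


def fixed_horizon_outcome(follow_up_years: float, had_event: bool,
                          horizon_years: float) -> Optional[int]:
    """Hypothetical helper: binary outcome 'heart attack within horizon_years'.

    follow_up_years is the time to the event if had_event is True,
    otherwise the time to the end of follow-up (censoring).
    Returns 1 if the event occurred within the horizon, 0 if the patient
    was known to be event-free for the whole horizon, and None if follow-up
    stopped before the horizon without an event (outcome unknown).
    """
    if had_event and follow_up_years <= horizon_years:
        return 1
    if follow_up_years >= horizon_years:
        # Either no event at all, or the event came after the horizon.
        return 0
    return None  # censored early: scoring this as 0 would understate risk


patients = [(0.5, True), (2.0, False), (3.0, True), (0.4, False)]
outcomes = [fixed_horizon_outcome(t, e, horizon_years=1.0) for t, e in patients]
print(outcomes)  # [1, 0, 0, None]
```

Records that come back as None must be dropped (or the horizon shortened) before a logistic regression; survival methods such as Cox regression keep them as censored observations instead, which is one reason Linda and Paul Seed suggest them.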
From: "D Wright"
As I understand it, you can include previous heart attack as a
covariate. However, there may be problems of interpretation. If other
covariates, for example cholesterol level, are closely associated with
previous heart attacks, then the analysis may show no cholesterol
effect. This sort of issue arises in survival data analysis with
time-dependent covariates. A good discussion is given in the book by
Kalbfleisch and Prentice, The Statistical Analysis of Failure Time Data,
pages 124-126, under the section on internal covariates. If your aim is
prediction, then this should not be of any importance, and I would think
previous history of heart attack is a very useful predictor of future
heart attacks.
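D Wright's warning, that a covariate closely tied to previous heart attacks can lose its apparent effect once previous heart attacks are in the model, can be illustrated by simulation. The sketch below assumes an invented data-generating process (not real data): standardised cholesterol raises the chance of a previous heart attack, and future heart attacks depend only on previous ones. It fits logistic regression by Newton-Raphson in plain Python purely to stay self-contained; in practice a statistics package would be used.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, y, iters=12):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    Each row must start with 1.0 for the intercept.
    """
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        score = [0.0] * p                     # X'(y - mu)
        info = [[0.0] * p for _ in range(p)]  # X'WX, the information matrix
        for row, yi in zip(rows, y):
            mu = logistic(sum(b * x for b, x in zip(beta, row)))
            w = mu * (1.0 - mu)
            for j in range(p):
                score[j] += (yi - mu) * row[j]
                for k in range(p):
                    info[j][k] += w * row[j] * row[k]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Invented data-generating process (assumption, not real patients):
# cholesterol -> previous MI -> future MI, with no direct cholesterol effect.
rng = random.Random(1)
data = []
for _ in range(5000):
    chol = rng.gauss(0.0, 1.0)  # standardised cholesterol
    prev = 1 if rng.random() < logistic(-1.0 + 1.5 * chol) else 0
    future = 1 if rng.random() < logistic(-2.0 + 2.0 * prev) else 0
    data.append((chol, prev, future))

y = [f for _, _, f in data]
unadjusted = fit_logistic([[1.0, c] for c, _, _ in data], y)
adjusted = fit_logistic([[1.0, c, float(p)] for c, p, _ in data], y)
print(f"cholesterol log-odds ratio, unadjusted: {unadjusted[1]:.2f}")
print(f"cholesterol log-odds ratio, adjusted for previous MI: {adjusted[1]:.2f}")
```

In this setup the unadjusted cholesterol coefficient is clearly positive, while the adjusted one sits near zero. As D Wright says, for prediction the adjusted model is still fine; only the interpretation of the cholesterol coefficient changes.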
From: T R Harris
The answer may depend on whether the goal of the research is primarily
"predictive" or "analytical."
If the goal is predictive (i.e., simply to predict whether or not a
patient will have a future heart attack) then I see no problem at all in
the inclusion of previous heart attacks in the model. In fact, it's
probably very desirable to include this variable if it does in fact add
substantially to the ability to predict correctly.
If the goal is analytical (to learn something about the causal mechanisms
relating age, cholesterol, etc., to heart attacks) then I would worry
about direct and indirect causal mechanisms. For example, age may affect
future heart attacks directly and also indirectly through its effects on
other predictors in the regression model. Do you want to know about the
direct effect or the indirect effect? And "direct" and "indirect" are
themselves relative to the selection of predictors in the regression model.
Thus you need to think carefully about the causal ordering among the
variables in the model (and maybe about some unmeasured variables as
well). If you were doing linear regressions, you could look at path
analysis as the conceptual framework, or perhaps structural equation
models if you want to think explicitly about latent (unmeasured) variables
(but SEMs are probably overkill in your situation). I don't know how
logistic regression changes the thinking, but I think the essential
concepts (direct and indirect causation, for example) are unchanged
although the mathematical details are no doubt different. Sorry that I
don't have references at hand, but the topics I would look for are path
analysis, causal analysis, causal models, intervening variable, antecedent
variable, mediator, spurious correlation. Earl Babbie, The Practice of
Social Research, may have an introductory discussion of the key issues
(using social science examples).
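For the linear case Harris mentions, the path-analysis bookkeeping is exact. Write X for an antecedent variable (say age), M for an intervening variable (say cholesterol) and Y for the outcome; the coefficient symbols are chosen here for illustration. When all three regressions are fitted by ordinary least squares to the same data, the total effect of X splits exactly into a direct and an indirect part:

```latex
\begin{aligned}
M &= a_0 + a\,X + \varepsilon_1 \\
Y &= b_0 + c'\,X + b\,M + \varepsilon_2 \\
Y &= d_0 + c\,X + \varepsilon_3 \\
\Rightarrow\quad c &= \underbrace{c'}_{\text{direct}} + \underbrace{a\,b}_{\text{indirect}}
\end{aligned}
```

In logistic regression this identity does not hold in general, because odds ratios are non-collapsible: adding a predictor that is associated with the outcome can change the coefficient of X even when that predictor is unrelated to X. This is one of the mathematical details that differ, although the direct/indirect concepts carry over.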
From: "Duncan Smith"
I'm not sure if I'm missing something, but aren't they bound to be
'related' if the model is correct? I don't see any obvious problem.
Maybe your concern is over the possible near multicollinearity of the
predictors. I wouldn't worry about that very much unless I was trying to
get precise parameter estimates (e.g. in econometric models). You want a
predictive model, so you can probably live with it.
From: "Scott, Martin {TD-B~Mannheim}"
Hello Simon,
I don't think you have any problem with including the term as a variable in
the logistic regression modelling. After all, if you didn't think any of the
variables were related to the dependent variable, then you wouldn't do any
modelling in the first place. The number of previous heart attacks is
indeed a candidate prognostic factor for future heart attacks. I
would be tempted to include it as a simple binary variable (had previous
heart attack / no previous heart attack).
Hope that helps.
Martin
From: (address withheld)
It sounds suspiciously like you are not using statistical correlation as it
might be used. If previous heart attacks are the input and a future one is
the output, you _want_ correlation. The difficulty would be if current
cholesterol level was always high for patients with previous HAs; that would
be a covariate relation you _don't_ want.
Besides, what makes you think that previous HAs predict future ones?
There must be some data there already.
Good luck,
Jay Warner, Principal Scientist
From: Chris Sutton
Simon,
You might find Weinberg, C.R., 'Towards a clearer definition of
confounding', American Journal of Epidemiology, 1993, 1-8, useful,
and the recent book Woodward, M., 'Epidemiology: Study Design and Data
Analysis', Chapman and Hall, 1999, has a good section on the definition
of confounders. Hope this is useful.
From: (address withheld)
Dear Simon,
Using previous heart attacks might look good at first glance, but you would
have to define the seriousness of the previous attack.
I have come upon many individuals whose EKG shows that there has been
damage to the heart previously. When questioned, they often reply that
they have never had a heart attack. It seems that many heart attacks are
never identified as such at the time they occur. If you use 'previous
damage' to the heart, you can count it as a heart attack, but you can't be
sure whether it represents one previous attack or several.
From: Paul Seed
Logistic regression is a bit meaningless unless all patients are followed
for a fixed time following the measurement of the predictive variables.
It is more common to carry out a survival analysis using the time to event.
In either case, what is he to do about multiple heart attacks?
As to whether the outcome and predictor being 'related' is a problem:
surely not. If only one event is recorded per subject, it is related to
previous MIs only in that past events predict future events.
Multiple-event survival analysis is also possible, but this is a more
complicated issue.
From: Patrick McElduff
Simon,
I have done similar types of analyses. One of the purposes of logistic
regression is to describe the relationship that exists between the
independent variables and the dependent variable. The problem with your
analysis is not that the previous number of heart attacks is 'related' to
the risk of having another heart attack. Your problem is more likely to
result from the fact that your independent variables may be 'related', in
particular the number of previous heart attacks might be highly correlated
with age.
A good book on this subject is "Applied Logistic Regression" by Hosmer and
Lemeshow.
regards Patrick
From: (address withheld)
Simon, I would have thought you would expect the other variables, like age,
to be related to the outcome variable; otherwise you will not be able to fit
a model that includes them. I think what you're saying is that the number of
previous heart attacks is in some way the same measurement as the outcome, or
a function of it. I would say you are all right to proceed, although you might
find the number of previous heart attacks makes a large contribution to the
explained variance. You could also consider fitting separate models for no
previous, one previous, and two or more previous heart attacks; this might be
more informative.
Just my two cents' worth. Dave Collett (Univ of Reading) has a good book on
logistic regression; I think it is called Modelling Binary Data.
From: (address withheld)
In response to your allstat question, I don't see why there should be a
problem with the explanatory and response variables being related. If they
weren't 'related' somehow, why would you use them as explanatory factors?
Also, I suspect that you will have very few subjects with 2, 3 or 4 previous
heart attacks, so you might want to categorise the variable as previous heart
attacks yes/no, or failing that none, one, more than one.
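The two coding schemes suggested here (and Martin's binary version) could be implemented along these lines. This is a sketch; the function names and category labels are hypothetical. The dummy coding uses "none" as the reference level, so each indicator's coefficient compares that group with patients who have had no previous heart attack.

```python
def categorise_previous_mi(count: int, scheme: str = "three") -> str:
    """Collapse a count of previous heart attacks into coarser categories."""
    if count < 0:
        raise ValueError("count of previous heart attacks cannot be negative")
    if scheme == "binary":
        return "yes" if count > 0 else "no"
    if scheme == "three":
        if count == 0:
            return "none"
        if count == 1:
            return "one"
        return "more than one"
    raise ValueError(f"unknown scheme: {scheme!r}")


def dummy_code(category: str) -> list[float]:
    """Indicator columns for the three-level scheme, with 'none' as reference."""
    return [1.0 if category == "one" else 0.0,
            1.0 if category == "more than one" else 0.0]


counts = [0, 1, 3, 0, 2]
print([categorise_previous_mi(c, "binary") for c in counts])
print([dummy_code(categorise_previous_mi(c)) for c in counts])
```

Collapsing sparse upper counts into "more than one" trades some information for estimates that are not driven by a handful of patients.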
From: Southworth Harry H
There are various models in existence that do this sort of thing.
Have a look at Hingorani, AD et al, BMJ, Vol 318 (1999).
They use "logistic regression", but I think it's a proportional odds
model that they use.
I've modelled the EAS risk tables (Wood, D et al, European Heart
Journal, 19, 1434-1503 (1998)) using a proportional odds model,
and it works rather well.
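For readers unfamiliar with the proportional odds model Harry mentions, here is a minimal sketch of how it turns a single linear predictor into probabilities for ordered risk categories. It uses the convention logit P(Y <= j) = theta_j - x.beta (some software uses a plus sign instead); the thresholds and coefficient in the example are arbitrary illustrative numbers, not estimates from any of the cited papers. The same beta applies to every cumulative split, which is the "proportional odds" assumption.

```python
import math
from typing import List, Sequence


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def proportional_odds_probs(thresholds: Sequence[float],
                            beta: Sequence[float],
                            x: Sequence[float]) -> List[float]:
    """Category probabilities under logit P(Y <= j) = theta_j - x.beta."""
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    eta = sum(b * xi for b, xi in zip(beta, x))
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    # P(Y = j) is the difference between successive cumulative probabilities.
    return [cumulative[0]] + [cumulative[j] - cumulative[j - 1]
                              for j in range(1, len(cumulative))]


# Arbitrary illustrative parameters for three ordered categories
# (e.g. low / medium / high risk) and a single standardised predictor.
thresholds, beta = [0.0, 2.0], [0.5]
low_x = proportional_odds_probs(thresholds, beta, [0.0])
high_x = proportional_odds_probs(thresholds, beta, [2.0])
print([round(p, 3) for p in low_x])   # [0.5, 0.381, 0.119]
print([round(p, 3) for p in high_x])
```

Under this sign convention a positive coefficient moves probability towards the higher-risk categories.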
There are several models which are referred to as "The
Framingham Equation" which also model risk. See Wilson, PWF
et al, Circulation, 97, 1837-1847 (1998), and Anderson, KM et
al, Circulation, 83, 356-362 (1991).
I suggest that in the logistic regression model you try taking
logs of all explanatory variables and see if this improves the fit.
You might also find that log(age)*log(age) is significant, as
well as the interaction between log(SBP) and log(age).
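Harry's suggestion amounts to building transformed columns before fitting. A sketch, assuming age in years and SBP in mmHg (both must be positive to take logs); the function and column names are hypothetical. Whether each term actually improves the fit would need checking on the data, for example with a likelihood-ratio test between nested models.

```python
import math


def log_features(age_years: float, sbp_mmhg: float) -> dict[str, float]:
    """Log-transformed predictors plus the squared and interaction terms."""
    if age_years <= 0 or sbp_mmhg <= 0:
        raise ValueError("age and SBP must be positive to take logs")
    log_age = math.log(age_years)
    log_sbp = math.log(sbp_mmhg)
    return {
        "log_age": log_age,
        "log_sbp": log_sbp,
        "log_age_sq": log_age * log_age,         # log(age)*log(age)
        "log_sbp_x_log_age": log_sbp * log_age,  # log(SBP) by log(age) interaction
    }


row = log_features(age_years=55, sbp_mmhg=140)
print({name: round(value, 3) for name, value in row.items()})
```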
I have no experience of using number of previous heart attacks
as a predictor. People who have already had heart attacks, or
who have established heart disease or a family history of
heart disease are automatically assumed by medics to be at high
risk of suffering a future coronary event. I have not seen any
statistical evidence to back this up.
Another source of info you might like to look at is the National
Cholesterol Education Program (NCEP). You should find them on the
internet quite easily. I don't have a proper reference for you.
From: Rita Campos
Dear Simon,
Your research seems to me to be ideally suited to multilevel
modelling. Try checking out the web page of the Institute of Education
(www.ioe.ac.uk).
The advantage of multilevel modelling for logistic regression is that you
can control for variables that may be multicollinear with your dependent
variable (i.e. prior heart attack and subsequent probability of another).
Hence, at level 1 you can control for the patient's prior state of health.
Although the work that I refer to at the IoE is not medical, it
nevertheless encounters the same methodological obstacles (e.g. a
child's prior attainment at age 13 and their GCSE results).
I did my master's dissertation on multilevel modelling and standard
logistic regression models for education, comparing the accuracy of the
estimates; if you are interested, I can send you a copy.