Dear All,
The following is a summary of the responses to my problem on adding
an extra variable into a prognostic model (the original message is
appended to the end of this one).
Thanks to Dietrich Alte, Tim Auton, Ian Bradbury, Tim Cole, Simon
Day, Darren Greenwood, Jane Hutton, Anna Jones, Nick Longford, Jim
Slattery, Ray Thomas for responding.
Method 2 got the most votes (add all the original variables into the
model using the new data set, letting their parameter estimates vary,
then add carotid stenosis). But many other points were raised, which
I have summarised below.
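For concreteness, method 2 amounts to something like the following (a
minimal sketch in R with the survival package, assuming a Cox model;
'age', 'diabetes' and 'prior_tia' are hypothetical stand-ins for the
original prognostic variables):

    library(survival)

    ## Refit all the original variables on the new data set,
    ## letting their coefficients vary freely ...
    fit0 <- coxph(Surv(time, status) ~ age + diabetes + prior_tia,
                  data = new_data)

    ## ... then add carotid stenosis and test whether it improves the fit
    fit1 <- update(fit0, . ~ . + stenosis)
    anova(fit0, fit1)   # likelihood ratio test for adding stenosis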
AMMUNITION:
Dietrich Alte said that in his experience coefficients can vary
considerably between models once you leave one or more variables out
or add new ones. He said that somehow I need to convince the clinician
that his 'sacred numbers' are only nice little statistical estimates
of some non-existent "true model".
Tim Auton said that prognostic equations are interesting and useful,
but more often interesting than useful. "As with any such equation,
based on observational data, its predictive power will be limited
because survival is affected by a whole host of unobserved factors as
well as those included in the model, or in a dataset. There is every
chance that the best values of model coefficients will drift with
time and location due to changes in these unobserved factors. There
will probably be changes in standard care, for example".
Ian Bradbury said that he would be reluctant to fiddle with a
prognostic model developed properly, on a set of quality data, but
otherwise wouldn't view a model as sacred. He suggested Brian
Ripley's book Pattern Recognition and Neural Networks as a source of
a clinician-friendly explanation of overfitting.
Tim Cole said that methods 3 and 4 (see original message below) were
definitely wrong, as the introduction of carotid stenosis will impact
on the other independent variables, and the inter-correlations will
alter their coefficients differentially.
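His point is easy to demonstrate with simulated data (a toy sketch in
R, nothing to do with the actual study data):

    set.seed(1)
    n  <- 500
    x1 <- rnorm(n)
    x2 <- 0.7 * x1 + rnorm(n)        # x2 correlated with x1
    y  <- 1 + 2 * x1 + x2 + rnorm(n)

    coef(lm(y ~ x1))        # x1's coefficient absorbs part of x2's effect
    coef(lm(y ~ x1 + x2))   # x1's coefficient changes once x2 enters

Because x1 and x2 are correlated, the coefficient of x1 shifts (here
from about 2.7 back towards 2) as soon as x2 enters; the same
mechanism operates when carotid stenosis enters the prognostic model.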
Simon Day said that I "might try discussing the idea of 'training
sets' and 'validation sets' that we all *should* use when doing some
sort of variable selection but, of course, most of us don't (we often
don't have enough data). But those sorts of discussions might help
dispel the sacred nature of the former model."
Darren Greenwood suggested that the parameters of the original model
should not be considered sacred for several reasons. The original
model may not have been constructed carefully or objectively (Tim
Cole also mentioned this). The populations may differ due to changes
over time, or some other reason. A new model would be up to date and
relevant to the local area.
Jane Hutton suggested some papers describing a prognostic model that
she was involved with:
Smith DF, Hutton JL, et al. "The prognosis of primary intracerebral
tumours presenting with epilepsy: the outcome of medical and surgical
management." J Neurol Neurosurg Psychiatry 1991;54:915-920.
Hutton JL, Smith DF, et al. "Development of a prognostic index for
use in a trial of medical and surgical management of primary
intracerebral tumours." J Neurol Neurosurg Psychiatry 1992;55:271-274.
Hutton JL, Smith DF, et al. "Prospective evaluation of a prognostic
index for intrinsic supratentorial tumours." J Neurol Neurosurg
Psychiatry 1995;59:92-94 (plus correspondence).
She also suggested that if the new parameter estimates were not
inconsistent with the old model (judging by the standard errors) then
this could be explained to the clinician.
Ray Thomas gave an example: "You will recall that there were big
revisions to the Earnings Index last year. A lot of the fuss about
this came at the stage when the ONS revised the back series in the
light of the improved sample. This rewriting of history upset a
number of 'power users' i.e. organisations who used the index in
their models. Retrospective revision meant that they had to
recalibrate their models. Tim Holt gave a grovelling apology for the
original error to these 'power users' (his phrase). Unnecessarily
grovelling I thought! You must appreciate of course that this
interpretation is individual and it cannot be assumed that it would
be accepted by the ONS or the RSS! But the situation does seem
analogous and I hope this way of looking at things helps."
SUGGESTIONS FOR MODELLING:
Tim Auton said "In your new dataset, you have observed a clear
(strong?) association between survival and carotid stenosis. It is
possible that some of the variables in the old model were acting as
surrogates for the unrecorded variable: carotid stenosis. This may
explain in part why the model coefficients change when you include
CS. The fact that they change when going from the old dataset to the
new one - keeping the same variables - indicates a difference between
these two sets of multivariate data. How strong is the evidence that
the improvement in fit caused by changing the parameter values is
more than would be expected by chance? If you find clear differences
between the two datasets, you might like to find ways to show how they
are different. You could, for example, compare the log survival
predictions of the old and new models and see where the largest
differences occur. You should also try to compare the 'footprint' of
the two datasets, ie their projection onto the set of predictor
variables. How well can you predict CS using the old set of
variables?"
Ian Bradbury said that in SAS or S/R you could use the 'offset'
feature to force the linear predictor from the original model to have
a coefficient of exactly 1 - which answers method 4 in the original
message below.
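In R, for example, the device looks something like this (a sketch
with the survival package, assuming a Cox model; lp_old is the linear
predictor computed from the original model's published coefficients):

    library(survival)

    ## offset() fixes the coefficient of lp_old at exactly 1, so only
    ## carotid stenosis gets a freely estimated coefficient (method 4)
    fit4 <- coxph(Surv(time, status) ~ offset(lp_old) + stenosis,
                  data = new_data)
    summary(fit4)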
Tim Cole said that the relative size and quality of the two data sets
should be considered.
Darren Greenwood suggested that stepwise selection procedures are
dodgy for survival analyses, and that if an expert thinks terms
should be in the model, then they should go in.
Jane Hutton suggested looking up papers by Doug Altman or Patrick
Royston.
Anna Jones suggested a two-stage model: the first stage is the
original model with the original parameter estimates, and the second
stage is a model containing just carotid stenosis. This way you can
preserve the original parameter estimates and still see whether
carotid stenosis adds any predictive value.
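Using the same offset device as above, the two stages might look like
this (a sketch, assuming a Cox model and hypothetical published
coefficients for the original variables):

    ## Stage 1: the original model with its original estimates,
    ## reduced to a single fixed score per patient
    lp_old <- with(new_data, 0.04 * age + 0.50 * diabetes)  # hypothetical

    ## Stage 2: only carotid stenosis is estimated; the original
    ## parameter estimates are preserved inside the fixed offset
    stage2 <- coxph(Surv(time, status) ~ offset(lp_old) + stenosis,
                    data = new_data)
    summary(stage2)   # the test on stenosis asks whether it adds value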
Nick Longford said the following - "A reformulation of the problem:
There is an old analysis (without carotid), which yields biased
prognosis. This analysis (probably) has a large sample size (small
st. errors). There is a new analysis (with carotid), which yields an
unbiased prognosis. This analysis has a smaller sample size (greater
st. errors). Either prognosis is based on a fitted model, with
estimated sampling variation. How to combine the two prognoses, so
as to benefit from the large sample size in the old analysis and the
better model in the new analysis. Answer: combine the two estimates
(prognoses) so as to minimize the mean squared error (or another
criterion). If the clinician regards the old formula as sacred then,
in the new analysis, instead of carotid use its orthogonal projection
on the other covariates. Example of orthogonal projection: In y = a
+ bx + e, subtract the mean of x: use x - x-bar instead of x. In
this way, you obtain a `correction' to the established model, and the
pretense of sacro(sanctity) of the old model is maintained (sort of).
Reference: (Indirect) NTL, Multivariate shrinkage estimation of ...,
JRSS B, 1999 (More direct) NTL, Synthetic estimation with moderating
influence. Statistics in Medicine, submitted." I failed to mention
in my first message that the new dataset is 4 times as big as the old
one. Nick said the following about this - "If the new dataset is 4
times bigger than the old one, then there is a serious problem with
the clinician: Consistency is preferred to precision (and donkey to
diesel). I guess that the synthesis (previous message) would almost
discard the old analysis, because the new dataset contains so much
more information."
Jim Slattery suggested combining both datasets to get new
coefficients for all the original variables, and using these in
conjunction with one of the suggested strategies to add stenosis
into the second dataset. He also suggested that it might be useful
to write a prognostic computer program so that a doctor could estimate the
survival of a patient using whatever subset of prognostic indicators
that he/she had (many doctors won't know the level of carotid
stenosis as they won't have access to the appropriate technology).
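That last suggestion might look something like the toy sketch below
(coefficients, means and variable names are all hypothetical): a
small function computing the prognostic index from whichever
indicators the doctor actually has, with one crude fallback -
substituting the cohort mean - for those that are unmeasured:

    ## Hypothetical published coefficients and cohort means
    coefs <- c(age = 0.04, diabetes = 0.50, stenosis = 0.02)
    means <- c(age = 65,   diabetes = 0.2,  stenosis = 30)

    ## Prognostic index from whatever subset of indicators is known;
    ## unmeasured indicators fall back to the cohort mean
    prog_index <- function(known) {
      x <- means                  # start from the cohort means
      x[names(known)] <- known    # overwrite with what the doctor knows
      sum(coefs * x)
    }

    prog_index(c(age = 70, diabetes = 1))   # stenosis unmeasured here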
---------------------------------------
Original message:-
Dear All,
I am currently working on a prognostic model to predict survival
after transient ischaemic attack and minor stroke, and I have a
problem that I don't know the answer to. The problem is this: A
prognostic model was created a while ago. One of the clinicians
here wanted to know whether the degree of carotid stenosis would
improve the model.
However, carotid stenosis wasn't measured in the original data set.
We have a new data set, with all the original prognostic variables
measured in it, and also carotid stenosis. Whatever method I use,
carotid stenosis does improve the model, so now I have to go back to
the clinician with a new 'improved' model. There seem to be various
methods of tackling this. 1. Completely redo the whole model using a
whole new set of variables (some of the original variables are not
related to prognosis in our data set). 2. Add all the original
variables into the model using the new data set, and let their
parameter estimates vary. Then add carotid stenosis in. 3. Add the
original variables in the form of the linear predictor from the
original model. Then add carotid stenosis in. The parameter
estimate for the linear predictor is nowhere near one, so although
the parameter estimates of the original variables are being kept
constant relative to one another, their actual magnitudes change
considerably. 4. Add the original variables in the form of the linear
predictor from the original model, and force its parameter estimate
to be equal to 1 (I don't know how to do this). Then add carotid
stenosis in. 5. Do something else that I haven't thought of (any
suggestions welcome). I would be happiest doing either 1), or 2), but
the clinician seems to think that the parameter estimates in the
original model are somehow sacred and not to be changed. I'd
appreciate any views, ammunition, references, etc. I'll post a
summary.
---------------------------------------------------
Dr. Stephanie C. Lewis
Medical Statistician
Bramwell Dott Building
Department of Clinical Neurosciences
Western General Hospital
Crewe Road
EDINBURGH Tel: +44 (0) 131 537 2932
EH4 2XU Fax: +44 (0) 131 332 5150
UK Email: [log in to unmask]