Dear All,
The following is a summary of the responses to my problem on adding
an extra variable into a prognostic model (the original message is
appended to the end of this one).
Thanks to Dietrich Alte, Tim Auton, Ian Bradbury, Tim Cole, Simon
Day, Darren Greenwood, Jane Hutton, Anna Jones, Nick Longford, Jim
Slattery, Ray Thomas for responding.
Method 2 got the most votes (add all the original variables into the
model using the new data set, letting their parameter estimates vary,
then add carotid stenosis). But many other points were raised, which
I have summarised below.
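For concreteness, method 2 amounts to something like the following (a
minimal sketch in R with the survival package, assuming a Cox model;
'age', 'diabetes' and 'prior_tia' are hypothetical stand-ins for the
original prognostic variables):

    library(survival)

    ## Refit all the original variables on the new data set,
    ## letting their coefficients vary freely ...
    fit0 <- coxph(Surv(time, status) ~ age + diabetes + prior_tia,
                  data = new_data)

    ## ... then add carotid stenosis and test whether it improves the fit
    fit1 <- update(fit0, . ~ . + stenosis)
    anova(fit0, fit1)   # likelihood ratio test for adding stenosis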
AMMUNITION:
Dietrich Alte said that in his experience coefficients can vary
considerably between models once you leave one or more variables out
or add new ones. He said that somehow I need to convince the clinician
that his 'sacred numbers' are only nice little statistical estimates
of some non-existent "true model".
Tim Auton said that prognostic equations are interesting and useful,
but more often interesting than useful. "As with any such equation,
based on observational data, its predictive power will be limited
because survival is affected by a whole host of unobserved factors as
well as those included in the model, or in a dataset. There is every
chance that the best values of model coefficients will drift with
time and location due to changes in these unobserved factors. There
will probably be changes in standard care, for example".
Ian Bradbury said that he would be reluctant to fiddle with a
prognostic model developed properly, on a set of quality data, but
otherwise wouldn't view a model as sacred. He suggested Brian
Ripley's book Pattern Recognition and Neural Networks as a source of
a clinician-friendly explanation of overfitting.
Tim Cole said that methods 3 and 4 (see original message below) were
definitely wrong, as the introduction of carotid stenosis will impact
on the other independent variables, and the inter-correlations will
alter their coefficients differentially.
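His point is easy to demonstrate with simulated data (a toy sketch in
R, nothing to do with the actual study data):

    set.seed(1)
    n  <- 500
    x1 <- rnorm(n)
    x2 <- 0.7 * x1 + rnorm(n)        # x2 correlated with x1
    y  <- 1 + 2 * x1 + x2 + rnorm(n)

    coef(lm(y ~ x1))        # x1's coefficient absorbs part of x2's effect
    coef(lm(y ~ x1 + x2))   # x1's coefficient changes once x2 enters

Because x1 and x2 are correlated, the coefficient of x1 shifts (here
from about 2.7 back towards 2) as soon as x2 enters; the same
mechanism operates when carotid stenosis enters the prognostic model.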
Simon Day said that I "might try discussing the idea of 'training
sets' and 'validation sets' that we all *should* use when doing some
sort of variable selection but, of course, most of us don't (we often
don't have enough data). But those sorts of discussions might help
dispel the sacred nature of the former model."
Darren Greenwood suggested that the parameters of the original model
should not be considered sacred for several reasons. The original
model may not have been constructed carefully or objectively (Tim
Cole also mentioned this). The populations may differ due to changes
over time, or some other reason. A new model would be up to date and
relevant to the local area.
Jane Hutton suggested some papers describing a prognostic model that
she was involved with:
Smith DF, Hutton JL, et al. "The prognosis of primary intracerebral
tumours presenting with epilepsy: the outcome of medical and surgical
management." J Neurol Neurosurg Psychiatry 1991;54:915-920.
Hutton JL, Smith DF, et al. "Development of a prognostic index for
use in a trial of medical and surgical management of primary
intracerebral tumours." J Neurol Neurosurg Psychiatry 1992;55:271-274.
Hutton JL, Smith DF, et al. "Prospective evaluation of a prognostic
index for intrinsic supratentorial tumours." J Neurol Neurosurg
Psychiatry 1995;59:92-94 (plus correspondence).
She also suggested that if the new parameter estimates were not
inconsistent with the old model (judging by the standard errors) then
this could be explained to the clinician.
Ray Thomas gave an example: "You will recall that there were big
revisions to the Earnings Index last year. A lot of the fuss about
this came at the stage when the ONS revised the back series in the
light of the improved sample. This rewriting of history upset a
number of 'power users' i.e. organisations who used the index in
their models. Retrospective revision meant that they had to
recalibrate their models. Tim Holt gave a grovelling apology for the
original error to these 'power users' (his phrase). Unnecessarily
grovelling I thought! You must appreciate of course that this
interpretation is individual and it cannot be assumed that it would
be accepted by the ONS or the RSS! But the situation does seem
analogous and I hope this way of looking at things helps."
SUGGESTIONS FOR MODELLING:
Tim Auton said "In your new dataset, you have observed a clear
(strong?) association between survival and carotid stenosis. It is
possible that some of the variables in the old model were acting as
surrogates for the unrecorded variable: carotid stenosis. This may
explain in part why the model coefficients change when you include
CS. The fact that they change when going from the old dataset to the
new one - keeping the same variables - indicates a difference between
these two sets of multivariate data. How strong is the evidence that
the improvement in fit caused by changing the parameter values is
more than would be expected by chance? If you find clear differences
between the two datasets, you might like to find ways to show how they
are different. You could, for example, compare the log survival
predictions of the old and new models and see where the largest
differences occur. You should also try to compare the 'footprint' of
the two datasets, ie their projection onto the set of predictor
variables. How well can you predict CS using the old set of
variables?"
Ian Bradbury said that in SAS or S/R you could use the 'offset'
feature to force the linear predictor from the original model to have
a coefficient of exactly 1 - which answers method 4 in the original
message below.
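In R, for example, the device looks something like this (a sketch
with the survival package, assuming a Cox model; lp_old is the linear
predictor computed from the original model's published coefficients):

    library(survival)

    ## offset() fixes the coefficient of lp_old at exactly 1, so only
    ## carotid stenosis gets a freely estimated coefficient (method 4)
    fit4 <- coxph(Surv(time, status) ~ offset(lp_old) + stenosis,
                  data = new_data)
    summary(fit4)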
Tim Cole said that the relative size and quality of the two data sets
should be considered.
Darren Greenwood suggested that stepwise selection procedures are
dodgy for survival analyses, and that if an expert thinks terms
should be in the model, then they should go in.
Jane Hutton suggested looking up papers by Doug Altman or Patrick
Royston.
Anna Jones suggested a two-stage model: the first stage is the
original model with the original parameter estimates, and the second
stage is a model containing just carotid stenosis. This way you can
preserve the original parameter estimates and still see whether
carotid stenosis adds any predictive value.
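Using the same offset device as above, the two stages might look like
this (a sketch, assuming a Cox model and hypothetical published
coefficients for the original variables):

    ## Stage 1: the original model with its original estimates,
    ## reduced to a single fixed score per patient
    lp_old <- with(new_data, 0.04 * age + 0.50 * diabetes)  # hypothetical

    ## Stage 2: only carotid stenosis is estimated; the original
    ## parameter estimates are preserved inside the fixed offset
    stage2 <- coxph(Surv(time, status) ~ offset(lp_old) + stenosis,
                    data = new_data)
    summary(stage2)   # the test on stenosis asks whether it adds value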
Nick Longford said the following - "A reformulation of the problem:
There is an old analysis (without carotid), which yields biased
prognosis. This analysis (probably) has a large sample size (small
st. errors). There is a new analysis (with carotid), which yields an
unbiased prognosis. This analysis has a smaller sample size (greater
st. errors). Either prognosis is based on a fitted model, with
estimated sampling variation. How to combine the two prognoses, so
as to benefit from the large sample size in the old analysis and the
better model in the new analysis. Answer: combine the two estimates
(prognoses) so as to minimize the mean squared error (or another
criterion). If the clinician regards the old formula as sacred then,
in the new analysis, instead of carotid use its orthogonal projection
on the other covariates. Example of orthogonal projection: In y = a
+ bx + e, subtract the mean of x: use x - x-bar instead of x. In
this way, you obtain a `correction' to the established model, and the
pretense of sacro(sanctity) of the old model is maintained (sort of).
Reference: (Indirect) NTL, Multivariate shrinkage estimation of ...,
JRSS B, 1999 (More direct) NTL, Synthetic estimation with moderating
influence. Statistics in Medicine, submitted." I failed to mention
in my first message that the new dataset is 4 times as big as the old
one. Nick said the following about this - "If the new dataset is 4
times bigger than the old one, then there is a serious problem with
the clinician: Consistency is preferred to precision (and donkey to
diesel). I guess that the synthesis (previous message) would almost
discard the old analysis, because the new dataset contains so much
more information."
Jim Slattery suggested combining both datasets to get new
coefficients for all the original variables, and using these in
conjunction with one of the suggested strategies to add stenosis
into the second dataset. He also suggested that it might be useful
to write a prognostic computer program so that a doctor could estimate the
survival of a patient using whatever subset of prognostic indicators
that he/she had (many doctors won't know the level of carotid
stenosis as they won't have access to the appropriate technology).
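That last suggestion might look something like the toy sketch below
(coefficients, means and variable names are all hypothetical): a
small function computing the prognostic index from whichever
indicators the doctor actually has, with one crude fallback -
substituting the cohort mean - for those that are unmeasured:

    ## Hypothetical published coefficients and cohort means
    coefs <- c(age = 0.04, diabetes = 0.50, stenosis = 0.02)
    means <- c(age = 65,   diabetes = 0.2,  stenosis = 30)

    ## Prognostic index from whatever subset of indicators is known;
    ## unmeasured indicators fall back to the cohort mean
    prog_index <- function(known) {
      x <- means                  # start from the cohort means
      x[names(known)] <- known    # overwrite with what the doctor knows
      sum(coefs * x)
    }

    prog_index(c(age = 70, diabetes = 1))   # stenosis unmeasured here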
---------------------------------------
Original message:-
Dear All,
I am currently working on a prognostic model to predict survival
after transient ischaemic attack and minor stroke, and I have a
problem that I don't know the answer to. The problem is this: A
prognostic model was created a while ago. One of the clinicians
here wanted to know whether the degree of carotid stenosis would
improve the model.
However, carotid stenosis wasn't measured in the original data set.
We have a new data set, with all the original prognostic variables
measured in it, and also carotid stenosis. Whatever method I use,
carotid stenosis does improve the model, so now I have to go back to
the clinician with a new 'improved' model. There seem to be various
methods of tackling this. 1. Completely redo the whole model using a
whole new set of variables (some of the original variables are not
related to prognosis in our data set). 2. Add all the original
variables into the model using the new data set, and let their
parameter estimates vary. Then add carotid stenosis in. 3. Add the
original variables in the form of the linear predictor from the
original model. Then add carotid stenosis in. The parameter
estimate for the linear predictor is nowhere near one, so although
the parameter estimates of the original variables are being kept
constant relative to one another, their actual magnitudes change
considerably. 4. Add the original variables in the form of the linear
predictor from the original model, and force its parameter estimate
to be equal to 1 (I don't know how to do this). Then add carotid
stenosis in. 5. Do something else that I haven't thought of (any
suggestions welcome). I would be happiest doing either 1), or 2), but
the clinician seems to think that the parameter estimates in the
original model are somehow sacred and not to be changed. I'd
appreciate any views, ammunition, references, etc. I'll post a
summary.
---------------------------------------------------
Dr. Stephanie C. Lewis
Medical Statistician
Bramwell Dott Building
Department of Clinical Neurosciences
Western General Hospital
Crewe Road
EDINBURGH Tel: +44 (0) 131 537 2932
EH4 2XU Fax: +44 (0) 131 332 5150
UK Email: [log in to unmask]