ALLSTAT Archives

allstat@JISCMAIL.AC.UK



Subject: Summary: Adding extra variables into prognostic models
From: "Steff Lewis" <[log in to unmask]>
Reply-To: Steff Lewis
Date: Mon, 17 Jul 2000 09:07:49 +0100
Content-Type: text/plain
Parts/Attachments: text/plain (205 lines)

Dear All,

The following is a summary of the responses to my problem on adding 
an extra variable into a prognostic model (the original message is 
appended to the end of this one).

Thanks to Dietrich Alte, Tim Auton, Ian Bradbury, Tim Cole, Simon 
Day, Darren Greenwood, Jane Hutton, Anna Jones, Nick Longford, Jim 
Slattery, Ray Thomas for responding.

Method 2 got the most votes. (Add all the original variables into the 
model using the new data set, and let their parameter estimates vary. 
Then add carotid stenosis in).  But there were a lot of other points 
raised, which I have summarised below. 

AMMUNITION:
Dietrich Alte said that in his experience coefficients can vary 
considerably between models once you leave one or more parameters out 
or add any.  He said that somehow I need to convince the clinician 
that his 'sacred numbers' are only nice little statistical estimates 
of some non-existent "true model".

Tim Auton said that prognostic equations are interesting and useful, 
but more often interesting than useful. "As with any such equation, 
based on observational data, its predictive power will be limited 
because survival is affected by a whole host of unobserved factors as 
well as those included in the model, or in a dataset. There is every 
chance that the best values of model coefficients will drift with 
time and location due to changes in these unobserved factors. There 
will probably be changes in standard care, for example".

Ian Bradbury said that he would be reluctant to fiddle with a 
prognostic model developed properly, on a set of quality data, but 
otherwise wouldn't view a model as sacred.  He suggested Brian 
Ripley's book on pattern recognition and neural networks as a source 
of a clinician-friendly explanation of overfitting.

Tim Cole said that methods 3 and 4 (see original message below) were 
definitely wrong, as the introduction of carotid stenosis will impact 
on the other independent variables, and the inter-correlations will 
alter their coefficients differentially.

Simon Day said that I "might try discussing the idea of 'training 
sets' and 'validation sets' that we all *should* use when doing some 
sort of variable selection but, of course, most of us don't (we often 
don't have enough data). But those sorts of discussions might help 
dispel the sacred nature of the former model."

Darren Greenwood suggested that the parameters of the original model 
should not be considered sacred for several reasons.  The original 
model may not have been constructed carefully, or objectively (Tim 
Cole also mentioned this).  The populations may differ due to changes 
over time, or some other reason.  A new model would be up to date and 
relevant to the local area.

Jane Hutton suggested some papers describing a prognostic model that 
she was involved with: Smith DF, Hutton JL, et al. "The prognosis of 
primary intracerebral tumours presenting with epilepsy: the outcome 
of medical and surgical management" J Neurol Neurosurg Psychiatry 
1991;54:915-920.
Hutton JL, Smith DF et al "Development of a prognostic index for use 
in a trial of medical and surgical management of primary 
intracerebral tumours" J Neurol Neurosurg Psychiatry 1992;55:271-274.
Hutton JL, Smith DF et al "Prospective evaluation of a prognostic 
index for intrinsic supratentorial tumours" J Neurol Neurosurg 
Psychiatry 1995;59:92-94 (+correspondence).  She also suggested that 
if the new parameter estimates were not inconsistent with the old 
model (judging by the standard errors) then this could be explained 
to the clinician.

Ray Thomas gave an example: "You will recall that there were big 
revisions to the Earnings Index last year. A lot of the fuss about 
this came at the stage when the ONS revised the back series in the 
light of the improved sample. This rewriting of history upset a 
number of 'power users' i.e. organisations who used the index in 
their models. Retrospective revision meant that they had to 
recalibrate their models. Tim Holt gave a grovelling apology for the 
original error to these 'power users' (his phrase). Unnecessarily 
grovelling, I thought! You must appreciate of course that this 
interpretation is individual and it cannot be assumed that it would 
be accepted by the ONS or the RSS! But the situation does seem 
analogous and I hope this way of looking at things helps."

SUGGESTIONS FOR MODELLING:
Tim Auton said "In your new dataset, you have observed a clear 
(strong?) association between survival and carotid stenosis. It is 
possible that some of the variables in the old model were acting as 
surrogates for the unrecorded variable: carotid stenosis. This may 
explain in part why the model coefficients change when you include 
CS. The fact that they change when going from the old dataset to the 
new one - keeping the same variables - indicates a difference between 
these two sets of multivariate data. How strong is the evidence that 
the improvement in fit caused by changing the parameter variables is 
more than would be expected by chance? If you find clear differences 
between the 2 datasets, you might like to find ways to show how they 
are different. You could, for example, compare the log survival 
predictions of the old and new models and see where the largest 
differences occur. You should also try to compare the 'footprint' of 
the two datasets, ie their projection onto the set of predictor 
variables. How well can you predict CS using the old set of 
variables?"

Ian Bradbury said that in SAS or S/R you could use the 'offset' 
feature to force the linear predictor to have a coefficient of 1.
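As a minimal numerical sketch of the offset idea (all data below are hypothetical, and ordinary least squares stands in for the survival model actually being fitted): forcing the old linear predictor to enter with coefficient 1 is, in a linear model, the same as subtracting it from the response and regressing the residual on the new variable.

```python
# Sketch of the 'offset' idea (hypothetical data). Fixing the old model's
# linear predictor at coefficient 1 is equivalent, for a linear model, to
# regressing (y - offset) on the new covariate.

def ols_slope_intercept(x, y):
    """Simple least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Hypothetical outcome, old-model linear predictor, and new covariate
y      = [2.1, 3.0, 4.2, 5.1, 6.3, 6.9]
old_lp = [1.0, 1.5, 2.5, 3.0, 4.0, 4.5]   # linear predictor from the old model
cs     = [0.2, 0.5, 0.9, 1.1, 1.6, 1.8]   # carotid stenosis (hypothetical)

# Method 4: treat the old linear predictor as an offset (coefficient fixed
# at 1), then estimate the carotid stenosis effect on what remains.
resid = [yi - lpi for yi, lpi in zip(y, old_lp)]
a, b = ols_slope_intercept(cs, resid)
print(f"carotid stenosis coefficient with old LP as offset: {b:.3f}")
```

In R, for example, an `offset()` term in the model formula plays this role for GLMs and Cox models; SAS has an equivalent option.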

Tim Cole said that the relative size and quality of the two data sets 
should be considered.

Darren Greenwood suggested that stepwise selection procedures are 
dodgy for survival analyses, and that if an expert thinks terms 
should be in the model, then they should go in.

Jane Hutton suggested looking up papers by Doug Altman or Patrick 
Royston.

Anna Jones suggested a two-stage model.  The first model can be the 
original model with the original parameter estimates. And the second 
model can be just a model with carotid stenosis in. This way you can 
preserve the original parameter estimates and see if carotid stenosis
adds any predictive value.
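A minimal sketch of this two-stage idea (all data and predictions below are hypothetical, and a simple residual correlation stands in for a formal second-stage model): stage one uses the original model's predictions untouched; stage two asks whether carotid stenosis predicts what stage one leaves unexplained.

```python
# Two-stage sketch (hypothetical data). Stage 1: frozen predictions from
# the original model. Stage 2: does carotid stenosis correlate with the
# stage-1 residuals, i.e. does it add predictive value?

y      = [2.0, 3.1, 4.0, 5.2, 6.1, 7.0]   # observed outcome (hypothetical)
stage1 = [2.2, 2.9, 4.3, 4.8, 5.9, 6.6]   # original model, parameters fixed
cs     = [0.1, 0.4, 0.3, 1.0, 1.2, 1.7]   # carotid stenosis (hypothetical)

resid = [yi - pi for yi, pi in zip(y, stage1)]

# Pearson correlation between stage-1 residuals and carotid stenosis.
n = len(resid)
mr, mc = sum(resid) / n, sum(cs) / n
cov = sum((r - mr) * (c - mc) for r, c in zip(resid, cs)) / n
sdr = (sum((r - mr) ** 2 for r in resid) / n) ** 0.5
sdc = (sum((c - mc) ** 2 for c in cs) / n) ** 0.5
corr = cov / (sdr * sdc)
print(f"correlation of stage-1 residuals with carotid stenosis: {corr:.2f}")
```

A correlation clearly away from zero suggests carotid stenosis carries information the original model missed, while the original parameter estimates stay untouched.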

Nick Longford said the following - "A reformulation of the problem: 
There is an old analysis (without carotid), which yields biased 
prognosis. This analysis (probably) has a large sample size (small 
st. errors). There is a new analysis (with carotid), which yields an 
unbiased prognosis. This analysis has a smaller sample size (greater 
st. errors). Either prognosis is based on a fitted model, with 
estimated sampling variation.  How to combine the two prognoses, so 
as to benefit from the large sample size in the old analysis and the 
better model in the new analysis.  Answer: combine the two estimates 
(prognoses) so as to minimize the mean squared error (or another 
criterion).  If the clinician regards the old formula as sacred then, 
in the new analysis, instead of carotid use its orthogonal projection 
on the other covariates. Example of orthogonal projection: In  y = a 
+ bx + e, subtract the mean of x:  use  x - x-bar instead of x. In 
this way, you obtain a `correction' to the established model, and the 
pretense of sacro(sanctity) of the old model is maintained (sort of). 
Reference:  (Indirect) NTL, Multivariate shrinkage estimation of ..., 
JRSS B, 1999 (More direct)  NTL, Synthetic estimation with moderating 
influence.  Statistics in Medicine, submitted."  I failed to mention 
in my first message that the new dataset is 4 times as big as the old 
one.  Nick said the following about this - "If the new dataset is 4 
times bigger than the old one, then there is a serious problem with 
the clinician:  Consistency is preferred to precision (and donkey to 
diesel). I guess that the synthesis (previous message) would almost 
discard the old analysis, because the new dataset contains so much 
more information." 
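Both of Nick Longford's suggestions can be sketched numerically (all variances, biases, and data below are hypothetical, and the two estimates are assumed independent): the MSE-optimal weight on the old prognosis, and the simplest form of the orthogonal projection, i.e. centring the new covariate.

```python
# Sketch of (1) the MSE-optimal combination of an old (precise but biased)
# and a new (unbiased but noisier) prognosis, and (2) the simplest
# orthogonal projection: replacing x by x - x-bar. All numbers hypothetical.

def mse_optimal_weight(var_old, var_new, bias_old):
    # For c = w*old + (1-w)*new with independent estimates,
    # MSE(w) = w^2*(var_old + bias_old^2) + (1-w)^2*var_new,
    # which is minimised at w = var_new / (var_old + var_new + bias_old^2).
    return var_new / (var_old + var_new + bias_old ** 2)

w = mse_optimal_weight(var_old=0.04, var_new=0.16, bias_old=0.3)

old_prog, new_prog = 1.8, 2.4            # hypothetical prognoses, one patient
combined = w * old_prog + (1 - w) * new_prog

# Simplest orthogonal projection, as in the example 'in y = a + b*x + e,
# subtract the mean of x': the centred covariate is orthogonal to the
# intercept, so adding it leaves the established constant term alone.
cs = [0.2, 0.5, 0.9, 1.1, 1.6, 1.8]
cs_centred = [x - sum(cs) / len(cs) for x in cs]

print(f"weight on old prognosis: {w:.3f}, combined prognosis: {combined:.3f}")
```

With a new dataset four times the size of the old one, var_new shrinks relative to var_old and the weight w on the old analysis drops, which is exactly the "almost discard the old analysis" behaviour described above.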

Jim Slattery suggested combining both datasets to get new 
coefficients for all the original variables, and using these in 
conjunction with one of the suggested strategies to add stenosis
into the second dataset.  He also suggested that it might be useful 
to write a prognostic computer program so that a doctor could estimate 
survival of a patient using whatever subset of prognostic indicators 
that he/she had (many doctors won't know the level of carotid 
stenosis as they won't have access to the appropriate technology).
 
---------------------------------------
Original message:-
Dear All,
I am currently working on a prognostic model to predict survival 
after transient ischaemic attack and minor stroke, and I have a 
problem that I don't know the answer to. The problem is this:  A 
prognostic model  was created a while ago.  One of the clinicians 
here wanted to know  whether the degree of carotid stenosis would 
improve the model. 
However, carotid stenosis wasn't measured in the original data set. 
We have a new data set, with all the original prognostic variables  
measured in it, and also carotid stenosis.  Whatever method I use,  
carotid stenosis does improve the model, so now I have to go back to  
the clinician with a new 'improved' model.  There seem to be various 
methods of tackling this:
1. Completely redo the whole model using a whole new set of variables 
(some of the original variables are not related to prognosis in our 
data set).
2. Add all the original variables into the model using the new data 
set, and let their parameter estimates vary.  Then add carotid 
stenosis in.
3. Add the original variables in the form of the linear predictor 
from the original model.  Then add carotid stenosis in.  The 
parameter estimate for the linear predictor is nowhere near one, so 
although the parameter estimates of the original variables are being 
kept constant relative to one another, their actual magnitudes change 
considerably.
4. Add the original variables in the form of the linear predictor 
from the original model, and force its parameter estimate to be equal 
to 1 (I don't know how to do this).  Then add carotid stenosis in.
5. Do something else that I haven't thought of (any suggestions 
welcome).
I would be happiest doing either 1) or 2), but 
the clinician seems to think that the parameter estimates in the 
original model are somehow sacred and not to be changed. I'd 
appreciate any views, ammunition, references, etc.  I'll post a 
summary.

---------------------------------------------------
Dr. Stephanie C. Lewis  
Medical Statistician         
Bramwell Dott Building
Department of Clinical Neurosciences
Western General Hospital
Crewe Road
EDINBURGH         Tel: +44 (0) 131 537 2932
EH4 2XU           Fax: +44 (0) 131 332 5150
UK              Email: [log in to unmask]

