Many thanks for the replies which I received regarding
my recent query. I have posted the replies, along with the
query itself, in case anyone else might find the information
useful.
-----( Original query )-----
I am familiar with traditional diagnostic indices
(e.g. sensitivity and specificity) for quantifying
the efficacy of a particular test when the diagnosis
is based around TWO categories/groups (e.g. benign
versus malignant tumours).
My question is, are there any similar, established
methods/indices for quantifying diagnostic efficacy
when there are MULTIPLE categories/groups (e.g.
malignant tumours, totally benign tumours and
benign tumours regarded clinically as having an
increased chance of progressing into malignant disease
because of co-existent pathology)?
-----( Reply 1 )-----
One possibility would be to look at the agreement between your 'severity
grade' and an agreed gold standard (say, the grade as assigned by a
qualified histopathologist). One measure of this is the Kappa Statistic. You
will find a worked example on pp 116-8 of D. Altman's Statistics with
Confidence. Pages 303-9 of Practical Statistics for Medical Research by the
same author are also worth consulting.
While you could get a confidence interval for the kappa for your scale as a
whole, there wouldn't be anything strictly comparable to specificity because
there is no unique way of not falling into a given category.
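
As a rough illustration of the kappa calculation mentioned in this reply, here
is a minimal sketch of unweighted Cohen's kappa computed from a square
agreement table. The function name cohen_kappa and the counts are hypothetical,
made up for illustration only; rows are assumed to be the test's grade and
columns the gold-standard grade.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square table of counts.

    table[i][j] = number of cases graded i by the test and j by the
    gold standard (this layout is an assumption for the example).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Observed agreement: proportion of cases on the diagonal.
    p_obs = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance from the marginal totals.
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical counts for three grades (not real data).
table = [
    [20, 5, 0],
    [4, 15, 6],
    [1, 5, 24],
]
kappa = cohen_kappa(table)
print(round(kappa, 3))
```

For an ordered scale a weighted kappa may be more appropriate, since it gives
partial credit for near misses; the sketch above treats all disagreements as
equal.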
-----( Reply 2 )-----
I don't know of any references, but my approach would be to
extend the two-category method:
regard the multiple categories as a series of binary categories
and do sens/spec etc. for each. ("etc." includes PVP, PVN, and
possibly ROC curves.)
For an ordered scale (as above) you could use ordered categories:
malignant vs (benign, pathology or benign, no pathology)
& (malignant or benign, pathology) vs benign, no pathology.
You can use logistic regression, & superimpose the ROC curves.
It is all more complex than 2 categories, but that reflects a complex reality.
For unordered categories (say no disease, cancer, heart disease ...)
you might use cancer (Y/N), heart disease (Y/N) etc.
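
The "series of binary categories" idea in this reply can be sketched as
follows: each category in turn is treated as "positive" and all others as
"negative", and sens/spec and the predictive values are read off the collapsed
2x2 table. The function name one_vs_rest, the table layout (rows = test
result, columns = true category) and the counts are assumptions made up for
illustration.

```python
def one_vs_rest(table, labels):
    """Per-category sensitivity, specificity, PVP and PVN.

    table[i][j] = number of cases the test put in category i whose true
    category is j (hypothetical layout). Categories with empty margins
    would raise ZeroDivisionError and need separate handling.
    """
    k = len(labels)
    n = sum(sum(row) for row in table)
    results = {}
    for c in range(k):
        tp = table[c][c]
        fp = sum(table[c]) - tp                        # test says c, truth differs
        fn = sum(table[i][c] for i in range(k)) - tp   # truth is c, test differs
        tn = n - tp - fp - fn
        results[labels[c]] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "pvp": tp / (tp + fp),
            "pvn": tn / (tn + fn),
        }
    return results


labels = ["malignant", "benign, pathology", "benign, no pathology"]
# Hypothetical counts (not real data).
table = [
    [20, 5, 0],
    [4, 15, 6],
    [1, 5, 24],
]
stats = one_vs_rest(table, labels)
for name, values in stats.items():
    print(name, {key: round(v, 3) for key, v in values.items()})
```

For an ordered scale, the same function could be applied after merging
adjacent categories at each cut-point, as described earlier in this reply.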
-----( Reply 3 )-----
An established measure is the Likelihood Ratio (LR). You may use the LR+
and LR- for each cutoff point, or you can use the multilevel likelihood
ratio, which is the LR for each outcome of the test. The latter is not
used as much.
Centor RM. Estimating confidence intervals of likelihood ratios. Med Decis
Making 1992;12:229-33.
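
To make the two options in this reply concrete, here is a minimal sketch that
computes LR+ and LR- at a chosen cutoff, and the multilevel LR for each test
outcome. The function names, the ordering of levels (higher index = more
suspicious) and the counts are assumptions made up for illustration; no
confidence intervals are computed here.

```python
def lr_at_cutoff(diseased, non_diseased, cutoff):
    """LR+ and LR- when results at index >= cutoff count as positive.

    diseased[r] / non_diseased[r] = number of diseased / non-diseased
    cases with test result level r (hypothetical layout).
    """
    n_d, n_nd = sum(diseased), sum(non_diseased)
    sens = sum(diseased[cutoff:]) / n_d          # true positive rate
    fpr = sum(non_diseased[cutoff:]) / n_nd      # false positive rate = 1 - spec
    lr_pos = sens / fpr
    lr_neg = (1 - sens) / (1 - fpr)
    return lr_pos, lr_neg


def multilevel_lr(diseased, non_diseased):
    """LR for each result level: P(level | diseased) / P(level | non-diseased)."""
    n_d, n_nd = sum(diseased), sum(non_diseased)
    return [(d / n_d) / (nd / n_nd) for d, nd in zip(diseased, non_diseased)]


# Hypothetical counts per result level, lowest to highest (not real data).
diseased = [5, 15, 30]
non_diseased = [60, 30, 10]

lr_pos, lr_neg = lr_at_cutoff(diseased, non_diseased, cutoff=2)
level_lrs = multilevel_lr(diseased, non_diseased)
print(round(lr_pos, 3), round(lr_neg, 3))
print([round(x, 3) for x in level_lrs])
```

A level with no non-diseased cases would cause a division by zero, so sparse
levels may need to be merged before using this sketch.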
----------------------
----------------------------------------------------
David Manton, Ph.D. (Medical Physics)
YCR Centre for MR Investigations,
Hull Royal Infirmary, Anlaby Road, Hull, UK. HU3 2JZ
----------------------------------------------------