I agree that the example I chose would be extremely uncommon.
My purpose is to point out that tests divide initial populations into
sub-sets, and that entropy tends to go up in the test-positive sub-set and
down in the test-negative sub-set. The argument is easier to follow in an
idealised situation. In the real world the test for inclusion would probably
perform much less well than in my example.
What determines the change in entropy (and thus the information gain) is
the cut-off used, the initial probabilities in the unselected population,
and the distribution characteristics.
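A minimal Python sketch of the idealised case, assuming both populations are Gaussian and that values above the cut-off count as test positive. The function names and the target-population parameters are hypothetical, chosen only for illustration; the control parameters (mean 100, SD 25) echo the figures quoted later in this thread. Information gain is taken here as the prior entropy minus the weighted entropy of the two sub-sets.

```python
import math


def normal_cdf(x, mean, sd):
    """P(X <= x) for a Gaussian with the given mean and SD."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def binary_entropy(p):
    """Entropy in bits of a yes/no outcome with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def split_by_cutoff(prior, cutoff, target, control):
    """Split a population at `cutoff`; values above it are test positive.

    target and control are (mean, sd) tuples for the two populations.
    Returns P(test+), P(condition | test+) and P(condition | test-).
    """
    sens = 1.0 - normal_cdf(cutoff, *target)   # P(test+ | condition)
    fpr = 1.0 - normal_cdf(cutoff, *control)   # P(test+ | no condition)
    p_pos = prior * sens + (1.0 - prior) * fpr
    p_cond_pos = prior * sens / p_pos
    p_cond_neg = prior * (1.0 - sens) / (1.0 - p_pos)
    return p_pos, p_cond_pos, p_cond_neg


def information_gain(prior, cutoff, target, control):
    """Prior entropy minus the weighted entropy of the two sub-sets."""
    p_pos, p_cond_pos, p_cond_neg = split_by_cutoff(prior, cutoff, target, control)
    after = p_pos * binary_entropy(p_cond_pos) + (1.0 - p_pos) * binary_entropy(p_cond_neg)
    return binary_entropy(prior) - after


# Hypothetical example: 1% prior, cut-off at 140 units.
prior, cutoff = 0.01, 140.0
target, control = (160.0, 15.0), (100.0, 25.0)
p_pos, p_cond_pos, p_cond_neg = split_by_cutoff(prior, cutoff, target, control)
print(f"entropy before:         {binary_entropy(prior):.3f} bits")
print(f"entropy, test positive: {binary_entropy(p_cond_pos):.3f} bits")
print(f"entropy, test negative: {binary_entropy(p_cond_neg):.3f} bits")
print(f"information gain:       {information_gain(prior, cutoff, target, control):.3f} bits")
```

With these made-up numbers, entropy rises in the test-positive sub-set and falls in the test-negative one, while the weighted entropy of the whole population still drops.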
Since, in the early steps of the medical decision tree, the probability of
any target condition is often low, one learns much more about the excluded
group than about the included group. Binary entropy is greatest at a
probability of 0.5, so raising a low probability towards 0.5 can increase
entropy. Only when one starts from a probability of 0.5 or greater can a
test of inclusion be guaranteed to decrease entropy in the targeted group.
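The 0.5 threshold follows from the shape of the binary entropy curve. A minimal Python sketch (helper name and probabilities are illustrative assumptions) comparing entropy before and after a test of inclusion raises the probability of the target condition:

```python
import math


def binary_entropy(p):
    """Entropy in bits of a yes/no outcome with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


# (pre-test, post-test) probabilities in the test-positive group; illustrative only.
cases = [(0.05, 0.30), (0.20, 0.90), (0.60, 0.90)]
for pre, post in cases:
    change = binary_entropy(post) - binary_entropy(pre)
    print(f"p {pre:.2f} -> {post:.2f}: entropy change {change:+.3f} bits")

# Starting at or above 0.5, every increase in probability lowers entropy,
# because H(p) is strictly decreasing on [0.5, 1].
assert all(
    binary_entropy(pre + step) < binary_entropy(pre)
    for pre in (0.5, 0.6, 0.75)
    for step in (0.01, 0.1, 0.2)
)
```

The first case shows entropy rising; the second shows that a large enough shift can still lower entropy from a low starting point, which is why the 0.5 threshold concerns a guaranteed decrease rather than a possible one.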
The practical situation (but not the maths) gets more complex if one
considers multiple conditions. Since entropy is the negative of the sum of
p log p over all categories, one would need to know the distribution
characteristics for all target and control groups. One can, however, make
use of the concept of two populations: those with and those without the
target condition. Even if the first were Gaussian (unusual in any diseased
state), the second almost certainly will not be, since it is a mixture of
healthy people and people with every other condition. However, dividing the
population in that way makes examination of any newly proposed assay
easier. No longer is one interested in a physiologically normal control
group, just the population that has not got the condition, irrespective of
what else they have.
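A small Python sketch of the entropy formula, H = -sum(p log2 p), applied to a hypothetical case mix (the condition names and probabilities are invented for illustration). Collapsing everything except the target condition into one "without the condition" group needs only the target probability, which is the simplification described above.

```python
import math


def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping zero terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


# Hypothetical case mix in an unselected population (illustrative numbers only).
case_mix = {"target": 0.02, "condition_b": 0.05, "condition_c": 0.08, "no_disease": 0.85}

h_all = entropy(case_mix.values())
p_target = case_mix["target"]
h_two = entropy([p_target, 1.0 - p_target])  # with vs without the target condition

print(f"entropy over all categories: {h_all:.3f} bits")
print(f"entropy, target vs rest:     {h_two:.3f} bits")
```

Merging categories can never increase entropy, so the two-group figure is never larger than the full one; what it gives up is the detail about which non-target condition a person has.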
May I add one remark about Bayes? The principle that the probability of
(A and B) is the product of the two individual probabilities requires, of
course, that A and B are independent. When one introduces a new analyte, it
is on the basis that the target condition and the analyte are in some way
causally related. Thus any second proposed analyte will also be, at least
in principle, related to the first and not independent of it. So when one
tries to devise rules that define sub-sets using the AND construct, one must
either use the second analyte within the sub-set defined by the first,
redefine (A and B) as a new 'analyte', or use the distribution
characteristics of the two analytes to approximate the boundaries of the
new sub-set.
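To show why the independence assumption matters for an AND rule, here is a minimal Python sketch using an invented 2x2 table of results for two correlated analytes, A and B, in people with the target condition. Multiplying the marginal positive rates understates the joint rate; applying B within the sub-set already positive for A recovers it.

```python
# Hypothetical counts (not real data) of results for analytes A and B
# among 1000 people who have the target condition.
counts = {
    ("A+", "B+"): 600,
    ("A+", "B-"): 100,
    ("A-", "B+"): 100,
    ("A-", "B-"): 200,
}
n = sum(counts.values())

p_a = (counts[("A+", "B+")] + counts[("A+", "B-")]) / n
p_b = (counts[("A+", "B+")] + counts[("A-", "B+")]) / n
p_both = counts[("A+", "B+")] / n   # observed P(A+ and B+)
p_naive = p_a * p_b                 # valid only if A and B are independent
# B applied within the sub-set defined by A: P(B+ | A+)
p_b_within_a = counts[("A+", "B+")] / (counts[("A+", "B+")] + counts[("A+", "B-")])

print(f"observed P(A+ and B+): {p_both:.3f}")
print(f"P(A+) * P(B+) (naive): {p_naive:.3f}")
print(f"P(A+) * P(B+ | A+):    {p_a * p_b_within_a:.3f}")
```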
Trevor Tickner,
Norwich
> -----Original Message-----
> From: Sten Öhman [SMTP:[log in to unmask]]
> Sent: 22 November 2001 06:34
> To: [log in to unmask]
> Subject: Decisions, decisions
>
> At 2001-11-21 13:30 -0500, Roger Bertholf wrote:
> >Trevor makes an indisputable statistical argument leading to the conclusion
> >that sensitivity and specificity of clinical assays are limited by a
> >Heisenberg-like uncertainty principle--one can be optimized only at the
> >expense of the other.
>
> I don't think he thought about the Heisenberg uncertainty principle, but
> his example is, from a physiological view, wrong anyhow. Real test results
> seldom follow a Gaussian distribution, especially for values where the
> reference population averages 100 units and its SD is 25 units. (Yes, an SD
> can always be calculated, but this does not necessarily mean that the
> distribution is Gaussian.)
>
> Furthermore, he assumes that the target population has a _lower_ SD than
> the reference population. This should be a very rare occurrence. The target
> population consists of both individuals at the beginning of a pathological
> process and heavily diseased persons, whose values usually deviate far
> from those of non-diseased persons.
>
> >I would just add that realistic estimates should be based on
> >non-parametric statistics, which inevitably require more data than
> >parametric methods to reach an acceptable level of confidence.
>
> I agree! This means that, given a certain analytical result, there is a
> certain probability of being non-diseased and another probability of
> having disease A, B, C... The more "pathological" the value, the lower the
> probability of being non-diseased and the higher the probability of being
> diseased.
>
> By tradition we only use one or two decision points, i.e. the reference
> limit(s). Theoretically an unlimited number of decision points can be
> established, each of which has its own sensitivity and specificity.
>
> To get the predictive values, the Bayesian approach must also be
> considered, and this aspect is an important part of any method evaluation.
>
> Best wishes
> Mr Sten Öhman, PhD
>
>
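Sten's point that predictive values need the Bayesian step can be sketched in Python. The decision points below are hypothetical (sensitivity, specificity) pairs, not taken from any assay, and the 1% prevalence is likewise assumed.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Bayes' theorem for a dichotomised test: return (PPV, NPV)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical decision points along one analyte's range:
# a stricter cut-off trades sensitivity for specificity.
decision_points = [(0.99, 0.80), (0.90, 0.95), (0.70, 0.99)]
prevalence = 0.01
for sens, spec in decision_points:
    ppv, npv = predictive_values(sens, spec, prevalence)
    print(f"sens {sens:.2f}, spec {spec:.2f}: PPV {ppv:.3f}, NPV {npv:.4f}")
```

At low prevalence, even a fairly specific decision point gives a modest positive predictive value, which is the same effect as the entropy rising in the test-positive sub-set.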