
Two pairs of proportions are commonly used to characterise the performance of a screening or diagnostic test.

Sensitivity and specificity are conditioned on the TRUE state of the individual - diseased or healthy.

Positive and negative predictive values (PPV and NPV) are conditioned on the OBSERVED, APPARENT state of the individual.

In this context, it is normally assumed that sensitivity and specificity relate to the inherent performance of the test. If a test doesn't involve any subjective assessment by an observer, we would normally expect that these characteristics wouldn't be particularly influenced by whether the condition was common or rare in the population under consideration, and hence can be transferred to a different context.

The PPV and NPV relate to the application of the test to a PARTICULAR POPULATION, and depend HEAVILY on the prevalence. If the sample we use to form the 2 by 2 table is representative of the population of interest, then the observed PPV and NPV are the relevant ones to report.

However, we are often interested in how a particular test will behave in quite a DIFFERENT population. Typically, the original 2 by 2 table comes from a series of patients in secondary care, and we want to project how the test is likely to perform when applied to a series of patients in primary care, or to a population targeted by a screening programme. Such a series will have a MUCH LOWER PREVALENCE than the series on which the 2 by 2 table is based.
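
To make the dependence on prevalence concrete, here is a minimal Python sketch (not taken from the spreadsheet mentioned below) that holds sensitivity and specificity fixed and recomputes PPV and NPV by Bayes' theorem at several prevalences. The sensitivity of 0.90, specificity of 0.95 and the prevalences are hypothetical values chosen only for illustration.

```python
# Minimal sketch: PPV and NPV at several prevalences, holding sensitivity
# and specificity fixed. All numeric values are hypothetical.

def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


def npv(sens: float, spec: float, prev: float) -> float:
    """Negative predictive value from sensitivity, specificity and prevalence."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)


SENS, SPEC = 0.90, 0.95  # hypothetical test characteristics

# e.g. a secondary-care series, then progressively lower-prevalence settings
for prev in (0.30, 0.05, 0.01):
    print(f"prevalence {prev:.2f}: PPV {ppv(SENS, SPEC, prev):.3f}, "
          f"NPV {npv(SENS, SPEC, prev):.3f}")
```

With these assumed values, PPV falls from about 0.89 at 30% prevalence to about 0.15 at 1%, while NPV moves closer to 1.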

The Excel spreadsheet PPVNPV.xls, currently available from http://profrobertnewcomberesources.yolasite.com/, reads in a 2 by 2 table and calculates sensitivity and specificity, with confidence intervals by the Wilson method (J Am Stat Assoc 1927, 22, 209-212). It displays the projected PPV and 1-NPV, for when these results are applied to a DIFFERENT population with any prevalence between 0 and 1, flanked by corresponding confidence intervals.
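
For readers without Excel, the Wilson score interval itself is short enough to sketch. The function below is a plain Python version of the standard Wilson interval without continuity correction; it is not the spreadsheet's own code, and the counts (45 of 50 diseased testing positive, 190 of 200 healthy testing negative) are hypothetical. How the spreadsheet carries these intervals through to the projected PPV and 1-NPV is not reproduced here.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction).

    z = 1.959964 gives a nominal 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2))
    return centre - half_width, centre + half_width


# Hypothetical counts: 45 of 50 diseased test positive, 190 of 200 healthy test negative
sens_ci = wilson_interval(45, 50)
spec_ci = wilson_interval(190, 200)
print(f"sensitivity 0.900, 95% CI {sens_ci[0]:.3f} to {sens_ci[1]:.3f}")
print(f"specificity 0.950, 95% CI {spec_ci[0]:.3f} to {spec_ci[1]:.3f}")
```

For 45/50 this gives roughly 0.786 to 0.957, an interval that is not symmetric about the point estimate of 0.9 and that stays inside [0, 1].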

Re your final paragraph: I can see where they're coming from, but it's a pretty useless thing to say when the real solution is quite straightforward, as above, and certainly not 'rocket science'. If there is a weakness, it lies in the assumption that sensitivity and specificity are constant across populations and contexts. There are also problems with the inherent circularity of trying to define a 'gold standard' for assessing the true status of any individual.

Robert Newcombe
Cardiff


-----Original Message-----
From: A UK-based worldwide e-mail broadcast system mailing list [mailto:[log in to unmask]] On Behalf Of Kim Pearce
Sent: 08 August 2018 12:47
To: [log in to unmask]
Subject: Negative and Positive Predictive Values derivation - your views

Hello everyone,

I wonder if you have any views on the following?

We shall consider the following decision matrix, which cross-tabulates positive and negative diagnostic test results against positive and negative observed results:

                                  Observed
                          Event (+)   Non Event (-)
Predicted  Event (+)          a             b
           Non Event (-)      c             d

We define:

Total = n = a+b+c+d
P = Prevalence = (a+c)/n
Sensitivity = a/(a+c)
Specificity = d/(b+d)

We also define:

PPV = a/(a+b)     (1)
NPV = d/(c+d)     (2)
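
The definitions above translate directly into code. A minimal Python sketch, with the table laid out exactly as in the matrix (rows = test result, columns = observed status); the class name TwoByTwo and the counts are hypothetical, for illustration only:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical helper for the 2x2 table laid out as in the text:
    a = test +, event +      b = test +, non-event
    c = test -, event +      d = test -, non-event
    """
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    @property
    def prevalence(self) -> float:  # sample prevalence (a+c)/n
        return (self.a + self.c) / self.n

    @property
    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    @property
    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    @property
    def ppv(self) -> float:  # formula (1)
        return self.a / (self.a + self.b)

    @property
    def npv(self) -> float:  # formula (2)
        return self.d / (self.c + self.d)


t = TwoByTwo(a=45, b=10, c=5, d=190)  # hypothetical counts
print(t.n, t.prevalence, t.sensitivity, t.specificity, round(t.ppv, 3), round(t.npv, 3))
```

For these counts the sample prevalence is 50/250 = 0.2, sensitivity 0.9, specificity 0.95, PPV 45/55 (about 0.818) and NPV 190/195 (about 0.974).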

PPV can also be expressed as:

PPV = (Sensitivity x P) / ((Sensitivity x P) + ((1-Specificity) x (1-P)))     (3)

NPV can also be expressed as:

NPV = (Specificity x (1-P)) / ((Specificity x (1-P)) + (P x (1-Sensitivity)))     (4)

Now if P in (3) and (4) is (a+c)/n, then (3) and (4) are equivalent to (1) and (2) respectively.
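
The equivalence follows because, with P = (a+c)/n, Sensitivity x P = a/n and (1-Specificity) x (1-P) = b/n, so (3) reduces to a/(a+b); likewise Specificity x (1-P) = d/n and P x (1-Sensitivity) = c/n, so (4) reduces to d/(c+d). A short Python check using exact rational arithmetic (fractions.Fraction), on hypothetical counts:

```python
from fractions import Fraction


def ppv_bayes(sens, spec, p):
    """Formula (3)."""
    return (sens * p) / (sens * p + (1 - spec) * (1 - p))


def npv_bayes(sens, spec, p):
    """Formula (4)."""
    return (spec * (1 - p)) / (spec * (1 - p) + p * (1 - sens))


def check_equivalence(a, b, c, d):
    """True if (3) and (4) with P = (a+c)/n reproduce (1) and (2) exactly."""
    n = a + b + c + d
    p = Fraction(a + c, n)
    sens = Fraction(a, a + c)
    spec = Fraction(d, b + d)
    return (ppv_bayes(sens, spec, p) == Fraction(a, a + b)
            and npv_bayes(sens, spec, p) == Fraction(d, c + d))


print(check_equivalence(45, 10, 5, 190))  # hypothetical counts; prints True
```

Exact fractions are used so that the comparison tests the algebra rather than floating-point rounding.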

However, what if the real prevalence of the disease in the population does not equal (a+c)/n, i.e. if the value of (a+c)/n in our *study* does not equal the real pre-test probability of the disease? Am I correct in thinking that we should always use formulas (3) and (4) to calculate PPV and NPV, inputting P as the real (population) prevalence of the disease taken from the literature, rather than assuming that (a+c)/n is an accurate estimate of the real population prevalence?



In one document I read that "NPV and PPV should only be used if the ratio of the number of patients in the disease group and the number of patients in the healthy control group used to establish the NPV and PPV is equivalent to the prevalence of the diseases in the studied population, or, in case two disease groups are compared, if the ratio of the number of patients in disease group 1 and the number of patients in disease group 2 is equivalent to the ratio of the prevalences of the two diseases studied".  I am not clear as to the reasoning behind this and I'd appreciate any views.
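
One way to see the reasoning behind the quoted advice: formulas (1) and (2) implicitly use the sample prevalence (a+c)/n, which in a case-control design is fixed by how many cases and controls were recruited, not by the disease. The Python sketch below (hypothetical counts; the population prevalence of 0.02 is an assumed value, not one from the literature) keeps the cases fixed and enlarges the healthy group. Sensitivity and specificity stay the same, the sample PPV from (1) changes with the case:control ratio, and the PPV from (3) with the assumed population prevalence does not.

```python
# Sketch: effect of the case:control ratio on the sample PPV. Counts and
# the population prevalence are hypothetical.

def summarise(a: int, b: int, c: int, d: int, population_prev: float):
    """Return (sens, spec, sample prevalence, PPV by (1), PPV by (3))."""
    sens = a / (a + c)
    spec = d / (b + d)
    sample_prev = (a + c) / (a + b + c + d)
    ppv_sample = a / (a + b)  # formula (1): uses the sample prevalence implicitly
    ppv_pop = (sens * population_prev) / (
        sens * population_prev + (1 - spec) * (1 - population_prev)
    )  # formula (3) with an external prevalence
    return sens, spec, sample_prev, ppv_sample, ppv_pop


A, B, C, D = 90, 5, 10, 95   # hypothetical: 100 cases, 100 healthy controls
POPULATION_PREV = 0.02       # hypothetical population prevalence

for k in (1, 4, 49):         # healthy controls recruited per case
    sens, spec, sample_prev, ppv1, ppv3 = summarise(A, k * B, C, k * D, POPULATION_PREV)
    print(f"{k:>2} controls/case: sens {sens:.2f}, spec {spec:.2f}, "
          f"sample prevalence {sample_prev:.3f}, PPV (1) {ppv1:.3f}, PPV (3) {ppv3:.3f}")
```

Only at 49 controls per case, where the sample prevalence equals the assumed 0.02, do (1) and (3) agree (both 90/335, about 0.269), which is the condition the quoted document describes. Using (3) and (4) with an external prevalence avoids needing that condition, provided sensitivity and specificity do carry over to the new population.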



Many thanks, in advance, for your opinion on these issues.



All the best,

Kim




Dr Kim Pearce PhD, CStat, Fellow HEA
Senior Statistician
Faculty of Medical Sciences Graduate School
Room 3.14, 3rd Floor, Ridley Building 1
Newcastle University
Queen Victoria Road
Newcastle Upon Tyne
NE1 7RU

Tel: (0044) (0)191 208 8142



You may leave the list at any time by sending the command

SIGNOFF allstat

to [log in to unmask], leaving the subject line blank.