Thank you for your interest in this discussion and for pointing out the two papers.  In the Oxford Handbook of Clinical Diagnosis, I also explain mathematically why specificity and likelihood ratios inevitably change with prevalence.  I was particularly interested in your point that, on top of the problems with identifying reasonable indices of diagnostic usefulness, there are also difficulties with applying such group-based information to a subsequent individual patient.  I hope that I will be forgiven for responding to this in some detail from the viewpoint of a clinician with a special interest in mathematics.

The axioms of probability imply that the probability of an event in an individual (such as discovering a diagnostic criterion or a specified outcome) is equal to the proportion with that event in a set of all such individuals with the same conditional evidence.  So if there are 8 patients with the diagnostic criterion out of 9 who also have the conditional findings, then the probability for a member of that set is 8/9 = 0.889.  However, when a new 10th patient appears without a known diagnosis, the new proportion will be either 8/10 or 9/10.  The corresponding 'induced' probability will thus be 0.8 or 0.9.  In other words, if an 'induced' probability is based on little experience, the gap between the pair will be large, whereas for probabilities based on a large data set, the gap will approach 0 and the probability will be a single value.  Such a pair of values can be used to calculate the original proportions so that confidence limits etc. can also be calculated.
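The arithmetic in the paragraph above can be sketched in a few lines of Python. This is only an illustration of the 8-out-of-9 example: the function name `induced_pair` is made up here, and the larger counts are hypothetical, simply scaled up in the same ratio to show the gap narrowing.

```python
from fractions import Fraction

def induced_pair(events, total):
    """Return the two possible proportions once one more patient, whose
    outcome is not yet known, joins a set of `total` patients of whom
    `events` have the event."""
    low = Fraction(events, total + 1)       # the new patient turns out not to have the event
    high = Fraction(events + 1, total + 1)  # the new patient turns out to have the event
    return low, high

# The example from the text: 8 of 9 patients, then a 10th patient arrives.
low, high = induced_pair(8, 9)
print(float(low), float(high))  # 0.8 0.9

# Hypothetical larger sets in the same 8:9 ratio; the gap is always 1 / (total + 1).
for events, total in [(8, 9), (80, 90), (800, 900), (8000, 9000)]:
    low, high = induced_pair(events, total)
    print(total, round(float(low), 4), round(float(high), 4), round(float(high - low), 4))
```

The gap between the pair depends only on the size of the set, which is why it carries the same information as the original counts.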

The problem is to estimate these proportions and probabilities for a group of patients identical to the one in front of us.  When I ask my colleagues and students on ward rounds to do this we come up with surprisingly similar answers, although this does not mean that we are right.  We guess them 'subjectively' as direct ‘predictive values’ from studies, word of mouth or personal experience.  I appreciate that many prefer to estimate such predictive values by combining subjective priors with observed likelihoods in a Bayesian manner.  What we also do intuitively, however, is to ‘learn from experience’ by adjusting our sense of certainty if our probabilities prove to be over-confident or under-confident.  In other words, we calibrate them informally by checking whether predictions made with a probability of ‘x or y’ turn out to be correct in a proportion of cases between x and y and, if not, adjusting our future guesses.  Perhaps this should be a part of formal medical audit.
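As a rough sketch of what such an audit might look like, the Python below groups past predictions by the probability that was stated and compares each with the proportion later confirmed. The function name and the audit log are entirely hypothetical, and for simplicity each prediction carries a single stated probability rather than the 'x or y' pair described above.

```python
from collections import defaultdict

def calibration_by_stated_value(records):
    """For each stated probability, return (number of cases,
    observed proportion in which the diagnosis was confirmed)."""
    groups = defaultdict(list)
    for stated, confirmed in records:
        groups[stated].append(confirmed)
    return {
        stated: (len(outcomes), sum(outcomes) / len(outcomes))
        for stated, outcomes in sorted(groups.items())
    }

# Hypothetical audit log: (probability stated at the bedside, diagnosis later confirmed?)
records = [
    (0.9, True), (0.9, True), (0.9, True), (0.9, False), (0.9, True),
    (0.5, True), (0.5, False), (0.5, False), (0.5, True),
    (0.2, False), (0.2, False), (0.2, True), (0.2, False), (0.2, False),
]

for stated, (n, observed) in calibration_by_stated_value(records).items():
    print(f"stated {stated:.1f}: {n} cases, observed {observed:.2f}")
```

In this made-up log the predictions stated as 0.9 were confirmed only 80% of the time, which is the kind of over-confidence that would prompt an adjustment.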

This approach was described in my MD thesis in my younger days and currently in the Oxford Handbook of Clinical Diagnosis.  I also describe a new ‘theorem’ for estimating the proportion of times a diagnosis or other outcome is confirmed when reasoning by probabilistic elimination (which we do a lot when reasoning transparently or ‘out loud’ in wards and clinics).  This has many advantages over multiplying ‘prior odds’ by likelihood ratios based on sensitivity and ‘one minus the specificity’ (which I have never been aware of anyone doing when dealing with patients).  This approach is probably most helpful when setting dichotomising cut-off points for screening tests using ROC curves and when inventing new clinical prediction rules, which can then be calibrated too and used as tests to help predict diagnoses or other outcomes.
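For readers less familiar with the conventional calculation mentioned above (prior odds multiplied by a likelihood ratio), a minimal Python sketch follows. It illustrates only that textbook method, not the theorem described in the Handbook; the function name and the example figures (pre-test probability 0.1, sensitivity 0.9, specificity 0.8) are hypothetical.

```python
def post_test_probability(pre_test_probability, sensitivity, specificity, positive=True):
    """Update a pre-test probability with a likelihood ratio, working in odds."""
    if positive:
        likelihood_ratio = sensitivity / (1 - specificity)   # LR+ for a positive result
    else:
        likelihood_ratio = (1 - sensitivity) / specificity   # LR- for a negative result
    prior_odds = pre_test_probability / (1 - pre_test_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Hypothetical figures, chosen only to make the arithmetic easy to follow.
print(round(post_test_probability(0.1, 0.9, 0.8, positive=True), 3))
print(round(post_test_probability(0.1, 0.9, 0.8, positive=False), 3))
```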

The diagnostic predictions made during screening and during differential diagnosis depend on having sensible diagnostic criteria based on the best combinations of findings that predict some benefit or no benefit.  These in turn are closely linked to treatment indication criteria and findings that provide the most accurate estimations of risk reduction or probability of benefit during RCTs.  These should be obtained from a more detailed analysis of clinical trials, which also should be planned accordingly.  This is also discussed in detail in the Oxford Handbook of Clinical Diagnosis.

Huw

Dr D E H Llewelyn MD FRCP

General physician and Endocrinologist

Hon Fellow in Mathematics

Aberystwyth University

Sent: 10 February 2015 13:16
Subject: Re: Genetic tests and Predictive validity

Thanks to you all for an interesting discussion.

Sensitivity, specificity, negative and positive predictive values are all group-based statistics

So are relative risks and risk differences, estimated in RCT of interventions….

As elsewhere in EBM, applying group-based statistics to the problems of an individual patient requires additional steps and assumptions, some of which are problematic.

Yes, sensitivity and specificity can (also) vary with prevalence.

Leeflang MM, Bossuyt PM, Irwig L.

Diagnostic test accuracy may vary with prevalence: implications for evidence-based diagnosis.

J Clin Epidemiol. 2009 Jan;62(1):5-12. doi: 10.1016/j.jclinepi.2008.04.007.

There is definitely “spin” in reporting test accuracy studies, as was noted in some contributions. Sometimes the primary outcome measure (as registered) changes to negative predictive value in the final publication. You can guess why…

Korevaar DA, Ochodo EA, Bossuyt PM, Hooft L.

Publication and reporting of test accuracy studies registered in ClinicalTrials.gov.

Clin Chem. 2014 Apr;60(4):651-9. doi: 10.1373/clinchem.2013.218149.

Patrick Bossuyt

AMC - University of Amsterdam

Sent: Monday, 9 February, 2015 23:17
Subject: Re: Genetic tests and Predictive validity

I agree that the terminology for diagnosis is ambiguous and probably confusing for those not immersed in its practical application day in, day out.

A diagnostic test in its broad sense is any test that leads to a diagnosis, but the term also covers tests used when deciding to treat (a form of diagnostic refinement) and when monitoring the outcome.

A symptom or physical sign is the 'result' of the 'test' of listening or examining the patient. Symptoms, signs and test results are all 'diagnostic findings'. The use of 'diagnostic' in this sense does not imply confirmatory (we say at times that findings are 'diagnostic', i.e. 'pathognomonic'). It is combinations of findings that usually confirm a diagnosis.

I regard a screening test result as a form of presenting complaint that also leads to a differential diagnosis. Both bring to our attention patients with a higher probability of a 'diagnosis of interest' in a big population. The subsequent reasoning may lead to changing the probabilities of the differential diagnoses and hopefully confirming one of them by showing the presence of a 'sufficient' diagnostic criterion. (It is at this stage that 'over-diagnosis' happens - because of faulty definitive diagnostic criteria.)

We then hope to show that the expected benefits from a treatment (e.g. avoiding metastases) outweigh the expected harms. (This can be modelled using Decision Analysis.) Some findings are better at doing this than others. As far as I can understand, it is this final stage that Teresa's data was about.

I explain how to obtain evidence for the value of 'diagnostic' findings at these different stages of the medical problem solving process in the final chapter of Oxford Handbook of Clinical Diagnosis.

Huw

Date: Mon, 9 Feb 2015 12:09:54 +0000

Subject: Re: Genetic tests and Predictive validity

As Huw recently shared, the evaluation of diagnostic/predictive tests can differ depending on the purpose.  Huw’s list was:

1.       For population screening

2.       For differential diagnosis

3.       For diagnostic confirmation

4.       For diagnostic exclusion

5.       For predicting outcomes (predicting future risk)

These concepts are further complicated by imprecise use of language.    Many of us use “screening” to mean testing for a diagnosis in people with no symptoms.   In this context screening differs from diagnostic testing not so much in the science/math/statistical approach but often in the baseline risk (lower prevalence/baseline risk/pretest probability in the screened population) and in the values/preferences for weighing benefits and harms – leading many to consider a higher threshold for confidence in benefit (greater demand for evidence for benefit) to recommend screening for an asymptomatic person than to recommend a diagnostic test for a symptomatic person.

But this does get confused in general language because testing is often a multi-stage process, so the terminology used could be a “screening test” and a “confirmatory test” and that language may get used for screening or diagnosis in the earlier description of the terms.

So there is a substantial problem with the terminology when the terms themselves are used in many different ways.

A diagnostic test is a test used in symptomatic persons (to distinguish from a screening test)

A diagnostic test is a test which is able to confirm the diagnosis (as distinct from earlier testing that increases or decreases our suspicion for the diagnosis)

A diagnostic test is any test that implies an increase or decrease in the likelihood of the condition (and thus includes all the other tests noted above by any term)

A diagnostic test is used to describe the result of the test rather than the test itself.  If we have certainty after testing then it was a diagnostic test.

All of this makes communication and education around diagnostic testing more challenging.

Brian S. Alper, MD, MSPH, FAAFP

Founder of DynaMed
Vice President of EBM Research and Development, Quality & Standards

dynamed.ebscohost.com

Sent: Monday, February 02, 2015 7:49 AM
Subject: Re: Genetic tests and Predictive validity

Dear All,

Brian, you said:

"But another consideration is sometimes tests are used for “diagnostic” purposes – Does the patient have or not have a certain diagnosis? – and in these cases sensitivity, specificity, PPV*, NPV*, positive likelihood ratio, and negative likelihood ratio (* with prevalence to put into perspective) are clear."

What about the screening situation, e.g. a breast cancer screening mammography leads to a biopsy and a pathology report? If the report is genuinely 'borderline', e.g. the pathologist reports seeing some kind of atypia, 'indolent changes', in-situ changes etc. (changes for which I think there is no evidence for any net benefit of treatment; ref below), how much clarity is there then? Is this a kind of 'no gold standard' situation? So do the so-called diagnostic test's ROC curve(s) become guesswork? Maybe this shouldn't be called a diagnostic test?

Owen;

(Esserman LJ, Thompson IM, Reid B. Overdiagnosis and overtreatment in cancer: an opportunity for improvement. JAMA. 2013 Aug 28; 310(8):797-8.)

Dear all,

The question is, though: is the genetic test you want to evaluate actually used as a diagnostic test? From what I understand of the case mentioned by Teresa, she is interested not in whether a genetic test accurately recognizes whether you have a certain genotype but in whether you will develop a certain phenotype in the future. So you are not trying to find out whether the patient currently suffers from a condition but the risk of developing a condition in the future. Now, unless you have a dominant gene that will always lead to the expression of a certain phenotype (as in Huntington's), you need to consider whether that genotype is just one of many factors that can lead to a certain condition. For the examples mentioned, like MammaPrint, prognostic modelling seems much more appropriate to me than diagnostic accuracy, though ultimately you need RCTs to prove that they improve patient-reported outcomes, and from what I last saw those don’t exist.

Best wishes, Heike

Heike Raatz, MD, MSc

Basel Institute for Clinical Epidemiology and Biostatistics

Hebelstr. 10

4031 Basel

Tel.: +41 61 265 31 07

Fax: +41 61 265 31 09

Sent: Monday, February 02, 2015 10:04 AM

Subject: Re: Genetic tests and Predictive validity

Dear All,

Is it, or is it not, correct that one should follow the classic teaching that (loosely and notwithstanding the false partition here): from patient's perspective, sensitivity/specificity are what is relevant; and from clinician's perspective, PPV/NPV are what is relevant?

Also, it is, isn't it, crucial to consider the media take on outcome of research and how careful researchers need to be in selecting the way they present the outcome of their research? MRI (NMR) in diagnosing autism is one example that springs to mind - the high sensitivity was jumped on by the media presenting it as a very accurate test missing the role of the varying prevalence in certain settings.

I find this discussion trail hugely thought provoking!

Best Regards

Majid

At the same time, don’t we need to know whether the patient probably has or does not have the condition of interest?  Yes, prevalence and other factors affect PPV and NPV, but in my opinion we need to move away from the oversimplified notion that test interpretation depends on a single factor.

Sent: Friday, January 30, 2015 8:08 PM
Subject: Re: Genetic tests and Predictive validity

Hi Teresa,

You are absolutely correct.  This is why we should demand that diagnostic studies ONLY present the results of Sensitivity, Specificity and Likelihood ratios.

This issue has been a serious problem for many years and it is about time that more people spoke up about it.  Also, journal editors and peer reviewers should be up in arms against the practice of reporting PPV and NPV.

Best wishes

Dan

Sent: Friday, January 30, 2015 11:59 AM
Subject: Genetic tests and Predictive validity

I’ve just started reading the literature on genetic tests, and noticing how many of them tend to focus on predictive value—that is, if a certain test accurately predicts whether a patient will or won’t get a particular phenotype (condition), the authors suggest the test should be used.  But if we’re deciding whether to order the test in the first place, shouldn’t we be focused on sensitivity and specificity instead, not PPV and NPV?  Predictive value is so heavily dependent on disease prevalence.  For example, if I want to get tested for a disease with a 2% prevalence in people like me, I could just flip a coin and regardless of the outcome, my “Coin Flip Test” would show an NPV of 98%!  So what does NPV alone really tell me, if I’m not also factoring out prevalence—which would be easier done by simply looking at sensitivity and specificity?  Someone please tell me where my thinking has gone awry!
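The "Coin Flip Test" arithmetic can be checked with a short Python sketch. A fair coin is treated here as a test with 50% sensitivity and 50% specificity; the function name is made up for illustration, and the second test (90% sensitivity and specificity) is a hypothetical comparison at the same 2% prevalence.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Coin flip: positive half the time regardless of disease status, 2% prevalence.
ppv, npv = predictive_values(0.5, 0.5, 0.02)
print(round(ppv, 3), round(npv, 3))  # 0.02 0.98

# Hypothetical, genuinely informative test at the same prevalence, for comparison.
ppv, npv = predictive_values(0.9, 0.9, 0.02)
print(round(ppv, 3), round(npv, 3))
```

The coin's NPV simply equals one minus the prevalence and its PPV equals the prevalence, so neither figure says anything about the test itself.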

For a concrete example, look at MammaPrint, a test which reports binary results.  In addition to hazard ratios, study authors often tout statistically significant differences between the probabilities of recurrence-free survival in the MammaPrint-High Risk vs. MammaPrint-Low Risk groups (essentially the test’s predictive values).  In the RASTER study (N = 427), 97% of the patients with a “Low Risk” test result did not experience metastasis in the next 5 years.  Sounds great, right?  But when you look at Sensitivity, you see that of the 33 patients in the study who did experience metastasis, only 23 of them were classified as “High Risk” by MammaPrint, for a 70% sensitivity.  If patients and clinicians are looking for a test to inform their decision about adjuvant chemotherapy for early stage breast cancer, wouldn’t the fact that the test missed 10 out of 33 cases be more important than the 97% NPV, an artifact of the low 5-year prevalence of metastasis in this cohort (only 33 out of 427, or 7.7%)?
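The counts quoted in this paragraph can be checked directly. The Python below uses only the three numbers given in the text (427 patients, 33 with metastasis, 23 of those classified "High Risk"); the variable names are made up here. Specificity is not computed because the counts for patients without metastasis are not given in this message.

```python
cohort = 427             # patients in the study
metastases = 33          # patients who experienced metastasis within 5 years
flagged_high_risk = 23   # of those 33, classified "High Risk" by the test

sensitivity = flagged_high_risk / metastases
missed = metastases - flagged_high_risk
proportion_with_metastasis = metastases / cohort

print(f"sensitivity: {sensitivity:.1%}")                               # 69.7%
print(f"missed cases: {missed}")                                       # 10
print(f"proportion with metastasis: {proportion_with_metastasis:.1%}")  # 7.7%
```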

Drukker et al. A prospective evaluation of a breast cancer prognosis signature in the observational RASTER study. Int J Cancer 2013. 133(4):929-36. http://www.ncbi.nlm.nih.gov/pubmed/23371464

Retel et al. Prospective cost-effectiveness analysis of genomic profiling in breast cancer. Eur J Cancer 2013. 49:3773-9. http://www.ncbi.nlm.nih.gov/pubmed/23992641  (Provides actual true/false positive/negative results)

Thanks so much!

Teresa Benson, MA, LP

McKesson Health Solutions
18211 Yorkshire Ave

Prior Lake, MN  55372

Phone: 1-952-226-4033


--

Dr Majid Artus PhD
NIHR Clinical Lecturer in General Practice
Arthritis Research UK Primary Care Centre

Research Institute for Primary Care & Health Sciences

Keele University
Staffordshire, ST5 5BG
Tel: 01782 734826
Fax: 01782 733911
http://www.keele.ac.uk/pchs/


--

Owen Dempsey MBBS MSc MRCGP RCGP cert II

07760 164420

GPwsi Substance Misuse Locala and Kirklees Lifeline Addiction Service
