I don't think we are ready to reject sensitivity and specificity as the determinants of test accuracy (and their derivatives, LRs).  Some tests may appear prevalence-dependent, but that often hints at bias in the original research, which is frequently based on the severe end of the disease spectrum.

 

Nick

 

 

 

 

From: Evidence based health (EBH) [mailto:[log in to unmask]] On Behalf Of Huw Llewelyn [hul2]
Sent: Monday, February 22, 2016 3:02 PM
To: [log in to unmask]
Subject: Re: Likelihood ratios

 

Dear Xiaomei and Mike

 

I agree that sensitivity can also change in different settings, but usually because of disease severity, for example when more severe patients are referred to hospital.  However, if severe versions of all diseases are referred, then all their 'sensitivities' may increase, so that their sensitivity ratios remain similar.  Likelihood ratios are also affected by the disease severity of 'other diagnoses', but in addition to this, LRs change with disease prevalence, and this has a far more dramatic effect than anything else.  During differential diagnostic reasoning we look for big differences in 'sensitivity' (e.g. right lower quadrant tenderness occurs commonly in appendicitis but rarely in cholecystitis) and these big differences are not materially affected.
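As a minimal numerical illustration of that point (the 'sensitivities' and the referral factor below are invented, and a simple proportional shift is assumed):

```python
# Hypothetical 'sensitivities' of right lower quadrant tenderness in two diagnoses
sens_appendicitis = 0.80
sens_cholecystitis = 0.05

# Suppose hospital referral enriches for severe disease and scales both up similarly
referral_factor = 1.15
sens_appendicitis_hosp = sens_appendicitis * referral_factor
sens_cholecystitis_hosp = sens_cholecystitis * referral_factor

print(sens_appendicitis / sens_cholecystitis)            # 16.0 in the community
print(sens_appendicitis_hosp / sens_cholecystitis_hosp)  # still 16.0 in hospital
```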

 

Oxford University Press often allows access to parts of the Oxford Handbook of Clinical Diagnosis e.g. see:

 

 https://books.google.co.uk/books?hl=en&lr=&id=rNFLBAAAQBAJ&oi=fnd&pg=PR2&dq=oxford+handbook+of+clinical+diagnosis&ots=K27qGjmy3V&sig=ClpYhWLQtfOtDtYctA1e4IVrxz4#v=onepage&q=oxford%20handbook%20of%20clinical%20diagnosis&f=false

 

OR

 

http://www.amazon.co.uk/dp/019967986X/ref=pd_lpo_sbs_dp_ss_1?pf_rd_p=569136327&pf_rd_s=lpo-top-stripe&pf_rd_t=201&pf_rd_i=0199232962&pf_rd_m=A3P5ROKL5A1OLE&pf_rd_r=1EMVZGN41TJ2HGWVMJ74#reader_019967986X

 

 

With best wishes

 

Huw


From: Evidence based health (EBH) <[log in to unmask]> on behalf of Brown Michael <[log in to unmask]>
Sent: 22 February 2016 21:52
To: [log in to unmask]
Subject: Re: Likelihood ratios

 

Estimates for LRs, as well as sensitivity and specificity, may vary depending on disease severity, not necessarily with changes in disease prevalence.

Mike

 

Michael Brown, MD, MSc
Professor and Chair, Emergency Medicine
Michigan State University College of Human Medicine

[log in to unmask]
cell: 616-490-0920


 

On Feb 22, 2016, at 4:41 PM, "Yao, Xiaomei" <[log in to unmask]> wrote:



Hi Huw,

 

I am going to teach graduate students how to develop clinical practice guidelines on diagnostic topics. Thank you very much for your email confirming what I thought. One additional question for you and other experts to discuss:

 

Since the LR can vary with prevalence, the following nomogram would not be useful, because the nomogram seems to assume that a test’s LR does not change with the prevalence of a disease, correct?

 

<image001.png>

 

 

By the way, does the Oxford Handbook of Clinical Diagnosis you mentioned have a free link for us to access, like the DTA Cochrane Handbook?

 

Thanks,

Xiaomei

 

 

From: Huw Llewelyn [hul2] [mailto:hul2@aber.ac.uk] 
Sent: February-22-16 3:17 PM
To: [log in to unmask]; Yao, Xiaomei
Subject: Re: Likelihood ratios

 

Xiaomei

Yes, the likelihood ratio (LR) does vary greatly with prevalence, and this makes it unsuitable for estimating diagnostic probabilities during clinical diagnosis (it is suitable only for calculating probabilities for single screening tests and single diagnoses).  You are correct in saying that a 2x2 table will allow you to determine the probability of a diagnosis given a finding (the PPV), the probability of the finding given the diagnosis (the ‘sensitivity’), the ‘specificity’ and the likelihood ratio, etc.  Their interrelationship is clear for a single finding and a single diagnosis.  These terms work well in epidemiology when using one screening test for one diagnosis with a fixed prevalence of the diagnosis.
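As a minimal sketch of those interrelationships (the 2x2 counts below are invented purely for illustration):

```python
# Hypothetical 2x2 table for one finding and one diagnosis
#                 disease present   disease absent
# finding present        a=90             b=50
# finding absent         c=10             d=850
a, b, c, d = 90, 50, 10, 850

sensitivity = a / (a + c)          # P(finding | disease)       = 0.90
specificity = d / (b + d)          # P(no finding | no disease) ~ 0.944
ppv = a / (a + b)                  # P(disease | finding)       ~ 0.643
npv = d / (c + d)                  # P(no disease | no finding) ~ 0.988
lr_positive = sensitivity / (1 - specificity)   # ~ 16.2
prevalence = (a + c) / (a + b + c + d)          # 0.10

print(sensitivity, specificity, ppv, npv, lr_positive, prevalence)
```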

When it comes to considering a differential diagnosis and estimating the probability of one of those diagnoses based on a combination of findings, the specificity, false positive rate and likelihood ratio are no good.  The problem is that if the disease is rare, the false positive rate becomes very low.  By assuming statistical independence for the different false positive rates and multiplying them together, one gets a very inaccurate and falsely impressive combined false positive rate.  When this is used to estimate a likelihood ratio for the combination of findings, that too is very inaccurate (and falsely impressive).  As a result the probability for the diagnosis is inaccurate and falsely impressive (e.g. 0.999 when it is really 0.7).
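A small numerical illustration of this inflation (all the numbers below are invented): if three findings tend to occur together in patients with 'other diagnoses', their joint false positive rate is far higher than the product of the individual false positive rates, so multiplying the three LRs gives a falsely impressive combined LR and posterior probability.

```python
prior = 0.10                      # hypothetical pre-test probability of the diagnosis
sens_each = 0.90                  # each of 3 findings: P(finding | diagnosis)
fpr_each = 0.10                   # each finding: P(finding | other diagnoses)

# Naive combination: assume the findings are independent in both groups
naive_lr = (sens_each / fpr_each) ** 3                     # 9^3 = 729
naive_odds = (prior / (1 - prior)) * naive_lr
naive_prob = naive_odds / (1 + naive_odds)                 # ~0.988

# Suppose, in reality, patients with 'other diagnoses' who have one finding
# usually have all three, so the joint false positive rate is far higher
joint_sens = 0.73                 # hypothetical P(all 3 findings | diagnosis)
joint_fpr = 0.08                  # hypothetical P(all 3 findings | other diagnoses)
true_lr = joint_sens / joint_fpr                           # ~9.1
true_odds = (prior / (1 - prior)) * true_lr
true_prob = true_odds / (1 + true_odds)                    # ~0.50

print(naive_prob, true_prob)      # falsely impressive vs. more realistic
```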

‘Sensitivities’ and ratios of sensitivities do not suffer from this problem, and it can be avoided by not making an assumption of statistical independence at all, as I explain in the Oxford Handbook of Clinical Diagnosis.  I really think that teachers of EBM should stop advocating the use of specificity, false positive rates and likelihood ratios for clinical diagnosis.

Best

Huw

Huw Llewelyn MD FRCP

Consultant Physician in endocrinology, internal and acute medicine

Honorary Fellow in Mathematics, Aberystwyth University, UK

 

 


From: Evidence based health (EBH) <[log in to unmask]> on behalf of Yao, Xiaomei <[log in to unmask]>
Sent: 22 February 2016 19:47
To: [log in to unmask]
Subject: Re: Likelihood ratios

 

Thanks Nickolas,

 

My further question is: can the LR vary with prevalence for one test, if sensitivity and specificity can vary with prevalence?

 

Xiaomei

 

From: Myles, Nickolas [PH] [mailto:[log in to unmask]] 
Sent: February-22-16 1:16 PM
To: Yao, Xiaomei; [log in to unmask]

Subject: RE: Likelihood ratios

 

Xiaomei,

Good question; many people ask the same thing.

 

LRs are very useful, and essentially the only way of expressing the diagnostic accuracy of a non-binary (i.e. multi-level result) medical test (i.e. most biomedical tests).

In the case of a simple binary test, LRs are useful but not essential, as the same information can be derived from the sensitivity and specificity in the 2x2 table. Still, LRs are much handier when you calculate PPV/NPV using the Fagan nomogram (an interactive one is embedded in the free CATmaker app designed by the Oxford CEBM, which I use a lot). This way you can see the change in PPV/NPV instantly and interactively, depending on your test's LRs.
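For reference, the arithmetic behind the nomogram is simple; here is a minimal sketch with invented numbers (convert the pre-test probability to odds, multiply by the LR, and convert back):

```python
def post_test_probability(pre_test_prob, lr):
    """Pre-test probability -> odds, multiply by the LR, convert back to a probability."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Hypothetical example: pre-test probability 20%, LR+ = 8, LR- = 0.1
print(post_test_probability(0.20, 8))    # ~0.67, the 'positive post-test probability'
print(post_test_probability(0.20, 0.1))  # ~0.02, the 'negative post-test probability'
```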

 

If you have never used it before, I highly recommend starting.

Best wishes,

 

Nickolas Myles, MD, PhD, MSc, FRCPC

Anatomical pathologist, St.Paul’s Hospital,

Clinical Associate Professor, University of British Columbia

Department of Pathology and Laboratory Medicine

1081 Burrard St, Vancouver, BC, V6Z1Y6

 

Phone (604) 682-2344 x 66038

 

From: Evidence based health (EBH) [mailto:[log in to unmask]] On Behalf Of Yao, Xiaomei
Sent: Monday, February 22, 2016 7:41 AM
To: [log in to unmask]

Subject: Likelihood ratios

 

Thanks, everyone, for raising the interesting question regarding pre-test probability, and for the discussions that followed.

 

I always have a question about likelihood ratios. Are they really more useful than positive predictive values (PPV) and negative predictive values (NPV)? It seems that we only use likelihood ratios to calculate positive post-test probability or negative post-test probability. However, positive post-test probability = PPV, and negative post-test probability = 1 – NPV if we can draw a 2x2 table from a diagnostic study.
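As a quick check of this equality (the sensitivity, specificity and prevalence below are invented, and the study prevalence is used as the pre-test probability):

```python
sens, spec, prev = 0.85, 0.90, 0.20   # hypothetical study values

# Predictive values straight from the study's 2x2 table (Bayes theorem in probability form)
ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

# Post-test probabilities via likelihood ratios and odds
def post_prob(pretest, lr):
    odds = pretest / (1 - pretest) * lr
    return odds / (1 + odds)

lr_pos = sens / (1 - spec)
lr_neg = (1 - sens) / spec
print(ppv, post_prob(prev, lr_pos))       # identical: positive post-test probability = PPV
print(1 - npv, post_prob(prev, lr_neg))   # identical: negative post-test probability = 1 - NPV
```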

 

Also, thanks for Patrick’s previous email regarding “Sensitivity, specificity, negative and positive predictive values are all group-based statistics… sensitivity and specificity can (also) vary with prevalence.” So, when we already know sensitivity, specificity, PPV, and NPV from a diagnostic study, do we really need to calculate likelihood ratios?

 

Thanks,

Xiaomei

 

 

Xiaomei Yao

Health Research Methodologist

Program in Evidence-based Care, Cancer Care Ontario

Department of Oncology, McMaster University

 

 

 

From: Evidence based health (EBH) [mailto:[log in to unmask]] On Behalf Of Huw Llewelyn [hul2]
Sent: February-19-16 5:46 PM
To: [log in to unmask]

Subject: Re: Pre-test probability

 

Thank you for raising these interesting points about the problems associated with estimating post-test probabilities from pre-test probabilities.  Instead of reasoning using the simple form of Bayes rule with likelihood ratios based on those ‘with a diagnosis’ and ‘without a diagnosis’, physicians like me reason with lists of differential diagnoses based on the extended form of Bayes rule.  For example, instead of ‘appendicitis’ or ‘no appendicitis’, we consider appendicitis or cholecystitis, salpingitis, mesenteric adenitis, ‘non-specific abdominal pain’, etc., and use ratios of their ‘sensitivities’ (e.g. “guarding is common in appendicitis but less common in NSAP”) to estimate probabilities.  I explain this in the Oxford Handbook of Clinical Diagnosis.
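As a minimal sketch of that kind of calculation (not the worked examples from the Handbook; the prior proportions and 'sensitivities' below are invented), the extended form of Bayes rule gives each diagnosis on the list a probability proportional to its prior proportion times the frequency of the finding in that diagnosis:

```python
# Hypothetical differential for acute right lower quadrant pain:
# (prior proportion among such presentations, 'sensitivity' of guarding in that diagnosis)
differential = {
    "appendicitis": (0.30, 0.70),
    "cholecystitis": (0.10, 0.10),
    "salpingitis": (0.10, 0.30),
    "mesenteric adenitis": (0.10, 0.15),
    "non-specific abdominal pain": (0.40, 0.05),
}

# Extended form of Bayes rule: P(Dk | finding) proportional to P(Dk) * P(finding | Dk),
# normalised over the complete list of mutually exclusive diagnoses
unnormalised = {dx: prior * sens for dx, (prior, sens) in differential.items()}
total = sum(unnormalised.values())
posteriors = {dx: value / total for dx, value in unnormalised.items()}

for dx, p in posteriors.items():
    print(f"{dx}: {p:.2f}")   # appendicitis dominates because its 'sensitivity' is much higher
```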

In contrast, Bayes rule and multiplying pre-test probabilities several times with likelihood ratios often gives wildly over-confident probabilities (e.g. 0.999, when only 75% are correct).  Perhaps the real answer is that it is Bayes rule with the independence assumption that is no good at estimating disease or event probabilities (not physicians)!  The mistake may be assuming that the calculated probabilities are 'correct' and that any probabilities that differ from these are 'incorrect'.  I would therefore be grateful if you could point out, in your references, a comparison of calibration curves assessing the accuracy of probabilities generated from pre-test probabilities and multiple products of likelihood ratios against the calibration curves for physicians’ estimates of probabilities.
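(By a calibration curve I mean something like the following minimal sketch, with simulated rather than real data: predictions are grouped into bins, and the mean predicted probability in each bin is compared with the observed frequency of the diagnosis.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: predicted probabilities that are systematically over-confident
true_prob = rng.uniform(0.05, 0.95, size=2000)   # underlying probabilities
outcome = rng.random(2000) < true_prob            # observed diagnoses (True/False)
predicted = true_prob ** 0.5                      # over-confident predictions

# Calibration curve: mean prediction vs observed frequency within probability bins
bins = np.linspace(0, 1, 11)
which_bin = np.digitize(predicted, bins) - 1
for b in range(10):
    mask = which_bin == b
    if mask.any():
        print(f"bin {bins[b]:.1f}-{bins[b+1]:.1f}: "
              f"mean predicted {predicted[mask].mean():.2f}, "
              f"observed {outcome[mask].mean():.2f}")
```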

Huw Llewelyn MD FRCP

Consultant Physician in endocrinology, acute and internal medicine

Honorary Fellow in Mathematics, Aberystwyth University

 

 


From: Huw Llewelyn [hul2]
Sent: 19 February 2016 21:35
To: [log in to unmask]; Poses, Roy
Subject: Re: Pre-test probability

 

Thank you for raising these interesting points about the problems associated with estimating post-test probabilities from pre-test probabilities.  Instead of reasoning using the simple form of Bayes rule with likelihood ratios based on those ‘with a diagnosis’ and ‘without a diagnosis’, physicians reason with lists of differential diagnoses based on the extended form of Bayes rule.  For example, instead of ‘appendicitis’ or ‘no appendicitis’, they consider appendicitis or cholecystitis, salpingitis, mesenteric adenitis, ‘non-specific abdominal pain’, etc., and use ratios of their ‘sensitivities’ (e.g. “guarding is common in appendicitis but less common in NSAP”) to estimate probabilities.  I explain this in the Oxford Handbook of Clinical Diagnosis.

In contrast, Bayes rule and multiplying pre-test probabilities several times with likelihood ratios often gives wildly over-confident probabilities (e.g. 0.999, when only 75% are correct).  Perhaps the real answer is that it is Bayes rule with the independence assumption that is no good at estimating disease or event probabilities (not physicians)!  The mistake may be assuming that the calculated probabilities are 'correct' and that any probabilities that differ from these are 'incorrect'.  I would therefore be grateful if you could point out, in your references, a comparison of calibration curves assessing the accuracy of probabilities generated from pre-test probabilities and multiple products of likelihood ratios against the calibration curves for physicians’ estimates of probabilities.

Huw Llewelyn MD FRCP

Consultant Physician in endocrinology, acute and internal medicine

Honorary Fellow in Mathematics, Aberystwyth University


From: Evidence based health (EBH) <[log in to unmask]> on behalf of Poses, Roy <[log in to unmask]>
Sent: 19 February 2016 18:19
To: [log in to unmask]
Subject: Re: Pre-test probability

 

This is a fairly good bibliography, but it's from 2009...

Cognitive Barriers to Evidence-Based Practice

Judgment and Decision Making
Bushyhead JB, Christensen-Szalanski JJ. Feedback and the illusion of validity in a medical clinic. Med Decis Making 1981; 1: 115-123.
Coughlan R, Connolly T. Predicting affective responses to unexpected outcomes. Org Behav Human Decis Proc 2001; 85: 211-225.
Dawes RM, Faust D, Meehl PE.  Clinical versus actuarial judgment.  Science 1989;243:1668-74.
Dawson NV, Arkes HR.  Systematic errors in medical decision making: judgment limitations. J Gen Intern Med 1987;2:183-7.
Hammond KR, Hamm RM, Grassia J, Pearson T. Direct comparison of the efficacy of intuitive and analytical cognition in expert judgment. IEEE Trans Systems Man Cybernetics 1987; SMC-17: 753-770.
Kern L, Doherty ME. "Pseudo-diagnosticity" in an idealized medical problem-solving environment. J Med Educ 1982; 57: 100-104.
Lyman CH, Balducci L. Overestimation of test effects in clinical judgment. J Cancer Educ 1993; 8: 297-307.
MacKillop WJ, Quirt CF. Measuring the accuracy of prognostic judgments in oncology. J Clin Epidemiol 1997; 50: 21-29
Mitchell TR, Beach LR. "Do I love thee? let me count ..." toward an understanding of intuitive and automatic decision making. Org Behav Human Decis Proc 1990; 47: 1-20.
Payne JW, Johnson EJ, Bettman JR, Coupey E. Understanding contingent choice: a computer simulation approach. IEEE Trans Systems Man Cybernetics 1990; 20: 296-309.
Poses RM, Cebul RD, Collins M, Fager SS.  The accuracy of experienced physicians' probability estimates for patients with sore throats.  JAMA 1985; 254:925-929.
Poses RM, Anthony M. Availability, wishful thinking, and physicians' diagnostic judgments for patients with suspected bacteremia. Med Decis Making 1991;11:159-168.
Poses RM, Bekes C, Copare F, Scott WE.  The answer to "what are my chances, doctor?"  depends on whom is asked: prognostic disagreement and inaccuracy for critically ill patients.  Crit Care Med 1989; 17: 827-833.
Poses RM, McClish DK, Bekes C, Scott WE, Morley JN. Ego bias, reverse ego bias, and physicians' prognostic judgments for critically ill patients. Crit Care Med 1991; 19: 1533-1539.
Reyna VF, Brainerd CJ. Fuzzy-trace theory and framing effects in choice: gist extraction, truncation, and conversion. J Behav Decis Making 1991; 4: 249-262.
Schulman KA, Escarce JE, Eisenberg JM, Hershey JC, Young MJ, McCarthy DM, Williams SV.  Assessing physicians' estimates of the probability of coronary artery disease: the influence of patient characteristics.  Med Decis Making 1992;12:109-14.
Tetlock PE, Kristel OV, Elson SB, Green MC, Lerner JS. The psychology of the unthinkable: taboo trade-offs, forbidden base rates, and heretical counterfactuals. J Pers Social Psych 2000; 78: 853-870
Wallsten TS. Physician and medical student bias in evaluating diagnostic information. Med Decis Making 1981; 1: 145-164.

Stress
Ben Zur H, Breznitz SJ. The effect of time pressure on choice behavior. Acta Psychol 1981; 47: 89-104.
Harrison Y, Horne JA. One night of sleep loss impairs innovative thinking and flexible decision making. Org Behav Human Decis Proc 1999; 78: 128-145.
Keinan G. Decision making under stress: scanning of alternatives under controllable and uncontrollable threats. J Pers Social Psych 1987; 52: 639-644.
Koehler JJ, Gershoff AD. Betrayal aversion: when agents of protection become agents of harm. Org Behav Human Decis Proc 2003; 90: 244-261.
Zakay D, Wooler S. Time pressure, training and decision effectiveness. Ergonomics 1984; 27: 273-284.

Improving Judgments and Decisions
Arkes HR. Impediments to accurate clinical judgment and possible ways to minimize their impact. In Arkes HR, Hammond KR, editors. Judgment and Decision Making: An Interdisciplinary Reader. Cambridge: Cambridge University Press, 1986. pp. 582-592.
Arkes HR, Christensen C, Lai C, Blumer C. Two methods of reducing overconfidence. Org Behav Human Decis Proc 1987; 39: 133-144.
Clemen RT. Combining forecasts: a review and annotated bibliography. Int J Forecast 1989; 5: 559-583.
Coomarasamy A, Khan KS. What is the evidence that postgraduate teaching in evidence based medicine changes anything?: a systematic review. Brit Med J 2004; 329: 1017-1019.
Corey GA, Merenstein JH. Applying the acute ischemic heart disease predictive instrument. J Fam Pract 1987; 25: 127-133.
Davidoff F, Goodspeed R, Clive J. Changing test ordering behavior: a randomized controlled trial comparing probabilistic reasoning with cost-containment education. Med Care 1989; 27: 45-58.
de Dombal FT, Leaper DJ, Horrocks JC, Staniland JR, McCann AP. Human and computer-aided diagnosis of abdominal pain: further report with emphasis on performance of clinicians. Brit Med J 1974; 1: 376-380.
Doherty ME, Balzer WK. Cognitive feedback.  In Brehmer B, Joyce CRB, editors. Human Judgment: the SJT View. Amsterdam: Elsevier Science Publishers, 1988.  pp. 163-197.
Fryback DG, Thornbury JR. Informal use of decision theory to improve radiological patient management. Radiology 1978; 129: 385-388.
Gigerenzer G. How to improve Bayesian reasoning without instruction: frequency formats. Psychol Rev 1995; 102: 684-704.
Gigerenzer G. The psychology of good judgment: frequency formats and simple algorithms. Med Decis Making 1996; 16: 273-280.
Green ML. Evidence-based medicine training in internal medicine residency programs: a national survey. J Gen Intern Med 2000; 15: 129-133.
Hansen DE, Helgeson JG. The effects of statistical training on choice heuristics under uncertainty. J Behav Decis Making 1996; 9: 41-57.
Koriat A, Lichtenstein S, Fischhoff B.  Reasons for confidence.  J Exp Psychol Human Learn Memory 1980;6:107-118.
Kray LJ, Galinsky AD. The debiasing effect of counterfactual mind-sets: increasing the search for disconfirmatory information in group decisions. Org Behav Human Decis Proc 2003; 91: 69-81.
Lloyd FJ, Reyna VF. A web exercise in evidence-based medicine using cognitive theory. J Gen Intern Med 2001; 16: 94-99.
Nisbett R, editor. Rules for Reasoning.  Hillsdale, NJ: Lawrence Erlbaum Associates, 1993.
Poses RM, Bekes C, Winkler RL, Scott WE, Copare FJ. Are two (inexperienced) heads better than one (experienced) head? - averaging house officers' prognostic judgments for critically ill patients.  Arch Intern Med 1990; 150: 1874-1878.
Poses RM, Cebul RD, Wigton RS, Centor RM, Collins M, Fleischli G. A controlled trial of a method to improve physicians' diagnostic judgments: an application to pharyngitis. Acad Med 1992; 67: 345-347.
Schulz-Hardt S, Jochims M, Frey D. Productive conflict in group decision making: genuine and contrived dissent as strategies to counteract biased information seeking. Org Behav Human Decis Proc 2002; 88: 563-586.
Selker HP, Beshansky JR, Griffith JL, Aufderheide TP, Ballin DS, Bernard SA, et al.  Use of the acute cardiac ischemia time-insensitive predictive instrument (ACI-TIPI) to assist with triage of patients with chest pain or other symptoms suggestive of acute cardiac ischemia: a multicenter, controlled clinical trial.  Ann Intern Med 1998;129:845-855.
Siegel-Jacobs K, Yates JF. Effects of procedural and outcome accountability on judgment quality. Org Behav Human Decis Proc 1996; 65: 1-17.
Spiegel CT, Kemp BA, Newman MA, Birnbaum PS, Alter Cl. Modification of decision-making behavior of third-year medical students.  J Med Educ 1982; 57: 769-777.
Stewart TR, Heideman KF, Moninger WR, Reagan-Cirincione P. Effects of improved information on the components of skill in weather forecasting. Org Behav Human Decis Proc 1992; 53: 107-134.
Todd P, Benbasat I. Inducing compensatory information processing through decision aids that facilitate effort reduction: an experimental assessment. J Behav Decis Making 2000; 13: 91-106.
Tape TG, Kripal J, Wigton RS. Comparing methods of learning clinical prediction from case simulations. Med Decis Making 1992; 12: 213-221.

Useful Texts on Judgment and Decision Psychology
Cooksey RW. Judgment Analysis: Theory, Methods, and Applications.  San Diego: Academic Press, 1996.
Hogarth RM. Judgment and Choice: The Psychology of Decisions, 2nd edition.  New York: John Wiley and Sons, 1988.  pp. 62-86.
Kahneman D, Slovic P, Tversky A. Judgment Under Uncertainty: Heuristics and Biases. Cambridge, UK: Cambridge University Press, 1982.
Wright G, Ayton P. Subjective Probability. Chichester, UK: John Wiley and Sons, 1994.
Yates JF. Judgment and Decision Making. Englewood Cliffs, NJ: Prentice Hall, 1990.

 

On Fri, Feb 19, 2016 at 10:44 AM, Cristian Baicus <[log in to unmask]> wrote:

Thank you very much, Roy, for your excellent comment!

 

Yes, I'm interested in a few references!

 

Best wishes,

Cristian.

dr. Cristian Baicus

 

 from my iPad


On 19 Feb 2016, at 5:36 p.m., Poses, Roy <[log in to unmask]> wrote:

The simple answer is that physicians are not good at estimating disease or event probabilities.  There is a large literature on this, going back to the 1970s. 

This is the Achilles heel of the attempt to promote rational decision making based on simple mathematical models.  It is not that there is doubt about Bayes Theorem.  There should be lots of doubt about the data plugged into it, though.


Cognitive psychologists have been studying human limitations in making judgments such as probability estimates for even longer, and most of what they have found probably applies to physicians.

 

It is not clear how physicians actually make such estimates in particular cases.  It could be anything from pure intuition, to pattern recognition, to multivariate processes (one point for this, two for that, etc.), to formal Bayesian calculation, use of prediction/diagnostic rules, etc.  (But keep in mind that many such rules do not perform well when applied to new populations.)

There are quite a few studies, some of which I did a long time ago, to show that physicians' probabilistic diagnostic or prognostic judgments are not very accurate, and physicians have been shown to be subject to judgment biases, to misuse judgment heuristics, and to rely on non-diagnostic or non-predictive variables and/or fail to take into account predictive or diagnostic variables in specific cases.

 

If anyone is really interested, I could drag out a host of references, many not so new. 

 

On Fri, Feb 19, 2016 at 10:02 AM, Brown Michael <[log in to unmask]> wrote:

Whether physicians are aware of it or not, they use a Bayesian approach in their daily practice when they estimate the patient's probability of having condition X based on elements of the history and physical (i.e., pretest probability) before ordering any diagnostic tests. If available for condition X, a clinical prediction rule may be used. Although this process is very far from an exact science, it is often good enough to move the clinician's suspicion above the treatment threshold or below the diagnostic threshold (alternative diagnoses considered). Although most of us would like to see things fit a more exact mathematical formula, it is rare (at least in emergency medicine) to be able to make very precise probability estimates at the individual patient level.

Mike

Michael Brown, MD, MSc
Professor and Chair, Emergency Medicine
Michigan State University College of Human Medicine

[log in to unmask]

cell: 616-490-0920





On Feb 19, 2016, at 5:47 AM, Kevin Galbraith <[log in to unmask]> wrote:

> Hi there
>
> Can anyone advise: when calculating post-test probability of a diagnosis using the likelihood ratio for a diagnostic test, how do we make our best estimate of pre-test probability?
>
> I understand that prevalence is often taken as a pragmatic estimate of pre-test probability. But I assume a patient who presents with symptoms of the condition has, by definition, a pre-test probability that is greater than the prevalence in the wider (or preferably age/sex specific) population.
>
> To estimate pre-test probability, are we reliant on finding an estimate from an epidemiological study whose subjects most closely reflect the characteristics of our individual patient? This would seem a serious limitation to the utility of the Bayesian approach.
>
> Thanks
>
> Kevin Galbraith






--

Roy M. Poses MD FACP
President
Foundation for Integrity and Responsibility in Medicine (FIRM)
[log in to unmask]

Clinical Associate Professor of Medicine
Alpert Medical School, Brown University
[log in to unmask]

"He knew right then he was too far from home." - Bob Seger