Thank you for raising these interesting points about the problems associated with estimating post-test probabilities from pre-test probabilities.  Instead of reasoning with the simple form of Bayes’ rule and likelihood ratios based on those ‘with a diagnosis’ and ‘without a diagnosis’, physicians like me reason with lists of differential diagnoses based on the extended form of Bayes’ rule.  For example, instead of ‘appendicitis’ or ‘no appendicitis’, we consider appendicitis versus cholecystitis, salpingitis, mesenteric adenitis, ‘non-specific abdominal pain’ (NSAP), etc., and use ratios of their ‘sensitivities’ (e.g. “guarding is common in appendicitis but less common in NSAP”) to estimate probabilities.  I explain this in the Oxford Handbook of Clinical Diagnosis.
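For anyone who has not met the extended form, here is a minimal sketch of the calculation in Python; every prior and ‘sensitivity’ below is an invented placeholder for illustration, not a figure from the Handbook:

    # Extended form of Bayes' rule over a list of differential diagnoses:
    #   P(Dk | finding) = P(Dk) * P(finding | Dk) / sum_i P(Di) * P(finding | Di)

    # Assumed priors among patients presenting with acute abdominal pain
    priors = {
        "appendicitis": 0.25,
        "cholecystitis": 0.10,
        "salpingitis": 0.05,
        "mesenteric adenitis": 0.10,
        "NSAP": 0.50,
    }

    # Assumed 'sensitivities': P(guarding | diagnosis)
    p_guarding = {
        "appendicitis": 0.70,
        "cholecystitis": 0.45,
        "salpingitis": 0.40,
        "mesenteric adenitis": 0.20,
        "NSAP": 0.10,
    }

    # Weight each diagnosis by how well it predicts the finding, then normalise
    joint = {d: priors[d] * p_guarding[d] for d in priors}
    total = sum(joint.values())
    posteriors = {d: joint[d] / total for d in joint}

    for d, p in sorted(posteriors.items(), key=lambda kv: -kv[1]):
        print(f"{d:20s} {p:.2f}")

Note that only the ratios of the ‘sensitivities’ matter: doubling all of them leaves the posteriors unchanged, which is why comparisons such as “common in appendicitis but less common in NSAP” carry the diagnostic information.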
In contrast, applying Bayes’ rule by multiplying the pre-test odds by several likelihood ratios in succession often gives wildly over-confident probabilities (e.g. 0.999 when only 75% of such predictions turn out to be correct).  Perhaps the real answer is that it is Bayes’ rule with the independence assumption that is no good at estimating disease or event probabilities (not physicians)!  The mistake may be assuming that the calculated probabilities are ‘correct’ and that any probabilities that differ from these are ‘incorrect’.  I would therefore be grateful if you could point out, among your references, a comparison of calibration curves assessing the accuracy of post-test probabilities generated from pre-test probabilities by multiple products of likelihood ratios against the curves for physicians’ estimates of probabilities.
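To make the over-confidence concrete, here is a minimal simulation sketch (all prevalences and conditional probabilities are invented for illustration).  Three recorded findings that are in fact perfectly correlated copies of one underlying sign are scored as if independent; the multiplied likelihood ratios give a post-test probability near 0.96, while the disease frequency actually observed among such patients is about 0.57, which is precisely the gap a calibration curve would expose:

    import random

    random.seed(0)
    N = 100_000
    prevalence = 0.25                  # assumed prevalence of the disease
    p_sign = {True: 0.8, False: 0.2}   # assumed P(sign | disease status)
    lr_single = p_sign[True] / p_sign[False]   # likelihood ratio of one finding = 4.0

    with_sign = 0
    diseased_with_sign = 0
    for _ in range(N):
        diseased = random.random() < prevalence
        sign = random.random() < p_sign[diseased]
        if sign:
            # the notes record three findings, but all three reflect the same sign
            with_sign += 1
            diseased_with_sign += diseased

    pre_odds = prevalence / (1 - prevalence)
    naive_odds = pre_odds * lr_single ** 3     # treats the three entries as independent
    naive_p = naive_odds / (1 + naive_odds)
    print(f"'independent' post-test probability: {naive_p:.3f}")              # ~0.955
    print(f"observed disease frequency: {diseased_with_sign / with_sign:.3f}")  # ~0.571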
Huw Llewelyn MD FRCP
Consultant Physician in endocrinology, acute and internal medicine
Honorary Fellow in Mathematics, Aberystwyth University

From: Evidence based health (EBH) <[log in to unmask]> on behalf of Poses, Roy <[log in to unmask]>
Sent: 19 February 2016 18:19
To: [log in to unmask]
Subject: Re: Pre-test probability
 
This is a fairly good bibliography, but it's from 2009...

Cognitive Barriers to Evidence-Based Practice

Judgment and Decision Making
Bushyhead JB, Christensen-Szalanski JJ. Feedback and the illusion of validity in a medical clinic. Med Decis Making 1981; 1: 115-123.
Coughlan R, Connolly T. Predicting affective responses to unexpected outcomes. Org Behav Human Decis Proc 2001; 85: 211-225.
Dawes RM, Faust D, Meehl PE. Clinical versus actuarial judgment. Science 1989; 243: 1668-1674.
Dawson NV, Arkes HR. Systematic errors in medical decision making: judgment limitations. J Gen Intern Med 1987; 2: 183-187.
Hammond KR, Hamm RM, Grassia J, Pearson T. Direct comparison of the efficacy of intuitive and analytical cognition in expert judgment. IEEE Trans Systems Man Cybernetics 1987; SMC-17: 753-770.
Kern L, Doherty ME. "Pseudo-diagnosticity" in an idealized medical problem-solving environment. J Med Educ 1982; 57: 100-104.
Lyman GH, Balducci L. Overestimation of test effects in clinical judgment. J Cancer Educ 1993; 8: 297-307.
MacKillop WJ, Quirt CF. Measuring the accuracy of prognostic judgments in oncology. J Clin Epidemiol 1997; 50: 21-29.
Mitchell TR, Beach LR. "Do I love thee? let me count ..." toward an understanding of intuitive and automatic decision making. Org Behav Human Decis Proc 1990; 47: 1-20.
Payne JW, Johnson EJ, Bettman JR, Coupey E. Understanding contingent choice: a computer simulation approach. IEEE Trans Systems Man Cybernetics 1990; 20: 296-309.
Poses RM, Cebul RD, Collins M, Fager SS. The accuracy of experienced physicians' probability estimates for patients with sore throats. JAMA 1985; 254: 925-929.
Poses RM, Anthony M. Availability, wishful thinking, and physicians' diagnostic judgments for patients with suspected bacteremia. Med Decis Making 1991; 11: 159-168.
Poses RM, Bekes C, Copare F, Scott WE.  The answer to "what are my chances, doctor?"  depends on whom is asked: prognostic disagreement and inaccuracy for critically ill patients.  Crit Care Med 1989; 17: 827-833.
Poses RM, McClish DK, Bekes C, Scott WE, Morley JN. Ego bias, reverse ego bias, and physicians' prognostic judgments for critically ill patients. Crit Care Med 1991; 19: 1533-1539.
Reyna VF, Brainerd CJ. Fuzzy-trace theory and framing effects in choice: gist extraction, truncation, and conversion. J Behav Decis Making 1991; 4: 249-262.
Schulman KA, Escarce JE, Eisenberg JM, Hershey JC, Young MJ, McCarthy DM, Williams SV. Assessing physicians' estimates of the probability of coronary artery disease: the influence of patient characteristics. Med Decis Making 1992; 12: 109-114.
Tetlock PE, Kristel OV, Elson SB, Green MC, Lerner JS. The psychology of the unthinkable: taboo trade-offs, forbidden base rates, and heretical counterfactuals. J Pers Social Psych 2000; 78: 853-870.
Wallsten TS. Physician and medical student bias in evaluating diagnostic information. Med Decis Making 1981; 1: 145-164.

Stress
Ben Zur H, Breznitz SJ. The effect of time pressure on choice behavior. Acta Psychol 1981; 47: 89-104.
Harrison Y, Horne JA. One night of sleep loss impairs innovative thinking and flexible decision making. Org Behav Human Decis Proc 1999; 78: 128-145.
Keinan G. Decision making under stress: scanning of alternatives under controllable and uncontrollable threats. J Pers Social Psych 1987; 52: 639-644.
Koehler JJ, Gershoff AD. Betrayal aversion: when agents of protection become agents of harm. Org Behav Human Decis Proc 2003; 90: 244-261.
Zakay D, Wooler S. Time pressure, training and decision effectiveness. Ergonomics 1984; 27: 273-284.

Improving Judgments and Decisions
Arkes HR. Impediments to accurate clinical judgment and possible ways to minimize their impact. In Arkes HR, Hammond KR, editors. Judgment and Decision Making: An Interdisciplinary Reader. Cambridge: Cambridge University Press, 1986. pp. 582-592.
Arkes HR, Christensen C, Lai C, Blumer C. Two methods of reducing overconfidence. Org Behav Human Decis Proc 1987; 39: 133-144.
Clemen RT. Combining forecasts: a review and annotated bibliography. Int J Forecast 1989; 5: 559-583.
Coomarasamy A, Khan KS. What is the evidence that postgraduate teaching in evidence based medicine changes anything?: a systematic review. Brit Med J 2004; 329: 1017-1019.
Corey GA, Merenstein JH. Applying the acute ischemic heart disease predictive instrument. J Fam Pract 1987; 25: 127-133.
Davidoff F, Goodspeed R, Clive J. Changing test ordering behavior: a randomized controlled trial comparing probabilistic reasoning with cost-containment education. Med Care 1989; 27: 45-58.
de Dombal FT, Leaper DJ, Horrocks JC, Staniland JR, McCann AP. Human and computer-aided diagnosis of abdominal pain: further report with emphasis on performance of clinicians. Brit Med J 1974; 1: 376-380.
Doherty ME, Balzer WK. Cognitive feedback. In Brehmer B, Joyce CRB, editors. Human Judgment: the SJT View. Amsterdam: Elsevier Science Publishers, 1988. pp. 163-197.
Fryback DG, Thornbury JR. Informal use of decision theory to improve radiological patient management. Radiology 1978; 129: 385-388.
Gigerenzer G, Hoffrage U. How to improve Bayesian reasoning without instruction: frequency formats. Psychol Rev 1995; 102: 684-704.
Gigerenzer G. The psychology of good judgment: frequency formats and simple algorithms. Med Decis Making 1996; 16: 273-280.
Green ML. Evidence-based medicine training in internal medicine residency programs: a national survey. J Gen Intern Med 2000; 15: 129-133.
Hansen DE, Helgeson JG. The effects of statistical training on choice heuristics under uncertainty. J Behav Decis Making 1996; 9: 41-57.
Koriat A, Lichtenstein S, Fischhoff B. Reasons for confidence. J Exp Psychol Human Learn Memory 1980; 6: 107-118.
Kray LJ, Galinsky AD. The debiasing effect of counterfactual mind-sets: increasing the search for disconfirmatory information in group decisions. Org Behav Human Decis Proc 2003; 91: 69-81.
Lloyd FJ, Reyna VF. A web exercise in evidence-based medicine using cognitive theory. J Gen Intern Med 2001; 16: 94-99.
Nisbett R, editor. Rules for Reasoning.  Hillsdale, NJ: Lawrence Erlbaum Associates, 1993.
Poses RM, Bekes C, Winkler RL, Scott WE, Copare FJ. Are two (inexperienced) heads better than one (experienced) head? - averaging house officers' prognostic judgments for critically ill patients.  Arch Intern Med 1990; 150: 1874-1878.
Poses RM, Cebul RD, Wigton RS, Centor RM, Collins M, Fleischli G. A controlled trial of a method to improve physicians' diagnostic judgments: an application to pharyngitis. Acad Med 1992; 67: 345-347.
Schulz-Hardt S, Jochims M, Frey D. Productive conflict in group decision making: genuine and contrived dissent as strategies to counteract biased information seeking. Org Behav Human Decis Proc 2002; 88: 563-586.
Selker HP, Beshansky JR, Griffith JL, Aufderheide TP, Ballin DS, Bernard SA, et al. Use of the acute cardiac ischemia time-insensitive predictive instrument (ACI-TIPI) to assist with triage of patients with chest pain or other symptoms suggestive of acute cardiac ischemia: a multicenter, controlled clinical trial. Ann Intern Med 1998; 129: 845-855.
Siegel-Jacobs K, Yates JF. Effects of procedural and outcome accountability on judgment quality. Org Behav Human Decis Proc 1996; 65: 1-17.
Spiegel CT, Kemp BA, Newman MA, Birnbaum PS, Alter CL. Modification of decision-making behavior of third-year medical students. J Med Educ 1982; 57: 769-777.
Stewart TR, Heideman KF, Moninger WR, Reagan-Cirincione P. Effects of improved information on the components of skill in weather forecasting. Org Behav Human Decis Proc 1992; 53: 107-134.
Todd P, Benbasat I. Inducing compensatory information processing through decision aids that facilitate effort reduction: an experimental assessment. J Behav Decis Making 2000; 13: 91-106.
Tape TG, Kripal J, Wigton RS. Comparing methods of learning clinical prediction from case simulations. Med Decis Making 1992; 12: 213-221.

Useful Texts on Judgment and Decision Psychology
Cooksey RW. Judgment Analysis: Theory, Methods, and Applications.  San Diego: Academic Press, 1996.
Hogarth RM. Judgment and Choice: The Psychology of Decisions, 2nd edition. New York: John Wiley and Sons, 1988. pp. 62-86.
Kahneman D, Slovic P, Tversky A. Judgment Under Uncertainty: Heuristics and Biases. Cambridge, UK: Cambridge University Press, 1982.
Wright G, Ayton P. Subjective Probability. Chichester, UK: John Wiley and Sons, 1994.
Yates JF. Judgment and Decision Making. Englewood Cliffs, NJ: Prentice Hall, 1990.


On Fri, Feb 19, 2016 at 10:44 AM, Cristian Baicus <[log in to unmask]> wrote:
Thank you very much, Roy, for your excellent comment!

Yes, I'm interested by a few references!

Best wishes,
Cristian.

Dr. Cristian Baicus


On 19 Feb 2016, at 5:36 p.m., Poses, Roy <[log in to unmask]> wrote:

The simple answer is that physicians are not good at estimating disease or event probabilities.  There is a large literature on this, going back to the 1970s. 

This is the Achilles heel of the attempt to promote rational decision making based on simple mathematical models.  It is not that there is doubt about Bayes' Theorem.  There should be lots of doubt about the data plugged into it, though.

Cognitive psychologists have been studying human limitations in making judgments such as probability estimates for even longer, and most of what they have found probably applies to physicians.

It is not clear how physicians actually make such estimates in particular cases.  It could be anything from pure intuition, to pattern recognition, to multivariate processes (one point for this, two for that, etc.), to formal Bayesian calculation, to use of prediction/diagnostic rules.  (But keep in mind that many such rules do not perform well when applied to new populations.)

There are quite a few studies, some of which I did a long time ago, showing that physicians' probabilistic diagnostic or prognostic judgments are not very accurate, and that physicians are subject to judgment biases, misuse judgment heuristics, and rely on non-diagnostic or non-predictive variables and/or fail to take predictive or diagnostic variables into account in specific cases.

If anyone is really interested, I could drag out a host of references, many not so new. 

On Fri, Feb 19, 2016 at 10:02 AM, Brown Michael <[log in to unmask]> wrote:
Whether physicians are aware of it or not, they use a Bayesian approach in their daily practice when they estimate the patient's probability of having condition X based on elements of the history and physical (i.e., pretest probability) before ordering any diagnostic tests. If available for condition X, a clinical prediction rule may be used. Although this process is very far from an exact science, it is often good enough to move the clinician's suspicion above the treatment threshold or below the diagnostic threshold (alternative diagnoses considered). Although most of us would like to see things fit a more exact mathematical formula, it is rare (at least in emergency medicine) to be able to make very precise probability estimates at the individual patient level.
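For anyone who wants to see that process written down, here is a minimal sketch of the odds form of Bayes' rule applied once; the pre-test probability, likelihood ratio and action thresholds are illustrative assumptions, not recommendations for any particular condition:

    def post_test_probability(pre_test_p: float, lr: float) -> float:
        """Convert probability to odds, apply one likelihood ratio, convert back."""
        pre_odds = pre_test_p / (1.0 - pre_test_p)
        post_odds = pre_odds * lr
        return post_odds / (1.0 + post_odds)

    pre_test_p = 0.15       # assumed estimate from the history and physical
    lr_positive = 8.0       # assumed LR+ of the diagnostic test
    test_threshold = 0.05   # assumed: below this, stop pursuing condition X
    treat_threshold = 0.50  # assumed: above this, treat

    p = post_test_probability(pre_test_p, lr_positive)
    if p >= treat_threshold:
        print(f"post-test p = {p:.2f}: above the treatment threshold")
    elif p <= test_threshold:
        print(f"post-test p = {p:.2f}: below the test threshold")
    else:
        print(f"post-test p = {p:.2f}: between thresholds, keep testing")

With these assumed numbers, a positive test moves a 15% suspicion to roughly 59%, crossing the treatment threshold.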

Mike

Michael Brown, MD, MSc
Professor and Chair, Emergency Medicine
Michigan State University College of Human Medicine

[log in to unmask]
cell: 616-490-0920




On Feb 19, 2016, at 5:47 AM, Kevin Galbraith <[log in to unmask]> wrote:

> Hi there
>
> Can anyone advise: when calculating post-test probability of a diagnosis using the likelihood ratio for a diagnostic test, how do we make our best estimate of pre-test probability?
>
> I understand that prevalence is often taken as a pragmatic estimate of pre-test probability. But I assume a patient who presents with symptoms of the condition has, by definition, a pre-test probability that is greater than the prevalence in the wider (or preferably age/sex specific) population.
>
> To estimate pre-test probability, are we reliant on finding an estimate from an epidemiological study whose subjects most closely reflect the characteristics of our individual patient? This would seem a serious limitation to the utility of the Bayesian approach.
>
> Thanks
>
> Kevin Galbraith






--
Roy M. Poses MD FACP
President
Foundation for Integrity and Responsibility in Medicine (FIRM)
[log in to unmask]
Clinical Associate Professor of Medicine
Alpert Medical School, Brown University
[log in to unmask]

"He knew right then he was too far from home." - Bob Seger