A Misuse of Bayes's Theorem

MICHAEL LEVIN
City College, City University of New York

Abstract: A standard analysis of probabilistic reasoning in the legal and psychological literature implies that people commonly overestimate the reliability of witnesses. This paradoxical result arises from a misexplication of reliability. "Witness is right n% of the time" ordinarily means P(p | Witness says p) = n, not its converse, but the standard analysis takes reliability as the converse. When this misunderstanding is cleared away, the air of paradox dissipates.

Keywords: probabilistic reasoning, witness reliability, Tversky and Kahneman, Bayes's Theorem, reliability, conditional probability

To say someone is right n% of the time does not mean that he is right m% of the time, m ≪ n, nor does it imply that his accuracy depends on the frequency of his guesses. Yet these seemingly evident truths are flouted by an analysis of "being right n% of the time" which has attained wide currency in the pedagogical and legal literature.¹

In the paradigm case discussed in this literature, taken from Tversky and Kahneman (see n. 1), 15% of the taxicabs in a town are Blues and the rest are Greens. Walter Witness is said to be good at telling Greens from Blues; in fact, he is said to be right 80% of the time about when a cab is a Blue and when it is a Green. There is an accident involving a cab, and Witness maintains that the offending vehicle was Blue.
According to common sense, the offending vehicle probably was Blue, and in fact the probability is .8. According to the analysis in question, the cab was probably not Blue. That paradoxical result is reached as follows. Let w be "Witness said the cab was Blue," h be "The cab was Blue," P(p) the background probability of p (where p = h, P(h) is the "base rate"), and P(p|q) the probability of p given q. Then

(1) P(h) = .15; 15% of the cabs are Blue;
(2) P(~h) = .85, since P(~p) = 1 − P(p);
(3) P(w|h) = .8; 80% of the time a cab is Blue, Witness says it is Blue;
(4) P(w|~h) = .2, assuming Witness always has an opinion.

© Informal Logic Vol. 19, No. 1 (1999): pp. 63-66.

Note also (5) P(~w|~h) = .8 and (6) P(~w|h) = .2, where ~w = "Witness says the cab was Green." (We again assume Witness is fully opinionated.) The probability that the cab was Blue given that Witness says it was Blue, P(h|w), can be derived from one case of Bayes's Theorem:

\[
(7)\quad P(h \mid w) = \frac{P(w \mid h) \times P(h)}{[P(w \mid h) \times P(h)] + [P(w \mid {\sim}h) \times P({\sim}h)]}
\]

Using (1)-(4), the term on the right becomes

\[
\frac{.8 \times .15}{(.8 \times .15) + (.2 \times .85)} = \frac{.12}{.29} = .41.
\]

Amazingly, if Witness says the cab was Blue, the probability exceeds 58% that it was Green. This paradigm is expanded to the conclusion that in a wide variety of cases the declarations of reliable witnesses should be viewed with suspicion.

L.J. Cohen has also urged that there is something fishy about this argument,² but his diagnosis, or rather diagnoses, are somewhat obscure. At some points he seems to accuse the received analysis of conflating causal propensity with statistical frequency, or what is true of a concrete individual with what is true of a population in the long run; at other points he seems to accuse it of ignoring the need to narrow the reference class, or of insufficient attention to the principle of indifference.
This confusion was mirrored in the variety of objections raised by his critics.³ I suggest the culprit here is much more easily identified: it lies in initially explicating "the probability of Witness being right about the cab being Blue" as P(w|h), and, generally, in taking the probability of someone's being right about a world-state to be the probability of his saying that that state obtains given that it does. As used in ordinary discourse, a phrase such as "the probability that Witness was right about the cab being Blue" is doubtless somewhat vague, and there may be no such thing as the idea it conveys. However, it seems to me that what is normally meant by it is simply P(h|w), the probability that the cab is Blue conditional on Witness saying it is, and the probability of someone's being right about a world-state is the probability of that state obtaining given that he says it does. Overall, the probability that Witness is right about the color of the cab, whatever color he says it is, is [P(h|w) + P(~h|~w)]/2, and, where {h_i}, i = 1, ..., n, is a set of disjoint hypotheses about the color, the probability of Witness being right is Σ_i P(h_i|w_i)/n.

In other words, when Witness is said to be right 80% of the time, there is no need to calculate P(h|w), by Bayes's Theorem or any other means, because P(h|w) is what has been stipulated. If Witness says the errant cab was Blue, we have already been told how likely it is that it was Blue: .8. Using the terminology of statistics, someone is right most of the time when his remarks are a specific symptom of truth, and the foregoing "proof" that the errant cab was Green errs in treating Witness's words as sensitive to truth.

Further evidence for this diagnosis is that estimates of chances of success are usually thought to be independent of the frequency of opportunities. To say that Mark Marksman's reliability with a rifle is .55, i.e.
that he hits the bullseye a commendable 55% of the time, makes no reference to the frequency with which he or anyone else shoots at it. Likewise, "Witness is right about h n% of the time" should be invariant under changes in P(h), and a term for evaluating his reliability should not (as the Bayesian quotient does) depend on P(h). Of course, the reliability of an estimate of Marksman's reliability does depend on how many shots he has taken, but that is a different statistic. Marksman's and Witness's reliability may vary with circumstances, in which case a variable, c, ranging over circumstances must be introduced, and Witness's reliability be reckoned as P(h|w&c). But that too is independent of P(h).

If the odds of Marksman's hitting the bullseye the next time he tries are calculated as the standard analysis calculates the odds of the cab Witness saw being Blue, a few reasonable assumptions show that he will almost surely miss. For if m is "Marksman shoots" and s is "A bullseye is scored," let "Marksman hits the bullseye 55% of the time" be interpreted as P(m|s) = .55, i.e., that Marksman is the shooter 55 times out of every 100 times a bullseye is scored. Finally, suppose the rest of Marksman's regiment are such poor shots that the regimental average is .2. In other words, the background probability that a shot will hit the bullseye, or P(s), is .2. The probability that Marksman will hit the bullseye the next time he shoots at it, explicated as P(s|m), is then

\[
\frac{P(m \mid s) \times P(s)}{[P(m \mid s) \times P(s)] + [P(m \mid {\sim}s) \times P({\sim}s)]} = \frac{.55 \times .2}{(.55 \times .2) + (.45 \times .8)},
\]

a feeble .23. Marksman's chances of a bullseye, on this reckoning, will improve if the rest of his regiment improve their aim. But an accurate shot should not be inaccurate because his regiment is, nor, if his accuracy remains constant, should he improve by being transferred to a unit of sharpshooters.
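
The dependence on the regimental average can be checked numerically. The following is a minimal sketch, not part of the original argument: the function name `posterior` and the alternative regimental averages .5 and .8 are illustrative assumptions, while P(m|s) = .55, P(m|~s) = .45 and P(s) = .2 are the figures used above.

```python
def posterior(p_e_given_h, p_e_given_not_h, prior):
    """Bayes's Theorem for a binary hypothesis h and evidence e: returns P(h|e)."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1.0 - prior))


# Figures from the text: P(m|s) = .55 and P(m|~s) = .45.
# The regimental averages .5 and .8 are hypothetical, added for comparison.
for regimental_average in (0.2, 0.5, 0.8):
    p_hit = posterior(0.55, 0.45, regimental_average)
    print(f"P(s) = {regimental_average:.1f} -> P(s|m) = {p_hit:.2f}")
```

Holding P(m|s) and P(m|~s) fixed, the computed P(s|m) rises as P(s) rises, which is the team-dependence the text objects to. Calling `posterior(0.8, 0.2, 0.15)` evaluates the taxicab quotient (7) in the same way.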
As ordinarily conceived, the probability of Marksman hitting the target his next time out is team-independent, and is already given by his average, .55.

"Reliability" should be explicated so as to preserve the apparent truism that someone equally reliable at two tasks, such as shooting for two different regiments or identifying cabs of different colors, is equally likely to succeed at both. This principle is violated by the "Bayesian" analysis I have criticized. For let us assume, as does the received analysis, that Witness is precisely as reliable about Greens as about Blues, i.e., (5) and (6). To evaluate the probability that the errant cab was Green if Witness says it was, switch h with ~h and w with ~w in (7); P(~h|~w) is then

\[
\frac{.8 \times .85}{(.8 \times .85) + (.2 \times .15)} = \frac{.68}{.71} \approx .96.
\]

That P(~h|~w) ≫ P(h|w) (the cab is more likely to have been Green if Witness says Green than to have been Blue if Witness says Blue) shows that, whatever we are discussing, it is not the probability that Witness is right.

What we are discussing, when Bayes's Theorem comes into play, is the cab's likely color when we do not know the probability that a cab is the color Witness says it is. Background information, including base rates, then becomes pertinent. If most cabs are Green, the cab Witness saw very likely was Green, all else equal. If in addition most of the time Witness will say a cab is Green when it is, and say it is Blue when it is, the cab he saw is almost certain to have been Green if he says Green, but less certain to have been Blue if he says Blue. Many situations, like this one, involve an indicator of unknown trustworthiness. We know the odds that a subject with clogged arteries will feel fatigue, and the odds that a subject with normal arteries will feel fatigue. What we would like to know is the specificity of fatigue, the probability that someone feeling fatigue has clogged arteries.
In such cases we should not say we know how well fatigue predicts clogged arteries. Did we know that, further information would be superfluous. Indeed, knowing an indicator's trustworthiness and what the received analysis calls "trustworthiness" would allow us to solve for the base rate.

To repeat, I bear no ill-will toward Bayes's Theorem or base rates. I urge only that, when they are relevant, we use the descriptor "reliability" of the conditional probability we are calculating, not of the converse conditional probability, which generally has a much higher value. This "right ordering of names" (as Hobbes would have called it) disperses the air of paradox, and the suggestion of stubborn human irrationality.⁴

Notes

1. See: Paulos, J., Innumeracy (New York, 1988), p. 123; Epstein, R., Forbidden Grounds (Cambridge, MA, 1992), pp. 40-41; Kaye, D., "The Law of Probability and the Law of the Land," University of Chicago Law Review 34 (1979); Koehler, J., and Shaviro, D., "Veridical Verdicts: Increasing Verdict Accuracy through the Use of Overtly Probabilistic Evidence and Methods," Cornell University Law Review 247 (1990); Tversky, A., and Kahneman, D., "Causal Schemas in Judgments under Uncertainty," in Fishbein, ed., Progress in Social Psychology 117 (1980).

2. Cohen, L.J. and commentators, "Can Human Irrationality be Experimentally Demonstrated?" The Behavioural and Brain Sciences 4 (1981).

3. See particularly the contributions of Diaconis and Freedman, Krantz, Mackie, Margalit and Bar-Hillel, Niiniluoto, Skyrms and Zabel in the Behavioural and Brain Sciences symposium cited in n. 2. Some of these critics, particularly Diaconis and Freedman and Mackie, mention the distinction I go on to stress.

4. I wish to thank Jonathan Adler for helpful criticism and suggestions.

Michael Levin, City College and the Graduate Center, CUNY
138th St. and Convent Avenue
New York, NY 10031
U.S.A.