The Thomas-Nolt Dispute: Some Lessons about Induction

DAVID HITCHCOCK
McMaster University

© Informal Logic Vol. 19, Nos. 2&3 (1999): pp. 201-212.

Abstract: I resolve an apparently unresolved dispute about how probable uniform experience makes an extrapolation from it, and draw some general lessons about such enumerative induction. Uniform experience does not necessarily confer a high probability on an extrapolation of or generalization from that experience. Rational extrapolation or generalization typically involves a lot of specific background information, though not necessarily a general assumption that nature is uniform or that the future will resemble the past. And new evidence which is highly likely on one hypothesis but highly unlikely on any of its competitors does not necessarily make the former hypothesis highly probable.

Keywords: Induction, enumerative induction, inductive extrapolation, inductive generalization, Bayes' theorem, epistemic probability, Stephen Thomas, John Nolt.

Consider balls drawn from an urn at random. Suppose there are fifty balls in the urn, and that the first forty-nine, drawn at random, all have been blue. ... If one calculates the probability that the remaining ball is blue given that the first forty-nine drawn at random have been blue, the probability is well in excess of 80%. (Thomas 1984: 32)

This claim [that the probability is well in excess of 80%] seems plausible, but in fact there is no way to calculate such a probability from the information Thomas gives. ... if we do not know the number of blue balls in the urn initially and we make only the assumptions that Thomas gives, then no calculation will yield the probability that the remaining ball is blue. We can, without violating any mathematical law, assign that proposition any probability we like. (Nolt 1985: 56)

This dispute apparently remains unresolved: Thomas in the latest edition of his textbook (1997: 131) continues to claim a high conditional probability for a similar case. Though the example is artificial, it raises a fundamental question about the legitimacy of all extrapolation and generalization from uniform experience. I propose therefore to resolve the dispute, and to draw from this resolution some general lessons about inductive reasoning.

1. Analysis of the example

We can set out the example in the form of an argument:

There were fifty balls in the urn.
The first forty-nine, drawn at random, all have been blue.
Therefore, probably, the remaining ball is blue.

The dispute concerns how probable the premisses of this argument make its conclusion.
The probability in question cannot be construed, at least in any obvious way, as any sort of relative frequency, either actual or hypothetical, since it does not concern the probability that an arbitrarily chosen member of a class will have a certain property. It rather concerns the probability, relative to specified information, that a definite proposition is true: that the remaining ball in the urn is blue. This is an epistemic probability, the degree of confidence that it is reasonable to attach to a proposition given certain information.

An initial approximation to understanding this kind of probability is to think of it in terms of the fair betting odds that the proposition is true. Given that the first 49 balls selected from the urn are blue, what would be fair odds if one person bet another person a dollar that the remaining ball will also be blue? If it is fair to accept such a bet at even odds, then the rational degree of confidence, given this information, that the remaining ball is blue is .5, or 1/2. In general, if the fair odds, given background information K, on a bet that a proposition is true are x:y, then the rational degree of confidence in the proposition is roughly x/(x + y). To make the rational degree of confidence in a proposition an exact function of the fair odds in a bet that it is true, however, we have to make some patently absurd assumptions: that the fairness of the odds is not influenced by the potential bettor's enjoyment of or aversion to gambling, that the only thing of value is the currency of the potential bet, that the value of that currency increases exactly proportionately with the amount, and so on. The absurdity of such assumptions vitiates the usual Dutch book argument that the degrees of confidence it is rational to assign to propositions must conform to the axioms of the classical probability calculus. For more sophisticated arguments, see Kaplan (1996), Ramsey (1990/1925-29) and Savage (1972/1954). On the basis of these more sophisticated arguments, I shall assume in what follows that rational degree of confidence is a probability function in the sense of the classical probability calculus. This assumption is compatible with various theories about rational degree of confidence, e.g., that it is a logical probability (Carnap 1962/1950), a rationally constrained subjective degree of confidence (Howson and Urbach 1989, Kaplan 1996), or a mixed physical and epistemic probability (Pollock 1990).

In form, the argument based on Thomas' example is what we might call an inductive extrapolation; it extrapolates a property found in all observed members of a class to an unobserved member of that class. Russell (1948: 401) called such reasoning "particular induction by simple enumeration". In substance, however, since there is only one unobserved member left, the argument is an inductive generalization; it amounts to generalizing from the presence of a property in all observed members of a class to its presence in all members of the class. Russell (1948: 401) called such reasoning "general induction by simple enumeration".
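As a side note, the rough odds-to-confidence mapping sketched earlier in this section is a one-line calculation. The sketch below is a minimal illustration only (the function name is mine); it ignores the complications about bettors' attitudes and the value of money just discussed:

```python
def confidence_from_fair_odds(x, y):
    """Rough mapping from fair betting odds of x:y that a proposition
    is true to the rational degree of confidence in that proposition."""
    return x / (x + y)

print(confidence_from_fair_odds(1, 1))   # even odds: 0.5
print(confidence_from_fair_odds(49, 1))  # odds of 49:1: 0.98
```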
To prove that the conditional probability at issue is the same regardless of whether we express the conclusion as a singular statement or a universal generalization, we can appeal to the definition of conditional probability: the probability that a hypothesis H is true given certain evidence E, written "p(H|E)", is the result of dividing the probability that both the hypothesis and the evidence are true ("p(H & E)") by the probability that the evidence obtains ("p(E)"), provided that this latter probability is not 0; if p(E) = 0, then the conditional probability is undefined.¹ In what follows, I shall assume, since we are given that E actually occurs, that p(E) ≠ 0. That is, it was not rational to have absolute confidence in advance that E would not occur, given that it actually did occur; a rational thinker would never think it fair, I suppose, to accept infinite odds against a proposition which later turns out to be true.

If we let "E" stand for the premisses of our example, then the question is whether p(remaining ball is blue|E) = p(all 50 balls blue|E). Applying the definition of conditional probability, this amounts to the question whether p(remaining ball is blue & E) / p(E) = p(all 50 balls blue & E) / p(E). Since p(E) ≠ 0, our question is whether p(remaining ball is blue & E) = p(all 50 balls blue & E). But "remaining ball is blue & E" entails "all 50 balls blue & E", and vice versa; the two conjunctions are logically equivalent. Assuming that logically equivalent propositions have the same probability, we can conclude that the conditional probability at issue is the same regardless of whether we express the conclusion as a singular statement or a universal generalization. In what follows I shall take the conclusion to be a universal generalization.

2. Preliminary solution

Bayes' theorem allows us to calculate the epistemic probability that a hypothesis H is true given certain new evidence E, a probability generally referred to as the posterior probability, provided we are given three other epistemic probabilities, each construed as a rational degree of confidence in a proposition. First, we need the prior probability of the hypothesis, that is, the probability that the hypothesis is true, given our background information independently of the new evidence. (I shall call this "p(H|K)", where p is the probability function, H is the hypothesis and K is our background information apart from the new evidence. The posterior probability we are looking for is thus given by "p(H|E & K)", where E is the new evidence.) Second, we need the posterior likelihood, the likelihood of the evidence on the assumption that the hypothesis is true, again assuming the same background information which we have independently of the new evidence. (I shall call this "p(E|H & K)", where E is the new evidence.) Third, we need the prior likelihood of the evidence, the likelihood that the evidence is true on the assumption of our background information, without assuming the truth of the hypothesis under investigation. (I shall call this "p(E|K)".) Bayes' theorem tells us that, if the prior likelihood is not zero, the posterior probability of a hypothesis on new evidence is its prior probability multiplied by the ratio of the posterior likelihood of the evidence to its prior likelihood:

p(H|E & K) = p(H|K) × p(E|H & K) / p(E|K).
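Stated as a calculation, the theorem is compact. The sketch below is a minimal illustration (function and argument names are mine), using the terminology of prior probability, posterior likelihood and prior likelihood just introduced:

```python
def bayes_posterior(prior_h, posterior_likelihood, prior_likelihood):
    """Bayes' theorem: p(H|E & K) = p(H|K) * p(E|H & K) / p(E|K).

    prior_h              -- p(H|K), prior probability of the hypothesis
    posterior_likelihood -- p(E|H & K), likelihood of the evidence on H
    prior_likelihood     -- p(E|K), prior likelihood of the evidence
    """
    if prior_likelihood == 0:
        raise ValueError("posterior undefined when p(E|K) = 0")
    return prior_h * posterior_likelihood / prior_likelihood

# Illustrative numbers only: a hypothesis with prior 0.1 on which the
# evidence is certain, where the evidence has prior likelihood 0.2,
# receives posterior probability 0.5.
print(bayes_posterior(0.1, 1.0, 0.2))  # 0.5
```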
The proof of the theorem rests on the above-mentioned definition of a conditional probability p(A|B) as the result of dividing the probability that both A and B obtain by the probability that B obtains, provided that this latter probability is not zero. If one replaces the conditional probabilities in Bayes' theorem according to this definition, one sees that the theorem is correct, provided that the prior likelihood of the evidence [p(E|K)] is not zero.

Our hypothesis H is that all the balls in the urn are blue. The new evidence E is that there were 50 balls in the urn and the first 49 drawn at random from the urn are blue. The relevant background information K is difficult to specify completely, but it would include knowledge about the general properties of balls (that they retain their individuality over time, neither evaporating like mothballs nor merging like drops of water; that they do not spontaneously change colour as chameleons do; etc.), as well as assumptions about the particular situation which are not explicit in Thomas' description of it (that the balls drawn from the urn were not put back in it, that nobody else put balls in the urn or took them out after the drawing of the 49 blue balls began, that nobody came along after the 49 balls were drawn and repainted the remaining one, etc.). It is important to realize that such background information is always present when we extrapolate or generalize from instances; philosophical discussions of the problem of justifying induction distort the practice when they treat it as if it were just a matter of extrapolating or generalizing from observed instances, without any other information.

Since the evidence of the first 49 balls drawn being blue is a logical consequence of the hypothesis that all 50 balls are blue (given implicit background assumptions such as those mentioned in the preceding paragraph), the posterior likelihood of the evidence is 1.² Hence, in this case the posterior probability of the hypothesis will simply be the prior probability of the hypothesis divided by the prior likelihood of the evidence:

1. p(all 50 balls blue|first 49 blue & K) = p(all 50 balls blue|K) / p(first 49 blue|K)

Of course, we are not given information which would allow us to assign definite numbers to this prior probability and prior likelihood. What can we do? One strategy is to notice that, once 49 blue balls have been drawn from the urn, there is only one alternative hypothesis about the colour of the 50 balls which has not been refuted. That is the hypothesis that 49 balls in the urn were blue and the other one was not blue. The posterior likelihood of the evidence, given this hypothesis and background information, is 1/50. For, given the fact that the 49 balls were drawn at random, and assuming that exactly 49 of the 50 balls originally in the urn were blue, there was a probability of 49/50 that the first ball drawn would be blue, of 48/49 that the second ball would be blue given that the first one was blue, and so on, up to a probability of 1/2 that the 49th ball would be blue given that the first 48 drawn were blue. Since each of these probabilities is already conditioned on the blueness of all the earlier draws, the general multiplication rule of the classical probability calculus applies: the probability of the first 49 balls drawn at random all being blue is the product of these 49 probabilities, which telescopes to 1/50.
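The telescoping product can be checked mechanically. A minimal sketch using exact rational arithmetic:

```python
from fractions import Fraction

# Probability that the first 49 random draws are all blue, given that
# exactly 49 of the 50 balls are blue: the product of the 49 conditional
# probabilities 49/50, 48/49, ..., 1/2, which telescopes to 1/50.
p = Fraction(1)
for blue_left, total_left in zip(range(49, 0, -1), range(50, 1, -1)):
    p *= Fraction(blue_left, total_left)
print(p)  # 1/50
```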
This is of course a frequency probability, but it can be used to justify an epistemic probability of the same magnitude in a step of direct inference (Pollock 1990, Bacchus 1990). Hence the posterior probability that 49 of the 50 balls are blue, given relevant background information along with the evidence that the first 49 balls drawn at random were blue, is 1/50 of the prior probability of the hypothesis divided by the prior likelihood of the evidence. That is,

2. p(49 balls blue|first 49 blue & K) = p(49 balls blue|K) / [50 × p(first 49 blue|K)].

But the evidence, along with the background information, entails that either all 50 balls in the urn were blue or 49 of them were. Hence, by the reasoning of note 2, we can conclude that:

p(all 50 balls blue or 49 balls blue|first 49 blue & K) = 1.

The classical probability calculus tells us that the probability of a disjunction of two mutually exclusive propositions is the sum of the probabilities of the two disjuncts. And the propositions that all 50 balls in the urn are blue and that 49 of them are blue are mutually exclusive. Hence:

3. p(all 50 balls blue|first 49 blue & K) + p(49 balls blue|first 49 blue & K) = 1.

Applying some algebraic transformations to the above three equations, we can express the posterior probability as a function of the ratio r of the prior probability that all 50 balls are blue to the prior probability that 49 balls are blue:

4. p(all 50 balls blue|first 49 blue & K) = 50r / (50r + 1), provided that p(49 balls blue|K) ≠ 0.³

Alternatively, by reasoning parallel to that of note 3, we can express the posterior probability of the 50 blue balls hypothesis as a function of the inverse ratio r⁻¹ of the prior probability that 49 balls are blue to the prior probability that all 50 balls are blue:

5. p(50 balls blue|first 49 balls blue & K) = 50 / (50 + r⁻¹), provided that p(all 50 balls blue|K) ≠ 0.

3. A logical approach

Of course, we are not given the ratio of the prior probabilities of our two remaining hypotheses, let alone absolute values of each prior probability. We might try, however, to calculate the ratio on the basis of an a priori weighting of the logical possibilities. Prior to any inspection of the 50 balls in the urn, the information given by Thomas, along with our background information, leaves open all the logical possibilities about their colours. Each ball may be any possible colour, or any combination of those possible colours. Putting together such a possibility for each of the 50 balls gives rise to what Carnap (1962/1950: 70-72) called a "state description", which in this case would be a conjunction of 50 singular propositions postulating a particular colour or combination of colours for each of the 50 balls. (For simplicity, we ignore the other components of a complete state description, on the ground that they are irrelevant to the present problem.) Carnap proposed to assign logically prior probabilities not to those state descriptions but to structural descriptions which each covered all the structurally isomorphic state descriptions; for example the structural description "49 balls are blue and one ball is red" covers 50 state descriptions, each of which makes a different one of the 50 balls the red one.
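That count of covered state descriptions is easy to verify by enumeration; the sketch below is illustrative only:

```python
# State descriptions covered by the structural description "49 balls are
# blue and one ball is red": one for each choice of which ball is red.
state_descriptions = {
    tuple('red' if i == j else 'blue' for i in range(50))
    for j in range(50)
}
print(len(state_descriptions))  # 50, all distinct
```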
The hypothesis that 49 balls are blue and one is non-blue is in fact a disjunction of structural descriptions, one for each of the colours other than blue. How many colours other than blue are there? Since there are many shades of blue, the logical parallels to blue are general colours like red, yellow, orange, green, purple, brown, black, grey, white, transparent. Let us take this list as the complete list of alternatives to blue, while recognizing some arbitrariness in where the lines are drawn between one colour and another and how many colours to put on the list. This leaves 10 structural descriptions in which 49 balls are blue and the other is a single colour other than blue. But notice that our background information does not exclude the possibility that a ball in the urn is multi-coloured; any ball may have two, three, four, up to 11 colours. Adding in these possibilities gives us 2,036 additional structural descriptions in which 49 balls are blue and the other is not blue, for a total of 2,046.⁴ If we assign an equal prior probability to all the structural descriptions not ruled out by the first 49 drawings, we find that the ratio r is 1/2,046. Putting this ratio into our formula for the posterior probability of the 50 blue balls hypothesis, we get a value of (50/2,046) / (2,096/2,046), i.e., 50/2,096, or .024. On this calculation the premisses of the argument make it highly probable (.976) that the 50th ball is not blue.

Of course, we are not obliged to give each structural description the same logically prior probability as any other one. Carnap's attempt (1962/1950) to base an "inductive logic" on this sort of probability foundered on the inability to single out a unique measure function which would assign absolute prior probabilities to each set of structurally isomorphic state descriptions (Carnap and Jeffrey 1971, Jeffrey 1980). We might, for some reason, suppose that the absolute prior probability of an n-coloured ball decreases as n increases. Thus the logical approach does not give us a definite answer to our problem.
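Both the count of 2,046 and the resulting posterior of .024 can nonetheless be checked mechanically. A minimal sketch, assuming the 11-colour palette adopted above:

```python
from itertools import combinations

colours = ['blue', 'red', 'yellow', 'orange', 'green', 'purple',
           'brown', 'black', 'grey', 'white', 'transparent']

# A non-blue 50th ball may display any non-empty combination of the 11
# colours except blue alone: one structural description per combination.
non_blue = [c for n in range(1, len(colours) + 1)
            for c in combinations(colours, n) if c != ('blue',)]
print(len(non_blue))  # 2046, i.e. 2**11 - 2

# Equal priors over the surviving structural descriptions give r = 1/2046;
# equation 4 then yields the posterior of the 50 blue balls hypothesis.
r = 1 / 2046
print(50 * r / (50 * r + 1))  # 0.0238..., i.e. roughly .024
```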
4. Some hypothetical answers

We can however use equations 4 and 5 above to draw some hypothetical conclusions about the posterior probability of the 50 blue balls hypothesis. First, the assertion by Stephen Thomas that this probability is "well in excess of 80%" is logically equivalent to the claim that the ratio r is well in excess of .08.⁵ In other words, his claim assumes that, before any balls are drawn from the urn, the hypothesis that all 50 are blue is at least one-tenth as probable as the hypothesis that 49 are blue and one is non-blue.⁶ This seems like a rather conservative assumption, but nothing in the information Thomas provides entitles him to make it.

Second, Nolt's claim that "we can, without violating any mathematical law, assign that proposition [sc. the 50 blue balls hypothesis-DH] any probability we like" assumes that no mathematical law prevents us from assuming any ratio we like of the prior probabilities, or in the limiting cases setting one or the other of them at 0. If we assign the 50 blue balls hypothesis a posterior probability of 0, we are assuming that our background information rules out that hypothesis from the very beginning, even before any balls are drawn from the urn; in other words, r = 0. For example, we may know in advance that the "balls" are a set of Moslem prayer beads, which (I am told) always have one bead of a different colour than the others, symbolizing that nothing on earth is perfect.⁷ If at the opposite extreme we assign the 50 blue balls hypothesis a posterior probability of 1, we are assuming that our background information rules out the 49 blue balls hypothesis from the very beginning; in other words, r⁻¹ = 0. For example, we may know in advance that the balls are all of the same colour, a specific version of a "uniformity of nature" assumption. Notice, however, that we can calculate a value for the posterior probability without making any general assumption that nature is uniform, contrary to the claims of some philosophers unduly influenced by Hume. To assign some intermediate value for the posterior probability of the 50 blue balls hypothesis, we simply calculate the required assumed ratio of the prior probabilities; a posterior probability of .5, for example, requires a ratio of the prior probabilities of .02; in other words, in advance of drawing any balls from the urn, the prior probability that 49 are blue is 50 times the prior probability that all 50 are blue. We might suppose, for example, that the "balls" are a set of Moslem prayer beads drawn at random from the assembly line of a factory which by mistake produces a set of 50 beads of the same colour once for every 50 times it correctly produces a set with 49 beads of one colour and the 50th of a different colour. By varying the defect rate in this imaginary factory, we can produce any posterior probability we like. None of these scenarios appears to involve any violation of a mathematical law. Thus, as far as one can see, Nolt's claim is correct.

In the absence of specific information and in default of a well-grounded inductive logic, we can at best make qualitative judgements about the prior degree of confidence it is rational to place in each of the subsequently unrefuted hypotheses. It is not rational to be absolutely certain in advance that all 50 balls are blue, or absolutely certain in advance that not all 50 balls are blue; the background information and new evidence are consistent both with this hypothesis and with its negation, and it is not rational to place absolute confidence in a proposition which later turns out to be false, as either of these might. For the same reason, it is not rational to be absolutely certain in advance that 49 balls are blue, or that the number of blue balls is not 49. That is, it would not be rational, given the information supplied and the background information available to us, to assume that either relevant prior probability was 1, or to assume that either relevant prior probability was 0. Beyond that, it seems difficult to go. How do we express the fact that our background information, before drawing any balls from the urn, neither makes it absolutely certain that 50 balls are blue, nor makes it absolutely certain that not all 50 balls are blue? Such vague qualitative judgements can be expressed by indicating that the rational degree of confidence lies in a range. If we let j be p(50 blue balls|K), the prior rational degree of confidence in the 50 blue balls hypothesis, then we can say that 0 < j < 1.
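Equation 4 makes the point of this section easy to explore numerically: varying the ratio r of the prior probabilities drives the posterior anywhere in the open interval (0, 1). A minimal sketch (the function name is mine):

```python
def posterior_50_blue(r):
    """Equation 4: posterior probability that all 50 balls are blue,
    as a function of the ratio r of the two surviving priors."""
    return 50 * r / (50 * r + 1)

# r = 0 is the prayer-beads case; r = 0.02 (one all-blue set per 50
# ordinary sets from the imaginary factory) gives 0.5; r just above
# 0.08 pushes the posterior past Thomas' 80%.
for r in [0.0, 0.005, 0.02, 0.08, 0.2, 1.0]:
    print(f"r = {r}: posterior = {posterior_50_blue(r):.3f}")
```

Varying the factory's defect rate varies r, and with it the posterior, exactly as the text describes.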