CUBO A Mathematical Journal
Vol. 13, No. 01, (103–123). March 2011

Engineering Design under Imprecise Probabilities: Computational Complexity

Vladik Kreinovich
Department of Computer Science, University of Texas at El Paso, 500 W. University, El Paso, Texas 79968, USA.
email: vladik@utep.edu

ABSTRACT

In engineering design problems, we want to make sure that a certain quantity c of the designed system lies within given bounds – or at least that the probability of this quantity being outside these bounds does not exceed a given threshold. We may have several such requirements – thus the requirement can be formulated as bounds $[\underline{F}_c(x), \overline{F}_c(x)]$ on the cumulative distribution function $F_c(x)$ of the quantity c; such bounds are known as a p-box. The value of the desired quantity c depends on the design parameters a and the parameters b characterizing the environment: c = f(a, b). To achieve the design goal, we need to find the design parameters a for which the distribution $F_c(x)$ for c = f(a, b) is within the given bounds for all possible values of the environmental variables b. The problem of computing such a is called backcalculation. For b, we also have ranges with different probabilities – i.e., also a p-box. Thus, we have a backcalculation problem for p-boxes. For p-boxes, there exist efficient algorithms for finding a design a that satisfies the given constraints. The next natural question is to find a design that satisfies additional constraints: on the cost, on the efficiency, etc. In this paper, we prove that, in general, the problem of finding such a design is computationally difficult (NP-hard). We show that this problem is NP-hard already in the simplest possible linearized case, when the dependence c = f(a, b) is linear. We also provide an example when an efficient algorithm is possible.
Keywords: Engineering design, imprecise probability, computational complexity, p-boxes, NP-hard

AMS Subject Classification: 65K10, 65J99, 49M15, 49J53, 47J20, 47H04, 90C30, 90C33.

1 Engineering Design Problems and the Notion of Backcalculation: Deterministic Case

One of the main objectives of engineering design is to guarantee that the value of a certain quantity (or several quantities) c is within a given range $[\underline{c}, \overline{c}]$. For example, when we design a car engine, we must make sure:
• that its power is at least as much as needed for the loaded car to climb the steepest mountain roads,
• that the concentration of undesirable substances in the exhaust does not exceed the required threshold, etc.
The value of the quantity c usually depends on the parameters a describing the design and on the parameters b of the environment: c = f(a, b). For example, the concentration of a substance in a car exhaust depends:
• on the parameter(s) that describe the design of the car exhaust filters, and
• on the concentration of the chemicals in the original fuel.
We need to select a design a in such a way that $c = f(a, b) \in [\underline{c}, \overline{c}]$ for all possible values of the environmental parameter(s) b. In this paper, we consider the simplest case when:
• the design of each system is characterized by a single parameter a, and
• the environment is also characterized by a single parameter b.
We will show that already in this simple case, the design problem is, in general, computationally difficult (NP-hard). We also show that in some cases, a feasible algorithm is possible. (Some of our results first appeared in a conference paper [5].)
To be able to find a design that satisfies the given constraint on c for all possible values of the environmental parameter b, we need to know which values of b are possible, i.e., we need to know the range $[\underline{b}, \overline{b}]$ of possible values of b. Thus, we arrive at the following problem:
• we know the desired range $[\underline{c}, \overline{c}]$;
• we know the dependence c = f(a, b);
• we know the range $[\underline{b}, \overline{b}]$ of possible values of b;
• we want to describe the set of all values of a for which $f(a, b) \in [\underline{c}, \overline{c}]$ for all $b \in [\underline{b}, \overline{b}]$.
This design-related problem is sometimes called a backcalculation problem, to emphasize its difference from the forward calculation problem, when
• we are given a design a and
• we want to estimate the value of the desired characteristic c = f(a, b).

2 Linearized Problem

In many engineering situations, the intervals of possible values of a and b are reasonably narrow: $a \approx \tilde{a}$ and $b \approx \tilde{b}$ for some $\tilde{a}$ and $\tilde{b}$. In such situations, we can expand the dependence c = f(a, b) in a Taylor series and ignore terms which are quadratic and of higher order in $\Delta a \stackrel{\text{def}}{=} a - \tilde{a}$ and $\Delta b \stackrel{\text{def}}{=} b - \tilde{b}$. As a result, we get a simple linear dependence
$$c \approx c_0 + k_a \cdot a + k_b \cdot b. \qquad (2.1)$$
We can simplify this expression even further if we take into account that the numerical value of each of the quantities a and b depends on the choice of the starting point and on the choice of a measuring unit. If we change the starting point and the measuring unit, then the new numerical value can be obtained from the original one by an appropriate linear transformation. For example, if we know the temperature $t_C$ in Celsius, then we can compute the temperature $t_F$ in the Fahrenheit scale as $t_F = 32 + 1.8 \cdot t_C$.
We can use this possibility to simplify the above expression for c. Specifically, we can change the starting points and the measuring units in such a way that:
• the new numerical value for a is described by the linear expression $c_0 + k_a \cdot a$, and
• the new numerical value for b is described by the linear expression $k_b \cdot b$.
In these new scales, the dependence of c on a and b takes the simplest form c = a + b. We will show that the design problem becomes computationally difficult (NP-hard) already for this simplest case.
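Written out explicitly, this rescaling is simply a regrouping of the terms in (2.1) (a short derivation sketch; the primed symbols $a'$ and $b'$ are auxiliary notation used only here):
$$c = c_0 + k_a \cdot a + k_b \cdot b = \underbrace{(c_0 + k_a \cdot a)}_{=:\,a'} + \underbrace{k_b \cdot b}_{=:\,b'} = a' + b',$$
so after renaming the rescaled quantities $a'$ and $b'$ back into $a$ and $b$, the dependence indeed becomes $c = a + b$.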
3 From Guaranteed Bounds to p-Boxes

Ideally, it is desirable to provide a 100% guarantee that the quantity c never exceeds the threshold $\overline{c}$. In practice, however, too many unpredictable factors affect the performance of a system and thus, such a guarantee is not realistically possible. What we can realistically guarantee is that the probability of exceeding $\overline{c}$ is small enough. In other words, we set some threshold $\varepsilon_c > 0$ and we require that $\text{Prob}(c \le \overline{c}) \ge 1 - \varepsilon_c$.
In addition to this requirement, we can also require that the excess of c over $\overline{c}$ be not too large. This can be done, e.g., by requiring that for some value $c_1 > \overline{c}$, the probability $\text{Prob}(c \le c_1)$ is bounded from below by the value $1 - \varepsilon_1$ for some smaller $\varepsilon_1 < \varepsilon_c$. We can have several such requirements for different values $c_i$ and $\varepsilon_i$.
Similarly, instead of the idealized exact inequality $c \ge \underline{c}$, in practice, we can only require that $\text{Prob}(c < \underline{c}) \le \delta$ for some small probability $\delta > 0$.
From the mathematical viewpoint, all such constraints are lower or upper bounds on the values of the cumulative distribution function $F_c(x) \stackrel{\text{def}}{=} \text{Prob}(c \le x)$. By combining the bounds corresponding to all the constraints, we can thus conclude that the cdf $F_c(x)$ must satisfy, for every x, the inequalities
$$\underline{F}_c(x) \le F_c(x) \le \overline{F}_c(x), \qquad (3.1)$$
where $\underline{F}_c(x)$ is the largest of all the lower bounds on $F_c(x)$ and $\overline{F}_c(x)$ is the smallest of all the upper bounds on $F_c(x)$. In other words, for every x, the corresponding value $F_c(x)$ must belong to the interval $[\underline{F}_c(x), \overline{F}_c(x)]$. This x-dependent interval is known as a probability box, or a p-box, for short; see, e.g., [2].
Similarly, for the environmental parameter b, we rarely know guaranteed bounds $\underline{b}$ and $\overline{b}$. At best, we know that for a given bound $\overline{b}$, the probability of exceeding this bound is small, i.e., that $\text{Prob}(b \le \overline{b}) \ge 1 - \varepsilon_b$ for some small $\varepsilon_b$. So here too, instead of a single bound, in effect, we have a p-box $[\underline{F}_b(x), \overline{F}_b(x)]$.
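As a small numerical illustration of how such a p-box arises (the numbers are made up purely for this example): suppose we require $\text{Prob}(c \le 10) \ge 0.95$, $\text{Prob}(c \le 12) \ge 0.99$, and $\text{Prob}(c \le 5) \le 0.10$. The first two requirements are lower bounds on $F_c(x)$ and the third one is an upper bound, so the resulting p-box is
$$\underline{F}_c(x) = \begin{cases} 0, & x < 10,\\ 0.95, & 10 \le x < 12,\\ 0.99, & x \ge 12,\end{cases} \qquad \overline{F}_c(x) = \begin{cases} 0.10, & x \le 5,\\ 1, & x > 5.\end{cases}$$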
4 Towards Formulating the Design (Backcalculation) Problem for p-Boxes

In the deterministic approach to design, we assume that we can manufacture an object with the exact value a of the corresponding parameter – or at least a value which is guaranteed to be within the given bounds $[\underline{a}, \overline{a}]$. In manufacturing, however, it is not practically possible to always guarantee that the value a is within the given interval. At best, we can guarantee that, e.g., the probability of $a \le \overline{a}$ is greater than or equal to $1 - \varepsilon_a$ for some small value $\varepsilon_a$. In other words, the design restriction on a can also be formulated in terms of p-boxes. Thus, we arrive at the following problem.

5 Backcalculation Problem for p-Boxes

We are given:
• the desired p-box $[\underline{F}_c(x), \overline{F}_c(x)]$ for c;
• the dependence c = f(a, b); and
• the p-box $[\underline{F}_b(x), \overline{F}_b(x)]$ describing b.
Our objective is to find a p-box $[\underline{F}_a(x), \overline{F}_a(x)]$ for which:
• for every probability distribution $F_a(x) \in [\underline{F}_a(x), \overline{F}_a(x)]$,
• for every probability distribution $F_b(x) \in [\underline{F}_b(x), \overline{F}_b(x)]$, and
• for all possible correlations between a and b,
the distribution of c = f(a, b) is within the given p-box $[\underline{F}_c(x), \overline{F}_c(x)]$.

6 Reminder: Forward Calculation for p-Boxes

In order to analyze the backcalculation problem for p-boxes, let us first describe how the corresponding forward calculation problem is solved. Let us assume that we know:
• the p-box $[\underline{F}_a(x), \overline{F}_a(x)]$ for a; and
• the p-box $[\underline{F}_b(x), \overline{F}_b(x)]$ describing b.
The objective of forward calculation is to find the range $[\underline{F}_c(x), \overline{F}_c(x)]$ of possible values of $F_c(x)$ for c = f(a, b) for all possible distributions $F_a(x) \in [\underline{F}_a(x), \overline{F}_a(x)]$ and $F_b(x) \in [\underline{F}_b(x), \overline{F}_b(x)]$ and all possible correlations between a and b.
It turns out that these calculations are best done in terms not of the original cdfs and p-boxes, but rather in terms of their inverses – quantile functions. For a cdf $F_a(x)$, the quantiles $a_0, \ldots, a_n$ are defined as the values for which $F_a(a_i) = \frac{i}{n}$. Since the cdf is monotonic, the quantiles are also monotonic: if $i < j$, then $a_i \le a_j$.
When instead of the exact cdf, we only know a p-box $[\underline{F}_a(x), \overline{F}_a(x)]$, then instead of the exact quantiles $a_i$ we only know interval bounds $[\underline{a}_i, \overline{a}_i]$ for these quantiles:
• the lower bounds $\underline{a}_i$ are the quantiles of the function $\overline{F}_a(x)$, and
• the upper bounds $\overline{a}_i$ are the quantiles of the function $\underline{F}_a(x)$.
These bounds are also monotonic: if $i < j$, then $\underline{a}_i \le \underline{a}_j$ and $\overline{a}_i \le \overline{a}_j$.
For the function c = f(a, b) = a + b, forward calculation can be easily described in terms of the quantile bounds: once we know the bounds $[\underline{a}_i, \overline{a}_i]$ and $[\underline{b}_i, \overline{b}_i]$ corresponding to a and b, we can compute the quantile bounds for c as follows:
$$\underline{c}_i = \max_j \left(\underline{a}_j + \underline{b}_{i-j}\right); \qquad (6.1)$$
$$\overline{c}_i = \min_j \left(\overline{a}_{j-i} + \overline{b}_{n-j}\right). \qquad (6.2)$$
These formulas were first described in [8].
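The following short Python sketch implements formulas (6.1) and (6.2) as written above (the function name and the representation of the bounds as plain lists are my own choices for this illustration; the index ranges are restricted so that all subscripts stay within $0, \ldots, n$):

def forward_sum_bounds(a_lo, a_hi, b_lo, b_hi):
    """Quantile bounds for c = a + b under unknown correlation between a and b.

    a_lo[i] and a_hi[i] are the lower/upper bounds on the i-th quantile of a
    (similarly for b); all four lists have length n + 1.
    """
    n = len(a_lo) - 1
    # formula (6.1): lower bounds, with j restricted so that i - j >= 0
    c_lo = [max(a_lo[j] + b_lo[i - j] for j in range(i + 1))
            for i in range(n + 1)]
    # formula (6.2): upper bounds, with j restricted so that j - i >= 0
    c_hi = [min(a_hi[j - i] + b_hi[n - j] for j in range(i, n + 1))
            for i in range(n + 1)]
    return c_lo, c_hi

For instance, for n = 1 with lower bounds $\underline{a} = (0, 2)$ and $\underline{b} = (0, 1)$, formula (6.1) gives $\underline{c} = (0, 2)$ – the same numbers as in the numerical example of Section 9 below.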
7 Formulation of the Problem

In terms of quantile bounds, the backcalculation problem takes the following form:
• we know the quantile intervals $[\underline{b}_i, \overline{b}_i]$ corresponding to the environmental variable b;
• we are given the intervals $[\underline{\tilde{c}}_i, \overline{\tilde{c}}_i]$ that should contain the quantiles for c = a + b;
• we must find the bounds $\underline{a}_i$ and $\overline{a}_i$ for which, with the values $\underline{c}_i$ and $\overline{c}_i$ determined by the formulas (6.1) and (6.2), we have $[\underline{c}_i, \overline{c}_i] \subseteq [\underline{\tilde{c}}_i, \overline{\tilde{c}}_i]$.

8 In Effect, We Have Two Different Problems: Finding $\underline{a}_i$ and Finding $\overline{a}_i$

An important consequence of the formulas (6.1) and (6.2) is that:
• the lower bounds $\underline{c}_i$ for c are determined only by the lower bounds $\underline{a}_i$ and $\underline{b}_i$ for a and b, and
• the upper bounds $\overline{c}_i$ for c are determined only by the upper bounds $\overline{a}_i$ and $\overline{b}_i$ for a and b.
Thus, in effect, we can formulate the problem of finding the values $\underline{a}_i$ and the problem of finding the values $\overline{a}_i$ as two separate yet similar problems. Without losing generality, in the following text, we will only consider the problem of finding the lower bounds (to simplify notations, we drop the underlines and write simply $a_i$, $b_i$, and $\tilde{c}_i$):
• we know the values $b_i$;
• we are given the values $\tilde{c}_i$;
• we must find the values $a_0 \le \ldots \le a_n$ for which
$$\tilde{c}_i \le \max_j \left(a_j + b_{i-j}\right). \qquad (8.1)$$

9 A Simple Example

Before we start describing how to solve the general problem, let us first describe a simple example. In the general p-box case, to describe the uncertainty of a variable x, we use n + 1 quantile intervals $[\underline{x}_i, \overline{x}_i]$ for $i = 0, 1, \ldots, n$.
P-box uncertainty is a generalization of the case of interval uncertainty, in which the uncertainty in each variable x is characterized by a single interval $[\underline{x}, \overline{x}]$. This case corresponds to n = 0. In this case, the inequality for $a_0$ takes the form $\tilde{c}_0 \le b_0 + a_0$ or, equivalently, $\tilde{c}_0 - b_0 \le a_0$ – the same form as for interval uncertainty.
We are looking for the simplest possible example which is different from interval uncertainty. After a single interval, the next simplest example is when we use two intervals, i.e., when n = 1. In this case, we need to find two values $a_0 \le a_1$ that satisfy the following inequalities:
$$\tilde{c}_0 \le b_0 + a_0; \qquad (9.1)$$
$$\tilde{c}_1 \le \max(b_0 + a_1, b_1 + a_0). \qquad (9.2)$$
Depending on which term in the max expression is the largest, we have two cases: $b_0 + a_1 \le b_1 + a_0$ and $b_0 + a_1 \ge b_1 + a_0$.
In the first case, the following inequalities must be satisfied:
$$\tilde{c}_0 \le b_0 + a_0; \quad b_0 + a_1 \le b_1 + a_0; \quad \tilde{c}_1 \le b_1 + a_0. \qquad (9.3)$$
The first of these three inequalities from (9.3) is equivalent to
$$\tilde{c}_0 - b_0 \le a_0. \qquad (9.4)$$
Similarly, the third inequality from (9.3) is equivalent to
$$\tilde{c}_1 - b_1 \le a_0. \qquad (9.5)$$
Thus, these two inequalities are equivalent to
$$\max(\tilde{c}_0 - b_0, \tilde{c}_1 - b_1) \le a_0. \qquad (9.6)$$
The second inequality from (9.3) is equivalent to
$$a_1 - a_0 \le b_1 - b_0. \qquad (9.7)$$
Thus, in the first case, the values $a_0$ and $a_1$ must satisfy the following two inequalities:
$$a_0 \ge \max(\tilde{c}_0 - b_0, \tilde{c}_1 - b_1); \quad 0 \le a_1 - a_0 \le b_1 - b_0. \qquad (9.8)$$
Graphically, the values that satisfy these inequalities fill the following area:
[Figure: the region in the $(a_0, a_1)$-plane described by (9.8); the values marked on the axes are $v_0$ and $v_1$]
where $v_0 \stackrel{\text{def}}{=} \max(\tilde{c}_0 - b_0, \tilde{c}_1 - b_1)$ and $v_1 \stackrel{\text{def}}{=} v_0 + b_1 - b_0$.
In the second case, the following inequalities must be satisfied:
$$\tilde{c}_0 \le b_0 + a_0; \quad b_0 + a_1 \ge b_1 + a_0; \quad \tilde{c}_1 \le b_0 + a_1. \qquad (9.9)$$
By moving the unknowns to one side and all the other terms to the other side, we conclude that we must satisfy the following inequalities:
$$a_0 \ge \tilde{c}_0 - b_0; \quad a_1 - a_0 \ge b_1 - b_0; \quad a_1 \ge \tilde{c}_1 - b_0. \qquad (9.10)$$
(Since $b_1 - b_0 \ge 0$, the second inequality automatically implies that $a_1 \ge a_0$.) Graphically, we have the following representation:
[Figure: the region in the $(a_0, a_1)$-plane described by (9.10); the values marked on the axes are $w_0$, $w_1$, and $w_2$]
where $w_0 \stackrel{\text{def}}{=} \tilde{c}_0 - b_0$, $w_1 \stackrel{\text{def}}{=} w_0 + (b_1 - b_0)$, and $w_2 \stackrel{\text{def}}{=} \tilde{c}_1 - b_0$.
Overall, the set of all possible design values is the union of these two sets. Let us illustrate this union on a simple numerical example when $b_0 = \tilde{c}_0 = 0$, $b_1 = 1$, and $\tilde{c}_1 = 2$. In this case, the first case corresponds to the following inequalities
$$a_0 \ge 1; \quad 0 \le a_1 - a_0 \le 1, \qquad (9.11)$$
and the second case leads to the following inequalities:
$$a_0 \ge 0; \quad a_1 - a_0 \ge 1; \quad a_1 \ge 2. \qquad (9.12)$$
Thus, the union of the two corresponding sets has the following form:
[Figure: the union of the two regions in the $(a_0, a_1)$-plane for this numerical example; the values marked on the axes are 1, 1, and 2]
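This union can also be checked numerically; the following small Python sketch (the function name and the choice to hard-code the example data as default arguments are mine) tests whether a candidate pair $(a_0, a_1)$ satisfies the constraints (9.1)–(9.2) for this numerical example:

def feasible(a0, a1, b0=0.0, b1=1.0, c0=0.0, c1=2.0):
    """Check (9.1)-(9.2) for the example b_0 = c~_0 = 0, b_1 = 1, c~_1 = 2."""
    return (a0 <= a1
            and c0 <= b0 + a0
            and c1 <= max(b0 + a1, b1 + a0))

print(feasible(1, 1))    # True:  corner point of the first region
print(feasible(0, 2))    # True:  corner point of the second region
print(feasible(0, 1.5))  # False: lies outside both regions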
10 A Designed System Usually Consists of Several Subsystems

A designed system usually consists of several subsystems. So, instead of selecting a single p-box for a single design parameter a, we need to design p-boxes corresponding to all these subsystems.
Let S denote the number of these subsystems, and let $a^{(s)}_i$, $b^{(s)}_i$, and $\tilde{c}^{(s)}_i$ denote the quantile bounds corresponding to the s-th subsystem, $s = 1, 2, \ldots, S$. Thus, we arrive at the following problem:
• we know the values $b^{(s)}_i$;
• we are given the values $\tilde{c}^{(s)}_i$;
• we must find, for each $s = 1, \ldots, S$, the values $a^{(s)}_0 \le \ldots \le a^{(s)}_n$ for which
$$\tilde{c}^{(s)}_i \le \max_j \left(a^{(s)}_j + b^{(s)}_{i-j}\right). \qquad (10.1)$$

11 There Exist Efficient Algorithms for Backcalculation

For p-boxes, there are efficient algorithms for solving the backcalculation problem; see, e.g., [1, 3, 4, 7].

12 Need for Additional Cost Constraints

In general, as we can see, the backcalculation problem has many possible solutions. Some design solutions require less effort, some require more. It is therefore desirable not just to find a solution, but rather to find a solution which satisfies given constraints on the manufacturing effort, such as cost, energy expenses, etc.
The values $a^{(s)}_i$ are the lower bounds on the design parameters. The smaller the lower bounds, the easier it is to maintain them. Thus, the cost of maintaining a lower bound increases with the value $a^{(s)}_i$. In this paper, we show that the problem is NP-hard even for the simplest case when the corresponding effort is simply proportional to the value $a^{(s)}_i$, and thus, the overall effort of maintaining all the characteristics $a^{(s)}_i$ is equal to the weighted linear combination
$$E = \sum_{s=1}^{S} \sum_{i=0}^{n} w^{(s)}_i \cdot a^{(s)}_i. \qquad (12.1)$$
The corresponding constraint is that this effort should not exceed a given value e. As we have mentioned, we may have several (C) constraints corresponding to different types of effort – cost, energy consumption, etc. Thus, in general, the constrained backcalculation problem takes the following form.

13 Formulation of the Problem in Precise Mathematical Terms

We are given:
• positive integers n, S, and C;
• the values $b^{(s)}_i$ corresponding to different $s = 1, \ldots, S$ and $i = 0, \ldots, n$;
• the values $\tilde{c}^{(s)}_i$ corresponding to different $s = 1, \ldots, S$ and $i = 0, \ldots, n$;
• the values $e_c$ corresponding to different $c = 1, \ldots, C$; and
• the values $w^{(s)}_{c,i}$ corresponding to different $s = 1, \ldots, S$, $c = 1, \ldots, C$, and $i = 0, \ldots, n$.
We must find, for each $s = 1, \ldots, S$, the values $a^{(s)}_0 \le \ldots \le a^{(s)}_n$ for which the following two sets of inequalities are satisfied:
$$\tilde{c}^{(s)}_i \le \max_j \left(a^{(s)}_j + b^{(s)}_{i-j}\right); \qquad (13.1)$$
$$\sum_{s=1}^{S} \sum_{i=0}^{n} w^{(s)}_{c,i} \cdot a^{(s)}_i \le e_c. \qquad (13.2)$$
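To make this formulation concrete, here is a small Python sketch (all names are mine, chosen only for this illustration) that checks whether candidate values $a^{(s)}_i$ satisfy the constraints (13.1) and (13.2); the hard part, as we show next, is finding such values:

def satisfies_constraints(a, b, c_tilde, w, e):
    """Check (13.1)-(13.2) for a candidate design.

    a[s][i], b[s][i], c_tilde[s][i]: values for s = 0..S-1 and i = 0..n;
    w[c][s][i] and e[c]: weights and bounds for c = 0..C-1.
    """
    S, n = len(a), len(a[0]) - 1
    for s in range(S):
        # monotonicity a_0 <= ... <= a_n
        if any(a[s][i] > a[s][i + 1] for i in range(n)):
            return False
        # inequalities (13.1), with j restricted so that i - j >= 0
        for i in range(n + 1):
            if c_tilde[s][i] > max(a[s][j] + b[s][i - j] for j in range(i + 1)):
                return False
    # effort constraints (13.2)
    for w_c, e_c in zip(w, e):
        if sum(w_c[s][i] * a[s][i] for s in range(S) for i in range(n + 1)) > e_c:
            return False
    return True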
14 Our Main Result

Our main result is that the above problem is NP-hard.

15 Proof

Main idea. Formally, NP-hard means that an arbitrary problem from a certain class NP can be reduced to this problem; see, e.g., [6]. Thus, to prove that a problem is NP-hard, it is sufficient to prove that a known NP-hard problem can be reduced to it. Indeed,
• by definition of NP-hardness, every problem from the class NP can be reduced to the known NP-hard problem;
• since this known problem can be reduced to our problem,
• we can therefore conclude that every problem from the class NP can be reduced to our problem;
• in other words, we can conclude that our problem is NP-hard.
In our proof, as such a known NP-hard problem, we take the knapsack problem; see, e.g., [6]. In this problem, we are given a set of S objects, for each of which we know its volume $v_s > 0$ and its price $p_s > 0$. We also know the total volume V of a knapsack and the threshold price P. Within the restriction on the volume, we must select some of the S objects in such a way that the total price of all the selected objects is at least P.
To describe this problem in precise terms, for each object s, we define a new variable $x_s$ such that $x_s = 1$ if the s-th object is taken and $x_s = 0$ if the s-th object is not taken. In terms of these new variables, the overall volume of all the selected objects is equal to $\sum_{s=1}^{S} v_s \cdot x_s$, and the overall price of all the selected objects is equal to $\sum_{s=1}^{S} p_s \cdot x_s$. Thus, the knapsack problem takes the following form: find the values $x_s \in \{0, 1\}$ for which
$$\sum_{s=1}^{S} v_s \cdot x_s \le V \qquad (15.1)$$
and
$$\sum_{s=1}^{S} p_s \cdot x_s \ge P. \qquad (15.2)$$
We will prove that this problem can be reduced to the above backcalculation problem, i.e., that for each instance $v_1, \ldots, v_S, p_1, \ldots, p_S, V, P$ of the knapsack problem there is an instance of the backcalculation problem whose solution can effectively lead to the solution of the original knapsack problem.

Towards the reduction: selection of p-boxes. In our reduction, we will use the same pair of p-boxes b, c for all S subsystems; the only difference will be in the weights. Let us denote the common value of $b^{(s)}_i$ for all $s = 1, \ldots, S$ by $b_i$ and the common value of $\tilde{c}^{(s)}_i$ by $c_i$. In these notations, the constraints on the unknowns $a^{(s)}_i$ take a simplified form
$$c_i \le \max_j \left(a^{(s)}_j + b_{i-j}\right). \qquad (15.3)$$
In our reduction, we take n = 1. For n = 1, for every s, the inequalities (15.3) lead to the following constraints on the corresponding two unknowns $a^{(s)}_0 \le a^{(s)}_1$:
$$c_0 \le a^{(s)}_0 + b_0; \qquad (15.4)$$
$$c_1 \le \max\left(a^{(s)}_0 + b_1, a^{(s)}_1 + b_0\right). \qquad (15.5)$$
Specifically, we take $b_0 = c_0 = 0$, $b_1 = 1$, and $c_1 = 2$. For these values, the above inequalities take the following form:
$$a^{(s)}_0 \ge 0; \qquad (15.6)$$
$$2 \le \max\left(a^{(s)}_0 + 1, a^{(s)}_1\right). \qquad (15.7)$$
The largest of the two values is greater than or equal to 2 if and only if (at least) one of these two values is greater than or equal to 2. Thus, the second constraint (15.7) means that:
• either $2 \le a^{(s)}_0 + 1$ and thus $a^{(s)}_0 \ge 1$,
• or $a^{(s)}_1 \ge 2$.

Analysis of the selected p-boxes. Let us show that out of all possible solutions $a^{(s)}_0 \le a^{(s)}_1$ satisfying these two inequalities, only two solutions satisfy the additional constraint $a^{(s)}_0 + a^{(s)}_1 \le 2$:
• $a^{(s)}_0 = a^{(s)}_1 = 1$, and
• $a^{(s)}_0 = 0$ and $a^{(s)}_1 = 2$.
Indeed, we know that for each solution, we have:
• either $a^{(s)}_0 \ge 1$
• or $a^{(s)}_1 \ge 2$.
In the first case $a^{(s)}_0 \ge 1$, we have
$$a^{(s)}_0 + a^{(s)}_1 = 2a^{(s)}_0 + \left(a^{(s)}_1 - a^{(s)}_0\right). \qquad (15.8)$$
The first term in the right-hand side is $\ge 2$, and the second term is always non-negative – since $a^{(s)}_1 \ge a^{(s)}_0$. Thus, the only possibility for the right-hand side sum to be $\le 2$ is when the value $2a^{(s)}_0$ is exactly 2, and the difference $a^{(s)}_1 - a^{(s)}_0$ is exactly 0. In this case, we have $a^{(s)}_0 = a^{(s)}_1 = 1$.
In the second case $a^{(s)}_1 \ge 2$, due to $a^{(s)}_0 \ge 0$ (condition (15.6)), the only possibility for the sum $a^{(s)}_0 + a^{(s)}_1$ to be $\le 2$ is when $a^{(s)}_0 = 0$ and $a^{(s)}_1 = 2$.

Reduction and the final part of the proof. We will reduce the given instance of the knapsack problem to the following system with 3 constraints:
• In the first constraint, we take $w^{(s)}_{1,i} = 1$ for all s and i, and we take $e_1 = 2 \cdot S$.
• In the second constraint, we take $w^{(s)}_{2,0} = v_s$, $w^{(s)}_{2,1} = 0$, and $e_2 = V$.
• In the third constraint, we take $w^{(s)}_{3,0} = 0$, $w^{(s)}_{3,1} = p_s$, and
$$e_3 = 2 \cdot \sum_{s=1}^{S} p_s - P. \qquad (15.9)$$
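In code, this construction can be written down as the following small Python sketch (the function name and data layout are mine; it simply records the p-box values $b_0 = c_0 = 0$, $b_1 = 1$, $c_1 = 2$ and the three constraints listed above):

def knapsack_to_backcalculation(v, p, V, P):
    """Build the constrained backcalculation instance (n = 1, C = 3)
    used in the reduction, from a knapsack instance (v, p, V, P)."""
    S = len(v)
    b = [[0.0, 1.0] for _ in range(S)]        # b_0 = 0, b_1 = 1 for every s
    c_tilde = [[0.0, 2.0] for _ in range(S)]  # c_0 = 0, c_1 = 2 for every s
    w = [
        [[1.0, 1.0] for _ in range(S)],       # first constraint:  w_{1,i} = 1
        [[v[s], 0.0] for s in range(S)],      # second constraint: w_{2,0} = v_s
        [[0.0, p[s]] for s in range(S)],      # third constraint:  w_{3,1} = p_s
    ]
    e = [2 * S, V, 2 * sum(p) - P]            # e_1 = 2S, e_2 = V, e_3 = 2*sum(p) - P
    return b, c_tilde, w, e

In a solution of this instance, the values $x_s = a^{(s)}_0$ solve the original knapsack instance, and conversely a knapsack solution $x_s$ yields the design $a^{(s)}_0 = x_s$, $a^{(s)}_1 = 2 - x_s$; this is exactly the correspondence established below.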
Let us first analyze the first constraint
$$\sum_{s=1}^{S} \left(a^{(s)}_0 + a^{(s)}_1\right) \le 2 \cdot S. \qquad (15.10)$$
As we have shown in the previous section, each simple sum $a^{(s)}_0 + a^{(s)}_1$ is at least 2, and it is only equal to 2 when either $a^{(s)}_0 = 0$ or $a^{(s)}_0 = 1$. Thus, the only possibility for the sum $\sum_{s=1}^{S} \left(a^{(s)}_0 + a^{(s)}_1\right)$ of S such simple sums to not exceed 2S is when each of these sums is equal to exactly 2, i.e., when for every s,
• either $a^{(s)}_0 = 0$ or $a^{(s)}_0 = 1$, and
• $a^{(s)}_1 = 2 - a^{(s)}_0$.
For these values $a^{(s)}_i$, the second constraint takes the form
$$\sum_{s=1}^{S} v_s \cdot a^{(s)}_0 \le V, \qquad (15.11)$$
and the third constraint takes the form
$$\sum_{s=1}^{S} p_s \cdot a^{(s)}_1 \le 2 \cdot \sum_{s=1}^{S} p_s - P. \qquad (15.12)$$
Substituting the expression $a^{(s)}_1 = 2 - a^{(s)}_0$ into this inequality, we get
$$\sum_{s=1}^{S} p_s \cdot \left(2 - a^{(s)}_0\right) \le 2 \cdot \sum_{s=1}^{S} p_s - P, \qquad (15.13)$$
i.e., equivalently,
$$2 \sum_{s=1}^{S} p_s - \sum_{s=1}^{S} p_s \cdot a^{(s)}_0 \le 2 \sum_{s=1}^{S} p_s - P, \qquad (15.14)$$
which, in its turn, is equivalent to
$$\sum_{s=1}^{S} p_s \cdot a^{(s)}_0 \ge P. \qquad (15.15)$$
Thus, for every solution to the constrained backcalculation problem, the values $x_s \stackrel{\text{def}}{=} a^{(s)}_0$ form a solution to the knapsack problem: each of them is equal to 0 or 1, and they satisfy the corresponding inequalities (15.11) and (15.15).
Vice versa, one can easily check that if the values $x_s$ form a solution to the knapsack problem, then the corresponding values $a^{(s)}_0 = x_s$ and $a^{(s)}_1 = 2 - x_s$ form a solution to the constrained backcalculation problem. The reduction is proven, so the backcalculation problem is indeed, in general, NP-hard.

16 Situation in Which a Feasible Algorithm Is Possible

Let us describe an example of an optimization problem in which a feasible algorithm is possible.

Optimal backcalculation: towards the formulation of the problem in precise terms. In general, there are many combinations of the values $a_i$ that satisfy the desired constraints. Which combination should we choose? In the design case, a natural requirement is to select the values $a_i$ which will be the easiest to implement. To describe this idea in precise terms, we must analyze how easy it is to implement different values of $a_i$.
In general, the value $a_i$ is the value for which $\overline{F}(a_i) = \frac{i}{n}$, i.e., for which we must guarantee that
$$\text{Prob}(a \le a_i) = F(a_i) \le \overline{F}(a_i) = \frac{i}{n}. \qquad (16.1)$$
In particular:
• once we select the value $a_0$, we must guarantee that the probability $\text{Prob}(a \le a_0)$ of the actual value a being below $a_0$ is 0: $\text{Prob}(a \le a_0) = 0$;
• once we select $a_1$, we must guarantee that the probability $\text{Prob}(a \le a_1)$ of the actual value a being below $a_1$ does not exceed 1/n: $\text{Prob}(a \le a_1) \le 1/n$;
• once we select $a_2$, we must guarantee that the probability $\text{Prob}(a \le a_2)$ of the actual value a being below $a_2$ does not exceed 2/n: $\text{Prob}(a \le a_2) \le 2/n$;
• etc.
When we motivated the need to take into account partial information about the probabilities, we mentioned that the most difficult task is to guarantee that the actual a never gets below a threshold. The corresponding restriction is related to the value $a_0$: we must guarantee that $a \ge a_0$. The smaller this value $a_0$, the weaker this constraint and thus, the easier it is to satisfy. Thus, it is reasonable to select the smallest possible value $a_0$.
Once this value is selected and the corresponding bound is guaranteed, we must guarantee that the inequalities $a \ge a_i$ hold with probabilities $\ge 1 - i/n$. The closer this probability is to 1, the more stringent the corresponding requirement, and thus, the more difficult the corresponding task. So, after we have selected $a_0$, the most difficult of the remaining tasks is to select the value $a_1$, the value for which the guaranteed probability of violating the restriction $a \ge a_1$ is the smallest (probability = 1/n). Thus, to make the design as easy to implement as possible, we should make this restriction the least difficult to implement – i.e., we should select the value $a_1$ as small as possible.
Once we have fixed $a_0$ and $a_1$, the most difficult of the remaining tasks is to guarantee that $a \ge a_2$ with probability $\ge 1 - (2/n)$. Thus, we must select the corresponding threshold $a_2$ to be as small as possible. As a result, we arrive at the following formulation of the backcalculation problem.

Formulation of the optimal backcalculation problem in precise mathematical terms. Out of all the tuples $a_0 \le \ldots \le a_n$ that satisfy the inequalities (8.1),
• we first select all the tuples for which the value $a_0$ is the smallest possible;
• out of the selected tuples, we select all the tuples for which the value $a_1$ is the smallest possible;
• then, out of the newly selected tuples, we select those for which the value $a_2$ is the smallest possible;
• etc.
In mathematical terms, we can say that a tuple $a = (a_0, \ldots, a_n)$ is better than a tuple $a' = (a'_0, \ldots, a'_n)$ if one of the following conditions holds:
• either $a_0 < a'_0$;
• or $a_0 = a'_0$ and $a_1 < a'_1$;
• or $a_0 = a'_0$, $a_1 = a'_1$, and $a_2 < a'_2$;
• ...
• or $a_0 = a'_0$, ..., $a_{i-1} = a'_{i-1}$, and $a_i < a'_i$;
• ...
• or $a_0 = a'_0$, ..., $a_{n-1} = a'_{n-1}$, and $a_n < a'_n$.
In computer science, this relation is known as a lexicographic (alphabetic) order, since this is exactly how words are placed in a dictionary or in a lexicon: a word $w = \ell_0 \ell_1 \ldots$ consisting of the letters $\ell_0, \ell_1, \ldots$ is placed before a word $w' = \ell'_0 \ell'_1 \ldots$ consisting of the letters $\ell'_0, \ell'_1, \ldots$ if one of the following conditions holds:
• either the letter $\ell_0$ precedes the letter $\ell'_0$ (e.g., "apple" goes before "zebra");
• or $\ell_0 = \ell'_0$ and the next letter $\ell_1$ precedes the corresponding letter $\ell'_1$ (e.g., "abuse" goes before "alpha");
• ...
• or $\ell_0 = \ell'_0$, ..., $\ell_{i-1} = \ell'_{i-1}$, and the next letter $\ell_i$ precedes the corresponding letter $\ell'_i$;
• ...
In these terms, we must select the tuple which is the smallest in the lexicographic order.

Towards a solution to the optimal backcalculation problem. To find the optimal solution, let us start with the value $a_0$. For i = 0, the condition (8.1) becomes $\tilde{c}_0 \le a_0 + b_0$, i.e., equivalently, $a_0 \ge \tilde{c}_0 - b_0$. The smallest real number that satisfies this inequality is the value $a_0 = \tilde{c}_0 - b_0$. Thus, we should take
$$a_0 = \tilde{c}_0 - b_0. \qquad (16.2)$$
Let us now assume that we have already selected the values $a_0, \ldots, a_{i-1}$, and that we are now selecting the value $a_i$. The i-th condition (8.1) has the form
$$\tilde{c}_i \le \max_j \left(b_{i-j} + a_j\right) = \max\left[\max_{j \le i-1} \left(b_{i-j} + a_j\right),\; b_0 + a_i\right]. \qquad (16.3)$$
If
$$\tilde{c}_i \le \max_{j \le i-1} \left(b_{i-j} + a_j\right), \qquad (16.4)$$
then the condition (16.3) is already satisfied. In this case, the only restriction on $a_i$ is that $a_i \ge a_{i-1}$; thus, the smallest possible value of $a_i$ is $a_i = a_{i-1}$.
If the inequality (16.4) is not satisfied, then to satisfy (16.3), we must satisfy the inequality $\tilde{c}_i \le b_0 + a_i$, i.e., equivalently, $a_i \ge \tilde{c}_i - b_0$. Thus, the smallest possible value here is $a_i = \tilde{c}_i - b_0$.
So, we arrive at the following algorithm:

Algorithm for optimal backcalculation. ([1, 3, 4, 7]) We know:
• the interval quantiles $[\underline{b}_i, \overline{b}_i]$ for the environmental variable b; and
• the interval quantiles $[\underline{\tilde{c}}_i, \overline{\tilde{c}}_i]$ for c.
We want to find the lexicographically optimal interval quantiles $[\underline{a}_i, \overline{a}_i]$ for a for which c satisfies the given constraints. In this algorithm, we compute the values $a_0, a_1, \ldots$ one by one.
• First, we compute $a_0 = \tilde{c}_0 - b_0$.
• Once the values $a_0, \ldots, a_{i-1}$ are computed, we check whether
$$\tilde{c}_i \le \max_{j \le i-1} \left(b_{i-j} + a_j\right). \qquad (16.5)$$
If this inequality is satisfied, we take $a_i = a_{i-1}$; otherwise, we take $a_i = \tilde{c}_i - b_0$.
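In code, the algorithm for the lower bounds looks as follows (a short Python sketch; the function name and the representation of the bounds as plain lists are mine):

def optimal_lower_bounds(b_lo, c_tilde_lo):
    """Lexicographically smallest a_0 <= ... <= a_n satisfying (8.1).

    b_lo[i]:       lower quantile bounds b_i for the environmental variable b;
    c_tilde_lo[i]: required lower quantile bounds c~_i for c = a + b.
    """
    n = len(b_lo) - 1
    a = [c_tilde_lo[0] - b_lo[0]]                     # a_0 = c~_0 - b_0
    for i in range(1, n + 1):
        if c_tilde_lo[i] <= max(b_lo[i - j] + a[j] for j in range(i)):
            a.append(a[i - 1])                        # condition (16.5) holds
        else:
            a.append(c_tilde_lo[i] - b_lo[0])         # a_i = c~_i - b_0
    return a

print(optimal_lower_bounds([0.0, 1.0], [0.0, 2.0]))   # [0.0, 2.0]

The printed tuple (0, 2) is exactly the result obtained in the illustration that follows.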
Comment. In the derivation of the algorithm, we, in fact, proved that the result a of applying this algorithm always satisfies the inequalities (8.1): indeed, we have selected each value $a_i$ in such a way that the i-th inequality (8.1) is satisfied. We have also shown that no tuple satisfying (8.1) can be (lexicographically) better than the result a of using this algorithm – and thus, this result a is indeed optimal.

Algorithm illustrated on the above simple example. In the above example when $b_0 = \tilde{c}_0 = 0$, $b_1 = 1$, and $\tilde{c}_1 = 2$, we do the following:
• First, we compute $a_0 = \tilde{c}_0 - b_0 = 0 - 0 = 0$.
• Next, we check whether
$$\tilde{c}_1 \le \max_{j \le 0} \left(b_{1-j} + a_j\right) = b_1 + a_0. \qquad (16.6)$$
Here, we have $2 = \tilde{c}_1 \not\le b_1 + a_0 = 1 + 0 = 1$, so we take $a_1 = \tilde{c}_1 - b_0 = 2 - 0 = 2$.
One can easily check that the resulting tuple $(a_0, a_1) = (0, 2)$ is indeed lexicographically optimal:
• it has the smallest possible value of $a_0 = 0$, and
• out of all the tuples with this value of $a_0$, it has the smallest possible value of $a_1$.
[Figure: the union region in the $(a_0, a_1)$-plane from the example of Section 9, with the optimal tuple $(a_0, a_1) = (0, 2)$ marked; the values marked on the axes are 1, 1, and 2]

Acknowledgments

This work was supported in part by NSF grant HRD-0734825 and by Grant 1 T36 GM078000-01 from the National Institutes of Health.

Received: June 2009. Revised: November 2009.

References

[1] S. Ferson, Using approximate deconvolution to estimate cleanup targets in probabilistic risk analyses, In: P. T. Kostecki, E. J. Calabrese, and M. Bonazountas (eds.), Hydrocarbon Contaminated Soils, Amherst Scientific Publishers, Amherst, Massachusetts, 1995, pp. 245–254.
[2] S. Ferson, RAMAS Risk Calc 4.0, CRC Press, Boca Raton, Florida, 2002.
[3] S. Ferson, V. Kreinovich, and W. T. Tucker, Untangling equations involving uncertainty: deconvolutions, updates, and backcalculations, Proceedings of the NSF Workshop on Reliable Engineering Computing, Savannah, Georgia, September 15–17, 2004.
[4] S. Ferson and T. F. Long, Deconvolution can reduce uncertainty in risk analyses, In: M. Newman and C. Strojan (eds.), Risk Assessment: Measurement and Logic, Ann Arbor Press, Ann Arbor, Michigan, 1997.
[5] V. Kreinovich, Expert knowledge is needed for design under uncertainty: for p-boxes, backcalculation is, in general, NP-hard, Proceedings of the 2009 Annual Conference of the North American Fuzzy Information Processing Society NAFIPS'09, Cincinnati, Ohio, June 14–17, 2009.
[6] C. H. Papadimitriou, Computational Complexity, Addison Wesley, 1993.
[7] W. T. Tucker and S. Ferson, Setting cleanup targets in a probabilistic assessment, In: S. Mishra (ed.), Groundwater Quality Modeling and Management under Uncertainty, American Society of Civil Engineers, Reston, Virginia, 2003.
[8] R. C. Williamson and T. Downs, Probabilistic arithmetic I: Numerical methods for calculating convolutions and dependency bounds, International Journal of Approximate Reasoning, 1990, Vol. 4, pp. 89–158.