Ratio Mathematica, ISSN: 1592-7415, eISSN: 2282-8214, Vol. 31, 2016, pp. 3-24

Information and Intertemporal Choices in Multi-Agent Decision Problems

Mariagrazia Olivieri (1), Massimo Squillante (2), Viviana Ventre (3)
(1) DEMM, Università of Sannio, Benevento, Italy, mgolivieri@unisannio.it
(2) DEMM, Università of Sannio, Italy, prorettore@unisannio.it
(3) DEMM, Università of Sannio, Italy, ventre@unisannio.it

Received on: 15-12-2016. Accepted on: 15-01-2017. Published on: 28-02-2017
doi: 10.23755/rm.v31i0.316
© Olivieri et al.

Abstract
Psychological evidence of impulsivity and of the false consensus effect leads to results far from rationality. It is shown that impulsivity modifies the discount function of each individual, and that the false consensus effect increases the degree of consensus in a multi-agent decision problem. Analyzing them together, we note that in strategic interactions these two human factors lead to choices that change the equilibria expected by rational individuals.

Keywords: Consensus, Intertemporal choice, Decisions
2010 AMS subject classification: 90B50, 91B06, 91B08

1. Introduction

In 1937, to compare future alternatives, Samuelson introduced the Discounted Utility Model (DU model), which assumes an exponential delay discount function with a constant discount rate, implying dynamic consistency and stationary intertemporal preferences. Contrary to this normative economic theory, it has been established that human and animal intertemporal choice behaviors are not rational (i.e., they are inconsistent). For this reason, recent behavioral decision theory on intertemporal choice has adopted a hyperbolic discount model, which produces preference reversal as time passes (Takahashi, 2009) (Section 2). Neurobiological and psychological factors determine individual differences in intertemporal choice and have been explored in recent neuroeconomic and econophysical studies.
Takahashi (2007) attempts to dissociate impulsivity and inconsistency in econophysical studies by proposing the q-exponential delay discount function. Other behavioral economists propose Multiple Selves Models, which attempt to measure the strength of the internal conflict within the decision maker; the best known is the quasi-hyperbolic discount model, first introduced by Laibson (1997) (Section 3). Thaler and Shefrin (1981), in the field of Multiple Selves Models, incorporate the concept of self-control in a theory of individual intertemporal choice by modeling the individual as an organization. The individual is treated as if he contained two distinct psyches, denoted as planner and doer. This model can be compared with the principal-agent problem present in any organization, so the individual may adopt many of the same strategies to solve self-control problems in intertemporal choice (Section 4).

In a multi-agent decision context the objective of a group decision is to choose, among all the alternatives, a common decision, that is, an alternative judged the best by the majority of the decision makers. So in most strategic decisions it is important to be able to estimate the characteristics and behavior of others. If the characteristics of the other players are unknown, estimating them is a critical task. Moreover, psychological evidence suggests that people’s own beliefs, values, and habits tend to bias their perceptions of how widely these are shared (false consensus effect). This effect demonstrates an inability of individuals to process information rationally (Section 5). Therefore, when we use the aggregation of the agents’ preferences to assess consensus, we obtain a coefficient that includes the false consensus effect, which depends on subjectivity and also increases the degree of consensus.
To eliminate this aspect of vagueness in human judgment we can use a model defined by ordered weighted averaging (OWA) operators, introduced in Yager (1988) (Section 6). Many decision problems are characterized by an interplay between intertemporal considerations and strategic interactions. Two or more agents may have to take a common decision for a future time; in that process they are influenced by the false consensus effect and by impulsivity, which reveals inconsistency. Hence, to consider intertemporal choices in a multi-agent decision process, one needs to study the problem of each agent and the influence of the false consensus effect (Section 7). A strategic interaction is developed mathematically by means of the theory of games; it is then possible to show how the psychological influence differs between a cooperative interaction (Section 8) and a non-cooperative one (Section 9).

2. Intertemporal Discounting

Standard discount model. The standard economic model of discounted utility (DU model) assumes that economic agents make intertemporal choices over consumption profiles $(c_t, \dots, c_T)$ and that such preferences can be represented by an intertemporal utility function $U_t(c_t, \dots, c_T)$ of the following form:

$$U_t(c_t, \dots, c_T) = \sum_{k=0}^{T-t} D(k)\, u(c_{t+k}), \qquad \text{where } D(k) = \left(\frac{1}{1+\rho}\right)^{k}.$$

So the DU model assumes an exponential temporal discounting function and a constant discount rate ($\rho$). An important implication of these two features is that a person’s intertemporal preferences are time-consistent: if in period t a person prefers $c_2$ at t+2 to $c_1$ at t+1, then in period t+1 she must prefer $c_2$ at t+2 to $c_1$ immediately.

However, several empirical studies, arising mainly from the field of psychology, have documented various inadequacies of the DU model as a descriptive model of behavior. The first anomaly found to contradict discounted utility was that, instead of remaining constant over time, observed discount rates appear to decline with time. This reveals decreasing impatience, or hyperbolic discounting: a later outcome is discounted less per unit of time than an earlier one (delay effect). Furthermore, other anomalies derive from the fact that, even for a given delay, discount rates vary across different types of intertemporal choices:
- larger outcomes are discounted at a lower rate than smaller outcomes (magnitude effect);
- gains are discounted at a higher rate than losses of the same magnitude (sign effect);
- increasing sequences of consumption are preferred over decreasing ones even if the total amount is the same (improving sequence effect).

Hyperbolic discount model. A hyperbolic discount model can represent the tendency of individuals to increasingly choose a smaller-sooner reward over a larger-later reward as the delay occurs sooner in time (delay effect). Many authors have proposed different hyperbolic discount functions, in which δ (the temporal discount factor) increases with the delay to an outcome. In 1992 Loewenstein and Prelec proposed this form:

$$d(t) = \left(\frac{1}{1+\alpha t}\right)^{\beta/\alpha}$$

where β > 0 is the degree of discounting and α > 0 is the departure from exponential discounting.

A second type of empirical support for hyperbolic discounting comes from experiments on dynamic inconsistency. Several studies report systematic preference reversals between two rewards as the time-distance to these rewards diminishes. A hyperbolic discount model can account for this: non-exponential time-preference curves can cross (Strotz, 1955/56), and consequently the preference for one future reward over another may change with time.

3. Neuroeconomics: two models to consider impulsivity and inconsistency in intertemporal choice

Behavioral economists have found a number of behavior patterns that violate rational choice theory (Kahneman et al., 1982; Thaler, 1991); the most important is inconsistent preference, which represents behavior typically seen in psychiatric disorders (alcoholism, drug abuse), but also in more ordinary phenomena (overeating, credit card debt). Neuroeconomics has found that addicts are more myopic (have larger time-discount rates) than non-addicted populations (Ainslie, 1975; Bickel et al., 1999), so hyperbolic discounting may explain various problematic human behaviors (Laibson, 1997): loss of self-control, failure in planned abstinence from addictive drugs, etc.

Recently, behavioral neuroeconomic and econophysical studies have proposed two discount models in order to better describe the neural and behavioral correlates of impulsivity and inconsistency in intertemporal choice.

Q-exponential discount model. Takahashi et al. (2007) have proposed and examined this function for the subjective value V(D) of a delayed reward:

$$V(D) = \frac{A}{\exp_q(k_q D)} = \frac{A}{\left[1 + (1-q)\,k_q D\right]^{\frac{1}{1-q}}}$$

where D denotes the delay until receipt of a reward, A the value of the reward at D = 0, and $k_q$ a parameter of impulsivity at delay D = 0 (q-exponential discount rate). The q-exponential function is defined as:

$$\exp_q(x) = \left(1 + (1-q)\,x\right)^{\frac{1}{1-q}}$$

This function can parametrize impulsivity and inconsistency separately. The limit q → 1 recovers exponential discounting and q = 0 gives simple hyperbolic discounting; if q < 0, the intertemporal choice behavior is more inconsistent than hyperbolic discounting (Ventre and Ventre, 2012).

Quasi-hyperbolic discount model. Behavioral economists have proposed that the inconsistency in intertemporal choice may be attributable to an internal conflict between “multiple selves” within a decision maker.
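Before the multiple-selves account is developed further, the q-exponential value function given above can be made concrete with a small numerical sketch. It is only an illustration: the function names, the reward amounts (100 and 150), the one-unit gap between the rewards and the rate k_q = 1 are assumptions chosen for the example, not values from the studies cited. The sketch compares q = 1 (the exponential limit) with q = 0 (simple hyperbolic discounting) and checks whether the choice between a smaller-sooner and a larger-later reward depends on how far away both rewards are.

```python
import math

def q_exp_value(amount, delay, k_q, q):
    """Subjective value V(D) = A / [1 + (1 - q) * k_q * D] ** (1 / (1 - q)).

    Assumes q <= 1, k_q >= 0 and delay >= 0, so the base stays positive.
    """
    if abs(1.0 - q) < 1e-12:
        # Limit q -> 1: ordinary exponential discounting.
        return amount * math.exp(-k_q * delay)
    base = 1.0 + (1.0 - q) * k_q * delay
    return amount / base ** (1.0 / (1.0 - q))

def prefers_later(front_delay, k_q, q, small=100.0, large=150.0, gap=1.0):
    """True if the larger-later reward is valued above the smaller-sooner one.

    The amounts and the gap are hypothetical illustration values.
    """
    sooner = q_exp_value(small, front_delay, k_q, q)
    later = q_exp_value(large, front_delay + gap, k_q, q)
    return later > sooner

# Compare the choice when the sooner reward is immediate and when it is 10 units away.
for q in (1.0, 0.0):
    print(q, [prefers_later(d, 1.0, q) for d in (0.0, 10.0)])
```

Under these assumed numbers the exponential case makes the same choice at both horizons, while the hyperbolic case prefers the larger-later reward when both are distant and switches to the smaller-sooner one when it becomes immediate, which is the preference reversal described in Section 2.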
As a consequence, there are (at least) two exponentially discounting selves (with two exponential discount rates) in a single human individual. When delayed rewards lie in the distant future (more than 1 year), the self with the smaller discount rate wins, while when delayed rewards approach the near future (within a year), the self with the larger discount rate wins, resulting in preference reversal over time. This intertemporal choice behavior can be parametrized in a quasi-hyperbolic discount model (also known as a β-δ model) (Laibson, 1997; O’Donoghue and Rabin, 1999). For discrete time τ (the unit assumed is one year) it is defined as (Laibson, 1997):

$$F(\tau) = \beta\delta^{\tau} \ (\text{for } \tau = 1, 2, 3, \dots) \quad \text{and} \quad F(0) = 1 \qquad (0 < \beta < \delta < 1).$$

The discount factor between the present and one time period later (βδ) is smaller than that between two successive future time periods (δ). In continuous time, the proposed model is equivalent to a linearly weighted sum of two exponential functions (generalized quasi-hyperbolic discounting):

$$V(D) = A\left[w \exp(-k_1 D) + (1-w)\exp(-k_2 D)\right]$$

where w, 0 < w < 1, is a weighting parameter and $k_1$ and $k_2$ are two exponential discount rates ($k_1 < k_2$). Note that the larger of the two exponential discount rates, $k_2$, corresponds to an impulsive self, while the smaller discount rate $k_1$ corresponds to a patient self (Ventre and Ventre, 2012).

These economists have proposed different Multiple Selves Models, which often draw analogies between intertemporal choice and a variety of models of interpersonal strategic interaction.

4. Self-control in intertemporal choices

In many cases dynamically inconsistent behavior is attributed to the existence of contingent “temptations” that increase impulsivity and induce a deviation from the desirable behavior. What the person knows to be in his best long-run interest conflicts with his short-run desires.

Strotz’s model. To represent this conflict, Strotz (1955) proposed two strategies that might be employed by a person who foresees how her preferences will change over time.

The “strategy of pre-commitment”: a person can commit to some plan of action. For example, consider a consumer with an initial endowment $K_0$ of consumer goods which has to be allocated over the finite interval (0, T). At time 0 he wishes to maximize his utility function

$$J_0 = \int_0^T \lambda(t-0)\, U[\bar{c}(t), t]\, dt \quad \text{subject to} \quad \int_0^T c(t)\, dt = K_0$$

where $\bar{c}(t)$ is the instantaneous rate of consumption at time t, and λ(t − 0) is a discount factor whose value depends upon the time elapsed between a past or future date and the present. This implies that the discounted marginal utility of consumption should be the same for all periods. But at a later date τ the consumer may reconsider his consumption plan. The problem is then to maximize

$$J_\tau = \int_0^T \lambda(t-\tau)\, U[c(t), t]\, dt \quad \text{subject to} \quad \int_\tau^T c(t)\, dt = K_\tau = K_0 - \int_0^\tau c(t)\, dt.$$

The optimal pattern of consumption will change with τ, and if the original plan is altered, the individual is said to display dynamic inconsistency. Strotz showed that individuals will not alter the original plan only if λ(t − τ) is exponential in |t − τ|.

The “strategy of consistent planning”: since pre-commitment is not always a feasible solution to the problem of intertemporal conflict, an individual may adopt a different strategy: take into account future changes in the utility function and reject any plan that he will not follow through. His problem is then to find the best plan among those he will actually follow.

Thaler and Shefrin’s model. In the setting of Multiple Selves Models, to control impulsivity, Thaler and Shefrin (1981) proposed a “planner-doer” model which draws upon principal-agent theory.
They treat an individual as if he contained two distinct kinds of psyche: one planner, which pursues longer-run results, and multiple doers, which are concerned only with short-term satisfaction, so they care only about their own immediate gratification (and have no affinity for future or past doers). For example, consider an individual with a fixed income stream $y = [y_1, y_2, \dots, y_T]$, where $\sum_t y_t = Y$, which has to be allocated over the finite interval (0, T). The planner would choose a consumption plan to maximize his utility function

$$V(Z_1, Z_2, \dots, Z_T) \quad \text{subject to} \quad \sum_{t=1}^{T} c_t \le Y$$

in which each $Z_t$ is the utility of the consumption level $c_t$ in period t. On the other hand, an unrestrained doer 1 would borrow $Y - y_1$ on the capital market and therefore choose $c_1 = Y$; the resulting consequence is naturally $c_2 = c_3 = \dots = c_T = 0$. Such an action would suggest a complete absence of psychic integration.

The model then focuses on the strategies employed by the planner to control the behavior of the doers, and it proposes two instruments he can use.
(a) He can impose rules on the doers’ behavior, which operate by altering the constraints imposed on any given doer. Pure rules, like pre-commitment, can be a very effective self-control strategy because they eliminate all choice. The advantage of these strategies is that once in place they require little or no self-enforcement. However, they may be unavailable or too expensive.
(b) He can use discretion accompanied by some method of altering the incentives or rewards to the doer without any self-imposed constraints. The planner can alter the doer’s utility function directly by introducing a modification parameter $\theta = (\theta_1, \theta_2, \dots, \theta_T)$. $Z_t$ is assumed to be a function of two arguments, $c_t$ and $\theta_t$. If $\theta_t = 0$, the doer is completely unrestrained. As $\theta_t$ increases, both $Z_t$ and $\partial Z_t / \partial c_t$ are reduced. θ might be thought of as a guilt parameter.
The higher $\theta_t$ is, the more guilt the doer feels for any level of $c_t$ (Ventre and Ventre, 2012).

In conclusion, the essential insight that Multiple Selves Models capture is that, much like cooperation in a social dilemma, self-control often requires the cooperation of a series of temporally situated selves. When one “self” defects by opting for immediate gratification, the consequence can be a kind of unraveling or “falling off the wagon” whereby subsequent selves follow the precedent (Frederick, Loewenstein, and O’Donoghue, 2002).

5. Multi-agent decision problem: consensus and false consensus effect

In a multi-agent decision problem an individual needs to make his intertemporal choice taking others’ preferences into account, with the purpose of achieving a consensus on a common decision. Group decision problems, indeed, consist in finding the best alternative(s) from a set of feasible alternatives $A = \{a_1, \dots, a_n\}$ according to the preferences provided by a group of agents $E = \{e_1, \dots, e_m\}$. The objective is to obtain the maximum degree of agreement among the agents’ overall performance judgements on the alternatives.

Once the alternatives have been evaluated, the main problem is to compare the agents’ judgements to verify the consensus among them; in the case of unanimous consensus, the evaluation process ends with the selection of the best alternative(s). However, in real situations humans rarely come to a unanimous agreement: this has led to evaluating not only crisp degrees of consensus (degree 1 for full and unanimous agreement) but also intermediate degrees between 0 and 1, corresponding to partial agreement among all agents. Furthermore, full consensus (degree = 1) need not be the result of unanimous agreement; it can be obtained even in the case of agreement among a fuzzy majority of agents (Fedrizzi, Kacprzyk, and Nurmi, 1992/1993).
The judgements of each agent are frequently based, in part, on intuition or subjective beliefs, rather than on detailed data about the preferences of the people being predicted. Such intuitive judgements become more pervasive when people lack the data needed to ground their judgements. Research in other areas of social judgement has revealed that people are egocentric: they judge others in the same way that they judge themselves. Consequently, as pointed out in several experiments, each decision maker overestimates his own opinion.

Social psychology has found that people with a certain preference tend to make higher judgements of the popularity of that preference in others, compared to the judgements of those with different preferences. This empirical result has been termed the false consensus effect (Ross et al., 1977; Mullen et al., 1985). It states that individuals overestimate the number of people who possess the same attributes as they do. People often believe that others are more like themselves than they really are. Thus, their predictions about others’ beliefs or behaviors, based on casual observation, are very likely to err in the direction of their own beliefs or behavior. For example, college students who preferred brown bread estimated that over 50% of all other college students preferred brown bread, while white-bread eaters estimated that 37% preferred brown bread (Ross et al., 1977).

As a consequence, in a multi-agent decision problem we often have to deal with different opinions, different importance of criteria, and agents who are not fully impartial and objective. In this sense, the false consensus effect produces partial objectivity and incomplete impartiality, which perturbs the agreement over the evaluation.

6. Assessment of consensus and false consensus effect

In the literature, different methods to compute a degree of consensus in fuzzy environments have been defined, and some approaches have been proposed to measure consensus in the context of fuzzy preference relations (Fedrizzi, Kacprzyk, and Nurmi, 1992-1993). But, as we have seen, the false consensus effect can lead to an absence of objectivity in the evaluation process. Indeed, there may be cases where, because of the false consensus effect, an agent is not able to objectively express any kind of preference degree between two or more of the available options. A numerical indication alone then seems insufficient to synthesize the degree of consensus of the agents.

To make evident the lack of objectivity in the synthesized judgements, a description of the individual opinion should incorporate both the component of the agent’s opinion generated by true knowledge and the subjective component that produces false consensus outputs. The opinion of each agent is decomposed into two components: a vector, made of the ranking of the alternatives, built by means of a classical procedure (e.g., a hierarchical procedure), and a fuzzy component that represents the contribution of the false consensus effect, which we assume to be fuzzy in nature. This allows us to consider aggregation operators, such as OWA operators, which are useful when a synthesis among fuzzy variables is to be built (Squillante and Ventre, 2010).

The formal model considers the set $N$ of decision makers, the set $A$ of the alternatives, and the set $C$ of the criteria. Let any decision maker $i \in N$ be able to assess the relevance of each criterion. Precisely, for every $i$, a function

$$h_i : C \to [0,1] \quad \text{with} \quad \sum_{c \in C} h_i(c) = 1,$$

denoting the evaluation or weight that the decision maker assigns to the criterion $c$, is defined.
Furthermore, the function

$$g_i : C \times A \to [0,1]$$

is defined, such that $g_i(c, a)$ is the value of the alternative $a$ with respect to the criterion $c$, from the perspective of $i$. Let $n$, $p$, and $m$ denote the (positive integer) numbers of elements of the sets $N$, $C$, and $A$, respectively. The value $(h_i(c))_{c \in C}$ denotes the evaluation of the $p$-tuple of the criteria by the decision maker $i$, and the value $(g_i(c, a))_{c \in C, a \in A}$ denotes the $p \times m$ matrix whose elements are the evaluations, made by $i$, of the alternatives with respect to each criterion in $C$. The function $f_i : A \to [0,1]$, defined by

$$(f_i(a))_{a \in A} = (h_i(c))_{c \in C} \cdot (g_i(c, a))_{c \in C, a \in A},$$

is the evaluation, made by $i$, of the alternative $a \in A$. A Euclidean metric acting between pairs of decision makers $i$ and $j$, i.e., between individual rankings of alternatives, is defined by

$$d(f_i, f_j) = \sqrt{\frac{1}{|A|} \sum_{a \in A} \left(f_i(a) - f_j(a)\right)^2}.$$

If the functions $h_i$, $g_i$ range in [0, 1], then also $0 \le d(f_i, f_j) \le 1$. If we set $d^* = \max\{d(f_i, f_j) \mid i, j \in N\}$, then a degree of consensus $\delta^*$ can be defined as the complement to one of the maximum distance between two positions of the agents:

$$\delta^* = 1 - d^* = 1 - \max\{d(f_i, f_j) \mid i, j \in N\}.$$

Now, to identify the portion of the false consensus effect internal to the consensus-reaching process, we have to consider a vector that represents the components of the consensus, $p(a)P + q(a)Q$. This polynomial representation of the measure of the effect is composed of a numeric component $p(a)P$, which contains all the quantitative information available from the consensus-reaching process, and $q(a)Q$, which reflects the false consensus effect. The measure of the effect is then:

$$q(a) = \frac{1}{N (d^*)^2} \sum_{i=1}^{N} (f_i - f_j)^2$$

with $0 \le q(a) \le 1$, $\forall i, j \in N$. This component can be estimated with OWA operators (a large class of decision support tools for providing heuristic solutions to situations where several trade-offs should be taken into consideration).
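The evaluation vectors, the distance between agents and the degree of consensus defined above can be traced in a few lines of code. The following is a minimal sketch, not the authors' implementation: the function names and the two-criteria, two-alternative data are assumptions made up for illustration, and the fuzzy false-consensus component q(a) is deliberately left out.

```python
import math
from itertools import combinations

def evaluate(h, g):
    """f_i(a) = sum over criteria c of h_i(c) * g_i(c, a).

    h is a list of p criterion weights summing to 1,
    g is a p x m matrix (criteria by alternatives) with entries in [0, 1].
    """
    m = len(g[0])
    return [sum(h[c] * g[c][a] for c in range(len(h))) for a in range(m)]

def distance(f_i, f_j):
    """Normalized Euclidean distance between two evaluation vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(f_i, f_j)) / len(f_i))

def consensus_degree(evaluations):
    """1 minus the largest pairwise distance; needs at least two agents."""
    d_star = max(distance(f_i, f_j) for f_i, f_j in combinations(evaluations, 2))
    return 1.0 - d_star

# Hypothetical data: three agents, two criteria, two alternatives.
weights = [0.5, 0.5]
agent_a = evaluate(weights, [[1.0, 0.0], [1.0, 0.0]])  # strongly favours alternative 1
agent_b = evaluate(weights, [[0.0, 1.0], [0.0, 1.0]])  # strongly favours alternative 2
agent_c = evaluate(weights, [[0.6, 0.4], [0.8, 0.2]])  # mildly favours alternative 1

print(consensus_degree([agent_a, agent_c]))
print(consensus_degree([agent_a, agent_b, agent_c]))
```

Because the degree of consensus depends only on the most distant pair of agents, adding the fully opposed agent_b drives it to zero regardless of how close the other two are; this worst-case behaviour is a property of the definition above, not of the sketch.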
Yager (1988) introduced an approach to multiple criteria aggregation based on ordered weighted averaging (OWA) operators. By ranking the alternatives, the operators provide an enhanced methodology for evaluating actions on a qualitative basis.

7. False consensus effect and intertemporal choice in a multi-agent context

Many decisions are made in conditions of strategic interaction, i.e., situations in which the consequences of our choices depend on the decisions of other interacting agents. For example, in bidding at auctions or in bargaining, the choice depends not only on one’s own evaluation of the good but also on the evaluations of other individuals. The mathematical instrument used to describe these situations is the theory of games. Indeed, a strategic game is considered as an interactive situation where two or more rivals interact and try to obtain an advantage from this interdependence. In this perspective, the theory of games can be considered as a tool for understanding and forecasting decision-making processes; according to this theory the outcome of the game coincides with the equilibrium decision, which occurs when each agent adopts the best strategy, that is, the one selected on the basis of rational choice.

Rationality is one of the most important assumptions made in the theory of games. It implies that every player always maximizes his utility, and is thus able to calculate perfectly the probabilistic result of every action. So the players have consistent preferences on the final outcome of the decision-making process, and their aim is to maximize these preferences.
However, we have first of all shown that the intertemporal choices of each individual are influenced by impulsivity and show inconsistency; furthermore, we have seen that in a group decision problem each individual tends to overestimate the extent to which other people share his beliefs, attitudes and behaviors. This means that in a strategic interaction people are not rational; their choices are not solely a function of the objective response but of their subjective structure.

The consequence is that in a strategic interaction the equilibrium of the decision is the result of an internal process (which does not reveal rationality). Rational choice and equilibrium decision coincide only if the decision makers (alone or in a group) succeed in fighting the loss of self-control and in keeping out the false consensus effect. So this psychological evidence leads to new equilibria in strategic games, which are not justified by rational behavior.

The consequences differ according to the nature of the interactions; indeed, in the theory of games the basic classification of interactions is between non-cooperative games and cooperative ones, and consequently we have non-cooperative decision problems and cooperative decision problems too. The first group summarizes the dynamics by which each person pursues his own interests without regard to the gains or losses of others. In the second group, subjects form a coalition and assume mutual commitments to share the surplus generated by cooperation.

The psychological aspects of impulsivity and the false consensus effect influence these two kinds of interaction in different ways. A way to analyze these effects is to identify the portion of the false consensus effect in the equilibrium point (Section 6), and to consider the influence of the doers in each individual choice (Thaler and Shefrin, 1981).

8. Cooperative decision problems

In a cooperative game groups of players (coalitions) may enforce cooperative behavior; hence the game is a competition between coalitions of players, rather than between individual players.

An example is a coordination game, in which players choose their strategies by a consensus decision-making process. Indeed, coordination games are a class of games with multiple pure-strategy Nash equilibria in which players choose the same or corresponding strategies. The classic example of a coordination game is the “battle of the sexes”, where an engaged couple must choose what to do in the evening: the man prefers to attend a baseball game and the woman prefers to attend an opera. In terms of utility, the payoffs for each pair of strategies are (in each cell the first payoff is the woman’s and the second the man’s):

                              Man
                      Opera (O)    Baseball (B)
Woman  Opera (O)        3, 1          0, 0
       Baseball (B)     0, 0          1, 3

In this example there are multiple outcomes that are equilibria: (B,B) and (O,O). Both players would rather do something together than go to separate events, so no single individual has an incentive to deviate if the other is conforming to an outcome: the man would attend the opera if he thinks the woman will be there, even though he prefers the other equilibrium outcome, in which both attend the baseball game.

One of the most commonly suggested criteria for the analysis of games with multiple equilibria is to select the one with the highest payoffs for all, if such a “Pareto-dominant” outcome exists. In this context, a consensus decision-making process can be considered as an instrument for choosing the best strategy in a coordination game. The final decision is often not the first preference of each individual in the group, and they may not even like the final result.
But it is a decision to which they all consent because it is the best for the group.

If we follow Thaler and Shefrin’s model, we can analyze choices in a cooperative game in this way: at period one the planner of each agent states his preference, which is the best strategy because the planner wants to maximize his utility function; indeed, the planners are the rational part of each player. However, the period-one doers of each agent want to obtain immediate gratification, so they drive each agent to act differently from the rational program of his own planner, thinking that the others do the same as an effect of false consensus. But each agent has a different utility function, so each one will select a different choice with degree = 1, and this makes it impossible to aggregate the preferences with OWA operators to obtain a common decision. In fact, according to the model for measuring consensus proposed in Section 6, a certain consensus degree $\delta \in (0,1]$ is required in advance, and consensus is reached if the constraint $\delta^* \ge \delta$ is satisfied.

Nevertheless, in a cooperative decision problem the influence of the doers can be avoided: agents can enforce contracts between the parties at period one, which eliminates the problem of loss of self-control because it eliminates all choice. As a consequence, the consensus is obtained by aggregating the preferences of each planner. The planners are rational, so the final common choice is the best strategy according to the theory of games. However, the result of this aggregation includes a part of the coefficient called the false consensus effect, which depends on subjectivity and also raises the degree of the opinions (Squillante and Ventre, 2010): with cooperation the group utility is higher than the real utility that each one derives from the strategy chosen.
So they have to extract the measure of the false consensus effect from the degree of consensus, according to the model analyzed in Section 6. This means that the best solution corresponds to an improvement in terms of utility that is overrated as a result of the false consensus.

Thus in a cooperative decision problem the influence of the false consensus effect is present at period one, while the loss of self-control of each agent is fought by the imposition of a rule (Thaler and Shefrin, 1981). The rationality of the equilibrium choice of the game is saved by the possibility of making an arrangement among the agents, which represents a pure rule to control the behavior of the doers and maintain self-control at later times (Section 4); nevertheless, the final decision has a higher consensus degree because it is influenced by the false consensus effect. However, this effect acts only on the planners, so we can eliminate it in the planners’ utility functions: the false consensus effect directly influences the discount function of each agent.

For example, consider two people who live together and pool a part of their monthly income to cover common expenses. This part of each salary forms a fixed income stream $y = [y_1, y_2, \dots, y_T]$, where $\sum_t y_t = Y$, which has to be allocated over the finite interval (0, T). The two agents must agree on how to spend this money. We can eliminate the influence of the doers because both are obliged to deposit a fixed amount of money in the common fund, and also because they made the consumption plan for the common expenses at period one, so they cannot use this money for other purposes. In this way we can take into account only each planner and reach the consensus about the common choice through the evaluation process of a multi-agent decision problem.
The planner’s preferences are represented by a utility function $V(Z_1, Z_2, \ldots, Z_T)$, in which each $Z_t$ is a function of the utility of the consumption level in period t ($c_t$). The planner would then choose a common consumption plan to maximize $V(Z_1, Z_2, \ldots, Z_T)$ subject to the fixed income stream, $\sum_{t=1}^{T} c_t \le Y$. The consumption plan chosen by each agent will assign different degrees of preference to the different types of consumption, so to reach an agreement it suffices to aggregate the preferences of the planners (Section 6). However, the consensual choice obtained will have a greater degree because of the false consensus effect embedded in the preferences of each planner. So the utility function of each planner may be purged in advance of the false consensus effect by reducing the degree of preference of the favorite choices. The function to maximize will still be $V(Z_1, Z_2, \ldots, Z_T)$, but each $Z_t$ will represent a lower degree of utility for each preferred type of consumption.

This example can be analyzed according to the theory of repeated games. The choice of “what we consume with the common fund” can be seen as a choice that is repeated over time. Repeated games study the repetition of strategic choices over time. According to game theory, if the stage game of a repeated game, whether finitely or infinitely repeated, has multiple Nash equilibria, then there are many subgame perfect equilibria. Some of these involve the play of strategies that are collectively more profitable for the players than the Nash equilibria of the one-shot game. The economic reasoning that supports this equilibrium is as follows: the players agree to maximize their utility in the first period, while the actions to be taken in the second period are of two types: a punishment if the rival does not keep the agreement, and a prize (the best Nash equilibrium of the single game) if he does.
In this case the strategies take into account the history of the game, which makes cooperation possible. When the agents interact only once, they often have an incentive to deviate from cooperation, but in a repeated interaction any mutually beneficial outcome can be sustained in an equilibrium. Deviation is not convenient in the long run, since players can retaliate, and this operates especially when the game is repeated infinitely.

According to our theory, the end result is the same: repeating a cooperative game makes it possible to obtain a common result which is not achievable in a one-period situation (see the battle of the sexes). However, this happens not because the rational player finds it more convenient to cooperate in the long run, but because, through the agreements made in the first period, the players eliminate any temptation to deviate, which is then made impossible. It is necessary to make deviation impossible; otherwise, in later games, the doer of each player would push his agent to deviate, also believing that the others will do the same as a result of the false consensus.

9. Non-cooperative decision problems

In non-cooperative games, also called competitive games, players cannot stipulate binding agreements, regardless of their goals. So in a non-cooperative decision problem each agent makes decisions independently, without collaboration or communication with any of the others (Nash, 1951); an example is daily trading on the stock exchange. In this category the solution is given by the Nash equilibrium. Consequently, in this kind of interaction it is not possible to implement a pre-commitment to control the doers’ actions, and as a consequence it is not possible to identify the best choice on a rational basis.
If we analyze a non-cooperative multi-agent decision problem like the traditional prisoner’s dilemma, over one time interval and with only two alternatives, we see that the agents reach a common decision, and this is the best strategy, because each doer wants to obtain the highest advantage, which is the same for both, and, because of the false consensus effect, each one thinks that the other will do the same. The doer of each prisoner will choose the strategy “do not confess”.

In the traditional version of the game, the police arrest two suspects (A and B) and interrogate them in separate rooms. Each can either confess, thereby implicating the other, or keep silent. In terms of years in prison, the payoffs for each strategy are these:

                                Agent A
                                Confess (C)    Do not confess (NC)
Agent B   Confess (C)           5, 5           0, 10
          Do not confess (NC)   10, 0          1, 1

According to game theory, given this set of payoffs, there is a strong tendency for each to confess. If prisoner A remains silent, prisoner B is better off confessing (because 0 is better than 1 year in jail). However, B is also better off confessing if A confesses (because 5 years is better than 10). Hence, B will tend to confess regardless of what A does; and by an identical argument, A will also tend to confess. This line of reasoning presupposes two rational players with consistent preferences.

Actually, when each player has to choose the best strategy, every doer drives his agent to make the decision that gives him the greater advantage, believing that the other will do the same because of the false consensus effect. Consequently, the decision made by each leads to the Pareto-optimal outcome, because both have the same utility function and both doers choose the only action that is the best strategy. This creates the paradoxical situation in which rational players reach a poorer outcome than irrational players. However, it is just a coincidence that the two players have arrived at a common strategy.
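The dominance argument and the Pareto comparison can be checked mechanically. The sketch below assumes the usual reading of the payoff table, in which a lone confessor goes free and the silent prisoner serves 10 years; the dictionary layout and the function names are illustrative choices, not part of the original model.

```python
ACTIONS = ("C", "NC")  # C = confess, NC = do not confess

# Years in prison, keyed by (action of A, action of B) -> (years for A, years for B).
# Assumed reading of the payoff table: the lone confessor goes free.
YEARS = {
    ("C", "C"): (5, 5),
    ("C", "NC"): (0, 10),
    ("NC", "C"): (10, 0),
    ("NC", "NC"): (1, 1),
}

def years_for(player, own, opponent):
    """Years served by `player` (0 = A, 1 = B) given both actions."""
    profile = (own, opponent) if player == 0 else (opponent, own)
    return YEARS[profile][player]

def dominant_action(player):
    """Action giving strictly fewer years than any alternative, whatever
    the opponent does; None if no such action exists."""
    for mine in ACTIONS:
        alternatives = [a for a in ACTIONS if a != mine]
        if all(years_for(player, mine, opp) < years_for(player, alt, opp)
               for opp in ACTIONS for alt in alternatives):
            return mine
    return None

def pareto_dominates(p, q):
    """True if profile p is no worse than q for both and better for one."""
    pairs = list(zip(YEARS[p], YEARS[q]))
    return all(a <= b for a, b in pairs) and any(a < b for a, b in pairs)

rational = (dominant_action(0), dominant_action(1))
print(rational)                                  # ('C', 'C')
print(pareto_dominates(("NC", "NC"), rational))  # True
```

Both rational players confess, yet the profile chosen by the two doers, (NC, NC), is better for each of them, which is the paradox described in the text.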
In other types of non-cooperative problems this cannot happen, with the result that a joint decision will never be achieved without a prior agreement. Consider, for example, a multi-agent decision problem in which the agents plan to save money to make a common purchase. Each agent has a fixed income, $Y_A$ and $Y_B$, and a nonnegative level of saving, $S_A$ and $S_B$. As in cooperative games, the planner of each agent chooses the best strategy, which maximizes his utility function of saving (thinking of the future), but the doer of each agent wants to obtain the highest advantage now, so he would consume all of $Y$ and therefore choose $S = 0$, with degree 1. Indeed, the doers are impulsive: each one assigns weight 1 to one preference and weight 0 to all the others, thinking that everybody will do the same because of the false consensus effect. In this case, as we saw for the cooperative game, it is not possible to aggregate the preferences to obtain a common decision.

The plan made in advance by the group of agents (to make a common purchase) is not feasible if they do not set some rule or some method to alter the incentives of the doers. This type of problem can be represented in the following way:

                             Agent A
                             Save (S)     Do not save (NS)
Agent B   Save (S)           10, 10       5, 5
          Do not save (NS)   5, 5         -10, -10

where the payoffs represent the utility of each agent for each strategy. Under rational choice, the Nash equilibrium coincides with the best strategy (S, S). However, the false consensus effect and impulsivity lead each agent to the worst equilibrium, because the utility functions of the agents differ from one another (each agent prefers consumption to saving). This causes the lack of consensus on a common decision.
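The claim that (S, S) is the Nash equilibrium of the saving game can be verified by enumerating the four strategy profiles. This is a minimal sketch: the payoffs are those of the table in this example, while the doers' profile is an assumption that encodes the statement that each doer puts all weight on not saving.

```python
from itertools import product

ACTIONS = ("S", "NS")  # S = save, NS = do not save

# Utilities keyed by (action of A, action of B) -> (utility of A, utility of B),
# taken from the payoff table of the saving example.
UTILITY = {
    ("S", "S"): (10, 10),
    ("S", "NS"): (5, 5),
    ("NS", "S"): (5, 5),
    ("NS", "NS"): (-10, -10),
}

def pure_nash(payoffs):
    """Profiles from which neither agent gains by deviating alone."""
    found = []
    for a, b in product(ACTIONS, repeat=2):
        a_ok = all(payoffs[(a, b)][0] >= payoffs[(x, b)][0] for x in ACTIONS)
        b_ok = all(payoffs[(a, b)][1] >= payoffs[(a, x)][1] for x in ACTIONS)
        if a_ok and b_ok:
            found.append((a, b))
    return found

print(pure_nash(UTILITY))  # [('S', 'S')]

# Assumption: each impulsive doer assigns weight 1 to consuming now.
doer_profile = ("NS", "NS")
print(UTILITY[doer_profile])  # (-10, -10)
```

The planners' equilibrium yields 10 for each agent, whereas the profile imposed by the doers yields the lowest payoff in the table.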
In conclusion, in a non-cooperative multi-agent decision problem there are two situations: 1) the doers of the agents have the same preference, and they reach a common decision given by the unanimous choice; 2) the doers have different preferences and do not assign any weight to the other preferences, so it is not possible to aggregate the preferences. Thus the influence of the doers does no harm if their choices are unanimous, and in this case the final decision will also be the best decision in the Pareto sense; but if this does not happen, it is impossible to achieve a common strategy without curbing impulsivity, and as the number of agents increases unanimity becomes increasingly difficult to obtain.

Analyzing this type of decision problem over a long horizon, we note that the influence of the psychological aspects leads to the same conclusion as game theory, namely the impossibility of obtaining cooperation over time, but in a different way: according to game theory, because the dominant strategy prevails; according to our analysis, because the doers will deviate toward their own preferences.

Indeed, according to game theory a finitely repeated game with a unique Nash equilibrium has the same subgame perfect equilibrium outcome, because in the last stage the strategy played by each player does not depend on the history of the game; that is, the strategies of the last stage are history independent. In the last round every player will presumably choose the dominant-strategy equilibrium, and so he defects (playing the last round is like playing a single time). Thus, in finitely repeated games, if cooperation fails in the last round, it cannot be sustained in any other round. Analyzing the situation according to our theory, we reach the same conclusion, but for different reasons.
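The backward-induction argument for finitely repeated games can be sketched as follows. The example reuses the prisoner's dilemma with payoffs written as negative years in prison, assumes that a lone confessor goes free, and ignores discounting between rounds; the function names are illustrative.

```python
from itertools import product

ACTIONS = ("C", "NC")  # C = confess (defect), NC = do not confess (cooperate)

# Stage payoffs as negative years in prison, keyed by (A's action, B's action).
# Assumed reading of the prisoner's dilemma table: the lone confessor goes free.
STAGE = {
    ("C", "C"): (-5, -5),
    ("C", "NC"): (0, -10),
    ("NC", "C"): (-10, 0),
    ("NC", "NC"): (-1, -1),
}

def pure_nash(payoffs):
    """Profiles from which neither player gains by deviating alone."""
    found = []
    for a, b in product(ACTIONS, repeat=2):
        a_ok = all(payoffs[(a, b)][0] >= payoffs[(x, b)][0] for x in ACTIONS)
        b_ok = all(payoffs[(a, b)][1] >= payoffs[(a, x)][1] for x in ACTIONS)
        if a_ok and b_ok:
            found.append((a, b))
    return found

def backward_induction(stage, rounds):
    """Solve the finitely repeated game from the last round to the first.

    The continuation payoff is the same after every history, so it only adds
    a constant to the stage payoffs and leaves the equilibrium unchanged.
    """
    continuation = (0, 0)
    path = []
    for _ in range(rounds):
        augmented = {
            profile: (u[0] + continuation[0], u[1] + continuation[1])
            for profile, u in stage.items()
        }
        equilibria = pure_nash(augmented)
        if len(equilibria) != 1:
            raise ValueError("the argument needs a unique stage equilibrium")
        profile = equilibria[0]
        path.append(profile)
        continuation = augmented[profile]
    return path[::-1], continuation

path, total = backward_induction(STAGE, rounds=3)
print(path)   # [('C', 'C'), ('C', 'C'), ('C', 'C')]
print(total)  # (-15, -15)
```

With a unique stage equilibrium, every round reproduces the one-shot outcome, so cooperation never appears however many rounds are played.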
We can reconsider the example of the two agents who save for common expenses, and continue the game for several years: in the same way, in subsequent periods the doer of each agent will push him to consume everything he has saved. If we consider two periods, in the first the payoffs are the same as before, and in the second they are the sum:

                             Agent A
                             Save (S)     Do not save (NS)
Agent B   Save (S)           20, 20       10, 10
          Do not save (NS)   10, 10       -20, -20

The doers of the second period will want to consume everything and choose $S_2 = 0$, with the result that it is not possible to carry out the plan and the equilibrium is the worst solution (NS, NS). The planners will establish a consumption plan by discounting the expected future payoffs and so spreading the savings over the years, but in every period the doers will divert their agents, tempted to consume everything today and save tomorrow; this impulsiveness is psychologically justified by the false consensus effect. In conclusion, even over a long horizon the psychological influence of the doers cannot lead to cooperation and to the achievement of rational results. We can affirm that in a non-cooperative decision problem obtaining a common decision is only a matter of chance.

References

[1] Ainslie G., “Specious reward: a behavioral theory of impulsiveness and impulse control”, Psychological Bulletin, 463-496 (1975).
[2] Bickel W. K., Odum A. L., Madden G. J., “Impulsivity and cigarette smoking: delay discounting in current, never, and ex-smokers”, Psychopharmacology 4, 447-454 (1999).
[3] Fedrizzi M., Kacprzyk J., Nurmi H., “Group decision making and consensus under fuzzy preferences and fuzzy majority”, Fuzzy Set Syst 49, 21-31 (1992).
[4] Fedrizzi M., Kacprzyk J., Nurmi H., “Consensus degrees under fuzzy preferences using OWA operators”, Contr Cyberne 22(2), 78-86 (1993).
[5] Frederick S., Loewenstein G., O’Donoghue T., “Time Discounting and Time Preference: A Critical Review”, Journal of Economic Literature 40, 351-401 (2002).
[6] Kahneman D., Slovic P., Tversky A. (Eds.), “Judgment under uncertainty: Heuristics and biases”, Cambridge University Press, New York and Cambridge (1982).
[7] Laibson D. I., “Golden Eggs and Hyperbolic Discounting”, Quarterly Journal of Economics 112, 443-477 (1997).
[8] Loewenstein G., Prelec D., “Anomalies in intertemporal choice: Evidence and an interpretation”, Quarterly Journal of Economics, 573-97 (1992).
[9] Mullen B., Atkins J. L., Champion D. S., Edwards C., Hardy D., Story J. E., Vanderklok M., “The false consensus effect: a meta-analysis of 115 hypothesis tests”, J Exp Soc Psychol 21, 262-283 (1985).
[10] Nash J., “Non-Cooperative Games”, The Annals of Mathematics, Second Series, 54(2), 286-295 (September 1951).
[11] O’Donoghue T., Rabin M., “Doing it now or later”, American Economic Review 89(1), 103-24 (1999).
[12] Ross L., Amabile T. M., Steinmetz J. L., “Social roles, social control and biases in social perception”, J Pers Soc Psychol 35, 485-494 (1977).
[13] Samuelson P. A., “A Note on the Measurement of Utility”, The Review of Economic Studies 4, 155-161 (1937).
[14] Squillante M., Ventre V., “Assessing False Consensus Effect in a Consensus Enhancing Procedure”, International Journal of Intelligent Systems 25, 274-285 (2010).
[15] Strotz R. H., “Myopia and Inconsistency in Dynamic Utility Maximization”, Review of Economic Studies 23(3), 165-80 (1955-1956).
[16] Takahashi T., “Theoretical frameworks for neuroeconomics of intertemporal choice”, Journal of Neuroscience, Psychology, and Economics 2(2), 75-90 (2009).
[17] Takahashi T., Ikeda K., Hasegawa T., “A hyperbolic decay of subjective probability of obtaining delayed rewards”, Behavioral and Brain Functions, BioMed Central (2007).
[18] Thaler R., Shefrin H. M., “An Economic Theory of Self-Control”, Journal of Political Economy (University of Chicago Press) 89(2), 392-406 (April 1981).
[19] Thaler R., “Quasi Rational Economics”, New York, Russell Sage Foundation, 127-133; in Loewenstein G. and Elster J. (Eds.), Choice over time, New York, Russell Sage Foundation (1991).
[20] Ventre A. G. S., Ventre V., “The intertemporal choice behaviour: classical and alternative delay discounting models and control techniques”, Atti Accad. Pelorit. Pericol. Cl. Sci. Fis. Mat. Nat., Vol. 90, Suppl. No. 1, C3 (2012).
[21] Yager R. R., “On ordered weighted averaging aggregation operators in multi criteria decision making”, IEEE Trans Syst Man Cybern 18(1), 183-190 (1988).