Original Research

Accounting for outcome and process measures in dynamic decision-making tasks through model calibration

Varun Dutt¹ and Cleotilde Gonzalez²

¹School of Computing and Electrical Engineering and School of Humanities and Social Sciences, Indian Institute of Technology Mandi, India, and ²Dynamic Decision Making Laboratory, Department of Social and Decision Sciences, Carnegie Mellon University, Pittsburgh, PA, USA

Computational models of learning, and the theories they represent, are often validated by calibrating them to human data on decision outcomes. However, only a few models explain the process by which these decision outcomes are reached. We argue that models of learning should reflect the process through which decision outcomes are reached, and that validating a model on the process is likely to help explain both the process and the decision outcome simultaneously. To demonstrate the proposed validation, we use a large dataset from the Technion Prediction Tournament and an existing Instance-Based Learning model. We present two ways of calibrating the model's parameters to human data: on an outcome measure and on a process measure. In agreement with our expectations, we find that calibrating the model on the process measure helps explain both the process and outcome measures better than calibrating the model on the outcome measure. These results hold when the model is generalized to a different dataset. We discuss implications for explaining the process and the decision outcomes in computational models of learning.

Keywords: outcome and process measures, computational models of learning, Instance-based learning, dynamic decisions, binary choice, calibration

Unlike models in disciplines such as economics, models of decision making in psychology often incorporate theories of the underlying cognitive processes that lead to specific outcomes in a decision task. For example, Instance-Based Learning Theory (IBLT; Gonzalez & Dutt, 2011), a theory of how people make dynamic decisions, commonly includes assumptions about how people search for information (i.e., the process) and how this information search helps people arrive at a decision (i.e., the outcome). However, many process theories and their corresponding models are tested only at the outcome level rather than at the process level itself (Johnson et al., 2008). Accounting for both the decision outcomes and the process through which these outcomes are reached is important in mathematical models (Scheres & Sanfey, 2006): accounting for both enables such models to provide a better account of the observed phenomena. It is equally important to account for process and decision outcomes in computational models of learning that try to explain human decisions (Busemeyer & Diederich, 2009; Erev & Barron, 2005; Rapoport & Budescu, 1992). For example, researchers investigating choice behavior are often interested in explaining both overall maximization behavior (an outcome measure) and exploratory behavior (e.g., alternation between alternatives, a process measure) through cognitive models that explain how people learn to maximize long-term rewards (Biele, Erev & Ert, 2009; Erev, Ert, Roth, Haruvy et al., 2010; Gonzalez & Dutt, 2011).
Given the importance of accounting for both the decision outcome and the process, the literature has revealed a strong relationship between the two, where the resulting outcome is consistent with the adopted process (Erev & Barron, 2005; Green, Price & Hamburger, 1995; Hills & Hertwig, 2010). According to Erev and Barron (2005), one expects a strong relationship between process and decision outcomes in cases where the decision environment is dynamic (i.e., repeated) and where the decision outcome is contingent upon the process. For example, consider a repeated binary-choice task, where choices are made repeatedly between two alternatives. One alternative is risky, with a high outcome and a low outcome that occur with certain pre-defined probabilities when this alternative is chosen. The other alternative is safe, with a medium outcome that occurs with a sure (100%) chance when this alternative is chosen. Now, if the expected value of the risky alternative is greater than that of the safe alternative (i.e., the risky alternative is maximizing), then participants who alternate a lot while selecting alternatives would end up maximizing their choices only half of the time. In fact, Hills and Hertwig (2010) show that people seem to rely on two distinct alternation processes while making binary choices, and these processes achieve different amounts of maximization behavior. These arguments are relevant not only to human decisions but also to decision making in animals. For example, Green et al. (1995) have shown that pigeons can learn to maximize their outcomes only by alternating between available alternatives in a probabilistic environment involving repeated choices between safe and risky alternatives.

Calibrating models to both process and outcome measures from one-time sequential-sampling tasks is already common in the literature (Ratcliff, 1978; Ratcliff & Smith, 2004). For example, Ratcliff (1978) calibrated models to both outcome and process measures in an old-new recognition memory task. In this task, the outcome measure was the proportion of correct responses and the process measure was the accumulation of evidence to a threshold for making a response. In fact, calibrating models to both outcome and process measures in one-time choice tasks is so common that a suite of software, the Diffusion Model Analysis Toolbox (DMAT; Vandekerckhove & Tuerlinckx, 2007), has been developed for this purpose. In contrast, to the authors' best knowledge, except for one study (mentioned below), no one has explicitly calibrated models to outcome and process measures simultaneously in dynamic decision-making tasks (Johnson, Schulte-Mecklenbeck & Willemsen, 2008). Johnson et al. (2008) demonstrated via computational modeling that the priority heuristic, which provides a novel account of how people make risky choices, captures the decision outcomes; yet, this heuristic fails to account for the process measures.

Corresponding author: Varun Dutt, School of Computing and Electrical Engineering and School of Humanities and Social Sciences, Indian Institute of Technology Mandi, PWD Rest House, Near Bus Stand, Mandi – 175 001, Himachal Pradesh, India. E-mail: varun@iitmandi.ac.in
The general finding is that although certain behavioral results reveal a strong connection between the decision outcome and the process, existing models of learning in dynamic decision tasks rarely show any relationship between them (Dember & Fowler, 1958; Erev & Barron, 2005; Erev, Ert, Roth, Haruvy et al., 2010; Rapoport & Budescu, 1992; Rapoport, Erev, Abraham & Olson, 1997; Tolman, 1925). For example, although the outcome results (i.e., maximization) in a symmetrical zero-sum matching-pennies game were consistent with predictions from a reinforcement-learning algorithm, the process results (i.e., alternations between alternatives) could not be accounted for by the algorithm (Erev & Barron, 2005; Rapoport & Budescu, 1992). Similarly, according to Johnson et al. (2008), the priority heuristic, a strategy proposed to account for risky choices, fails to account for process measures in dynamic decision tasks.

In one study, Gonzalez and Dutt (2011) calibrated cognitive models in the sampling paradigm (a dynamic task), in which participants sample options free of cost before making one consequential choice for real. Gonzalez and Dutt (2011) demonstrate that a computational model based upon IBLT (Gonzalez, Lerch & Lebiere, 2003) ("IBL model" hereafter), when calibrated on the outcome measure, was also able to explain the process measure better than the best known models in two different experimental paradigms. Gonzalez and Dutt (2011), however, did not also calibrate their model on the process measure. Thus, it remains unclear what effect calibrating a model to the process measure, compared to the outcome measure, has on the model's predictions of both measures. In general, one expects the decision outcome to be the result of the process (Johnson et al., 2008). Thus, calibrating models on process measures rather than outcome measures should have benefits in explaining both measures at the same time.

Although it is hard to find models calibrated to outcome and process measures in dynamic tasks, past studies have made certain qualitative predictions of dynamic decision models on outcome and process measures (Busemeyer, 1985; Hertwig, Barron, Weber & Erev, 2004; Lee, Zhang, Munro & Steyvers, 2011). However, a quantitative empirical investigation of these models on both measures is currently lacking and much needed in the literature. This paper contributes to this area by investigating the benefit of calibrating cognitive models to outcome and process data in a dynamic decision task.

In this paper, we evaluate the role of calibrating a computational model to either the decision outcome or the process in explaining and predicting both measures. Specifically, we calibrate an IBL model (Gonzalez & Dutt, 2011) to a risk-taking measure (decision outcome) or an alternation measure (process), and we evaluate the model's fits to human data (through parameter calibration in one dataset) and its predictions (through generalization to a dataset different from the calibration dataset). Given the hypothesized benefits of calibrating models on process measures (Camerer & Ho, 1999; Suppes & Atkinson, 1959), we expect that calibrating the IBL model to the alternation measure will improve its explanation of both risk-taking and alternations compared to calibrating it on the risk-taking measure.
We use two large human datasets, estimation and competition, that were collected for the 2008 Technion Prediction Tournament (TPT; Erev, Ert, Roth, Haruvy et al., 2010). We chose the TPT datasets because the main focus of the tournament was on outcome measures, and no attention was given to process measures (Erev, Ert, Roth, Haruvy et al., 2010). That is because it was felt that paying less attention to process measures can actually help the prediction of outcome measures (Erev & Haruvy, 2005; Estes, 1962), which is contrary to the hypothesis under test in this paper. Thus, this dataset is an ideal choice for testing a process-measure-calibrated model's ability to perform on the outcome measure. In what follows, we first discuss the role of the calibration process in computational models. Next, we present the effects of calibrating an existing IBL model on the outcome measure or the process measure on the explanations and predictions of one or both measures in the TPT's datasets. We close by discussing the role of model calibration in accounting for both the process and decision outcomes.

The Role of Model Calibration in Explaining Different Measures of Performance

Calibrating a model to human data means finding the values of its parameters that minimize the deviation between the model's predictions and observations on a dependent measure. In the TPT, several influential models of learning in binary choice (these included the two-stage sampler model, the normalized reinforcement learning with inertia model, and the explorative sampler with recency model; Erev, Ert, Roth, Haruvy et al., 2010) were calibrated and evaluated on only the outcome measure (risk-taking) and not on the process measure (alternations). These models were able to account for risk-taking very well; however, many of them did not provide any way of computing alternations (Gonzalez & Dutt, 2011). In fact, most of the competing models did not provide any way to explain the learning process (see an extended discussion of these models in Gonzalez and Dutt, 2011). For example, a number of models submitted to the TPT used prospect theory (Tversky & Kahneman, 1992) to predict choices based upon calibrated mathematical functions; prospect theory does not provide any mechanism that would predict the sequential selection of options over time. In fact, only a few recent models of repeated binary choice may account for both the risk-taking and alternation measures simultaneously: one is the Inertia Sampling and Weighting (I-SAW) model (Chen et al., 2011; Nevo & Erev, 2012; Erev, Ert, Roth, Haruvy et al., 2010) and the other is an IBL model (Gonzalez & Dutt, 2011; Gonzalez, Dutt & Lejarraga, 2011; Lejarraga, Dutt & Gonzalez, 2012). However, these models were calibrated on both the outcome and process measures at the same time, which makes it difficult to evaluate the utility of calibrating models to only one of these measures.

We expect that calibrating a model to the process measure should generally be beneficial for the model's ability to explain both the process and outcome measures upon generalization to novel conditions. Next, we provide details about the TPT datasets that we use to evaluate the IBL model.
Method

Risk-taking and Alternations in the Technion Prediction Tournament

Competing models submitted to the TPT were evaluated according to the generalization criterion method (Busemeyer & Wang, 2000), by which models were calibrated on choices made by participants in 60 problems (the estimation set) and later tested on a new set of 60 problems (the competition set) with the parameters obtained from the calibration process in the estimation set. The generalization criterion method was believed to be a true test of models' ability to explain observed choice decisions. Although the TPT involved three different experimental paradigms, we use only data from the "E-repeated" paradigm, which involved consequential choices in a repeated binary-choice task with immediate outcome feedback on the chosen alternative. For each of the 60 problems in the estimation and competition sets in this paradigm, a sample of 100 participants was randomly assigned to 5 groups of 20 participants each, and each group completed 12 of the 60 problems. Each participant was instructed to repeatedly and consequentially select between two unlabeled buttons on a computer screen in order to maximize long-term rewards over a block of 100 trials per problem (this end point was not known to participants). One button was associated with a risky alternative and the other button with a safe alternative. Selecting an alternative, safe or risky, generated an outcome for the selected alternative (the foregone outcome on the unselected alternative was not shown). Selecting the alternative with the higher expected value, which could be either the safe or the risky button, would maximize a participant's long-term rewards. Therefore, choosing the maximizing alternative across all repeated trials would constitute the optimal strategy in the task. Other details about the E-repeated paradigm are reported in Erev, Ert, Roth, Haruvy et al. (2010).

The models submitted to the TPT were not provided with human data on alternation between options (i.e., the A-rate, the process measure), and they were evaluated only according to their ability to account for risk-taking behavior (i.e., the R-rate, the outcome measure) (Erev, Ert, Roth, Haruvy et al., 2010). We calculated the A-rate for analyses of alternations from the TPT data (see results in Gonzalez and Dutt, 2011). An alternation is coded as 1 when the respondent switched from a risky choice in the last trial to a safe choice in the current trial (or vice versa), and as 0 when the respondent simply repeated the last trial's choice. The proportion of alternations in each trial is computed by averaging the alternations over the 20 participants per problem and the 60 problems in each dataset. The R-rate is the proportion of risky choices in each trial, averaged over the 20 participants per problem and the 60 problems in each dataset. A problem is defined as consisting of two alternatives, risky and safe. The risky alternative has two possible outcomes, high and low, whose occurrence is determined by corresponding probabilities. The safe alternative has one possible outcome, medium, which occurs with a 100% chance. For calculating the A-rate and R-rate, the averaging is done over 20 participants because that many participants were run per problem in the TPT (Erev, Ert, Roth, Haruvy et al., 2010).
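To make these two measures concrete, the following minimal Python sketch computes the per-trial R-rate and A-rate from a matrix of coded choices; the array shapes and names are our own illustration, not the TPT's actual analysis code.

```python
import numpy as np

def r_rate_and_a_rate(choices):
    """Per-trial R-rate and A-rate from coded choices.

    choices: array of shape (participants, trials), where 1 codes a
    risky choice and 0 a safe choice.
    """
    # R-rate: proportion of risky choices per trial across participants.
    r_rate = choices.mean(axis=0)
    # Alternations (trials 2..n): 1 when the current choice differs
    # from the previous trial's choice, 0 otherwise.
    alternations = (np.diff(choices, axis=1) != 0).astype(float)
    a_rate = alternations.mean(axis=0)
    return r_rate, a_rate

# Example: 20 simulated participants and 100 trials for one problem.
rng = np.random.default_rng(0)
choices = rng.integers(0, 2, size=(20, 100))
r_rate, a_rate = r_rate_and_a_rate(choices)  # a_rate spans trials 2-100
```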
Figure 1 shows the overall R-rate and A-rate over the 99 trials from trial 2 to trial 100 in the estimation and competition sets. In both datasets, the R-rate is relatively constant across trials, in contrast to the sharp decrease in the A-rate. The sharp decrease in the A-rate shows a transition in the pattern of information search across trials (Gonzalez & Dutt, 2011). Overall, these R-rate and A-rate curves suggest that risk-taking remains relatively steady across trials, while participants learn to alternate less and choose one of the two alternatives more often. Thus, the A-rate (process) is more dynamic than the R-rate (decision outcome), and due to these differences it is likely to be harder for a model to account for the A-rate than for the R-rate. We use the R-rate and A-rate curves in Figure 1 to evaluate the role of model calibration in the remainder of this paper.

Figure 1. (A) The R-rate and A-rate across trials observed in human data in the estimation set of the TPT between trial 2 and trial 100. (B) The R-rate and A-rate across trials observed in human data in the competition set of the TPT between trial 2 and trial 100.

An Instance-Based Learning Model of Repeated Binary Choice

IBLT (Gonzalez et al., 2003) has been used as the basis for developing computational models that capture human behavior in a wide variety of dynamic decision-making tasks. These include dynamically complex tasks like the water purification plant task (Gonzalez & Lebiere, 2005; Gonzalez et al., 2003; Martin, Gonzalez & Lebiere, 2004), training paradigms of simple and complex tasks (Gonzalez, Best, Healy, Bourne & Kole, 2010), simple stimulus-response practice and skill-acquisition tasks (Dutt, Yamaguchi, Gonzalez & Proctor, 2009), and repeated binary-choice tasks (Gonzalez & Dutt, 2011; Gonzalez et al., 2011; Lebiere, Gonzalez & Martin, 2007; Lejarraga et al., 2012), among others. The different computational applications of IBLT illustrate its generality and its ability to capture decisions from experience in multiple contexts.

A recent IBL model has showcased the theory's robustness across multiple choice tasks: a probability-learning task, a repeated binary-choice task with fixed probabilities, and a repeated binary-choice task with changing probabilities (Lejarraga et al., 2012). We use this model to evaluate the effects of model calibration to different outcome or process measures. The model's formulation and decision-making process are explained in detail in other publications (Gonzalez & Dutt, 2011; Lejarraga et al., 2012) and summarized in the Appendix. The model makes choice selections between alternatives in a trial by comparing the weighted averages of observed outcomes on each alternative, called "blended values." The blended value of an alternative, safe or risky, is a function of the probability of retrieving instances from memory multiplied by their respective outcomes observed on previous selections of that alternative (Lebiere, 1999; Lejarraga et al., 2012). Each instance consists of a label that identifies a decision alternative in the task and the outcome obtained. For example, (risky, $32) is an instance where the decision was to choose the risky alternative and the outcome obtained was $32.
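As an illustration of this representation, here is a minimal sketch of instances and the blending step, assuming the retrieval probabilities are already available (they are derived from activations, as described next); the structure and names are our own, not the model's published code.

```python
from collections import namedtuple

# An instance: the alternative chosen and the outcome observed.
Instance = namedtuple("Instance", ["alternative", "outcome"])

def blended_value(instances, retrieval_probs):
    """Blended value of one alternative: the sum of its instances'
    outcomes weighted by their probabilities of retrieval."""
    return sum(p * inst.outcome
               for inst, p in zip(instances, retrieval_probs))

# Two instances observed on the risky alternative; their retrieval
# probabilities (which sum to 1 within an alternative) are assumed
# to have been computed from activations already.
risky_instances = [Instance("risky", 32.0), Instance("risky", 0.0)]
retrieval_probs = [0.3, 0.7]
v_risky = blended_value(risky_instances, retrieval_probs)  # 9.6
```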
The probability of retrieving an instance from memory, which is used to compute the blended value, is a function of the instance's activation (Anderson & Lebiere, 1998). Each observed outcome (represented by a corresponding instance in memory) has an activation value that is a function of the recency and frequency of observing that outcome, plus a noise term. This simplified activation equation has been shown to be sufficient for explaining human choices in several experiential tasks (Gonzalez & Dutt, 2011; Lejarraga et al., 2012). The activation is influenced by the decay parameter d, which captures the rate of forgetting, or the reliance on the recency and frequency of observed outcomes: the higher the value of this parameter, the greater the model's reliance on recently experienced outcomes. The activation is also influenced by a noise parameter s that is important for capturing the variability in human behavior from one participant to another. The IBL model borrows the d and s parameters and the activation equation from a popular cognitive framework called ACT-R (Atomic Components of Thought – Rational; Anderson & Lebiere, 1998). However, unlike in ACT-R, where the d and s parameters are kept fixed, we calibrate the values of these parameters in the IBL model to account for choices in human data. The model equations for blending and activation are included in the Appendix.

Results

Model Calibration to Different Measures

We used a genetic algorithm to calibrate the model's parameters, minimizing the mean squared deviation (MSD) between the model's predictions and the observed average A-rate per problem or average R-rate per problem. The average R-rate per problem and the average A-rate per problem were computed by averaging the risky choices and the alternations in each problem over 20 participants per problem and 100 trials per problem (for a problem's definition, please see the description above). The MSDs were then calculated across the 60 estimation-set problems by using the average R-rate per problem or the average A-rate per problem from the model and the human data. For calibration, both the s and d parameters were varied between 0.0 and 10.0 and the genetic algorithm was run for 500 generations (crossover rate = 50%; mutation rate = 10%). The assumed range of variation for the s and d parameters and the number of generations in the genetic algorithm are large, ensuring that the optimization process does not miss the minimum MSD value due to a small range of parameter variation (for more details about genetic-algorithm optimization, please see Gonzalez & Dutt, 2011). We calibrated the IBL model separately on the R-rate and the A-rate measures, and the optimized values of the d and s parameters were determined for each calibration. The model calibrated on the R-rate produced the smallest MSD for d = 5.00 and s = 1.50. These parameters have the same optimal values as reported by Lejarraga et al. (2012), who had also calibrated this IBL model on the R-rate measure on the same dataset. As documented by Lejarraga et al. (2012), the values of both the d and s parameters are high compared to the ACT-R default values of d = 0.5 and s = 0.25 (Anderson & Lebiere, 1998). Furthermore, the model calibrated on the A-rate produced the smallest MSD for d = 9.74 and s = 0.96. Thus, calibrating the model on the A-rate produces a greater value for the d parameter and a slightly smaller value for the s parameter. The greater d value suggests a stronger dependency on recently experienced outcomes when making choice decisions.
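The calibration procedure just described can be sketched as follows. This is a minimal, self-contained genetic-algorithm example in Python; `run_ibl_model(d, s)` is a hypothetical simulation call (not code from the paper) that returns the model's average per-problem measure as an array, and `human_measure` holds the 60 observed per-problem values. The paper's optimizer settings (500 generations, 50% crossover, 10% mutation) appear as constants.

```python
import numpy as np

rng = np.random.default_rng(42)
GENERATIONS, POP_SIZE = 500, 50
CROSSOVER_RATE, MUTATION_RATE = 0.5, 0.1

def msd(params, human_measure):
    """Mean squared deviation between model and human per-problem values."""
    d, s = params
    model_measure = run_ibl_model(d, s)  # hypothetical simulation call
    return np.mean((model_measure - human_measure) ** 2)

def calibrate(human_measure):
    """Evolve (d, s) pairs, bounded in [0, 10], to minimize the MSD."""
    pop = rng.uniform(0.0, 10.0, size=(POP_SIZE, 2))
    for _ in range(GENERATIONS):
        fitness = np.array([msd(p, human_measure) for p in pop])
        parents = pop[np.argsort(fitness)[:POP_SIZE // 2]]  # keep best half
        children = parents.copy()
        # Crossover: swap the d gene between paired parents.
        for i in range(0, len(children) - 1, 2):
            if rng.random() < CROSSOVER_RATE:
                children[[i, i + 1], 0] = children[[i + 1, i], 0]
        # Mutation: re-draw a gene uniformly within its bounds.
        mask = rng.random(children.shape) < MUTATION_RATE
        children[mask] = rng.uniform(0.0, 10.0, size=int(mask.sum()))
        pop = np.vstack([parents, children])
    return pop[np.argmin([msd(p, human_measure) for p in pop])]
```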
Figure 2 shows the MSDs for the R-rate and the A-rate from the IBL model that was calibrated on the R-rate or the A-rate in the estimation set. When the model's parameters were calibrated on the R-rate (i.e., d = 5.0 and s = 1.5), the model explained the R-rate quite well (MSD = 0.008), but it explained the A-rate less well (MSD = 0.063). Thus, the model explains the outcome measure well when calibrated on the outcome measure, but it explains the process measure less well. In contrast, when the IBL model's parameters were calibrated on the A-rate, the model explained the A-rate much better (MSD = 0.002) and the resulting R-rate also relatively well (MSD = 0.023). Thus, the benefit of calibrating the model on the A-rate (a reduction of 0.063 − 0.002 = 0.061 in the A-rate MSD) is larger than the detriment to the R-rate (an increase of 0.023 − 0.008 = 0.015 in the R-rate MSD). Overall, these results show that by calibrating the IBL model to the process measure, one is able to explain both the process and outcome measures better than by calibrating the model to the outcome measure. These results suggest that the components of the IBL model are good representations of the A-rate process as well as of the R-rate decision outcomes, especially because accounting for the A-rate is more challenging than accounting for the R-rate, the A-rate being the more dynamic measure (Gonzalez & Dutt, 2011).

Figure 2. The MSD for the R-rate per problem and the A-rate per problem in the estimation set of the TPT. The model was calibrated either on the R-rate per problem or on the A-rate per problem in the estimation set. The calibrated values of the d and s parameters obtained for each measure (R-rate or A-rate per problem) are shown in brackets. The differences due to calibrating on the A-rate measure (respectively, the R-rate measure) are shown by two vertical arrows.

Figure 3 presents the human and model R-rate and A-rate across trials when the model was calibrated to the R-rate (Figure 3A) and when it was calibrated to the A-rate (Figure 3B). Here, it can be observed that the model explains the human learning data best for the measure used to calibrate the model.

Generalizing the Calibrated IBL Model to the Competition Set

The demonstration that calibrating a model to a process measure helps explain both the process and outcome measures is an important way to corroborate the consistency of predictions from cognitive models. A robust model should be able to explain the learning process as well as the outcomes resulting from that very process. According to Lebiere, Gonzalez, and Warwick (2009), models that explain only the outcome and not the process behavior might find it difficult to generalize their predictions to novel conditions. Here, we used the generalization criterion test (Ahn, Busemeyer, Wagenmakers, & Stout, 2009; Busemeyer & Wang, 2000) to investigate the predictions that the different calibration procedures make in novel datasets: we ran the calibrated models in novel conditions to evaluate and compare their performance. The model calibrated to the TPT's estimation set on the R-rate or the A-rate was generalized to the TPT's competition set by keeping the same parameter values that were derived during calibration.
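In code, the generalization criterion amounts to freezing the calibrated parameters and recomputing the deviations on the unseen problem set. A minimal sketch, assuming a hypothetical `run_ibl_model(d, s, problems)` simulation call (a variant of the one in the calibration sketch above) that returns the model's per-problem R-rates and A-rates:

```python
import numpy as np

def generalize(d, s, problems, human_r, human_a):
    """Evaluate frozen (d, s) parameters on a new problem set."""
    model_r, model_a = run_ibl_model(d, s, problems)  # hypothetical call
    msd_r = float(np.mean((np.asarray(model_r) - human_r) ** 2))
    msd_a = float(np.mean((np.asarray(model_a) - human_a) ** 2))
    return msd_r, msd_a

# Parameters come from the estimation-set calibration; the problems and
# the human data come from the competition set. No re-fitting occurs.
msd_r, msd_a = generalize(9.74, 0.96, competition_problems,
                          human_r_competition, human_a_competition)
```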
The model was run with 20 simulated participants per problem and 60 problems in the competition set. Different sets of problems were used in the estimation and competition sets, and these problems were run as part of two separate experiments involving different human participants. Given these differences, one expects poorer performance from both models in the competition set than in the estimation set. However, because the algorithm used to generate problems in the competition set was the same as that used to generate problems in the estimation set, one also expects both models to show results similar to those found for the estimation set: the model calibrated to the process measure should explain both the process and outcome measures better than the model calibrated to the outcome measure.

Figure 4 shows the resulting MSDs from generalizing the IBL model to the competition set. The model that was calibrated on the estimation set's R-rate produced the best predictions for the same measure in the competition set (MSD = 0.006); however, its predictions for the A-rate were relatively inferior (MSD = 0.074). Furthermore, the model that was calibrated on the A-rate produced the best predictions for the same measure in the competition set (MSD = 0.006), with reasonably good predictions for the R-rate (MSD = 0.032). Thus, again, the improvement in the MSD for the A-rate (= 0.068) is larger than the decrement in the MSD for the R-rate (= 0.026). Also note that the results in the competition set (Figure 4) show poorer performance (higher MSDs) from the models, in general, than those in the estimation set (Figure 2).

As in the estimation set, these results translate to the process of learning over trials (see Figure 5). The model's predictions are best for the measure on which it was calibrated in the estimation set. The model that was calibrated on the R-rate in the estimation set predicted the R-rate better than the A-rate (Figure 5A); however, the model that was calibrated on the A-rate in the estimation set predicted both the R-rate and the A-rate over time quite well (Figure 5B).

Figure 3. The R-rate and A-rate across trials predicted by the IBL model and observed in human data in the TPT's estimation set. Panels A and B show the results of calibrating the IBL model to the R-rate per problem and the A-rate per problem, respectively.

Figure 4. The MSD for the R-rate per problem and the A-rate per problem in the competition set of the TPT. The model was calibrated either on the R-rate per problem or on the A-rate per problem in the estimation set. The calibrated values of the d and s parameters obtained for each measure (R-rate or A-rate per problem) in the estimation set are shown in brackets. The differences due to calibrating on the A-rate measure (respectively, the R-rate measure) are shown by two vertical arrows.

Discussion

We argue that strong and robust models of human behavior need to explain both the decision outcome and the process from which that outcome came about. We suggest that many models of human behavior, particularly in the context of repeated choice and dynamic decisions from experience, have focused only on predicting outcomes and not the process.
Furthermore, most of the existing computational models of experiential decisions explain the decision outcomes while completely ignoring, or failing to account for, the process through which these decision outcomes are reached (see a review of models in Gonzalez & Dutt, 2011). This observation is perhaps not a coincidence, because predicting the outcome as a result of the process is very challenging (Erev & Barron, 2005; Rapoport et al., 1997). Our findings presented the robustness of explaining and predicting outcome and process measures through an IBL model. We demonstrated a method for assessing a cognitive model's ability to explain both the process and the decision outcomes. The model's calibration on the process measure reduced the MSD for the A-rate (process) by a large amount without a large deterioration in the MSD for the R-rate (decision outcome). The proposed calibration was also helpful in accounting for both measures after the model was generalized to a novel condition.

Figure 5. The generalization of the IBL model in the TPT's competition set. (A) The model's parameters were calibrated on the R-rate per problem measure in the TPT's estimation set. (B) The model's parameters were calibrated on the A-rate per problem measure in the TPT's estimation set.

Explaining both the process and decision outcomes is important because doing so will improve our understanding of how people maximize long-term goals through the process of sequential choices from experience. Several recent model-comparison competitions have suggested the use of different dependent measures for calibrating models without a clear motivation for choosing one measure over the other. For example, the measure of model evaluation in the TPT was solely risk-taking, i.e., decision outcomes (Erev & Barron, 2005); however, the measure of evaluation in the recently concluded market-entry competition (Erev, Ert, & Roth, 2010) was a combination of risk-taking (outcome) and alternations (process). Our analysis suggests that stronger and more robust models of learning should be able to explain both the decision outcomes and the process by which these outcomes came about. Future model-comparison efforts should enforce both types of measures.

In this paper, we used one IBL model to showcase the benefits of calibrating models on a process measure rather than an outcome measure. This attempt may be limited at present, as we used only one model, the IBL model, on two datasets. However, it does showcase the wider generalizability of the theory, IBLT, which has been used in the literature to derive a number of models for a number of decision tasks (see Gonzalez, in press; Gonzalez, 2013, for more arguments).

As part of our future research, we would like to build on the current findings by calibrating and evaluating models on both the outcome and process measures in various tasks that differ in their outcome feedback and dynamics. Also, as part of future research, we would like to consider the mutual benefits of calibrating models to both process and decision outcomes, especially when there are more than two measures. It would be interesting to observe the extent to which the benefits of calibrating models to different kinds of process measures carry over to different kinds of decision outcomes. When there are more than two measures, one could combine multiple process and outcome measures through a weighted sum of the mean squared deviations calculated on these measures, keeping the weights at values such that all combined measures count equally during optimization.
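A minimal sketch of such a combined objective in Python; the function and variable names are our illustration rather than code from the paper:

```python
import numpy as np

def combined_msd(model_measures, human_measures, weights=None):
    """Weighted sum of per-measure mean squared deviations.

    model_measures / human_measures: dicts mapping a measure name
    (e.g., "r_rate", "a_rate") to per-problem arrays of values.
    weights: optional dict of weights; defaults to equal weighting.
    """
    names = list(human_measures)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    return sum(weights[name] *
               np.mean((np.asarray(model_measures[name]) -
                        np.asarray(human_measures[name])) ** 2)
               for name in names)

# Example with two measures; the scheme extends naturally to more.
human = {"r_rate": np.array([0.40, 0.50]), "a_rate": np.array([0.20, 0.10])}
model = {"r_rate": np.array([0.45, 0.48]), "a_rate": np.array([0.25, 0.12])}
objective = combined_msd(model, human)  # quantity minimized in calibration
```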
Furthermore, it would be interesting to observe how calibrating models to process measures carries over to outcome measures when the calibration is done at the individual level rather than at the aggregate level. These evaluations would help extend our existing knowledge on this topic and help us explore the benefits and limitations of computational models in explaining both the decision outcomes and the process through which these outcomes are reached.

Acknowledgements: This research is partially supported by the following funding sources: Defense Threat Reduction Agency (DTRA) grant number HDTRA1-09-1-0053 to Dr. Cleotilde Gonzalez, and Department of Science and Technology (DST) grant number SR/CSRI/28/2013(G) to Dr. Varun Dutt. We would also like to thank Dr. Ido Erev of the Technion – Israel Institute of Technology for making the data from the Technion Prediction Tournament available.

Declaration of conflicting interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Author contributions: The authors contributed equally to this work.

Supplementary material: Supplementary material available online.

Handling editor: Andreas Fischer

Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Citation: Dutt, V., & Gonzalez, C. (2015). Accounting for outcome and process measures in dynamic decision-making tasks through model calibration. Journal of Dynamic Decision Making, 1, 2. doi:10.11588/jddm.2015.1.17663

Received: 15 December 2014
Accepted: 13 July 2015
Published: 29 September 2015

References

Ahn, W. Y., Busemeyer, J. R., Wagenmakers, E. J., & Stout, J. C. (2009). Comparison of decision learning models using the generalization criterion method. Cognitive Science, 32, 1376-1402. doi:10.1080/03640210802352992

Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought. Mahwah, NJ: Erlbaum.

Biele, G., Erev, I., & Ert, E. (2009). Learning, risk attitude and hot stoves in restless bandit problems. Journal of Mathematical Psychology, 53(3), 155-167. doi:10.1016/j.jmp.2008.05.006

Busemeyer, J. R. (1985). Decision making under uncertainty: A comparison of simple scalability, fixed sample, and sequential sampling models. Journal of Experimental Psychology, 11, 538-564. doi:10.1037/0278-7393.11.3.538

Busemeyer, J. R., & Diederich, A. (2009). Cognitive modeling. New York, NY: Sage Publications.

Busemeyer, J. R., & Wang, Y. M. (2000). Model comparison and model selections based on generalization criterion methodology. Journal of Mathematical Psychology, 44(1), 171-189. doi:10.1006/jmps.1999.1282

Camerer, C., & Ho, T. H. (1999). Experience-weighted attraction learning in normal form games. Econometrica, 67(4), 827-874. Retrieved from http://www.jstor.org/stable/2999459

Chen, W., Liu, S. Y., Chen, C. H., & Lee, Y. S. (2011). Bounded memory, inertia, sampling and weighting model for market entry games. Games, 2, 187-199. doi:10.3390/g2010187
Dember, W. N., & Fowler, F. (1958). Spontaneous alternation behavior. Psychological Bulletin, 55, 412-428. doi:10.1037/h0045446

Dutt, V., Yamaguchi, M., Gonzalez, C., & Proctor, R. W. (2009). An Instance-Based Learning model of stimulus-response compatibility effects in mixed location-relevant and location-irrelevant tasks. In A. Howes, D. Peebles, & R. Cooper (Eds.), 9th International Conference on Cognitive Modeling – ICCM2009. Manchester, UK. Retrieved from http://act-r.psy.cmu.edu/wordpress/wp-content/uploads/2012/12/863paper115.pdf

Erev, I., & Barron, G. (2005). On adaptation, maximization and reinforcement learning among cognitive strategies. Psychological Review, 112(4), 912-931. doi:10.1037/0033-295X.112.4.912

Erev, I., Ert, E., & Roth, A. E. (2010). A choice prediction competition for market entry games: An introduction. Games, 1(2), 117-136. doi:10.3390/g1020117

Erev, I., Ert, E., Roth, A. E., Haruvy, E., Herzog, S. M., Hau, R., Hertwig, R., Stewart, T., West, R., & Lebiere, C. (2010). A choice prediction competition: Choices from experience and from description. Journal of Behavioral Decision Making, 23(1), 15-47. doi:10.1002/bdm.683

Erev, I., & Haruvy, E. (2005). Generality, repetition, and the role of descriptive learning models. Journal of Mathematical Psychology, 49(5), 357-371. doi:10.1016/j.jmp.2005.06.009

Estes, W. K. (1962). Learning theory. Annual Review of Psychology, 13, 107-144. doi:10.1146/annurev.ps.13.020162.000543

Gonzalez, C. (in press). Decision making: A cognitive science perspective. In S. Chipman (Ed.), The Oxford handbook of cognitive science (Chapter 6). New York, NY: Oxford University Press.

Gonzalez, C. (2013). The boundaries of Instance-Based Learning Theory for explaining decisions from experience. In Pammi & Srinivasan (Eds.), Decision making: Neural and behavioural approaches (Progress in Brain Research, Vol. 202, pp. 73-98). New York, NY: Elsevier.

Gonzalez, C., Best, B. J., Healy, A. F., Bourne, L. E., Jr., & Kole, J. A. (2010). A cognitive modeling account of simultaneous learning and fatigue effects. Journal of Cognitive Systems Research, 12(1), 19-32. doi:10.1016/j.cogsys.2010.06.004

Gonzalez, C., & Dutt, V. (2011). Instance-based learning: Integrating sampling and repeated decisions from experience. Psychological Review, 118, 523-551. doi:10.1037/a0024558

Gonzalez, C., Dutt, V., & Lejarraga, T. (2011). A loser can be a winner: Comparison of two instance-based learning models in a market entry competition. Games, 2(1), 136-162. doi:10.3390/g2010136

Gonzalez, C., & Lebiere, C. (2005). Instance-based cognitive models of decision making. In D. Zizzo & A. Courakis (Eds.), Transfer of knowledge in economic decision-making (pp. 148-165). New York, NY: Palgrave Macmillan.

Gonzalez, C., Lerch, F. J., & Lebiere, C. (2003). Instance-based learning in real-time dynamic decision making. Cognitive Science, 27(4), 591-635. doi:10.1016/S0364-0213(03)00031-4

Green, L., Price, P. C., & Hamburger, M. E. (1995). Prisoner's dilemma and the pigeon: Control by immediate consequences. Journal of the Experimental Analysis of Behavior, 64, 1-17. doi:10.1901/jeab.1995.64-1

Hertwig, R., Barron, G., Weber, E. U., & Erev, I. (2004). Decisions from experience and the effect of rare events in risky choice. Psychological Science, 15, 534-539. doi:10.1111/j.0956-7976.2004.00715.x

Hills, T. T., & Hertwig, R. (2010). Information search in decisions from experience: Do our patterns of sampling foreshadow our decisions? Psychological Science, 21(12), 1787-1792. doi:10.1177/0956797610387443
Johnson, E. J., Schulte-Mecklenbeck, M., & Willemsen, M. (2008). Process models deserve process data: Comment on Brandstätter, Gigerenzer, & Hertwig (2006). Psychological Review, 115(1), 263-272. doi:10.1037/0033-295X.115.1.263

Lebiere, C. (1999). Blending: An ACT-R mechanism for aggregate retrievals. Paper presented at the Sixth Annual ACT-R Workshop at George Mason University. Retrieved from http://act-r.psy.cmu.edu/wordpress/wp-content/themes/ACT-R/workshops/1999/talks/blending.pdf

Lebiere, C., Gonzalez, C., & Martin, M. (2007). Instance-based decision making model of repeated binary choice. In Proceedings of the 8th International Conference on Cognitive Modeling (pp. 67-72). Oxford, UK: Psychology Press. Retrieved from http://repository.cmu.edu/cgi/viewcontent.cgi?article=1083&context=sds

Lebiere, C., Gonzalez, C., & Warwick, W. (2009). A comparative approach to understanding general intelligence: Predicting cognitive performance in an open-ended dynamic task. In B. Goertzel, P. Hitzler, & M. Hutter (Eds.), Proceedings of the Second Conference on Artificial General Intelligence (pp. 103-107). Amsterdam-Paris: Atlantis Press. doi:10.2991/agi.2009.2

Lee, M. D., Zhang, S., Munro, M., & Steyvers, M. (2011). Psychological models of human and optimal performance in bandit problems. Cognitive Systems Research, 12, 164-174. doi:10.1016/j.cogsys.2010.07.007

Lejarraga, T., Dutt, V., & Gonzalez, C. (2012). Instance-based learning: A general model of repeated binary choice. Journal of Behavioral Decision Making, 25(2), 143-153. doi:10.1002/bdm.722

Martin, M. K., Gonzalez, C., & Lebiere, C. (2004). Learning to make decisions in dynamic environments: ACT-R plays the beer game. In Proceedings of the Sixth International Conference on Cognitive Modeling (pp. 178-183). Mahwah, NJ: Erlbaum. Retrieved from http://repository.cmu.edu/cgi/viewcontent.cgi?article=1087&context=sds

Nevo, I., & Erev, I. (2012). On surprise, change, and the effect of recent outcomes. Frontiers in Psychology, 3, 1-9. doi:10.3389/fpsyg.2012.00024

Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59-108. doi:10.1037/0033-295X.85.2.59

Ratcliff, R., & Smith, P. (2004). A comparison of sequential sampling models for two-choice reaction time. Psychological Review, 111, 333-367. doi:10.1037/0033-295X.111.2.333
Rapoport, A., & Budescu, D. V. (1992). Generation of random series in two-person strictly competitive games. Journal of Experimental Psychology: General, 121, 352-363. doi:10.1037/0096-3445.121.3.352

Rapoport, A., Erev, I., Abraham, E. V., & Olson, D. E. (1997). Randomization and adaptive learning in a simplified poker game. Organizational Behavior and Human Decision Processes, 69(1), 31-49. doi:10.1006/obhd.1996.2670

Scheres, A., & Sanfey, A. G. (2006). Individual differences in decision-making: Drive and reward responsiveness affects strategic bargaining in economic games. Behavioral and Brain Functions, 2, 35. doi:10.1186/1744-9081-2-35

Suppes, P., & Atkinson, R. C. (1959). Markov learning models for multiperson situations, I: The theory (Technical Report 21, prepared under Contract Nonr 255(17), NR 171-034). Retrieved from http://suppes-corpus.stanford.edu/techreports/IMSSS_21.pdf

Tolman, E. C. (1925). Purpose and cognition: The determiners of animal learning. Psychological Review, 32, 285-297. doi:10.1037/h0072784

Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4), 297-323. doi:10.1007/BF00122574

Vandekerckhove, J., & Tuerlinckx, F. (2007). Fitting the Ratcliff diffusion model to experimental data. Psychonomic Bulletin & Review, 14, 1011-1026. doi:10.3758/PBR.15.6.1229

Appendix

Decision Rule

A choice in the model in trial t + 1 is the selection of the alternative with the highest blended value, as given by Equation 1 below.

Blending and Activation Mechanisms

The blended value of alternative j is defined as

V_j = \sum_{i=1}^{n} p_i x_i \quad (1)

where x_i is the value of the observed outcome in the outcome slot of an instance i corresponding to alternative j, and p_i is the probability of that instance's retrieval from memory (for the binary-choice task in the experience condition, j in Equation 1 is either risky or safe). The blended value of an alternative is thus the sum of all observed outcomes x_i in the outcome slots of the corresponding instances, weighted by the instances' probabilities of retrieval.

Probability of Retrieving Instances

In any trial t, the probability of retrieving instance i from memory is a function of that instance's activation relative to the activations of all other instances corresponding to that alternative, given by

P_{i,t} = \frac{e^{A_{i,t}/\pi}}{\sum_{j} e^{A_{j,t}/\pi}} \quad (2)

where \pi is random noise defined as s \times \sqrt{2}, s is a free noise parameter, and the sum in the denominator runs over all instances j corresponding to that alternative. The noise parameter s captures the imprecision of retrieving instances from memory.

Activation of Instances

The activation of each instance in memory depends upon the activation mechanism originally proposed in ACT-R (Anderson & Lebiere, 1998).
According to this mechanism, for each trial t, the activation A_{i,t} of instance i is

A_{i,t} = \ln\left( \sum_{t_i \in \{1, \dots, t-1\}} (t - t_i)^{-d} \right) + s \times \ln\left( \frac{1 - y_{i,t}}{y_{i,t}} \right) \quad (3)

where d is a free decay parameter, and t_i is a previous trial on which instance i was created or its activation was reinforced due to an outcome observed in the task (instance i is the one that has the observed outcome as the value in its outcome slot). The summation includes one term for each time the outcome has been observed in previous trials and the corresponding instance i's activation has been reinforced in memory (by encoding a timestamp of the trial t_i). Therefore, the activation of an instance corresponding to an observed outcome increases with the frequency of observation and with the recency of those observations. The decay parameter d affects the activation of an instance directly, as it captures the rate of forgetting, or the reliance on recency.

Noise in Activation

The y_{i,t} term is a random draw from a uniform distribution U(0, 1), and the s \times \ln\left( \frac{1 - y_{i,t}}{y_{i,t}} \right) term represents Gaussian noise important for capturing the variability of human behavior.

Pre-populated Instances in Memory

On the first trial, the IBL model does not have any instances in memory from which to calculate blended values. Therefore, the model is made to make a selection between instances that are pre-populated in memory. Lejarraga, Dutt, and Gonzalez (2012) used a value of +30 in the outcome slot of both alternatives' instances. The +30 value is arbitrary but, most importantly, greater than any possible outcome in the TPT problems, which triggers an initial exploration of the two alternatives. We use these pre-populated values in the model in this paper.
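For readers who wish to experiment with these mechanisms, the following compact Python sketch implements the appendix equations: activation with noise (Equation 3), retrieval probability and blending (Equations 1 and 2), pre-populated +30 instances, and the highest-blended-value decision rule. It is our own illustrative implementation of the published equations, not the authors' code; bookkeeping is simplified by storing, per alternative, each distinct outcome with the list of trials on which it was observed.

```python
import math
import random

class IBLModel:
    """Minimal sketch of the appendix's IBL equations (our illustration)."""

    def __init__(self, d=5.0, s=1.5, prepopulated=30.0):
        self.d, self.s = d, s
        # memory[alternative][outcome] -> trials at which the instance was
        # created or reinforced; trial 0 holds the pre-populated +30 value.
        self.memory = {"safe": {prepopulated: [0]},
                       "risky": {prepopulated: [0]}}

    def activation(self, trials, t):
        """Equation 3: base-level learning term plus logistic noise."""
        base = math.log(sum((t - ti) ** -self.d for ti in trials))
        y = random.random()
        return base + self.s * math.log((1 - y) / y)

    def blended_value(self, alternative, t):
        """Equations 1 and 2: retrieval probabilities over one
        alternative's instances, then a probability-weighted outcome."""
        pi = self.s * math.sqrt(2)  # the noise term of Equation 2
        acts = {outcome: self.activation(trials, t)
                for outcome, trials in self.memory[alternative].items()}
        denom = sum(math.exp(a / pi) for a in acts.values())
        return sum(outcome * math.exp(a / pi) / denom
                   for outcome, a in acts.items())

    def choose(self, t):
        """Decision rule: the alternative with the higher blended value."""
        return max(("safe", "risky"),
                   key=lambda alt: self.blended_value(alt, t))

    def observe(self, alternative, outcome, t):
        """Create or reinforce the instance for the observed outcome."""
        self.memory[alternative].setdefault(outcome, []).append(t)

# Example problem: risky pays 32 with p = .1 (else 0); safe pays 3.
random.seed(1)
model = IBLModel()
for t in range(1, 101):
    choice = model.choose(t)
    outcome = ((32.0 if random.random() < 0.1 else 0.0)
               if choice == "risky" else 3.0)
    model.observe(choice, outcome, t)
```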