Original Research

Modeling decisions from experience: How models with a set of parameters for aggregate choices explain individual choices

Neha Sharma and Varun Dutt
Applied Cognitive Science Laboratory, Indian Institute of Technology Mandi, Kamand, India - 175005

Corresponding author: Varun Dutt, Applied Cognitive Science Laboratory, Indian Institute of Technology Mandi, Kamand, District Mandi - 175 005, H.P., India. E-mail: varun@iitmandi.ac.in

One of the paradigms (called the "sampling paradigm") in judgment and decision-making involves decision-makers sampling information before making a final consequential choice. In the sampling paradigm, certain computational models have been proposed in which a set of single or distribution parameters is calibrated to the choice proportions of a group of participants (aggregate and hierarchical models). However, currently little is known about how aggregate and hierarchical models would account for choices made by individual participants in the sampling paradigm. In this paper, we test the ability of aggregate and hierarchical models to explain choices made by individual participants. Several models, Ensemble, Cumulative Prospect Theory (CPT), Best Estimation and Simulation Techniques (BEAST), Natural-Mean Heuristic (NMH), and Instance-Based Learning (IBL), had their parameters calibrated to individual choices in a large dataset involving the sampling paradigm. Later, these models were generalized to two large datasets in the sampling paradigm. Results revealed that the aggregate models (like CPT and IBL) accounted for individual choices better than the hierarchical models (like Ensemble and BEAST) upon generalization to problems that were like those encountered during calibration. Furthermore, the CPT model, which relies on differential valuing of gains and losses, performed better than other models during calibration and generalization on datasets with a similar set of problems. The IBL model, relying on recency and frequency of sampled information, and the NMH model, relying on frequency of sampled information, performed better than other models during generalization to a challenging dataset. Sequential analyses of results from different models showed how these models accounted for transitions from the last sample to the final choice in human data. We highlight the implications of using aggregate and hierarchical models in explaining individual choices from experience.

Keywords: Aggregate choice, individual choice, sampling paradigm, decisions from experience, computational models, likelihood

With the advent of the Internet, online shopping for products has gained popularity (Stevens, 2016). To make satisfying online purchases, a consumer could first sample information about different products and then make a choice for the preferred item (Horrace et al., 2009). However, the act of making choices based upon sampled information is not limited to choosing between different products; rather, it is a very common exercise involving different facets of our daily lives (e.g., choosing food items, life partners, and careers). In fact, information search before a choice constitutes an integral part of Decisions from Experience (DFE) research, where the focus is on explaining human decisions based upon one's experience with sampled information (Hertwig & Erev, 2009).

To study people's information search and consequential choice behaviors in the laboratory, researchers have proposed the "sampling paradigm" (Hertwig & Erev, 2009). In the sampling paradigm, people are presented with two or more options to choose between. These options are represented as blank buttons on a computer screen.
People are first asked to sample as many outcomes as they wish from the different button options (information search). Once people are satisfied with their sampling of options, they decide from which option to make a single consequential choice for actual rewards.

Several computational cognitive models have been proposed in the sampling paradigm, where these models help explain how people search for information and make consequential choices (Erev et al., 2010; Gonzalez & Dutt, 2011). Some of these models have a set of parameter values calibrated to each individual participant (called "individual models"; Busemeyer & Diederich, 2010; Kudryavtsev & Pavlodsky, 2012; Frey, Mata, & Hertwig, 2015). The parameter-calibration exercise in these models results in a set of parameter values per individual participant, where the number of parameter sets from a model equals the number of participants in the data. For example, Kudryavtsev and Pavlodsky (2012) tested three variations of two models, Prospect Theory (PT) (Kahneman & Tversky, 1979) and Expectancy-Valence (EVL) (Busemeyer & Stout, 2002), by calibrating model parameters to each participant's choice. As another example, Shteingart, Neiman, and Loewenstein (2013) modeled many repeated choices of individual participants in the Technion Prediction Tournament (TPT) dataset, considering a specific reinforcement-learning algorithm. These authors showed that there was a substantial effect of the first experience on choice behavior and that this behavior could be accounted for by the reinforcement-learning model if the outcome of the first experience reset the values of the experienced actions. Similarly, Frey, Mata, and Hertwig (2015) presented a modeling analysis at the individual level showing that a simple delta-learning rule model, with parameters calibrated to younger and older adults separately, best described the learning processes for both these age groups.

Furthermore, certain computational models have been proposed where model parameters are calibrated to the choice proportions of a group of participants (called "aggregate models"; Busemeyer & Diederich, 2010; Estes & Maddox, 2005). Here, a single set of values of model parameters is calibrated to the average decision computed across several participants (Busemeyer & Diederich, 2010; Erev, Ert, Roth, et al., 2010; Gonzalez & Dutt, 2011; 2012; Lejarraga, Dutt, & Gonzalez, 2012). The calibration exercise results in only one set of values of parameters from a model, and these parameters explain the averaged decision computed across all participants. For example, Gonzalez and Dutt (2011) calibrated one set of values for three parameters in an Instance-Based Learning (IBL) model to the risky-choice proportions averaged over all participants and problems in different DFE datasets. Similarly, Erev et al. (2010) compared several models, each with a single set of values for parameters, in their ability to capture average risk-taking in the TPT datasets.
There is still a third approach to model calibration where model parameters follow certain distributions (possessing density functions) that are defined across the choice proportions of a group of participants (called "hierarchical models"; Lee, 2008; Rouder & Lu, 2005). For example, in the Choice Prediction Competition (Erev, Ert, Plonsky, et al., 2015), the Best Estimation and Simulation Techniques (BEAST) model was hierarchical, and it contained a set of distribution parameters that were calibrated to the choice proportions across many participants.

Although the literature has focused on calibrating parameters of individual, hierarchical, and aggregate models (Estes & Maddox, 2005; Gonzalez & Dutt, 2011; Rouder & Lu, 2005), little is currently known about how aggregate or hierarchical models, with their set of single or distribution parameter values, respectively, account for decisions of individual participants. In this paper, we address this question by considering both aggregate and hierarchical models with a set of single or distribution parameter values and evaluating how these models explain individual choices. We perform our evaluation by calibrating and generalizing a set of parameter values in aggregate or hierarchical models to choices made by individual participants in large publicly available datasets in the sampling paradigm. For example, the aggregate IBL model consists of a set of two parameters, d and σ, where these two parameters possess single values and explain the average risk-taking in DFE datasets (Dutt & Gonzalez, 2012; Gonzalez & Dutt, 2011; 2012). In this paper, however, we recalibrate the d and σ parameters in the IBL model by assigning them a single value each to predict individual choices in DFE datasets.

The aggregate models that possess a single set of parameter values and predict aggregated choices, i.e., choices that are averaged over several participants, may or may not explain individual choices well. One reason for this expectation is that if several individuals learn linearly at different points in time, then the average learning curve is likely to be curvilinear (Gallistel et al., 2004). Thus, even if models with a single set of parameter values explain a group's aggregate curvilinear learning, it is possible that such models may not explain individual linear behavior. Another reason why these models may not explain individual behavior is the degree of heterogeneity present in individual choices (Busemeyer & Diederich, 2010): a single set of parameter values may not be sufficient to explain many individual choices. However, hierarchical models possess a set of distribution parameters. If these models account for aggregate choices, then they are also likely to account for individual choices. That is because the parameter values are resampled in a hierarchical model from their density functions for each individual participant, and this resampling may allow these models to account for individual choices.

In addition, there seems to be a tradeoff between aggregate models (like IBL; Dutt & Gonzalez, 2012), which possess cognitive mechanisms (like recency, frequency, and blending of outcomes) and a single set of parameter values that are fixed across individuals, and hierarchical models (like BEAST; Erev, Ert, Plonsky, et al., 2015), which possess mathematical functions to account for individual biases with a set of parameters that vary across individuals according to distributions.
On the one hand, one expects that aggregate models with cognitive mechanisms and a set of single parameters would account for individual choices; on the other hand, one may also expect that hierarchical models with mathematical functions and a set of distribution parameters would account for individual choices.

In this paper, we test these expectations by taking both aggregate and hierarchical models whose parameters are calibrated to individual choices. Furthermore, using the sampling paradigm, we also evaluate the sequential decisions of participants from their last sample to their final choice as accounted for by the different aggregate and hierarchical models. This sequential analysis helps us showcase the ability of these models to account for individual differences in decisions with a set of single or distribution parameters.

To calibrate aggregate and hierarchical model parameters to individual choices, we use the estimation dataset from TPT (Erev, Ert, Roth, et al., 2010), the largest publicly available DFE dataset. We compare calibrated aggregate and hierarchical models by generalizing them to two different DFE datasets in the sampling paradigm. Furthermore, we investigate an aggregate or hierarchical model's ability to capture individual differences in data with a set of single or distribution parameter values. In what follows, we first motivate our model choices, the different datasets used, and the working of the different models. Furthermore, we discuss the method used for calibrating a set of single or distribution parameters in models to choices made by individual participants. Finally, we present the results of model evaluations both during calibration and generalization, and we close the paper by discussing the implications of our results for predicting individual choices from experience.

Models in the sampling paradigm

Two classes of models have been proposed in the sampling paradigm (Hertwig, 2012): associative-learning models (e.g., Instance-Based Learning) and cognitive heuristics (e.g., the Natural-Mean Heuristic). In the associative-learning class, human choice is conceptualized as a learning process (for example, see Busemeyer & Myung, 1992; Bush & Mosteller, 1955). Learning is captured by changing the propensity to select a gamble based on the experienced outcomes. Good experiences boost the propensity of choosing the gamble associated with them, and bad experiences diminish it (e.g., Barron & Erev, 2003; Denrell, 2007; Erev & Barron, 2005; March, 1996). Some of the models in the associative class include the Instance-Based Learning (IBL) model (Dutt & Gonzalez, 2012; Gonzalez & Dutt, 2011; 2012; Lejarraga, Dutt, & Gonzalez, 2012), the Value-Updating model (Hertwig et al., 2004), and the Fractional-Adjustment model (March, 1996). The IBL model (Dutt & Gonzalez, 2012; Gonzalez & Dutt, 2011; 2012; Lejarraga, Dutt, & Gonzalez, 2012) consists of experiences (called instances) stored in memory. Each instance's activation is a function of the frequency and recency of the corresponding outcomes observed during sampling in different options, where the activation function is borrowed from the Adaptive Control of Thought-Rational (ACT-R) cognitive framework (Anderson & Lebiere, 1998).
Activations are used to calculate the blended value for each option, and the model makes a final choice for the option with the highest blended value. Gonzalez and Dutt (2011; 2012) showed that an aggregate IBL model with three parameters performed efficiently in accounting for choices aggregated over many participants across two DFE paradigms. In fact, this IBL model was overall the best model in explaining aggregate choices with the fewest parameters.

The second class of models is referred to as cognitive heuristics, and this class aims to describe both the process and the outcome of choice as heuristic rules (Brandstätter et al., 2006; Hertwig, 2012). A popular cognitive heuristic that focuses on the expected value of outcomes obtained during sampling is the Natural-Mean Heuristic (NMH) (Hertwig & Pleskac, 2010; Hertwig, 2012). As per Hertwig (2012), the NMH model has the following interesting properties: (a) it is well tailored to sequentially encountered outcomes; and (b) it arrives at its choice prediction via the expected value of options based upon sampled outcomes. Two other heuristics proposed in the cognitive-heuristic class include the Maximax Heuristic (Hau et al., 2008) and the Lexicographic Heuristic (Luce & Raiffa, 1957). In the Maximax heuristic, the option with the best possible outcome, no matter how likely it is, is chosen. A lexicographic heuristic generally consists of three building blocks (Gigerenzer & Goldstein, 1996). Search rule: look up attributes in order of validity. Stopping rule: stop search after the first attribute discriminates between alternatives. Decision rule: choose the alternative that this attribute favors. Hau et al. (2008) and Brandstätter et al. (2006) have shown that both these heuristics seem to underperform compared to the NMH model. Furthermore, a very commonly used baseline heuristic is the Primed-Sampler (PS) model (Erev, Glozman, & Hertwig, 2008). The PS model depends upon the recency of sampled information, and it looks a few samples back on each option during sampling before making a final choice (Gonzalez & Dutt, 2011). A variant of the PS model is the PS model with variability (Erev, Ert, Roth, et al., 2010). In this model variant, the look-back sample size k is varied between participants and problems. The PS model with variability is a special case of the NMH model (as the NMH model looks back over the entire sample while deriving a choice).

Furthermore, Hau et al. (2008) have shown that a Cumulative Prospect Theory (CPT) model (Tversky & Kahneman, 1992), which is a popular mathematical model (sometimes referred to as a "measurement model" or an "as-if" model), seems to perform about the same as the NMH model in accounting for aggregated choices. In the CPT model, a weighting function and a value function are associated with each probability and outcome, respectively. The model chooses the option that has the highest prospect value, where the prospect value is determined by multiplying the value with its corresponding weight. Furthermore, a linear-combination heuristic model (Ensemble) was submitted to TPT (Erev, Ert, Roth, et al., 2010). The Ensemble model contains four heuristic rules, PS, CPT, Priority Heuristic (PH), and NMH, and it was shown to be the best model in the sampling paradigm. Most recently, Erev, Ert, Plonsky, et al. (2015) proposed the BEAST model, which consists of several heuristic rules like expected value and mental simulations with a set of distribution parameters.
The BEAST model performed well in capturing 14 different aggregate phenomena in the 2015 Choice Prediction Competition. These 14 aggregate phenomena refer to anomalies such as the Ellsberg paradox, the Allais paradox, the reflection effect, and others described by Erev, Ert, Plonsky, et al. (2015).

Across the associative-learning models, mathematical models, and cognitive heuristics, there are aggregate models that possess a single set of parameter values and predict aggregated choices, i.e., choices that are averaged over several participants (Busemeyer & Wang, 2000; Dutt & Gonzalez, 2012; 2015; Gonzalez & Dutt, 2011; 2012; Lejarraga, Dutt, & Gonzalez, 2012). Also, there exist hierarchical models that possess a set of distribution parameters to predict aggregated choices, i.e., choices that are averaged over several participants (Erev, Ert, Plonsky, et al., 2015; Lee, 2008; Rouder & Lu, 2005).

The IBL, NMH, and CPT models are aggregate models (possessing a set of single parameter values), whereas the BEAST and Ensemble models are hierarchical models that possess a set of distribution parameter values. Within the aggregate and hierarchical models, some of the models (like IBL) possess cognitive processes like recency, frequency, or blending, whereas other models (like CPT and Ensemble) possess mathematical functions that account for biases in people's decisions. If possessing a set of distribution parameter values helps models account for individual choices, then we expect hierarchical models like BEAST and Ensemble to perform well in explaining individual choices. In contrast, if possessing cognitive mechanisms helps models account for individual choices, then we expect models like IBL to perform well in explaining individual decisions. Similarly, if mathematical functions can accurately account for biases in individual decisions, then we expect models like CPT and Ensemble to perform well in explaining individual choices. We test these expectations in this paper by calibrating different models to human data in large datasets involving the sampling paradigm.

Model selection

Among all associative-learning models, the IBL model (Dutt & Gonzalez, 2012; Lejarraga, Dutt, & Gonzalez, 2012) has been shown to be the best-performing aggregate model in the sampling paradigm (Gonzalez & Dutt, 2011; 2012). Gonzalez and Dutt (2011) showed that the IBL model accounts for aggregate final choices with a small error. Thus, we choose the IBL model as one of the models for our evaluation. For this purpose, we first test the original IBL model (called the IBL (LDG) model; Lejarraga, Dutt, & Gonzalez, 2012) in explaining individual choices with its published set of parameter values. Next, we recalibrated a set of parameter values of this model to individual choices (called the IBL (TPT) model) in the TPT dataset.

The popular Maximax and Lexicographic heuristics (Hau et al., 2008; Luce & Raiffa, 1957) have underperformed compared to the NMH model (Brandstätter et al., 2006; Hau et al., 2008). The NMH model has been reported in the literature as explaining aggregate final choices in the sampling paradigm (Hau et al., 2008; Hertwig, 2012). Thus, we chose the NMH model as another aggregate model for evaluating individual choices. Furthermore, Hau et al.
(2008) have also shown that different variants of the CPT model (Tversky & Kahneman, 1992) perform about the same as the NMH model in accounting for aggregate choices. For these reasons, we consider three variants of the CPT model for our evaluation. The first, the CPT (TK) model, is based upon parameters defined by Tversky and Kahneman (1992). The second, the CPT (Hau) model, is based upon parameters recalibrated by Hau et al. (2008) to derive aggregated final choices. The third, the CPT (TPT) model, has its parameters recalibrated to individual choices in the TPT dataset.

Erev et al. (2010) have shown the hierarchical Ensemble model, consisting of the PS, CPT, PH, and NMH models, to perform best in TPT's E-sampling condition.1 Given that the Ensemble model contains a collection of several popular heuristic models, we consider two variants of this model for our evaluation: the Ensemble (Herzog) model, which used the parameters proposed by Erev et al. (2010); and the Ensemble (TPT) model, where we recalibrated a set of parameter values of this model to individual choices in the TPT dataset.

1 The CPT model within this Ensemble model estimates the weighting function using approximations.

In addition to the above models, we also considered the hierarchical BEAST model, which has recently been shown to account for 14 different phenomena in aggregate choices (Erev, Ert, Plonsky, et al., 2015). We considered two variants of the BEAST model: the BEAST (CPC) model, which was based on the same set of distribution parameters as reported by Erev, Ert, Plonsky, et al. (2015); and the BEAST (TPT) model, which consisted of a set of distribution parameters calibrated to individual choices in the TPT dataset.

The Technion Prediction Tournament datasets

The Technion Prediction Tournament (TPT) (Erev et al., 2010) was a competition in which several participants were subjected to an experimental setup, the E-sampling condition. In this condition, participants sampled the two blank button options in a problem before making a final consequential choice for one of the options. During sampling, participants were free to click both button options one by one and observe the resulting outcome. Participants were asked to press the "choice-stage" key when they felt that they had sampled enough (but not before sampling at least once from each option). The outcome of each sample was determined by the structure of the relevant problem. One option corresponded to a choice where each sample provided a medium (M) outcome. The other option corresponded to a choice where each sample provided a high (H) payoff with some probability (pH) or a low (L) payoff with the complementary probability (1 - pH). At the choice stage, participants were asked to select once between the two options. Their choice yielded a random draw of one outcome from the selected option, and this outcome was considered at the end of the experiment to determine the final payoff. Competing models submitted to TPT were evaluated
The M, H, pH, and L in a problem were generated randomly, and a selection algorithm was used so that the 60 problems in each set differed in its M, H, pH, and L from other problems. For more details about the TPT, please refer to Erev, Ert, Roth, et al. (2010). In all the models described here, we have consid- ered an individual human or model participant play- ing a problem in a dataset as an individual observa- tion. Also, all model parameters have been calibrated by using the estimation dataset from TPT that con- sisted of 60 problems and 1,170 observations.2 In the experiment involving the TPT’s estimation dataset, forty participants were randomly assigned to two dif- ferent sub-groups, where each sub-group contained 20 participants who were presented with a representa- tive sample of 30 problems. Next, calibrated models were generalized on 60 problems from TPT’s compe- tition set (composed of 1,200 observations) and the Six-Problems (SP) dataset (Hertwig et al., 2004; com- posed of 150 observations). In the experiment involv- ing the TPT’s competition dataset, forty new partic- ipants were randomly assigned to two different sub- groups, where each sub-group contained 20 partici- pants who were presented with a representative sample of 30 problems. In the experiment involving the Six- Problems (SP) dataset, fifty participants were equally divided into two groups, where one group played the first three problems and the other group played the remaining three problems. Working of Models In this section, we detail the working of aggregate or hierarchical models with a set of point or distribu- tion parameters values calibrated to individual choices. In every model, the final choice for each individual observation is estimated by using the following soft- max function (Bishop, 2006; Daw, 2011; Sutton & Barto,1998): Prob(OptionX) = eSM eanX eSM eanX + eSM eanY (1) where, SMeanX and SMeanY are the sample means or expectations of the two options X and Y for a model participant in a problem; and, Prob(Option X) is the probability of choosing Option X by a model partic- ipant. If Option X was chosen by a human partici- pant in a problem, then the Prob(Option X) is used to calculate the log-likelihood from a model given its parameters. The log-likelihood function L is defined as: L = N∑ i=1 ln (Prob(OptionXi)) (2) Where, i refers to the ith observation (a combi- nation of a participant playing a problem) and N is the total number of observations in human data.3 The refers to the natural log and the log-likelihood is negative as Prob(Option X) is a proportion. The log-likelihoods measure the goodness-of-fit for individ- ual choices from a model and greater log-likelihoods values imply better fits from a model (Busemeyer & Diederich, 2010). As suggested by Busemeyer and Diederich (2010), in this paper, to calibrate aggre- gate or hierarchical model parameters, we minimize L . That is because our goal is to derive the likeli- hood of a model making the same choice as made by a human participant. We detail more about this calibra- tion process in a future section. Next, we detail the working of models that we considered for evaluating individual choices. Ensemble Model The Ensemble model (Erev et al., 2010) assumes that each choice is made based on one of four equally likely rules and the predicted choice rate is a simple average across the predictions of four different rules. The first rule is similar to the Primed-Sampler model with vari- ability (Erev, Glozman, & Hertwig, 2008). 
Next, we detail the working of the models that we considered for evaluating individual choices.

Ensemble Model

The Ensemble model (Erev et al., 2010) assumes that each choice is made based on one of four equally likely rules, and the predicted choice rate is a simple average across the predictions of the four different rules. The first rule is similar to the Primed-Sampler model with variability (Erev, Glozman, & Hertwig, 2008). Decision-makers are assumed to sample each option m times and select the option with the highest sample mean. The value of m is uniformly drawn from the set 1, 2, 3, ..., 9. The second rule is identical to the first, but m is drawn from the distribution of sample sizes observed in the estimation set, with samples larger than 20 treated as 20. The third rule in the Ensemble model is a stochastic variant of CPT (Tversky & Kahneman, 1992), where the weighting function is approximated based upon certain parameters (the model does not use the sampling data to determine the weighting function). The final rule is a stochastic version of the lexicographic priority heuristic (Brandstätter et al., 2006; Rieskamp, 2008). The probabilities with which the two search orders were used in the final rule were porder1 and porder2. The first order begins by comparing minimum outcomes (i.e., minimum gain or minimum loss depending on the domain of gambles), then their associated probabilities, and finally the maximum outcomes. The second order begins with the probabilities of the minimum outcomes, then proceeds to check the minimum outcomes, and ends with the maximum outcomes (the probabilities with which both search orders are implemented were determined from the estimation set).

The Ensemble model computes expectations for choosing options from its constituting models. These expectations are averaged to give a net expectation. Given a human participant's choice, the net expectation (averaged across all rules) is used to calculate the log-likelihood (using equation 2). In one version of the Ensemble model (called Ensemble (Herzog)), we used the original parameters proposed in Erev et al. (2010) to evaluate the model against individual choices. However, in a second version of the same model (called Ensemble (TPT)), we calibrated a set of the Ensemble model's distribution parameters to individual choices using the log-likelihood function. The Ensemble (TPT) model had a set of 11 parameters. These parameters were assigned single values when they were recalibrated to individual choices. Among the 11 parameters, α, β, γ, δ, λ, and µ belonged to the stochastic variant of CPT, while the parameters To, Tp, σ, porder1, and porder2 were part of the priority heuristic. The σ was a free distribution parameter that defined the variance of a normal distribution. If the subjective difference involving the first comparison in each search order exceeds a threshold t, then the more attractive option is selected based on this comparison; otherwise, the next comparison is executed. The values of the thresholds are other free distribution parameters. The estimated values are To for the minimum- and maximum-based comparisons and Tp for the probability-based comparison (both To and Tp define the mean of the normal distribution). The α, β, γ, δ, λ, and µ parameters were varied between 0 and 1.5; σ and To were varied between 0 and 1; and the probabilistic parameters porder1, porder2, and Tp were varied between 0 and 1.0. These ranges ensured that the optimization could capture the optimal parameter values with high confidence. During model calibration, the initial parameter population was set to the parameters from Erev et al. (2010).
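The rule-averaging scheme of the Ensemble model can be sketched as follows. This is an illustrative sketch only: the two toy rules below are stand-ins for the actual PS, CPT, PH, and NMH rules of Erev et al. (2010), and the function names are ours. Each rule maps a list of sampled outcomes to an expectation for that option, and the net expectation per option is the simple average across rules, which then enters equation 1.

import statistics

def natural_mean(samples):
    """Mean of all experienced outcomes for one option."""
    return statistics.mean(samples)

def ensemble_expectations(samples_x, samples_y, rules):
    """Average each option's expectation across the ensemble's rules."""
    exp_x = statistics.mean(rule(samples_x) for rule in rules)
    exp_y = statistics.mean(rule(samples_y) for rule in rules)
    return exp_x, exp_y

# Example with two toy rules: the natural mean and a "look three samples back" rule
rules = [natural_mean, lambda s: statistics.mean(s[-3:])]
print(ensemble_expectations([3, 0, 3, 3], [2, 2, 2, 2], rules))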
Natural Mean Heuristic (NMH) Model

The NMH model (Hertwig & Pleskac, 2010) involves the following steps:

Step 1. Calculate the natural mean of the observed outcomes for each option by summing, separately for each option, all n experienced outcomes and then dividing by n.

Step 2. Apply equation 1, where the sample mean for an option is replaced by its natural mean.

In the NMH model, there are no free parameters. Like the Ensemble model, we evaluate the log-likelihood value from the NMH model (using equation 2).
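A minimal sketch of these two steps, combined with the softmax of equation 1 (function names are ours, and the sampled outcomes in the example are illustrative):

import math

def natural_mean(outcomes):
    """Step 1: natural mean of all experienced outcomes for one option."""
    return sum(outcomes) / len(outcomes)

def nmh_choice_probability(samples_x, samples_y):
    """Step 2: probability of choosing Option X via equation 1,
    with the natural means standing in for the sample means."""
    mx, my = natural_mean(samples_x), natural_mean(samples_y)
    return math.exp(mx) / (math.exp(mx) + math.exp(my))

# Example: a risky Option X versus a safe Option Y
print(round(nmh_choice_probability([4, 0, 4, 4, 4], [3, 3, 3]), 3))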
Instance-Based Learning (IBL) Model

The IBL model (Dutt & Gonzalez, 2012; Gonzalez & Dutt, 2011; 2012; Lejarraga, Dutt, & Gonzalez, 2012) is based upon the ACT-R cognitive framework (Anderson & Lebiere, 1998). In this model, every occurrence of an outcome of an option is stored in the form of an instance in memory. An instance is made up of the following structure, SDU, where S is the current situation (the blank option buttons on a computer screen), D is the decision made in the current situation (choice for one of the option buttons), and U is the goodness (utility) of the decision made (the outcome obtained upon making a choice for an option). When a decision needs to be made, instances belonging to each option are retrieved from memory and blended together. The blended value of an option j (e.g., a gamble that pays $5 with 0.9 probability or $0 with probability 0.1) at any trial t is defined as:

V_{j,t} = \sum_{i=1}^{n} p_{i,j,t} \, x_{i,j,t}    (3)

where x_{i,j,t} is the value of the U (outcome) part of an instance (e.g., either $5 or $0 in the previous example) i on option j in trial t. The p_{i,j,t} is the probability of retrieval of instance i on option j from memory in trial t. Because x_{i,j,t} is the value of the U part of an instance i on option j in trial t, the number of terms in the summation changes when new outcomes are observed within an option j (and new instances corresponding to observed outcomes are created in memory). Thus, n = 1 if j is an option with one possible outcome. If j is an option with two possible outcomes, then n = 1 when one of the outcomes has been observed on the option (i.e., one instance is created in memory) and n = 2 when both outcomes have been observed (i.e., two instances are created in memory).

In any trial t, the probability of retrieval of an instance i on option j is a function of the activation of that instance relative to the activation of all instances (1, 2, ..., n) created within the option j, given by:

p_{i,j,t} = \frac{e^{A_{i,j,t}/\tau}}{\sum_{i=1}^{n} e^{A_{i,j,t}/\tau}}    (4)

where τ is random noise defined as σ · \sqrt{2} and σ is a free noise parameter. Noise in Equation (4) captures the imprecision of recalling past experiences from memory. Activation of an instance is a function of the frequency and recency of the observed outcomes that occur on choosing options during sampling. The activation of an instance i corresponding to an observed outcome on an option j in a given trial t is a function of the frequency of the outcome's past occurrences and the recency of the outcome's past occurrences (as done in ACT-R). In each trial t, the activation A_{i,j,t} of an instance i on option j is given by:

A_{i,j,t} = \ln\left(\sum_{t_p} (t - t_p)^{-d}\right) + \sigma \cdot \ln\left(\frac{1 - \gamma_{i,j,t}}{\gamma_{i,j,t}}\right)    (5)

where d is a free decay parameter; γ_{i,j,t} is a random draw from a uniform distribution bounded between 0 and 1 for instance i on option j in trial t; and t_p is each of the previous trials in which the outcome corresponding to instance i was observed in the binary-choice task.

The IBL model has two free parameters that need to be calibrated: d and σ. The d parameter controls the reliance on recent or distant sampled information. Thus, when d is large (> 1.0), the model gives more weight to recently observed outcomes in computing instance activations compared to when d is small (< 1.0). The σ parameter helps to account for the sample-to-sample variability in an instance's activation. Thus, the blended value of each option is a function of the activations of the instances corresponding to the outcomes observed on that option. In this model, we feed in the sampling of individual human participants to generate instance activations and blended values. Every time a choice is made and an outcome is observed, the instance associated with it is activated, and thereafter blended values are computed for the options faced by an individual participant. At the final choice, the likelihood is computed from the blended values, which replace the option means in Equation 1. In one version of the model, IBL (LDG), we used the single set of parameter values suggested by Lejarraga, Dutt, and Gonzalez (2012) to test the model against individual choices. However, in a second version of the model, IBL (TPT), we calibrated a set of d and σ parameters in the IBL model to individual choices. For this calibration, we determine the model's log-likelihood value for making the same choice as made by each human participant. During optimization, both the d and σ parameters were varied between 0 and 20. These ranges ensured that the optimization could capture the optimal parameter values with high confidence. During parameter calibration, the initial parameter population was set to the parameters from Lejarraga, Dutt, and Gonzalez (2012).
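Equations 3 to 5 can be sketched as follows. This is a simplified sketch (function names and the data structure are ours): the noise draw γ and the trial bookkeeping follow the verbal description above, and the parameter values in the example are the calibrated IBL (TPT) values (d = 5.39, σ = 0.04) used for illustration only.

import math, random

def activation(t, past_trials, d, sigma):
    """Equation 5: activation from the frequency/recency of an outcome plus noise."""
    # uniform draw, bounded away from 0 and 1 for numerical safety
    gamma = random.uniform(1e-6, 1 - 1e-6)
    base = math.log(sum((t - tp) ** (-d) for tp in past_trials))
    return base + sigma * math.log((1 - gamma) / gamma)

def blended_value(t, instances, d, sigma):
    """Equations 3 and 4: retrieval probabilities and blended value of one option.

    `instances` maps an observed outcome to the list of past trials in which
    it was observed on this option.
    """
    tau = sigma * math.sqrt(2)
    acts = {x: activation(t, trials, d, sigma) for x, trials in instances.items()}
    denom = sum(math.exp(a / tau) for a in acts.values())
    return sum(x * math.exp(a / tau) / denom for x, a in acts.items())

# Example: by trial 6, outcome 4 was sampled on trials 1, 3, 5 and outcome 0 on trial 2
print(round(blended_value(6, {4: [1, 3, 5], 0: [2]}, d=5.39, sigma=0.04), 3))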
Cumulative Prospect Theory (CPT) Model

The CPT model (Hau et al., 2008; Tversky & Kahneman, 1992) assumes that people first form subjective beliefs about the probability of events and then enter these beliefs into cumulative prospect theory's weighting function (Fox & Tversky, 1998; Tversky & Fox, 1995). Similarly, people associate a value (utility) with the outcomes observed in options. The CPT model consists of the following four steps:

Step 1. Assess the sample probability, p_j, of the nonzero outcome in a given option j.

Step 2. Calculate the expected gain (loss) of option j, E_j:

E_j = w(p_j) \, v(x_j)    (6)

where w represents a weighting function for the probability experienced in the option j, and v represents a value function for the experienced outcome x_j in the option j. According to Tversky and Kahneman (1992), the weighting function w is defined as:

w(p_j) = \frac{p_j^{\gamma}}{\left(p_j^{\gamma} + (1 - p_j)^{\gamma}\right)^{1/\gamma}} if x ≥ 0, and w(p_j) = \frac{p_j^{\delta}}{\left(p_j^{\delta} + (1 - p_j)^{\delta}\right)^{1/\delta}} if x < 0    (7)

The γ and δ are adjustable parameters that fit the shape of the function for gains and losses, respectively. The weighting function w has an S-shape that underweights small probabilities and overweights larger ones (Hertwig, 2012). The x represents the outcome associated with the probability p_j. The value function v is defined as:

v(x_j) = x_j^{\alpha} if x_j ≥ 0, and v(x_j) = -\lambda\left(|x_j|^{\beta}\right) if x_j < 0    (8)

Here, α and β are adjustable parameters that fit the curvature for the gain and loss domains, respectively. Finally, the λ parameter (λ > 1) scales loss aversion. The x_j represents the outcome associated with the option j.

Step 3. Assess the prospect value of the option by multiplying the weight with the value obtained.

Step 4. Given a human participant's choice, calculate the log-likelihood value of the model making this choice using Equation 1 and Equation 2. The prospect value replaces the sample mean in Equation 1.

As seen above, the CPT model has five parameters, α, β, γ, δ, and λ, and we investigated three versions of the CPT model. In the first model, CPT (TK), we tested the set of parameter values estimated by Tversky and Kahneman (1992) against individual choices. In the second model, CPT (Hau), we tested the set of parameter values estimated by Hau et al. (2008) against individual choices. In the third model, CPT (TPT), we recalibrated a set of parameter values in the CPT model to individual choices. All five parameters were varied between 0 and 5. These ranges ensured that the optimization could capture the optimal parameter values with high confidence. During calibration, the initial parameter population was set to the parameters from Hau et al. (2008).
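Equations 6 to 8 can be sketched as follows (an illustrative sketch; the function names are ours, and the default parameter values in the example are the calibrated CPT (TPT) values reported later in Table 1):

def cpt_weight(p, gamma, delta, gain):
    """Equation 7: probability weighting, using gamma for gains and delta for losses."""
    c = gamma if gain else delta
    return p ** c / ((p ** c + (1 - p) ** c) ** (1 / c))

def cpt_value(x, alpha, beta, lam):
    """Equation 8: value function with loss-aversion parameter lam."""
    return x ** alpha if x >= 0 else -lam * (abs(x) ** beta)

def prospect(x, p, alpha=1.008, beta=0.96, gamma=2.00, delta=0.92, lam=1.03):
    """Equation 6: prospect value of an option whose nonzero outcome x occurs with probability p."""
    return cpt_weight(p, gamma, delta, gain=(x >= 0)) * cpt_value(x, alpha, beta, lam)

# Example: an option whose nonzero outcome of 10 was experienced with probability 0.8
print(round(prospect(10, 0.8), 3))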
Best Estimate and Simulation Techniques (BEAST) Model

The BEAST model captures the joint effect of, and the interaction between, 14 choice phenomena at the aggregate level discussed in the 2015 Choice Prediction Competition (Erev, Ert, Plonsky, et al., 2015). The first assumption in this model is to compute the expected values of the options (since people try to maximize payoffs). The second assumption uses mental simulations that were found to lead to good outcomes in similar situations in the past (Marchiori, Di Guida, & Erev, 2015; Plonsky, Teodorescu, & Erev, 2015). Each simulation uses one of four different techniques: unbiased, uniform, contingent pessimism, and sign. The unbiased technique implies random and unbiased draws, either from an option's described distributions or from an option's observed history of outcomes. The other three techniques are "biased" and imply overgeneralizations. They can be described as mental draws from distributions that differ from the objective problem distributions. The three biased techniques are each used with equal probability. The simulation technique uniform yields each of the possible outcomes with equal probability. This technique enables the model to capture underweighting of rare events and the splitting effect.4 The simulation technique contingent pessimism is like the priority heuristic (Brandstätter et al., 2006); it depends on the sign of the best possible payoff and the ratio of the minimum payoffs. This technique helps the model capture loss aversion and the certainty effect. The simulation technique sign implies high sensitivity to the payoff sign. It is identical to the technique unbiased, with one important exception: positive drawn values are replaced by R, and negative outcomes are replaced by -R, where R is the payoff range (the difference between the best and worst possible payoffs in the current problem).

4 According to Birnbaum (2008) and Tversky and Kahneman (1986), splitting an attractive outcome into two distinct outcomes can increase the attractiveness of a prospect even when it reduces its expected value. This phenomenon is referred to as the splitting effect.

This model has six distribution parameters, σ, κ, β, γ, φ, and θ, where each of these parameters defines the upper bound of a uniform distribution with a lower bound of 0.0 (κ defines the upper bound of a discrete uniform distribution with a lower bound of 0.0). Four of these parameters (σ, κ, β, and γ) are needed to capture decisions under risk without feedback. The parameter φ captures the attitude toward ambiguity, and θ abstracts the reaction to feedback. In this model, the expectation for one of the options, option A, equals BEV_A(r) + ST_A(r) + e(r), and that for the other option, option B, equals BEV_B(r) + ST_B(r). Here, BEV_A(r) and BEV_B(r) are the best estimates of the expected values of options A and B after r samples; ST_A(r) and ST_B(r) are the expectations based on the mental-simulation techniques after r samples; and e(r) is an error term after r samples (e(r) is drawn from a normal distribution with a mean of 0 and a standard deviation of σ). Given a human participant's choice, the expectations on the different options are used to determine the log-likelihood in the model (using Equation 1 and Equation 2). In one of the BEAST versions, BEAST (CPC), we used the set of parameter values reported by Erev, Ert, Plonsky, et al. (2015) against individual choices. However, in another version, BEAST (TPT), we recalibrated the set of distribution parameter values to individual choices. All six parameters were varied between 0 and 20. These ranges ensured that the optimization could capture the optimal parameter values with high confidence. During recalibration, the initial population of parameters was taken from Erev, Ert, Plonsky, et al. (2015).
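The comparison of option expectations in BEAST can be sketched abstractly as follows. This is a heavily simplified sketch: the best-estimate (BEV) and simulation (ST) terms are passed in as placeholders rather than computed by the model's actual techniques, and only the additive structure described above is shown. The σ value in the example is the BEAST (TPT) value reported later in Table 1.

import random

def beast_expectations(bev_a, bev_b, st_a, st_b, sigma):
    """Expectations of options A and B as described in the text:
    BEV_A(r) + ST_A(r) + e(r) versus BEV_B(r) + ST_B(r),
    where e(r) is normal noise with mean 0 and standard deviation sigma."""
    error = random.gauss(0.0, sigma)
    return bev_a + st_a + error, bev_b + st_b

# Example with placeholder best estimates and simulation-based terms
exp_a, exp_b = beast_expectations(bev_a=2.0, bev_b=1.8, st_a=0.3, st_b=0.5, sigma=0.24)
print(round(exp_a, 2), round(exp_b, 2))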
Method

Dependent variables

In this paper, we account for the final choices made by individual participants in different problems. For this purpose, given a choice made by a human participant in a problem, we calculate the log-likelihood of a model participant making the same choice in the same problem. In all models, if the probability of making a human participant's choice is greater than 0.5, then it is assumed that the model choice coincides with the human choice. Using this 0.5 rule, we compare whether both model and human participants select the maximizing option in a problem. The maximizing option is the one that has the highest expected value among both options (expected value is calculated by using the objective probability distribution of outcomes in options). If both human participants and model participants select the maximizing option or the non-maximizing option in a problem, then the model can explain the human participant's choice. Using this method, in the TPT's estimation set, the final choices made by model observations are compared to 1,170 human observations, i.e., the total number of human observations available. The comparison between human choices and model choices is used to compute the incorrect proportion for each model, which is the main criterion for capturing individual behavior by a model. The incorrect proportion is simply the proportion of human choices that differed from the model's predictions. It is defined as:

Incorrect Proportion = (MHNM + NHMM) / (MHNM + NHMM + NHNM + MHMM)    (9)

where MHNM is the number of observations where the human participant makes a maximizing choice but the model predicts a non-maximizing choice, and NHMM is the number of observations where the human participant makes a non-maximizing choice but the model predicts a maximizing choice. Similarly, MHMM and NHNM are the numbers of observations where the human participant makes the same choice (maximizing or non-maximizing) as predicted by the model. The smaller the value of the incorrect proportion, the more accurate is the model in accounting for individual human choices. Once model parameters were calibrated to individual choices using the log-likelihood function, the incorrect proportions were computed from the different models and compared.
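Equation 9 can be computed directly from the four combination counts. A minimal sketch (the counts in the example are illustrative, not data from the paper):

def incorrect_proportion(mh_nm, nh_mm, nh_nm, mh_mm):
    """Equation 9: share of observations where model and human choices disagree."""
    disagreements = mh_nm + nh_mm
    total = mh_nm + nh_mm + nh_nm + mh_mm
    return disagreements / total

# Example: counts of observations in each human/model combination
print(round(incorrect_proportion(mh_nm=70, nh_mm=120, nh_nm=460, mh_mm=520), 2))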
Parameter calibration

Given the choice made by a human participant, we use Equation 1 and Equation 2 to compute the log-likelihood from a model of making the same choice as made by the human participant. Classically, Equation 1 has used an inverse temperature parameter β, which scales the sample means (Busemeyer & Diederich, 2010). In this paper, we assume β = 1 across all models, as we did not want to introduce an additional free parameter beyond those already present in the models. That is because the β parameter's recalibration to individual choices could benefit models differently. As β = 1 across all models, the β parameter does not favor some models over others.

The NMH model did not require parameter calibration, as this model did not possess any parameters. The sets of parameters of the Ensemble, BEAST, CPT, and IBL models were recalibrated using a Genetic Algorithm (GA) program. The GA is a probabilistic (stochastic) trial-and-error method of optimization that is different from deterministic methods like steepest gradient descent. Due to the GA's trial-and-error nature and its dependence on processes like reproduction, crossover, and mutation, the algorithm provides good chances of avoiding local optima in the parameter search space (Jakobsen, 2010; Gonzalez & Dutt, 2011; Houck, Joines, & Kay, 1995). In addition, prior research involving models has used the GA procedure for model calibration (Gonzalez & Dutt, 2011; 2012; Lejarraga, Dutt, & Gonzalez, 2012). In our model calibrations, the GA repeatedly modified a population of parameter tuples to find the tuple that minimized the negative of the model's log-likelihood function (Equation 2) across all human participants. In each generation, the GA selected parameter tuples randomly from a population to become parents and used these parents to produce children for the next generation. For each parameter tuple in a generation, each model was run five times across the 1,170 participants to minimize the negative of the model's average log-likelihood function over the five runs.5 Over successive generations, the population evolved toward an optimal solution. The population size was set to 20 randomly selected parameter tuples per generation (each tuple contained a certain value for each of the model's parameters). The mutation and crossover fractions were both set at 0.5 after a grid search for the best combination. The best combination of mutation and crossover fractions was found by calibrating the IBL (LDG) model to aggregate choices using its known parameters (d = 5.0; σ = 1.5). We systematically varied the mutation and crossover fractions in steps of 0.1 in the interval [0, 1] to find their best combination. The optimal values of the mutation and crossover fractions (= 0.5) were those for which the optimization converged the IBL (LDG) parameters to their optimal values in the least number of generations. These optimal values of the mutation and crossover fractions were then used for calibrating model parameters to individual choices.

5 The number of runs was set to five after analyzing the run-to-run variability in models with stochasticity (e.g., IBL and BEAST). Five runs were chosen as there was little change in the standard deviation when increasing the number of runs beyond five.

The GA procedure was implemented in the Matlab® toolbox (Houck, Joines, & Kay, 1995; Mathworks, 2012), where the stopping criteria in the optimization of model parameters involved the following constraints: stall generations = 200 and function tolerance = 1x10^-8; the optimization stopped when the average relative change in the fitness-function value over 200 stall generations was less than the function tolerance (1x10^-8).
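The calibration loop can be sketched as follows. This is not the authors' Matlab GA implementation; it is a minimal genetic-algorithm-style search of our own (selection of the fitter half, per-element crossover, and Gaussian mutation clamped to the bounds) that minimizes the negative log-likelihood over a parameter population, using the population size of 20 mentioned above. The objective in the example is a toy stand-in for a model's negative log-likelihood.

import random

def calibrate(neg_log_likelihood, bounds, pop_size=20, generations=200,
              crossover_frac=0.5, mutation_frac=0.5):
    """Minimize neg_log_likelihood(params) over parameter tuples within bounds."""
    def random_tuple():
        return [random.uniform(lo, hi) for lo, hi in bounds]

    population = [random_tuple() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=neg_log_likelihood)
        parents = scored[:pop_size // 2]                    # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = [ai if random.random() < crossover_frac else bi
                     for ai, bi in zip(a, b)]               # crossover
            child = [min(max(x + random.gauss(0, 0.1), lo), hi)
                     if random.random() < mutation_frac else x
                     for x, (lo, hi) in zip(child, bounds)]  # mutation within bounds
            children.append(child)
        population = parents + children
    return min(population, key=neg_log_likelihood)

# Example: recover two parameters of a toy quadratic objective within [0, 20] x [0, 20]
best = calibrate(lambda p: (p[0] - 5.39) ** 2 + (p[1] - 0.04) ** 2,
                 bounds=[(0, 20), (0, 20)])
print([round(v, 2) for v in best])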
Results

Calibration in TPT's estimation set

Table 1 shows the parameter calibration results from the different models in TPT's estimation dataset. The table lists the different models, calibrated parameter values, combinations obtained from the comparison of human and model final choices, log-likelihoods, and incorrect proportions.

Calibrated parameters

The best model in terms of log-likelihood values was CPT (TPT). Five parameters were calibrated in the CPT (TPT) model, and the calibrated model possessed a log-likelihood of -634.7, which was significantly larger than that for the CPT (TK) model (-662.8) and the CPT (Hau) model (-643.9). The calibrated parameter values were: α = 1.008; β = 0.96; γ = 2.00; δ = 0.92; λ = 1.03. The free parameters for the value function indicated a slightly smaller magnitude of disutility for losses compared to the utility for gains. The value function for the CPT (TPT) model was aligned with risk-neutral behavior for both gains and losses, which was different from the behavior in the CPT (Hau) model and in the CPT (TK) model. Furthermore, the weighting function of the CPT (TPT) model showed underweighting of small probabilities for positive outcomes and about equal weighting of small probabilities for negative outcomes. In contrast, the weighting functions for the CPT (Hau) and CPT (TK) models overweighted small probabilities for both positive and negative outcomes. Please see Appendix D for the shapes of the value and weighting functions for the different CPT models.

The Ensemble (TPT) model was the second-best model, exhibiting a log-likelihood of -691.0. The model's calibrated parameters were α = 0.75, β = 1.46, γ = 1.42, δ = 1.03, λ = 1.13, µ = 0.37, T0 = 0.001, porder1 = 0.38, σ = 0.020, Tp = 0.18, and porder2 = 0.62. The first six parameters from the model depicted underweighting of rare events and loss aversion, with losses perceived as more damaging compared to gains. The latter five parameters, from the priority heuristic, showed a smaller variance in the distribution of the σ parameter compared to the original Ensemble (Herzog) parameters. Also, the results indicated underweighting of small probabilities, overweighting of large probabilities, and diminishing sensitivity to gains and losses.

The next best model was the IBL (TPT) model, which exhibited a considerably larger log-likelihood value of -929.0 compared to the IBL (LDG) model. The IBL (TPT) model's calibrated parameters were d = 5.39 and σ = 0.04. These parameters indicated a reliance on the recency of sampled information, which provides a plausible account of recency's role in human participants' sampling and subsequent choice. The recency reliance for individual choices is also in agreement with the documented reliance on recency in aggregate choices (Dutt & Gonzalez, 2012; Gonzalez & Dutt, 2011; 2012; Hertwig et al., 2004; Lejarraga, Dutt, & Gonzalez, 2011). In fact, the d parameter value was higher for the model calibrated to individual choices than for the model calibrated to aggregate choices. Furthermore, the participant-to-participant variability (captured by σ) was smaller in the IBL (TPT) than in the IBL (LDG) model. This observation indicated less variability among individual participants in their choices.

For the BEAST (TPT) and NMH models, the log-likelihood values (-1129.0 and -1386.5) were much smaller compared to those for the individually calibrated versions of the CPT, Ensemble, and IBL models. Please see Table 1 for the log-likelihood values of the different models.

Incorrect proportion

In the calibration dataset, the CPT (TPT) model possessed the best incorrect proportion of 0.15. In the CPT (TPT) model, the desirable NHNM and MHMM combinations were 39% and 45%, respectively. In contrast, the erroneous NHMM and MHNM combinations were 10% and 6%, respectively. The CPT (Hau) model showed an incorrect proportion of 0.16. The model showed 39% of NHNM combinations and 45% of MHMM combinations. The erroneous combinations included 9% for NHMM and 7% for MHNM. The incorrect proportion for the CPT (TK) model was 0.18. The proportions of desirable NHNM and MHMM combinations were 41% and 42%, respectively. In addition, the erroneous NHMM and MHNM combinations were 8% and 10%, respectively. The next best model was the IBL (TPT) model, which exhibited an incorrect proportion of 0.21. The IBL (TPT) model showed 39% and 40% of the desirable NHNM and MHMM combinations. The erroneous NHMM and MHNM combinations were 9% and 12%, respectively. Beyond the IBL (TPT) model, the BEAST (TPT) model did well with an incorrect proportion of 0.24. The four combination proportions for the BEAST (TPT) model were: 36% (NHNM), 41% (MHMM), 13% (NHMM), and 10% (MHNM).

Next, to gauge the benefit of explaining individual choices with different model parameters, we plotted the correct proportions from the calibrated models against their number of free parameters (see Figure 1). Models closer to the origin are the ones that explain individual choices with the fewest free parameters. The distance of the IBL (TPT) and CPT (TPT) models from the origin (= 2 and 5 units, respectively) was much less than that for the BEAST (TPT) and Ensemble (TPT) models (= 6 and 11 units, respectively). Thus, based upon the distance metric, the IBL and CPT models explained individual choices with fewer free parameters. Thus, it seems that cognitive mechanisms like recency, frequency, and blending, as well as mathematical functions that underweight rare outcomes and value gains and losses differently, are appropriate to account for individual choices.

Figure 1. The correct proportions against the number of parameters from different models calibrated in the TPT's estimation set.
Table 1. Calibration results from models in TPT's estimation dataset (percentage of 1,170 observations).

Calibrated parameter values:
Ensemble (Herzog): α = 1.19, β = 1.35, γ = 1.42, δ = 1.54, λ = 1.19, µ = 0.41, T0 = 0.0001, porder1 = 0.38, σ = 0.037, Tp = 0.11, porder2 = 0.62
Ensemble (TPT): α = 0.75, β = 1.46, γ = 1.42, δ = 1.03, λ = 1.13, µ = 0.37, T0 = 0.001, porder1 = 0.38, σ = 0.02, Tp = 0.18, porder2 = 0.62
NMH: no free parameters
IBL (LDG): d = 5.00, σ = 1.50
IBL (TPT): d = 5.39, σ = 0.04
CPT (TK): α = 0.88, β = 0.88, γ = 0.61, δ = 0.69, λ = 1.00
CPT (Hau): α = 0.94, β = 0.86, γ = 0.99, δ = 0.93, λ = 1.00
CPT (TPT): α = 1.008, β = 0.96, γ = 2.00, δ = 0.92, λ = 1.03
BEAST (CPC): σ = 7.00, κ = 3.00, β = 2.6, γ = 0.50, ϕ = 0.07, θ = 1.00
BEAST (TPT): σ = 0.24, κ = 1.99, β = 0.06, γ = 1.16, ϕ = 0.03, θ = 1.17

Combinations (H/M) | Ensemble (Herzog) | Ensemble (TPT) | NMH | IBL (LDG) | IBL (TPT) | CPT (TK) | CPT (Hau) | CPT (TPT) | BEAST (CPC) | BEAST (TPT)
NHNM | 31 | 32 | 29 | 26 | 39 | 41 | 39 | 39 | 33 | 36
MHMM | 40 | 40 | 33 | 32 | 40 | 42 | 45 | 45 | 37 | 41
NHMM | 17 | 18 | 19 | 23 | 09 | 08 | 09 | 10 | 15 | 13
MHNM | 12 | 11 | 19 | 20 | 12 | 10 | 07 | 06 | 15 | 10
Incorrect proportion | 0.29 | 0.28 | 0.37 | 0.43 | 0.21 | 0.18 | 0.16 | 0.15 | 0.31 | 0.24
Log-likelihood | -696.2 | -691.0 | -1386.5 | -3158.0 | -929.0 | -662.8 | -643.9 | -643.7 | -1971.0 | -1129.0

Note. NH and MH refer to non-maximizing and maximizing human choices, respectively. NM and MM refer to non-maximizing and maximizing model choices, respectively.

Generalization to different datasets

Up to now, the different models predicted the choices of individual participants in TPT's estimation set using a single set of parameter values. These models, however, possess different numbers of free parameters. Due to these differences in model parameters, it becomes difficult to compare model performance during parameter calibration. One method that allows us to compare models while accounting for parameter differences is generalization (Busemeyer & Diederich, 2010; Busemeyer & Wang, 2000; Dutt & Gonzalez, 2012). In generalization, models with calibrated parameters are run on new problems (Busemeyer & Wang, 2000). Ideally, new problems encountered during generalization should be different from those encountered during calibration; otherwise, generalization may favor models that show superior performance during calibration. In what follows, we first generalize the calibrated models to problems in TPT's competition set. Generalization of this kind was also followed for models submitted to TPT (Erev et al., 2010). However, problems in TPT's competition set were derived using the same algorithm as in TPT's estimation set (Erev et al., 2010). Thus, it is likely that the nature of problems across the competition and estimation sets was similar and that TPT's competition set provided a weaker generalization dataset with respect to TPT's estimation set.
Generalization to competition set. TPT's competition set was like the estimation set with two exceptions: the problems in the competition set were different from those in the estimation set, and different subjects participated in the competition set compared to the estimation set (Erev et al., 2010). The 60 problems in the competition set were selected using the same algorithm as used for the estimation set. To explain individual choices, all models were run in the competition set using the parameters obtained in the estimation set.

Table 2 shows the generalization results from different models in the competition set. In all models, parameters were set to the values reported in Table 1. Overall, the incorrect proportions obtained from models in the competition set were like those obtained in the estimation set. Calibrated models performed better compared to their uncalibrated counterparts that borrowed parameter values for aggregate choices from the literature. The incorrect proportion was the lowest for the CPT (TPT) model, with the IBL (TPT) and Ensemble (TPT) models taking the second and third places, respectively. Also, all three models performed significantly better than the BEAST and NMH models. These results highlight the role of certain mechanisms in explaining individual choices: recency, frequency, and blending of encountered information during sampling, the underweighting of rare events, and the differential valuation of gains and losses.

Sequential analysis. To gauge models in accounting for individual differences, we evaluated the proportion of sequential decisions in models from the last sample to the final choice. Here, human and model choices were analyzed sequentially. Thus, we evaluated decisions made by human participants during their last sample and consequential choice and then compared these sequential decisions to those from models. Table 3 presents the proportion of model participants showing a transition that was similar to or different from that of human participants in TPT's competition dataset. Based upon the last sample and consequential choice among human participants, the following four transition possibilities existed: N → N, N → M, M → N, and M → M, where the first letter (before the arrow) corresponds to the choice made by a human participant during her last sample and the second letter (after the arrow) corresponds to the final choice made by the same participant after sampling. For each last-sample-to-final-choice transition by a human participant, there are two possibilities for the model: first, the same transition as the human participant; and, second, a different transition from the human participant. If the model is suggestive of individual choice, then the model should show a transition between last sample and final choice like that of the human participants for more than 50 percent (i.e., a majority) of its participants. We evaluated sequential decisions in the top four models: the CPT (TPT), IBL (TPT), Ensemble (TPT), and NMH models. As shown in Table 3, across all transitions, N → N, N → M, M → N, and M → M, the CPT (TPT) model performed better compared to all other models. Thus, the CPT (TPT) model made stronger correct predictions for human transitions from last sample to final choice compared to the Ensemble (TPT), NMH, and IBL models. The IBL (TPT) model performed superior to the Ensemble (TPT) model on two kinds of transitions: NN and MN. Overall, these results show that underweighting of experienced probabilities, loss aversion due to negative outcomes, and recency and frequency processes seem to account for sequential individual choices in the data.
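A sketch of this sequential analysis is given below (illustrative names, not the authors' code): each participant's last sample and final choice are mapped to one of the four transitions, and, for every human transition type, we compute the share of model participants that produced the same transition.

```python
# Minimal sketch (illustrative names): classify each participant's transition from the
# last sampled option to the final choice, then check how often the model's transition
# matches the human's for the same participant and problem.
from collections import defaultdict

def transition(last_sample_max, final_choice_max):
    """Return 'N->N', 'N->M', 'M->N', or 'M->M' (N = non-maximizing, M = maximizing)."""
    first = "M" if last_sample_max else "N"
    second = "M" if final_choice_max else "N"
    return f"{first}->{second}"

def match_proportions(human_transitions, model_transitions):
    """For each human transition type, the share of model participants showing the
    same transition; values above 0.50 satisfy the majority rule used in the text."""
    same = defaultdict(int)
    total = defaultdict(int)
    for h, m in zip(human_transitions, model_transitions):
        total[h] += 1
        same[h] += int(h == m)
    return {t: same[t] / total[t] for t in total}
```

Under the 50% rule described above, a value such as the 87% reported for the CPT (TPT) model's M→M cell in Table 3 would correspond to match_proportions(...)['M->M'] = 0.87 and counts as evidence that the model tracks that sequential pattern.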
Six Choice (SC) dataset. In the section above, we generalized models to TPT's competition set. However, the problems in the competition set were similar to those in the estimation set, as the problem-generation algorithm remained the same between the two sets. Due to this observation, the competition set provides a weaker generalization dataset. In order to overcome this limitation, we also generalized calibrated models to the Six Choice (SC) dataset (Hertwig et al., 2004; Appendix C), where the structure of options across problems in the SC dataset was different from that in TPT's estimation and competition sets.

Table 2. Generalization results from models in TPT's competition dataset (combination rows are percentages of 1200 observations).

Combinations from Human Data and Model (H/M) | Ensemble (Herzog) | Ensemble (TPT) | NMH | IBL (LDG) | IBL (TPT) | CPT (TK) | CPT (Hau) | CPT (TPT) | BEAST (CPC) | BEAST (TPT)
NHNM | 20 | 20 | 25 | 22 | 29 | 32 | 33 | 33 | 21 | 24
MHMM | 46 | 46 | 39 | 40 | 43 | 53 | 49 | 50 | 36 | 39
NHMM | 21 | 20 | 15 | 19 | 12 | 09 | 08 | 09 | 19 | 17
MHNM | 14 | 13 | 21 | 20 | 17 | 09 | 10 | 07 | 24 | 20
Incorrect proportion | 0.34 | 0.33 | 0.36 | 0.39 | 0.28 | 0.17 | 0.18 | 0.16 | 0.42 | 0.37
Note. NH and MH refer to non-maximizing and maximizing human choices, respectively. NM and MM refer to non-maximizing and maximizing model choices, respectively.

Table 3. Proportion of model participants following a transition that is similar to or different from human participants in the competition dataset.

Human Transition (Last Sample → Final Choice) | Model Transition (Last Sample → Final Choice) | CPT (TPT) (%) | IBL (TPT) (%) | Ensemble (TPT) (%) | NMH (%)
N→N | N→N | 79 | 73 | 54 | 62
N→N | N→M | 21 | 27 | 46 | 38
N→M | N→M | 80 | 70 | 77 | 64
N→M | N→N | 20 | 30 | 23 | 36
M→N | M→N | 77 | 67 | 51 | 62
M→N | M→M | 23 | 34 | 49 | 38
M→M | M→M | 87 | 74 | 78 | 66
M→M | M→N | 13 | 26 | 22 | 34
Note. N and M refer to non-maximizing and maximizing choices, respectively.

Table 4. Generalization results from models in the SC problems dataset (combination rows are percentages of 150 observations).

Combinations from Human Data and Model (H/M) | Ensemble (Herzog) | Ensemble (TPT) | NMH | IBL (LDG) | IBL (TPT) | CPT (TK) | CPT (Hau) | CPT (TPT) | BEAST (CPC) | BEAST (TPT)
NHNM | 45 | 46 | 55 | 41 | 51 | 37 | 37 | 39 | 33 | 34
MHMM | 20 | 19 | 26 | 23 | 32 | 25 | 31 | 27 | 31 | 31
NHMM | 14 | 13 | 03 | 18 | 07 | 22 | 21 | 20 | 25 | 25
MHNM | 21 | 22 | 15 | 19 | 09 | 17 | 11 | 10 | 10 | 11
Incorrect proportion | 0.35 | 0.35 | 0.19 | 0.37 | 0.16 | 0.39 | 0.34 | 0.33 | 0.36 | 0.35
Note. NH and MH refer to non-maximizing and maximizing human choices, respectively. NM and MM refer to non-maximizing and maximizing model choices, respectively.
Table 5. Proportion of model participants following a transition that is similar to or different from human participants in the SC problems dataset.

Human Transition (Last Sample → Final Choice) | Model Transition (Last Sample → Final Choice) | NMH (%) | IBL (TPT) (%) | Ensemble (TPT) (%) | CPT (TPT) (%)
N→N | N→N | 96 | 81 | 79 | 66
N→N | N→M | 4 | 19 | 21 | 34
N→M | N→M | 54 | 75 | 36 | 64
N→M | N→N | 46 | 25 | 64 | 36
M→N | M→N | 91 | 89 | 71 | 66
M→N | M→M | 9 | 11 | 29 | 34
M→M | M→M | 71 | 74 | 59 | 68
M→M | M→N | 29 | 26 | 41 | 32
Note. N and M refer to non-maximizing and maximizing choices, respectively.

In the SC dataset, all six problems presented options that differed with respect to expected value; four of them offered positive prospects and two offered negative prospects. All problems in the SC dataset were run in the sampling-paradigm format: free sampling of options followed by a final choice of one of the options for real. During sampling, participants could sample options in whatever order they desired and however often they wished. They were encouraged to sample until they felt confident enough to decide from which option to draw a real payoff. Like the TPT dataset, each problem consisted of choosing between two options. However, unlike the TPT dataset, problems in the SC dataset could have both options risky: both options could independently contain high and low outcomes with predefined probability distributions. Problems in the SC dataset belonged to both positive and negative domains. In the positive domain, the associated non-zero outcomes were positive, whereas, in the negative domain, the associated non-zero outcomes were negative. Overall, the TPT and SC datasets differed in the number of outcomes possible on options and in the presence of the mixed domain in TPT and its absence in the SC problems.

Table 4 shows the generalization results from running different models in the SC dataset (model parameters were calibrated in the estimation set). As shown in Table 4, the IBL (TPT) model was the best performing model with an incorrect proportion of 0.16. The NMH model was the second-best model with an incorrect proportion of 0.19. The CPT (TPT) model was the third-best model with an incorrect proportion of 0.33. Other hierarchical models like Ensemble and BEAST did not perform as well in the SC dataset and possessed higher incorrect proportions. Furthermore, models with recalibrated parameters performed better compared to models with parameters for aggregate choices borrowed from the literature. These results show that, when a more challenging generalization is performed, models like IBL and CPT, which are based upon activations and recency and frequency mechanisms as well as assumptions of underweighting of rare outcomes and different valuation of gains and losses, perform better compared to other models that rely on heuristic rules and biased sampling techniques.

Sequential Analyses. To evaluate models at explaining individual differences, we analyzed the top four models in the SC dataset. Table 5 shows the transitions from the last sample to the final choice for human and model participants in the SC problems dataset. As seen in Table 5, both the IBL and NMH models were suggestive of human-like transitions for all four combinations based upon the 50% rule. The IBL (TPT) model performed better compared to the NMH model in the NM and MM transitions and poorer compared to the NMH model in the NN and MN transitions. Overall, these results show the role of recency and frequency processes during sampling in individual choices.

Discussion

Until recently, researchers had evaluated how aggregate or hierarchical models with a set of parameter values explained aggregate choices made from experience (Dutt & Gonzalez, 2012; Gonzalez & Dutt, 2011; 2012; Lee, 2008; Lejarraga, Dutt, & Gonzalez, 2012; Rouder & Lu, 2005).
Also, researchers had evaluated how models with a single set of parameter values calibrated to each participant explained individual choices (individual models; Kudryavtsev & Pavlodsky, 2012; Frey, Mata, & Hertwig, 2015). However, little was known about how aggregate or hierarchical models with a set of single or distribution parameter values would perform when they are made to account for individual choices. In this paper, we contributed to this investigation by calibrating aggregate or hierarchical models with a set of single or distribution parameter values to individual choices across three different datasets. Aggregate and hierarchical models were calibrated in the Technion Prediction Tournament (TPT)'s estimation set using the log-likelihood function and later generalized to TPT's competition dataset (Erev, Ert, Roth, et al., 2010) and the Six-Choice (SC) problems dataset (Hertwig et al., 2004). We followed the traditional approach of model comparison via generalization as proposed by Busemeyer and Wang (2000).

Overall, our results revealed that both aggregate and hierarchical models performed above chance (= 50%) when their parameters were calibrated to individual choices. Even parameter values calibrated to aggregate choices (borrowed from the literature) performed above chance in these models. The CPT model performed well overall in the calibration and generalization datasets from TPT. Models such as Ensemble and CPT possess rules like weighting and value functions that abstract the sampling process experienced by human participants. From our results, these constructs help such models in cases where the generalization environment is similar to the calibration environment (as in TPT), but not when the generalization environment is different from the calibration environment (as in the SC dataset).

However, upon performing a generalization to the SC problems dataset, the IBL model, relying on recency, frequency, and blending mechanisms, showed superior performance compared to other models employing mathematical functions (Ensemble or CPT) or biased sampling techniques (BEAST). The NMH model, which incorporates the frequency and magnitude of experienced outcomes, also performed well in accounting for individual decisions. One likely reason for this observation is the presence of cognitive constructs like expectations, instances, activations, and blended values in the IBL model and the averaging mechanism in the NMH model. These mechanisms help these models account for individual experiences gained during sampling of options. For example, the IBL model is motivated by the ACT-R theory of cognition (Anderson & Lebiere, 1998). The IBL model's reliance on recency and frequency of experiences during sampling (exhibited through activations and blended values) helps this model make human-like choices. Similarly, the natural means in the NMH model are computed based upon experienced outcomes during the sampling process. These natural means represent expectations of choosing different options and enable this model to account for individual choices.
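As an illustration of this averaging mechanism, a minimal sketch of the natural-mean heuristic is given below (not the authors' implementation; tie-breaking and other details may differ): the heuristic simply compares the means of the outcomes experienced while sampling each option.

```python
# Minimal sketch of the natural-mean heuristic (NMH): choose the option whose sampled
# outcomes have the higher observed ("natural") mean. Tie-breaking and other details of
# the authors' implementation may differ.

def natural_mean_choice(samples_a, samples_b):
    """samples_a, samples_b: lists of outcomes observed while sampling each option.
    Returns 'A' or 'B', the option with the higher mean of experienced outcomes."""
    mean_a = sum(samples_a) / len(samples_a)
    mean_b = sum(samples_b) / len(samples_b)
    return "A" if mean_a >= mean_b else "B"

# Example: a participant who sampled option A as [4, 0, 4] and option B as [3, 3]
# has natural means of about 2.67 and 3.0, so the heuristic predicts a choice of B.
```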
Next, we found that the IBL model performed consistently well in both the calibration and generalization datasets, standing among the top two models even though it possessed only two parameters. One likely reason for this observation could be that the IBL model uses the blending mechanism, where, for every option, the values of all the observed outcomes are weighted by their activation strengths. Blending of experiences considers both the activation of outcomes in memory as well as their magnitude. Perhaps the IBL model's blending mechanism makes the model blend outcomes correctly for both maximizing and non-maximizing choices. Other factors affecting the performance of the IBL model are its two parameters, d and σ. The calibrated value of the d parameter was higher for individual choices compared to its calibrated value for aggregate choices (the latter calibration being done by Lejarraga, Dutt, and Gonzalez, 2012). The increased d value shows that individual choices rely heavily on the recency of outcomes. Furthermore, the σ parameter helped the IBL model account for sample-to-sample variability in instance activations. Here, when the model parameters were calibrated to individual choices, the σ parameter's value was much smaller and closer to its ACT-R default compared to when the same model was calibrated by Lejarraga, Dutt, and Gonzalez (2012) to aggregate choices. The smaller value of the σ parameter, closer to its ACT-R default, showcases lesser variation in outcome activations among individual choices.
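To make the recency and blending account concrete, here is a simplified sketch of IBL-style activation and blending (written in the general form of the published IBL equations, not the authors' code; pre-populated instances, the exact noise term, and other implementation details may differ).

```python
# Simplified sketch of IBL activation and blending for one option (decay d, noise sigma).
# Assumes all observation times are strictly earlier than current_time.
import math
import random

def activation(occurrence_times, current_time, d, sigma):
    """Activation of an instance (an observed outcome) built from the recency and
    frequency of its past occurrences, plus a logistic noise term scaled by sigma."""
    base = math.log(sum((current_time - t) ** (-d) for t in occurrence_times))
    u = min(max(random.random(), 1e-10), 1 - 1e-10)   # guard the noise term
    noise = sigma * math.log((1.0 - u) / u)
    return base + noise

def blended_value(instances, current_time, d, sigma):
    """Blend the outcomes observed for an option: each outcome is weighted by its
    probability of retrieval, which increases with its activation.

    instances: dict mapping outcome -> list of past observation times.
    """
    tau = sigma * math.sqrt(2.0)
    acts = {x: activation(times, current_time, d, sigma) for x, times in instances.items()}
    a_max = max(acts.values())
    weights = {x: math.exp((a - a_max) / tau) for x, a in acts.items()}  # softmax retrieval weights
    denom = sum(weights.values())
    return sum(x * w / denom for x, w in weights.items())

# Example: with d = 5.39 and sigma = 0.04 (the IBL (TPT) calibration in Table 1),
# instances = {4.0: [1, 3], 0.0: [2]} evaluated at current_time = 4 yield a blended
# value close to 4.0, because the recently and frequently observed outcome dominates.
```

With a large decay d, the summed recency term is dominated by the most recent observations of each outcome, which is the recency reliance described above; the option with the higher blended value is the one the model chooses.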
This research work builds upon the literature in judgment and decision making in several ways. First, the BEAST and Ensemble models were hierarchical, where these models possessed distribution parameters to account for individual choices. The parameters in these models assumed different values from a distribution for different participants in the dataset. Thus, these distribution parameters should have helped these models account for individual choices due to parameter heterogeneity. However, in our results, the BEAST and Ensemble models did not account for individual choices as well as those models (like IBL or CPT) that possessed single parameters. This finding likely shows that it is more important for a model to possess the right cognitive or mathematical mechanisms than to possess heterogeneity among its parameters for different participants.

Second, we performed generalizations to large datasets that were similar or dissimilar to the calibration dataset. An insight from this generalization exercise is that the true picture emerges when the generalization dataset is different in its structure from the calibration dataset. The SC dataset possessed problems whose structure was different from that of the TPT datasets (both options could be risky in the SC dataset). Thus, it is recommended that generalizations be performed to datasets that possess structural differences from the calibration datasets.

Third, we used individual-level techniques like likelihoods and incorrect proportions, where these techniques enabled us to evaluate aggregate and hierarchical models at the individual participant level. In summary, the likelihood approach is powerful, and it enables us to calibrate models at the individual level. However, beyond calibration, one needs to test models based upon dependent measures that account for model error at the individual level. This need is especially true for generalizations, where calibration measures like likelihood cannot be used because parameters have already been fixed to their calibrated values.

In this paper, our focus was on investigating how aggregate and hierarchical models with a set of single or distribution parameters performed when their parameters were calibrated to individual choices rather than aggregate choices. As part of our future research, we plan to also perform individual modeling: calibrate a set of model parameters to each individual participant's decisions such that we get a set of parameters for each participant in the dataset. This evaluation will enable us to test the tradeoffs between aggregate modeling, hierarchical modeling, and individual modeling when these models are evaluated for explaining individual decisions (as in this paper). Individual modeling may help us account for individual differences well; however, such models also run the risk of overfitting individual decisions due to too many parameter values (one set for each individual participant). To provide a robust comparison of this tradeoff, as part of our future research, we plan to generalize individual models across both similar and dissimilar datasets within the same paradigm or across datasets in different paradigms. Furthermore, as part of our future research, we plan to extend our investigation to decision tasks where decision-makers decide across multiple options rather than make a binary choice. An example of such a task is the Iowa Gambling Task (Bechara, Damasio, Damasio, & Anderson, 1994), where the problem consists of making a choice among four options. In this paper, we took problem environments that were static in terms of outcomes and probabilities. Thus, outcomes and probabilities in a problem did not change during sampling. In the future, it would be worthwhile to extend the evaluation of models in explaining individual choices to dynamic environments, where outcomes and probabilities change during information search. Some of these ideas form the immediate next steps that we would like to undertake as part of our research.

Conclusion

This paper helped to bridge the gap in the literature on how aggregate and hierarchical models with a set of parameter values (either single or distribution) would perform when they are made to account for individual choices. We contributed to this investigation by calibrating different models with a set of parameter values to individual choices across three different datasets. Models with constructs that abstract the sampling process performed well when generalized to problems that were similar to the calibration problems. However, generalization to other problems that were structurally different from the calibration problems revealed that model mechanisms like differential valuing of gains and losses, recency, frequency, blending, and underweighting of rare outcomes were important to account for individual choices. Also, models using distribution parameters with heuristic rules and biased techniques did not perform well in accounting for individual choices when these models were generalized to different problems.

Acknowledgements: This research was supported by the Indian Institute of Technology Mandi and the Tata Consultancy Services Research Scholar program.
Declaration of conflicting interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Handling editor: Andreas Fischer

Author contributions: The authors contributed equally to this work.

Supplementary material: Supplementary material available online.

Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Citation: Sharma, N., & Dutt, V. (2017). Modeling decisions from experience: How models with a set of parameters for aggregate choices explain individual choices. Journal of Dynamic Decision Making, 3, 3. doi:10.11588/jddm.2017.1.37687

Received: 27 April 2017
Accepted: 10 September 2017
Published: 06 October 2017

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723. doi:10.1109/tac.1974.1100705

Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought. Hillsdale, NJ: Erlbaum.

Barron, G., & Erev, I. (2003). Small feedback-based decisions and their limited correspondence to description-based decisions. Journal of Behavioral Decision Making, 16(3), 215–233. doi:10.1002/bdm.443

Bechara, A., Damasio, A. R., Damasio, H., & Anderson, S. W. (1994). Insensitivity to future consequences following damage to human prefrontal cortex. Cognition, 50(1–3), 7–15. doi:10.1016/0010-0277(94)90018-3

Birnbaum, M. H. (2008). New paradoxes of risky decision making. Psychological Review, 115(2), 463. doi:10.1037/0033-295X.115.2.463

Bishop, C. M. (2006). Pattern recognition and machine learning. New York, NY: Springer.

Busemeyer, J. R., & Diederich, A. (2010). Cognitive modeling. Thousand Oaks, CA: Sage.

Busemeyer, J. R., & Myung, I. J. (1992). An adaptive approach to human decision making: Learning theory, decision theory, and human performance. Journal of Experimental Psychology: General, 121(2), 177–194. doi:10.1037/0096-3445.121.2.177

Busemeyer, J. R., & Stout, J. C. (2002). A contribution of cognitive decision models to clinical assessment: Decomposing performance on the Bechara gambling task. Psychological Assessment, 14(3), 253–262. doi:10.1037//1040-3590.14.3.253

Busemeyer, J. R., & Wang, Y. (2000). Model comparisons and model selections based on the generalization criterion methodology. Journal of Mathematical Psychology, 44(1), 171–189. doi:10.1006/jmps.1999.1282

Bush, R. R., & Mosteller, F. (1955). Stochastic models for learning. Oxford, England: Wiley & Sons.

Brandstätter, E., Gigerenzer, G., & Hertwig, R. (2006). The priority heuristic: Making choices without trade-offs. Psychological Review, 113(2), 409–432. doi:10.1037/0033-295X.113.2.409

Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., & Dolan, R. J. (2011). Model-based influences on humans' choices and striatal prediction errors. Neuron, 69(6), 1204–1215. doi:10.1016/j.neuron.2011.02.027
Denrell, J. (2007). Adaptive learning and risk taking. Psychological Review, 114(1), 177–187. doi:10.1037/0033-295X.114.1.177

Dutt, V., & Gonzalez, C. (2012). The role of inertia in modeling decisions from experience with instance-based learning. Frontiers in Psychology, 3(177). doi:10.3389/fpsyg.2012.00177

Dutt, V., & Gonzalez, C. (2015). Accounting for outcome and process measures and the effects of model calibration. Journal of Dynamic Decision Making, 1(2), 1–10. doi:10.11588/jddm.2015.1.17663

Erev, I., & Barron, G. (2005). On adaptation, maximization, and reinforcement learning among cognitive strategies. Psychological Review, 112(4), 912–931. doi:10.1037/0033-295X.112.4.912

Erev, I., Ert, E., Plonsky, O., Cohen, D., & Cohen, O. (2015). From anomalies to forecasts: A choice prediction competition for decisions under risk and ambiguity. Mimeo, 1–56.

Erev, I., Ert, E., Roth, A. E., Haruvy, E., Herzog, S. M., & Hau, R. (2010). A choice prediction competition: Choices from experience and from description. Journal of Behavioral Decision Making, 23(1), 15–47. doi:10.1002/bdm.683

Erev, I., Glozman, I., & Hertwig, R. (2008). What impacts the impact of rare events. Journal of Risk and Uncertainty, 36(2), 153–177. doi:10.1007/s11166-008-9035-z

Estes, W. K., & Maddox, W. T. (2005). Risks of drawing inferences about cognitive processes from model fits to individual versus average performance. Psychonomic Bulletin & Review, 12(3), 403–408. doi:10.3758/bf03193784

Fox, C. R., & Tversky, A. (1998). A belief-based account of decision under uncertainty. Management Science, 44(7), 879–895. doi:10.1287/mnsc.44.7.879

Frey, R., Mata, R., & Hertwig, R. (2015). The role of cognitive abilities in decisions from experience: Age differences emerge as a function of choice set size. Cognition, 142, 60–80. doi:10.1016/j.cognition.2015.05.004

Gallistel, C. R., Fairhurst, S., & Balsam, P. (2004). The learning curve: Implications of a quantitative analysis. Proceedings of the National Academy of Sciences of the United States of America, 101(36), 13124–13131. doi:10.1073/pnas.0404965101

Gigerenzer, G., & Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103(4), 650–669. doi:10.1037//0033-295x.103.4.650

Gilboa, I., & Schmeidler, D. (1989). Maxmin expected utility with non-unique prior. Journal of Mathematical Economics, 18(2), 141–153. doi:10.1016/0304-4068(89)90018-9

Gonzalez, C., & Dutt, V. (2011). Instance-based learning: Integrating sampling and repeated decisions from experience. Psychological Review, 118(4), 523–551. doi:10.1037/a0024558

Gonzalez, C., & Dutt, V. (2012). Refuting data aggregation arguments and how the instance-based learning model stands criticism: A reply to Hills and Hertwig. Psychological Review, 119(4), 893–898. doi:10.1037/a0029445

Hau, R., Pleskac, T. J., Kiefer, J., & Hertwig, R. (2008). The description-experience gap in risky choice: The role of sample size and experienced probabilities. Journal of Behavioral Decision Making, 21(5), 493–518. doi:10.1002/bdm.598

Hertwig, R. (2012). The psychology and rationality of decisions from experience. Synthese, 187(1), 269–292. doi:10.1007/s11229-011-0024-4

Hertwig, R., Barron, G., Weber, E. U., & Erev, I. (2004). Decisions from experience and the effect of rare events in risky choice. Psychological Science, 15(8), 534–539. doi:10.1111/j.0956-7976.2004.00715.x
Hertwig, R., & Erev, I. (2009). The description-experience gap in risky choice. Trends in Cognitive Sciences, 13(12), 517–523. doi:10.1016/j.tics.2009.09.004

Hertwig, R., & Pleskac, T. J. (2010). Decisions from experience: Why small samples? Cognition, 115(2), 225–237. doi:10.1016/j.cognition.2009.12.009

Horrace, R. H., William, C., and Jeffrey, M. P. (2009). Variety: Consumer choice and optimal diversity. Food Marketing Policy Center Research Report, 115.

Houck, C. R., Joines, J., & Kay, M. G. (1995). A genetic algorithm for function optimization: A Matlab implementation. North Carolina State University, Technical Report NCSU-IE TR 95-09.

Jakobsen, T. (2010). Genetic algorithms. Retrieved from http://subsimple.com/genealgo.asp

Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–291. doi:10.2307/1914185

Kudryavtsev, A., & Pavlodsky, J. (2012). Description-based and experience-based decisions: Individual analysis. Judgment and Decision Making, 7(3), 316–331.

Lebiere, C. (1999). Blending: An ACT-R mechanism for aggregate retrievals. Paper presented at the 6th Annual ACT-R Workshop at George Mason University, Fairfax County, VA.

Lee, M. D. (2008). Three case studies in the Bayesian analysis of cognitive models. Psychonomic Bulletin & Review, 15(1), 1–15. doi:10.3758/PBR.15.1.1

Lejarraga, T., Dutt, V., & Gonzalez, C. (2012). Instance-based learning: A general model of repeated binary choice. Journal of Behavioral Decision Making, 25(2), 143–153. doi:10.1002/bdm.722

Luce, R. D., & Raiffa, H. (1957). Games and decisions: Introduction and critical surveys. New York: Wiley.

March, J. G. (1996). Learning to be risk averse. Psychological Review, 103(2), 309–319. doi:10.1037/0033-295X.103.2.309

Marchiori, D., Di Guida, S., & Erev, I. (2015). Noisy retrieval models of over- and undersensitivity to rare events. Decision, 2(2), 82–106. doi:10.1037/dec0000023

Mathworks (2012). MATLAB and Statistics Toolbox Release 2012b [Computer software]. Natick, MA: The MathWorks, Inc.

Plonsky, O., Teodorescu, K., & Erev, I. (2015). Reliance on small samples, the wavy recency effect, and similarity-based learning. Psychological Review, 122(4), 621–647. doi:10.1037/a0039413

Rieskamp, J. (2008). The probabilistic nature of preferential choice. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(6), 1446–1465. doi:10.1037/a0013646

Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12(4), 573–604. doi:10.3758/BF03196750

Shteingart, H., Neiman, T., & Loewenstein, Y. (2013). The role of first impression in operant learning. Journal of Experimental Psychology: General, 142(2), 476–488. doi:10.1037/a0029550
Stevens, L. (2016, June 8). Survey shows rapid growth in online shopping. The Wall Street Journal. Retrieved from https://www.wsj.com/articles/survey-shows-rapid-growth-in-online-shopping-1465358582

Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.

Tversky, A., & Kahneman, D. (1986). Rational choice and the framing of decisions. Journal of Business, 59(S4), S251–S278. doi:10.1086/296365

Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4), 297–323. doi:10.1007/bf00122574

Tversky, A., & Fox, C. R. (1995). Weighing risk and uncertainty. Psychological Review, 102(2), 269–283. doi:10.1037/0033-295X.102.2.269
Appendix

Appendix A: Estimation Set (TPT)

Problem Set High P(High) Low Medium
1 Est -0.3 0.96 -2.1 -0.3
2 Est -0.9 0.95 -4.2 -1.0
3 Est -6.3 0.3 -15.2 -12.2
4 Est -10 0.2 -29.2 -25.6
5 Est -1.7 0.9 -3.9 -1.9
6 Est -6.3 0.99 -15.7 -6.4
7 Est -5.6 0.7 -20.2 -11.7
8 Est -0.7 0.1 -6.5 -6.0
9 Est -5.7 0.95 -16.3 -6.1
10 Est -1.5 0.92 -6.4 -1.8
11 Est -1.2 0.02 -12.3 -12.1
12 Est -5.4 0.94 -16.8 -6.4
13 Est -2.0 0.05 -10.4 -9.4
14 Est -8.8 0.6 -19.5 -15.5
15 Est -8.9 0.08 -26.3 -25.4
16 Est -7.1 0.07 -19.6 -18.7
17 Est -9.7 0.1 -24.7 -23.8
18 Est -4.0 0.2 -9.3 -8.1
19 Est -6.5 0.9 -17.5 -8.4
20 Est -4.3 0.6 -16.1 -4.5
21 Est 2.0 0.1 -5.7 -4.6
22 Est 9.6 0.91 -6.4 8.7
23 Est 7.3 0.8 -3.6 5.6
24 Est 9.2 0.05 -9.5 -7.5
25 Est 7.4 0.02 -6.6 -6.4
26 Est 6.4 0.05 -5.3 -4.9
27 Est 1.6 0.93 -8.3 1.2
28 Est 5.9 0.8 -0.8 4.6
29 Est 7.9 0.92 -2.3 7.0
30 Est 3.0 0.91 -7.7 1.4
31 Est 6.7 0.95 -1.8 6.4
32 Est 6.7 0.93 -5.0 5.6
33 Est 7.3 0.96 -8.5 6.8
34 Est 1.3 0.05 -4.3 -4.1
35 Est 3.0 0.93 -7.2 2.2
36 Est 5.0 0.08 -9.1 -7.9
37 Est 2.1 0.8 -8.4 1.3
38 Est 6.7 0.07 -6.2 -5.1
39 Est 7.4 0.3 -8.2 -6.9
40 Est 6.0 0.98 -1.3 5.9
41 Est 18.8 0.8 7.6 15.5
42 Est 17.9 0.92 7.2 17.1
43 Est 22.9 0.06 9.6 9.2
44 Est 10.0 0.96 1.7 9.9
45 Est 2.8 0.8 1.0 2.2
46 Est 17.1 0.1 6.9 8.0
47 Est 24.3 0.04 9.7 10.6
48 Est 18.2 0.98 6.9 18.1
49 Est 13.4 0.5 3.8 9.9
50 Est 5.8 0.04 2.7 2.8
51 Est 13.1 0.94 3.8 12.8
52 Est 3.5 0.09 0.1 0.5
53 Est 25.7 0.1 8.1 11.5
54 Est 16.5 0.01 6.9 7.0
55 Est 11.4 0.97 1.9 11.0
56 Est 26.5 0.94 8.3 25.2
57 Est 11.5 0.6 3.7 7.9
58 Est 20.8 0.99 8.9 20.7
59 Est 10.1 0.3 4.2 6.0
60 Est 8.0 0.92 0.8 7.7

Appendix B: Competition Set (TPT)

Problem Set High P(High) Low Medium
1 Comp -8.7 0.06 -22.8 -21.4
2 Comp -2.2 0.09 -9.6 -8.7
3 Comp -2.0 0.1 -11.2 -9.5
4 Comp -1.4 0.02 -9.1 -9.0
5 Comp -0.9 0.07 -4.8 -4.7
6 Comp -4.7 0.91 -18.1 -6.8
7 Comp -9.7 0.06 -24.8 -24.2
8 Comp -5.7 0.96 -20.6 -6.4
9 Comp -5.6 0.1 -19.4 -18.1
10 Comp -2.5 0.6 -5.5 -3.6
11 Comp -5.8 0.97 -16.4 -6.6
12 Comp -7.2 0.05 -16.1 -15.6
13 Comp -1.8 0.93 -6.7 -2.0
14 Comp -6.4 0.2 -22.4 -18.0
15 Comp -3.3 0.97 -10.5 -3.2
16 Comp -9.5 0.1 -24.5 -23.5
17 Comp -2.2 0.92 -11.5 -3.4
18 Comp -1.4 0.93 -4.7 -1.7
19 Comp -8.6 0.1 -26.5 -26.3
20 Comp -6.9 0.06 -20.5 -20.3
21 Comp 1.8 0.6 -4.1 1.7
22 Comp 9.0 0.97 -6.7 9.1
23 Comp 5.5 0.06 -3.4 -2.6
24 Comp 1.0 0.93 -7.1 0.6
25 Comp 3.0 0.2 -1.3 -0.1
26 Comp 8.9 0.1 -1.4 -0.9
27 Comp 9.4 0.95 -6.3 8.5
28 Comp 3.3 0.91 -3.5 2.7
29 Comp 5.0 0.4 -6.9 -3.8
30 Comp 2.1 0.06 -9.4 -8.4
31 Comp 0.9 0.2 -5.0 -5.3
32 Comp 9.9 0.05 -8.7 -7.6
33 Comp 7.7 0.02 -3.1 -3.0
34 Comp 2.5 0.96 -2.0 2.3
35 Comp 9.2 0.91 -0.7 8.2
36 Comp 2.9 0.98 -9.4 2.9
37 Comp 2.9 0.05 -6.5 -5.7
38 Comp 7.8 0.99 -9.3 7.6
39 Comp 6.5 0.8 -4.8 6.2
40 Comp 5.0 0.9 -3.8 4.1
41 Comp 20.1 0.95 6.5 19.6
42 Comp 5.2 0.5 1.4 5.1
43 Comp 12.0 0.5 2.4 9.0
44 Comp 20.7 0.9 9.1 19.8
45 Comp 8.4 0.07 1.2 1.6
46 Comp 22.6 0.4 7.2 12.4
47 Comp 23.4 0.93 7.6 22.1
48 Comp 17.2 0.09 5.0 5.9
49 Comp 18.9 0.9 6.7 17.7
50 Comp 12.8 0.04 4.7 4.9
51 Comp 19.1 0.03 4.8 5.2
52 Comp 12.3 0.91 1.3 12.1
53 Comp 6.8 0.9 3.0 6.7
54 Comp 22.6 0.3 9.2 11.0
55 Comp 6.4 0.09 0.5 1.5
56 Comp 15.3 0.06 5.9 7.1
57 Comp 5.3 0.9 1.5 4.7
58 Comp 21.9 0.5 8.1 12.6
59 Comp 27.5 0.7 9.2 21.9
60 Comp 4.4 0.2 0.7 1.1

Appendix C: SC Problems Set

Problem Set High P(High) Low Medium
1 SC Problems 4 0.8 0 3
2 SC Problems 4 0.2 0 3
3 SC Problems -3 1 0 -32
4 SC Problems -3 1 0 -4
5 SC Problems 32 0.1 0 3
6 SC Problems 32 0.025 0 3

Appendix D: CPT Models' Value and Weighting Functions
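For reference, the standard cumulative prospect theory forms of Tversky and Kahneman (1992), consistent with the α, β, λ, γ, and δ parameters reported in Table 1, are shown below; the exact variants plotted in the original appendix may differ in detail.

```latex
% Standard CPT value and weighting functions (Tversky & Kahneman, 1992); the
% specific forms used by the CPT (TK), CPT (Hau), and CPT (TPT) variants may
% differ in detail from this common parameterization.
v(x) =
\begin{cases}
x^{\alpha} & \text{if } x \ge 0\\[4pt]
-\lambda\,(-x)^{\beta} & \text{if } x < 0
\end{cases}
\qquad
w^{+}(p) = \frac{p^{\gamma}}{\left(p^{\gamma} + (1-p)^{\gamma}\right)^{1/\gamma}},
\qquad
w^{-}(p) = \frac{p^{\delta}}{\left(p^{\delta} + (1-p)^{\delta}\right)^{1/\delta}}
```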