Original Research

Modeling decisions from experience: How models with a set of parameters for aggregate choices explain individual choices

Neha Sharma and Varun Dutt
Applied Cognitive Science Laboratory, Indian Institute of Technology Mandi, Kamand, India - 175005

Corresponding author: Varun Dutt, Applied Cognitive Science Laboratory, Indian Institute of Technology Mandi, Kamand, District Mandi - 175 005, H.P., India. E-mail: varun@iitmandi.ac.in

One of the paradigms (called the "sampling paradigm") in judgment and decision-making involves decision-makers sampling information before making a final consequential choice. In the sampling paradigm, certain computational models have been proposed in which a set of single or distribution parameters is calibrated to the choice proportions of a group of participants (aggregate and hierarchical models). However, currently little is known about how aggregate and hierarchical models would account for choices made by individual participants in the sampling paradigm. In this paper, we test the ability of aggregate and hierarchical models to explain choices made by individual participants. Several models, Ensemble, Cumulative Prospect Theory (CPT), Best Estimation and Simulation Techniques (BEAST), Natural-Mean Heuristic (NMH), and Instance-Based Learning (IBL), had their parameters calibrated to individual choices in a large dataset involving the sampling paradigm. Later, these models were generalized to two large datasets in the sampling paradigm. Results revealed that the aggregate models (like CPT and IBL) accounted for individual choices better than the hierarchical models (like Ensemble and BEAST) upon generalization to problems that were like those encountered during calibration. Furthermore, the CPT model, which relies on differential valuing of gains and losses, performed better than other models during calibration and generalization on datasets with a similar set of problems. The IBL model, relying on recency and frequency of sampled information, and the NMH model, relying on frequency of sampled information, performed better than other models during generalization to a challenging dataset. Sequential analyses of results from different models showed how these models accounted for transitions from the last sample to the final choice in human data. We highlight the implications of using aggregate and hierarchical models in explaining individual choices from experience.

Keywords: Aggregate choice, individual choice, sampling paradigm, decisions from experience, computational models, likelihood

With the advent of the Internet, online shopping for products has gained popularity (Stevens, 2016). To make satisfying online purchases, a consumer could first sample information about different products and then make a choice for the preferred item (Horrace et al., 2009). However, the act of making choices based upon sampled information is not limited to choosing between different products; rather, it is a very common exercise involving different facets of our daily lives (e.g., choosing food items, life partners, and careers). In fact, information search before a choice constitutes an integral part of Decisions from Experience (DFE) research, where the focus is on explaining human decisions based upon one's experience with sampled information (Hertwig & Erev, 2009).

To study people's information search and consequential choice behaviors in the laboratory, researchers have proposed the "sampling paradigm" (Hertwig & Erev, 2009). In the sampling paradigm, people are presented with two or more options to choose between. These options are represented as blank buttons on a computer screen.
People are first asked to sample as many outcomes as they wish from the different button options (information search). Once people are satisfied with their sampling of options, they decide from which option to make a single consequential choice for actual rewards.

Several computational cognitive models have been proposed in the sampling paradigm, where these models help explain how people search for information and make consequential choices (Erev et al., 2010; Gonzalez & Dutt, 2011). Some of these models have a set of parameter values calibrated to each individual participant (called "individual models"; Busemeyer & Diederich, 2010; Kudryavtsev & Pavlodsky, 2012; Frey, Mata, & Hertwig, 2015). The parameter-calibration exercise in these models results in a set of parameter values per individual participant, where the number of parameter sets from a model equals the number of participants in the data. For example, Kudryavtsev and Pavlodsky (2012) tested three variations of two models, Prospect Theory (PT) (Kahneman & Tversky, 1979) and Expectancy-Valence (EVL) (Busemeyer & Stout, 2002), by calibrating model parameters to each participant's choice. As another example, Shteingart, Neiman, and Loewenstein (2013) modeled many repeated choices of individual participants in the Technion Prediction Tournament (TPT) dataset, considering a specific reinforcement-learning algorithm. These authors showed that there was a substantial effect of the first experience on choice behavior and that this behavior could be accounted for by the reinforcement-learning model if the outcome of the first experience reset the values of the experienced actions. Similarly, Frey, Mata, and Hertwig (2015) presented a modeling analysis at the individual level showing that a simple delta-learning rule model, with parameters calibrated to younger and older adults separately, best described the learning processes for both these age groups.

Furthermore, certain computational models have been proposed where model parameters are calibrated to the choice proportions of a group of participants (called "aggregate models"; Busemeyer & Diederich, 2010; Estes & Maddox, 2005). Here, a single set of values of model parameters is calibrated to the average decision computed across several participants (Busemeyer & Diederich, 2010; Erev, Ert, Roth, et al., 2010; Gonzalez & Dutt, 2011; 2012; Lejarraga, Dutt, & Gonzalez, 2012). The calibration exercise results in only one set of values of parameters from a model, and these parameters explain the averaged decision computed across all participants. For example, Gonzalez and Dutt (2011) calibrated one set of values for three parameters in an Instance-Based Learning (IBL) model to the risky-choice proportions averaged over all participants and problems in different DFE datasets. Similarly, Erev et al. (2010) compared several models, each with a single set of values for parameters, in their ability to capture average risk-taking in the TPT datasets.
There is still a third approach to model calibration where model parameters follow certain distributions (possessing density functions) that are defined across the choice proportions of a group of participants (called "hierarchical models"; Lee, 2008; Rouder & Lu, 2005). For example, in the Choice Prediction Competition (Erev, Ert, Plonsky, et al., 2015), the Best Estimation and Simulation Techniques (BEAST) model was hierarchical, and it contained a set of distribution parameters that were calibrated to the choice proportions across many participants.

Although the literature has focused on calibrating parameters of individual, hierarchical, and aggregate models (Estes & Maddox, 2005; Gonzalez & Dutt, 2011; Rouder & Lu, 2005), little is currently known about how aggregate or hierarchical models, with their set of single or distribution parameter values, respectively, account for decisions of individual participants. In this paper, we address this question by considering both aggregate and hierarchical models with a set of single or distribution parameter values and evaluating how these models explain individual choices. We perform our evaluation by calibrating and generalizing a set of parameter values in aggregate or hierarchical models to choices made by individual participants in large publicly available datasets in the sampling paradigm. For example, the aggregate IBL model consists of a set of two parameters, d and σ, where these two parameters possess single values and explain the average risk-taking in DFE datasets (Dutt & Gonzalez, 2012; Gonzalez & Dutt, 2011; 2012). In this paper, however, we recalibrate the d and σ parameters in the IBL model by assigning them a single value each to predict individual choices in DFE datasets.

The aggregate models that possess a single set of parameter values and predict aggregated choices, i.e., choices that are averaged over several participants, may or may not explain individual choices well. One reason for this expectation is that if several individuals learn linearly at different points in time, then the average learning curve is likely to be curvilinear (Gallistel et al., 2004). Thus, even if models with a single set of parameter values explain a group's aggregate curvilinear learning, it is possible that such models may not explain individual linear behavior. Another reason why these models may not explain individual behavior is the degree of heterogeneity present in individual choices (Busemeyer & Diederich, 2010): a single set of parameter values may not be sufficient to explain many individual choices. However, hierarchical models possess a set of distribution parameters. If these models account for aggregate choices, then they are also likely to account for individual choices. That is because the parameter values are resampled in a hierarchical model from their density functions for each individual participant, and this resampling may allow these models to account for individual choices.

In addition, there seems to be a tradeoff between aggregate models (like IBL; Dutt & Gonzalez, 2012), which possess cognitive mechanisms (like recency, frequency, and blending of outcomes) and a single set of parameter values that are fixed across individuals, and hierarchical models (like BEAST; Erev, Ert, Plonsky, et al., 2015), which possess mathematical functions to account for individual biases with a set of parameters that vary across individuals according to distributions.
On the one hand, one expects that aggregate models with cognitive mechanisms and a set of single parameters would account for individual choices; on the other hand, one may also expect that hierarchical models with mathematical functions and a set of distribution parameters would account for individual choices.

In this paper, we test these expectations by taking both aggregate and hierarchical models whose parameters are calibrated to individual choices. Furthermore, using the sampling paradigm, we also evaluate the sequential decisions of participants from their last sample to their final choice as accounted for by the different aggregate and hierarchical models. This sequential analysis helps us showcase the ability of these models to account for individual differences in decisions with a set of single or distribution parameters.

To calibrate aggregate and hierarchical model parameters to individual choices, we use the estimation dataset from TPT (Erev, Ert, Roth, et al., 2010), the largest publicly available DFE dataset. We compare calibrated aggregate and hierarchical models by generalizing them to two different DFE datasets in the sampling paradigm. Furthermore, we investigate an aggregate or hierarchical model's ability to capture individual differences in data with a set of single or distribution parameter values. In what follows, we first motivate our model choices, the different datasets used, and the working of the different models. Furthermore, we discuss the method used for calibrating a set of single or distribution parameters in models to choices made by individual participants. Finally, we present the results of model evaluations both during calibration and generalization, and we close the paper by discussing the implications of our results for predicting individual choices from experience.

Models in the sampling paradigm

Two classes of models have been proposed in the sampling paradigm (Hertwig, 2012): associative-learning models (e.g., Instance-Based Learning) and cognitive heuristics (e.g., the Natural-Mean Heuristic). In the associative-learning class, human choice is conceptualized as a learning process (for example, see Busemeyer & Myung, 1992; Bush & Mosteller, 1955). Learning is captured by changing the propensity to select a gamble based on the experienced outcomes. Good experiences boost the propensity of choosing the gamble associated with them, and bad experiences diminish it (e.g., Barron & Erev, 2003; Denrell, 2007; Erev & Barron, 2005; March, 1996). Some of the models in the associative class include the Instance-Based Learning (IBL) model (Dutt & Gonzalez, 2012; Gonzalez & Dutt, 2011; 2012; Lejarraga, Dutt, & Gonzalez, 2012), the Value-Updating model (Hertwig et al., 2004), and the Fractional-Adjustment model (March, 1996). The IBL model (Dutt & Gonzalez, 2012; Gonzalez & Dutt, 2011; 2012; Lejarraga, Dutt, & Gonzalez, 2012) consists of experiences (called instances) stored in memory. Each instance's activation is a function of the frequency and recency of the corresponding outcomes observed during sampling in different options, where the activation function is borrowed from the Adaptive Control of Thought-Rational (ACT-R) cognitive framework (Anderson & Lebiere, 1998).
Activations are used to calculate the blended value for each option, and the model makes a final choice for the option with the highest blended value. Gonzalez and Dutt (2011; 2012) showed that an aggregate IBL model with three parameters performed efficiently in accounting for choices aggregated over many participants across two DFE paradigms. In fact, this IBL model was overall the best model in explaining aggregate choices with the fewest parameters.

The second class of models is referred to as cognitive heuristics, and this class aims to describe both the process and the outcome of choice as heuristic rules (Brandstätter et al., 2006; Hertwig, 2012). A popular cognitive heuristic that focuses on the expected value of outcomes obtained during sampling is the Natural-Mean Heuristic (NMH) (Hertwig & Pleskac, 2010; Hertwig, 2012). As per Hertwig (2012), the NMH model has the following interesting properties: (a) it is well tailored to sequentially encountered outcomes; and (b) it arrives at its choice prediction via the expected value of options based upon sampled outcomes. Two other heuristics proposed in the cognitive-heuristic class include the Maximax Heuristic (Hau et al., 2008) and the Lexicographic Heuristic (Luce & Raiffa, 1957). In the Maximax heuristic, the option with the best possible outcome, no matter how likely it is, is chosen. A lexicographic heuristic generally consists of three building blocks (Gigerenzer & Goldstein, 1996). Search rule: look up attributes in order of validity. Stopping rule: stop search after the first attribute discriminates between alternatives. Decision rule: choose the alternative that this attribute favors. Hau et al. (2008) and Brandstätter et al. (2006) have shown that both these heuristics seem to underperform compared to the NMH model. Furthermore, a very commonly used baseline heuristic is the Primed-Sampler (PS) model (Erev, Glozman, & Hertwig, 2008). The PS model depends upon the recency of sampled information, and it looks a few samples back on each option during sampling before making a final choice (Gonzalez & Dutt, 2011). A variant of the PS model is the PS model with variability (Erev, Ert, Roth, et al., 2010). In this model variant, the look-back sample size k is varied between participants and problems. The PS model with variability is a special case of the NMH model (as the NMH model looks back over the entire sample while deriving a choice).

Furthermore, Hau et al. (2008) have shown that a Cumulative Prospect Theory (CPT) model (Tversky & Kahneman, 1992), which is a popular mathematical model (sometimes referred to as a "measurement model" or an "as-if" model), seems to perform about the same as the NMH model in accounting for aggregated choices. In the CPT model, a weighting function and a value function are associated with each probability and outcome, respectively. The model chooses the option that has the highest prospect value, where the prospect value is determined by multiplying the value with its corresponding weight. Furthermore, a linear-combination heuristic model (Ensemble) was submitted to TPT (Erev, Ert, Roth, et al., 2010). The Ensemble model contains four heuristic rules, PS, CPT, Priority Heuristic (PH), and NMH, and it was shown to be the best model in the sampling paradigm. Most recently, Erev, Ert, Plonsky, et al. (2015) proposed the BEAST model, which consists of several heuristic rules like expected value and mental simulations with a set of distribution parameters.
The BEAST model performed well in capturing 14 different aggregate phenomena in the 2015 Choice Prediction Competition. These 14 aggregate phenomena refer to anomalies such as the Ellsberg paradox, the Allais paradox, the reflection effect, and others described by Erev, Ert, Plonsky, et al. (2015).

Across the associative-learning models, mathematical models, and cognitive heuristics, there are aggregate models that possess a single set of parameter values and predict aggregated choices, i.e., choices that are averaged over several participants (Busemeyer & Wang, 2000; Dutt & Gonzalez, 2012; 2015; Gonzalez & Dutt, 2011; 2012; Lejarraga, Dutt, & Gonzalez, 2012). Also, there exist hierarchical models that possess a set of distribution parameters to predict aggregated choices, i.e., choices that are averaged over several participants (Erev, Ert, Plonsky, et al., 2015; Lee, 2008; Rouder & Lu, 2005).

The IBL, NMH, and CPT models are aggregate models (possessing a set of single parameter values), whereas the BEAST and Ensemble models are hierarchical models that possess a set of distribution parameter values. Within the aggregate and hierarchical models, some of the models (like IBL) possess cognitive processes like recency, frequency, or blending, whereas other models (like CPT and Ensemble) possess mathematical functions that account for biases in people's decisions. If possessing a set of distribution parameter values helps models account for individual choices, then we expect hierarchical models like BEAST and Ensemble to perform well in explaining individual choices. In contrast, if possessing cognitive mechanisms helps models account for individual choices, then we expect models like IBL to perform well in explaining individual decisions. Similarly, if mathematical functions can accurately account for biases in individual decisions, then we expect models like CPT and Ensemble to perform well in explaining individual choices. We test these expectations in this paper by calibrating different models to human data in large datasets involving the sampling paradigm.

Model selection

Among all associative-learning models, the IBL model (Dutt & Gonzalez, 2012; Lejarraga, Dutt, & Gonzalez, 2012) has been shown to be the best-performing aggregate model in the sampling paradigm (Gonzalez & Dutt, 2011; 2012). Gonzalez and Dutt (2011) showed that the IBL model accounts for aggregate final choices with a small error. Thus, we choose the IBL model as one of the models for our evaluation. For this purpose, we first test the original IBL model (called the IBL (LDG) model; Lejarraga, Dutt, & Gonzalez, 2012) in explaining individual choices with its published set of parameter values. Next, we recalibrated a set of parameter values of this model to individual choices (called the IBL (TPT) model) in the TPT dataset.

The popular Maximax and Lexicographic heuristics (Hau et al., 2008; Luce & Raiffa, 1957) have underperformed compared to the NMH model (Brandstätter et al., 2006; Hau et al., 2008). The NMH model has been reported in the literature as explaining aggregate final choices in the sampling paradigm (Hau et al., 2008; Hertwig, 2012). Thus, we chose the NMH model as another aggregate model for evaluating individual choices. Furthermore, Hau et al.
(2008) have also shown that different variants of the CPT model (Tversky & Kahneman, 1992) perform about the same as the NMH model in accounting for aggregate choices. For these reasons, we consider three variants of the CPT model for our evaluation. The first, the CPT (TK) model, is based upon parameters defined by Tversky and Kahneman (1992). The second, the CPT (Hau) model, is based upon parameters recalibrated by Hau et al. (2008) to derive aggregated final choices. The third, the CPT (TPT) model, has its parameters recalibrated to individual choices in the TPT dataset.

Erev et al. (2010) have shown the hierarchical Ensemble model, consisting of the PS, CPT, PH, and NMH models, to perform best in TPT's E-sampling condition.1 Given that the Ensemble model contains a collection of several popular heuristic models, we consider two variants of this model for our evaluation: the Ensemble (Herzog) model, which used the parameters proposed by Erev et al. (2010); and the Ensemble (TPT) model, where we recalibrated a set of parameter values of this model to individual choices in the TPT dataset.

1 The CPT model within this Ensemble model estimates the weighting function using approximations.

In addition to the above models, we also considered the hierarchical BEAST model, which has recently been shown to account for 14 different phenomena in aggregate choices (Erev, Ert, Plonsky, et al., 2015). We considered two variants of the BEAST model: the BEAST (CPC) model, which was based on the same set of distribution parameters as reported by Erev, Ert, Plonsky, et al. (2015); and the BEAST (TPT) model, which consisted of a set of distribution parameters calibrated to individual choices in the TPT dataset.

The Technion Prediction Tournament datasets

The Technion Prediction Tournament (TPT) (Erev et al., 2010) was a competition in which several participants were subjected to an experimental setup, the E-sampling condition. In this condition, participants sampled the two blank button options in a problem before making a final consequential choice for one of the options. During sampling, participants were free to click both button options one by one and observe the resulting outcome. Participants were asked to press the "choice-stage" key when they felt that they had sampled enough (but not before sampling at least once from each option). The outcome of each sample was determined by the structure of the relevant problem. One option corresponded to a choice where each sample provided a medium (M) outcome. The other option corresponded to a choice where each sample provided a high (H) payoff with some probability (pH) or a low (L) payoff with the complementary probability (1 - pH). At the choice stage, participants were asked to select once between the two options. Their choice yielded a random draw of one outcome from the selected option, and this outcome was considered at the end of the experiment to determine the final payoff. Competing models submitted to TPT were evaluated
The M, H, pH, and L in a problem were generated randomly, and a selection algorithm was used so that the 60 problems in each set differed in its M, H, pH, and L from other problems. For more details about the TPT, please refer to Erev, Ert, Roth, et al. (2010). In all the models described here, we have consid- ered an individual human or model participant play- ing a problem in a dataset as an individual observa- tion. Also, all model parameters have been calibrated by using the estimation dataset from TPT that con- sisted of 60 problems and 1,170 observations.2 In the experiment involving the TPT’s estimation dataset, forty participants were randomly assigned to two dif- ferent sub-groups, where each sub-group contained 20 participants who were presented with a representa- tive sample of 30 problems. Next, calibrated models were generalized on 60 problems from TPT’s compe- tition set (composed of 1,200 observations) and the Six-Problems (SP) dataset (Hertwig et al., 2004; com- posed of 150 observations). In the experiment involv- ing the TPT’s competition dataset, forty new partic- ipants were randomly assigned to two different sub- groups, where each sub-group contained 20 partici- pants who were presented with a representative sample of 30 problems. In the experiment involving the Six- Problems (SP) dataset, fifty participants were equally divided into two groups, where one group played the first three problems and the other group played the remaining three problems. Working of Models In this section, we detail the working of aggregate or hierarchical models with a set of point or distribu- tion parameters values calibrated to individual choices. In every model, the final choice for each individual observation is estimated by using the following soft- max function (Bishop, 2006; Daw, 2011; Sutton & Barto,1998): Prob(OptionX) = eSM eanX eSM eanX + eSM eanY (1) where, SMeanX and SMeanY are the sample means or expectations of the two options X and Y for a model participant in a problem; and, Prob(Option X) is the probability of choosing Option X by a model partic- ipant. If Option X was chosen by a human partici- pant in a problem, then the Prob(Option X) is used to calculate the log-likelihood from a model given its parameters. The log-likelihood function L is defined as: L = N∑ i=1 ln (Prob(OptionXi)) (2) Where, i refers to the ith observation (a combi- nation of a participant playing a problem) and N is the total number of observations in human data.3 The refers to the natural log and the log-likelihood is negative as Prob(Option X) is a proportion. The log-likelihoods measure the goodness-of-fit for individ- ual choices from a model and greater log-likelihoods values imply better fits from a model (Busemeyer & Diederich, 2010). As suggested by Busemeyer and Diederich (2010), in this paper, to calibrate aggre- gate or hierarchical model parameters, we minimize L . That is because our goal is to derive the likeli- hood of a model making the same choice as made by a human participant. We detail more about this calibra- tion process in a future section. Next, we detail the working of models that we considered for evaluating individual choices. Ensemble Model The Ensemble model (Erev et al., 2010) assumes that each choice is made based on one of four equally likely rules and the predicted choice rate is a simple average across the predictions of four different rules. The first rule is similar to the Primed-Sampler model with vari- ability (Erev, Glozman, & Hertwig, 2008). 
Next, we detail the working of the models that we considered for evaluating individual choices.

Ensemble Model

The Ensemble model (Erev et al., 2010) assumes that each choice is made based on one of four equally likely rules, and the predicted choice rate is a simple average across the predictions of the four different rules. The first rule is similar to the Primed-Sampler model with variability (Erev, Glozman, & Hertwig, 2008). Decision-makers are assumed to sample each option m times and select the option with the highest sample mean. The value of m is uniformly drawn from the set 1, 2, 3, ..., 9. The second rule is identical to the first, but m is drawn from the distribution of sample sizes observed in the estimation set, with samples larger than 20 treated as 20. The third rule in the Ensemble model is a stochastic variant of CPT (Tversky & Kahneman, 1992), where the weighting function is approximated based upon certain parameters (the model does not use the sampling data to determine the weighting function). The final rule is a stochastic version of the lexicographic priority heuristic (Brandstätter et al., 2006; Rieskamp, 2008). The probabilities with which the two search orders were used in the final rule were porder1 and porder2. The first order begins by comparing minimum outcomes (i.e., minimum gain or minimum loss depending on the domain of gambles), then their associated probabilities, and finally the maximum outcomes. The second order begins with the probabilities of the minimum outcomes, then proceeds to check the minimum outcomes, and ends with the maximum outcomes (the probabilities with which both search orders are implemented were determined from the estimation set).

The Ensemble model computes expectations for choosing options from its constituting models. These expectations are averaged to give a net expectation. Given a human participant's choice, the net expectation (averaged across all rules) is used to calculate the log-likelihood (using equation 2). In one version of the Ensemble model (called Ensemble (Herzog)), we used the original parameters proposed in Erev et al. (2010) to evaluate the model against individual choices. However, in a second version of the same model (called Ensemble (TPT)), we calibrated a set of the Ensemble model's distribution parameters to individual choices using the log-likelihood function. The Ensemble (TPT) model had a set of 11 parameters. These parameters were assigned single values when they were recalibrated to individual choices. Among the 11 parameters, α, β, γ, δ, λ, and µ belonged to the stochastic variant of CPT, while the parameters To, Tp, σ, porder1, and porder2 were part of the priority heuristic. The σ was a free distribution parameter that defined the variance of a normal distribution. If the subjective difference involving the first comparison in each search order exceeds a threshold t, then the more attractive option is selected based on this comparison; otherwise, the next comparison is executed. The values of the thresholds are other free distribution parameters. The estimated values are To for the minimum- and maximum-based comparisons and Tp for the probability-based comparison (both To and Tp define the mean of the normal distribution). The α, β, γ, δ, λ, and µ parameters were varied between 0 and 1.5; σ and To were varied between 0 and 1; and the probabilistic parameters porder1, porder2, and Tp were varied between 0 and 1.0. These ranges ensured that the optimization could capture the optimal parameter values with high confidence. During model calibration, the initial parameter population was set to the parameters from Erev et al. (2010).
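The rule-averaging scheme of the Ensemble model can be sketched as follows. This is an illustrative sketch only: the two toy rules below are stand-ins for the actual PS, CPT, PH, and NMH rules of Erev et al. (2010), and the function names are ours. Each rule maps a list of sampled outcomes to an expectation for that option, and the net expectation per option is the simple average across rules, which then enters equation 1.

import statistics

def natural_mean(samples):
    """Mean of all experienced outcomes for one option."""
    return statistics.mean(samples)

def ensemble_expectations(samples_x, samples_y, rules):
    """Average each option's expectation across the ensemble's rules."""
    exp_x = statistics.mean(rule(samples_x) for rule in rules)
    exp_y = statistics.mean(rule(samples_y) for rule in rules)
    return exp_x, exp_y

# Example with two toy rules: the natural mean and a "look three samples back" rule
rules = [natural_mean, lambda s: statistics.mean(s[-3:])]
print(ensemble_expectations([3, 0, 3, 3], [2, 2, 2, 2], rules))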
Natural Mean Heuristic (NMH) Model

The NMH model (Hertwig & Pleskac, 2010) involves the following steps:

Step 1. Calculate the natural mean of the observed outcomes for each option by summing, separately for each option, all n experienced outcomes and then dividing by n.

Step 2. Apply equation 1, where the sample mean for an option is replaced by its natural mean.

In the NMH model, there are no free parameters. Like the Ensemble model, we evaluate the log-likelihood value from the NMH model (using equation 2).
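A minimal sketch of these two steps, combined with the softmax of equation 1 (function names are ours, and the sampled outcomes in the example are illustrative):

import math

def natural_mean(outcomes):
    """Step 1: natural mean of all experienced outcomes for one option."""
    return sum(outcomes) / len(outcomes)

def nmh_choice_probability(samples_x, samples_y):
    """Step 2: probability of choosing Option X via equation 1,
    with the natural means standing in for the sample means."""
    mx, my = natural_mean(samples_x), natural_mean(samples_y)
    return math.exp(mx) / (math.exp(mx) + math.exp(my))

# Example: a risky Option X versus a safe Option Y
print(round(nmh_choice_probability([4, 0, 4, 4, 4], [3, 3, 3]), 3))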
Instance-Based Learning (IBL) Model

The IBL model (Dutt & Gonzalez, 2012; Gonzalez & Dutt, 2011; 2012; Lejarraga, Dutt, & Gonzalez, 2012) is based upon the ACT-R cognitive framework (Anderson & Lebiere, 1998). In this model, every occurrence of an outcome of an option is stored in the form of an instance in memory. An instance is made up of the following structure, SDU, where S is the current situation (the blank option buttons on a computer screen), D is the decision made in the current situation (choice for one of the option buttons), and U is the goodness (utility) of the decision made (the outcome obtained upon making a choice for an option). When a decision needs to be made, instances belonging to each option are retrieved from memory and blended together. The blended value of an option j (e.g., a gamble that pays $5 with 0.9 probability or $0 with probability 0.1) at any trial t is defined as:

V_{j,t} = \sum_{i=1}^{n} p_{i,j,t} \, x_{i,j,t}    (3)

where x_{i,j,t} is the value of the U (outcome) part of an instance (e.g., either $5 or $0 in the previous example) i on option j in trial t. The p_{i,j,t} is the probability of retrieval of instance i on option j from memory in trial t. Because x_{i,j,t} is the value of the U part of an instance i on option j in trial t, the number of terms in the summation changes when new outcomes are observed within an option j (and new instances corresponding to observed outcomes are created in memory). Thus, n = 1 if j is an option with one possible outcome. If j is an option with two possible outcomes, then n = 1 when one of the outcomes has been observed on the option (i.e., one instance is created in memory) and n = 2 when both outcomes have been observed (i.e., two instances are created in memory).

In any trial t, the probability of retrieval of an instance i on option j is a function of the activation of that instance relative to the activation of all instances (1, 2, ..., n) created within the option j, given by:

p_{i,j,t} = \frac{e^{A_{i,j,t}/\tau}}{\sum_{i=1}^{n} e^{A_{i,j,t}/\tau}}    (4)

where τ is random noise defined as σ · \sqrt{2} and σ is a free noise parameter. Noise in Equation (4) captures the imprecision of recalling past experiences from memory. Activation of an instance is a function of the frequency and recency of the observed outcomes that occur on choosing options during sampling. The activation of an instance i corresponding to an observed outcome on an option j in a given trial t is a function of the frequency of the outcome's past occurrences and the recency of the outcome's past occurrences (as done in ACT-R). In each trial t, the activation A_{i,j,t} of an instance i on option j is given by:

A_{i,j,t} = \ln\left(\sum_{t_p} (t - t_p)^{-d}\right) + \sigma \cdot \ln\left(\frac{1 - \gamma_{i,j,t}}{\gamma_{i,j,t}}\right)    (5)

where d is a free decay parameter; γ_{i,j,t} is a random draw from a uniform distribution bounded between 0 and 1 for instance i on option j in trial t; and t_p is each of the previous trials in which the outcome corresponding to instance i was observed in the binary-choice task.

The IBL model has two free parameters that need to be calibrated: d and σ. The d parameter controls the reliance on recent or distant sampled information. Thus, when d is large (> 1.0), the model gives more weight to recently observed outcomes in computing instance activations compared to when d is small (< 1.0). The σ parameter helps to account for the sample-to-sample variability in an instance's activation. Thus, the blended value of each option is a function of the activations of the instances corresponding to the outcomes observed on that option. In this model, we feed in the sampling of individual human participants to generate instance activations and blended values. Every time a choice is made and an outcome is observed, the instance associated with it is activated, and thereafter blended values are computed for the options faced by an individual participant. At the final choice, the likelihood is computed from the blended values, which replace the option means in Equation 1. In one version of the model, IBL (LDG), we used the single set of parameter values suggested by Lejarraga, Dutt, and Gonzalez (2012) to test the model against individual choices. However, in a second version of the model, IBL (TPT), we calibrated a set of d and σ parameters in the IBL model to individual choices. For this calibration, we determine the model's log-likelihood value for making the same choice as made by each human participant. During optimization, both the d and σ parameters were varied between 0 and 20. These ranges ensured that the optimization could capture the optimal parameter values with high confidence. During parameter calibration, the initial parameter population was set to the parameters from Lejarraga, Dutt, and Gonzalez (2012).
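Equations 3 to 5 can be sketched as follows. This is a simplified sketch (function names and the data structure are ours): the noise draw γ and the trial bookkeeping follow the verbal description above, and the parameter values in the example are the calibrated IBL (TPT) values (d = 5.39, σ = 0.04) used for illustration only.

import math, random

def activation(t, past_trials, d, sigma):
    """Equation 5: activation from the frequency/recency of an outcome plus noise."""
    # uniform draw, bounded away from 0 and 1 for numerical safety
    gamma = random.uniform(1e-6, 1 - 1e-6)
    base = math.log(sum((t - tp) ** (-d) for tp in past_trials))
    return base + sigma * math.log((1 - gamma) / gamma)

def blended_value(t, instances, d, sigma):
    """Equations 3 and 4: retrieval probabilities and blended value of one option.

    `instances` maps an observed outcome to the list of past trials in which
    it was observed on this option.
    """
    tau = sigma * math.sqrt(2)
    acts = {x: activation(t, trials, d, sigma) for x, trials in instances.items()}
    denom = sum(math.exp(a / tau) for a in acts.values())
    return sum(x * math.exp(a / tau) / denom for x, a in acts.items())

# Example: by trial 6, outcome 4 was sampled on trials 1, 3, 5 and outcome 0 on trial 2
print(round(blended_value(6, {4: [1, 3, 5], 0: [2]}, d=5.39, sigma=0.04), 3))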
Cumulative Prospect Theory (CPT) Model

The CPT model (Hau et al., 2008; Tversky & Kahneman, 1992) assumes that people first form subjective beliefs about the probability of events and then enter these beliefs into cumulative prospect theory's weighting function (Fox & Tversky, 1998; Tversky & Fox, 1995). Similarly, people associate a value (utility) with the outcomes observed in options. The CPT model consists of the following four steps:

Step 1. Assess the sample probability, p_j, of the nonzero outcome in a given option j.

Step 2. Calculate the expected gain (loss) of option j, E_j:

E_j = w(p_j) \, v(x_j)    (6)

where w represents a weighting function for the probability experienced in the option j, and v represents a value function for the experienced outcome x_j in the option j. According to Tversky and Kahneman (1992), the weighting function w is defined as:

w(p_j) = \frac{p_j^{\gamma}}{\left(p_j^{\gamma} + (1 - p_j)^{\gamma}\right)^{1/\gamma}} if x ≥ 0, and w(p_j) = \frac{p_j^{\delta}}{\left(p_j^{\delta} + (1 - p_j)^{\delta}\right)^{1/\delta}} if x < 0    (7)

The γ and δ are adjustable parameters that fit the shape of the function for gains and losses, respectively. The weighting function w has an S-shape that underweights small probabilities and overweights larger ones (Hertwig, 2012). The x represents the outcome associated with the probability p_j. The value function v is defined as:

v(x_j) = x_j^{\alpha} if x_j ≥ 0, and v(x_j) = -\lambda\left(|x_j|^{\beta}\right) if x_j < 0    (8)

Here, α and β are adjustable parameters that fit the curvature for the gain and loss domains, respectively. Finally, the λ parameter (λ > 1) scales loss aversion. The x_j represents the outcome associated with the option j.

Step 3. Assess the prospect value of the option by multiplying the weight with the value obtained.

Step 4. Given a human participant's choice, calculate the log-likelihood value of the model making this choice using Equation 1 and Equation 2. The prospect value replaces the sample mean in Equation 1.

As seen above, the CPT model has five parameters, α, β, γ, δ, and λ, and we investigated three versions of the CPT model. In the first model, CPT (TK), we tested the set of parameter values estimated by Tversky and Kahneman (1992) against individual choices. In the second model, CPT (Hau), we tested the set of parameter values estimated by Hau et al. (2008) against individual choices. In the third model, CPT (TPT), we recalibrated a set of parameter values in the CPT model to individual choices. All five parameters were varied between 0 and 5. These ranges ensured that the optimization could capture the optimal parameter values with high confidence. During calibration, the initial parameter population was set to the parameters from Hau et al. (2008).
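Equations 6 to 8 can be sketched as follows (an illustrative sketch; the function names are ours, and the default parameter values in the example are the calibrated CPT (TPT) values reported later in Table 1):

def cpt_weight(p, gamma, delta, gain):
    """Equation 7: probability weighting, using gamma for gains and delta for losses."""
    c = gamma if gain else delta
    return p ** c / ((p ** c + (1 - p) ** c) ** (1 / c))

def cpt_value(x, alpha, beta, lam):
    """Equation 8: value function with loss-aversion parameter lam."""
    return x ** alpha if x >= 0 else -lam * (abs(x) ** beta)

def prospect(x, p, alpha=1.008, beta=0.96, gamma=2.00, delta=0.92, lam=1.03):
    """Equation 6: prospect value of an option whose nonzero outcome x occurs with probability p."""
    return cpt_weight(p, gamma, delta, gain=(x >= 0)) * cpt_value(x, alpha, beta, lam)

# Example: an option whose nonzero outcome of 10 was experienced with probability 0.8
print(round(prospect(10, 0.8), 3))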
Best Estimate and Simulation Techniques (BEAST) Model

The BEAST model captures the joint effect of, and the interaction between, 14 choice phenomena at the aggregate level discussed in the 2015 Choice Prediction Competition (Erev, Ert, Plonsky, et al., 2015). The first assumption in this model is to compute the expected values of the options (since people try to maximize payoffs). The second assumption uses mental simulations that were found to lead to good outcomes in similar situations in the past (Marchiori, Di Guida, & Erev, 2015; Plonsky, Teodorescu, & Erev, 2015). Each simulation uses one of four different techniques: unbiased, uniform, contingent pessimism, and sign. The unbiased technique implies random and unbiased draws, either from an option's described distributions or from an option's observed history of outcomes. The other three techniques are "biased" and imply overgeneralizations. They can be described as mental draws from distributions that differ from the objective problem distributions. The three biased techniques are each used with equal probability. The simulation technique uniform yields each of the possible outcomes with equal probability. This technique enables the model to capture underweighting of rare events and the splitting effect.4 The simulation technique contingent pessimism is like the priority heuristic (Brandstätter et al., 2006); it depends on the sign of the best possible payoff and the ratio of the minimum payoffs. This technique helps the model capture loss aversion and the certainty effect. The simulation technique sign implies high sensitivity to the payoff sign. It is identical to the technique unbiased, with one important exception: positive drawn values are replaced by R, and negative outcomes are replaced by -R, where R is the payoff range (the difference between the best and worst possible payoffs in the current problem).

4 According to Birnbaum (2008) and Tversky and Kahneman (1986), splitting an attractive outcome into two distinct outcomes can increase the attractiveness of a prospect even when it reduces its expected value. This phenomenon is referred to as the splitting effect.

This model has six distribution parameters, σ, κ, β, γ, φ, and θ, where each of these parameters defines the upper bound of a uniform distribution with a lower bound of 0.0 (κ defines the upper bound of a discrete uniform distribution with a lower bound of 0.0). Four of these parameters (σ, κ, β, and γ) are needed to capture decisions under risk without feedback. The parameter φ captures the attitude toward ambiguity, and θ abstracts the reaction to feedback. In this model, the expectation for one of the options, option A, equals BEV_A(r) + ST_A(r) + e(r), and that for the other option, option B, equals BEV_B(r) + ST_B(r). Here, BEV_A(r) and BEV_B(r) are the best estimates of the expected values of options A and B after r samples; ST_A(r) and ST_B(r) are the expectations based on the mental-simulation techniques after r samples; and e(r) is an error term after r samples (e(r) is drawn from a normal distribution with a mean of 0 and a standard deviation of σ). Given a human participant's choice, the expectations on the different options are used to determine the log-likelihood in the model (using Equation 1 and Equation 2). In one of the BEAST versions, BEAST (CPC), we used the set of parameter values reported by Erev, Ert, Plonsky, et al. (2015) against individual choices. However, in another version, BEAST (TPT), we recalibrated the set of distribution parameter values to individual choices. All six parameters were varied between 0 and 20. These ranges ensured that the optimization could capture the optimal parameter values with high confidence. During recalibration, the initial population of parameters was taken from Erev, Ert, Plonsky, et al. (2015).
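The comparison of option expectations in BEAST can be sketched abstractly as follows. This is a heavily simplified sketch: the best-estimate (BEV) and simulation (ST) terms are passed in as placeholders rather than computed by the model's actual techniques, and only the additive structure described above is shown. The σ value in the example is the BEAST (TPT) value reported later in Table 1.

import random

def beast_expectations(bev_a, bev_b, st_a, st_b, sigma):
    """Expectations of options A and B as described in the text:
    BEV_A(r) + ST_A(r) + e(r) versus BEV_B(r) + ST_B(r),
    where e(r) is normal noise with mean 0 and standard deviation sigma."""
    error = random.gauss(0.0, sigma)
    return bev_a + st_a + error, bev_b + st_b

# Example with placeholder best estimates and simulation-based terms
exp_a, exp_b = beast_expectations(bev_a=2.0, bev_b=1.8, st_a=0.3, st_b=0.5, sigma=0.24)
print(round(exp_a, 2), round(exp_b, 2))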
Method

Dependent variables

In this paper, we account for the final choices made by individual participants in different problems. For this purpose, given a choice made by a human participant in a problem, we calculate the log-likelihood of a model participant making the same choice in the same problem. In all models, if the probability of making a human participant's choice is greater than 0.5, then it is assumed that the model choice coincides with the human choice. Using this 0.5 rule, we compare whether both model and human participants select the maximizing option in a problem. The maximizing option is the one that has the highest expected value among both options (expected value is calculated by using the objective probability distribution of outcomes in options). If both human participants and model participants select the maximizing option or the non-maximizing option in a problem, then the model can explain the human participant's choice. Using this method, in the TPT's estimation set, the final choices made by model observations are compared to 1,170 human observations, i.e., the total number of human observations available. The comparison between human choices and model choices is used to compute the incorrect proportion for each model, which is the main criterion for capturing individual behavior by a model. The incorrect proportion is simply the proportion of human choices that differed from the model's predictions. It is defined as:

Incorrect Proportion = (MHNM + NHMM) / (MHNM + NHMM + NHNM + MHMM)    (9)

where MHNM is the number of observations where the human participant makes a maximizing choice but the model predicts a non-maximizing choice, and NHMM is the number of observations where the human participant makes a non-maximizing choice but the model predicts a maximizing choice. Similarly, MHMM and NHNM are the numbers of observations where the human participant makes the same choice (maximizing or non-maximizing) as predicted by the model. The smaller the value of the incorrect proportion, the more accurate is the model in accounting for individual human choices. Once model parameters were calibrated to individual choices using the log-likelihood function, the incorrect proportions were computed from the different models and compared.
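Equation 9 can be computed directly from the four combination counts. A minimal sketch (the counts in the example are illustrative, not data from the paper):

def incorrect_proportion(mh_nm, nh_mm, nh_nm, mh_mm):
    """Equation 9: share of observations where model and human choices disagree."""
    disagreements = mh_nm + nh_mm
    total = mh_nm + nh_mm + nh_nm + mh_mm
    return disagreements / total

# Example: counts of observations in each human/model combination
print(round(incorrect_proportion(mh_nm=70, nh_mm=120, nh_nm=460, mh_mm=520), 2))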
Parameter calibration

Given the choice made by a human participant, we use Equation 1 and Equation 2 to compute the log-likelihood from a model of making the same choice as made by the human participant. Classically, Equation 1 has used an inverse temperature parameter β, which scales the sample means (Busemeyer & Diederich, 2010). In this paper, we assume β = 1 across all models, as we did not want to introduce an additional free parameter beyond those already present in the models. That is because the β parameter's recalibration to individual choices could benefit models differently. As β = 1 across all models, the β parameter does not favor some models over others.

The NMH model did not require parameter calibration, as this model did not possess any parameters. The sets of parameters of the Ensemble, BEAST, CPT, and IBL models were recalibrated using a Genetic Algorithm (GA) program. The GA is a probabilistic (stochastic) trial-and-error method of optimization that is different from deterministic methods like steepest gradient descent. Due to the GA's trial-and-error nature and its dependence on processes like reproduction, crossover, and mutation, the algorithm provides good chances of avoiding local optima in the parameter search space (Jakobsen, 2010; Gonzalez & Dutt, 2011; Houck, Joines, & Kay, 1995). In addition, prior research involving models has used the GA procedure for model calibration (Gonzalez & Dutt, 2011; 2012; Lejarraga, Dutt, & Gonzalez, 2012). In our model calibrations, the GA repeatedly modified a population of parameter tuples to find the tuple that minimized the negative of the model's log-likelihood function (Equation 2) across all human participants. In each generation, the GA selected parameter tuples randomly from a population to become parents and used these parents to produce children for the next generation. For each parameter tuple in a generation, each model was run five times across the 1,170 participants to minimize the negative of the model's average log-likelihood function over the five runs.5 Over successive generations, the population evolved toward an optimal solution. The population size was set to 20 randomly selected parameter tuples per generation (each tuple contained a certain value for each of the model's parameters). The mutation and crossover fractions were both set at 0.5 after a grid search for the best combination. The best combination of mutation and crossover fractions was found by calibrating the IBL (LDG) model to aggregate choices using its known parameters (d = 5.0; σ = 1.5). We systematically varied the mutation and crossover fractions in steps of 0.1 in the interval [0, 1] to find their best combination. The optimal values of the mutation and crossover fractions (= 0.5) were those for which the optimization converged the IBL (LDG) parameters to their optimal values in the least number of generations. These optimal values of the mutation and crossover fractions were then used for calibrating model parameters to individual choices.

5 The number of runs was set to five after analyzing the run-to-run variability in models with stochasticity (e.g., IBL and BEAST). Five runs were chosen as there was little change in the standard deviation when increasing the number of runs beyond five.

The GA procedure was implemented in the Matlab® toolbox (Houck, Joines, & Kay, 1995; Mathworks, 2012), where the stopping criteria in the optimization of model parameters involved the following constraints: stall generations = 200 and function tolerance = 1x10^-8; the optimization stopped when the average relative change in the fitness-function value over 200 stall generations was less than the function tolerance (1x10^-8).
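The calibration loop can be sketched as follows. This is not the authors' Matlab GA implementation; it is a minimal genetic-algorithm-style search of our own (selection of the fitter half, per-element crossover, and Gaussian mutation clamped to the bounds) that minimizes the negative log-likelihood over a parameter population, using the population size of 20 mentioned above. The objective in the example is a toy stand-in for a model's negative log-likelihood.

import random

def calibrate(neg_log_likelihood, bounds, pop_size=20, generations=200,
              crossover_frac=0.5, mutation_frac=0.5):
    """Minimize neg_log_likelihood(params) over parameter tuples within bounds."""
    def random_tuple():
        return [random.uniform(lo, hi) for lo, hi in bounds]

    population = [random_tuple() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=neg_log_likelihood)
        parents = scored[:pop_size // 2]                    # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = [ai if random.random() < crossover_frac else bi
                     for ai, bi in zip(a, b)]               # crossover
            child = [min(max(x + random.gauss(0, 0.1), lo), hi)
                     if random.random() < mutation_frac else x
                     for x, (lo, hi) in zip(child, bounds)]  # mutation within bounds
            children.append(child)
        population = parents + children
    return min(population, key=neg_log_likelihood)

# Example: recover two parameters of a toy quadratic objective within [0, 20] x [0, 20]
best = calibrate(lambda p: (p[0] - 5.39) ** 2 + (p[1] - 0.04) ** 2,
                 bounds=[(0, 20), (0, 20)])
print([round(v, 2) for v in best])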
Results

Calibration in TPT's estimation set

Table 1 shows the parameter calibration results from the different models in TPT's estimation dataset. The table lists the different models, calibrated parameter values, combinations obtained from the comparison of human and model final choices, log-likelihoods, and incorrect proportions.

Calibrated parameters

The best model in terms of log-likelihood values was CPT (TPT). Five parameters were calibrated in the CPT (TPT) model, and the calibrated model possessed a log-likelihood of -634.7, which was significantly larger than that for the CPT (TK) model (-662.8) and the CPT (Hau) model (-643.9). The calibrated parameter values were: α = 1.008; β = 0.96; γ = 2.00; δ = 0.92; λ = 1.03. The free parameters for the value function indicated a slightly smaller magnitude of disutility for losses compared to the utility for gains. The value function for the CPT (TPT) model was aligned with risk-neutral behavior for both gains and losses, which was different from the behavior in the CPT (Hau) model and in the CPT (TK) model. Furthermore, the weighting function of the CPT (TPT) model showed underweighting of small probabilities for positive outcomes and about equal weighting of small probabilities for negative outcomes. In contrast, the weighting functions for the CPT (Hau) and CPT (TK) models overweighted small probabilities for both positive and negative outcomes. Please see Appendix D for the shapes of the value and weighting functions for the different CPT models.

The Ensemble (TPT) model was the second-best model, exhibiting a log-likelihood of -691.0. The model's calibrated parameters were α = 0.75, β = 1.46, γ = 1.42, δ = 1.03, λ = 1.13, µ = 0.37, T0 = 0.001, porder1 = 0.38, σ = 0.020, Tp = 0.18, and porder2 = 0.62. The first six parameters from the model depicted underweighting of rare events and loss aversion, with losses perceived as more damaging compared to gains. The latter five parameters, from the priority heuristic, showed a smaller variance in the distribution of the σ parameter compared to the original Ensemble (Herzog) parameters. Also, the results indicated underweighting of small probabilities, overweighting of large probabilities, and diminishing sensitivity to gains and losses.

The next best model was the IBL (TPT) model, which exhibited a considerably larger log-likelihood value of -929.0 compared to the IBL (LDG) model. The IBL (TPT) model's calibrated parameters were d = 5.39 and σ = 0.04. These parameters indicated a reliance on the recency of sampled information, which provides a plausible account of recency's role in human participants' sampling and subsequent choice. The recency reliance for individual choices is also in agreement with the documented reliance on recency in aggregate choices (Dutt & Gonzalez, 2012; Gonzalez & Dutt, 2011; 2012; Hertwig et al., 2004; Lejarraga, Dutt, & Gonzalez, 2011). In fact, the d parameter value was higher for the model calibrated to individual choices than for the model calibrated to aggregate choices. Furthermore, the participant-to-participant variability (captured by σ) was smaller in the IBL (TPT) than in the IBL (LDG) model. This observation indicated less variability among individual participants in their choices.

For the BEAST (TPT) and NMH models, the log-likelihood values (-1129.0 and -1386.5) were much smaller compared to those for the individually calibrated versions of the CPT, Ensemble, and IBL models. Please see Table 1 for the log-likelihood values of the different models.

Incorrect proportion

In the calibration dataset, the CPT (TPT) model possessed the best incorrect proportion of 0.15. In the CPT (TPT) model, the desirable NHNM and MHMM combinations were 39% and 45%, respectively. In contrast, the erroneous NHMM and MHNM combinations were 10% and 6%, respectively. The CPT (Hau) model showed an incorrect proportion of 0.16. The model showed 39% of NHNM combinations and 45% of MHMM combinations. The erroneous combinations included 9% for NHMM and 7% for MHNM. The incorrect proportion for the CPT (TK) model was 0.18. The proportions of desirable NHNM and MHMM combinations were 41% and 42%, respectively. In addition, the erroneous NHMM and MHNM combinations were 8% and 10%, respectively. The next best model was the IBL (TPT) model, which exhibited an incorrect proportion of 0.21. The IBL (TPT) model showed 39% and 40% of the desirable NHNM and MHMM combinations. The erroneous NHMM and MHNM combinations were 9% and 12%, respectively. Beyond the IBL (TPT) model, the BEAST (TPT) model did well with an incorrect proportion of 0.24. The four combination proportions for the BEAST (TPT) model were: 36% (NHNM), 41% (MHMM), 13% (NHMM), and 10% (MHNM).

Next, to gauge the benefit of explaining individual choices with different model parameters, we plotted the correct proportions from the calibrated models against their number of free parameters (see Figure 1). Models closer to the origin are the ones that explain individual choices with the fewest free parameters. The distance of the IBL (TPT) and CPT (TPT) models from the origin (= 2 and 5 units, respectively) was much less than that for the BEAST (TPT) and Ensemble (TPT) models (= 6 and 11 units, respectively). Thus, based upon the distance metric, the IBL and CPT models explained individual choices with fewer free parameters. Thus, it seems that cognitive mechanisms like recency, frequency, and blending, as well as mathematical functions that underweight rare outcomes and value gains and losses differently, are appropriate to account for individual choices.

Figure 1. The correct proportions against the number of parameters from different models calibrated in the TPT's estimation set.
Table 1. Calibration results from models in TPT's estimation dataset (percentage of 1,170 observations).

Calibrated parameter values:
Ensemble (Herzog): α = 1.19, β = 1.35, γ = 1.42, δ = 1.54, λ = 1.19, µ = 0.41, T0 = 0.0001, porder1 = 0.38, σ = 0.037, Tp = 0.11, porder2 = 0.62
Ensemble (TPT): α = 0.75, β = 1.46, γ = 1.42, δ = 1.03, λ = 1.13, µ = 0.37, T0 = 0.001, porder1 = 0.38, σ = 0.02, Tp = 0.18, porder2 = 0.62
NMH: no free parameters
IBL (LDG): d = 5.00, σ = 1.50
IBL (TPT): d = 5.39, σ = 0.04
CPT (TK): α = 0.88, β = 0.88, γ = 0.61, δ = 0.69, λ = 1.00
CPT (Hau): α = 0.94, β = 0.86, γ = 0.99, δ = 0.93, λ = 1.00
CPT (TPT): α = 1.008, β = 0.96, γ = 2.00, δ = 0.92, λ = 1.03
BEAST (CPC): σ = 7.00, κ = 3.00, β = 2.6, γ = 0.50, ϕ = 0.07, θ = 1.00
BEAST (TPT): σ = 0.24, κ = 1.99, β = 0.06, γ = 1.16, ϕ = 0.03, θ = 1.17

Combinations (H/M) | Ensemble (Herzog) | Ensemble (TPT) | NMH | IBL (LDG) | IBL (TPT) | CPT (TK) | CPT (Hau) | CPT (TPT) | BEAST (CPC) | BEAST (TPT)
NHNM | 31 | 32 | 29 | 26 | 39 | 41 | 39 | 39 | 33 | 36
MHMM | 40 | 40 | 33 | 32 | 40 | 42 | 45 | 45 | 37 | 41
NHMM | 17 | 18 | 19 | 23 | 09 | 08 | 09 | 10 | 15 | 13
MHNM | 12 | 11 | 19 | 20 | 12 | 10 | 07 | 06 | 15 | 10
Incorrect proportion | 0.29 | 0.28 | 0.37 | 0.43 | 0.21 | 0.18 | 0.16 | 0.15 | 0.31 | 0.24
Log-likelihood | -696.2 | -691.0 | -1386.5 | -3158.0 | -929.0 | -662.8 | -643.9 | -643.7 | -1971.0 | -1129.0

Note. NH and MH refer to non-maximizing and maximizing human choices, respectively. NM and MM refer to non-maximizing and maximizing model choices, respectively.

Generalization to different datasets

Up to now, the different models predicted the choices of individual participants in TPT's estimation set using a single set of parameter values. These models, however, possess different numbers of free parameters. Due to these differences in model parameters, it becomes difficult to compare model performance during parameter calibration. One method that allows us to compare models while accounting for parameter differences is generalization (Busemeyer & Diederich, 2010; Busemeyer & Wang, 2000; Dutt & Gonzalez, 2012). In generalization, models with calibrated parameters are run on new problems (Busemeyer & Wang, 2000). Ideally, new problems encountered during generalization should be different from those encountered during calibration; otherwise, generalization may favor models that show superior performance during calibration. In what follows, we first generalize the calibrated models to problems in TPT's competition set. Generalization of this kind was also followed for models submitted to TPT (Erev et al., 2010). However, problems in TPT's competition set were derived using the same algorithm as in TPT's estimation set (Erev et al., 2010). Thus, it is likely that the nature of problems across the competition and estimation sets was similar and that TPT's competition set provided a weaker generalization dataset with respect to TPT's estimation set.
Generalization to competition set. TPT's competition set was like the estimation set with two exceptions: the problems in the competition set were different from those in the estimation set, and different subjects participated in the competition set compared to the estimation set (Erev et al., 2010). The 60 problems in the competition set were selected using the same algorithm as used for the estimation set. To explain individual choices, all models were run in the competition set using the parameters obtained in the estimation set.

Table 2 shows the generalization results from different models in the competition set. In all models, parameters were set to the values reported in Table 1. Overall, the incorrect proportions obtained from models in the competition set were like those obtained in the estimation set. Calibrated models performed better compared to their uncalibrated counterparts that borrowed parameter values for aggregate choices from the literature. The incorrect proportion was the lowest for the CPT (TPT) model, with the IBL (TPT) and Ensemble (TPT) models taking the second and third places, respectively. Also, all three models performed significantly better than the BEAST and NMH models. These results highlight the role of certain mechanisms in explaining individual choices: recency, frequency, and blending of encountered information during sampling, the underweighting of rare events, and the differential valuation of gains and losses.

Sequential analysis. To gauge models in accounting for individual differences, we evaluated the proportion of sequential decisions in models from the last sample to the final choice. Here, human and model choices were analyzed sequentially. Thus, we evaluated decisions made by human participants during their last sample and consequential choice and then compared these sequential decisions to those from models. Table 3 presents the proportion of model participants showing a transition that was similar to or different from that of human participants in TPT's competition dataset. Based upon the last sample and consequential choice among human participants, the following four transition possibilities existed: N → N, N → M, M → N, and M → M, where the first letter (before the arrow) corresponds to the choice made by a human participant during her last sample and the second letter (after the arrow) corresponds to the final choice made by the same participant after sampling. For each last-sample-to-final-choice transition by a human participant, there are two possibilities for the model: first, the same transition as the human participant; and, second, a different transition from the human participant. If the model is suggestive of individual choice, then the model should show a transition between last sample and final choice like that of the human participants for more than 50 percent (i.e., a majority) of its participants. We evaluated sequential decisions in the top four models: the CPT (TPT), IBL (TPT), Ensemble (TPT), and NMH models. As shown in Table 3, across all transitions, N → N, N → M, M → N, and M → M, the CPT (TPT) model performed better compared to all other models. Thus, the CPT (TPT) model made stronger correct predictions for human transitions from last sample to final choice compared to the Ensemble (TPT), NMH, and IBL models. The IBL (TPT) model performed superior to the Ensemble (TPT) model on two kinds of transitions: NN and MN. Overall, these results show that underweighting of experienced probabilities, loss aversion due to negative outcomes, and recency and frequency processes seem to account for sequential individual choices in the data.
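A sketch of this sequential analysis is given below (illustrative names, not the authors' code): each participant's last sample and final choice are mapped to one of the four transitions, and, for every human transition type, we compute the share of model participants that produced the same transition.

```python
# Minimal sketch (illustrative names): classify each participant's transition from the
# last sampled option to the final choice, then check how often the model's transition
# matches the human's for the same participant and problem.
from collections import defaultdict

def transition(last_sample_max, final_choice_max):
    """Return 'N->N', 'N->M', 'M->N', or 'M->M' (N = non-maximizing, M = maximizing)."""
    first = "M" if last_sample_max else "N"
    second = "M" if final_choice_max else "N"
    return f"{first}->{second}"

def match_proportions(human_transitions, model_transitions):
    """For each human transition type, the share of model participants showing the
    same transition; values above 0.50 satisfy the majority rule used in the text."""
    same = defaultdict(int)
    total = defaultdict(int)
    for h, m in zip(human_transitions, model_transitions):
        total[h] += 1
        same[h] += int(h == m)
    return {t: same[t] / total[t] for t in total}
```

Under the 50% rule described above, a value such as the 87% reported for the CPT (TPT) model's M→M cell in Table 3 would correspond to match_proportions(...)['M->M'] = 0.87 and counts as evidence that the model tracks that sequential pattern.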
Six Choice (SC) dataset. In the section above, we generalized models to TPT's competition set. However, the problems in the competition set were similar to those in the estimation set, as the problem-generation algorithm remained the same between the two sets. Due to this observation, the competition set provides a weaker generalization dataset. In order to overcome this limitation, we also generalized calibrated models to the Six Choice (SC) dataset (Hertwig et al., 2004; Appendix C), where the structure of options across problems in the SC dataset was different from that in TPT's estimation and competition sets.

Table 2. Generalization results from models in TPT's competition dataset (combination rows are percentages of 1200 observations).

Combinations from Human Data and Model (H/M) | Ensemble (Herzog) | Ensemble (TPT) | NMH | IBL (LDG) | IBL (TPT) | CPT (TK) | CPT (Hau) | CPT (TPT) | BEAST (CPC) | BEAST (TPT)
NHNM | 20 | 20 | 25 | 22 | 29 | 32 | 33 | 33 | 21 | 24
MHMM | 46 | 46 | 39 | 40 | 43 | 53 | 49 | 50 | 36 | 39
NHMM | 21 | 20 | 15 | 19 | 12 | 09 | 08 | 09 | 19 | 17
MHNM | 14 | 13 | 21 | 20 | 17 | 09 | 10 | 07 | 24 | 20
Incorrect proportion | 0.34 | 0.33 | 0.36 | 0.39 | 0.28 | 0.17 | 0.18 | 0.16 | 0.42 | 0.37
Note. NH and MH refer to non-maximizing and maximizing human choices, respectively. NM and MM refer to non-maximizing and maximizing model choices, respectively.

Table 3. Proportion of model participants following a transition that is similar to or different from human participants in the competition dataset.

Human Transition (Last Sample → Final Choice) | Model Transition (Last Sample → Final Choice) | CPT (TPT) (%) | IBL (TPT) (%) | Ensemble (TPT) (%) | NMH (%)
N→N | N→N | 79 | 73 | 54 | 62
N→N | N→M | 21 | 27 | 46 | 38
N→M | N→M | 80 | 70 | 77 | 64
N→M | N→N | 20 | 30 | 23 | 36
M→N | M→N | 77 | 67 | 51 | 62
M→N | M→M | 23 | 34 | 49 | 38
M→M | M→M | 87 | 74 | 78 | 66
M→M | M→N | 13 | 26 | 22 | 34
Note. N and M refer to non-maximizing and maximizing choices, respectively.

Table 4. Generalization results from models in the SC problems dataset (combination rows are percentages of 150 observations).

Combinations from Human Data and Model (H/M) | Ensemble (Herzog) | Ensemble (TPT) | NMH | IBL (LDG) | IBL (TPT) | CPT (TK) | CPT (Hau) | CPT (TPT) | BEAST (CPC) | BEAST (TPT)
NHNM | 45 | 46 | 55 | 41 | 51 | 37 | 37 | 39 | 33 | 34
MHMM | 20 | 19 | 26 | 23 | 32 | 25 | 31 | 27 | 31 | 31
NHMM | 14 | 13 | 03 | 18 | 07 | 22 | 21 | 20 | 25 | 25
MHNM | 21 | 22 | 15 | 19 | 09 | 17 | 11 | 10 | 10 | 11
Incorrect proportion | 0.35 | 0.35 | 0.19 | 0.37 | 0.16 | 0.39 | 0.34 | 0.33 | 0.36 | 0.35
Note. NH and MH refer to non-maximizing and maximizing human choices, respectively. NM and MM refer to non-maximizing and maximizing model choices, respectively.
Table 5. Proportion of model participants following a transition that is similar to or different from human participants in the SC problems dataset.

Human Transition (Last Sample → Final Choice) | Model Transition (Last Sample → Final Choice) | NMH (%) | IBL (TPT) (%) | Ensemble (TPT) (%) | CPT (TPT) (%)
N→N | N→N | 96 | 81 | 79 | 66
N→N | N→M | 4 | 19 | 21 | 34
N→M | N→M | 54 | 75 | 36 | 64
N→M | N→N | 46 | 25 | 64 | 36
M→N | M→N | 91 | 89 | 71 | 66
M→N | M→M | 9 | 11 | 29 | 34
M→M | M→M | 71 | 74 | 59 | 68
M→M | M→N | 29 | 26 | 41 | 32
Note. N and M refer to non-maximizing and maximizing choices, respectively.

In the SC dataset, all six problems presented options that differed with respect to expected value; four of them offered positive prospects and two offered negative prospects. All problems in the SC dataset were run in the sampling-paradigm format: free sampling of options followed by a final choice of one of the options for real. During sampling, participants could sample options in whatever order they desired and however often they wished. They were encouraged to sample until they felt confident enough to decide from which option to draw a real payoff. Like the TPT dataset, each problem consisted of choosing between two options. However, unlike the TPT dataset, problems in the SC dataset could have both options risky: both options could independently contain high and low outcomes with predefined probability distributions. Problems in the SC dataset belonged to both positive and negative domains. In the positive domain, the associated non-zero outcomes were positive, whereas, in the negative domain, the associated non-zero outcomes were negative. Overall, the TPT and SC datasets differed in the number of outcomes possible on options and in the presence of the mixed domain in TPT and its absence in the SC problems.

Table 4 shows the generalization results from running different models in the SC dataset (model parameters were calibrated in the estimation set). As shown in Table 4, the IBL (TPT) model was the best performing model with an incorrect proportion of 0.16. The NMH model was the second-best model with an incorrect proportion of 0.19. The CPT (TPT) model was the third-best model with an incorrect proportion of 0.33. Other hierarchical models like Ensemble and BEAST did not perform as well in the SC dataset and possessed higher incorrect proportions. Furthermore, models with recalibrated parameters performed better compared to models with parameters for aggregate choices borrowed from the literature. These results show that, when a more challenging generalization is performed, models like IBL and CPT, which are based upon activations and recency and frequency mechanisms as well as assumptions of underweighting of rare outcomes and different valuation of gains and losses, perform better compared to other models that rely on heuristic rules and biased sampling techniques.

Sequential Analyses. To evaluate models at explaining individual differences, we analyzed the top four models in the SC dataset. Table 5 shows the transitions from the last sample to the final choice for human and model participants in the SC problems dataset. As seen in Table 5, both the IBL and NMH models were suggestive of human-like transitions for all four combinations based upon the 50% rule. The IBL (TPT) model performed better compared to the NMH model in the NM and MM transitions and poorer compared to the NMH model in the NN and MN transitions. Overall, these results show the role of recency and frequency processes during sampling in individual choices.

Discussion

Until recently, researchers had evaluated how aggregate or hierarchical models with a set of parameter values explained aggregate choices made from experience (Dutt & Gonzalez, 2012; Gonzalez & Dutt, 2011; 2012; Lee, 2008; Lejarraga, Dutt, & Gonzalez, 2012; Rouder & Lu, 2005).
Also, researchers had evaluated how models with a single set of parameter values calibrated to each participant explained individual choices (individual models; Kudryavtsev & Pavlodsky, 2012; Frey, Mata, & Hertwig, 2015). However, little was known about how aggregate or hierarchical models with a set of single or distribution parameter values would perform when they are made to account for individual choices. In this paper, we contributed to this investigation by calibrating aggregate or hierarchical models with a set of single or distribution parameter values to individual choices across three different datasets. Aggregate and hierarchical models were calibrated in the Technion Prediction Tournament (TPT)'s estimation set using the log-likelihood function and later generalized to TPT's competition dataset (Erev, Ert, Roth, et al., 2010) and the Six-Choice (SC) problems dataset (Hertwig et al., 2004). We followed the traditional approach of model comparison via generalization as proposed by Busemeyer and Wang (2000).

Overall, our results revealed that both aggregate and hierarchical models performed above chance (= 50%) when their parameters were calibrated to individual choices. Even parameter values calibrated to aggregate choices (borrowed from the literature) performed above chance in these models. The CPT model performed well overall in the calibration and generalization datasets from TPT. Models such as Ensemble and CPT possess rules like weighting and value functions that abstract the sampling process experienced by human participants. From our results, these constructs help such models in cases where the generalization environment is similar to the calibration environment (as in TPT), but not when the generalization environment is different from the calibration environment (as in the SC dataset).

However, upon performing a generalization to the SC problems dataset, the IBL model, relying on recency, frequency, and blending mechanisms, showed superior performance compared to other models employing mathematical functions (Ensemble or CPT) or biased sampling techniques (BEAST). The NMH model, which incorporates the frequency and magnitude of experienced outcomes, also performed well in accounting for individual decisions. One likely reason for this observation is the presence of cognitive constructs like expectations, instances, activations, and blended values in the IBL model and the averaging mechanism in the NMH model. These mechanisms help these models account for individual experiences gained during sampling of options. For example, the IBL model is motivated by the ACT-R theory of cognition (Anderson & Lebiere, 1998). The IBL model's reliance on recency and frequency of experiences during sampling (exhibited through activations and blended values) helps this model make human-like choices. Similarly, the natural means in the NMH model are computed based upon experienced outcomes during the sampling process. These natural means represent expectations of choosing different options and enable this model to account for individual choices.
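As an illustration of this averaging mechanism, a minimal sketch of the natural-mean heuristic is given below (not the authors' implementation; tie-breaking and other details may differ): the heuristic simply compares the means of the outcomes experienced while sampling each option.

```python
# Minimal sketch of the natural-mean heuristic (NMH): choose the option whose sampled
# outcomes have the higher observed ("natural") mean. Tie-breaking and other details of
# the authors' implementation may differ.

def natural_mean_choice(samples_a, samples_b):
    """samples_a, samples_b: lists of outcomes observed while sampling each option.
    Returns 'A' or 'B', the option with the higher mean of experienced outcomes."""
    mean_a = sum(samples_a) / len(samples_a)
    mean_b = sum(samples_b) / len(samples_b)
    return "A" if mean_a >= mean_b else "B"

# Example: a participant who sampled option A as [4, 0, 4] and option B as [3, 3]
# has natural means of about 2.67 and 3.0, so the heuristic predicts a choice of B.
```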
Next, we found that the IBL model performed consistently well in both the calibration and generalization datasets, standing among the top two models even though it possessed only two parameters. One likely reason for this observation could be that the IBL model uses the blending mechanism, where, for every option, the values of all the observed outcomes are weighted by their activation strengths. Blending of experiences considers both the activation of outcomes in memory as well as their magnitude. Perhaps the IBL model's blending mechanism makes the model blend outcomes correctly for both maximizing and non-maximizing choices. Other factors affecting the performance of the IBL model are its two parameters, d and σ. The calibrated value of the d parameter was higher for individual choices compared to its calibrated value for aggregate choices (the latter calibration being done by Lejarraga, Dutt, and Gonzalez, 2012). The increased d value shows that individual choices rely heavily on the recency of outcomes. Furthermore, the σ parameter helped the IBL model account for sample-to-sample variability in instance activations. Here, when the model parameters were calibrated to individual choices, the σ parameter's value was much smaller and closer to its ACT-R default compared to when the same model was calibrated by Lejarraga, Dutt, and Gonzalez (2012) to aggregate choices. The smaller value of the σ parameter, closer to its ACT-R default, showcases lesser variation in outcome activations among individual choices.
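To make the recency and blending account concrete, here is a simplified sketch of IBL-style activation and blending (written in the general form of the published IBL equations, not the authors' code; pre-populated instances, the exact noise term, and other implementation details may differ).

```python
# Simplified sketch of IBL activation and blending for one option (decay d, noise sigma).
# Assumes all observation times are strictly earlier than current_time.
import math
import random

def activation(occurrence_times, current_time, d, sigma):
    """Activation of an instance (an observed outcome) built from the recency and
    frequency of its past occurrences, plus a logistic noise term scaled by sigma."""
    base = math.log(sum((current_time - t) ** (-d) for t in occurrence_times))
    u = min(max(random.random(), 1e-10), 1 - 1e-10)   # guard the noise term
    noise = sigma * math.log((1.0 - u) / u)
    return base + noise

def blended_value(instances, current_time, d, sigma):
    """Blend the outcomes observed for an option: each outcome is weighted by its
    probability of retrieval, which increases with its activation.

    instances: dict mapping outcome -> list of past observation times.
    """
    tau = sigma * math.sqrt(2.0)
    acts = {x: activation(times, current_time, d, sigma) for x, times in instances.items()}
    a_max = max(acts.values())
    weights = {x: math.exp((a - a_max) / tau) for x, a in acts.items()}  # softmax retrieval weights
    denom = sum(weights.values())
    return sum(x * w / denom for x, w in weights.items())

# Example: with d = 5.39 and sigma = 0.04 (the IBL (TPT) calibration in Table 1),
# instances = {4.0: [1, 3], 0.0: [2]} evaluated at current_time = 4 yield a blended
# value close to 4.0, because the recently and frequently observed outcome dominates.
```

With a large decay d, the summed recency term is dominated by the most recent observations of each outcome, which is the recency reliance described above; the option with the higher blended value is the one the model chooses.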
This research work builds upon the literature in judgment and decision making in several ways. First, the BEAST and Ensemble models were hierarchical, where these models possessed distribution parameters to account for individual choices. The parameters in these models assumed different values from a distribution for different participants in the dataset. Thus, these distribution parameters should have helped these models account for individual choices due to parameter heterogeneity. However, in our results, the BEAST and Ensemble models did not account for individual choices as well as those models (like IBL or CPT) that possessed single parameters. This finding likely shows that it is more important for a model to possess the right cognitive or mathematical mechanisms than to possess heterogeneity among its parameters for different participants.

Second, we performed generalizations to large datasets that were similar or dissimilar to the calibration dataset. An insight from this generalization exercise is that the true picture emerges when the generalization dataset is different in its structure from the calibration dataset. The SC dataset possessed problems whose structure was different from that of the TPT datasets (both options could be risky in the SC dataset). Thus, it is recommended that generalizations be performed to datasets that possess structural differences from the calibration datasets.

Third, we used individual-level techniques like likelihoods and incorrect proportions, where these techniques enabled us to evaluate aggregate and hierarchical models at the individual participant level. In summary, the likelihood approach is powerful, and it enables us to calibrate models at the individual level. However, beyond calibration, one needs to test models based upon dependent measures that account for model error at the individual level. This need is especially true for generalizations, where calibration measures like likelihood cannot be used because parameters have already been fixed to their calibrated values.

In this paper, our focus was on investigating how aggregate and hierarchical models with a set of single or distribution parameters performed when their parameters were calibrated to individual choices rather than aggregate choices. As part of our future research, we plan to also perform individual modeling: calibrate a set of model parameters to each individual participant's decisions such that we get a set of parameters for each participant in the dataset. This evaluation will enable us to test the tradeoffs between aggregate modeling, hierarchical modeling, and individual modeling when these models are evaluated for explaining individual decisions (as in this paper). Individual modeling may help us account for individual differences well; however, such models also run the risk of overfitting individual decisions due to too many parameter values (one set for each individual participant). To provide a robust comparison of this tradeoff, as part of our future research, we plan to generalize individual models across both similar and dissimilar datasets within the same paradigm or across datasets in different paradigms. Furthermore, as part of our future research, we plan to extend our investigation to decision tasks where decision-makers decide across multiple options rather than make a binary choice. An example of such a task is the Iowa Gambling Task (Bechara, Damasio, Damasio, & Anderson, 1994), where the problem consists of making a choice among four options. In this paper, we took problem environments that were static in terms of outcomes and probabilities. Thus, outcomes and probabilities in a problem did not change during sampling. In the future, it would be worthwhile to extend the evaluation of models in explaining individual choices to dynamic environments, where outcomes and probabilities change during information search. Some of these ideas form the immediate next steps that we would like to undertake as part of our research.

Conclusion

This paper helped to bridge the gap in the literature on how aggregate and hierarchical models with a set of parameter values (either single or distribution) would perform when they are made to account for individual choices. We contributed to this investigation by calibrating different models with a set of parameter values to individual choices across three different datasets. Models with constructs that abstract the sampling process performed well when generalized to problems that were similar to the calibration problems. However, generalization to other problems that were structurally different from the calibration problems revealed that model mechanisms like differential valuing of gains and losses, recency, frequency, blending, and underweighting of rare outcomes were important to account for individual choices. Also, models using distribution parameters with heuristic rules and biased techniques did not perform well in accounting for individual choices when these models were generalized to different problems.

Acknowledgements: This research was supported by the Indian Institute of Technology Mandi and the Tata Consultancy Services Research Scholar program.
Declaration of conflicting interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Handling editor: Andreas Fischer

Author contributions: The authors contributed equally to this work.

Supplementary material: Supplementary material available online.

Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Citation: Sharma, N., & Dutt, V. (2017). Modeling decisions from experience: How models with a set of parameters for aggregate choices explain individual choices. Journal of Dynamic Decision Making, 3, 3. doi:10.11588/jddm.2017.1.37687

Received: 27 April 2017
Accepted: 10 September 2017
Published: 06 October 2017

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723. doi:10.1109/tac.1974.1100705

Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought. Hillsdale, NJ: Erlbaum.

Barron, G., & Erev, I. (2003). Small feedback-based decisions and their limited correspondence to description-based decisions. Journal of Behavioral Decision Making, 16(3), 215–233. doi:10.1002/bdm.443

Bechara, A., Damasio, A. R., Damasio, H., & Anderson, S. W. (1994). Insensitivity to future consequences following damage to human prefrontal cortex. Cognition, 50(1–3), 7–15. doi:10.1016/0010-0277(94)90018-3

Birnbaum, M. H. (2008). New paradoxes of risky decision making. Psychological Review, 115(2), 463. doi:10.1037/0033-295X.115.2.463

Bishop, C. M. (2006). Pattern recognition and machine learning. New York, NY: Springer.

Busemeyer, J. R., & Diederich, A. (2010). Cognitive modeling. Thousand Oaks, CA: Sage.

Busemeyer, J. R., & Myung, I. J. (1992). An adaptive approach to human decision making: Learning theory, decision theory, and human performance. Journal of Experimental Psychology: General, 121(2), 177–194. doi:10.1037/0096-3445.121.2.177

Busemeyer, J. R., & Stout, J. C. (2002). A contribution of cognitive decision models to clinical assessment: Decomposing performance on the Bechara gambling task. Psychological Assessment, 14(3), 253–262. doi:10.1037//1040-3590.14.3.253

Busemeyer, J. R., & Wang, Y. (2000). Model comparisons and model selections based on the generalization criterion methodology. Journal of Mathematical Psychology, 44(1), 171–189. doi:10.1006/jmps.1999.1282

Bush, R. R., & Mosteller, F. (1955). Stochastic models for learning. Oxford, England: Wiley & Sons.

Brandstätter, E., Gigerenzer, G., & Hertwig, R. (2006). The priority heuristic: Making choices without trade-offs. Psychological Review, 113(2), 409–432. doi:10.1037/0033-295X.113.2.409

Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., & Dolan, R. J. (2011). Model-based influences on humans' choices and striatal prediction errors. Neuron, 69(6), 1204–1215. doi:10.1016/j.neuron.2011.02.027
Denrell, J. (2007). Adaptive learning and risk taking. Psychological Review, 114(1), 177–187. doi:10.1037/0033-295X.114.1.177

Dutt, V., & Gonzalez, C. (2012). The role of inertia in modeling decisions from experience with instance-based learning. Frontiers in Psychology, 3(177). doi:10.3389/fpsyg.2012.00177

Dutt, V., & Gonzalez, C. (2015). Accounting for outcome and process measures and the effects of model calibration. Journal of Dynamic Decision Making, 1(2), 1–10. doi:10.11588/jddm.2015.1.17663

Erev, I., & Barron, G. (2005). On adaptation, maximization, and reinforcement learning among cognitive strategies. Psychological Review, 112(4), 912–931. doi:10.1037/0033-295X.112.4.912

Erev, I., Ert, E., Plonsky, O., Cohen, D., & Cohen, O. (2015). From anomalies to forecasts: A choice prediction competition for decisions under risk and ambiguity. Mimeo, 1–56.

Erev, I., Ert, E., Roth, A. E., Haruvy, E., Herzog, S. M., & Hau, R. (2010). A choice prediction competition: Choices from experience and from description. Journal of Behavioral Decision Making, 23(1), 15–47. doi:10.1002/bdm.683

Erev, I., Glozman, I., & Hertwig, R. (2008). What impacts the impact of rare events. Journal of Risk and Uncertainty, 36(2), 153–177. doi:10.1007/s11166-008-9035-z

Estes, W. K., & Maddox, W. T. (2005). Risks of drawing inferences about cognitive processes from model fits to individual versus average performance. Psychonomic Bulletin & Review, 12(3), 403–408. doi:10.3758/bf03193784

Fox, C. R., & Tversky, A. (1998). A belief-based account of decision under uncertainty. Management Science, 44(7), 879–895. doi:10.1287/mnsc.44.7.879

Frey, R., Mata, R., & Hertwig, R. (2015). The role of cognitive abilities in decisions from experience: Age differences emerge as a function of choice set size. Cognition, 142, 60–80. doi:10.1016/j.cognition.2015.05.004

Gallistel, C. R., Fairhurst, S., & Balsam, P. (2004). The learning curve: Implications of a quantitative analysis. Proceedings of the National Academy of Sciences of the United States of America, 101(36), 13124–13131. doi:10.1073/pnas.0404965101

Gigerenzer, G., & Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103(4), 650–669. doi:10.1037//0033-295x.103.4.650

Gilboa, I., & Schmeidler, D. (1989). Maxmin expected utility with non-unique prior. Journal of Mathematical Economics, 18(2), 141–153. doi:10.1016/0304-4068(89)90018-9

Gonzalez, C., & Dutt, V. (2011). Instance-based learning: Integrating sampling and repeated decisions from experience. Psychological Review, 118(4), 523–551. doi:10.1037/a0024558

Gonzalez, C., & Dutt, V. (2012). Refuting data aggregation arguments and how the instance-based learning model stands criticism: A reply to Hills and Hertwig. Psychological Review, 119(4), 893–898. doi:10.1037/a0029445

Hau, R., Pleskac, T. J., Kiefer, J., & Hertwig, R. (2008). The description-experience gap in risky choice: The role of sample size and experienced probabilities. Journal of Behavioral Decision Making, 21(5), 493–518. doi:10.1002/bdm.598

Hertwig, R. (2012). The psychology and rationality of decisions from experience. Synthese, 187(1), 269–292. doi:10.1007/s11229-011-0024-4

Hertwig, R., Barron, G., Weber, E. U., & Erev, I. (2004). Decisions from experience and the effect of rare events in risky choice. Psychological Science, 15(8), 534–539. doi:10.1111/j.0956-7976.2004.00715.x
Hertwig, R., & Erev, I. (2009). The description-experience gap in risky choice. Trends in Cognitive Sciences, 13(12), 517–523. doi:10.1016/j.tics.2009.09.004

Hertwig, R., & Pleskac, T. J. (2010). Decisions from experience: Why small samples? Cognition, 115(2), 225–237. doi:10.1016/j.cognition.2009.12.009

Horrace, R. H., William, C., and Jeffrey, M. P. (2009). Variety: Consumer choice and optimal diversity. Food Marketing Policy Center Research Report, 115.

Houck, C. R., Joines, J., & Kay, M. G. (1995). A genetic algorithm for function optimization: A Matlab implementation. North Carolina State University, Technical Report NCSU-IE TR 95-09.

Jakobsen, T. (2010). Genetic algorithms. Retrieved from http://subsimple.com/genealgo.asp

Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–291. doi:10.2307/1914185

Kudryavtsev, A., & Pavlodsky, J. (2012). Description-based and experience-based decisions: Individual analysis. Judgment and Decision Making, 7(3), 316–331.

Lebiere, C. (1999). Blending: An ACT-R mechanism for aggregate retrievals. Paper presented at the 6th Annual ACT-R Workshop at George Mason University, Fairfax County, VA.

Lee, M. D. (2008). Three case studies in the Bayesian analysis of cognitive models. Psychonomic Bulletin & Review, 15(1), 1–15. doi:10.3758/PBR.15.1.1

Lejarraga, T., Dutt, V., & Gonzalez, C. (2012). Instance-based learning: A general model of repeated binary choice. Journal of Behavioral Decision Making, 25(2), 143–153. doi:10.1002/bdm.722

Luce, R. D., & Raiffa, H. (1957). Games and decisions: Introduction and critical surveys. New York: Wiley.

March, J. G. (1996). Learning to be risk averse. Psychological Review, 103(2), 309–319. doi:10.1037/0033-295X.103.2.309

Marchiori, D., Di Guida, S., & Erev, I. (2015). Noisy retrieval models of over- and undersensitivity to rare events. Decision, 2(2), 82–106. doi:10.1037/dec0000023

Mathworks (2012). MATLAB and Statistics Toolbox Release 2012b [Computer software]. Natick, MA: The MathWorks, Inc.

Plonsky, O., Teodorescu, K., & Erev, I. (2015). Reliance on small samples, the wavy recency effect, and similarity-based learning. Psychological Review, 122(4), 621–647. doi:10.1037/a0039413

Rieskamp, J. (2008). The probabilistic nature of preferential choice. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(6), 1446–1465. doi:10.1037/a0013646

Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12(4), 573–604. doi:10.3758/BF03196750

Shteingart, H., Neiman, T., & Loewenstein, Y. (2013). The role of first impression in operant learning. Journal of Experimental Psychology: General, 142(2), 476–488. doi:10.1037/a0029550
Stevens, L. (2016, June 8). Survey shows rapid growth in online shopping. The Wall Street Journal. Retrieved from https://www.wsj.com/articles/survey-shows-rapid-growth-in-online-shopping-1465358582

Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.

Tversky, A., & Kahneman, D. (1986). Rational choice and the framing of decisions. Journal of Business, 59(S4), S251–S278. doi:10.1086/296365

Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4), 297–323. doi:10.1007/bf00122574

Tversky, A., & Fox, C. R. (1995). Weighing risk and uncertainty. Psychological Review, 102(2), 269–283. doi:10.1037/0033-295X.102.2.269
Appendix

Appendix A: Estimation Set (TPT)

Problem Set High P(High) Low Medium
1 Est -0.3 0.96 -2.1 -0.3
2 Est -0.9 0.95 -4.2 -1.0
3 Est -6.3 0.3 -15.2 -12.2
4 Est -10 0.2 -29.2 -25.6
5 Est -1.7 0.9 -3.9 -1.9
6 Est -6.3 0.99 -15.7 -6.4
7 Est -5.6 0.7 -20.2 -11.7
8 Est -0.7 0.1 -6.5 -6.0
9 Est -5.7 0.95 -16.3 -6.1
10 Est -1.5 0.92 -6.4 -1.8
11 Est -1.2 0.02 -12.3 -12.1
12 Est -5.4 0.94 -16.8 -6.4
13 Est -2.0 0.05 -10.4 -9.4
14 Est -8.8 0.6 -19.5 -15.5
15 Est -8.9 0.08 -26.3 -25.4
16 Est -7.1 0.07 -19.6 -18.7
17 Est -9.7 0.1 -24.7 -23.8
18 Est -4.0 0.2 -9.3 -8.1
19 Est -6.5 0.9 -17.5 -8.4
20 Est -4.3 0.6 -16.1 -4.5
21 Est 2.0 0.1 -5.7 -4.6
22 Est 9.6 0.91 -6.4 8.7
23 Est 7.3 0.8 -3.6 5.6
24 Est 9.2 0.05 -9.5 -7.5
25 Est 7.4 0.02 -6.6 -6.4
26 Est 6.4 0.05 -5.3 -4.9
27 Est 1.6 0.93 -8.3 1.2
28 Est 5.9 0.8 -0.8 4.6
29 Est 7.9 0.92 -2.3 7.0
30 Est 3.0 0.91 -7.7 1.4
31 Est 6.7 0.95 -1.8 6.4
32 Est 6.7 0.93 -5.0 5.6
33 Est 7.3 0.96 -8.5 6.8
34 Est 1.3 0.05 -4.3 -4.1
35 Est 3.0 0.93 -7.2 2.2
36 Est 5.0 0.08 -9.1 -7.9
37 Est 2.1 0.8 -8.4 1.3
38 Est 6.7 0.07 -6.2 -5.1
39 Est 7.4 0.3 -8.2 -6.9
40 Est 6.0 0.98 -1.3 5.9
41 Est 18.8 0.8 7.6 15.5
42 Est 17.9 0.92 7.2 17.1
43 Est 22.9 0.06 9.6 9.2
44 Est 10.0 0.96 1.7 9.9
45 Est 2.8 0.8 1.0 2.2
46 Est 17.1 0.1 6.9 8.0
47 Est 24.3 0.04 9.7 10.6
48 Est 18.2 0.98 6.9 18.1
49 Est 13.4 0.5 3.8 9.9
50 Est 5.8 0.04 2.7 2.8
51 Est 13.1 0.94 3.8 12.8
52 Est 3.5 0.09 0.1 0.5
53 Est 25.7 0.1 8.1 11.5
54 Est 16.5 0.01 6.9 7.0
55 Est 11.4 0.97 1.9 11.0
56 Est 26.5 0.94 8.3 25.2
57 Est 11.5 0.6 3.7 7.9
58 Est 20.8 0.99 8.9 20.7
59 Est 10.1 0.3 4.2 6.0
60 Est 8.0 0.92 0.8 7.7

Appendix B: Competition Set (TPT)

Problem Set High P(High) Low Medium
1 Comp -8.7 0.06 -22.8 -21.4
2 Comp -2.2 0.09 -9.6 -8.7
3 Comp -2.0 0.1 -11.2 -9.5
4 Comp -1.4 0.02 -9.1 -9.0
5 Comp -0.9 0.07 -4.8 -4.7
6 Comp -4.7 0.91 -18.1 -6.8
7 Comp -9.7 0.06 -24.8 -24.2
8 Comp -5.7 0.96 -20.6 -6.4
9 Comp -5.6 0.1 -19.4 -18.1
10 Comp -2.5 0.6 -5.5 -3.6
11 Comp -5.8 0.97 -16.4 -6.6
12 Comp -7.2 0.05 -16.1 -15.6
13 Comp -1.8 0.93 -6.7 -2.0
14 Comp -6.4 0.2 -22.4 -18.0
15 Comp -3.3 0.97 -10.5 -3.2
16 Comp -9.5 0.1 -24.5 -23.5
17 Comp -2.2 0.92 -11.5 -3.4
18 Comp -1.4 0.93 -4.7 -1.7
19 Comp -8.6 0.1 -26.5 -26.3
20 Comp -6.9 0.06 -20.5 -20.3
21 Comp 1.8 0.6 -4.1 1.7
22 Comp 9.0 0.97 -6.7 9.1
23 Comp 5.5 0.06 -3.4 -2.6
24 Comp 1.0 0.93 -7.1 0.6
25 Comp 3.0 0.2 -1.3 -0.1
26 Comp 8.9 0.1 -1.4 -0.9
27 Comp 9.4 0.95 -6.3 8.5
28 Comp 3.3 0.91 -3.5 2.7
29 Comp 5.0 0.4 -6.9 -3.8
30 Comp 2.1 0.06 -9.4 -8.4
31 Comp 0.9 0.2 -5.0 -5.3
32 Comp 9.9 0.05 -8.7 -7.6
33 Comp 7.7 0.02 -3.1 -3.0
34 Comp 2.5 0.96 -2.0 2.3
35 Comp 9.2 0.91 -0.7 8.2
36 Comp 2.9 0.98 -9.4 2.9
37 Comp 2.9 0.05 -6.5 -5.7
38 Comp 7.8 0.99 -9.3 7.6
39 Comp 6.5 0.8 -4.8 6.2
40 Comp 5.0 0.9 -3.8 4.1
41 Comp 20.1 0.95 6.5 19.6
42 Comp 5.2 0.5 1.4 5.1
43 Comp 12.0 0.5 2.4 9.0
44 Comp 20.7 0.9 9.1 19.8
45 Comp 8.4 0.07 1.2 1.6
46 Comp 22.6 0.4 7.2 12.4
47 Comp 23.4 0.93 7.6 22.1
48 Comp 17.2 0.09 5.0 5.9
49 Comp 18.9 0.9 6.7 17.7
50 Comp 12.8 0.04 4.7 4.9
51 Comp 19.1 0.03 4.8 5.2
52 Comp 12.3 0.91 1.3 12.1
53 Comp 6.8 0.9 3.0 6.7
54 Comp 22.6 0.3 9.2 11.0
55 Comp 6.4 0.09 0.5 1.5
56 Comp 15.3 0.06 5.9 7.1
57 Comp 5.3 0.9 1.5 4.7
58 Comp 21.9 0.5 8.1 12.6
59 Comp 27.5 0.7 9.2 21.9
60 Comp 4.4 0.2 0.7 1.1

Appendix C: SC Problems Set

Problem Set High P(High) Low Medium
1 SC Problems 4 0.8 0 3
2 SC Problems 4 0.2 0 3
3 SC Problems -3 1 0 -32
4 SC Problems -3 1 0 -4
5 SC Problems 32 0.1 0 3
6 SC Problems 32 0.025 0 3

Appendix D: CPT Models' Value and Weighting Functions
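For reference, the standard cumulative prospect theory forms of Tversky and Kahneman (1992), consistent with the α, β, λ, γ, and δ parameters reported in Table 1, are shown below; the exact variants plotted in the original appendix may differ in detail.

```latex
% Standard CPT value and weighting functions (Tversky & Kahneman, 1992); the
% specific forms used by the CPT (TK), CPT (Hau), and CPT (TPT) variants may
% differ in detail from this common parameterization.
v(x) =
\begin{cases}
x^{\alpha} & \text{if } x \ge 0\\[4pt]
-\lambda\,(-x)^{\beta} & \text{if } x < 0
\end{cases}
\qquad
w^{+}(p) = \frac{p^{\gamma}}{\left(p^{\gamma} + (1-p)^{\gamma}\right)^{1/\gamma}},
\qquad
w^{-}(p) = \frac{p^{\delta}}{\left(p^{\delta} + (1-p)^{\delta}\right)^{1/\delta}}
```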