Meta-Psychology, 2021, vol 5, MP.2020.2474, 
https://doi.org/10.15626/MP.2020.2474 
Article type: Replication Report 
Published under the CC-BY4.0 license 

 

Open data: Yes 
Open materials: Yes 
Open and reproducible analysis: Yes 
Open reviews and editorial process: Yes 
Preregistration: Yes 

 

Edited by:  Rickard Carlsson 
Reviewed by: Streamlined peer review 
Analysis reproduced by: Alexey Guzey 
All supplementary files can be accessed at the OSF project 
page: https://doi.org/10.17605/OSF.IO/DYNVT 
 

 
 

Frequency estimation and semantic ambiguity do not eliminate conjunction bias, when it occurs: Replication and extension of Mellers, Hertwig, and Kahneman (2001) 

Subramanya Prasad Chandrashekar1 
Lee Shau Kee School of Business and Administration, Hong Kong Metropolitan University, Hong Kong SAR 

Bo Ley Cheng 
Department of Psychology, University of Hong Kong, Hong Kong SAR 

Yat Hin Cheng1, Chi Long Fong1, Ying Chit Leung1, Yui Tung Wong1 
Department of Psychology, University of Hong Kong, Hong Kong SAR 

Gilad Feldman2 
Department of Psychology, University of Hong Kong, Hong Kong SAR

Mellers, Hertwig, and Kahneman (2001) conducted an adversarial collaboration to try to resolve Hertwig’s contested view that frequency formats eliminate conjunction effects, and that conjunction effects are largely due to semantic ambiguity. We conducted a pre-registered, well-powered, very close replication (N = 1032), testing two personality profiles (Linda and James) in a four-condition between-subjects design comparing unlikely and likely items to "and" and "and are" conjunctions. Findings for the Linda profile supported the conjunction effect and were consistent with Tversky and Kahneman’s (1983) arguments for a representativeness heuristic. We found no support for semantic ambiguity. Findings for the James profile were a likely failed replication, with no conjunction effect. We provide additional tests addressing possible reasons, in line with later literature suggesting that conjunction effects may be context-sensitive. We discuss implications for research on the conjunction effect, and call for further well-powered pre-registered replications and extensions of classic findings in judgment and decision-making. 

Keywords: conjunction effect, frequency estimation, replication, Linda problem, judgment and decision making 

 

1 Joint first authors 
2 Corresponding author 

The conjunction fallacy is one of the most well-known judgment errors in the judgment and decision making (JDM) literature. The fallacy consists of judging the conjunction of two events as more likely than either of the two constituent events, violating one of the most fundamental tenets of probability theory, which holds that the probability of a conjunction of two events can never be higher than the probability of either individual event.  
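
Stated formally, as a sketch in standard probability notation (the symbols A and B are introduced here only for illustration and stand for the two constituent events), the rule reads:

```latex
P(A \cap B) \le \min\{P(A),\, P(B)\}
```

In a frequency format, the same constraint says that, out of 100 people, the number who are both A and B cannot exceed the number who are A, nor the number who are B.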

Kahneman and colleagues initially reported the 
conjunction effect as a bias, and that resulted in an 
intense debate in the academic community (e.g., 




CHANDRASHEKAR ET AL.2021 

 

Fiedler, 1988; Gigerenzer, 1996, 2005; Hertwig & Chase, 1998; Hertwig & Gigerenzer, 1999). One opposing view, advanced by Hertwig and colleagues, held that the conjunction effect is not a fallacy at all but arises from semantic ambiguity, in that participants’ understanding of natural language words such as “probability” and “and” diverges from that of experimenters (e.g., Hertwig & Gigerenzer, 1999). Daniel Kahneman and Ralph Hertwig engaged in an adversarial collaboration in which Barbara Mellers served as arbiter. Together they examined whether the potential semantic ambiguity of the “and” conjunction could explain the conjunction effect reported in Kahneman and Tversky’s (1996) study. The article has been influential, with over 430 citations according to Google Scholar at the time of writing. 

Chosen study for replication: Outline of Mellers et al. (2001)  

Mellers et al. (2001) examined frequency estimates for personality sketches. They tested two personality sketches in three experiments, one about Linda and the other about James. For example, the Linda story read as follows: 

 
Linda is 31 years old, single, outspoken, and very 
bright. She majored in philosophy. As a student, 
she was deeply concerned with issues of discrim-
ination and social justice, and also participated in 
anti-nuclear demonstrations. 
 
Participants read the scenario and estimated how many of 100 people like Linda fit a particular target description. The target descriptions varied between experimental conditions: likely (feminists), unlikely (bank tellers), semantic “and” (bank tellers and feminists), and semantic “and are” (bank tellers and are feminists). Kahneman argued that the conjunction effect would occur even when frequency estimation was used, as reflected in average frequency estimates for the conjunction conditions “and” and “and are” that are higher than for the unlikely item condition. Hertwig proposed that the conjunction phrase “bank tellers and are feminists” would not yield support for conjunction effects. The results for the Linda scenario supported Kahneman’s prediction in two of the three experiments conducted as part of the adversarial collaboration, whereas for the James scenario only one experiment supported the prediction. 

We summarized findings in the original article in 
Table 1. The divergence of findings reported across 
the three experiments made it hard for readers to 
assess the overall effect size, and we, therefore, con-
ducted a mini meta-analysis summary of their ef-
fects across experiments, summarized in Table 2. 

The need for replication 

Since the first demonstration of the conjunction effect, there have been attempts to develop a theory to explain the phenomenon. Semantic ambiguity remains the strongest counterargument to the demonstration of conjunction effects. With the recent growing recognition of the importance of reproducibility and replicability in psychological science (e.g., Brandt et al., 2014; Open Science Collaboration, 2015; van ’t Veer & Giner-Sorolla, 2016; Zwaan, Etz, Lucas, & Donnellan, 2018), we felt it was important to establish the replicability of the findings reported in Mellers et al. (2001). 

We therefore embarked on a well-powered, pre-registered, very close replication of Mellers et al. (2001), employing current psychological science methods that allow testing for both the presence and the possible absence of an effect.  

Present investigation 

We had several goals. First, we set out to revisit the original experimental design and assess the replicability of the original findings. With power analyses and higher power, we aimed to detect weak effects that may not have been detectable in the original study. Second, we complemented the traditional analyses in the original article with equivalence tests and Bayesian analyses, which also allow quantifying evidence in support of the null hypothesis. Third, we added extensions to further examine lay perceptions of the provided statistical information that may explain some of the differences found in the original findings. 




 

Table 1 

Summary of findings in Mellers et al. (2001) Experiments 1 to 3 and the replication 

Linda story 
Condition        Target         Exp1         Exp2         Exp3         Replication 
Likely target    Feminists      58.1 (2.4)   47.7 (3.4)   47.9 (4.5)   58.43 (1.79) 
Unlikely target  Bank tellers   24.6 (1.9)   21.4 (2.0)   14.3 (2.9)   9.87 (0.88) 
“and”            “and”          39.9 (2.0)   30.4 (2.3)   26.4 (3.9)   18.8 (1.36) 
“and are”        “and are”      40.2 (2.7)   21.8 (2.1)   22.8 (2.7)   19.55 (1.48) 

James story 
Condition        Target         Exp1         Exp2         Exp3         Replication 
Likely target    Artists        41.0 (2.7)   45.1 (2.6)   47.1 (3.3)   36.2 (1.62) 
Unlikely target  Republicans    28.9 (2.1)   19.8 (1.8)   12.7 (2.6)   18.38 (1.18) 
“and”            “and”          33.1 (1.8)   42.7 (2.4)   22.9 (3.4)   15.19 (1.15) 
“and are”        “and are”      32.0 (2.5)   20.0 (1.9)   21.4 (2.7)   15.55 (1.09) 

Note. Exp1/Exp2/Exp3 = Experiment 1, 2, and 3. Standard errors are in parentheses. Boldface indicates significant results, p < .05. 

Table 2 

Summary of findings of the original study versus replication 

Comparison                          Original d [95% CI]   Replication t (one-sided)       Replication d [95% CI]   Replication summary 

Linda story 
“and” vs. unlikely target           0.59 [0.36, 0.82]     t(431.26) = 5.51, p < .001      0.49 [0.31, 0.67]        Signal - consistent 
“and are” vs. unlikely target       0.38 [-0.02, 0.77]    t(419.21) = 5.63, p < .001      0.50 [0.32, 0.67]        Signal - consistent 
“and” vs. “and are”                 0.18 [-0.09, 0.45]    t(505.55) = -0.37, p = .646     -0.03 [-0.21, 0.14]      No signal - inconsistent (opposite) 

James story 
“and” vs. unlikely target           0.62 [0.08, 1.15]     t(507.82) = -1.93, p = .973     -0.17 [-0.35, 0.00]      Signal - inconsistent (opposite) 
“and are” vs. unlikely target       0.17 [-0.07, 0.41]    t(510.69) = -1.76, p = .960     -0.15 [-0.33, 0.02]      No signal - inconsistent (opposite) 
“and” vs. “and are”                 0.41 [-0.26, 1.08]    t(506.05) = -0.23, p = .591     -0.02 [-0.19, 0.15]      No signal - inconsistent (opposite) 

Note. d = Cohen's d. The Linda story can be considered a successful replication. The James story is a likely failed replication. In addition, there was no support found for semantic ambiguity (comparing "and" and "and are"). In the original article, effect sizes (ES) were not reported; we computed Cohen’s d and confidence intervals based on the mean estimates and standard errors of the mean estimates of the outcome variables of the original study (see full tables in supplementary). The effect sizes of the original study presented in the table are based on the mini meta-analysis of Experiments 1, 2, and 3 of Mellers et al. (2001), as this offers the closest basis for direct comparison in the replication summary. The replication summary is based directly on the LeBel et al. (2019) categories; see details in "evaluation criteria for replication design and findings".  




 

Context: Large replication effort of judgment and decision-making findings 

The current replication was part of a large-scale pre-registered replication project aiming to revisit well-known research findings in the area of judgment and decision making (JDM) and to examine the reproducibility and replicability of these findings. In this project, all replications are conducted by students in undergraduate courses and in undergraduate and master's guided theses at the University of Hong Kong psychology department. Four students in two separate courses were randomly assigned to the current replication. Working independently, the students conducted an in-depth analysis of the target article, wrote pre-registrations with power analyses, analyzed the collected data, and then wrote manuscripts for journal submission. Within each student pair, students peer reviewed one another's work to optimize design and analysis. A teaching assistant (6th author) and the corresponding author supervised and gave feedback at each step of the replication process. The corresponding author conducted all pre-registrations on the OSF and online data collection. More information on the process is provided in the supplementary, and further details and updates on this project can be found at https://osf.io/5z4a8/ (CORE, 2020).   

Method 

Pre-registration, power analysis, and open-science 

We pre-registered the experiment on the Open Science Framework (OSF), and data collection was launched later that week. The pre-registration with power analyses and all materials used in the study are available in the supplementary materials. All measures, manipulations, and exclusions are reported, and data collection was completed before analyses. OSF pre-registration review link for the study: https://osf.io/gb7pk. Data and R/RMarkdown code (R Core Team, 2015) are available on the OSF: https://osf.io/6v8e2/. Full open-science details and disclosures are provided in the supplementary. Please note that the pre-registration crowdsourcing process involved four students who worked independently to analyze the original article, document hypotheses and tests in the original study, propose analyses for testing predictions, calculate original effects, conduct a power analysis, and propose extensions. We note the differences and similarities across the four pre-registration documents in the supplementary materials (for details see Tables S12-S14), and our analyses followed the combination of all of them. 

We aimed to detect a smallest effect size of d = 0.20 at a power of 0.80 in a one-tail comparison of two conditions, even though the effects reported in the target article were much larger. This was meant to allow us to detect effects not found in the target article for one of the two scenarios (details below).  

Participants 

A total of 1032 participants were recruited online through American Amazon Mechanical Turk (MTurk) using the TurkPrime.com platform (Litman, Robinson, & Abberbock, 2017) (Mage = 38.77, SDage = 12.07; 550 females). Based on the exclusion criteria recorded in the pre-registration, we identified four responses for exclusion due to self-reported lack of seriousness or low English proficiency. These exclusions had no impact on the findings, so our main report focuses on the full sample. 

Procedure 

Participants were randomly assigned to one of 
the four experimental conditions (likely, unlikely, 
"and", and "and are"). All participants read two per-
sonality profiles, one of Linda and the other of 
James, exactly as in the original study. Each profile 
consisted of one short description of a character, 
and frequency estimation questions.  

All descriptions and questions were taken from the original article (Mellers et al., 2001). The presentation order of the two profiles was randomized.  
The Linda profile description was as follows:  

 
Linda is 31 years old, single, outspoken, and very 
bright. She majored in philosophy. As a student, 
she was deeply concerned with issues of discrim-
ination and social justice, and also participated in 
anti-nuclear demonstrations. 
 
Of 100 people like Linda, how many are [likely: 
feminists?] [unlikely: bank tellers?] ["and": bank 
tellers and feminists?] [“and are”: bank tellers and 
are feminists?] 
 
 

 




The James profile description was as follows: 

James grew up in a Bohemian family. His father 
was a musician, and his mother was a painter. 
They lived together for 40 years and never got 
married. James was a very talented child with a 
special gift for comedy, but he turned into a re-
bellious troublemaker in his youth. He dropped 
out of college after two years and traveled to Asia 
to learn crafts. James is now 35 years old.  
 
Of 100 people like James, how many are [likely: 
artists?] [unlikely: Republicans?] [“and”: Republicans 
and artists?] [“and are”: Republicans and are 
artists?] 
 
Participants answered questions based on two scenarios, one for Linda and one for James, according to their randomly assigned condition (indicated in brackets in the scenarios above). The dependent variable was the estimated frequency of the described personality in the scenario, measured on a scale from 1 to 100. The supplementary details the experimental instructions, scenarios, and response variables. 

Extension 

Following the replication materials, participants 
proceeded to the next page and answered six addi-
tional questions. Depending on their assigned con-
dition participants were asked to estimate the per-
centage of people, females, and males in the United 
States that match the target item (likely, unlikely, 
"and", "and are"), and they did so for both profiles. 
For example, participants in the likely condition es-
timated the percentage of people, females, and 
males in the United States that are 1) feminists, 2) 
artists.  

We had several aims with this extension: 1) assess 
whether the conjunction effect would show for the 
generalized population without the specific descrip-
tions of James and Linda, and 2) examine possible 
gender differences in the estimations of the items 
used in the James and Linda descriptions.  

Data analysis plan 

Our analyses matched the original article's hy-
potheses, as follows:  

 
Hypothesis 1: The frequency estimate for the “and” conjunction phrase will be higher than the phrase describing unlikely target alone. 

Two sets of competing hypotheses suggested by Hertwig and Kahneman: 
 
Hypothesis 2a: The frequency estimate for the “and are” conjunction phrase will be higher than the phrase describing unlikely target alone. 
 
Hypothesis 2b: The frequency estimate for the “and are” conjunction phrase will not be higher than the phrase describing unlikely target alone. 
 
Hypothesis 3a: The frequency estimate for the “and are” conjunction phrase will be lower than the frequency estimate for “and” conjunction phrase. 
 
Hypothesis 3b: The frequency estimate for the “and are” conjunction phrase will not be lower than the frequency estimate for “and” conjunction phrase. 
 
A comparison of the three experiments in the original article and the current replication is provided in Table S4 of the Supplementary Materials. In Table S5, we briefly note the reasons for the chosen differences between the original studies and the replication attempt. In the replication attempt, we did not include filler items, because when filler items are present, responses are inherently comparative, and this may drive the observed conjunction effect (Hertwig & Chase, 1998). Consistent with this view, Experiments 1 and 3 of the original article, which included filler items, found support for the conjunction effect for both the “and” and the “and are” conjunction phrases. Given the possibility that comparative and non-comparative responses involve different psychological processes, we excluded filler items. This allows a test of the competing predictions of Kahneman and Hertwig, which were theorized to be essentially non-comparative in nature. More importantly, it keeps the focus on the main question of whether conjunction effects are driven by the semantic ambiguity of the natural language term “and” in a frequency representation. 

We chose to focus on “and” and “and are” as the conjunction phrases and to implement a between-subjects design, which allows a clearer test of the competing predictions of Kahneman and Hertwig. For instance, Hertwig argued that the frequency judgments are possibly driven by the understanding of “and” as a union operator, and that the use of a more restrictive “and are” phrase would take away the conjunction effect. Kahneman argued that judgments were driven by a match between a personality description and the prototype of a category; therefore, both “and” and “and are” phrases would likely yield conjunction effects.  

Following the analyses in the target article, we first conducted one-tail Welch independent-samples t-tests (based on the recommendations of Delacre, Lakens, & Leys, 2017), a null-hypothesis significance testing (NHST) method. When NHST analyses were non-significant, we complemented them with equivalence testing, to compare effects against the minimal effect considered meaningful (TOSTER package; Lakens, 2017; Lakens, Scheel, & Isager, 2018), and with Bayesian analyses, to quantify support for the null hypothesis given a prior (Kruschke & Liddell, 2018; Vandekerckhove, Rouder, & Kruschke, 2018), using the BayesFactor R package (Version 0.9.12-4.2; Morey & Rouder, 2015). These were minor adjustments to the pre-registered data analysis plan, summarized in Table S6. 

Evaluation criteria for replication design and find-
ings 

Table S7 provides a classification of the replication using the criteria by LeBel, McCarthy, Earp, Elson, and Vanpaemel (2018; see Figure S2). We summarize the current replication as a "very close replication". 

To interpret the replication results we followed the framework by LeBel, Vanpaemel, Cheung, and Campbell (2019). They suggested a replication evaluation using three factors: (a) whether a signal was detected (i.e., the confidence interval for the replication effect size (ES) excludes zero), (b) consistency of the replication ES with the original study’s ES, and (c) precision of the replication’s ES estimate (see Figure S1). 
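
The first two factors can be sketched in a few lines of Python. This is a simplified illustration rather than the procedure applied for Table 2: the function name `classify_replication` is hypothetical, and the rule that "consistent" means the replication confidence interval contains the original point estimate is an assumption made here about how the framework is operationalized.

```python
def classify_replication(original_es, rep_es, rep_ci_low, rep_ci_high):
    """Simplified three-part label in the style of the Table 2 summaries.

    Assumptions (for illustration only):
    - "signal": the replication CI excludes zero;
    - "consistent": the replication CI contains the original point estimate;
    - "(opposite)": added when an inconsistent replication estimate has
      the opposite sign from the original estimate.
    """
    signal = rep_ci_low > 0 or rep_ci_high < 0
    consistent = rep_ci_low <= original_es <= rep_ci_high
    label = "Signal" if signal else "No signal"
    label += " - consistent" if consistent else " - inconsistent"
    if not consistent and original_es * rep_es < 0:
        label += " (opposite)"
    return label


# Linda rows of Table 2 (Cohen's d values as reported there).
linda_and = classify_replication(0.59, 0.49, 0.31, 0.67)
linda_semantic = classify_replication(0.18, -0.03, -0.21, 0.14)
print(linda_and)       # Signal - consistent
print(linda_semantic)  # No signal - inconsistent (opposite)
```

Rows whose interval bound is reported as 0.00 after rounding cannot be classified reliably from the rounded values; the unrounded interval would be needed.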

Results 

Descriptive statistics are detailed in Table 1 and 
statistical tests and effect-size findings are summa-
rized in Table 2. 

Conjunction effects 

We first looked for the conjunction effect for each profile by comparing frequency estimates for both the “and” and “and are” conditions with the "unlikely" condition. In the Linda scenario, frequency estimates in the “and” condition (n = 252, M = 18.80, SD = 21.62) were greater than in the “unlikely” condition (n = 258, M = 9.87, SD = 14.15; Md = 8.93, t(431.26) = 5.51, p < .001, ds = 0.49, 95% CI [0.31, 0.67]; see Figure 1). Similarly, frequency estimates in the “and are” condition (n = 258, M = 19.55, SD = 23.74) were greater than in the "unlikely" condition (n = 258, M = 9.87, SD = 14.15; Md = 9.69, t(419.21) = 5.63, p < .001, ds = 0.50, 95% CI [0.32, 0.67]). Thus, results lend support to H1 and H2a in the Linda scenario. 
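
As a rough check on how these statistics relate to the descriptive values, the Welch t statistic, its degrees of freedom, and Cohen's d can be recomputed from the rounded summary statistics. The sketch below is written in Python for illustration (the reported analyses used R); the function names are hypothetical, and the use of the pooled standard deviation as the standardizer for d is an assumption.

```python
import math


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d standardized by the pooled standard deviation (assumed)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


# Linda scenario, "and" versus "unlikely": (mean, SD, n), rounded values.
and_cond = (18.80, 21.62, 252)
unlikely = (9.87, 14.15, 258)

t, df = welch_t(*and_cond, *unlikely)
d = cohens_d(*and_cond, *unlikely)
print(f"t({df:.2f}) = {t:.2f}, d = {d:.2f}")
```

With these rounded inputs the sketch gives values close to the reported t(431.26) = 5.51 and ds = 0.49; small differences in the last digits are expected because the inputs are rounded.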

However, the pattern across conditions differed for the James scenario (see summary plot in Figure 1; “and” condition: n = 252, M = 15.19, SD = 18.24; “unlikely” condition: n = 258, M = 18.38, SD = 19.03; “and are” condition: n = 258, M = 15.55, SD = 17.55). The "and" versus "unlikely" contrast (Md = -3.19, t(507.82) = -1.93, p = .973; ds = -0.17, 95% CI [-0.35, 0.00]) shows that frequency estimates in the “and” condition were lower than in the “unlikely” condition, although the difference was not statistically significant. Therefore, the results of the James scenario failed to support H1. Similarly, the contrast between the "and are" and “unlikely” conditions (Md = -2.83, t(510.69) = -1.76, p = .960; ds = -0.15, 95% CI [-0.33, 0.02]) shows that frequency estimates in the “and are” condition were lower than in the “unlikely” condition, though the effect was weak and not statistically significant. In essence, the results support H2b. 

Semantic ambiguity? 

To examine whether the semantically ambiguous word “and” had an effect on participants’ judgment, we conducted a one-tail Welch t-test comparing frequency estimates of the “and” and “and are” conditions for each of the personality scenarios. Contrary to H3a and in line with H3b, we found no support for differences for the Linda profile (Md = -0.75, t(505.55) = -0.37, p = .646, ds = -0.03, 95% CI [-0.21, 0.14]) or for the James profile (Md = -0.36, t(506.05) = -0.23, p = .591, ds = -0.02, 95% CI [-0.19, 0.15]). 

Figure 1  
Linda and James profiles: violin plots for expected frequency of target item. 

[Panels: Linda profile; James profile] 

Note. Boxes represent the interquartile range of the distribution, with the notch in the middle representing the mean. The width of the violin plots represents the density of the data at each value, with wider sections indicating higher density. Note that the p-values for the contrast effects in the plots are for two-tail tests, unlike the one-tail tests reported in the text. Plots were generated using the ggstatsplot R package (Patil, 2018). 

Next, we conducted an equivalence test of the semantic ambiguity effect. Based on Simonsohn’s (2015) recommendation for replication studies, we calculated the smallest effect size of interest (SESOI) as the effect that Mellers et al.’s experiment could have detected with a power of 33%. We chose Experiment 2 as the reference for the equivalence test because of one important similarity between Experiment 2 and the current replication: neither study included filler items. With an N of 96 in each condition, Mellers et al. (2001) had 33% power to detect an effect size of d = 0.22. We used this value as the equivalence bound for the study (SESOI set to d = 0.22). Equivalence tests for both the Linda story (t(505.55) = -2.11, p = .018) and the James story (t(506.05) = -2.25, p = .012) indicated support for the null, with effects meaningfully smaller than the SESOI. 
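
The two steps above, deriving the bound and testing against it, can be approximated from the summary statistics. The sketch below is in Python for illustration and is not the TOSTER code used for the reported tests. Its function names are hypothetical. It assumes a normal approximation to a two-sided test at alpha = .05 for the 33% power calculation, and it assumes that the standardized bound is converted to raw units using the root mean of the two group variances.

```python
import math
from statistics import NormalDist


def d33(n_per_group, alpha=0.05, power=0.33):
    """Effect size detectable with the given power in a two-group design.

    Normal approximation to a two-sided test; the negligible probability
    of rejecting in the wrong tail is ignored.
    """
    z = NormalDist()
    delta = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return delta * math.sqrt(2 / n_per_group)


def tost_welch(m1, sd1, n1, m2, sd2, n2, bound_d):
    """Welch-type t statistics against the lower and upper equivalence bounds."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    # Assumed standardizer: root mean of the two variances.
    raw_bound = bound_d * math.sqrt((sd1**2 + sd2**2) / 2)
    diff = m1 - m2
    t_lower = (diff + raw_bound) / se  # tests H0: diff <= -bound
    t_upper = (diff - raw_bound) / se  # tests H0: diff >= +bound
    return t_lower, t_upper


sesoi = d33(96)
# Linda scenario: "and are" minus "and", rounded values from the text.
t_lower, t_upper = tost_welch(19.55, 23.74, 258, 18.80, 21.62, 252, bound_d=0.22)
print(f"d33 = {sesoi:.2f}, t_lower = {t_lower:.2f}, t_upper = {t_upper:.2f}")
```

Equivalence is supported only when both one-sided tests reject, so the statistic that matters is the less extreme of the two. Under these assumptions it is the upper-bound test, which comes out close to the reported t(505.55) = -2.11 for the Linda story.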

Furthermore, we conducted one-tail Bayesian t-tests with a prior scale set at 0.707 and a null region of (0, ∞), such that results against the null (i.e., against mu = 0) would quantify support for the semantic ambiguity hypothesis suggested by Hertwig and colleagues. For the Linda profile, we found BF10 = 0.08 (or BF01 = 13.32), which indicates that, given the data, the null hypothesis is about 13 times more likely than the one-sided alternative. Similarly, for the James profile, BF10 = 0.08 (or BF01 = 12.06), which indicates that, given the data, the null hypothesis is about 12 times more likely than the one-sided alternative. 

Additional analyses 

The James profile may have been less representative of an artist than the Linda profile was of a feminist. To test this, we compared the average frequency estimates for the James and Linda stories within the “likely” experimental condition, in which participants rated the extent to which Linda and James were representative of a feminist and an artist, respectively. Frequency estimates in the “likely” condition for the Linda profile ("feminists", n = 260, M = 58.43, SD = 28.93) were greater than for the James profile ("artists", M = 36.20, SD = 26.08; Md = 22.22, t(259) = 11.99, p < .001, ds = 0.81, 95% CI [0.61, 0.88]). In contrast, a similar comparison between the Linda and James stories within the “unlikely” condition shows that the frequency estimate for Linda ("bank tellers", n = 258, M = 9.87, SD = 14.15) was lower than for James ("Republicans", M = 18.38, SD = 19.03; Md = -8.52, t(257) = -6.87, p < .001, d = -0.50, 95% CI [-0.56, -0.30]). This pattern of differences between Linda and James across the “likely” and “unlikely” conditions is consistent with previous work showing that the occurrence of conjunction effects depends on the probabilities of A (e.g., Linda is a bank teller) and B (e.g., Linda is active in the feminist movement). In particular, a conjunction effect is more likely when people perceive the probability of the less probable constituent, P(A), as low and P(B) as high, compared to cases where P(A) and P(B) are both low or both high (Fisk & Pidgeon, 1996; Wells, 1985).  

The study included additional variables that mirrored the outcome variables but asked participants to rate the percentage of males and females in the population that fit the description. For example, participants in the “and” condition, after reading the Linda story, answered “Try and estimate, what percentage of females in the U.S. are Bank Tellers and Feminists?”, and after reading the James story answered “Try and estimate, what percentage of males in the U.S. are Republicans and Artists?”. We examined the contrasts between the outcome variables and these additional variables across experimental conditions to ascertain whether ratings on the outcome variable were driven by the profile description rather than by the name Linda being female and, similarly, the name James being male. For the Linda story, across three experimental conditions, Linda was rated higher on the outcome variable than the percentage of females in society (likely condition: Md = 15.31; t(259) = 8.67, p < .001; d = 0.58, 95% CI [0.41, 0.67]; “and” condition: Md = 6.43; t(251) = 4.75, p < .001; d = 0.32, 95% CI [0.17, 0.43]; “and are” condition: Md = 5.79; t(257) = 3.98, p < .001; d = 0.27, 95% CI [0.12, 0.37]). Similarly, for the James story, across conditions James was rated higher on the outcome variable than the percentage of males in society (likely condition: Md = 19.10; t(259) = 11.15, p < .001; d = 0.87, 95% CI [0.56, 0.83]; “and” condition: Md = 3.81; t(251) = 3.36, p = .001; d = 0.23, 95% CI [0.09, 0.34]; “and are” condition: Md = 2.58; t(257) = 2.39, p = .018; d = 0.15, 95% CI [0.03, 0.27]). 

Summary of replication findings 

The evaluation of the replication findings is summarized in Table 2. Our replication for the Linda profile supported the confirmatory predictions based on the conjunction effect, whereas the results for the James profile were inconsistent with them. Importantly, the original study reported that the frequency estimate for the “and” condition was higher than for the unlikely condition. This prediction forms the basis for testing the absence or presence of semantic ambiguity in predicting conjunction effects. The replication results for this prediction are in the opposite direction, i.e., we found that frequency estimates were lower for the “and” condition than for the unlikely condition. Therefore, the results of the James scenario are inconclusive in teasing apart the semantic ambiguity associated with the “and” conjunction term. 




Extension 

Descriptive results for the extension are pro-
vided in Table S8, and plots are provided in Figures 
S3 to S6. 

We first tested whether the conjunction effect occurred for any of the three items (people, males, females; within-subject design) for each of the profiles (Linda and James; between design) and their assigned condition (likely, unlikely, "and", "and are"). As expected, we found no support for a conjunction effect for general population females with the Linda profile items (feminist and bank teller) yet without the Linda description. Similarly, we found no effect for general population males with the James profile items (Republicans and artist) yet without the James description. These findings should be interpreted with caution, yet they support the view that the conjunction effect demonstrated with the Linda and James problems is affected by the descriptions of Linda and James in a way that makes conjunction items more salient than the unlikely item. That is, the conjunction effect may depend on the representativeness heuristic (Tversky & Kahneman, 1982) and the preceding profile description. 

Yet, we found support for a conjunction effect for 
the Linda items for the estimation of people overall 
(feminist: M = 29.36, SD = 17.13; bank teller: M = 8.56, 
SD = 12.2; "and": M = 11.01, SD = 14.01). It remains to 
be explored why there would be support for a con-
junction effect for evaluation of people overall, but 
not for females or males, yet it does point out that 
the conjunction effect may sometimes occur with-
out the representativeness heuristic description, 
and with a within-subject design. At the very least, 
this suggests that the conjunction effect is context-
sensitive, as is also indicated in the differences in ef-
fects we found between the Linda and the James 
problem. 

 There were also patterns indicating statistical 
inconsistencies in participants' estimates. Given a 
population gender split of 50%-50% for 
females-males, the means participants reported for 
the general population were far from the average of 
their estimates for females and for males (e.g., 
people who are bank tellers: M = 8.56, SD = 12.2; 
females who are bank tellers: M = 21.46, SD = 28.64; 
males who are bank tellers: M = 9.93, SD = 15.40). 
This is despite the within-subject design and the 
three questions being presented together. If 
participants indeed understood these questions 
correctly, this may indicate that they produced each 
estimate separately, irrespective of the context or 
priors, and/or that they were unable to process or 
report percentages.  
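
The size of this inconsistency can be illustrated with simple arithmetic. The sketch below is a hedged illustration in Python, not part of the reported analyses: it assumes the 50%-50% gender split mentioned above and computes the general-population value implied by the reported subgroup means. The function name `expected_overall` and its `share_female` parameter are hypothetical, and because the computation uses group means rather than individual responses, it is descriptive only.

```python
def expected_overall(pct_female: float, pct_male: float,
                     share_female: float = 0.5) -> float:
    """Population percentage implied by subgroup percentages.

    Weighted average of the two subgroups (law of total probability).
    The default share of 0.5 is the 50%-50% split assumed in the text.
    """
    return share_female * pct_female + (1 - share_female) * pct_male


# Mean percentage estimates for "bank tellers", as reported in the text.
females_bank_tellers = 21.46
males_bank_tellers = 9.93
reported_overall = 8.56

implied = expected_overall(females_bank_tellers, males_bank_tellers)
print(round(implied, 1))  # 15.7

# The reported overall mean is lower than both subgroup means, so no
# gender split between 0% and 100% could reconcile the three values.
print(reported_overall < min(females_bank_tellers, males_bank_tellers))  # True
```

Under these assumptions the coherent overall estimate would be about 15.7%, nearly double the reported mean of 8.56%.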

Further findings regarding gender effects for the 
items in the two profiles are provided in Tables S10 
and S11. 

Discussion 

We conducted a preregistered, well-powered 
replication of the main design across the three 
studies of Mellers et al. (2001).  

Our findings regarding the Linda profile 
demonstrate support for conjunction effects for both 
“and” and “and are” connectors. The findings for the 
Linda scenario do not support the alternative view 
that the conjunction effects observed in the Linda 
story reflect participants’ semantic interpretation 
of the term “and” as a union rather than an 
intersection. The semantic ambiguity argument 
predicts that the “and are” condition would fail to 
show conjunction effects, and that participants’ 
frequency estimates in the “and are” condition would 
be lower than in the “and” condition. We therefore 
tested, for the Linda story, whether the frequency 
estimates in the “and are” condition were lower than 
in the “and” condition. Equivalence testing and 
Bayesian analyses indicated support for null 
differences. These findings support the Kahneman 
view of conjunction effects with frequency 
estimates.  

 Our findings for the James profile supported 
neither the Kahneman nor the Hertwig hypotheses, 
and were not consistent with previous findings. 
First, the comparison between the “and” and unlikely 
conditions did not support a conjunction effect. 
Second, we found no support for a difference in 
frequency estimates between the “and are” and 
unlikely conditions. Further, as with the Linda 
story, the planned comparison testing whether the 
frequency estimates in the “and are” condition were 
lower than in the “and” condition indicated that the 
difference between conditions was statistically 
equivalent to zero. The failure to find empirical 
support for conjunction effects with the James story 
suggests that conjunction effects are context 
specific. Conjunction effects are commonly 
demonstrated using the Linda profile, yet the 
findings regarding other scenarios are less clear 
(Costello & Watts, 2017). Thus, it is quite possible 
that the James and Linda scenarios are qualitatively 
different.  




 

A closer examination of the original findings 
showed that the effects for the James scenario 
varied considerably across the experiments, from 
weak effects in Experiment 1 ("and" and unlikely: 
d = 0.21; "and are" and unlikely: d = 0.13), with no 
indication of semantic ambiguity (d = 0.05), to 
mixed effects in Experiment 2 ("and" and unlikely: 
d = 1.11; "and are" and unlikely: d = 0.01), 
indicating a strong semantic ambiguity effect 
(d = 1.08). The mini meta-analytic effect we computed 
for the three original studies seemed to indicate 
differences in effect size between the Linda and the 
James scenarios, especially with regard to semantic 
ambiguity.  

Additional analyses we conducted suggested that 
the personality sketch of James was less 
representative of an artist than Linda’s personality 
sketch was of a feminist. The observed difference is 
consistent with Kahneman’s argument that conjunction 
effects arise through the substitution of 
representativeness estimates for probability 
estimates. This may be one of the reasons why the 
current study did not find support for a conjunction 
effect for the James story even when the comparison 
was between the unlikely and the “and” conditions, a 
comparison that was supported in Studies 2 and 3 of 
the original paper.  

The current replication effort supports Tversky 
and Kahneman’s (1983) assertion that conjunction 
effects, when they occur, are a probabilistic error 
due to the representativeness and availability 
heuristics. More precisely, the results of the 
current study for the Linda story support the view 
that frequency estimates do produce conjunction 
effects, that these effects rely on judgmental 
heuristics, and that they are not driven by semantic 
ambiguity of the conjunction terms. The results for 
the James profile ranged from inconclusive to a 
likely failure to replicate. 

Overall, we found some support for conjunction 
effects, but these may be less robust than initially 
expected. These findings indicate the importance of 
conducting further well-powered preregistered 
replications and extensions that revisit classic 
experiments in this domain, to gain deeper insight 
into the effect and to investigate the reliability 
and generalizability of previous findings and the 
contextual variations of the conjunction effect. 

Author Contact  

Subramanya Prasad Chandrashekar, 
spchandr@ouhk.edu.hk, orcid.org/0000-0002-
8599-9241 

Correspondence about this article should be ad-
dressed to Gilad Feldman at gfeldman@hku.hk. 

Conflict of Interest and Funding 

This research was supported by the European 
Association for Social Psychology seedcorn grant. 

 Subramanya Prasad Chandrashekar would like to 
thank the Institute of International Business and 
Governance (IIBG), established with the substantial 
support of a grant from the Research Grants Council 
of the Hong Kong Special Administrative Region, 
China (UGC/IDS 16/17), for its support. 

Author Contributions 

Gilad Feldman (GF) was the course instructor for 
two social psychology courses (PSYC2071/3052) and 
led the two reported replication efforts in these 
courses. GF supervised each step in the project, 
conducted the pre-registrations, and ran data col-
lection. Subramanya Prasad Chandrashekar (SPC) 
integrated the two replication efforts into a manu-
script with validation and further extensions of the 
statistical analyses. GF and SPC jointly finalized the 
manuscript for submission. 

 Yat Hin Cheng and Chi Long Fong worked on the 
replication as part of the Judgment and Decision 
Making course (identified as Students PSYC2071 in 
the table below). Ying Chit Leung and Yui Tung 
Wong worked on the replication as part of the ad-
vanced social psychology course (identified as Stu-
dents PSYC3052 in the table below). 

 
 




 

Contributor Roles Taxonomy 

In the table below, we employ CRediT (Contributor 
Roles Taxonomy) to identify the contributions and 
roles of the contributors in the current 
replication effort. Please refer to the URL 
(https://www.casrai.org/credit.html) for details 
and definitions of each of the roles listed below. 

Role                                        SPC   GF    Students PSYC2071   Students PSYC3052   TA
Conceptualization                           -     X     -                   -                   -
Pre-registrations                           -     X     X                   X                   -
Data curation                               -     X     -                   -                   -
Formal analysis                             X     X     X                   X                   -
Funding acquisition                         -     X     -                   -                   -
Investigation                               -     X     X                   X                   -
Methodology                                 -     X     -                   X                   -
Pre-registration peer review/verification   X     -     X                   X                   X
Data analysis peer review/verification      X     -     X                   X                   -
Project administration                      -     X     -                   -                   X
Resources                                   -     X     -                   -                   -
Software                                    X     X     X                   X                   -
Supervision                                 -     -     -                   -                   X
Validation                                  X     X     -                   -                   -
Visualization                               X     -     -                   -                   -
Writing-original draft                      X     X     -                   -                   -
Writing-review and editing                  X     X     -                   -                   -




 

Open Science Practices 

   
 
This article earned the Preregistration+, Open 
Data, and Open Materials badges for preregistering 
the hypothesis and analysis before data collection, 
and for making the data and materials openly 
available. It has been verified that the analysis 
reproduced the results presented in the article. The 
editorial process for this article relied on 
streamlined peer review, where peer reviews 
obtained from previous journal(s) were moved 
forward and used as the basis for the editorial 
decision. These reviews are shared in the 
supplementary files, in the authors' cover letter. The 
identities of the reviewers are shown or hidden in 
accordance with the policy of the journal that 
originally obtained them. The entire editorial 
process is published in the online supplement. 

References 

Brandt, M. J., IJzerman, H., Dijksterhuis, A., Farach, 
F. J., Geller, J., Giner-Sorolla, R., ... & Van't Veer, 
A. (2014). The replication recipe: What makes 
for a convincing replication? Journal of 
Experimental Social Psychology, 50, 217-224. 
DOI:https://doi.org/10.1016/j.jesp.2013.10.005 

Delacre, M., Lakens, D., & Leys, C. (2017). Why Psy-
chologists Should by Default Use Welch’s t-test 
Instead of Student’s t-test. International Re-
view of Social Psychology, 30, 92–101. 
DOI: http://doi.org/10.5334/irsp.82 

Collaborative Open-science REsearch (2020). 
Large-scale replications and extensions of 
findings in Judgment and Decision Making. DOI 
10.17605/OSF.IO/5Z4A8. Retrieved March 
2020 from http://osf.io/5z4a8     

Costello, F., & Watts, P. (2017). Explaining high con-
junction fallacy rates: The probability theory 
plus noise account. Journal of Behavioral Deci-
sion Making, 30, 304-321. DOI: 
https://doi.org/10.1002/bdm.1936 

Fiedler, K. (1988). The dependence of the conjunc-
tion fallacy on subtle linguistic factors. Psycho-
logical Research, 50, 123–129. DOI: 
https://doi.org/10.1007/BF00309212 

Fisk, J. E., & Pidgeon, N. (1996). Component proba-
bilities and the conjunction fallacy: Resolving 
signed summation and the low component 
model in a contingent approach. Acta Psycho-
logica, 94, 1-20. DOI: 10.1016/0001-
6918(95)00048-8 

Gigerenzer, G. (1996). On narrow norms and vague 
heuristics: A reply to Kahneman and Tversky 
(1996). Psychological Review, 103, 592–596. 
DOI: https://doi.org/10.1037/0033-
295X.103.3.592 

Gigerenzer, G. (2005). I think, therefore I err. Social 
Research: An International Quarterly, 72, 195-
218.  

Hertwig, R., & Chase, V. M. (1998). Many reasons or 
just one: How response mode affects reasoning 
in the conjunction problem. Thinking and Rea-
soning, 4, 319–352. DOI:  
https://doi.org/10.1080/135467898394102 

Hertwig, R., & Gigerenzer, G. (1999). The ‘conjunc-
tion fallacy’ revisited: How intelligent infer-
ences look like reasoning errors. Journal of Be-
havioral Decision Making, 12, 275–305. DOI: 
10.1002/(SICI)1099-0771(1999) 

Kruschke, J. K., & Liddell, T. M. (2018). The Bayesian 
New Statistics: Hypothesis testing, estimation, 
meta-analysis, and power analysis from a 
Bayesian perspective. Psychonomic Bulletin & 
Review, 25, 178-206. DOI: 10.3758/s13423-016-
1221-4 

Lakens, D. (2017). Equivalence tests: a practical pri-
mer for t tests, correlations, and meta-anal-
yses. Social Psychological and Personality Sci-
ence, 8, 355-362. DOI: 
10.1177/1948550617697177 

Lakens, D., Scheel, A. M., & Isager, P. M. (2018). 
Equivalence testing for psychological research: 
A tutorial. Advances in Methods and Practices 
in Psychological Science, 1(2), 259-269. DOI: 
10.1177/2515245918770963 

LeBel, E. P., McCarthy, R. J., Earp, B. D., Elson, M., & 
Vanpaemel, W. (2018). A unified framework to 
quantify the credibility of scientific find-
ings. Advances in Methods and Practices in 
Psychological Science, 1(3), 389-402. DOI: 
10.1177/2515245918787489 

LeBel, E. P., Vanpaemel, W., Cheung, I., & Campbell, 
L. (2019). A Brief Guide to Evaluate 
Replications. Meta Psychology, 541, 1–17. DOI: 
10.31219/osf.io/paxyn 

Litman, L., Robinson, J., & Abberbock, T. (2017). 
TurkPrime.com: A versatile crowdsourcing 
data acquisition platform for the behavioral 
sciences. Behavior Research Methods, 49, 433-
442. DOI: 10.3758/s13428-016-0727-z 

Mellers, B., Hertwig, R., & Kahneman, D. (2001). Do 
frequency representations eliminate conjunction 
effects? An exercise in adversarial collaboration. 
Psychological Science, 12, 269-275. DOI: 
10.1111/1467-9280.00350 

Morey, R. D., & Rouder, J. N. (2015). BayesFactor: 
Computation of Bayes factors for common de-
signs (R Package Version 0.9.12-2). Retrieved 
from https://CRAN.R-project.org/pack-
age=BayesFactor   

Open Science Collaboration (2015). Estimating the 
reproducibility of psychological science. Sci-
ence, 349, aac4716–aac4716. DOI: 10.1126/sci-
ence.aac4716 

Patil, I. (2018). ggstatsplot: “ggplot2” Based Plots 
with Statistical Details. CRAN.  

R Core Team (2015). R: A Language and Environment 
for Statistical Computing. R Foundation for 
Statistical Computing, Vienna, Austria. ISBN 
3-900051-07-0, URL http://www.R-project.org   

Simonsohn, U. (2015). Small telescopes: Detectabil-
ity and the evaluation of replication re-
sults. Psychological Science, 26, 559-569. DOI: 
10.1177/0956797614567341 

Tversky, A., & Kahneman, D. (1982). Judgments of 
and by representativeness. In D. Kahneman, P. 
Slovic, & A. Tversky (Eds.), Judgment under 
uncertainty: Heuristics and biases. Cambridge, UK: 
Cambridge University Press. 

Tversky, A., & Kahneman, D. (1983). Extensional ver-
sus intuitive reasoning: The conjunction fallacy 
in probability judgment. Psychological Re-
view, 90, 293-315. DOI: 10.1037/0033-
295X.90.4.293 

Vandekerckhove, J., Rouder, J. N., & Kruschke, J. K. 
(2018). Bayesian methods for advancing 
psychological science. Psychonomic Bulletin & 
Review, 25, 1-4. DOI: 10.3758/s13423-018-1443-8 

van ’t Veer, A. E., & Giner-Sorolla, R. (2016). 
Pre-registration in social psychology—A discussion and 
suggested template. Journal of Experimental 
Social Psychology, 67, 2-12. DOI: 
10.1016/j.jesp.2016.03.004 

Wells, G. L. (1985). The conjunction error and the 
representativeness heuristic. Social Cogni-
tion, 3, 266-279. DOI: 10.1521/soco.1985.3.3.266 

Zwaan, R. A., Etz, A., Lucas, R. E., & Donnellan, M. B. 
(2018). Making replication mainstream. Behav-
ioral and Brain Sciences, 41.