Meta-Psychology, 2019, vol 3, MP.2018.843, https://doi.org/10.15626/MP.2018.843
Article type: Original Article. Published under the CC-BY4.0 license.
Open data: Not relevant. Open materials: Not relevant. Open and reproducible analysis: Not relevant. Open reviews and editorial process: Yes. Preregistration: Not relevant.
Edited by: Rickard Carlsson. Reviewed by: Nuijten, M. & Schimmack, U.
All supplementary files can be accessed at the OSF project page: https://doi.org/10.17605/OSF.IO/Q56E8

A Brief Guide to Evaluate Replications

Etienne P. LeBel (KU Leuven), Irene Cheung (Huron University College), Wolf Vanpaemel (KU Leuven), and Lorne Campbell (Western University)

The importance of replication is becoming increasingly appreciated; however, considerably less consensus exists about how to evaluate the design and results of replications. We make concrete recommendations on how to evaluate replications with more nuance than is currently typical in the literature. We highlight six study characteristics that are crucial for evaluating replications: replication method similarity, replication differences, investigator independence, method/data transparency, analytic result reproducibility, and auxiliary hypotheses' plausibility evidence. We also recommend a more nuanced approach to statistically interpreting replication results at the individual-study and meta-analytic levels, and propose clearer language to communicate replication results.

Keywords: transparency, replicability, direct replication, evaluating replications, reproducibility

Author note: We thank the editor Rickard Carlsson and reviewers Michèle Nuijten and Ulrich Schimmack for valuable feedback on an earlier version of this article. We also thank Chiel Mues for copyediting our manuscript. Correspondence concerning this article should be addressed to Etienne P. LeBel, Quantitative Psychology and Individual Differences Unit, KU Leuven, Tiensestraat 102 - Box 3713, 3000 Leuven, Belgium. Email: etienne.lebel@gmail.com

There is growing consensus in the psychology community regarding the fundamental scientific value and importance of replication. Considerably less consensus, however, exists about how to evaluate the design and results of replication studies. In this article, we make concrete recommendations on how to evaluate replications with more nuance than is currently typical in the literature. These recommendations are intended to maximize the likelihood that replication results are interpreted in a fair and principled manner.

We propose a two-stage approach. The first stage involves considering and evaluating six crucial study characteristics (the first three specific to replication studies, the last three relevant to any study): (1) replication method similarity, (2) replication differences, (3) investigator independence, (4) method/data transparency, (5) analytic result reproducibility, and (6) auxiliary hypotheses' plausibility evidence. Second, and assuming sound study characteristics, we recommend more nuanced ways to interpret replication results at the individual-study and meta-analytic levels. Finally, we propose the use of clearer and less ambiguous language to more effectively communicate the results of replication studies.

These recommendations are directly based on curating N = 1,127 replications (as of August 2018) available at Curate Science (CurateScience.org), a web platform that organizes and tracks the transparency and replications of published findings in the social sciences (LeBel, McCarthy, Earp, Elson, & Vanpaemel, 2018). This is the largest known meta-scientific effort to evaluate and interpret replication results of studies across a wide and heterogeneous set of study types, designs, and methodologies.
Replication-Specific Study Characteristics

When evaluating replication studies, the following three study characteristics are of crucial importance:

1. Methodological similarity. A first aspect is whether a replication study employed a sufficiently similar methodology to the original study (i.e., at minimum, used the same operationalizations for the independent and dependent variables, as in "close replications"; LeBel et al., 2018). This is required because only such replications can cast doubt upon an original hypothesis (assuming sound auxiliary hypotheses; see section below) and hence, in principle, falsify a hypothesis (LeBel, Berger, Campbell, & Loving, 2017; Pashler & Harris, 2012). Studies that are not sufficiently similar can only speak to the generalizability, not the replicability, of a phenomenon under study, and should therefore be treated as "generalizability studies" rather than "replication studies". Such studies are sometimes called "conceptual replications", but this is a misnomer given that it is more accurate to conceptualize such studies as "extensions" rather than replications (LeBel et al., 2017; Zwaan, Etz, Lucas, & Donnellan, 2017).

2. Replication differences. A second aspect to carefully consider is whether any study design characteristics differed from the comparison original study. Such differences are important to consider whether they were within or beyond a researcher's control (LeBel et al., 2018), because they help the community begin to understand the replicability and generalizability of an effect. Consistent positive replication evidence across replications with minor design differences suggests an effect is likely robust across those design differences. For inconsistent replication evidence, on the other hand, such differences may provide initial clues regarding potential boundary conditions of an effect.

3. Investigator independence. A final important consideration is the degree of independence between the replication investigators and the researchers who conducted the original study. This is important for mitigating the problem of "correlated investigators" (Rosenthal, 1991), whereby non-independent investigators may be more susceptible to confirmation biases given a vested interest in an effect (although preregistration and other transparent practices can alleviate these issues; see next section).

General Study Characteristics

When evaluating studies in general, the following three study characteristics are important to consider.

1. Study transparency. Sufficient transparency is required to allow comprehensive scrutiny of how any study was conducted. Sufficient transparency means posting the experimental materials and underlying data in a readable format (e.g., with a codebook) on a public repository (the criteria for earning open materials and open data badges, respectively; Kidwell et al., 2016) and following the relevant reporting standards for the type of study and methodology used (e.g., the CONSORT reporting standard for experimental studies; Schulz, Altman, & Moher, 2010).
If a study is not reported with sufficient transparency, it cannot be properly scrutinized. The findings from such a study are consequently of little value because the target hypothesis was not tested in a sufficiently falsifiable manner. Preregistering a study (which publicly commits to data collection, processing, and analysis plans prior to data collection) offers even more transparency and limits researcher degrees of freedom (assuming that the preregistered procedure was actually followed).

2. Analytic result reproducibility. For any study, it is also important to consider whether a study's primary result (or set of results) is analytically reproducible, that is, whether it can be successfully reproduced (within a certain margin of error) from the raw or transformed data. This is, of course, contingent on the data actually being available, whether publicly, as in the case of "open data", or otherwise (a minimal sketch of such a check appears after this list). If analytic reproducibility is confirmed, then our confidence in a study's reported results is boosted (and ideally results can also be confirmed to be robust across alternative justifiable data-analytic choices; Steegen, Tuerlinckx, Gelman, & Vanpaemel, 2016). If analytic reproducibility is not confirmed and/or if discrepancies are detected, then our confidence should be reduced and this should be taken into account when interpreting a study's results.

3. Auxiliary hypotheses. Finally, for any study, researchers should consider all available evidence regarding how plausible it is that the relevant auxiliary hypotheses, needed to test the substantive hypothesis at hand, were true (LeBel et al., 2018). Auxiliary hypotheses include, for example, the psychometric validity of the measuring instruments and the sound realization of experimental conditions (Meehl, 1990). Plausibility can be assessed by examining reported evidence of positive controls or evidence that a replication sample had the ability to detect some effect (e.g., replicating a past known effect; manipulation check evidence). These considerations are particularly crucial when interpreting null results, so that one can rule out more mundane reasons for not having detected a signal (e.g., fatal experimenter or data processing errors; though such fatal errors can also sometimes cause false positive results).
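As a concrete illustration of the analytic reproducibility check described in point 2 above, the following minimal Python sketch recomputes a primary test statistic from openly shared data and compares it with the reported value within a small margin of error. The file name, column names, reported value, and tolerance are hypothetical placeholders of our own choosing; they are not part of any tooling described in this article.

```python
# Minimal sketch of an analytic reproducibility check (illustrative only).
# Assumes an open dataset with columns "condition" and "dv"; the reported
# test statistic and the tolerance are hypothetical placeholder values.
import pandas as pd
from scipy import stats

REPORTED_T = 2.31   # test statistic as reported in the original article (hypothetical)
TOLERANCE = 0.01    # margin of error allowed for rounding differences

data = pd.read_csv("open_data.csv")  # publicly posted raw data (hypothetical file)
treatment = data.loc[data["condition"] == "treatment", "dv"]
control = data.loc[data["condition"] == "control", "dv"]

# Recompute the primary result (here, an independent-samples t test).
t_recomputed, p_recomputed = stats.ttest_ind(treatment, control)

if abs(t_recomputed - REPORTED_T) <= TOLERANCE:
    print(f"Reproduced: recomputed t = {t_recomputed:.3f} matches reported t = {REPORTED_T}")
else:
    print(f"Discrepancy: recomputed t = {t_recomputed:.3f} vs. reported t = {REPORTED_T}")
```

If a discrepancy is found, its size and nature should be reported and factored into how much weight the study's results are given.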
Nuanced Statistical Interpretation and Language

Once these six study characteristics have been evaluated and taken into account, we recommend statistical approaches to interpreting the results of a replication study at the individual-study and meta-analytic levels that are more nuanced than what is currently typically done. We then propose the use of clearer language to communicate replication results.

Statistical interpretation: Individual-study level. At the individual-study level, we recommend considering the following three distinct statistical aspects of a replication result: (1) whether a signal was detected, (2) the consistency of the replication effect size (ES) relative to the original study ES, and (3) the precision of the replication ES estimate relative to the original study.

Such considerations yield the following replication outcome categories for the situation where an original study detected a signal (see Figure 1, Panel A, for visual depictions of these distinct scenarios; see Footnote 1):

1. Signal – consistent: replication ES 95% confidence interval (CI) excludes 0 and includes the original ES point estimate (Panel A replication scenario #1; e.g., Chartier's, 2015, Reproducibility Project: Psychology [RPP] #31 replication result of McCrea's, 2008, Study 5; see Table 1 in the Appendix for details of Chartier's, 2015, RPP #31 replication and subsequently cited replication examples).

2. Signal – inconsistent: replication ES 95% CI excludes 0 but also excludes the original ES point estimate. Three sub-categorizations exist within this outcome category:
a. Signal – inconsistent, larger (same direction): replication ES is larger and in the same direction as the original ES (Panel A replication scenario #2; e.g., Veer et al.'s, 2015, RPP #36 replication result of Armor et al.'s, 2008, Study 1).
b. Signal – inconsistent, smaller (same direction): replication ES is smaller and in the same direction as the original ES (Panel A replication scenario #3; e.g., Ratliff's, 2015, RPP #26 replication result of Fischer et al.'s, 2008, Study 4).
c. Signal – inconsistent, opposite direction/pattern: replication ES is in the opposite direction (or reflects an inconsistent pattern) relative to the original ES direction/pattern (Panel A replication scenario #4; e.g., Earp et al.'s, 2014, Study 3 replication result of Zhong & Liljenquist's, 2006, Study 2).

3. No signal – consistent: replication ES 95% CI includes 0 but also includes the original ES point estimate (Panel A replication scenario #5; e.g., Hull et al.'s, 2002, Study 1b replication result of Bargh et al.'s, 1996, Study 2a).

4. No signal – inconsistent: replication ES 95% CI includes 0 but excludes the original ES point estimate (Panel A replication scenario #6; e.g., LeBel & Campbell's, 2013, Study 1 replication result of Vess's, 2012, Study 1).

Figure 1. Distinct hypothetical outcomes of a replication study based on considering three statistical aspects of a replication result: (1) whether a signal was detected, (2) the consistency of the replication effect size (ES) relative to an original study, and (3) the precision of the replication ES estimate relative to the ES estimate precision in the original study. Outcomes are separated for situations where an original study detected a signal (Panel A) versus did not detect a signal (Panel B).

In cases where a replication effect size estimate was less precise than the original (i.e., the replication ES confidence interval is wider than the original), which can occur when a replication uses a smaller sample size and/or when the replication sample exhibits higher variability, we propose that the label "less precise" be used to warn readers that such a replication result should only be interpreted meta-analytically (Panel A replication scenario #7; e.g., Schuler & Wänke's, 2016, Study 2 replication result of Caruso et al.'s, 2013, Study 2).

Footnote 1. The ES estimate precision of an original study is not currently accounted for because the vast majority of legacy literature original studies do not report 95% CIs (and CIs most often cannot be calculated because insufficient information is reported). In the rare cases where CIs are reported, they are typically so wide (given the underpowered nature of the legacy literature) that ES estimates are not statistically falsifiable in practical terms. Once it becomes the norm in the field to report highly precise ES estimates, however, it will become possible and desirable to account for original study ES estimate precision when statistically interpreting replication results.
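To make the Panel A categories above concrete, the following Python sketch assigns one of the labels given a replication 95% CI, the original ES point estimate, and the original 95% CI (the latter used only for the precision comparison). The function name, the use of the CI midpoint as the replication point estimate, and the simple width comparison for "less precise" are our own illustrative simplifications of the verbal rules, not an implementation used by Curate Science; the example values correspond to the LeBel and Campbell (2013) versus Vess (2012) entry in Table 1.

```python
def classify_replication(rep_lower, rep_upper, orig_es, orig_lower, orig_upper):
    """Label a replication result for the case where the original study detected
    a signal (Figure 1, Panel A). Inputs are the replication 95% CI bounds, the
    original ES point estimate, and the original 95% CI bounds (used only to
    compare precision). Illustrative simplification of the verbal rules."""
    rep_es = (rep_lower + rep_upper) / 2            # CI midpoint as replication point estimate
    signal = not (rep_lower <= 0 <= rep_upper)      # replication CI excludes zero?
    consistent = rep_lower <= orig_es <= rep_upper  # replication CI includes original estimate?
    less_precise = (rep_upper - rep_lower) > (orig_upper - orig_lower)

    if signal and consistent:
        label = "signal - consistent"
    elif signal:
        if rep_es * orig_es < 0:
            label = "signal - inconsistent, opposite direction"
        elif abs(rep_es) > abs(orig_es):
            label = "signal - inconsistent, larger (same direction)"
        else:
            label = "signal - inconsistent, smaller (same direction)"
    elif consistent:
        label = "no signal - consistent"
    else:
        label = "no signal - inconsistent"
    return label + (" (less precise)" if less_precise else "")

# Example from Table 1: replication d = .03 +/- .27 (CI [-0.24, 0.30]) vs.
# original d = .60 +/- .55 (point estimate 0.60, CI [0.05, 1.15]).
print(classify_replication(-0.24, 0.30, 0.60, 0.05, 1.15))  # "no signal - inconsistent"
```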
In the situation where an original study did not detect a signal, such considerations yield the following replication outcome categories (see Figure 1, Panel B, for visual depictions of these distinct scenarios):

1. No signal – consistent: replication ES 95% confidence interval (CI) includes 0 and includes the original ES point estimate (Panel B replication scenario #1; e.g., Selterman et al.'s, 2015, RPP #29 replication result of Eastwick & Finkel's, 2008, Study 1).

2. No signal – consistent (less precise): replication ES 95% CI includes 0 and includes the original ES point estimate, but the replication ES estimate is less precise than in the original study (Panel B replication scenario #2; no replication is yet known to fall under this scenario).

3. Signal – consistent: replication ES 95% CI excludes 0 but includes the original ES point estimate (Panel B replication scenario #3; e.g., Roebke & Penna's, 2015, RPP #76 replication result of Couture et al.'s, 2008, Study 1).

4. Signal – inconsistent: replication ES 95% CI excludes 0 and excludes the original ES point estimate. Two sub-categorizations exist within this outcome category:
a. Signal – inconsistent, positive effect: replication ES involves a positive effect (Panel B replication scenario #4; e.g., Cohn's, 2015, RPP #45 replication result of Ranganath & Nosek's, 2008, Study 1).
b. Signal – inconsistent, negative effect: replication ES involves a negative effect (Panel B replication scenario #5; no replication is yet known to fall under this scenario).

From this perspective, the proposed improved language to describe a replication study under replication scenario #6 (Panel A) would be: "We report a replication study of effect X. No signal was detected and the effect size was inconsistent with the original one." This terminology contrasts favorably with several ambiguous replication-related terms currently in common use (e.g., "unsuccessful", "failed", "failure to replicate", "non-replication"). The terms "unsuccessful" or "failed" (or "failure to replicate") are ambiguous: was it the replication methodology or the replication result that was unsuccessful or failed (similar logic applies to the ambiguous term "non-replication")? The terms "unsuccessful" or "failed" are also problematic because they implicitly convey that something was "wrong" with the replication. Similarly, though the "small telescopes" approach (Simonsohn, 2015) was an improvement over the prior simplistic standard of considering a replication with p < .05 as "successful" and p > .05 as "unsuccessful", it nonetheless uses ambiguous language that does not actually describe a replication result (e.g., "uninformative" vs. "informative failure to replicate"). Instead, the terminology we propose offers unambiguous and descriptively accurate language, stating both whether a signal was detected and the consistency of the replication ES estimate relative to the original study. The proposed nuanced approach to statistically interpreting replication evidence thus also improves the clarity of the language used to describe and communicate replication results.

Statistical interpretation: Meta-analytic level. Interpreting the outcomes of a set of replication studies can proceed in two ways: an informal approach when only a few replications are available, and a more quantitative meta-analytic approach when several replications are available for a specific operationalization of an effect. The informal approach considers whether replications consistently detect a signal, with each replication ES consistent (i.e., of similar magnitude) with the ES point estimate from the original study (Panel A replication scenario #1). In this situation, one could informally say that an effect is "replicable." When several replications are available, a more quantitative meta-analytic approach can be taken: an effect can be considered "replicable" when the meta-analytic ES estimate (i.e., its 95% CI) excludes zero and is consistent with the original ES point estimate (also replication scenario #1; see Figure 1, Panel A; see also Mathur & VanderWeele, 2018).
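As a rough sketch of such a meta-analytic check for correlational effect sizes, the Python snippet below pools replication rs with fixed-effect weights on Fisher's z scale, computes a 95% CI for the pooled estimate, and then checks whether that CI excludes zero and includes the original point estimate (replication scenario #1). The fixed-effect pooling, the function name, and the example numbers are illustrative assumptions on our part; they are not prescribed by this article or by Mathur and VanderWeele (2018), and a random-effects model may be preferable when replication effects are heterogeneous.

```python
import numpy as np

def pooled_r(rs, ns):
    """Fixed-effect meta-analytic estimate of a correlation with a 95% CI,
    computed on Fisher's z scale (an illustrative choice)."""
    rs, ns = np.asarray(rs, dtype=float), np.asarray(ns, dtype=float)
    zs = np.arctanh(rs)                 # Fisher's r-to-z transform
    weights = ns - 3                    # inverse of the sampling variance 1/(n - 3)
    z_pooled = np.sum(weights * zs) / np.sum(weights)
    se = 1 / np.sqrt(np.sum(weights))
    lo_z, hi_z = z_pooled - 1.96 * se, z_pooled + 1.96 * se
    return np.tanh(z_pooled), (np.tanh(lo_z), np.tanh(hi_z))

# Hypothetical example: three replications of an original effect of r = .30.
est, (lo, hi) = pooled_r([0.25, 0.33, 0.28], [120, 95, 150])
excludes_zero = not (lo <= 0 <= hi)
includes_original = lo <= 0.30 <= hi
print(f"pooled r = {est:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]; "
      f"replicable per scenario #1: {excludes_zero and includes_original}")
```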
Conclusion

It is important to note that replicability should be seen as a minimum requirement for scientific progress rather than an arbiter of truth. Replicability ensures that a research community avoids going down blind alleys chasing anomalous results that emerged due to chance, noise, or other unknown errors. When adjudicating the replicability of an effect, however, it is important to keep in mind that an effect that does not appear to be replicable does not necessarily mean the tested hypothesis is false: It is always possible that the effect is replicable via alternative methods or operationalizations and/or that there were problems with some of the auxiliary hypotheses (e.g., invalid measurement or unclear instructions). This possibility, however, should not be exploited: Eventually one must consider the value of continued testing of a hypothesis across different operationalizations and contexts. Conversely, an effect that appears replicable does not necessarily mean the tested hypothesis is true: A replicable effect may not reflect a valid and/or generalizable effect (e.g., a replicable effect may simply reflect a measurement artifact and/or may not generalize to other methods, populations, or contexts).

The recommendations advocated in this article are based on curating over one thousand replications at Curate Science (as of August 2018). These recommendations have been applied to each of the replications in its database, including employing our suggested language to describe the outcome of each curated replication. It is expected, however, that these recommendations will evolve as additional replications, from an even wider set of studies, are curated and evaluated (indeed, as of September 2018, approximately 1,800 replications are in the queue to be curated at Curate Science). Consequently, these recommendations should be seen as a starting point for the research community to more accurately evaluate replication results, to be refined as more sophisticated interpretive approaches are developed. We hope that our proposed recommendations will be a stepping stone in this direction and will thereby accelerate psychology's path toward becoming a more cumulative and valid science.

Appendix

Table 1. Known published replication results that fall under the distinct hypothetical replication outcomes depicted in Figure 1 (when available).
Target effect | Original study | Original ES estimate (± 95% CI) | Replication ES estimate (± 95% CI) | Replication study | Replication outcome

Signal detected in original study:
math self-handicapping effect | McCrea (2008) Study 5 | r = .34 ± .35 | r = .29 ± .24 | Chartier (2015, RPP #31) | signal – consistent
prescribed optimism effect | Armor, Massey et al. (2008) Study 1 | r = .68 ± .10 | r = .76 ± .06 | Veer et al. (2015, RPP #36) | signal – inconsistent, larger
selective exposure information quantity effect | Fischer, Schulz-Hardt et al. (2008) Study 4 | r = .50 ± .21 | r = .22 ± .16 | Ratliff (2015, RPP #26) | signal – inconsistent, smaller
Macbeth effect | Zhong & Liljenquist (2006) Study 2 | r = .45 ± .31 | r = -.11 ± .11 | Earp et al. (2014) Study 3 | signal – inconsistent, opposite
elderly priming effect | Bargh et al. (1996) Study 2a | d = 1.02 ± .76 | d = .53 ± .63 | Hull et al. (2002) Study 1b | no signal – consistent
anxious attachment warm food effect | Vess (2012) Study 1 | d = .60 ± .55 | d = .03 ± .27 | LeBel & Campbell (2013) Study 1 | no signal – inconsistent
money priming effect | Caruso et al. (2013) Study 2 | d = .43 ± .30 | d = -.09 ± .39 | Schuler & Wänke (2016) Study 2 | no signal – inconsistent (less precise)

No signal detected in original study:
generalized earning prospect predicts romantic interest effect | Eastwick & Finkel (2008) Study 1 | r = .14 ± .16 | r = .03 ± .11 | Selterman, Chagnon et al. (2015, RPP #29) | no signal – consistent
Hebb repetition effect revisited | Couture, Lafond, & Tremblay (2008) Study 1 | r = .35 ± .38 | r = .27 ± .24 | Roebke & Penna (2015, RPP #76) | signal – consistent
implicit attitude generalization occurs immediately effect | Ranganath & Nosek (2008) Study 1 | r = .00 ± .08 | r = .11 ± .04 | Cohn (2015, RPP #45) | signal – inconsistent, larger

References

Armor, D. A., Massey, C., & Sackett, A. M. (2008). Prescribed optimism: Is it right to be wrong about the future? Psychological Science, 19, 329-331. doi:10.1111/j.1467-9280.2008.02089.x

Bargh, J. A., Chen, M., & Burrows, L. (1996). Automaticity of social behavior: Direct effects of trait construct and stereotype activation on action. Journal of Personality and Social Psychology, 71(2), 230-244. doi:10.1037/0022-3514.71.2.230

Caruso, E. M., Vohs, K. D., Baxter, B., & Waytz, A. (2013). Mere exposure to money increases endorsement of free-market systems and social inequality. Journal of Experimental Psychology: General, 142, 301-306. doi:10.1037/a0029288

Chartier, C. R., & Perna, O. (2015). Replication of "Self-handicapping, excuse making, and counterfactual thinking: Consequences for self-esteem and future motivation" by S. M. McCrea (2008, Journal of Personality and Social Psychology). Retrieved from https://osf.io/ytxgr/ (Reproducibility Project: Psychology Study #31)

Cohn, M. A. (2015). Replication of "Implicit attitude generalization occurs immediately; explicit attitude generalization takes time" (Ranganath & Nosek, 2008). Retrieved from https://osf.io/9xt25/ (Reproducibility Project: Psychology Study #45)

Couture, M., Lafond, D., & Tremblay, S. (2008). Learning correct responses and errors in the Hebb repetition effect: Two faces of the same coin. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 524-532. doi:10.1037/0278-7393.34.3.524

Earp, B. D., Everett, J. A. C., Madva, E. N., & Hamlin, J. K. (2014). Out, damned spot: Can the "Macbeth effect" be replicated? Basic and Applied Social Psychology, 36, 91-98. doi:10.1080/01973533.2013.856792

Eastwick, P. W., & Finkel, E. J. (2008).
Sex differences in mate preferences revisited: Do people know what they initially desire in a romantic partner? Journal of Personality and Social Psychology, 94, 245-264. doi:10.1037/0022-3514.94.2.245

Fischer, P., Schulz-Hardt, S., & Frey, D. (2008). Selective exposure and information quantity: How different information quantities moderate decision makers' preference for consistent and inconsistent information. Journal of Personality and Social Psychology, 94, 231-244. doi:10.1037/0022-3514.94.2.231

Hull, J., Slone, L., Meteyer, K., & Matthews, A. (2002). The nonconsciousness of self-consciousness. Journal of Personality and Social Psychology, 83, 406-424. doi:10.1037/0022-3514.83.2.406

Kidwell, M., Lazarevic, L., Baranski, E., Hardwicke, T., Piechowski, S., Falkenberg, L., . . . Nosek, B. (2016). Badges to acknowledge open practices: A simple, low-cost, effective method for increasing transparency. PLoS Biology, 14, e1002456. doi:10.1371/journal.pbio.1002456

LeBel, E. P., & Campbell, L. (2013). Heightened sensitivity to temperature cues in individuals with high anxious attachment: Real or elusive phenomenon? Psychological Science, 24, 2128-2130. doi:10.1177/0956797613486983

LeBel, E., Berger, D., Campbell, L., & Loving, T. (2017). Falsifiability is not optional. Journal of Personality and Social Psychology, 113, 696-696. doi:10.1037/pspi0000117

LeBel, E. P., McCarthy, R., Earp, B., Elson, M., & Vanpaemel, W. (2018). A unified framework to quantify the credibility of scientific findings. Advances in Methods and Practices in Psychological Science, 1(3), 389-402.

Mathur, M. B., & VanderWeele, T. J. (2018, May 7). New statistical metrics for multisite replication projects [Preprint]. https://doi.org/10.31219/osf.io/w89s5

McCrea, S. M. (2008). Self-handicapping, excuse making, and counterfactual thinking: Consequences for self-esteem and future motivation. Journal of Personality and Social Psychology, 95, 274-292. doi:10.1037/0022-3514.95.2.274

Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66, 195-244. doi:10.2466/PRO.66.1.195-244

Pashler, H., & Harris, C. R. (2012). Is the replicability crisis overblown? Three arguments examined. Perspectives on Psychological Science, 7, 531-536. doi:10.1177/1745691612463401

Ranganath, K. A., & Nosek, B. A. (2008). Implicit attitude generalization occurs immediately; explicit attitude generalization takes time. Psychological Science, 19, 249-254. doi:10.1111/j.1467-9280.2008.02076.x

Ratliff, K. A. (2015). Replication of Fischer, Schulz-Hardt, and Frey (2008). Retrieved from https://osf.io/5afur/ (Reproducibility Project: Psychology Study #26)

Roebke, M., & Penna, N. D. (2015). Replication of "Learning correct responses and errors in the Hebb repetition effect: Two faces of the same coin" by M. Couture, D. Lafond, & S. Tremblay (2008, Journal of Experimental Psychology: Learning, Memory, and Cognition). Retrieved from https://osf.io/qm5n6/ (Reproducibility Project: Psychology Study #76)

Rosenthal, R. (1991). Applied social research methods: Meta-analytic procedures for social research. Thousand Oaks, CA: SAGE. doi:10.4135/9781412984997

Schuler, J., & Wänke, M. (2016). A fresh look on money priming: Feeling privileged or not makes a difference. Social Psychological and Personality Science, 7, 366-373. doi:10.1177/1948550616628608

Schulz, K. F., Altman, D. G., Moher, D., & the CONSORT Group (2010).
CONSORT 2010 statement: Updated guidelines for reporting parallel group randomised trials. BMJ, 340, 698-702. doi:10.1136/bmj.c332

Selterman, D. F., Chagnon, E., & Mackinnon, S. (2015). Replication of "Sex differences in mate preferences revisited: Do people know what they initially desire in a romantic partner?" by P. Eastwick & E. Finkel (2008, Journal of Personality and Social Psychology). Retrieved from https://osf.io/5pjsn/ (Reproducibility Project: Psychology Study #29)

Simonsohn, U. (2015). Small telescopes: Detectability and the evaluation of replication results. Psychological Science, 26, 559-569. doi:10.1177/0956797614567341

Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11, 702-712. doi:10.1177/1745691616658637

van 't Veer, A., Lassetter, B., Brandt, M. J., & Mehta, P. H. (2015). Replication of "Prescribed optimism: Is it right to be wrong about the future?" by D. A. Armor, C. Massey, & A. M. Sackett (2008, Psychological Science). Retrieved from https://osf.io/8u5v2/ (Reproducibility Project: Psychology Study #36)

Vess, M. (2012). Warm thoughts: Attachment anxiety and sensitivity to temperature cues. Psychological Science, 23, 472-474. doi:10.1177/0956797611435919

Zhong, C., & Liljenquist, K. (2006). Washing away your sins: Threatened morality and physical cleansing. Science, 313, 1451-1452. doi:10.1126/science.1130726

Zwaan, R., Etz, A., Lucas, R., & Donnellan, M. (2017). Making replication mainstream. Behavioral and Brain Sciences, 1-50. doi:10.1017/S0140525X17001972