Meta-Psychology, 2022, vol 6, MP.2020.2718
https://doi.org/10.15626/MP.2020.2718
Article type: Tutorial
Published under the CC-BY4.0 license
Open data: Not Applicable
Open materials: Yes
Open and reproducible analysis: Yes
Open reviews and editorial process: Yes
Preregistration: No
Edited by: Rickard Carlsson
Reviewed by: Nordström, T., Rohrer, J., Zigerell, L.J.
Analysis reproduced by: Lucija Batinović
All supplementary files can be accessed at OSF: https://doi.org/10.17605/OSF.IO/W98S6

Multiverse analyses in the classroom

Tom Heyman
Methodology and Statistics Unit, Institute of Psychology, Leiden University

Wolf Vanpaemel
Faculty of Psychology and Educational Sciences, KU Leuven

Abstract

Most empirical papers in psychology involve statistical analyses performed on a new or existing dataset. Sometimes the robustness of a finding is demonstrated via data-analytical triangulation (e.g., obtaining comparable outcomes across different operationalizations of the dependent variable), but systematically considering the plethora of alternative analysis pathways is rather uncommon. However, researchers increasingly recognize the importance of establishing the robustness of a finding. The latter can be accomplished through a so-called multiverse analysis, which involves methodically examining the arbitrary choices pertaining to data processing and/or model building. In the present paper, we describe how the multiverse approach can be implemented in student research projects within psychology programs, drawing on our personal experience as instructors. Embedding a multiverse project in students' curricula addresses an important scientific need, as studies examining the robustness or fragility of phenomena are largely lacking in psychology. Additionally, it offers students an ideal opportunity to put various statistical methods into practice, thereby also raising awareness about the abundance and consequences of arbitrary decisions in data-analytic processing. An attractive practical feature is that one can reuse existing datasets, which proves especially useful when resources are limited, or when circumstances such as the COVID-19 lockdown measures restrict data collection possibilities.

Keywords: multiverse analysis; robustness; education; pedagogy; open science

An important part of many psychology students' (under)graduate programs is research-methods classes in which students are asked to complete their own (small-scale) research project (e.g., Kierniesky, 2005). Typically, the goal is to run through the entire empirical cycle, thus putting knowledge gained from previous theory-focused courses into practice. However, this can be quite challenging, as time and resources are often limited in such projects. As a consequence, students and instructors might (begrudgingly) take shortcuts, resulting in ill-designed or underpowered studies, poorly motivated research questions, sloppy measurement practices, and so on. Perhaps the most devastating consequence of this approach is that students could come away with a wrong impression of what psychological research entails, and it might even instill bad habits in prospective researchers. In the present paper, we suggest an alternative implementation of research-methods classes that addresses these concerns. In particular, we propose that completing a multiverse analysis project as part of such research-methods classes has several important benefits. First, we explain what a multiverse analysis entails (see Steegen et al., 2016).
Then, we describe the two main ingredients of a multiverse-in-the-classroom project: a suitable dataset and a solid (meta-)scientific background. Next, we give a worked example of such a project, based on our personal experience as instructors. Finally, we discuss the benefits and challenges of the multiverse-in-the-classroom approach.

What is a Multiverse Analysis?

Most empirical papers in psychology involve some kind of data analysis. Typically, there is no unique path from the raw data to the eventual conclusions of a paper. Researchers need to make a number of decisions along the way, such as whether and how to deal with outliers and missing data, whether and how to transform variables, and so on. In some cases, theoretical considerations provide a clear solution to such questions, yet, at times, researchers have little to go on, so they turn to their gut feeling, lab habits, or field-specific standards, which are often poorly motivated. As a result, when processing and analyzing empirical data, researchers regularly face certain choices that are arbitrary in nature. These researcher degrees of freedom (Simmons et al., 2011) lead to a garden of forking paths (Gelman & Loken, 2014).

As an example, suppose that, for a given dataset, a researcher identifies four plausible ways to deal with outliers, three approaches to handle missing data, and two reasonable options to transform a particular variable. Assuming all combinations are sensible, this would lead to 4 × 3 × 2 = 24 unique paths, each with its own outcome (see also Bishop, 2016). However, researchers usually report the results for just one or a few of these paths, by picking only one or a few options out of the pool of plausible alternatives (e.g., deleting observations 2.5 standard deviations above the mean, listwise deletion when encountering missing values, and log-transforming a positively skewed variable; see also Elson, 2016, for a practical illustration).

In contrast, the idea of a multiverse analysis (Steegen et al., 2016) is to explore and report on a wide array of imaginable (combinations of) reasonable alternatives, each of which provides an answer to the same research question. By explicitly considering the results of several reasonable analyses, a multiverse analysis can give an idea about the robustness or fragility of a certain finding, and might even point to moderators of the effect in question (i.e., key choices regarding data processing and/or analysis that the conclusion depends on). A multiverse analysis can be applied to newly collected data (e.g., Kalokerinos et al., 2019), but also retrospectively using existing data (e.g., Moors & Hesselmann, 2019). For instance, Credé and Phillips (2017) conducted a multiverse analysis on data from Carney et al. (2010) examining the power pose effect, which is the (controversial) finding that holding a high-power body pose affects hormone levels. Their multiverse analysis revealed that most alternative pathways yielded null effects, whereas the original single-pathway analysis produced a significant effect.
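Computationally, the core of a multiverse analysis is modest. The following minimal R sketch enumerates the 24 pathways from the example above; the option labels are hypothetical placeholders rather than recommendations.

```r
# Minimal sketch: enumerating a hypothetical 4 x 3 x 2 multiverse.
outlier_rules   <- c("none", "sd_2.5", "sd_3", "iqr_1.5")
missing_rules   <- c("listwise", "mean_imputation", "multiple_imputation")
transformations <- c("raw", "log")

# Each row of the grid represents one unique analysis pathway.
multiverse <- expand.grid(outliers = outlier_rules,
                          missing = missing_rules,
                          transformation = transformations,
                          stringsAsFactors = FALSE)
nrow(multiverse)  # 4 * 3 * 2 = 24 pathways
```

Running the same substantive analysis once per row of this grid, and collecting the outcomes, is essentially all a basic multiverse analysis amounts to computationally.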
The importance of a multiverse analysis is also nicely illustrated by the study of Silberzahn et al. (2018), in which 29 research teams independently examined whether referees in soccer are more likely to give red cards to players with a darker skin tone compared to light-skin-toned players. All teams used the same dataset to answer this question, yet the conclusions varied considerably: 20 out of 29 teams (69%) found a positive relation (i.e., dark-skin-toned players tended to receive more red cards), whereas 9 teams obtained a null effect, which was even numerically negative in two cases. These results underscore that there are often several ways to process and analyze a given dataset, and that picking a single pathway might be deceiving, which is why conducting a multiverse analysis can be very informative.

Ideas similar to that of a multiverse analysis have been proposed under different names, such as specification curve analysis (Simonsohn et al., 2020), vibration of effects analysis (Patel et al., 2015), multimodel analysis (Young & Holsteen, 2017), and the many analysts approach by Silberzahn et al. (2018) discussed above (though, in contrast with the other approaches, in the latter approach the different choices are distributed over research teams rather than being performed by the same team). Multiverse-style analyses are increasingly being recognized as providing crucial information, and researchers have also proposed various extensions and refinements. For instance, multiverse analyses have been applied in the context of meta-analyses (Voracek et al., 2019), suggested as an approach to deal with different random effect structures of multi-level models (Harder, 2020), and used in combination with so-called explorable explanations allowing readers of a paper to dynamically move through the multiverse (Dragicevic et al., 2019). In addition, Liu et al. (2020) recently developed a programming tool called Boba, which helps researchers to conduct and visualize multiverse analyses, whereas others have developed specific R packages to facilitate multiverse analyses (e.g., Masur & Scharkow, 2019; Sarma & Kay, 2019).

Teaching Multiverse Analyses

The key message of this paper is that multiverse analyses are ideally suited to be included in laboratory or research-methods classes. In line with its general theme, there are a multitude of ways in which multiverse analyses can be incorporated in research-methods classes, taking into account the available time, place in the curriculum, and learning objectives. Yet, they all require two essential ingredients: a suitable dataset and a solid (meta-)scientific background. Both of these elements will be discussed in turn, including some guidance based on personal experience.

A suitable dataset

A multiverse analysis can be conducted on newly gathered data, or one could reuse an existing dataset. From an educational point of view, the former option is fairly comparable to a typical student research project, though the eventual statistical analyses will be considerably more elaborate, sophisticated, and time-consuming. Focusing on existing data is perhaps more unusual in the context of a research-methods class, in that it involves finding a suitable dataset and isolating the hypotheses of interest, rather than designing a study to test a hypothesis and collecting data. For short projects, or when students are relatively inexperienced, the instructor could select one or a few suitable studies, thus assuring that students can hit the ground running. Alternatively, students with a stronger background could be given the opportunity to find a suitable study themselves.

Selecting a study from the literature for a multiverse analysis comes with several challenges.
One obvious requirement for such a study is that it should have publicly available data, or that the original authors share their data for the agreed-upon purposes. This already narrows down the pool of studies, as psychological scientists are often not able or willing to share research data (Vanpaemel et al., 2015; Wicherts et al., 2006), though since the start of the open science movement and its various initiatives (e.g., Morey et al., 2016), there has been an increase in data availability (Kidwell et al., 2016). Furthermore, even if data are available, it does not necessarily imply that they are amenable to a multiverse analysis. It might, for instance, be unclear what a certain variable measures, or how a data file is structured (Hardwicke et al., 2018). Obviously, the multiverse-using-existing-data approach is only feasible when one has access to reusable data.

Another important criterion is that the study should afford plausible alternative data-analytic pathways, tailored to the students' capabilities, to test the hypothesis of interest. We suspect that many studies in psychology meet this requirement, by affording choices regarding, for example, outlier detection, dichotomization of variables, covariate inclusion, and so on. However, the data need to be available at a level raw enough to allow the construction of different pathways. If one only has access to the processed data (e.g., after dichotomization), rather than to the raw data, certain reasonable alternative processing and analysis options cannot be explored.

A final issue to consider is analytical reproducibility (i.e., conducting the same analyses on the same dataset and obtaining the same results). Ideally, one selects a study of which the (most important) results are reproducible, or, at minimum, of which the reason for non-reproducibility is clear. This requirement restricts the pool of possible target studies even further, as analytic reproducibility within psychological research has been shown to be far from ideal. For example, Hardwicke et al. (2018) were able to independently reproduce the key results from only 11 out of 35 articles with reusable data published in the journal Cognition. More surprisingly, even with the help of the original authors, the key results of 13 articles could not be reproduced. Artner et al. (2021) describe similar struggles in their attempt to reproduce 232 key statistical claims from 46 articles, based on the raw data, without help from the original authors (see also Wicherts et al., 2011). Although reproducibility is not strictly necessary in order to conduct a multiverse analysis, it does provide some reassurance that the data were processed and interpreted in the way intended by the authors. For example, before conducting their multiverse analysis, Steegen et al. (2016) had to correct various minor reporting errors in the original data, which were discovered only by first attempting to reproduce the results (see their supplemental materials).

If the original results are not (entirely) reproducible, but the source of the inconsistencies is easily identifiable (e.g., use of dummy coding rather than effect coding, or correctable typos in the data file), one can still be reasonably confident in one's understanding of the data analysis, and the study might be a suitable target for the type of research project described here.
In fact, such cases can be especially interesting from an educational point of view, as they demonstrate the project's relevance, and illustrate that even accomplished researchers might struggle with data analysis at times. Yet, when there is no discernible explanation for non-reproducible results, undertaking a multiverse analysis is potentially fruitless, especially when the discrepancies are substantial, because one might have misinterpreted the data. Of course, it is also possible that the original authors made a mistake, but it can be time-consuming to figure this out, and the authors might not be able or willing to help clear up any discrepancies.

Finding a study meeting all these requirements can be quite challenging, for students and instructors alike. A useful starting point for this search process is the article library on curatescience.org, which provides the possibility of filtering articles based on the availability of data (LeBel et al., 2018). Furthermore, one could browse repositories like the Open Science Framework (Soderberg, 2018) for articles with open data. Consulting recent issues of journals using badges to signal articles with open data and open materials (https://www.cos.io/our-services/badges; Kidwell et al., 2016) is another excellent option. Of course, the instructor could provide a dataset of their own or one they are already familiar with. This could either be the primary or only option (see Example Application below), or serve as a back-up in case (some) students would not be able to find a suitable dataset themselves. Based on our experience, both of these approaches work well.

A solid (meta-)scientific background

It is important to build a solid meta-scientific framework, and provide students with sufficient background information about multiverse analyses at the beginning of the project (unless they are already familiar with these concepts from other courses). For example, one could cover some insightful meta-scientific articles such as Simmons et al. (2011), about researcher degrees of freedom and their effect on the false positive rate, Gelman and Loken (2014), which describes how data analysis can be conceived as a garden of forking paths, and Steegen et al. (2016), which introduces multiverse analyses. That way, students are gently introduced to the concept of a multiverse analysis and the rationale behind it. In addition, it serves to foster critical thinking and demonstrates the relevance of such (meta-)scientific studies, including their own.

Besides these more general meta-scientific articles, students could benefit from several (published) examples of a multiverse analysis (e.g., Credé & Phillips, 2017; Moors & Hesselmann, 2019), to give them an idea of what it concretely entails. This serves two purposes. One, it provides guidance on how to summarize and interpret the outcome of a multiverse analysis (e.g., plotting a distribution of p-values, or creating a heatmap with p-values as a function of the various analytic pathways). Two, it stimulates students in recognizing potentially arbitrary choices, thus giving them inspiration for their own multiverse.

Still, it can be quite challenging and overwhelming for students to generate alternative data-analytic pathways. A useful source, besides the papers mentioned above, is the work of Wicherts et al. (2016), which offers a comprehensive overview of researcher degrees of freedom.
Moreover, one could also encourage students to look for alternative pathways in related work. In particular, when the project involves re-analysis of a published study, students could critically assess the rationale behind the article's data-analytic choices, or examine papers cited in the target article as well as previous publications from the same authors on the same topic. To facilitate this, the instructor could organize a (group) discussion about the paper in question and point out some potentially relevant or remarkable choices. Students could (or should) also try to reproduce the original findings, if they haven't done so already as part of the process to select the target study (see above). That way, students familiarize themselves with the target study and its data, which might give them ideas for their eventual multiverse.

Throughout the project, strong guidance is needed. It is critical to inform students about the expectations regarding a multiverse analysis, and to tackle misconceptions. For one, the goal should not be to merely devise as many paths as possible. The key is that the alternatives are properly motivated: quality over quantity (Del Giudice & Gangestad, 2021). Furthermore, when multiple students use the same dataset, it is perfectly plausible to end up with different paths, and thus potentially with seemingly contradictory answers to the same research question. This does not mean that someone made a mistake; rather, it shows the ubiquity of arbitrary decisions. Clear communication about these issues is important to avoid any confusion among students. Providing feedback to students, particularly when it comes to the construction and implementation of the multiverse analysis, is also instrumental in making the project a success. Some students may come up with poorly motivated alternative pathways, in which case the supervisor should steer them in the right direction or encourage them to carefully (re)consider the rationale for their choices. Feedback could also take the form of a group discussion at a later stage of the project, to address the different pathways students came up with and compare their outcomes.

Though not strictly necessary, basic knowledge of R (i.e., a programming language primarily used for data analysis and visualization; R Core Team, 2016), or even R Markdown (i.e., an environment to create dynamic, reproducible reports; Allaire et al., 2016), can help students in running their analyses and reporting their results, yet there is quite a steep learning curve. Multiverse analyses involve combining different options (e.g., different outlier criteria for different dependent variables that are transformed in various ways). Especially when this amounts to many individual pathways, it will be more efficient to integrate them in a single script instead of performing each analysis separately, yet that does require some programming experience or training, as the sketch below illustrates.
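To give an impression of what such an integrated script could look like, here is a minimal sketch in base R. It uses simulated data and hypothetical variable names, so it illustrates the looping pattern rather than any actual course material.

```r
# Sketch: running every pathway of a small multiverse in a single loop,
# using simulated data (two groups, one outcome) as a stand-in.
set.seed(1)
dat <- data.frame(group = rep(c("control", "stress"), each = 50),
                  score = rnorm(100, mean = 10, sd = 2))

# Three outlier cutoffs (in SD units; Inf keeps all observations)
# crossed with two transformations yield six pathways.
pathways <- expand.grid(cutoff = c(Inf, 3, 2.5),
                        transformation = c("raw", "log"),
                        stringsAsFactors = FALSE)

pathways$p_value <- NA
for (i in seq_len(nrow(pathways))) {
  d <- dat
  # Apply this pathway's outlier rule...
  keep <- abs(as.vector(scale(d$score))) <= pathways$cutoff[i]
  d <- d[keep, ]
  # ...and this pathway's transformation, then run the focal test.
  y <- if (pathways$transformation[i] == "log") log(d$score) else d$score
  pathways$p_value[i] <- t.test(y ~ d$group)$p.value
}
pathways  # one row, and thus one p-value, per pathway
```

Dedicated tooling such as the specr or multiverse packages mentioned above can take over much of this bookkeeping, but a plain loop keeps every choice visible, which is arguably an advantage in a teaching context.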
Example Application

This section describes an actual implementation of the multiverse-in-the-classroom approach in the context of an undergraduate research project (see Table 1 for a summary of the syllabus). Besides illustrating the viability of the approach, we hope that it can inspire instructors, course coordinators, and program directors who would consider including multiverse analyses in their research-methods classes. Of course, there are many alternative ways to implement the multiverse-in-the-classroom approach, taking into account aspects such as timing, group size, students' prior knowledge, learning objectives, and so on.

Table 1
Summary of the Syllabus for the Undergraduate Research Project Involving a Multiverse Analysis

Week 1. General introduction. Primary learning objective: understand the topic of the thesis.
Week 2. Group discussion of target article (i.e., Smith et al., 2019). Primary learning objective: engage in critical thinking about the target article.
Week 2. Class on ethics, data sharing, and reproducibility. Primary learning objective: understand the importance of data sharing and reproducibility.
Week 3. Group discussion of Wicherts et al. (2011). Primary learning objective: understand the importance of data sharing and reproducibility.
Week 3. Group discussion of Hardwicke et al. (2018). Primary learning objective: understand the importance of data sharing and reproducibility.
Week 4. Group discussion of Simmons et al. (2011). Primary learning objective: recognize researchers' degrees of freedom and realize their impact.
Week 4. Group discussion of Steegen et al. (2016). Primary learning objectives: understand what a multiverse analysis entails, how to conduct one, and see how the results could be presented.
Week 5. R intro. Primary learning objective: perform data processing, visualization, and plotting in R.
Week 6. R Markdown intro. Primary learning objective: write reproducible and dynamic reports.
Weeks 7-17. Conduct multiverse analysis and write thesis (including four opportunities for individual feedback). Primary learning objective: write a thesis incorporating relevant feedback.

The project took place in the 2020 spring semester with the first author as the instructor, and was inspired by a course jointly taught by both authors in previous years. It was embedded in a course called Bachelorproject, which spans 17 weeks, and is organized for students in the final year of their undergraduate psychology program. These students have already taken several statistics and methods courses, typically amounting to 30 European Credits (EC).

The Bachelorproject represents a study load of 15 EC, during which students need to write an individual thesis describing the outcomes of a research project. The course is mandatory for all undergraduate psychology students, but they are divided into small groups, each with a different instructor and a different research topic (e.g., mental health in university students, people's interest in psychedelics, individual differences in the attentional bias towards emotion, ...). The multiverse-in-the-classroom approach described here was used in one such group, consisting of eight students. Students ultimately had to write a thesis about their project following the typical Introduction-Method-Results-Discussion structure. The resulting products were evaluated on the same criteria as other research projects within the course by two independent graders (including the instructor). In addition, the instructor also graded the process as a whole.

The project involved the re-analysis of an existing dataset, which was provided by the instructor. The selected target article was a study by Smith et al. (2019), examining the influence of acute stress on semantic memory retrieval.
Smith et al. found that participants performed better on an open-ended trivia questionnaire after experiencing acute stress, and when they showed a stronger stress response. The study met all of the above criteria: reusable processed data were available in a detailed enough format (the underlying raw data were, at the time, available upon request, and are now publicly available; see Smith, 2020); the results were reproducible (except for one easily identifiable deviation); and the data processing and analysis steps afforded various alternative pathways.

In a first meeting with the students of ±1 hour, the general topic of the thesis was introduced by the instructor. This included a short description of the target study as well as a brief introduction to the concept of a multiverse analysis. In the next meeting (±2 hours), the target article was examined in detail through a journal club, in which the instructor led the discussion. Students were expected to read the article in advance, and were encouraged to pay special attention to methodological and data-analytical choices. Furthermore, any aspects of the paper that were unclear to the students were addressed during the meeting. From this point onwards, students were encouraged to start thinking about alternative analysis pathways, inspired by the group discussion, through searching for literature around the same topic, etc.

The third meeting (±1.5 hours) consisted of an interactive lecture on data sharing (including ethical issues such as protecting the privacy of participants), reproducibility, and scientific integrity (including a discussion of questionable research practices). The idea was to introduce some concepts that are directly relevant for the thesis (e.g., reproducibility) as well as to give students a broad overview of meta-scientific topics.

The next four meetings (±2 hours each) involved journal clubs around articles on, respectively, data sharing and reproducibility (i.e., Hardwicke et al., 2018; Wicherts et al., 2011), researcher degrees of freedom (i.e., Simmons et al., 2011), and multiverse analysis (Steegen et al., 2016). Each time, two students led the discussion, but everyone was supposed to read the paper in advance and take part in the discussion. The instructor intervened sporadically if something was unclear or to point out relevant aspects. The purpose of these meetings was three-fold. First, they served to build a solid meta-scientific background, and to give students inspiration for their own multiverse analysis. Second, writing the introduction section for a thesis about multiverse analyses can be challenging, as it differs somewhat from that of a "regular" empirical study; hence, discussing a few key articles puts students on the right track. Finally, these journal clubs were also meant to improve students' presentation and discussion skills.

The four final collective meetings (±2 hours each) served to introduce the students to R and R Markdown. Students were guided through a custom-made script showing how to read in data, transform and combine datasets, use conditional statements and loops, make graphs, and perform all the analyses that were used in the target paper. The script already used the data from the target paper to make sure that students understood what the variables meant. Even though the script introduced all the procedures needed to reproduce the results of the target paper, they were illustrated using different variables.
As a take-home exercise, students then tried to independently reproduce the key outcomes of the target paper using R, which they later embedded in an R Markdown document. This guaranteed that all students were (eventually) able to follow the processing and analysis pathways outlined by Smith et al. (2019). Note that students were not required to write their thesis in R Markdown, or even to use R for their eventual analyses. In the end, all eight students conducted their multiverse analysis in R, and two of them wrote their final paper using R Markdown.

From that point onwards, each student had four individual feedback meetings with the instructor in which their research proposal (i.e., rationale for the different pathways), analysis plan, code, results, and write-up were discussed. Seventeen weeks after the start of the course, they were expected to submit their final thesis and accompanying analysis script.

An exhaustive overview of all the alternatives students came up with would take us too far, but the following examples serve to illustrate the versatility of a multiverse approach to (under)graduate research projects. For instance, Smith and colleagues considered responses to a trivia questionnaire as being correct if they completely matched the correct answer, were misspelled but easily extrapolated, were inappropriately pluralized or capitalized, were common synonyms of the correct answer, or if the first four or more letters matched the correct answer. However, students considered various reasonable alternatives to this coding scheme, such as treating incomplete responses as incorrect, regardless of how many letters matched the correct answer (Boere, 2020; De Jong, 2020; Hoogeterp, 2020; Kraaijenbrink, 2020; Kuipers, 2020; Van Dijk, 2020; Van Rijn, 2020; Van Wijk, 2020). Exploring this variation was only possible because students had access to the raw data (i.e., responses of each participant to each question), as the processed data only contained accuracy scores per participant based on the original coding scheme.
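To make this concrete, the following R function contrasts a lenient scheme, loosely in the spirit of the original one (accepting incomplete responses whose first four or more letters match the answer), with a strict scheme requiring an exact match. It is a hypothetical simplification of the coding rules described above, not the original or the students' code.

```r
# Hypothetical simplification of two response-coding schemes for
# open-ended trivia answers (not the original or the students' code).
score_response <- function(response, answer, scheme = c("lenient", "strict")) {
  scheme <- match.arg(scheme)
  response <- tolower(trimws(response))
  answer <- tolower(answer)
  if (scheme == "strict") {
    response == answer  # only exact matches count as correct
  } else {
    # Also accept incomplete responses that form a prefix of the
    # answer of at least four letters.
    response == answer ||
      (nchar(response) >= 4 && startsWith(answer, response))
  }
}

score_response("amste", "amsterdam", scheme = "lenient")  # TRUE
score_response("amste", "amsterdam", scheme = "strict")   # FALSE
```

Rescoring every response under each scheme, and rerunning the focal analysis on the resulting accuracy scores, adds one more dimension to the multiverse.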
Furthermore, some students redefined the construct reactivity to stress. In the original paper, it was operationalized as the change in cortisol levels relative to a baseline, whereas students also considered the change in the psychological stress response measured through the State-Trait Inventory for Cognitive and Somatic Anxiety, described in Grös et al., 2007 (Hoogeterp, 2020; Van Rijn, 2020). Additionally, some students added covariates to the analyses (e.g., age; Kraaijenbrink, 2020), or removed covariates (i.e., gender; Hoogeterp, 2020; Kuipers, 2020; Van Wijk, 2020). Yet other pathways involved imputing missing values (Kraaijenbrink, 2020), or removing observations (e.g., excluding participants who did not display an elevated cortisol level after stress-induction; Boere, 2020; Van Dijk, 2020).

Although there was some overlap in data-analytic choices between students, each individual project featured unique pathways, which were based on existing literature (e.g., Merz et al., 2016), statistical arguments, and/or a critical appraisal of the original study. The breadth of options is illustrated in Figure 1, showing the distribution of p-values for Smith and colleagues' main finding resulting from each student's multiverse analysis (see https://osf.io/rtayk/ for the underlying R code). On average, students' multiverse analyses comprised 78 paths (minimum 18, maximum 160).

This outcome highlights the feasibility and potential of undergraduate research projects incorporating multiverse analyses. We hasten to add that it does not serve as a way to evaluate the robustness of Smith and colleagues' main finding, because certain data-analytic choices explored by the students were insufficiently motivated. The work done by the students, however, offers an ideal starting point for a more thorough multiverse analysis of the finding (see Heyman et al., 2022).

Benefits of the Multiverse-in-the-Classroom Approach

Incorporating multiverse analyses in (under)graduate research projects (or other courses) has many benefits for students as well as for (psychological) science in general.

One strength of the multiverse-in-the-classroom approach is that it can be flexibly adapted to the course's learning objectives, classroom size, time frame, background of the students, and so on. For instance, one can conduct a multiverse analysis reusing an existing dataset, as in the example described above, or one could use newly gathered data. Because the latter option involves an additional step compared to a typical research project, it is well-suited for situations where something extra is required from students (e.g., students enrolled in an honours program), whereas the former option can be applied more broadly. Importantly, as there is no need to design a new study, or to collect any data, the students' overall time investment is comparable to that of a regular research project. Moreover, an adapted version of such a multiverse project can be used in a more statistics-oriented course rather than a research-methods-oriented course. Both authors have used a similar approach as part of a 13-week graduate statistics course within a psychology research master track for a number of years. There, the ±40 students were instructed to write a report about the multiverse analysis they conducted in small groups using existing data. Because these graduate students are well-versed in statistical analyses and programming, and due to the group nature of the project, it can easily fit in a 13-week course, as compared to the 17-week undergraduate research project described above.

As a multiverse project does not necessarily require collecting new data, one could effectively save a lot of resources (i.e., time of participants and students, money to pay participants, ...). Therefore, it is ideal for situations where collecting new data is impractical or impossible, for instance, because special equipment or expertise is required, getting ethical approval takes too much time, or when one does not have access to a participant pool or money to pay participants. This proved to be especially relevant in the lockdown situation due to COVID-19 in spring 2020. Indeed, the lockdown measures, which involved suspending all in-vivo data collection and required classes to be taught online, had very little impact on the project discussed above, with all students meeting the original deadline.

The flexibility also applies to the selection of a target study. Each student could focus on a separate paper, or, as was the case in the example above, each student could independently construct their own data-analytic pathways for the same dataset.
The latter option is comparable to the many-analysts-one-dataset approach used by Silberzahn et al. (2018), augmented with the additional requirement that every analyst (i.e., student) should consider several plausible alternatives rather than a single one. We believe the many-multiverses-one-dataset option is the more interesting of the two, because any given multiverse will rarely (if ever) exhaust all reasonable options; hence, it makes sense to adopt a form of data-analytic triangulation. In other words, there is a multiverse of multiverse analyses, which can be captured to some degree by asking different students to focus (semi-)independently on the same overarching topic. Although it is unrealistic to expect that every individual project will be of the same quality, it can be enlightening to see the variability, or lack thereof, in outcomes. Indeed, as Figure 1 shows, it is possible that some multiverse analyses suggest the effect in question to be quite robust, whereas others suggest the effect to be rather fragile.

Figure 1
Distribution of p-values for Smith and Colleagues' Main Finding Resulting from Each Student's Multiverse Analysis
[Eight histogram panels, one per student, each showing the frequency of the p-values obtained across that student's pathways: Student 1 (N = 160), Student 2 (N = 140), Student 3 (N = 48), Student 4 (N = 36), Student 5 (N = 78), Student 6 (N = 18), Student 7 (N = 110), Student 8 (N = 36).]
Note. The red dotted line indicates a p-value of .05. The number in brackets indicates the number of pathways in each student's multiverse. Remark that not all pathways were properly motivated, so these results should not be considered an evaluation of the robustness of Smith and colleagues' main finding.

Despite bridging an important gap in psychological science by showing the robustness or fragility of findings, multiverse analyses are relatively rare, owing perhaps to their apparent complexity and/or their perceived lack of novelty. In that sense, one can draw a parallel to replication studies: once rare in psychology (Makel et al., 2012), they are now becoming more mainstream through various initiatives (see Zwaan et al., 2018). Moreover, Frank and colleagues (Frank & Saxe, 2012; Hawkins et al., 2018) promoted conducting replication studies in student research projects (see also Grahe et al., 2012; Wagge et al., 2019). The current proposal seeks to accomplish a similar goal for multiverse analyses. Note that both approaches can complement each other, in that one can conduct a multiverse analysis on replication data, either as part of the same project or across different iterations of the course (e.g., one group conducting the replication study, and another group performing multiverse analyses, possibly the following semester or academic year).

Another major benefit of adopting the multiverse-in-the-classroom approach, besides its flexibility, is that it gives students the opportunity to make a tangible contribution to psychological science, something that might not always occur with (under)graduate research projects.
Moreover, under some conditions, the work done by students and instructor(s) can be solidified in a joint research paper, suitable for publication, as was the case for the example application. The classroom phase then serves as an elicitation step of possible reasonable variations, which, in a second step, are evaluated for adoption in a multiverse analysis by a domain expert. Such a two-step multiverse analysis, in which data-analytical pathways are first elicited from different sources and then synthesized and applied to the data, can even yield more comprehensive and less biased results compared to a regular multiverse analysis.

The multiverse-in-the-classroom approach also provides ample pedagogical opportunities. Conducting a multiverse analysis typically requires students to perform a number of different statistical analyses. It thereby addresses an often-heard complaint from psychology students regarding the relevance of statistics. Even though most research projects involve the practical application of statistics, it rarely is a focal point (in some cases, the analysis part might actually be considered a nuisance). Furthermore, a multiverse project may help students to better understand the intricacies of statistical analyses. Importantly, it is not a purely methodological or statistical project, as it also involves an empirical research question, such as "to what extent does power posing have an effect on hormone levels" or "what is the effect of stress on semantic memory". Hence, there is still the thrill of discovery, which helps fuel students' engagement.

At a more abstract level, a multiverse project also allows students to gain first-hand experience with the importance of open science, reproducibility, proper documentation of data, and so on. In addition, it teaches them to critically evaluate the rationale behind a study, especially its methodology, and it gives them an idea about the imperfections of psychological science. As future consumers of research, it is relevant for students to recognize that arbitrary decisions abound in research, and to realize their consequences. A multiverse-in-the-classroom project really drives this point home. Moreover, for those students aspiring to become producers of research, it is paramount to adopt responsible research practices, such as assessing the robustness of key outcomes. In fact, students spontaneously mentioned these aspects in an informal evaluation of the course (e.g., "it has really changed my perspective on research... and sparked my interest" or "it was interesting to see what happens to the p-values when conducting different analyses"). One could argue that typical research projects, in which students are required to develop a new hypothesis, design a new study, and collect data, teach them bad habits or even questionable research practices, as it is rather difficult to accomplish all this in a rigorous manner within the, usually limited, timeframe.

Challenges and Objections

A multiverse-in-the-classroom project can involve designing a new study, but that might not be feasible within the confines of a single semester, because developing and conducting such an analysis in itself is rather time-consuming. The option involving existing data is more readily applicable, yet one potential objection is that such a project does not cover the entire empirical cycle.
Although a multiverse project requires a thorough literature search, motivating a research question, and a comprehensive data analysis of which the results ought to be interpreted and discussed, students may miss out on learning specific skills (e.g., regarding data collection). When the development of such skills is a central objective of the course, one might need to look for a creative solution. For instance, in the example application described above, the absence of a data collection phase was addressed by having students recode the participants' responses to the trivia questionnaire. Note, though, that one can raise similar concerns about more widely applied projects such as those involving online data collection. In fact, there is often quite a bit of variability in what is demanded of students across projects within the same (under)graduate program. More fundamentally, accreditation guidelines for research projects in psychology often explicitly mention the possibility to conduct secondary data analyses (e.g., Australian Psychology Accreditation Council, 2019; The British Psychological Society, 2019).

Another challenge of conducting a multiverse analysis is that it requires combining various alternatives (e.g., three different outlier criteria and four different data transformations yield 12 outcomes). In principle, every analysis can be conducted separately, but this becomes unwieldy quite quickly, so one could use a script to increase efficiency. Depending on the students' background, the latter option might prove to be unattainable unless one includes some programming classes in the curriculum (e.g., teaching the language R).

Another potential hurdle for students (and instructors alike) revolves around the interpretation of a multiverse analysis. In contrast to a typical research project, one does not end up with a single outcome, but with a collection of outcomes. This elicits questions such as: when should a finding be considered robust, when is it presumably a fluke, and how should the results be summarized and presented? Indeed, published papers involving multiverse analyses typically eyeball the pattern of results, for instance, by plotting the distribution of p-values. Steegen et al. (2016) tentatively suggest focusing inference on the average p-value, but beyond that, there is little guidance as to how to synthesize a multiverse analysis (but see Simonsohn et al., 2020). The sketch below illustrates a few such simple summaries.
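For concreteness, here is a minimal base-R sketch, assuming the p-values of all pathways have been collected in a numeric vector; the values below are placeholders for illustration only.

```r
# Sketch: simple summaries of one multiverse's outcomes. In practice,
# p_values would hold one p-value per pathway; here we use placeholders.
set.seed(2)
p_values <- runif(78)  # placeholder values for illustration only

# Eyeballing the pattern: a histogram of the p-value distribution,
# with a reference line at the conventional .05 threshold.
hist(p_values, breaks = 20, xlab = "p-value",
     main = "Distribution of p-values across pathways")
abline(v = .05, col = "red", lty = 2)

mean(p_values)        # average p-value (cf. Steegen et al., 2016)
mean(p_values < .05)  # proportion of significant pathways
```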
A more fundamental objection could be that approaches such as pre-registration are more desirable, so that students should spend their time learning about pre-registration rather than about multiverse analyses. Pre-registration entails that one specifies the analysis plan before knowing its results, if possible even before starting the data collection (Nosek et al., 2018). As such, pre-registration makes transparent which choices could be data-driven and which are not. However, if a researcher pre-registers one or a few analytic pathways, one is still left in the dark about how robust or fragile the effect is, or about whether certain choices are more critical than others (for a similar argument, see Steegen et al., 2016). To that end, one would need to conduct a multiverse-style analysis. Of course, one could pre-register a multiverse analysis to combine the strengths of both approaches, but this increases the complexity of the project.

Finally, one should be cautious that students do not completely lose faith in (psychological) science. Indeed, whereas the goal is to make students critical consumers of scientific output, and, as a result, careful producers of scientific output, they should not come away with the idea that science is inherently flawed or that all researchers are opportunistic or fraudulent. Along the same lines, students should be made aware that not all hypotheses can necessarily be tested in a myriad of ways. Based on the informal evaluation mentioned above, students did not come away with such incorrect notions, but future research on the effectiveness of the multiverse-in-the-classroom approach should determine whether this is indeed the case.

Conclusion

The present paper proposes to implement multiverse analyses in student research projects, and provides a practical demonstration that we hope will encourage, help, and inspire instructors to adopt the approach in their own courses. Because multiverse analyses speak to the robustness of a (published) finding, such projects can fulfill an important need in psychological science, thus making their results truly relevant. Furthermore, it is an excellent way to put statistics into practice; it fosters critical thinking, and raises awareness about the prevalence and consequences of arbitrary data-analytic decisions. Finally, the flexibility of the multiverse-in-the-classroom approach makes it suitable for all kinds of projects, even when data collection is not feasible.

Author Contact

Correspondence concerning this article should be addressed to Tom Heyman, Methodology and Statistics Unit, Institute of Psychology, Leiden University, Wassenaarseweg 52, 2333 AK Leiden, The Netherlands. E-mail: t.d.p.heyman@fsw.leidenuniv.nl. ORCID: TH 0000-0003-0565-441X; WV 0000-0002-5855-3885.

Conflict of Interest and Funding

The authors declare that there were no conflicts of interest or specific funding with respect to the authorship or the publication of this article.

Author Contributions

Both authors conceptualized the idea. TH wrote the first draft of the manuscript and WV provided extensive feedback. Both authors approved the final version for submission.

Open Science Practices

This article earned the Open Materials badge for making the materials openly available. It has been verified that the analysis reproduced the results presented in the article. The entire editorial process, including the open reviews, is published in the online supplement.

References

Allaire, J., Cheng, J., Xie, Y., McPherson, J., Chang, W., Allen, J., Wickham, H., Atkins, A., & Hyndman, R. (2016). rmarkdown: Dynamic documents for R [R package version 1.6]. https://CRAN.R-project.org/package=rmarkdown

Artner, R., Verliefde, T., Steegen, S., Gomes, S., Traets, F., Tuerlinckx, F., & Vanpaemel, W. (2021). The reproducibility of statistical results in psychological research: An investigation using unpublished raw data. Psychological Methods, 26(5), 527–546. https://doi.org/10.1037/met0000365

Australian Psychology Accreditation Council. (2019). Accreditation standards for psychology programs: Evidence guide (Version 1.2). https://psychologycouncil.org.au/wp-content/uploads/2021/03/APAC-Evidence-guide_v1.2.pdf

Bishop, D. (2016). Open research practices: Unintended consequences and suggestions for averting them (commentary on the Peer Reviewers' Openness Initiative). Royal Society Open Science, 3(4), 160109. https://doi.org/10.1098/rsos.160109

Boere, R. (2020). Het belang van reproduceerbare en transparante wetenschap: Een multiverse benadering [Unpublished bachelor's thesis]. Leiden University.
Carney, D. R., Cuddy, A. J., & Yap, A. J. (2010). Power posing: Brief nonverbal displays affect neuroendocrine levels and risk tolerance. Psychological Science, 21(10), 1363–1368. https://doi.org/10.1177/0956797610383437

Credé, M., & Phillips, L. A. (2017). Revisiting the power pose effect: How robust are the results reported by Carney, Cuddy, and Yap (2010) to data analytic decisions? Social Psychological and Personality Science, 8(5), 493–499. https://doi.org/10.1177/1948550617714584

De Jong, S. (2020). Het effect van stress op het semantisch geheugen: Een multiverse benadering [Unpublished bachelor's thesis]. Leiden University.

Del Giudice, M., & Gangestad, S. W. (2021). A traveler's guide to the multiverse: Promises, pitfalls, and a framework for the evaluation of analytic decisions. Advances in Methods and Practices in Psychological Science, 4(1), 1–15. https://doi.org/10.1177/2515245920954925

Dragicevic, P., Jansen, Y., Sarma, A., Kay, M., & Chevalier, F. (2019). Increasing the transparency of research papers with explorable multiverse analyses. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–15.

Elson, M. (2016). Flexibility in methods & measures of social science. https://www.flexiblemeasures.com/

Frank, M. C., & Saxe, R. (2012). Teaching replication. Perspectives on Psychological Science, 7(6), 600–604. https://doi.org/10.1177/1745691612460686

Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102(6), 460–465.

Grahe, J. E., Reifman, A., Hermann, A. D., Walker, M., Oleson, K. C., Nario-Redmond, M., & Wiebe, R. P. (2012). Harnessing the undiscovered resource of student research projects. Perspectives on Psychological Science, 7(6), 605–607. https://doi.org/10.1177/1745691612459057

Grös, D. F., Antony, M. M., Simms, L. J., & McCabe, R. E. (2007). Psychometric properties of the State-Trait Inventory for Cognitive and Somatic Anxiety (STICSA): Comparison to the State-Trait Anxiety Inventory (STAI). Psychological Assessment, 19(4), 369–381. https://doi.org/10.1037/1040-3590.19.4.369

Harder, J. A. (2020). The multiverse of methods: Extending the multiverse analysis to address data-collection decisions. Perspectives on Psychological Science, 15(5), 1158–1177. https://doi.org/10.1177/1745691620917678

Hardwicke, T. E., Mathur, M. B., MacDonald, K., Nilsonne, G., Banks, G. C., Kidwell, M. C., Hofelich Mohr, A., Clayton, E., Yoon, E. J., Henry Tessler, M., Lenne, R. L., Altman, S., Long, B., & Frank, M. C. (2018). Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal Cognition. Royal Society Open Science, 5(8), 180448. https://doi.org/10.1098/rsos.180448
Hawkins, R. X., Smith, E. N., Au, C., Arias, J. M., Catapano, R., Hermann, E., Keil, M., Lampinen, A., Raposo, S., Reynolds, J., Salehi, S., Salloum, J., Tan, J., & Frank, M. C. (2018). Improving the replicability of psychological science through pedagogy. Advances in Methods and Practices in Psychological Science, 1(1), 7–18. https://doi.org/10.1177/2515245917740427

Heyman, T., Boere, R., de Jong, S., Hoogeterp, L., Kraaijenbrink, J., Kuipers, C., van Dijk, M., van Rijn, L., & van Wijk, T. (2022). The effect of stress on semantic memory retrieval: A multiverse analysis. Collabra: Psychology, 8(1), 35745. https://doi.org/10.1525/collabra.35745

Hoogeterp, L. (2020). Het effect van stress op het semantisch geheugen: Een multiverse benadering [Unpublished bachelor's thesis]. Leiden University.

Kalokerinos, E. K., Erbas, Y., Ceulemans, E., & Kuppens, P. (2019). Differentiate to regulate: Low negative emotion differentiation is associated with ineffective use but not selection of emotion-regulation strategies. Psychological Science, 30(6), 863–879. https://doi.org/10.1177/0956797619838763

Kidwell, M. C., Lazarević, L. B., Baranski, E., Hardwicke, T. E., Piechowski, S., Falkenberg, L.-S., Kennett, C., Slowik, A., Sonnleitner, C., Hess-Holden, C., Errington, T. M., Fiedler, S., & Nosek, B. A. (2016). Badges to acknowledge open practices: A simple, low-cost, effective method for increasing transparency. PLoS Biology, 14(5), e1002456. https://doi.org/10.1371/journal.pbio.1002456

Kierniesky, N. C. (2005). Undergraduate research in small psychology departments: Two decades later. Teaching of Psychology, 32(2), 84–90. https://doi.org/10.1207/s15328023top3202_1

Kraaijenbrink, J. (2020). The effect of stress on the semantic memory: A multiverse approach [Unpublished bachelor's thesis]. Leiden University.

Kuipers, C. (2020). The effect of stress on the semantic memory: A multiverse approach [Unpublished bachelor's thesis]. Leiden University.

LeBel, E. P., McCarthy, R. J., Earp, B. D., Elson, M., & Vanpaemel, W. (2018). A unified framework to quantify the credibility of scientific findings. Advances in Methods and Practices in Psychological Science, 1(3), 389–402. https://doi.org/10.1177/2515245918787489

Liu, Y., Kale, A., Althoff, T., & Heer, J. (2020). Boba: Authoring and visualizing multiverse analyses. IEEE Transactions on Visualization and Computer Graphics, 27(2), 1753–1763. https://doi.org/10.1109/TVCG.2020.3028985

Makel, M. C., Plucker, J. A., & Hegarty, B. (2012). Replications in psychology research: How often do they really occur? Perspectives on Psychological Science, 7(6), 537–542. https://doi.org/10.1177/1745691612460688
Masur, P., & Scharkow, M. (2019). specr: Statistical functions for conducting specification curve analyses. https://github.com/masurp/specr

Merz, C. J., Dietsch, F., & Schneider, M. (2016). The impact of psychosocial stress on conceptual knowledge retrieval. Neurobiology of Learning and Memory, 134, 392–399. https://doi.org/10.1016/j.nlm.2016.08.020

Moors, P., & Hesselmann, G. (2019). Unconscious arithmetic: Assessing the robustness of the results reported by Karpinski, Briggs, and Yale (2018). Consciousness and Cognition, 68, 97–106. https://doi.org/10.1016/j.concog.2019.01.003

Morey, R. D., Chambers, C. D., Etchells, P. J., Harris, C. R., Hoekstra, R., Lakens, D., Lewandowsky, S., Morey, C. C., Newman, D. P., Schönbrodt, F. D., Vanpaemel, W., Wagenmakers, E.-J., & Zwaan, R. A. (2016). The Peer Reviewers' Openness Initiative: Incentivizing open research practices through peer review. Royal Society Open Science, 3(1), 150547. https://doi.org/10.1098/rsos.150547

Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606. https://doi.org/10.1073/pnas.1708274114

Patel, C. J., Burford, B., & Ioannidis, J. P. (2015). Assessment of vibration of effects due to model specification can demonstrate the instability of observational associations. Journal of Clinical Epidemiology, 68(9), 1046–1058. https://doi.org/10.1016/j.jclinepi.2015.05.029

R Core Team. (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Sarma, A., & Kay, M. (2019). multiverse: Explorable multiverse data analysis and reports in R [R package version 0.1.4]. https://CRAN.R-project.org/package=multiverse

Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., Bahnik, S., Bai, F., Bannard, C., Bonnier, E., Carlsson, R., Cheung, F., Christensen, G., Clay, R., Craig, M. A., Dalla Rosa, A., Dam, L., Evans, M. H., Flores Cervantes, I., Nosek, B. A., et al. (2018). Many analysts, one data set: Making transparent how variations in analytic choices affect results. Advances in Methods and Practices in Psychological Science, 1(3), 337–356. https://doi.org/10.1177/2515245917747646
org / 10 . 1177/2515245917747646 Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibil- ity in data collection and analysis allows pre- senting anything as significant. Psychological Science, 22(11), 1359–1366. https://doi.org/ 10.1177/0956797611417632 Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2020). Specification curve analysis. Nature Human Be- haviour, 4, 1208–1214. https : / / doi . org / 10 . 1038/s41562-020-0912-z Smith, A. M. (2020). Acute stress enhances general- knowledge semantic memory. https://doi.org/ 10.17605/OSF.IO/EQ8SY Smith, A. M., Hughes, G. I., Davis, F. C., & Thomas, A. K. (2019). Acute stress enhances general- knowledge semantic memory. Hormones and be- havior, 109, 38–43. https://doi.org/10.1016/j. yhbeh.2019.02.003 Soderberg, C. K. (2018). Using OSF to share data: A step-by-step guide. Advances in Methods and Practices in Psychological Science, 1(1), 115–120. https : / / doi . org / 10 . 1177 / 2515245918757689 Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychologi- cal Science, 11(5), 702–712. https : / / doi . org / 10.1177/1745691616658637 The British Psychological Society. (2019). Standards for the accreditation of undergraduate, conversion and integrated Masters programmes in psychol- ogy. https : / / www . psychologicalsociety . ie / source/Undergraduate%5C%20Accreditation% 5C % 20Guidelines % 5C % 202019update _ file _ 674.pdf Van Dijk, M. (2020). Acute stress enhances semantic memory: The robustness of the findings of Smith, Hughes, Davis, and Thomas (2019). [Unpub- lished bachelor’s thesis]. Leiden University. Van Rijn, L. (2020). The effect of stress on semantic mem- ory: A multiverse approach. [Unpublished bach- elor’s thesis]. Leiden University. Van Wijk, T. (2020). Het effect van stress op semantisch geheugen: Een multiverse benadering. [Unpub- lished bachelor’s thesis]. Leiden University. Vanpaemel, W., Vermorgen, M., Deriemaecker, L., & Storms, G. (2015). Are we wasting a good cri- sis? The availability of psychological research data after the storm. Collabra, 1(1), 3. https : //doi.org/10.1525/collabra.13 Voracek, M., Kossmeier, M., & Tran, U. S. (2019). Which data to meta-analyze, and how? 
A specification-curve and multiverse-analysis ap- https://github.com/masurp/specr https://doi.org/10.1016/j.nlm.2016.08.020 https://doi.org/10.1016/j.nlm.2016.08.020 https://doi.org/10.1016/j.concog.2019.01.003 https://doi.org/10.1016/j.concog.2019.01.003 https://doi.org/10.1098/rsos.150547 https://doi.org/10.1098/rsos.150547 https://doi.org/10.1073/pnas.1708274114 https://doi.org/10.1073/pnas.1708274114 https://doi.org/10.1016/j.jclinepi.2015.05.029 https://doi.org/10.1016/j.jclinepi.2015.05.029 https://www.R-project.org/ https://www.R-project.org/ https://CRAN.R-project.org/package=multiverse https://CRAN.R-project.org/package=multiverse https://doi.org/10.1177/2515245917747646 https://doi.org/10.1177/2515245917747646 https://doi.org/10.1177/0956797611417632 https://doi.org/10.1177/0956797611417632 https://doi.org/10.1038/s41562-020-0912-z https://doi.org/10.1038/s41562-020-0912-z https://doi.org/10.17605/OSF.IO/EQ8SY https://doi.org/10.17605/OSF.IO/EQ8SY https://doi.org/10.1016/j.yhbeh.2019.02.003 https://doi.org/10.1016/j.yhbeh.2019.02.003 https://doi.org/10.1177/2515245918757689 https://doi.org/10.1177/2515245918757689 https://doi.org/10.1177/1745691616658637 https://doi.org/10.1177/1745691616658637 https://www.psychologicalsociety.ie/source/Undergraduate%5C%20Accreditation%5C%20Guidelines%5C%202019update_file_674.pdf https://www.psychologicalsociety.ie/source/Undergraduate%5C%20Accreditation%5C%20Guidelines%5C%202019update_file_674.pdf https://www.psychologicalsociety.ie/source/Undergraduate%5C%20Accreditation%5C%20Guidelines%5C%202019update_file_674.pdf https://www.psychologicalsociety.ie/source/Undergraduate%5C%20Accreditation%5C%20Guidelines%5C%202019update_file_674.pdf https://doi.org/10.1525/collabra.13 https://doi.org/10.1525/collabra.13 13 proach to meta-analysis. Zeitschrift für Psycholo- gie, 227(1), 64–82. https://doi.org/10.1027/ 2151-2604/a000357 Wagge, J. R., Baciu, C., Banas, K., Nadler, J. T., Schwarz, S., Weisberg, Y., IJzerman, H., Legate, N., & Grahe, J. (2019). A demonstration of the Col- laborative Replication and Education Project: Replication attempts of the red-romance effect. Collabra: Psychology, 5(1), 5. https://doi.org/ 10.1525/collabra.177 Wicherts, J. M., Bakker, M., & Molenaar, D. (2011). Will- ingness to share research data is related to the strength of the evidence and the quality of re- porting of statistical results. PloS ONE, 6(11), e26828. https : / / doi . org / 10 . 1371 / journal . pone.0026828 Wicherts, J. M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psycholo- gist, 61(7), 726–728. https://doi.org/10.1037/ 0003-066X.61.7.726 Wicherts, J. M., Veldkamp, C. L., Augusteijn, H. E., Bakker, M., Van Aert, R., & Van Assen, M. A. (2016). Degrees of freedom in planning, run- ning, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Fron- tiers in Psychology, 7, 1832. https : / / doi . org / 10.3389/fpsyg.2016.01832 Young, C., & Holsteen, K. (2017). Model uncertainty and robustness: A computational framework for multimodel analysis. Sociological Methods & Research, 46(1), 3–40. https : / / doi . org / 10 . 1177/0049124115610347 Zwaan, R. A., Etz, A., Lucas, R. E., & Donnellan, M. B. (2018). Making replication mainstream. Behav- ioral and Brain Sciences, 41, e120. https://doi. 
org/10.1017/S0140525X17001972 https://doi.org/10.1027/2151-2604/a000357 https://doi.org/10.1027/2151-2604/a000357 https://doi.org/10.1525/collabra.177 https://doi.org/10.1525/collabra.177 https://doi.org/10.1371/journal.pone.0026828 https://doi.org/10.1371/journal.pone.0026828 https://doi.org/10.1037/0003-066X.61.7.726 https://doi.org/10.1037/0003-066X.61.7.726 https://doi.org/10.3389/fpsyg.2016.01832 https://doi.org/10.3389/fpsyg.2016.01832 https://doi.org/10.1177/0049124115610347 https://doi.org/10.1177/0049124115610347 https://doi.org/10.1017/S0140525X17001972 https://doi.org/10.1017/S0140525X17001972