Meta-Psychology, 2022, vol 6, MP.2020.2718
https://doi.org/10.15626/MP.2020.2718
Article type: Tutorial
Published under the CC-BY4.0 license
Open data: Not Applicable
Open materials: Yes
Open and reproducible analysis: Yes
Open reviews and editorial process: Yes
Preregistration: No
Edited by: Rickard Carlsson
Reviewed by: Nordström, T., Rohrer, J., Zigerell, L.J.
Analysis reproduced by: Lucija Batinović
All supplementary files can be accessed at OSF: https://doi.org/10.17605/OSF.IO/W98S6

Multiverse analyses in the classroom

Tom Heyman
Methodology and Statistics Unit, Institute of Psychology, Leiden University

Wolf Vanpaemel
Faculty of Psychology and Educational Sciences, KU Leuven

Abstract

Most empirical papers in psychology involve statistical analyses performed on a new or existing dataset. Sometimes the robustness of a finding is demonstrated via data-analytical triangulation (e.g., obtaining comparable outcomes across different operationalizations of the dependent variable), but systematically considering the plethora of alternative analysis pathways is rather uncommon. However, researchers increasingly recognize the importance of establishing the robustness of a finding. The latter can be accomplished through a so-called multiverse analysis, which involves methodically examining the arbitrary choices pertaining to data processing and/or model building. In the present paper, we describe how the multiverse approach can be implemented in student research projects within psychology programs, drawing on our personal experience as instructors. Embedding a multiverse project in students' curricula addresses an important scientific need, as studies examining the robustness or fragility of phenomena are largely lacking in psychology. Additionally, it offers students an ideal opportunity to put various statistical methods into practice, thereby also raising awareness about the abundance and consequences of arbitrary decisions in data-analytic processing. An attractive practical feature is that one can reuse existing datasets, which proves especially useful when resources are limited, or when circumstances such as the COVID-19 lockdown measures restrict data collection possibilities.

Keywords: multiverse analysis; robustness; education; pedagogy; open science

An important part of many psychology students' (under)graduate programs is research-methods classes in which students are asked to complete their own (small-scale) research project (e.g., Kierniesky, 2005). Typically, the goal is to run through the entire empirical cycle, thus putting knowledge gained from previous theory-focused courses into practice. However, this can be quite challenging, as time and resources are often limited in such projects. As a consequence, students and instructors might (begrudgingly) take shortcuts, resulting in ill-designed or underpowered studies, poorly motivated research questions, sloppy measurement practices, and so on. Perhaps the most devastating consequence of this approach is that students could come away with a wrong impression of what psychological research entails, and it might even instill bad habits in prospective researchers. In the present paper, we suggest an alternative implementation of research-methods classes that addresses these concerns. In particular, we propose that completing a multiverse analysis project as part of such research-methods classes has several important benefits. First, we explain what a multiverse analysis entails (see Steegen et al., 2016).
Then, we describe the two main ingredients of a multiverse-in-the-classroom project: a suitable dataset and a solid (meta-)scientific background. Next, we give a worked example of such a project, based on our personal experience as instructors. Finally, we discuss the benefits and challenges of the multiverse-in-the-classroom approach.

What is a Multiverse Analysis?

Most empirical papers in psychology involve some kind of data analysis. Typically, there is no unique path from the raw data to the eventual conclusions of a paper. Researchers need to make a number of decisions along the way, such as whether and how to deal with outliers and missing data, whether and how to transform variables, and so on. In some cases, theoretical considerations provide a clear solution to such questions, yet, at times, researchers have little to go on, so they turn to their gut feeling, lab habits, or field-specific standards, which are often poorly motivated. As a result, when processing and analyzing empirical data, researchers regularly face certain choices that are arbitrary in nature. These researcher degrees of freedom (Simmons et al., 2011) lead to a garden of forking paths (Gelman & Loken, 2014).

As an example, suppose that, for a given dataset, a researcher identifies four plausible ways to deal with outliers, three approaches to handle missing data, and two reasonable options to transform a particular variable. Assuming all combinations are sensible, this would lead to 4 × 3 × 2 = 24 unique paths, each with its own outcome (see also Bishop, 2016). However, researchers usually report the results for just one or a few of these paths, by picking only one or a few options out of the pool of plausible alternatives (e.g., deleting observations 2.5 standard deviations above the mean, listwise deletion when encountering missing values, and log-transforming a positively skewed variable; see also Elson, 2016, for a practical illustration).

In contrast, the idea of a multiverse analysis (Steegen et al., 2016) is to explore and report on a wide array of imaginable (combinations of) reasonable alternatives, each of which provides an answer to the same research question. By explicitly considering the results of several reasonable analyses, a multiverse analysis can give an idea about the robustness or fragility of a certain finding, and might even point to moderators of the effect in question (i.e., key choices regarding data processing and/or analysis that the conclusion depends on). A multiverse analysis can be applied to newly collected data (e.g., Kalokerinos et al., 2019), but also retrospectively using existing data (e.g., Moors & Hesselmann, 2019). For instance, Credé and Phillips (2017) conducted a multiverse analysis on data from Carney et al. (2010) examining the power pose effect, which is the (controversial) finding that holding a high-power body pose affects hormone levels. Their multiverse analysis revealed that most alternative pathways yielded null effects, whereas the original single-pathway analysis produced a significant effect.
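Computationally, the core of a multiverse analysis is modest. The following minimal R sketch enumerates the 24 pathways from the example above; the option labels are hypothetical placeholders rather than recommendations.

```r
# Minimal sketch: enumerating a hypothetical 4 x 3 x 2 multiverse.
outlier_rules   <- c("none", "sd_2.5", "sd_3", "iqr_1.5")
missing_rules   <- c("listwise", "mean_imputation", "multiple_imputation")
transformations <- c("raw", "log")

# Each row of the grid represents one unique analysis pathway.
multiverse <- expand.grid(outliers = outlier_rules,
                          missing = missing_rules,
                          transformation = transformations,
                          stringsAsFactors = FALSE)
nrow(multiverse)  # 4 * 3 * 2 = 24 pathways
```

Running the same substantive analysis once per row of this grid, and collecting the outcomes, is essentially all a basic multiverse analysis amounts to computationally.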
The importance of a multiverse analysis is also nicely illustrated by the study of Silberzahn et al. (2018), in which 29 research teams independently examined whether referees in soccer are more likely to give red cards to players with a darker skin tone compared to light-skin-toned players. All teams used the same dataset to answer this question, yet the conclusions varied considerably: 20 out of 29 teams (69%) found a positive relation (i.e., dark-skin-toned players tended to receive more red cards), whereas 9 teams obtained a null effect, which was even numerically negative in two cases. These results underscore that there are often several ways to process and analyze a given dataset, and that picking a single pathway might be deceiving, which is why conducting a multiverse analysis can be very informative.

Ideas similar to that of a multiverse analysis have been proposed under different names, such as specification curve analysis (Simonsohn et al., 2020), vibration of effects analysis (Patel et al., 2015), multimodel analysis (Young & Holsteen, 2017), and the many analysts approach by Silberzahn et al. (2018) discussed above (though, in contrast with the other approaches, in the latter approach the different choices are distributed over research teams rather than being performed by the same team). Multiverse-style analyses are increasingly being recognized as providing crucial information, and researchers have also proposed various extensions and refinements. For instance, multiverse analyses have been applied in the context of meta-analyses (Voracek et al., 2019), suggested as an approach to deal with different random effect structures of multi-level models (Harder, 2020), and used in combination with so-called explorable explanations allowing readers of a paper to dynamically move through the multiverse (Dragicevic et al., 2019). In addition, Liu et al. (2020) recently developed a programming tool called Boba, which helps researchers to conduct and visualize multiverse analyses, whereas others have developed specific R packages to facilitate multiverse analyses (e.g., Masur & Scharkow, 2019; Sarma & Kay, 2019).

Teaching Multiverse Analyses

The key message of this paper is that multiverse analyses are ideally suited to be included in laboratory or research-methods classes. In line with its general theme, there are a multitude of ways in which multiverse analyses can be incorporated in research-methods classes, taking into account the available time, place in the curriculum, and learning objectives. Yet, they all require two essential ingredients: a suitable dataset and a solid (meta-)scientific background. Both of these elements will be discussed in turn, including some guidance based on personal experience.

A suitable dataset

A multiverse analysis can be conducted on newly gathered data, or one could reuse an existing dataset. From an educational point of view, the former option is fairly comparable to a typical student research project, though the eventual statistical analyses will be considerably more elaborate, sophisticated, and time-consuming. Focusing on existing data is perhaps more unusual in the context of a research-methods class, in that it involves finding a suitable dataset and isolating the hypotheses of interest, rather than designing a study to test a hypothesis and collecting data. For short projects, or when students are relatively inexperienced, the instructor could select one or a few suitable studies, thus assuring that students can hit the ground running. Alternatively, students with a stronger background could be given the opportunity to find a suitable study themselves.

Selecting a study from the literature for a multiverse analysis comes with several challenges.
One obvious requirement for such a study is that it should have publicly available data, or that the original authors share their data for the agreed-upon purposes. This already narrows down the pool of studies, as psychological scientists are often not able or willing to share research data (Vanpaemel et al., 2015; Wicherts et al., 2006), though since the start of the open science movement and its various initiatives (e.g., Morey et al., 2016), there has been an increase in data availability (Kidwell et al., 2016). Furthermore, even if data are available, it does not necessarily imply that they are amenable to a multiverse analysis. It might, for instance, be unclear what a certain variable measures, or how a data file is structured (Hardwicke et al., 2018). Obviously, the multiverse-using-existing-data approach is only feasible when one has access to reusable data.

Another important criterion is that the study should afford plausible alternative data-analytic pathways, tailored to the students' capabilities, to test the hypothesis of interest. We suspect that many studies in psychology meet this requirement, by affording choices regarding, for example, outlier detection, dichotomization of variables, covariate inclusion, and so on. However, the data need to be available at a level raw enough to allow the construction of different pathways. If one only has access to the processed data (e.g., after dichotomization), rather than to the raw data, certain reasonable alternative processing and analysis options cannot be explored.

A final issue to consider is analytical reproducibility (i.e., conducting the same analyses on the same dataset and obtaining the same results). Ideally, one selects a study of which the (most important) results are reproducible, or, at minimum, of which the reason for non-reproducibility is clear. This requirement restricts the pool of possible target studies even further, as analytic reproducibility within psychological research has been shown to be far from ideal. For example, Hardwicke et al. (2018) were able to independently reproduce the key results from only 11 out of 35 articles with reusable data published in the journal Cognition. More surprisingly, even with the help of the original authors, the key results of 13 articles could not be reproduced. Artner et al. (2021) describe similar struggles in their attempt to reproduce 232 key statistical claims from 46 articles, based on the raw data, without help from the original authors (see also Wicherts et al., 2011). Although reproducibility is not strictly necessary in order to conduct a multiverse analysis, it does provide some reassurance that the data were processed and interpreted in the way intended by the authors. For example, before conducting their multiverse analysis, Steegen et al. (2016) had to correct various minor reporting errors in the original data, which were discovered only by first attempting to reproduce the results (see their supplemental materials).

If the original results are not (entirely) reproducible, but the source of the inconsistencies is easily identifiable (e.g., use of dummy coding rather than effect coding, or correctable typos in the data file), one can still be reasonably confident in one's understanding of the data analysis, and the study might be a suitable target for the type of research project described here.
In fact, such cases can be especially interesting from an educational point of view, as they demonstrate the project's relevance, and illustrate that even accomplished researchers might struggle with data analysis at times. Yet, when there is no discernible explanation for non-reproducible results, undertaking a multiverse analysis is potentially fruitless, especially when the discrepancies are substantial, because one might have misinterpreted the data. Of course, it is also possible that the original authors made a mistake, but it can be time-consuming to figure this out, and the authors might not be able or willing to help clear up any discrepancies.

Finding a study meeting all these requirements can be quite challenging, for students and instructors alike. A useful starting point for this search process is the article library on curatescience.org, which provides the possibility of filtering articles based on the availability of data (LeBel et al., 2018). Furthermore, one could browse repositories like the Open Science Framework (Soderberg, 2018) for articles with open data. Consulting recent issues of journals using badges to signal articles with open data and open materials (https://www.cos.io/our-services/badges; Kidwell et al., 2016) is another excellent option. Of course, the instructor could provide a dataset of their own or one they are already familiar with. This could either be the primary or only option (see Example Application below), or serve as a back-up in case (some) students would not be able to find a suitable dataset themselves. Based on our experience, both of these approaches work well.

A solid (meta-)scientific background

It is important to build a solid meta-scientific framework, and provide students with sufficient background information about multiverse analyses at the beginning of the project (unless they are already familiar with these concepts from other courses). For example, one could cover some insightful meta-scientific articles such as Simmons et al. (2011), about researcher degrees of freedom and their effect on the false positive rate, Gelman and Loken (2014), which describes how data analysis can be conceived as a garden of forking paths, and Steegen et al. (2016), which introduces multiverse analyses. That way, students are gently introduced to the concept of a multiverse analysis and the rationale behind it. In addition, it serves to foster critical thinking and demonstrates the relevance of such (meta-)scientific studies, including their own.

Besides these more general meta-scientific articles, students could benefit from several (published) examples of a multiverse analysis (e.g., Credé & Phillips, 2017; Moors & Hesselmann, 2019), to give them an idea of what it concretely entails. This serves two purposes. One, it provides guidance on how to summarize and interpret the outcome of a multiverse analysis (e.g., plotting a distribution of p-values, or creating a heatmap with p-values as a function of the various analytic pathways). Two, it stimulates students in recognizing potentially arbitrary choices, thus giving them inspiration for their own multiverse.

Still, it can be quite challenging and overwhelming for students to generate alternative data-analytic pathways. A useful source, besides the papers mentioned above, is the work of Wicherts et al. (2016), which offers a comprehensive overview of researcher degrees of freedom.
Moreover, one could also encourage students to look for alternative pathways in related work. In particular, when the project involves re-analysis of a published study, students could critically assess the rationale behind the article's data-analytic choices, or examine papers cited in the target article as well as previous publications from the same authors on the same topic. To facilitate this, the instructor could organize a (group) discussion about the paper in question and point out some potentially relevant or remarkable choices. Students could (or should) also try to reproduce the original findings, if they haven't done so already as part of the process to select the target study (see above). That way, students familiarize themselves with the target study and its data, which might give them ideas for their eventual multiverse.

Throughout the project, strong guidance is needed. It is critical to inform students about the expectations regarding a multiverse analysis, and to tackle misconceptions. For one, the goal should not be to merely devise as many paths as possible. The key is that the alternatives are properly motivated: quality over quantity (Del Giudice & Gangestad, 2021). Furthermore, when multiple students use the same dataset, it is perfectly plausible to end up with different paths, and thus potentially with seemingly contradictory answers to the same research question. This does not mean that someone made a mistake; rather, it shows the ubiquity of arbitrary decisions. Clear communication about these issues is important to avoid any confusion among students. Providing feedback to students, particularly when it comes to the construction and implementation of the multiverse analysis, is also instrumental in making the project a success. Some students may come up with poorly motivated alternative pathways, in which case the supervisor should steer them in the right direction or encourage them to carefully (re)consider the rationale for their choices. Feedback could also take the form of a group discussion at a later stage of the project, to address the different pathways students came up with and compare their outcomes.

Though not strictly necessary, basic knowledge of R (i.e., a programming language primarily used for data analysis and visualization; R Core Team, 2016), or even R Markdown (i.e., an environment to create dynamic, reproducible reports; Allaire et al., 2016), can help students in running their analyses and reporting their results, yet there is quite a steep learning curve. Multiverse analyses involve combining different options (e.g., different outlier criteria for different dependent variables that are transformed in various ways). Especially when this amounts to many individual pathways, it will be more efficient to integrate them in a single script instead of performing each analysis separately, yet that does require some programming experience or training, as the sketch below illustrates.
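To give an impression of what such an integrated script could look like, here is a minimal sketch in base R. It uses simulated data and hypothetical variable names, so it illustrates the looping pattern rather than any actual course material.

```r
# Sketch: running every pathway of a small multiverse in a single loop,
# using simulated data (two groups, one outcome) as a stand-in.
set.seed(1)
dat <- data.frame(group = rep(c("control", "stress"), each = 50),
                  score = rnorm(100, mean = 10, sd = 2))

# Three outlier cutoffs (in SD units; Inf keeps all observations)
# crossed with two transformations yield six pathways.
pathways <- expand.grid(cutoff = c(Inf, 3, 2.5),
                        transformation = c("raw", "log"),
                        stringsAsFactors = FALSE)

pathways$p_value <- NA
for (i in seq_len(nrow(pathways))) {
  d <- dat
  # Apply this pathway's outlier rule...
  keep <- abs(as.vector(scale(d$score))) <= pathways$cutoff[i]
  d <- d[keep, ]
  # ...and this pathway's transformation, then run the focal test.
  y <- if (pathways$transformation[i] == "log") log(d$score) else d$score
  pathways$p_value[i] <- t.test(y ~ d$group)$p.value
}
pathways  # one row, and thus one p-value, per pathway
```

Dedicated tooling such as the specr or multiverse packages mentioned above can take over much of this bookkeeping, but a plain loop keeps every choice visible, which is arguably an advantage in a teaching context.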
Example Application

This section describes an actual implementation of the multiverse-in-the-classroom approach in the context of an undergraduate research project (see Table 1 for a summary of the syllabus). Besides illustrating the viability of the approach, we hope that it can inspire instructors, course coordinators, and program directors who would consider including multiverse analyses in their research-methods classes. Of course, there are many alternative ways to implement the multiverse-in-the-classroom approach, taking into account aspects such as timing, group size, students' prior knowledge, learning objectives, and so on.

Table 1
Summary of the Syllabus for the Undergraduate Research Project Involving a Multiverse Analysis

Week 1. General introduction. Primary learning objective: understand the topic of the thesis.
Week 2. Group discussion of target article (i.e., Smith et al., 2019). Primary learning objective: engage in critical thinking about the target article.
Week 2. Class on ethics, data sharing, and reproducibility. Primary learning objective: understand the importance of data sharing and reproducibility.
Week 3. Group discussion of Wicherts et al. (2011). Primary learning objective: understand the importance of data sharing and reproducibility.
Week 3. Group discussion of Hardwicke et al. (2018). Primary learning objective: understand the importance of data sharing and reproducibility.
Week 4. Group discussion of Simmons et al. (2011). Primary learning objective: recognize researchers' degrees of freedom and realize their impact.
Week 4. Group discussion of Steegen et al. (2016). Primary learning objectives: understand what a multiverse analysis entails, how to conduct one, and see how the results could be presented.
Week 5. R intro. Primary learning objective: perform data processing, visualization, and plotting in R.
Week 6. R Markdown intro. Primary learning objective: write reproducible and dynamic reports.
Weeks 7-17. Conduct multiverse analysis and write thesis (including four opportunities for individual feedback). Primary learning objective: write a thesis incorporating relevant feedback.

The project took place in the 2020 spring semester with the first author as the instructor, and was inspired by a course jointly taught by both authors in previous years. It was embedded in a course called Bachelorproject, which spans 17 weeks, and is organized for students in the final year of their undergraduate psychology program. These students have already taken several statistics and methods courses, typically amounting to 30 European Credits (EC).

The Bachelorproject represents a study load of 15 EC, during which students need to write an individual thesis describing the outcomes of a research project. The course is mandatory for all undergraduate psychology students, but they are divided into small groups, each with a different instructor and a different research topic (e.g., mental health in university students, people's interest in psychedelics, individual differences in the attentional bias towards emotion, ...). The multiverse-in-the-classroom approach described here was used in one such group, consisting of eight students. Students ultimately had to write a thesis about their project following the typical Introduction-Method-Results-Discussion structure. The resulting products were evaluated on the same criteria as other research projects within the course by two independent graders (including the instructor). In addition, the instructor also graded the process as a whole.

The project involved the re-analysis of an existing dataset, which was provided by the instructor. The selected target article was a study by Smith et al. (2019), examining the influence of acute stress on semantic memory retrieval.
Smith et al. found that participants performed better on an open-ended trivia questionnaire after experiencing acute stress, and when they showed a stronger stress response. The study met all of the above criteria: reusable processed data were available in a detailed enough format (the underlying raw data were, at the time, available upon request, and are now publicly available; see Smith, 2020); the results were reproducible (except for one easily identifiable deviation); and the data processing and analysis steps afforded various alternative pathways.

In a first meeting with the students of ±1 hour, the general topic of the thesis was introduced by the instructor. This included a short description of the target study as well as a brief introduction to the concept of a multiverse analysis. In the next meeting (±2 hours), the target article was examined in detail through a journal club, in which the instructor led the discussion. Students were expected to read the article in advance, and were encouraged to pay special attention to methodological and data-analytical choices. Furthermore, any aspects of the paper that were unclear to the students were addressed during the meeting. From this point onwards, students were encouraged to start thinking about alternative analysis pathways, inspired by the group discussion, through searching for literature around the same topic, etc.

The third meeting (±1.5 hours) consisted of an interactive lecture on data sharing (including ethical issues such as protecting the privacy of participants), reproducibility, and scientific integrity (including a discussion of questionable research practices). The idea was to introduce some concepts that are directly relevant for the thesis (e.g., reproducibility) as well as to give students a broad overview of meta-scientific topics.

The next four meetings (±2 hours each) involved journal clubs around articles on, respectively, data sharing and reproducibility (i.e., Hardwicke et al., 2018; Wicherts et al., 2011), researcher degrees of freedom (i.e., Simmons et al., 2011), and multiverse analysis (Steegen et al., 2016). Each time, two students led the discussion, but everyone was supposed to read the paper in advance and take part in the discussion. The instructor intervened sporadically if something was unclear or to point out relevant aspects. The purpose of these meetings was three-fold. First, they served to build a solid meta-scientific background, and to give students inspiration for their own multiverse analysis. Second, writing the introduction section for a thesis about multiverse analyses can be challenging, as it differs somewhat from that of a "regular" empirical study; hence, discussing a few key articles puts students on the right track. Finally, these journal clubs were also meant to improve students' presentation and discussion skills.

The four final collective meetings (±2 hours each) served to introduce the students to R and R Markdown. Students were guided through a custom-made script showing how to read in data, transform and combine datasets, use conditional statements and loops, make graphs, and perform all the analyses that were used in the target paper. The script already used the data from the target paper to make sure that students understood what the variables meant. Even though the script introduced all the procedures needed to reproduce the results of the target paper, they were illustrated using different variables.
As a take-home exercise, students then tried to independently reproduce the key outcomes of the target paper using R, which they later embedded in an R Markdown document. This guaranteed that all students were (eventually) able to follow the processing and analysis pathways outlined by Smith et al. (2019). Note that students were not required to write their thesis in R Markdown, or even to use R for their eventual analyses. In the end, all eight students conducted their multiverse analysis in R, and two of them wrote their final paper using R Markdown.

From that point onwards, each student had four individual feedback meetings with the instructor in which their research proposal (i.e., rationale for the different pathways), analysis plan, code, results, and write-up were discussed. Seventeen weeks after the start of the course, they were expected to submit their final thesis and accompanying analysis script.

An exhaustive overview of all the alternatives students came up with would take us too far, but the following examples serve to illustrate the versatility of a multiverse approach to (under)graduate research projects. For instance, Smith and colleagues considered responses to a trivia questionnaire as being correct if they completely matched the correct answer, were misspelled but easily extrapolated, were inappropriately pluralized or capitalized, were common synonyms of the correct answer, or if the first four or more letters matched the correct answer. However, students considered various reasonable alternatives to this coding scheme, such as treating incomplete responses as incorrect, regardless of how many letters matched the correct answer (Boere, 2020; De Jong, 2020; Hoogeterp, 2020; Kraaijenbrink, 2020; Kuipers, 2020; Van Dijk, 2020; Van Rijn, 2020; Van Wijk, 2020). Exploring this variation was only possible because students had access to the raw data (i.e., responses of each participant to each question), as the processed data only contained accuracy scores per participant based on the original coding scheme.
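To make this concrete, the following R function contrasts a lenient scheme, loosely in the spirit of the original one (accepting incomplete responses whose first four or more letters match the answer), with a strict scheme requiring an exact match. It is a hypothetical simplification of the coding rules described above, not the original or the students' code.

```r
# Hypothetical simplification of two response-coding schemes for
# open-ended trivia answers (not the original or the students' code).
score_response <- function(response, answer, scheme = c("lenient", "strict")) {
  scheme <- match.arg(scheme)
  response <- tolower(trimws(response))
  answer <- tolower(answer)
  if (scheme == "strict") {
    response == answer  # only exact matches count as correct
  } else {
    # Also accept incomplete responses that form a prefix of the
    # answer of at least four letters.
    response == answer ||
      (nchar(response) >= 4 && startsWith(answer, response))
  }
}

score_response("amste", "amsterdam", scheme = "lenient")  # TRUE
score_response("amste", "amsterdam", scheme = "strict")   # FALSE
```

Rescoring every response under each scheme, and rerunning the focal analysis on the resulting accuracy scores, adds one more dimension to the multiverse.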
Furthermore, some students redefined the construct reactivity to stress. In the original paper, it was operationalized as the change in cortisol levels relative to a baseline, whereas students also considered the change in the psychological stress response measured through the State-Trait Inventory for Cognitive and Somatic Anxiety, described in Grös et al., 2007 (Hoogeterp, 2020; Van Rijn, 2020). Additionally, some students added covariates to the analyses (e.g., age; Kraaijenbrink, 2020), or removed covariates (i.e., gender; Hoogeterp, 2020; Kuipers, 2020; Van Wijk, 2020). Yet other pathways involved imputing missing values (Kraaijenbrink, 2020), or removing observations (e.g., excluding participants who did not display an elevated cortisol level after stress-induction; Boere, 2020; Van Dijk, 2020).

Although there was some overlap in data-analytic choices between students, each individual project featured unique pathways, which were based on existing literature (e.g., Merz et al., 2016), statistical arguments, and/or a critical appraisal of the original study. The breadth of options is illustrated in Figure 1, showing the distribution of p-values for Smith and colleagues' main finding resulting from each student's multiverse analysis (see https://osf.io/rtayk/ for the underlying R code). On average, students' multiverse analyses comprised 78 paths (minimum 18, maximum 160).

This outcome highlights the feasibility and potential of undergraduate research projects incorporating multiverse analyses. We hasten to add that it does not serve as a way to evaluate the robustness of Smith and colleagues' main finding, because certain data-analytic choices explored by the students were insufficiently motivated. The work done by the students, however, offers an ideal starting point for a more thorough multiverse analysis of the finding (see Heyman et al., 2022).

Benefits of the Multiverse-in-the-Classroom Approach

Incorporating multiverse analyses in (under)graduate research projects (or other courses) has many benefits for students as well as for (psychological) science in general.

One strength of the multiverse-in-the-classroom approach is that it can be flexibly adapted to the course's learning objectives, classroom size, time frame, background of the students, and so on. For instance, one can conduct a multiverse analysis reusing an existing dataset, as in the example described above, or one could use newly gathered data. Because the latter option involves an additional step compared to a typical research project, it is well-suited for situations where something extra is required from students (e.g., students enrolled in an honours program), whereas the former option can be applied more broadly. Importantly, as there is no need to design a new study, or to collect any data, the students' overall time investment is comparable to that of a regular research project. Moreover, an adapted version of such a multiverse project can be used in a more statistics-oriented course rather than a research-methods-oriented course. Both authors have used a similar approach as part of a 13-week graduate statistics course within a psychology research master track for a number of years. There, the ±40 students were instructed to write a report about the multiverse analysis they conducted in small groups using existing data. Because these graduate students are well-versed in statistical analyses and programming, and due to the group nature of the project, it can easily fit in a 13-week course, as compared to the 17-week undergraduate research project described above.

As a multiverse project does not necessarily require collecting new data, one could effectively save a lot of resources (i.e., time of participants and students, money to pay participants, ...). Therefore, it is ideal for situations where collecting new data is impractical or impossible, for instance, because special equipment or expertise is required, getting ethical approval takes too much time, or when one does not have access to a participant pool or money to pay participants. This proved to be especially relevant in the lockdown situation due to COVID-19 in spring 2020. Indeed, the lockdown measures, which involved suspending all in-vivo data collection and required classes to be taught online, had very little impact on the project discussed above, with all students meeting the original deadline.

The flexibility also applies to the selection of a target study. Each student could focus on a separate paper, or, as was the case in the example above, each student could independently construct their own data-analytic pathways for the same dataset.
The latter option is comparable to the many-analysts-one-dataset approach used by Silberzahn et al. (2018), augmented with the additional requirement that every analyst (i.e., student) should consider several plausible alternatives rather than a single one. We believe the many-multiverses-one-dataset option is the more interesting of the two, because any given multiverse will rarely (if ever) exhaust all reasonable options; hence, it makes sense to adopt a form of data-analytic triangulation. In other words, there is a multiverse of multiverse analyses, which can be captured to some degree by asking different students to focus (semi-)independently on the same overarching topic. Although it is unrealistic to expect that every individual project will be of the same quality, it can be enlightening to see the variability, or lack thereof, in outcomes. Indeed, as Figure 1 shows, it is possible that some multiverse analyses suggest the effect in question to be quite robust, whereas others suggest the effect to be rather fragile.

Figure 1
Distribution of p-values for Smith and Colleagues' Main Finding Resulting from Each Student's Multiverse Analysis
[Eight histogram panels, one per student, each showing the frequency of the p-values obtained across that student's pathways: Student 1 (N = 160), Student 2 (N = 140), Student 3 (N = 48), Student 4 (N = 36), Student 5 (N = 78), Student 6 (N = 18), Student 7 (N = 110), Student 8 (N = 36).]
Note. The red dotted line indicates a p-value of .05. The number in brackets indicates the number of pathways in each student's multiverse. Remark that not all pathways were properly motivated, so these results should not be considered an evaluation of the robustness of Smith and colleagues' main finding.

Despite bridging an important gap in psychological science by showing the robustness or fragility of findings, multiverse analyses are relatively rare, owing perhaps to their apparent complexity and/or their perceived lack of novelty. In that sense, one can draw a parallel to replication studies: once rare in psychology (Makel et al., 2012), they are now becoming more mainstream through various initiatives (see Zwaan et al., 2018). Moreover, Frank and colleagues (Frank & Saxe, 2012; Hawkins et al., 2018) promoted conducting replication studies in student research projects (see also Grahe et al., 2012; Wagge et al., 2019). The current proposal seeks to accomplish a similar goal for multiverse analyses. Note that both approaches can complement each other, in that one can conduct a multiverse analysis on replication data, either as part of the same project or across different iterations of the course (e.g., one group conducting the replication study, and another group performing multiverse analyses, possibly the following semester or academic year).

Another major benefit of adopting the multiverse-in-the-classroom approach, besides its flexibility, is that it gives students the opportunity to make a tangible contribution to psychological science, something that might not always occur with (under)graduate research projects.
Moreover, under some conditions, the work done by students and instructor(s) can be solidified in a joint research paper, suitable for publication, as was the case for the example application. The classroom phase then serves as an elicitation step of possible reasonable variations, which, in a second step, are evaluated for adoption in a multiverse analysis by a domain expert. Such a two-step multiverse analysis, in which data-analytical pathways are first elicited from different sources and then synthesized and applied to the data, can even yield more comprehensive and less biased results compared to a regular multiverse analysis.

The multiverse-in-the-classroom approach also provides ample pedagogical opportunities. Conducting a multiverse analysis typically requires students to perform a number of different statistical analyses. It thereby addresses an often-heard complaint from psychology students regarding the relevance of statistics. Even though most research projects involve the practical application of statistics, it rarely is a focal point (in some cases, the analysis part might actually be considered a nuisance). Furthermore, a multiverse project may help students to better understand the intricacies of statistical analyses. Importantly, it is not a purely methodological or statistical project, as it also involves an empirical research question, such as "to what extent does power posing have an effect on hormone levels" or "what is the effect of stress on semantic memory". Hence, there is still the thrill of discovery, which helps fuel students' engagement.

At a more abstract level, a multiverse project also allows students to gain first-hand experience with the importance of open science, reproducibility, proper documentation of data, and so on. In addition, it teaches them to critically evaluate the rationale behind a study, especially its methodology, and it gives them an idea about the imperfections of psychological science. As future consumers of research, it is relevant for students to recognize that arbitrary decisions abound in research, and to realize their consequences. A multiverse-in-the-classroom project really drives this point home. Moreover, for those students aspiring to become producers of research, it is paramount to adopt responsible research practices, such as assessing the robustness of key outcomes. In fact, students spontaneously mentioned these aspects in an informal evaluation of the course (e.g., "it has really changed my perspective on research... and sparked my interest" or "it was interesting to see what happens to the p-values when conducting different analyses"). One could argue that typical research projects, in which students are required to develop a new hypothesis, design a new study, and collect data, teach them bad habits or even questionable research practices, as it is rather difficult to accomplish all this in a rigorous manner within the, usually limited, timeframe.

Challenges and Objections

A multiverse-in-the-classroom project can involve designing a new study, but that might not be feasible within the confines of a single semester, because developing and conducting such an analysis in itself is rather time-consuming. The option involving existing data is more readily applicable, yet one potential objection is that such a project does not cover the entire empirical cycle.
Although a multiverse project requires a thorough literature search, motivating a research question, and a comprehensive data analysis of which the results ought to be interpreted and discussed, students may miss out on learning specific skills (e.g., regarding data collection). When the development of such skills is a central objective of the course, one might need to look for a creative solution. For instance, in the example application described above, the absence of a data collection phase was addressed by having students recode the participants' responses to the trivia questionnaire. Note, though, that one can raise similar concerns about more widely applied projects such as those involving online data collection. In fact, there is often quite a bit of variability in what is demanded of students across projects within the same (under)graduate program. More fundamentally, accreditation guidelines for research projects in psychology often explicitly mention the possibility to conduct secondary data analyses (e.g., Australian Psychology Accreditation Council, 2019; The British Psychological Society, 2019).

Another challenge of conducting a multiverse analysis is that it requires combining various alternatives (e.g., three different outlier criteria and four different data transformations yield 12 outcomes). In principle, every analysis can be conducted separately, but this becomes unwieldy quite quickly, so one could use a script to increase efficiency. Depending on the students' background, the latter option might prove to be unattainable unless one includes some programming classes in the curriculum (e.g., teaching the language R).

Another potential hurdle for students (and instructors alike) revolves around the interpretation of a multiverse analysis. In contrast to a typical research project, one does not end up with a single outcome, but with a collection of outcomes. This elicits questions such as: when should a finding be considered robust, when is it presumably a fluke, and how should the results be summarized and presented? Indeed, published papers involving multiverse analyses typically eyeball the pattern of results, for instance, by plotting the distribution of p-values. Steegen et al. (2016) tentatively suggest focusing inference on the average p-value, but beyond that, there is little guidance as to how to synthesize a multiverse analysis (but see Simonsohn et al., 2020). The sketch below illustrates a few such simple summaries.
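For concreteness, here is a minimal base-R sketch, assuming the p-values of all pathways have been collected in a numeric vector; the values below are placeholders for illustration only.

```r
# Sketch: simple summaries of one multiverse's outcomes. In practice,
# p_values would hold one p-value per pathway; here we use placeholders.
set.seed(2)
p_values <- runif(78)  # placeholder values for illustration only

# Eyeballing the pattern: a histogram of the p-value distribution,
# with a reference line at the conventional .05 threshold.
hist(p_values, breaks = 20, xlab = "p-value",
     main = "Distribution of p-values across pathways")
abline(v = .05, col = "red", lty = 2)

mean(p_values)        # average p-value (cf. Steegen et al., 2016)
mean(p_values < .05)  # proportion of significant pathways
```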
A more fundamental objection could be that approaches such as pre-registration are more desirable, so that students should spend their time learning about pre-registration rather than about multiverse analyses. Pre-registration entails that one specifies the analysis plan before knowing its results, if possible even before starting the data collection (Nosek et al., 2018). As such, pre-registration makes transparent which choices could be data-driven and which are not. However, if a researcher pre-registers one or a few analytic pathways, one is still left in the dark about how robust or fragile the effect is, or about whether certain choices are more critical than others (for a similar argument, see Steegen et al., 2016). To that end, one would need to conduct a multiverse-style analysis. Of course, one could pre-register a multiverse analysis to combine the strengths of both approaches, but this increases the complexity of the project.

Finally, one should be cautious that students do not completely lose faith in (psychological) science. Indeed, whereas the goal is to make students critical consumers of scientific output, and, as a result, careful producers of scientific output, they should not come away with the idea that science is inherently flawed or that all researchers are opportunistic or fraudulent. Along the same lines, students should be made aware that not all hypotheses can necessarily be tested in a myriad of ways. Based on the informal evaluation mentioned above, students did not come away with such incorrect notions, but future research on the effectiveness of the multiverse-in-the-classroom approach should determine whether this is indeed the case.

Conclusion

The present paper proposes to implement multiverse analyses in student research projects, and provides a practical demonstration that we hope will encourage, help, and inspire instructors to adopt the approach in their own courses. Because multiverse analyses speak to the robustness of a (published) finding, such projects can fulfill an important need in psychological science, thus making their results truly relevant. Furthermore, it is an excellent way to put statistics into practice; it fosters critical thinking, and raises awareness about the prevalence and consequences of arbitrary data-analytic decisions. Finally, the flexibility of the multiverse-in-the-classroom approach makes it suitable for all kinds of projects, even when data collection is not feasible.

Author Contact

Correspondence concerning this article should be addressed to Tom Heyman, Methodology and Statistics Unit, Institute of Psychology, Leiden University, Wassenaarseweg 52, 2333 AK Leiden, The Netherlands. E-mail: t.d.p.heyman@fsw.leidenuniv.nl. ORCID: TH 0000-0003-0565-441X; WV 0000-0002-5855-3885.

Conflict of Interest and Funding

The authors declare that there were no conflicts of interest or specific funding with respect to the authorship or the publication of this article.

Author Contributions

Both authors conceptualized the idea. TH wrote the first draft of the manuscript and WV provided extensive feedback. Both authors approved the final version for submission.

Open Science Practices

This article earned the Open Materials badge for making the materials openly available. It has been verified that the analysis reproduced the results presented in the article. The entire editorial process, including the open reviews, is published in the online supplement.

References

Allaire, J., Cheng, J., Xie, Y., McPherson, J., Chang, W., Allen, J., Wickham, H., Atkins, A., & Hyndman, R. (2016). rmarkdown: Dynamic documents for R [R package version 1.6]. https://CRAN.R-project.org/package=rmarkdown

Artner, R., Verliefde, T., Steegen, S., Gomes, S., Traets, F., Tuerlinckx, F., & Vanpaemel, W. (2021). The reproducibility of statistical results in psychological research: An investigation using unpublished raw data. Psychological Methods, 26(5), 527–546. https://doi.org/10.1037/met0000365

Australian Psychology Accreditation Council. (2019). Accreditation standards for psychology programs: Evidence guide (Version 1.2). https://psychologycouncil.org.au/wp-content/uploads/2021/03/APAC-Evidence-guide_v1.2.pdf

Bishop, D. (2016). Open research practices: Unintended consequences and suggestions for averting them (commentary on the Peer Reviewers' Openness Initiative). Royal Society Open Science, 3(4), 160109. https://doi.org/10.1098/rsos.160109

Boere, R. (2020). Het belang van reproduceerbare en transparante wetenschap: Een multiverse benadering [Unpublished bachelor's thesis]. Leiden University.
Carney, D. R., Cuddy, A. J., & Yap, A. J. (2010). Power posing: Brief nonverbal displays affect neuroendocrine levels and risk tolerance. Psychological Science, 21(10), 1363–1368. https://doi.org/10.1177/0956797610383437

Credé, M., & Phillips, L. A. (2017). Revisiting the power pose effect: How robust are the results reported by Carney, Cuddy, and Yap (2010) to data analytic decisions? Social Psychological and Personality Science, 8(5), 493–499. https://doi.org/10.1177/1948550617714584

De Jong, S. (2020). Het effect van stress op het semantisch geheugen: Een multiverse benadering [Unpublished bachelor's thesis]. Leiden University.

Del Giudice, M., & Gangestad, S. W. (2021). A traveler's guide to the multiverse: Promises, pitfalls, and a framework for the evaluation of analytic decisions. Advances in Methods and Practices in Psychological Science, 4(1), 1–15. https://doi.org/10.1177/2515245920954925

Dragicevic, P., Jansen, Y., Sarma, A., Kay, M., & Chevalier, F. (2019). Increasing the transparency of research papers with explorable multiverse analyses. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–15.

Elson, M. (2016). Flexibility in methods & measures of social science. https://www.flexiblemeasures.com/

Frank, M. C., & Saxe, R. (2012). Teaching replication. Perspectives on Psychological Science, 7(6), 600–604. https://doi.org/10.1177/1745691612460686

Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102(6), 460–465.

Grahe, J. E., Reifman, A., Hermann, A. D., Walker, M., Oleson, K. C., Nario-Redmond, M., & Wiebe, R. P. (2012). Harnessing the undiscovered resource of student research projects. Perspectives on Psychological Science, 7(6), 605–607. https://doi.org/10.1177/1745691612459057

Grös, D. F., Antony, M. M., Simms, L. J., & McCabe, R. E. (2007). Psychometric properties of the State-Trait Inventory for Cognitive and Somatic Anxiety (STICSA): Comparison to the State-Trait Anxiety Inventory (STAI). Psychological Assessment, 19(4), 369–381. https://doi.org/10.1037/1040-3590.19.4.369

Harder, J. A. (2020). The multiverse of methods: Extending the multiverse analysis to address data-collection decisions. Perspectives on Psychological Science, 15(5), 1158–1177. https://doi.org/10.1177/1745691620917678

Hardwicke, T. E., Mathur, M. B., MacDonald, K., Nilsonne, G., Banks, G. C., Kidwell, M. C., Hofelich Mohr, A., Clayton, E., Yoon, E. J., Henry Tessler, M., Lenne, R. L., Altman, S., Long, B., & Frank, M. C. (2018). Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal Cognition. Royal Society Open Science, 5(8), 180448. https://doi.org/10.1098/rsos.180448
Hawkins, R. X., Smith, E. N., Au, C., Arias, J. M., Catapano, R., Hermann, E., Keil, M., Lampinen, A., Raposo, S., Reynolds, J., Salehi, S., Salloum, J., Tan, J., & Frank, M. C. (2018). Improving the replicability of psychological science through pedagogy. Advances in Methods and Practices in Psychological Science, 1(1), 7–18. https://doi.org/10.1177/2515245917740427

Heyman, T., Boere, R., de Jong, S., Hoogeterp, L., Kraaijenbrink, J., Kuipers, C., van Dijk, M., van Rijn, L., & van Wijk, T. (2022). The effect of stress on semantic memory retrieval: A multiverse analysis. Collabra: Psychology, 8(1), 35745. https://doi.org/10.1525/collabra.35745

Hoogeterp, L. (2020). Het effect van stress op het semantisch geheugen: Een multiverse benadering [Unpublished bachelor's thesis]. Leiden University.

Kalokerinos, E. K., Erbas, Y., Ceulemans, E., & Kuppens, P. (2019). Differentiate to regulate: Low negative emotion differentiation is associated with ineffective use but not selection of emotion-regulation strategies. Psychological Science, 30(6), 863–879. https://doi.org/10.1177/0956797619838763

Kidwell, M. C., Lazarević, L. B., Baranski, E., Hardwicke, T. E., Piechowski, S., Falkenberg, L.-S., Kennett, C., Slowik, A., Sonnleitner, C., Hess-Holden, C., Errington, T. M., Fiedler, S., & Nosek, B. A. (2016). Badges to acknowledge open practices: A simple, low-cost, effective method for increasing transparency. PLoS Biology, 14(5), e1002456. https://doi.org/10.1371/journal.pbio.1002456

Kierniesky, N. C. (2005). Undergraduate research in small psychology departments: Two decades later. Teaching of Psychology, 32(2), 84–90. https://doi.org/10.1207/s15328023top3202_1

Kraaijenbrink, J. (2020). The effect of stress on the semantic memory: A multiverse approach [Unpublished bachelor's thesis]. Leiden University.

Kuipers, C. (2020). The effect of stress on the semantic memory: A multiverse approach [Unpublished bachelor's thesis]. Leiden University.

LeBel, E. P., McCarthy, R. J., Earp, B. D., Elson, M., & Vanpaemel, W. (2018). A unified framework to quantify the credibility of scientific findings. Advances in Methods and Practices in Psychological Science, 1(3), 389–402. https://doi.org/10.1177/2515245918787489

Liu, Y., Kale, A., Althoff, T., & Heer, J. (2020). Boba: Authoring and visualizing multiverse analyses. IEEE Transactions on Visualization and Computer Graphics, 27(2), 1753–1763. https://doi.org/10.1109/TVCG.2020.3028985

Makel, M. C., Plucker, J. A., & Hegarty, B. (2012). Replications in psychology research: How often do they really occur? Perspectives on Psychological Science, 7(6), 537–542. https://doi.org/10.1177/1745691612460688
Masur, P., & Scharkow, M. (2019). specr: Statistical functions for conducting specification curve analyses. https://github.com/masurp/specr

Merz, C. J., Dietsch, F., & Schneider, M. (2016). The impact of psychosocial stress on conceptual knowledge retrieval. Neurobiology of Learning and Memory, 134, 392–399. https://doi.org/10.1016/j.nlm.2016.08.020

Moors, P., & Hesselmann, G. (2019). Unconscious arithmetic: Assessing the robustness of the results reported by Karpinski, Briggs, and Yale (2018). Consciousness and Cognition, 68, 97–106. https://doi.org/10.1016/j.concog.2019.01.003

Morey, R. D., Chambers, C. D., Etchells, P. J., Harris, C. R., Hoekstra, R., Lakens, D., Lewandowsky, S., Morey, C. C., Newman, D. P., Schönbrodt, F. D., Vanpaemel, W., Wagenmakers, E.-J., & Zwaan, R. A. (2016). The Peer Reviewers' Openness Initiative: Incentivizing open research practices through peer review. Royal Society Open Science, 3(1), 150547. https://doi.org/10.1098/rsos.150547

Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606. https://doi.org/10.1073/pnas.1708274114

Patel, C. J., Burford, B., & Ioannidis, J. P. (2015). Assessment of vibration of effects due to model specification can demonstrate the instability of observational associations. Journal of Clinical Epidemiology, 68(9), 1046–1058. https://doi.org/10.1016/j.jclinepi.2015.05.029

R Core Team. (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Sarma, A., & Kay, M. (2019). multiverse: Explorable multiverse data analysis and reports in R [R package version 0.1.4]. https://CRAN.R-project.org/package=multiverse

Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., Bahnik, S., Bai, F., Bannard, C., Bonnier, E., Carlsson, R., Cheung, F., Christensen, G., Clay, R., Craig, M. A., Dalla Rosa, A., Dam, L., Evans, M. H., Flores Cervantes, I., Nosek, B. A., et al. (2018). Many analysts, one data set: Making transparent how variations in analytic choices affect results. Advances in Methods and Practices in Psychological Science, 1(3), 337–356. https://doi.org/10.1177/2515245917747646
org / 10 . 1177/2515245917747646 Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibil- ity in data collection and analysis allows pre- senting anything as significant. Psychological Science, 22(11), 1359–1366. https://doi.org/ 10.1177/0956797611417632 Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2020). Specification curve analysis. Nature Human Be- haviour, 4, 1208–1214. https : / / doi . org / 10 . 1038/s41562-020-0912-z Smith, A. M. (2020). Acute stress enhances general- knowledge semantic memory. https://doi.org/ 10.17605/OSF.IO/EQ8SY Smith, A. M., Hughes, G. I., Davis, F. C., & Thomas, A. K. (2019). Acute stress enhances general- knowledge semantic memory. Hormones and be- havior, 109, 38–43. https://doi.org/10.1016/j. yhbeh.2019.02.003 Soderberg, C. K. (2018). Using OSF to share data: A step-by-step guide. Advances in Methods and Practices in Psychological Science, 1(1), 115–120. https : / / doi . org / 10 . 1177 / 2515245918757689 Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychologi- cal Science, 11(5), 702–712. https : / / doi . org / 10.1177/1745691616658637 The British Psychological Society. (2019). Standards for the accreditation of undergraduate, conversion and integrated Masters programmes in psychol- ogy. https : / / www . psychologicalsociety . ie / source/Undergraduate%5C%20Accreditation% 5C % 20Guidelines % 5C % 202019update _ file _ 674.pdf Van Dijk, M. (2020). Acute stress enhances semantic memory: The robustness of the findings of Smith, Hughes, Davis, and Thomas (2019). [Unpub- lished bachelor’s thesis]. Leiden University. Van Rijn, L. (2020). The effect of stress on semantic mem- ory: A multiverse approach. [Unpublished bach- elor’s thesis]. Leiden University. Van Wijk, T. (2020). Het effect van stress op semantisch geheugen: Een multiverse benadering. [Unpub- lished bachelor’s thesis]. Leiden University. Vanpaemel, W., Vermorgen, M., Deriemaecker, L., & Storms, G. (2015). Are we wasting a good cri- sis? The availability of psychological research data after the storm. Collabra, 1(1), 3. https : //doi.org/10.1525/collabra.13 Voracek, M., Kossmeier, M., & Tran, U. S. (2019). Which data to meta-analyze, and how? 
A specification-curve and multiverse-analysis ap- https://github.com/masurp/specr https://doi.org/10.1016/j.nlm.2016.08.020 https://doi.org/10.1016/j.nlm.2016.08.020 https://doi.org/10.1016/j.concog.2019.01.003 https://doi.org/10.1016/j.concog.2019.01.003 https://doi.org/10.1098/rsos.150547 https://doi.org/10.1098/rsos.150547 https://doi.org/10.1073/pnas.1708274114 https://doi.org/10.1073/pnas.1708274114 https://doi.org/10.1016/j.jclinepi.2015.05.029 https://doi.org/10.1016/j.jclinepi.2015.05.029 https://www.R-project.org/ https://www.R-project.org/ https://CRAN.R-project.org/package=multiverse https://CRAN.R-project.org/package=multiverse https://doi.org/10.1177/2515245917747646 https://doi.org/10.1177/2515245917747646 https://doi.org/10.1177/0956797611417632 https://doi.org/10.1177/0956797611417632 https://doi.org/10.1038/s41562-020-0912-z https://doi.org/10.1038/s41562-020-0912-z https://doi.org/10.17605/OSF.IO/EQ8SY https://doi.org/10.17605/OSF.IO/EQ8SY https://doi.org/10.1016/j.yhbeh.2019.02.003 https://doi.org/10.1016/j.yhbeh.2019.02.003 https://doi.org/10.1177/2515245918757689 https://doi.org/10.1177/2515245918757689 https://doi.org/10.1177/1745691616658637 https://doi.org/10.1177/1745691616658637 https://www.psychologicalsociety.ie/source/Undergraduate%5C%20Accreditation%5C%20Guidelines%5C%202019update_file_674.pdf https://www.psychologicalsociety.ie/source/Undergraduate%5C%20Accreditation%5C%20Guidelines%5C%202019update_file_674.pdf https://www.psychologicalsociety.ie/source/Undergraduate%5C%20Accreditation%5C%20Guidelines%5C%202019update_file_674.pdf https://www.psychologicalsociety.ie/source/Undergraduate%5C%20Accreditation%5C%20Guidelines%5C%202019update_file_674.pdf https://doi.org/10.1525/collabra.13 https://doi.org/10.1525/collabra.13 13 proach to meta-analysis. Zeitschrift für Psycholo- gie, 227(1), 64–82. https://doi.org/10.1027/ 2151-2604/a000357 Wagge, J. R., Baciu, C., Banas, K., Nadler, J. T., Schwarz, S., Weisberg, Y., IJzerman, H., Legate, N., & Grahe, J. (2019). A demonstration of the Col- laborative Replication and Education Project: Replication attempts of the red-romance effect. Collabra: Psychology, 5(1), 5. https://doi.org/ 10.1525/collabra.177 Wicherts, J. M., Bakker, M., & Molenaar, D. (2011). Will- ingness to share research data is related to the strength of the evidence and the quality of re- porting of statistical results. PloS ONE, 6(11), e26828. https : / / doi . org / 10 . 1371 / journal . pone.0026828 Wicherts, J. M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psycholo- gist, 61(7), 726–728. https://doi.org/10.1037/ 0003-066X.61.7.726 Wicherts, J. M., Veldkamp, C. L., Augusteijn, H. E., Bakker, M., Van Aert, R., & Van Assen, M. A. (2016). Degrees of freedom in planning, run- ning, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Fron- tiers in Psychology, 7, 1832. https : / / doi . org / 10.3389/fpsyg.2016.01832 Young, C., & Holsteen, K. (2017). Model uncertainty and robustness: A computational framework for multimodel analysis. Sociological Methods & Research, 46(1), 3–40. https : / / doi . org / 10 . 1177/0049124115610347 Zwaan, R. A., Etz, A., Lucas, R. E., & Donnellan, M. B. (2018). Making replication mainstream. Behav- ioral and Brain Sciences, 41, e120. https://doi. 
org/10.1017/S0140525X17001972 https://doi.org/10.1027/2151-2604/a000357 https://doi.org/10.1027/2151-2604/a000357 https://doi.org/10.1525/collabra.177 https://doi.org/10.1525/collabra.177 https://doi.org/10.1371/journal.pone.0026828 https://doi.org/10.1371/journal.pone.0026828 https://doi.org/10.1037/0003-066X.61.7.726 https://doi.org/10.1037/0003-066X.61.7.726 https://doi.org/10.3389/fpsyg.2016.01832 https://doi.org/10.3389/fpsyg.2016.01832 https://doi.org/10.1177/0049124115610347 https://doi.org/10.1177/0049124115610347 https://doi.org/10.1017/S0140525X17001972 https://doi.org/10.1017/S0140525X17001972