Meta-Psychology, 2023, vol 7, MP.2021.2932
https://doi.org/10.15626/MP.2021.2932
Article type: Replication Report
Published under the CC-BY4.0 license
Open data: Yes
Open materials: Yes
Open and reproducible analysis: Yes
Open reviews and editorial process: Yes
Preregistration: Yes
Edited by: Rickard Carlsson
Reviewed by: Joachim Hüffmeier, Lukas Röseler
Analysis reproduced by: Jens Fust
All supplementary files can be accessed at OSF: https://doi.org/10.17605/OSF.IO/HDXNY

We are all less risky and more skillful than our fellow drivers: Successful replication and extension of Svenson (1981)

Lina Koppel1, David Andersson1, Gustav Tinghög1,2, Daniel Västfjäll3,4, and Gilad Feldman5

1Department of Management and Engineering, Division of Economics, Linköping University
2Department of Health, Medicine and Caring Sciences, Division of Health Care Analysis, Linköping University
3Department of Behavioral Sciences and Learning, Division of Psychology, Linköping University
4Decision Research, Eugene, OR
5Department of Psychology, University of Hong Kong, Hong Kong SAR

The better-than-average effect refers to the tendency to rate oneself as better than the average person on desirable traits and skills. In a classic study, Svenson (1981) asked participants to rate their driving safety and skill compared to other participants in the experiment. Results showed that the majority of participants rated themselves as far above the median, despite the statistical impossibility of more than 50% of participants being above the median. We report a preregistered, well-powered (total N = 1,203), very close replication and extension of the Svenson (1981) study. Our results indicate that the majority of participants rated their driving skill and safety as above average. As an extension, we added two alternative response scales, and findings were stable across all three scales. Thus, our findings are consistent with the original findings by Svenson (1981).
Materials, data, and code are available at https://osf.io/fxpwb/.

Keywords: better-than-average effect, self-evaluation, self-enhancement, replication

When people are asked to rate themselves on desirable traits and skills, most rate themselves as above average. This is known as the better-than-average effect and has been demonstrated in a variety of domains. In one of the most well-known examples, Svenson (1981) asked participants to rate their safety and skill as drivers compared to other participants in the experiment. Results showed that the majority of participants rated themselves far above the median, despite the statistical impossibility of more than 50% of participants being above the median. Here, we conducted a preregistered, very close replication and extension of the Svenson study to examine the replicability of the original finding.

The better-than-average effect

The better-than-average effect has been documented across many domains and is generally considered a manifestation of self-evaluation bias. Drivers believe that they are better drivers than others (Svenson, 1981; inspired by Preston and Harris, 1965), college instructors believe they are better teachers (Cross, 1977), social psychologists believe they are better researchers (Van Lange et al., 1997), couples believe they have better marriages (Rusbult et al., 2000), and undergraduates believe they have better leadership skills, athletic prowess, and ability to get along with others (Brown, 1986). People even believe that they are less biased than others, an effect known as the bias blind spot (Pronin et al., 2002). A recent meta-analysis of 124 published articles found that the better-than-average effect was large and robust across studies (Zell et al., 2020).
Although it is closely related to several other biases, including unrealistic optimism (predicting that positive outcomes are more likely and negative outcomes are less likely to happen to oneself compared to others; Shepperd et al., 2013; Weinstein, 1980) and the Dunning-Kruger effect (overestimating the rank of one's performance relative to objective measures; Dunning, 2011), the better-than-average effect is unique in that it involves comparing the present self to an average other on a relatively enduring attribute or skill. Much research has been dedicated to identifying boundary conditions of and explanations for the effect (for reviews, see Alicke and Govorun, 2005; Chambers and Windschitl, 2004; Moore and Healy, 2008; Sedikides and Alicke, 2012, 2019; Sedikides and Gregg, 2008; Zell et al., 2020). Yet, to the best of our knowledge, no direct replications of the original finding by Svenson (1981) exist.

The importance of replicability has received increasing recognition in psychological science over the past few years (e.g., Asendorpf et al., 2013; Brandt et al., 2014; Camerer et al., 2018; Nosek and Errington, 2020; Nosek et al., 2021; Open Science Collaboration, 2015; Zwaan et al., 2017). Replication is considered a cornerstone of science, yet only recently have researchers begun to systematically investigate the replicability of published findings. Here, we revisit this classic phenomenon and examine the replicability of the original finding with an independent replication.

Choice of study for replication

We chose the Svenson (1981) study based on two factors: absence of direct replications and impact.
To the best of our knowledge, there are no published direct replications of this study thus far.1 The article has had a significant impact on scholarly research in several areas of psychology, including social psychology and judgment and decision making. At the time of writing, the article had 2,112 citations in Google Scholar.

Findings in the original article

In the original study by Svenson (1981), participants were asked to rate either their skill or their safety as drivers in relation to other participants in the experiment. Data were collected in lab experiments in Sweden (n = 80) and in the US (n = 81). The results indicated that the majority of participants regarded themselves as more skillful and less risky than the average driver in their respective group. Among the Swedish participants, 77% ranked their safety as above average and 69% ranked their skill as above average. Among the US participants, 88% ranked their safety as above average and 93% ranked their skill as above average.

Adjustments and extensions

We had to make several adjustments to the original design. First, rather than including two different samples for the two different questions, we ran both questions in a within-subjects design, which allowed us to compare the effects of the two questions and their associated dependent variables. Second, we had to adjust the questionnaire to match the target sample, online American Amazon Mechanical Turk (MTurk) workers. We first introduced a few verification questions to ensure that workers were drivers. Because our study was conducted online, we also had to adjust the reference group; we chose the participant's US state as the reference group. Third, we had difficulty reproducing the question used to elicit rankings and so had to make adjustments.
When doing so, we noticed issues with the 10 categories used for percentile ranks (e.g., the midpoint is grouped as 41–50%, and the first category spans 11 percentiles whereas the other categories span 10). We therefore added an extension and randomized the dependent-variable question across three designs: (1) our best estimate of what the target article used, (2) an adjusted 11-item scale with a midpoint indicated as 50% (average), and (3) a simple 7-item Likert scale asking participants to compare themselves to the average. We compared effects across the three designs. Because minor methodological features can influence the results of hypothesis tests (e.g., Baribault et al., 2018; Landy et al., 2020), the use of three different response scales also helps to check the robustness of the effect.

Method

We report all measures, conditions, data exclusions, and how we determined sample size.

Participants

A total of 1,203 American MTurk participants completed the study using TurkPrime.com (mean age = 40.40 years, SD = 12.21; 641 females). A comparison of the target article sample and the replication sample is provided in Table 1. An a priori power analysis in G*Power 3.1 (exact test, two-tailed, with 95% power) indicated that 90 participants were needed to detect the smallest effect size from the original paper, Cohen's g = 0.19 (see Supplementary Materials). However, a sample size of n = 90 is smaller than the sample size in the original study (n = 161) and is based on an effect size estimate that might be larger than the true effect size. Therefore, we followed the suggestion of Simonsohn (2015) and aimed for 2.5 times the original sample size. Data collection was combined with data collection for a different study (see Chen et al., 2021, Experiment 2) that required a much larger sample size (the two studies were displayed in randomized order).
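To make the power analysis above concrete, the sketch below converts the original percentages into Cohen's g (the distance between an observed proportion and the null value of .50) and searches upward for the first sample size at which a two-tailed exact binomial test reaches 95% power. It is a minimal sketch under stated assumptions, not the authors' G*Power setup: it assumes α = .05 with α/2 in each tail, and the function names are hypothetical. Because exact-test power is not monotonic in n and G*Power may construct the critical region differently, the result should land near, but need not equal, the reported 90.

```python
import math

# Percentages rating themselves "above average" in Svenson (1981), as reported in the text.
ORIGINAL = {
    ("Sweden", "safety"): 0.77,
    ("Sweden", "skill"): 0.69,
    ("US", "safety"): 0.88,
    ("US", "skill"): 0.93,
}


def cohens_g(p, p0=0.5):
    """Cohen's g for one proportion: its distance from the null proportion."""
    return p - p0


def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space for numerical stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def exact_binomial_power(n, p1, p0=0.5, alpha=0.05):
    """Power of a two-tailed exact binomial test, assuming alpha/2 in each tail."""
    pmf0 = [binom_pmf(k, n, p0) for k in range(n + 1)]
    # Upper critical value: smallest k with P0(X >= k) <= alpha/2.
    upper, tail = n + 1, 0.0
    for k in range(n, -1, -1):
        tail += pmf0[k]
        if tail > alpha / 2:
            break
        upper = k
    # Lower critical value: largest k with P0(X <= k) <= alpha/2.
    lower, tail = -1, 0.0
    for k in range(n + 1):
        tail += pmf0[k]
        if tail > alpha / 2:
            break
        lower = k
    # Power: probability of landing in either rejection region when the true proportion is p1.
    pmf1 = [binom_pmf(k, n, p1) for k in range(n + 1)]
    return sum(pmf1[:lower + 1]) + sum(pmf1[upper:])


def smallest_n(p1, target=0.95, p0=0.5, alpha=0.05, n_max=1000):
    """First n, searching upward, at which exact-test power reaches the target."""
    for n in range(5, n_max + 1):
        if exact_binomial_power(n, p1, p0, alpha) >= target:
            return n
    raise ValueError("target power not reached within n_max")


g_values = {key: round(cohens_g(p), 2) for key, p in ORIGINAL.items()}
smallest_g = min(g_values.values())  # Swedish skill ratings: .69 - .50
n_required = smallest_n(0.5 + smallest_g)
print(g_values)
print(f"smallest g = {smallest_g}, required n = {n_required}")
```

The smallest g comes from the Swedish skill ratings (69% above average), which is why it drives the required sample size.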
Participants first consented to participate in the study and were then asked verification questions regarding having a driver's license, the year and location in which the license was obtained, and their state of residence. Participants (n = 84) who indicated that they did not have a driver's license were filtered out.

1 At the time of writing, a Google Scholar search for the term "replication" within works citing Svenson (1981) yielded 259 results, but of these, we found none that would count as a very close replication in the framework of LeBel et al. (2017) while also having high statistical power. There are some studies that essentially replicate the original study (Groeger and Brown, 1989; Svenson et al., 1985); however, their sample sizes are relatively small, and there are methodological differences, especially in the response scale format. In sum, although a literature search is not a perfect method, we take its results as indicating a strong likelihood that no close replication has been conducted.

Table 1
Differences and similarities between samples in original study and replication
Characteristic | Svenson (1981) | Replication
Sample size | 161 | 1,203
Geographic origin | US & Sweden | US
Gender | Unknown | 562 males, 641 females
Median age (years) | 22 (US), 33 (Sweden) | 37
Average age (years) | Unknown | 40.40
Age range (years) | Unknown | 18–87
Medium (location) | Lab (Sweden & US) | Computer (online)
Compensation | Unknown | Nominal payment
Year | Before 1981 | 2019

Procedure

Participants indicated how safe and how skilled they were as drivers (both questions were included and displayed in random order). They then answered a funneling section and provided demographic information (age, gender, country of birth, family social class, English understanding of the study) before being debriefed.

Measures

There were two dependent variables: driving safety and driving skill. The question about safety was phrased as follows:

We would like to know what you think about how safely you drive an automobile. All drivers are not equally safe drivers.
We want you to compare your own skill to the skills of other people in your state. By definition, there is a least safe and a most safe driver. We want you to indicate your own estimated position among the people in your state. Of course, this is a difficult question because you do not know all the people in your state, much less how safely they drive. But please make the most accurate estimate you can.

The question about skill was phrased as follows:

We would like to know what you think about how skilled you are at driving an automobile. All drivers are not equally skilled drivers. We want you to compare your own skill to the skills of other people in your state. By definition, there is a least skilled and a most skilled driver. We want you to indicate your own estimated position among the people in your state. Of course, this is a difficult question because you do not know all the people in your state, much less how skilled drivers they are. But please make the most accurate estimate you can.

For each question, participants indicated their driving safety/skill compared to the average driver in their state using one of the following three response scales:

1. Reproduced materials: "Please indicate how [safely you drive/skilled you are] compared to others by marking your estimated position among drivers in your state" in 10 categories from 0–10% (least safe/skilled drivers) [...] 41–50%, 51–60%, [...] 91–100% (most safe/skilled drivers).
2. 11-item scale to include midpoint (changes underlined): Same question as above but with the following scale: 0–9% (least safe/skilled drivers) [...] 40–49%, 50% (average), 51–60%, [...] 91–100% (most safe/skilled drivers).
3. Standard comparison 7-item Likert scale: "Please indicate how [safely you drive/skilled you are] compared to others by marking your estimated position compared to other drivers in your state" (1 = far below average; 4 = average; 7 = far above average).
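Before the binomial analysis described under Data analysis, the two percentile scales above have to be collapsed into an "above the median" indicator. The counts in Table 4 appear consistent with coding categories 6–10 of the reproduced scale (51–60% and higher) and categories 7–11 of the adjusted scale as above the median, with the 50% (average) category grouped with the lower categories; the sketch below assumes that coding. It also assumes a two-tailed exact binomial test against .50 and a Clopper–Pearson 95% interval. The report does not state which interval method was used, so the bounds may differ slightly from Table 4, and all function names here are hypothetical.

```python
import math

# Assumed coding: first category code counted as "above the median" on each scale.
# Reproduced scale: 1 = 0-10%, ..., 5 = 41-50%, 6 = 51-60%, ..., 10 = 91-100%.
# Adjusted scale:   1 = 0-9%, ..., 6 = 50% (average), 7 = 51-60%, ..., 11 = 91-100%.
FIRST_ABOVE = {"reproduced": 6, "adjusted": 7}


def count_above(responses, scale):
    """Split category codes into (above the median, at or below the median) counts."""
    cutoff = FIRST_ABOVE[scale]
    above = sum(1 for r in responses if r >= cutoff)
    return above, len(responses) - above


def binom_pmf(i, n, p):
    """Binomial probability mass in log space, with the p = 0 and p = 1 edge cases handled."""
    if p <= 0.0:
        return 1.0 if i == 0 else 0.0
    if p >= 1.0:
        return 1.0 if i == n else 0.0
    return math.exp(math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))


def cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))


def sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed directly to avoid 1 - cdf cancellation."""
    return sum(binom_pmf(i, n, p) for i in range(max(k, 0), n + 1))


def binomial_test_half(k, n):
    """Two-tailed exact p-value against p0 = .50; the null is symmetric, so the smaller tail is doubled."""
    return min(1.0, 2.0 * min(cdf(k, n, 0.5), sf(k, n, 0.5)))


def clopper_pearson(k, n, alpha=0.05, iters=100):
    """Exact (Clopper-Pearson) confidence interval for k successes out of n, found by bisection."""
    def solve(f, target):
        # f must be increasing in p on (0, 1); bisect for f(p) = target.
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha/2. Upper bound: P(X <= k | p) = alpha/2,
    # i.e. P(X >= k + 1 | p) = 1 - alpha/2.
    lower = 0.0 if k == 0 else solve(lambda p: sf(k, n, p), alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: sf(k + 1, n, p), 1 - alpha / 2)
    return lower, upper


# Example: above-median count and N for safety ratings on the reproduced scale (Table 4).
above, n = 386, 413
low, high = clopper_pearson(above, n)
print(f"prop = {above / n:.3f}, 95% CI [{low:.3f}, {high:.3f}], p = {binomial_test_half(above, n):.1e}")
```

Doubling the smaller tail is only equivalent to other two-sided conventions because the null proportion is exactly .50; for an asymmetric null, a likelihood-ordering rule would be the safer default.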
Evaluation criteria for replication

Table 2 provides a classification of the replication using criteria by LeBel et al. (2017). We summarize the replication as a "very close replication". We compare the replication effects with the original effects in the target article using criteria from LeBel et al. (2019).

Table 2
Classification of the replication, based on LeBel et al. (2017)
Design facet | Replication
IV operationalization | Same
DV operationalization | Same
IV stimuli | Same
DV stimuli | Same
Procedural details | Different
Physical settings | Different
Contextual variables | Different
Replication classification | Very close replication

Data analysis

The original article did not include any statistical tests, and its scale and design make such a test difficult to conduct. Our best approximation of an analysis is therefore to compute the proportion of participants who chose a category above 50% and compare it to an expected 50% (binomial test). For the Likert scale, we conducted a one-sample t-test against the scale midpoint of 4. We examined normality in the distribution of frequencies, including skewness and kurtosis. Analysis code can be found in the supplementary materials.

Results

Replication

Descriptive statistics of all measures are presented in Table 3. Statistical tests of the hypotheses are summarized in Tables 4–5 and plotted in Figures 1–3.

The medians of the distributions of safety judgments in Table 3 fall in the 71–80% interval for both percentile-category response scales. This indicates that half of the participants believed themselves to be among the safest 30 percent of drivers. Over 90% of participants (93% for the reproduced materials and 91% for the adjusted materials with a 50% midpoint) believed themselves to be safer than the median driver. Binomial tests against a test proportion of 0.50 (two-tailed) indicated that this effect was statistically significant, ps < .001 (see Table 4).
In comparison, the original study found that the medians of the distributions of safety judgments fell in the 81–90% interval for the US group and the 71–80% interval for the Swedish group, indicating that half of the participants believed themselves to be among the safest 20 (US) or 30 (Sweden) percent of the drivers in their respective groups. In the original study, 88% of the US group and 77% of the Swedish group believed themselves to be safer than the median driver.

The medians of the distributions of skill judgments in Table 3 fall in the 71–80% interval (for both the reproduced and the adjusted materials). This indicates that half of the participants believed themselves to be among the most skilled 30 percent of drivers. In total, 91% (reproduced materials) and 78% (adjusted materials) believed themselves to be more skilled than the median driver. Binomial tests against a test proportion of 0.50 (two-tailed) indicated that this effect was statistically significant, ps < .001 (see Table 4). In comparison, the original study found that the medians of the distributions of skill judgments fell in the 61–70% interval for the US group and the 51–60% interval for the Swedish group; 93% of the US sample and 69% of the Swedish sample believed themselves to be more skilled than the median driver.

When participants rated themselves on a 7-item Likert response scale, they also rated themselves as significantly safer than average, M = 5.50 (SD = 1.08), t(386) = 27.28, p < .001, g = 1.39, 95% CI [1.25, 1.53], and more skilled than average, M = 5.28 (SD = 1.08), t(378) = 23.13, p < .001, g = 1.19, 95% CI [1.06, 1.32] (see Table 5).

Extensions

Figure 4 shows the effect size (Hedges's g) and 95% confidence interval for each rating scale. For skill ratings, the CIs overlap in all cases, suggesting no evidence for a difference in the size of the better-than-average effect depending on the type of rating scale used.
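The Likert-scale effect sizes compared here (and listed in Table 5) can be approximated from the rounded summary statistics in the Results: Cohen's d against the midpoint, d = (M − 4)/SD, multiplied by the small-sample correction J = 1 − 3/(4df − 1) gives Hedges's g. The sketch below assumes that definition and a large-sample normal approximation for the 95% CI of g; the authors' code may use a different interval method, and because M and SD are rounded to two decimals, values can differ from Table 5 in the second decimal. The function name is hypothetical.

```python
import math


def one_sample_summary(mean, sd, n, mu=4.0, z=1.96):
    """One-sample t statistic and Hedges's g (with approximate 95% CI) from summary statistics."""
    df = n - 1
    t = (mean - mu) / (sd / math.sqrt(n))
    d = (mean - mu) / sd                        # Cohen's d against the scale midpoint
    j = 1 - 3 / (4 * df - 1)                    # small-sample correction factor
    g = j * d
    se_g = math.sqrt(1 / n + g ** 2 / (2 * n))  # large-sample approximation (assumption)
    return {"t": t, "df": df, "g": g, "ci": (g - z * se_g, g + z * se_g)}


# Rounded Likert-scale summaries reported in the Results.
safety = one_sample_summary(5.50, 1.08, 387)
skill = one_sample_summary(5.28, 1.08, 379)
for label, res in (("safety", safety), ("skill", skill)):
    lo, hi = res["ci"]
    print(f"{label}: t({res['df']}) = {res['t']:.2f}, g = {res['g']:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With these inputs the safety g rounds to 1.39 and the skill g comes out near 1.18, within rounding of the reported 1.19.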
For safety ratings, the CIs for the two scales that involve percentile categories overlap, but the CI for the Likert scale is slightly lower, suggesting a slightly smaller better-than-average effect when safety is rated on a Likert scale. Nevertheless, the effect is very large in all cases. For effect sizes, confidence intervals, and important study characteristics of the replication, the original study, and the meta-analysis by Zell et al. (2020), see Supplementary Table S7.

Figure 5 shows the mean safety and skills ratings in each state (excluding states with fewer than 5 responses). We find no obvious pattern in the effect across states. However, some states had very few observations and the CIs are generally very wide, which complicates interpretation. We therefore chose not to analyze these data further.

Table 3
Proportion of participants in each category

Panel A: Percentile categories used in original study
Item | N | 0–10 | 11–20 | 21–30 | 31–40 | 41–50 | 51–60 | 61–70 | 71–80 | 81–90 | 91–100
Safety | 413 | 0.2% | 0.0% | 0.5% | 1.9% | 3.9% | 7.5% | 15.3% | 24.2% | 28.3% | 18.2%
Skill | 405 | 0.2% | 0.0% | 0.5% | 2.2% | 5.7% | 14.6% | 16.0% | 26.2% | 21.7% | 12.8%

Panel B: 10-percentile categories with 50% midpoint
Item | N | 0–9 | 10–19 | 20–29 | 30–39 | 40–49 | 50 | 51–60 | 61–70 | 71–80 | 81–90 | 91–100
Safety | 403 | 0.0% | 0.2% | 0.5% | 1.2% | 1.5% | 6.0% | 8.2% | 13.2% | 26.6% | 29.5% | 13.2%
Skill | 419 | 0.0% | 0.2% | 1.2% | 2.9% | 2.9% | 14.6% | 10.0% | 16.2% | 25.5% | 18.4% | 8.1%

Panel C: Likert response scale
Item | N | 1 | 2 | 3 | 4 | 5 | 6 | 7
Safety | 387 | 0.5% | 2.6% | 1.8% | 16.3% | 25.3% | 38.5% | 17.3%
Skill | 379 | 0.0% | 1.3% | 3.2% | 20.6% | 25.9% | 39.3% | 9.8%

Table 4
Summary of statistical tests for the items with percentile categories
Item | Category | N | Observed prop. [95% CI] | Test prop. | p | Interpretation
Percentile categories used in original study
Safety | >50 / ≤50 | 386 / 27 | 0.93 [0.91, 0.96] / 0.07 | 0.50 | <.001 | Signal – consistent
Skill | >50 / ≤50 | 370 / 35 | 0.91 [0.88, 0.94] / 0.09 | 0.50 | <.001 | Signal – consistent
10-percentile categories with 50% midpoint
Safety | >50 / ≤50 | 365 / 38 | 0.91 [0.87, 0.93] / 0.09 | 0.50 | <.001 | Signal – consistent
Skill | >50 / ≤50 | 328 / 91 | 0.78 [0.74, 0.82] / 0.22 | 0.50 | <.001 | Signal – consistent
Note. Binomial tests comparing the percentage of participants who rated their driving safety and skill as above average to an expected 50%.

Exploratory analyses (not pre-registered)

A series of exploratory OLS regressions was run to investigate whether participants' gender, age, and driving experience (i.e., years since the driver's license was obtained) predicted their ratings of driving safety and skill. The regressions also included item order (i.e., whether participants rated safety or skills first) and study order (i.e., whether participants completed this study or the study reported in Chen et al., 2021 first). The analyses revealed that age and driving experience were associated with both safety and skills ratings, such that ratings increased with age and experience (see Supplementary Tables S1–S6). In addition, there was a significant association between gender and safety ratings on the Likert scale, and between gender and skills ratings on the Likert scale and the adjusted materials, with women rating themselves lower. However, there was no such association for the other scales; thus, the results involving gender seem to depend on the response scale format and item content. Including item order and study order in the regression analyses did not alter the interpretation of the effects of gender, age, and driving experience. Item order and study order also had no consistent effect on participants' ratings, although completing the Svenson (1981) replication first was associated with higher safety ratings on one of the scales (the reproduced materials), and rating safety before skills was associated with lower safety ratings on another (the Likert scale; see Supplementary Tables S1–S6).
Note, however, that the regression results address whether gender, age, and driving experience are associated with participants' ratings of driving safety and skill; they do not address whether gender, age, and driving experience affect whether participants rate themselves above average. Because the vast majority of participants rated themselves as above average, we did not conduct such an analysis.

Finally, we investigated the correlation between skills and safety ratings within each of the three response scales. Participants' skills ratings were positively correlated with their safety ratings on all three scales (original scale: tau = .52, p < .001, n = 122; adjusted scale with 50% midpoint: tau = .48, p < .001, n = 136; Likert scale: tau = .47, p < .001, n = 121).

Table 5
Summary of statistical tests for the Likert scale
Item | t | df | p | Mean diff [95% CI] | Hedges's g [95% CI] | Interpretation
Safety | 27.38 | 386 | <.001 | 1.50 [1.40, 1.61] | 1.39 [1.25, 1.53] | Signal – consistent
Skill | 23.13 | 378 | <.001 | 1.28 [1.17, 1.39] | 1.19 [1.06, 1.32] | Signal – consistent
Note. One-sample t-test, test value: 4.

Figure 1
Proportion of participants in each percentile category of safety ratings and skills ratings, using the same percentile categories as the original article.
[Figure: panels "Safety ratings" (mean = 8.1; test value = 5.5) and "Skills ratings" (mean = 7.7; test value = 5.5); x-axis: Rating (percentile category), from 1 (0–10) to 10 (91–100); y-axis: Percentage.]

Figure 2
Proportion of participants in each percentile category of safety ratings and skills ratings, using the adjusted percentile categories.
[Figure: panels "Safety ratings" (mean = 8.9; test value = 6.0) and "Skills ratings" (mean = 8.2; test value = 6.0); x-axis: Rating (percentile category), from 1 (0–9) through 6 (50) to 11 (91–100); y-axis: Percentage.]

Figure 3
Proportion of participants in each category of safety ratings and skills ratings, using the Likert scale.
[Figure: panels "Safety ratings" (mean = 5.5; test value = 4.0) and "Skills ratings" (mean = 5.3; test value = 4.0); x-axis: Rating (1–7); y-axis: Percentage.]

Figure 4
Effect sizes (Hedges's g) and 95% CIs for each rating scale.

Discussion

We conducted a preregistered replication and extension of a classic phenomenon in the judgment and decision-making literature known as the better-than-average effect. The original article found that the majority of participants reported that they were safer and more skilled than the average driver (Svenson, 1981). The findings from our replication are consistent with the original findings: the majority of participants rated their driving safety and skill as above the median. Results were stable across three different response scales: our best estimate of the original materials, an adjusted scale with a 50% midpoint, and a 7-item Likert scale.

Our replication adds to a larger literature investigating the replicability of published research in psychological science (e.g., Camerer et al., 2018; Open Science Collaboration, 2015). Importantly, our study design closely follows the original study by Svenson (1981) and thereby qualifies as a very close replication according to the criteria of LeBel et al. (2017). Recently, Ziano et al. (2020) conducted a replication of another classic study on the better-than-average effect (Alicke, 1985), which had indicated that college students' ratings of how characteristic a trait was of them (vs. an average student) increased with the desirability of the trait, and that this effect was stronger for more controllable traits. Findings from Ziano et al. (2020) were consistent with the original findings. In sum, the findings from the present study are in line with the view of the better-than-average effect as a robust phenomenon.

[Figure: six panels of mean ratings by state: "Safety rating using original scale", "Skill rating using original scale", "Safety rating using adjusted scale with midpoint", "Skill rating using adjusted scale with midpoint", "Safety rating using Likert scale", "Skill rating using Likert scale".]

Figure 5
Mean safety and skills ratings in each state (excluding states with fewer than 5 observations). Error bars represent 95% CIs.

Author Contact
Lina Koppel, ORCID: 0000-0002-6302-0047. Gustav Tinghög, ORCID: 0000-0002-8159-1249. Daniel Västfjäll, ORCID: 0000-0003-2873-4500. Correspondence: Gilad Feldman, Department of Psychology, University of Hong Kong, Hong Kong SAR; gfeldman@hku.hk; ORCID: 0000-0003-2812-6599

Conflict of Interest and Funding
The author(s) declared no potential conflicts of interest with respect to the authorship and/or publication of this article. The author(s) received no financial support for the research and/or authorship of this article.

Author Contributions
Role LK DA GT DV GF
Conceptualization X X X X X
Pre-registration X X X X X
Data curation X
Formal analysis X X X
Funding acquisition X
Investigation X X X X X
Pre-registration peer review/verification X X X X X
Data analysis peer review/verification X X X X X
Methodology X X X X X
Project administration X X
Resources X
Software X X X X X
Supervision X
Validation X X X X X
Visualization X X X
Writing – original draft X
Writing – review and editing X X X X X

Target article
Svenson, O. (1981). Are we all less risky and more skillful than our fellow drivers? Acta Psychologica, 47, 143–148. https://doi.org/10.1016/0001-6918(81)90005-6

Links to project files
Project page on OSF with datasets and code: https://osf.io/fxpwb/. Pre-registration (including materials and analysis code): https://osf.io/jky24.

Acknowledgments
We thank members of the JEDI Lab for valuable contributions to this study during a workshop at Linköping University in August 2019.

Open Science Practices
This article earned the Preregistration+, Open Data, and Open Materials badges for preregistering the hypothesis and analysis before data collection and for making the data and materials openly available. It has been verified that the analysis reproduced the results presented in the article.
The entire editorial process, including the open reviews, is published in the online supplement.

References

Alicke, M. D. (1985). Global self-evaluation as determined by the desirability and controllability of trait adjectives. Journal of Personality and Social Psychology, 49(6), 1621–1630. https://doi.org/10.1037/0022-3514.49.6.1621
Alicke, M. D., & Govorun, O. (2005). The better-than-average effect. In M. D. Alicke, D. A. Dunning, & L. E. Krueger (Eds.), The self in social judgment (pp. 85–106). Psychology Press.
Asendorpf, J. B., Conner, M., De Fruyt, F., De Houwer, J., Denissen, J. J., Fiedler, K., Fiedler, S., Funder, D. C., Kliegl, R., Nosek, B. A., Perugini, M., Roberts, B. W., Schmitt, M., Van Aken, M. A., Weber, H., & Wicherts, J. M. (2013). Recommendations for increasing replicability in psychology. European Journal of Personality, 27(2), 108–119. https://doi.org/10.1002/per.1919
Baribault, B., Donkin, C., Little, D. R., Trueblood, J. S., Oravecz, Z., Van Ravenzwaaij, D., White, C. N., De Boeck, P., & Vandekerckhove, J. (2018). Metastudies for robust tests of theory. Proceedings of the National Academy of Sciences of the United States of America, 115(11), 2607–2612. https://doi.org/10.1073/pnas.1708285114
Brandt, M. J., IJzerman, H., Dijksterhuis, A., Farach, F. J., Geller, J., Giner-Sorolla, R., Grange, J. A., Perugini, M., Spies, J. R., & van 't Veer, A. (2014). The replication recipe: What makes for a convincing replication? Journal of Experimental Social Psychology, 50(1), 217–224. https://doi.org/10.1016/j.jesp.2013.10.005
Brown, J. D. (1986). Evaluations of self and others: Self-enhancement biases in social judgments. Social Cognition, 4(4), 353–376. https://doi.org/10.1521/soco.1986.4.4.353
Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T. H., Huber, J., Johannesson, M., Kirchler, M., Nave, G., Nosek, B. A., Pfeiffer, T., Altmejd, A., Buttrick, N., Chan, T., Chen, Y., Forsell, E., Gampa, A., Heikensten, E., Hummer, L., Imai, T., . . . Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637–644. https://doi.org/10.1038/s41562-018-0399-z
Chambers, J. R., & Windschitl, P. D. (2004). Biases in social comparative judgments: The role of nonmotivated factors in above-average and comparative-optimism effects. Psychological Bulletin, 130(5), 813–838. https://doi.org/10.1037/0033-2909.130.5.813
Chen, J., Hui, L. S., Yu, T., Feldman, G., Zeng, S., Ching, T. L., Ng, C. H., Wu, K. W., Yuen, C. M., Lau, T. K., Cheng, B. L., & Ng, K. W. (2021). Foregone opportunities and choosing not to act: Replications of inaction inertia effect. Social Psychological and Personality Science, 12(3), 333–345. https://doi.org/10.1177/1948550619900570
Cross, K. P. (1977). Not can, but will college teaching be improved? New Directions for Higher Education, 17, 1–15. https://doi.org/10.1002/he.36919771703
Dunning, D. (2011). The Dunning-Kruger effect: On being ignorant of one's own ignorance. In J. M. Olson & M. P. Zanna (Eds.), Advances in experimental social psychology (pp. 247–296). Academic Press. https://doi.org/10.1016/B978-0-12-385522-0.00005-6
Groeger, J., & Brown, I. (1989). Assessing one's own and others' driving ability: Influences of sex, age, and experience. Accident Analysis & Prevention, 21(2), 155–168. https://doi.org/10.1016/0001-4575(89)90083-3
Landy, J., Jia, M., Ding, I., Viganola, D., Tierney, W., Dreber, A., Johannesson, M., Pfeiffer, T., Ebersole, C., Gronau, Q., Ly, A., van den Bergh, D., Marsman, M., Derks, K., Wagenmakers, E.-J., Proctor, A., Bartels, D. M., Bauman, C. W., Brady, W. J., . . . Uhlmann, E. L. (2020). Crowd-sourcing hypothesis tests: Making transparent how design choices shape research results. SSRN Electronic Journal, 146(5), 451–479. https://doi.org/10.2139/ssrn.3654406
LeBel, E., Berger, D., Campbell, L., & Loving, T. (2017). Falsifiability is not optional. Journal of Personality and Social Psychology, 113(2), 254–261. https://doi.org/10.1037/pspi0000106
LeBel, E., Vanpaemel, W., Cheung, I., & Campbell, L. (2019). A brief guide to evaluate replications. Meta-Psychology, 3. https://doi.org/10.15626/mp.2018.843
Moore, D. A., & Healy, P. J. (2008). The trouble with overconfidence. Psychological Review, 115(2), 502–517. https://doi.org/10.1037/0033-295X.115.2.502
Nosek, B. A., & Errington, T. M. (2020). What is replication? PLoS Biology, 18(3), 1–8. https://doi.org/10.1371/journal.pbio.3000691
Nosek, B. A., Hardwicke, T. E., Corker, K. S., & Rohrer, J. (2021). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology. https://doi.org/10.31234/osf.io/ksfvq
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716
Preston, C. E., & Harris, S. (1965). Psychology of drivers in traffic accidents. Journal of Applied Psychology, 49(4), 284–288. https://doi.org/10.1037/h0022453
Pronin, E., Lin, D. Y., & Ross, L. (2002). The bias blind spot: Perceptions of bias in self versus others. Personality and Social Psychology Bulletin, 28(3), 369–381. https://doi.org/10.1177/0146167202286008
Rusbult, C. E., Van Lange, P. A., Wildschut, T., Yovetich, N. A., & Verette, J. (2000).
Perceived superior- ity in close relationships: Why it exists and per- sists. Journal of Personality and Social Psychol- ogy, 79(4), 521–545. https://doi.org/10.1037/ 0022-3514.79.4.521 Sedikides, C., & Alicke, M. D. (2012). Self-enhancement and self-protection motives. In R. M. Ryan (Ed.), Oxford handbook of motivation (pp. 303– 322). Oxford University Press. https://doi.org/ 10.7873/date.2014.002 Sedikides, C., & Alicke, M. D. (2019). The five pillars of self-enhancement and self-protection. In M. Ryan (Ed.), The oxford handbook of human mo- tivation (2nd ed., pp. 307–319). Oxford Univer- sity Press. Sedikides, C., & Gregg, A. P. (2008). Self-enhancement: Food for thought. Perspectives on Psychological Science, 3(2), 102–116. https : / / doi . org / 10 . 1111/j.1745-6916.2008.00068.x https://doi.org/10.1521/soco.1986.4.4.353 https://doi.org/10.1521/soco.1986.4.4.353 https://doi.org/10.1038/s41562-018-0399-z https://doi.org/10.1038/s41562-018-0399-z https://doi.org/10.1037/0033-2909.130.5.813 https://doi.org/10.1037/0033-2909.130.5.813 https://doi.org/10.1177/1948550619900570 https://doi.org/10.1177/1948550619900570 https://doi.org/10.1002/he.36919771703 https://doi.org/10.1002/he.36919771703 https://doi.org/10.1016/B978-0-12-385522-0.00005-6 https://doi.org/10.1016/B978-0-12-385522-0.00005-6 https://doi.org/10.1016/0001-4575(89)90083-3 https://doi.org/10.1016/0001-4575(89)90083-3 https://doi.org/10.2139/ssrn.3654406 https://doi.org/10.1037/pspi0000106 https://doi.org/10.15626/mp.2018.843 https://doi.org/10.15626/mp.2018.843 https://doi.org/10.1037/0033-295X.115.2.502 https://doi.org/10.1037/0033-295X.115.2.502 https://doi.org/10.1371/journal.pbio.3000691 https://doi.org/10.1371/journal.pbio.3000691 https://doi.org/10.31234/osf.io/ksfvq https://doi.org/10.31234/osf.io/ksfvq https://doi.org/10.1126/science.aac4716 https://doi.org/10.1126/science.aac4716 https://doi.org/10.1037/h0022453 https://doi.org/10.1037/h0022453 https://doi.org/10.1177/0146167202286008 
https://doi.org/10.1177/0146167202286008 https://doi.org/10.1037/0022-3514.79.4.521 https://doi.org/10.1037/0022-3514.79.4.521 https://doi.org/10.7873/date.2014.002 https://doi.org/10.7873/date.2014.002 https://doi.org/10.1111/j.1745-6916.2008.00068.x https://doi.org/10.1111/j.1745-6916.2008.00068.x 11 Shepperd, J. A., Klein, W. M., Waters, E. A., & Wein- stein, N. D. (2013). Taking stock of unrealistic optimism. Perspectives on Psychological Science, 8(4), 395–411. https : / / doi . org / 10 . 1177 / 1745691613485247 Simonsohn, U. (2015). Small telescopes: Detectability and the evaluation of replication results. Psy- chological Science, 26(5), 559–569. https://doi. org/10.1177/0956797614567341 Svenson, O. (1981). Are we all less risky and more skill- ful than our fellow drivers? Acta Psychologica, 47, 143–148. https://doi.org/10.1016/0001- 6918(81)90005-6 Svenson, O., Fischhoff, B., & Macgregor, D. (1985). Per- ceived driving safety and seatbelt usage. Acci- dent; analysis and prevention, 17(2), 119–113. https : / / doi . org / 10 . 1016 / 0001 - 4575(85 ) 90015-6 Van Lange, P. A., Taris, T. W., & Vonk, R. (1997). Dilem- mas of academic practice: Perceptions of su- periority among social psychologists. European Journal of Social Psychology, 27(6), 675–685. https : / / doi . org / 10 . 1002 / (SICI ) 1099 - 0992(199711 / 12 ) 27 : 6{\textless } 675 :: AID - EJSP838{\textgreater}3.0.CO;2-F Weinstein, N. D. (1980). Unrealistic optimism about fu- ture life events. Journal of Personality and Social Psychology, 39(5), 806–820. https : / / doi . org / 10.1037/0022-3514.39.5.806 Zell, E., Strickhouser, J. E., Sedikides, C., & Alicke, M. D. (2020). The better-than-average effect in com- parative self-evaluation: A comprehensive re- view and meta-analysis. Psychological Bulletin, 146(2), 118–149. https : / / doi . org / 10 . 1037 / bul0000218 Ziano, I., Mok, P. Y., & Feldman, G. (2020). 
Repli- cation and extension of alicke (1985) better- than-average effect for desirable and control- lable traits. Social Psychological and Personal- ity Science. https : / / doi . org / 10 . 1177 / 1948550620948973 Zwaan, R. A., Etz, A., Lucas, R. E., & Donnellan, M. B. (2017). Making replication mainstream. Behav- ioral and Brain Sciences, 41, 1–50. https://doi. org/10.1017/S0140525X17001972 https://doi.org/10.1177/1745691613485247 https://doi.org/10.1177/1745691613485247 https://doi.org/10.1177/0956797614567341 https://doi.org/10.1177/0956797614567341 https://doi.org/10.1016/0001-6918(81)90005-6 https://doi.org/10.1016/0001-6918(81)90005-6 https://doi.org/10.1016/0001-4575(85)90015-6 https://doi.org/10.1016/0001-4575(85)90015-6 https://doi.org/10.1002/(SICI)1099-0992(199711/12)27:6{\textless}675::AID-EJSP838{\textgreater}3.0.CO;2-F https://doi.org/10.1002/(SICI)1099-0992(199711/12)27:6{\textless}675::AID-EJSP838{\textgreater}3.0.CO;2-F https://doi.org/10.1002/(SICI)1099-0992(199711/12)27:6{\textless}675::AID-EJSP838{\textgreater}3.0.CO;2-F https://doi.org/10.1037/0022-3514.39.5.806 https://doi.org/10.1037/0022-3514.39.5.806 https://doi.org/10.1037/bul0000218 https://doi.org/10.1037/bul0000218 https://doi.org/10.1177/1948550620948973 https://doi.org/10.1177/1948550620948973 https://doi.org/10.1017/S0140525X17001972 https://doi.org/10.1017/S0140525X17001972 The better-than-average effect Choice of study for replication Findings in the original article Adjustments and extensions Method Participants Procedure Measures Evaluation criteria for replication Data analysis Results Replication Extensions Exploratory analyses (not pre-registered) Discussion Author Contact Conflict of Interest and Funding Author Contributions Target article Links to project files Acknowledgments Open Science Practices