Meta-Psychology, 2019, vol 3, MP.2018.880, https://doi.org/10.15626/MP.2018.880
Article type: File drawer report
Published under the CC-BY4.0 license
Open data: Yes
Open materials: Yes
Open and reproducible analysis: Yes
Open reviews and editorial process: Yes
Preregistration: No
Edited by: Rickard Carlsson
Reviewed by: Lee Jussim, Åse Innes-Ker, Ulrich Schimmack
Analysis reproduced by: Tobias Mühlmeister
All supplementary files can be accessed at the OSF project page: https://doi.org/10.17605/OSF.IO/Z6SDM

In Search of Experimental Evidence for Secondary Antisemitism: A File Drawer Report

Roland Imhoff
Johannes Gutenberg University Mainz, Germany; Social Cognition Center Cologne, Germany

Mario Messer
Social Cognition Center Cologne, Germany

In 1955, Adorno attributed antisemitic sentiments voiced by Germans to a paradox projection: The only latently experienced feelings of guilt were warded off by antisemitic defense mechanisms. Similar predictions of increases in antisemitic prejudice in response to increased Holocaust salience follow from other theoretical apparatuses (e.g., social identity theory as well as just-world theory). Based on the – to the best of our knowledge – only experimental evidence for such an effect (published in Psychological Science in 2009), the present research reports a series of studies originally conducted to better understand the contribution of the different assumed mechanisms. In light of a failure to replicate the basic effect, however, the studies shifted to an effort to demonstrate the basic process. We report all studies our lab has conducted on the issue. Overall, the data did not provide any evidence for the original effect. In addition to the obvious possibility of an original false positive, we speculate what might be responsible for this conceptual replication failure.

Keywords: file drawer; secondary antisemitism; victim blaming; guilt defense; replication

Back in 2007, we conducted an experimental study to test the widespread notion that ongoing reminders of Jewish suffering due to Nazi crimes will evoke some kind of prejudicial reaction in Germans, a defensive “secondary” antisemitism. The (in hindsight severely underpowered) study “worked” perfectly: Reminding German participants of ongoing Jewish suffering led to an increase in antisemitism (compared to baseline), but only if they felt that untruthful (but socially desirable) responding was futile as we would detect such lies. All built-in validity checks almost made perfect sense. We had never seen such a pretty data pattern before (and never thereafter) and were very happy when others agreed and the paper got accepted for publication in Psychological Science (Imhoff & Banse, 2009). Fueled by this success, we applied for and received a grant to explore this fascinating effect in more detail. The original plan to infer the underlying theoretical process by identifying moderators and mediators failed, however, as we could not even replicate the basic effect. The following is the tale of a long series of (mostly conceptual) non-replications. We will summarize the theoretical background of our original study, explain the goals we had with an expansion of the line of research, and describe a total of eight studies intended to replicate and expand the original findings (Studies 1 and 2) or empirically address the failure to replicate the basic finding (Studies 3a to 5).

The reported research and preparation of this paper was supported by a Deutsche Forschungsgemeinschaft (DFG) grant (IM147/1-1) awarded to Roland Imhoff. We thank Claudia Beck, Maren-Julia Boden, Lena Drees, Laura Melzer, Nanette Münnich, and Ben Sturm for help with data collection and Amanda Seyle Jones for help in editing the manuscript. Correspondence should be addressed to Roland Imhoff via roland.imhoff@uni-mainz.de.

The notion of secondary antisemitism is a highly popular concept across several disciplines. Although there are nuances in how exactly it was conceptualized, most definitions encapsulate the idea of an antisemitism not despite but because of the Holocaust. Shortly after World War II (WWII) and the Nazis’ efforts to literally annihilate Jews all over Europe, Peter Schönbach (1961) observed remarkable levels of antisemitism in German youths. This seemed to be puzzling as the now widespread awareness of the antisemitic atrocities committed only a few years earlier should have served as a potent warning sign against all forms of antisemitism. He thus proposed that the adolescents knew about their parents’ complicity (guilt by either action or omission) in the actions of the Nazi regime and had to somehow cope with this knowledge or – psychologically speaking – the experienced dissonance of loving their parents but associating them with such horrific actions. To do so, they – according to Schönbach – were more or less forced to rewarm the Nazi regime’s antisemitic propaganda to generate justifications for their parents’ demeanor. Adorno (1955) made similar observations in his interpretation of group discussions organized by the Frankfurt Institute for Social Research and his explanation was also similar: The participating adults, so he argued, had feelings of latent guilt for what happened during the Holocaust and had to – psycho-dynamically speaking – project this guilt onto the victims (Jews) to alleviate these feelings. Although this version of antisemitism as a defense mechanism is the most common interpretation of Adorno’s reasoning (as also reflected in synonyms like “Schuldabwehrantisemitismus”, a defense-against-guilt-antisemitism; Bergmann, 2006), Adorno’s writings also point to another explanation (that he never explicates as an alternative mechanism): an identity management account. Over the years, these identity concerns moved to the core of current understanding of secondary antisemitism as an antisemitism born out of the outrage that Jews’ insistence on remembering what happened spoils the positive identity of being German. This has been most famously coined in a quip (ascribed to Israeli psychoanalyst Zvi Rex): “The Germans will never forgive the Jews for Auschwitz” (Buruma, 2003).

Of course, this mechanism is not only a well-established figure in the political arena but makes a lot of sense against the background of a plethora of psychological theories. Blaming innocent victims is a central aspect of just-world theory (Lerner, 1980), whereby construing victims as negative and undeserving helps to uphold the illusion that the world is a just place (Correia & Vala, 2003; Friedman & Austin, 1978). Likewise, from a social identity perspective, derogating outgroup victims is functional to attenuate threats to the moral value of the ingroup (Branscombe, Schmitt, & Schiffhauer, 2007; Castano & Giner-Sorolla, 2006). System Justification Theory (Jost, Banaji, & Nosek, 2004) integrates many of these tenets to postulate that rationalizing the status quo (e.g., the ongoing suffering of Holocaust victims by finding fault in their character) may help reduce guilt, dissonance, and discomfort (Jost & Hunyady, 2002).

Despite these many theoretical lines allowing the same prediction, the very core idea of secondary antisemitism had never been experimentally tested. Existing work on the issue was predominantly non-psychological and based on secondary antisemitism as a rhetoric rather than a process. These studies invited respondents to indicate their agreement with statements that encapsulated what researchers understood as secondary antisemitism. Prominent examples are items like “Jews should stop complaining about what happened to them in Nazi Germany” (Selznick & Steinberg, 1969), “The Jews exploit remembrance of the Holocaust for their own benefit” (Heitmeyer, 2006), or “I am tired of continuously hearing about German crimes against Jews” (Bergmann, 2006). Although such utterances may well reflect what has been conceptualized as secondary antisemitism, agreement with them is not indicative of the underlying process. It is, for instance, conceivable that a respondent just dislikes Jews in general, without any specific emphasis on the Holocaust. This respondent will certainly agree with these statements as they communicate the negativity he or she sees in Jews, but this agreement will not be the result of the need to alleviate guilt or defend one’s ingroup’s moral value. In fact, the very same argument could be made regarding the original participants in the studies by the Frankfurt Institute for Social Research. Maybe they were antisemitic during WWII and continued to be antisemitic thereafter without any indirect mediation via latent guilt or the need to justify their parents. The fact that subscales tapping into agreement with traditional forms of antisemitism (e.g., “Jews have too much power and influence in this world”; Weil, 1985) and secondary antisemitism correlate up to r = .84 at a latent level (Imhoff, 2010) adds further fuel to this fire.

We thus aimed to provide experimental evidence for secondary antisemitism as a process rather than a rhetoric.
As a way to induce feelings of (collective) guilt or uneasiness about German atrocities, we aimed to make Holocaust victims’ ongoing suffering salient with the expectation that the salience should increase antisemitism as a form of victim derogation (to alleviate guilt or see the world as just or one’s group as moral). Something about this prediction, however, did not feel right. Clearly, telling people how much a certain group suffers should somehow increase the threshold to devalue the group, as suffering is expected to evoke sympathy (Heider, 1958) rather than derogation. We aimed to resolve this by reaching into the bag of tricks of social psychologists: Maybe people did have this sentiment but did not express it because social norms prevented them from doing so. So all we needed was a way to block the influence of such norms. If people had a feeling that we could know what they actually felt, then socially desirable (but dishonest) responding was futile since we would not only find out about their prejudice anyway, but would also see that they are liars (a double norm violation). This sums up the logic of bogus pipeline procedures, which allegedly detect dishonest responding and thus lead participants to respond truthfully to avoid the double norm violation described above.

So, this was how we proceeded: We asked as many of our undergraduate psychology students as we could find (a whopping 70 participants) to indicate their agreement with 29 statements of antisemitism as part of a larger paper-and-pencil test (your infamous “mass testing”).
Three months later, they were invited to participate in individual testing sessions and 63 of them agreed and showed up for an experiment involving two independent variables manipulated between subjects: Was the suffering of Holocaust victims described as having ongoing negative consequences for them and their descendants (Ongoing Suffering: yes/no)? Were participants hooked up to (slightly outdated) EEG machinery and a hand palm electrode with the information that this would help us detect untruthful responding (Bogus Pipeline: yes/no)? Afterwards, participants wrote down all the thoughts they had while reading the text, then completed a measure of implicit antisemitism, the same antisemitism scale as three months earlier, and a manipulation check item to make sure that they had indeed read the initial text (“Please briefly recall the introductory text. Did it mention ongoing consequences for the victims?”).

When we finally looked at the results, they were beautiful – everything looked exactly as it “should”. We had an unexpectedly large number of failed manipulation checks (15 people), but the pattern made perfect sense (in hindsight): Almost all of these wrong responses came from the ongoing suffering conditions (13 people). Thus, instead of derogating the victims to alleviate guilt, they just refused to even take note of the ongoing suffering. The remaining 48 participants, however, showed exactly the pattern we expected (Figure 2, left panel). Without mention of ongoing suffering, the level of antisemitism stayed more or less the same (operationalized as standardized residuals of predicting Time 2 antisemitism from Time 1 antisemitism; r = .89). Mentioning ongoing suffering, however, decreased the expression of antisemitic prejudice in the control condition but led to an increase when attached to a bogus pipeline.
The results were even significant despite the small sample, but clearly, the strategy of controlling for baseline antisemitism made our measure very sensitive. There were more details in the data that added to the picture of a perfect study: The correlation between implicit and explicit antisemitism was independently moderated by the bogus pipeline condition and a Time 1 measurement of the motivation to control prejudiced reactions (Banse & Gawronski, 2003), further validating the experimental procedure and the data in general.

Presenting this study at conferences in the following months was rewarded with a lot of positive feedback that boosted our confidence to reach high with this one: We submitted to Psychological Science and received the happy news roughly 11 weeks later: “In both its subject matter as in its empirical approach, your paper is (in my humble opinion) a prototypical Psychological Science paper: It reports on a phenomenon that many people think or have heard about but does so in a way that makes this phenomenon more worthwhile, more important, and much more consequential than lay psychology would have predicted.” Sure, the reviewers still had critical comments; none, however, referred to sample size. We resubmitted the manuscript within 10 days and it was accepted shortly thereafter.

In light of the positive feedback we got, it seemed only logical to follow up on this line of research. The many theoretical lines that converged in predicting the effect we found were a plus in making a convincing argument. On the flipside, however, this also meant that we had not one but several candidates for the underlying psychological process responsible for this mechanism. Our project sought to tackle this. Specifically, we expected three distinct, not necessarily mutually exclusive, processes to be potentially involved (Figure 1).

Building on originally psycho-dynamic reasoning, we reasoned that the mediating mechanism rested on the process that (latent) feelings of guilt were fought off by derogating the victims and/or interpreting their suffering as deserved. The implication would be that this mechanism should be restricted to victims of one’s own group (as feeling guilty for atrocities committed by another seemed unlikely), should be moderated by propensity to feel guilty, mediated via feelings of guilt, and should be reduced if this guilt was alleviated in any other way.

The second alternative was built on the notion of social identity and individuals’ motivation to see their own group as moral (Branscombe, Ellemers, Spears, & Doosje, 1999) and defend its positive identity (Branscombe, Schmitt, & Schiffhauer, 2007). Here also the effect should be restricted to victims of the ingroup (as there exists no motivation to see outgroups as moral) and should be particularly prominent among people who identify (defensively) with their ingroup. The mediating mechanism would be the perceived threat to the ingroup’s moral image and any alternative means to repair this image might reduce the effect.

The final distinct possibility was that victim derogation here was a means to restore one’s illusion of the world as a just place (e.g., Correia & Vala, 2003; Friedman & Austin, 1978; Godfrey & Lowe, 1975; Lerner & Simmons, 1966; Miller, 1977; Simmons & Piliavin, 1972). The strong need to see the world as a place where everyone gets what they deserve and deserves what they get (Lerner, 1980) should prompt the desire to generate reasons why Jewish suffering was actually deserved, likely leading to victim blaming. Importantly, this mechanism is not exclusive to one’s own victim but should be a general process independent of who brought about the suffering.
People with a greater need to see the world as just should be more prone to show the effect and re-establishing a sense of the world as just by alternative means should reduce the effect.

Figure 1. Potential pathways from perception of ongoing victim suffering to increased prejudice.

The present research.

We planned a research program that sought to replicate the basic finding of secondary antisemitism and address the plausibility of each of the three theoretical possibilities outlined above by three strategies. First, all three accounts propose different moderators for the effect: guilt proneness, defensive national identification, just-world beliefs. Second, the boundary conditions of the effect should also be informative. Whereas the first two accounts would predict the effect to be limited to victims of the ingroup, the last would make a general prediction for any (innocent) victim. Third, all three theories allow predictions about the specific kind of alternative means that could alleviate the discomforting feelings of guilt, ingroup threat, or just-world threat. Washing one’s hands, we reasoned, should alleviate guilt; re-affirming the morality of one’s nation should alleviate concerns about one’s group’s morality; and providing examples of fair and just procedures should re-establish a sense of justice in this world. As an additional possibility, we planned to explore indirect effects via measured mediators (e.g., latent guilt). Below we describe the first two studies from that line of research, which could not even establish the basic effect let alone a moderation. In light of this, we refrained from conducting additional studies with experimental moderators (e.g., washing hands). Instead, all other reported studies describe efforts to find evidence for the basic process of an increase in antisemitic prejudice by making the history of the Holocaust salient (not necessarily ongoing victim suffering).
We employed more subtle measures of prejudice (Studies 3a-3c), less egalitarian samples (Studies 4a and 4b), or more modest forms of negativity, like reduced empathy (Study 5). None of these succeeded in providing such evidence.

Study 1

In the first study, we aimed to replicate Imhoff and Banse’s (2009) study and to test the role of latent guilt as a potential mediating process. We utilized an adaptation of the Implicit Positive and Negative Affect Test (IPANAT; Quirin, Kazén, & Kuhl, 2009), which served as an indirect measure of guilt. We examined whether a) ongoing Jewish suffering increases implicit guilt, whether b) implicit guilt is positively correlated with antisemitism under bogus-pipeline conditions, and whether c) implicit guilt mediates the effect of ongoing Jewish suffering on antisemitism. To maximize our chances of finding subtle effects, we took an earlier baseline measurement of our central dependent variable.

Method

Participants. An a priori power analysis suggested a required sample of N = 120 to find an interaction of a size of f = .30 (effect size was f = 0.36 in Imhoff & Banse, 2009) with 90% power. As we expected substantial dropout, we sought to oversample at t1. Specifically, we circulated an invitation to participate in a study consisting of two parts (45-minute online study, 15-minute lab experiment) via an e-mail to individuals who had signed up as interested in study participation. To enhance participation at both measurement times, we offered 12 EUR that would be given in cash after completion of the second, lab-based experiment. Despite this incentive and three invitation e-mails, only 109 individuals (34 men, 74 women, 1 missing; mean age: 27.05, SD = 6.70) participated in the online study.
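
As a rough illustration of the a priori power analysis described above, the following Python sketch approximates the power of a one-degree-of-freedom F test, such as the 2 × 2 interaction, from Cohen's f and the total sample size. It relies on a large-sample normal approximation with noncentrality λ = f²N. This approximation is an assumption made here for simplicity and is not necessarily the procedure the authors used; the function name `approx_power_1df` is hypothetical. Exact noncentral-F calculations give slightly lower values in small samples.

```python
from statistics import NormalDist


def approx_power_1df(f: float, n_total: int, alpha: float = 0.05) -> float:
    """Normal approximation to the power of a 1-df F test.

    Assumption: the test statistic is treated as a squared z-score with
    noncentrality lambda = f**2 * n_total (large-sample approximation).
    """
    z = NormalDist()
    root_lambda = f * n_total ** 0.5       # sqrt(lambda)
    z_crit = z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    # Probability of exceeding the critical value in either tail
    return z.cdf(root_lambda - z_crit) + z.cdf(-root_lambda - z_crit)


# Planned design from the text: f = .30, N = 120
power_planned = approx_power_1df(0.30, 120)
print(f"Approximate power at N = 120: {power_planned:.2f}")
```

Under this approximation, the planned N = 120 yields a power of about .91, close to the targeted 90%. The same function can be evaluated at any achieved sample size to see how much power is lost through dropout.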

Upon completion (roughly 3 months after the first invitation to the online study), participants were contacted individually to make appointments for the lab study. A total of 83 participants (29 men, 54 women; mean age: 27.71, SD = 7.22; drop-out: 23.9%) were successfully recruited to show up for the lab study. This equipped us with 77% power to detect the estimated effect of f = .30. The data of one additional participant in the lab study had to be excluded because he or she provided a participant code not included in the dataset of the pretest.

Online testing.

The purpose of the online test was twofold. First, we needed a baseline measure of antisemitism to control for at t2. This would reduce the noise due to stable individual differences and thus isolate the proportion of the variance that was not due to such individual differences and was therefore in principle susceptible to experimental manipulation. Second, we included a long list of moderators predicted by the different theoretical models outlined above. The overarching goal was to identify systematic patterns across a series of studies to bolster the robustness of one specific theoretical approach. Specifically, we included measures of guilt proneness, national identification, and just-world beliefs. Some additional measures were added on a purely exploratory basis.

Antisemitism. Explicit antisemitism was assessed using Imhoff’s (2010) scale for the measurement of primary and secondary antisemitism on seven-point scales ranging from 1 (totally disagree) to 7 (totally agree). In order to attenuate reactance and as in the original study, these items were preceded by a filler item (“I think the relationship between Germans and Jews is still influenced by the past.”).
Additionally, among the clearly negative items we included items that indicated more positive attitudes (e.g., 9 items tapping into collective guilt and regret; Imhoff, Bilewicz, & Erb, 2012; 5 items on contact and contact intention, 5 items on reparation intentions). The actual antisemitism scale consisted of 29 items measuring modern antisemitism (e.g., “Jews have too much influence on public opinion”; 4 reverse-coded; Cronbach’s α = .91).

As a second measurement approach, participants indicated how warm (5 items, e.g. “good-natured”, Cronbach’s α = .92) and competent (4 items, e.g. “competent”, Cronbach’s α = .77; Fiske, Cuddy, Glick, & Xu, 2002) they perceived Jews to be using a list of 20 adjectives (including 11 filler items) on the same scale.

Guilt proneness. We assessed disposition to experience strong feelings of guilt using two instruments: the Test of Self-Conscious Affect-3 (TOSCA-3; German version by Rüsch & Brück, 2003; 5-point scale) and the Guilt and Shame Proneness Scale (GASP; German translation by Cohen, Wolf, Panter, & Insko, 2011; 7-point scale). Both measures ask participants to imagine various scenarios and to indicate how likely it is for them to experience guilt (among other possible reactions) in these situations. Cronbach’s α was .47 for the TOSCA-3 guilt scale and .60 for the guilt – negative behavior evaluation scale of the GASP.

National identification. National identification was measured in two ways so that the impact of the defensive form of national identification (i.e., glorification controlled for attachment, collective narcissism) could be isolated.
We measured attachment to the national group (8 items; e.g., “Being a German is an important part of my identity”; Cronbach’s α = .90) and glorification of this group (8 items; e.g., “Germany is better than other nations in all respects”; Cronbach’s α = .82) on seven-point scales ranging from 1 (totally disagree) to 7 (totally agree) with items by Roccas, Sagiv, Halevy, and Eidelson (2008) that were adapted and translated to German. As an additional measure of defensive national identification, we included a measure of collective narcissism, the exaggerated belief that one’s own national group is superior to other groups, on the same scale. To this end we used the German translation of nine items (Cronbach’s α = .85) of the Collective Narcissism Scale (e.g., “I wish other groups would more quickly recognize authority of the Germans”; Golec de Zavala, Cichocka, Eidelson, & Jayawickreme, 2009).

Belief in a just world. We used Dalbert’s (2001) General Belief in a Just World Scale that consists of six items (e.g., “I think basically the world is a just place”; Cronbach’s α = .72). The items of this scale were answered on a six-point scale ranging from 0 (totally disagree) to 5 (totally agree).

Additional variables. We measured right-wing authoritarianism (RWA; Funke, 2005), social dominance orientation (SDO; von Collani, 2002), the Big Five (BFI-10; Rammstedt & John, 2007), conspiracy mentality (Imhoff & Bruder, 2014), and the coping modes vigilance and cognitive avoidance (Mainz Coping Inventory, ABI; Egloff & Krohne, 1998) using German versions of the scales.

Procedure. After giving informed consent, participants completed all scales in a fixed order (TOSCA-3, Belief in a Just World, Collective Narcissism, Glorification and Attachment, Antisemitism, Conspiracy Mentality, Right-Wing Authoritarianism, Social Dominance Orientation, Mainz Coping Inventory, GASP, BFI-10, demographics) before generating the individual code needed to match their pretest data with the lab study data.

Lab Study.

All participants who participated in the online study and left contact details were invited via e-mail to participate in the lab study. Upon arriving at individually arranged sessions they were randomly assigned to one of the four conditions resulting from a 2 (ongoing consequence: yes vs. no) by 2 (bogus pipeline: yes vs. no) design.

Information on ongoing consequences. Participants read a text, ostensibly taken from a history book, which described the German atrocities committed against Jews in the Auschwitz concentration camp. This text was identical to that used by Imhoff and Banse (2009). The last paragraph contained the manipulation of ongoing consequences. Participants either read that the suffering of the Jewish victims was part of a terrible history that has no direct implications for Jews today (no ongoing consequences) or that even today Jews are suffering either as Auschwitz survivors or as their descendants because of “secondary traumatization” (ongoing consequences).

Bogus Pipeline. The implementation of the bogus pipeline differed from the original study (Imhoff & Banse, 2009) because we initially intended to explore physiological reactions to both versions of the text about the Holocaust. In the bogus pipeline condition, the electrode belt of a heart rate monitor watch was applied to participants’ chests. In addition, electrodes were attached to the palmar surfaces of the participants’ index and middle fingers and to the back of their hands, supposedly to measure galvanic skin response. Participants were informed that physiological data were measured because “previous research has shown that we can detect quite well whether someone answers truthfully or with a lie”. Participants in the control condition underwent measurement of heart rate as well but did not have electrodes attached to their hands. Importantly, participants in this condition were informed that physiological measures were obtained merely in order to explore whether physiological parameters correlate with information processing in reading.

Measures.

Implicit guilt. We used an adaptation of the Implicit Positive and Negative Affect Test (IPANAT; Quirin, Kazén, & Kuhl, 2009) that assesses anger, fear, happiness, and guilt (IPANAT-4-EM) to measure implicit guilt. Participants were asked to judge the extent to which artificial words (e.g., “VIKES”) express each of three emotional qualities per emotion cluster. Guilt was represented by the emotion words “guilt”, “regret”, and “shame”, Cronbach’s α = .88.

Explicit guilt. The same emotions that were measured with the IPANAT-4-EM in an indirect way were also assessed using a self-report measure. Participants indicated to what extent they felt anger, fear, happiness, and guilt (“guilty”, “regretful”, and “ashamed”) at that moment, Cronbach’s α = .81.

Antisemitism. Participants completed the same scale as in the online study, α = .93.

Heart rate variability. We collected heart rate variability data for exploratory purposes using heart rate monitor watches by Polar.

Procedure. After an interval ranging between seven days and three months between the online survey and participation in the lab study (Time 2), participants were randomly assigned to one of four experimental conditions in a 2 (ongoing consequences vs. no ongoing consequences) × 2 (bogus pipeline vs. control) factorial design. The session started with the bogus pipeline manipulation and the physiological device set up.
After a two-minute baseline measurement of heart rate variability, participants read a text about the German atrocities in the Auschwitz concentration camp, which included the manipulation of consequences for present-day Jews. The individual paragraphs of the text moved across the screen over a period of 140 seconds to allow for a mapping of physiological reactions to specific parts of the text. After the reading task, participants were asked to write down on a piece of paper the thoughts they had had while reading the text. Subsequently, they completed the IPANAT-4-EM, which included our measure of implicit guilt, and filled in the measure of explicit guilt. Finally, they again answered the same antisemitism questionnaire that they had completed at Time 1 and indicated whether the text presented to them before contained information about ongoing negative consequences for Jews today as a manipulation check (“yes” or “no”).

Results

Antisemitism showed high stability between both measurements, r(83) = .89, p < .001. We followed the strategy of the original study (Imhoff & Banse, 2009) in analyzing the effect of the information on ongoing consequences on antisemitism. Time 1 antisemitism scores were entered as a predictor of Time 2 antisemitism scores in a regression analysis and standardized residual change scores were used as an index of change in antisemitism. The resulting residual change scores were subjected to a 2 (ongoing consequences vs. no ongoing consequences) × 2 (bogus pipeline vs. control) analysis of variance (ANOVA). In contrast to our hypothesis and the results of the original study, no evidence was found for an interaction between the information on ongoing negative consequences for Jews and the bogus pipeline manipulation, F(1, 79) = 0.28, p = .602, ηp² = 0.003 (Figure 2, right panel).
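
The residualized change score described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis script: the data are invented toy values, the function name `standardized_residuals` is hypothetical, and dividing the residuals by the residual standard deviation with n − 2 degrees of freedom is one common convention that may differ from the software the authors used. The sketch stops at the cell means and the interaction contrast; it does not compute the ANOVA F test.

```python
from statistics import mean


def standardized_residuals(t1, t2):
    """Regress t2 on t1 by ordinary least squares and standardize the residuals."""
    n = len(t1)
    m1, m2 = mean(t1), mean(t2)
    sxx = sum((x - m1) ** 2 for x in t1)
    sxy = sum((x - m1) * (y - m2) for x, y in zip(t1, t2))
    slope = sxy / sxx
    intercept = m2 - slope * m1
    resid = [y - (intercept + slope * x) for x, y in zip(t1, t2)]
    # Assumption: residual SD uses n - 2 degrees of freedom
    resid_sd = (sum(r ** 2 for r in resid) / (n - 2)) ** 0.5
    return [r / resid_sd for r in resid]


# Hypothetical toy data (NOT the study data): Time 1 and Time 2 scale means
t1 = [2.1, 3.4, 1.8, 2.9, 3.0, 2.2, 1.5, 3.8]
t2 = [2.0, 3.6, 1.7, 3.1, 2.8, 2.5, 1.4, 3.9]
suffering = [0, 0, 0, 0, 1, 1, 1, 1]   # 1 = ongoing consequences
pipeline = [0, 0, 1, 1, 0, 0, 1, 1]    # 1 = bogus pipeline

change = standardized_residuals(t1, t2)

# Mean change score in each cell of the 2 x 2 design
cells = {}
for s, b, c in zip(suffering, pipeline, change):
    cells.setdefault((s, b), []).append(c)
cell_means = {cell: mean(values) for cell, values in cells.items()}

# Interaction contrast: bogus pipeline effect with vs. without ongoing suffering
interaction_contrast = (
    (cell_means[(1, 1)] - cell_means[(1, 0)])
    - (cell_means[(0, 1)] - cell_means[(0, 0)])
)
print(cell_means, interaction_contrast)
```

Because the residuals are by construction uncorrelated with the Time 1 scores, differences between cells reflect change relative to what baseline antisemitism would predict, which is why this index is more sensitive than raw Time 2 scores.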

Likewise, none of the experimental factors showed a main effect, Fs < 1. Confronting German participants with ongoing negative consequences for present-day Jews did not result in increased antisemitism, even when participants thought that untruthful responses could be detected by the experimenter.

Figure 2. Change in explicit antisemitism (standardized residuals) from Time 1 to Time 2 as a function of the information on ongoing consequences and bogus pipeline manipulations in the original study (Imhoff & Banse, 2009; left panel) and in Study 1 of the current research (right panel). Error bars represent standard errors of the mean.

Despite this lack of support for the basic effect, we analyzed whether ongoing Jewish suffering increases implicit guilt. A t-test for independent samples revealed no significant difference in implicit guilt between the ongoing consequences condition (M = 3.25, SD = 0.95) and the no ongoing consequences condition (M = 3.06, SD = 0.90), t(81) = 0.93, p = .354, Hedges’s gs = 0.20, 95% CI [-0.23, 0.64]. In contrast to our hypothesis, implicit guilt was not positively correlated with antisemitism under bogus pipeline conditions, r(44) = .12, p = .451.

In order to test the moderator hypotheses, we performed separate hierarchical multiple regression analyses using the standardized residual change scores in antisemitism as a dependent variable. Product terms representing the three-way interactions among both experimental factors and the potential moderator variables were entered as predictors in a third step after the simple predictors and all possible two-way products. None of these regression analyses revealed evidence for a moderation effect of collective narcissism (see Table Osm.1 on our OSF project page), national glorification (see Table Osm.2), just-world beliefs (see Table Osm.3), or guilt proneness (see Table Osm.4).
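
As a rough plausibility check on the implicit guilt comparison reported above, Hedges's g and the t statistic can be recomputed from the rounded summary statistics. The per-condition sample sizes are not reported in this section, so the split of 42 and 41 participants used below is an assumption chosen only to match the 81 degrees of freedom; the function name `hedges_g` is hypothetical. The sketch uses the common small-sample correction 1 − 3/(4df − 1).

```python
from math import sqrt


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (Hedges's g, t) for two independent groups from summary statistics."""
    df = n1 + n2 - 2
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / pooled_sd                 # Cohen's d with pooled SD
    correction = 1 - 3 / (4 * df - 1)         # small-sample bias correction
    t = d * sqrt(n1 * n2 / (n1 + n2))         # equal-variance t statistic
    return d * correction, t


# Means and SDs from the text; group sizes 42 and 41 are ASSUMED, not reported
g, t = hedges_g(3.25, 0.95, 42, 3.06, 0.90, 41)
print(f"g = {g:.2f}, t = {t:.2f}")
```

With these assumed group sizes the recomputed values agree with the reported g = 0.20 and t(81) = 0.93 up to rounding of the published means and standard deviations.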
Discussion

Study 1 provided no lead on the research question of which psychological processes are plausibly responsible for increased prejudice in light of ongoing suffering, predominantly because it failed to replicate this finding. Although descriptively the mean scores were in the predicted direction, this trend was far from significant. Several reasons appeared conceivable for this. As always, the non-significant finding could be a false negative due to insufficient power. We failed to collect data from 120 participants as planned based on a priori power analyses, and these analyses might already have been biased by an overly optimistic effect size estimate taken from the original study. Alternatively, the bogus pipeline manipulation might not have worked as it did in the original study. We had used different equipment (a heart rate monitor plus hand electrodes instead of forehead electrodes plus hand electrodes) in a different setting (neutral, almost empty room instead of a slightly messy laboratory with many cables lying around) and sampled from a different population (via a volunteer participant e-mail list instead of first-year undergraduates) with different incentives (cash payment instead of course credit). Potentially, any of these factors or their combination undermined the credibility of our bogus pipeline manipulation. In fact, unlike in the previous study, we have no evidence for the validity of the procedure. In our original study, we had included an Affective Misattribution Procedure (Payne, Cheng, Govorun, & Stewart, 2005) as a measure of implicit antisemitism. As we expected, this measure correlated substantially with the explicit measure under bogus pipeline conditions (i.e., participants really self-report what they “feel”), but not under control conditions (where they corrected their responses in a socially desirable way).
We had eliminated the indirect measure between the ongoing suffering manipulation and the dependent variable in an effort to streamline the procedure. Nevertheless, we continued as planned with Study 2.

Study 2

In Study 2, we aimed to test the just-world theory as an explanation of the effect of ongoing Jewish suffering against the hypotheses of guilt-defense and the protection of a positive social identity. We did so by introducing a condition in which just-world theory would make a different prediction than guilt-defense or social identity theory. Just-world beliefs should be threatened by unjustly suffering victims in any case, irrespective of who the perpetrator is. In contrast, not every case of injustice should result in increased guilt or in a threatened positive social identity. Only if the perpetrators are members of the in-group (in this case Germans) should one be motivated to derogate the victims. Accordingly, we manipulated group membership of the perpetrators.

Method

Participants. We again aimed for a final sample of 120 participants. One hundred and eighty-five first-year psychology students (27 men, 158 women; mean age: 22.29, SD = 4.88) from the University of Cologne, Germany, participated in an online study at the first measurement. Seventy-eight participants dropped out between the first and the second measurement occasion (42%). The post-test data of two participants had to be excluded because they provided participant codes not included in the dataset of the pretest. We excluded nine participants before running the analyses because they did not remember that the historical text they had read contained information about ongoing negative consequences for the victims or because they did not remember who the perpetrators had been. The remaining sample of N = 96 (86 women, 10 men) ranged from 17 to 39 in age (M = 21.55, SD = 4.11). Participants received 12 EUR for their participation (approx. 7.50 EUR per hour).

Measures.
The main dependent variable in Study 2 was explicit prejudice against the victim group participants read about during the experiment. Depending on experimental condition, participants responded to items measuring prejudice against Jews or Chinese. We chose ten items from the antisemitism scale (Imhoff, 2010; Cronbach’s α = .72) that could be modified to assess prejudice against Chinese (e.g., “Chinese have too much influence on public opinion”; Cronbach’s α = .80). The prejudice items were supplemented by four items on collective guilt (e.g., “I can easily feel guilty for the negative consequences that were brought about by Germans [Japanese]”, Cronbach’s α = .83 and .62, respectively) and, for participants who read about the Holocaust, by two items on primary and five items on secondary antisemitism for exploratory purposes. Study 2 included the same measures of potential moderators and additional variables as in Study 1 except that we excluded the TOSCA-3 (but kept the GASP as a measure of guilt proneness), the in-group attachment and glorification scales (but kept the measure of collective narcissism), and the ABI (measuring anxiety coping styles that might be related to the tendency to avoid – and therefore misremember – threatening information). In addition, we included the following measures on a purely exploratory basis: a response latency-based measure of prejudice (adapted from Vala, Pereira, Eugênio, Lima, & Leyens, 2012), a rating of Jews and Chinese on eight warmth-related traits, and a feeling thermometer assessing feelings towards these groups (among other groups). The BIOPAC system, whose main purpose was to serve as a bogus pipeline setup (see below), was also used to record electrodermal activity data for exploratory purposes. We did not analyze physiological data, but the raw data can be obtained from the authors.

Independent variables.
We manipulated group membership of the perpetrators by presenting participants with either a text about the Holocaust (which was the same as in Study 1) or about the ongoing suffering of Chinese victims of the Nanking massacre committed by Japanese troops. In both conditions, the last paragraph stressed the ongoing negative consequences for present-day Jews or Chinese, respectively. Presentation of the text differed from Study 1 in that the whole text was shown on the screen at once, whereas the individual paragraphs moved across the screen in Study 1.

In contrast to Study 1 and more similar to the original study (Imhoff & Banse, 2009), we operationalized the bogus pipeline manipulation as measuring electrodermal activity under the pretext of lie detection vs. no physiological measurement at all. Participants in the bogus pipeline condition were informed that “specific parameters of electrodermal activity allow us to detect whether someone answers truthfully or with a lie”. Subsequently, the experimenter attached the electrodes of a BIOPAC system to the palmar surfaces of the participants’ index and middle fingers. In order to increase credibility of the bogus pipeline, the experimenter continued with an alleged calibration that required participants to follow some instructions while the experimenter was monitoring the physiological parameters at another computer behind a room divider. Specifically, participants were instructed to take a deep breath and hold the breath for a moment. After that, the experimenter asked participants to memorize a number between one and six printed on a card (which was a 4 in every case). Analogous to a concealed information test, the experimenter then read a series of numbers that could have been on the card, and participants were instructed to answer “yes” to every number, whether accurate or not.
After a few seconds, participants were informed that the apparatus was working properly and that they were ready to start with the study. Participants in the control condition received no treatment at all.

Procedure. The first measurement of explicit prejudice against the victim group alongside assessment of the potential moderator variables was obtained in a classroom testing session (Time 1). After an interval of five or six months, participants were invited to the laboratory for an individual session (Time 2). Participants were randomly assigned to one of four groups in a 2 (group membership of the perpetrators: in-group vs. out-group) × 2 (bogus pipeline vs. control) design. After the bogus pipeline manipulation had been administered, participants gave demographic information and read a neutral text about the history of an abandoned town, which served as a control task for the assessment of electrodermal activity, involving reading but without injustice-related content. In the bogus pipeline condition, this reading task was preceded by a three-minute baseline measurement of electrodermal activity. After this initial reading task, participants were given two minutes to write down on a piece of paper their thoughts about the text. After a one-minute rest period, participants were presented with the critical text that contained the manipulation of the perpetrators’ group membership and again wrote down their thoughts. Subsequently, participants completed 48 trials of the response latency-based measure of prejudice and answered the prejudice questionnaire. Finally, they were asked whether the text contained information on ongoing negative consequences for the victims (“yes” or “no”) and who the perpetrators had been as a manipulation check (“the Red Army”, “Japanese troops”, “SS officers”, or “American soldiers”).

Results

The stability of antisemitism was lower than in Study 1, r(44) = .57, p < .001.
The stability of prejudice against Chinese was r(52) = .72, p < .001. Prejudice against the victim group was analyzed as change in prejudice between both measurement occasions exactly as in Study 1. The standardized residual change scores were subjected to a 2 (group membership of the perpetrators: in-group vs. out-group) × 2 (bogus pipeline vs. control) ANOVA. Results revealed neither a significant main effect of the bogus pipeline manipulation, which would have been predicted by just-world theory, F(1, 92) = 0.05, p = .830, ηp² = 0.00, nor an interaction effect, which would have been predicted by guilt-defense and social identity theory, F(1, 92) = 0.01, p = .919, ηp² = 0.00. The only significant experimental effect was a (hard-to-explain) main effect of victim group, F(1, 92) = 15.74, p < .001, ηp² = 0.17: Whereas antisemitic prejudice showed a relative decrease compared to Time 1, the opposite was true for anti-Chinese prejudice (Figure 3). Separate moderator analyses confirmed this result for participants high in collective narcissism (see Table Osm.5), just-world beliefs (see Table Osm.6), and guilt proneness (see Table Osm.7).

Figure 3. Change in explicit prejudice against the victims (standardized residuals) from Time 1 to Time 2 as a function of the group membership of the perpetrators and the bogus pipeline manipulation in Study 2. Error bars represent standard errors of the mean.

Discussion

Studies 1 and 2 failed to replicate the basic effect of an increase in antisemitism in response to the ongoing suffering of Jewish victims, which had been reported in the original study (Imhoff & Banse, 2009). In light of this repeated failure to replicate the interaction of bogus pipeline and ongoing suffering, we decided to switch gears and focus on establishing the basic effect.
The bogus pipeline manipulation appeared to us as the most plausible candidate for this failure. Clearly, participants needed a lot of trust in the researchers to believe that the researchers could indeed detect untruthful responding. In contrast to the time when bogus pipelines were originally proposed in the early 1970s (e.g., Sigall & Page, 1971), current students are very likely aware of the fact that a simple “lie detector” is a gadget from fictional literature, not a real thing. Based on the working hypothesis that lie detection machines have been too thoroughly debunked in public discourse to affect participants’ responding, we turned to another popular approach to circumvent socially desirable responding: more subtle measures.

Studies 3a–3c

In Studies 3a to 3c, we investigated whether the very basic effect shown in the original study (Imhoff & Banse, 2009) – Germans show increased antisemitism when confronted with the Holocaust – is detectable. As we were not confident in the effectiveness of the bogus pipeline manipulation given the results of Studies 1 and 2, we employed an alternative approach in addressing the problem of measuring antisemitic attitudes, which are socially very undesirable to express. Instead of a bogus pipeline setup, we adopted a reverse-correlation paradigm as a subtle, indirect measure of prejudice. If confronting Germans with the crimes their ancestors committed against Jews results in them becoming more antisemitic, we expected Germans to remember the face of a Jewish person as more negative when the Holocaust is mentioned at the initial confrontation with this person. To test this hypothesis, we asked participants to form a first impression of a person who was either Jewish or Christian. In addition, we manipulated whether the text about this person contained information about the Holocaust or not.
Participants then completed a reverse-correlation image-classification task based on the memory they had of the target person’s face, which allowed us to visualize the remembered facial appearance of that person. We replicated this study twice (Studies 3b and 3c) with minor changes regarding the materials, as explained below.

Method

Participants. Seventy-eight psychology students from the University of Cologne, Germany, were recruited via mailing lists, flyers, social networks, or by being personally approached on the university campus to take part in Study 3a. Based on a priori set criteria (see below), we excluded 17 participants before running the analyses because they did not remember correctly that the target person was Jewish [vs. Christian] or that he was volunteering in an organization that supports Holocaust survivors [vs. an organization working to protect forests] or both. The remaining sample of N = 61 (47 women and 14 men) ranged from 20 to 49 years in age (M = 24.67, SD = 5.82). Participants received course credit for their participation.

Roughly 120 students from different fields of study participated in exchange for 4 EUR in Study 3b (N = 121) and Study 3c (N = 120), respectively. The effective sample size after exclusions based on the same criteria as in Study 3a was N = 94 (50 women and 44 men; age 18 to 38 years, M = 22.71, SD = 3.42) in Study 3b, and N = 89 (59 women, 29 men, one participant did not indicate; age 18 to 40 years, M = 23.22, SD = 4.23) in Study 3c.

Independent Variables. The session started with an impression formation task that contained the manipulation of both independent variables. Participants read a short text about a person containing irrelevant information about that person’s job, residence, and leisure time, and, critically, cues to the person’s religious affiliation and a sentence mentioning the Holocaust or a control issue.
Participants were told that the person was active in his synagogue [vs. church] and volunteered with an organization that helps Holocaust survivors because his grandfather had been murdered in the Auschwitz concentration camp [vs. an organization working to protect forests]. In Studies 3b and 3c, we introduced minor changes in the manipulations. Specifically, we reasoned that volunteer work in any religious group might be seen as a cue to morality or other positive traits. In Studies 3b and 3c, religious affiliation was thus made salient without implying volunteer work: The sentence containing the manipulation of group membership was changed so that the target person was not active in a synagogue or church but had been asked whether he wanted to become active in his father’s synagogue [vs. church]. Participants in the Holocaust condition read that the target person was involved in an organization demanding reparation payments for Holocaust survivors (whereas he was working for another charity not related to the Holocaust in the other condition). Contrary to Study 3a, the text contained no information about any victims among his family members to eliminate potential effects of direct sympathy.

In each of the three versions of Study 3, the text about the target person was accompanied by a picture showing the face of a young man. In Studies 3a and 3b, the face image was the neutral male face of the Averaged Karolinska Directed Emotional Faces database (Lundqvist & Litton, 1998), whereas we used a morph of sixteen emotionally neutral faces in frontal view taken from the Radboud Faces Database (Langner et al., 2010) in Study 3c. Both images have been used in previous reverse-correlation research (e.g., Dotsch et al., 2008, and Imhoff et al., 2013, respectively).

Central Dependent Variable: Reverse-Correlation Image-Classification Task.
We relied on reverse correlations to assess whether participants’ memory of a person’s face is biased by information on that person’s group membership and mention of the Holocaust. Reverse correlation is a data-driven approach that enables researchers to visualize an idealized decision criterion. By tracking which kinds of subtle (and random) alterations in the appearance of a face correlate with a classification decision (e.g., which of two faces looks more female; Mangini & Biederman, 2004), one can estimate what a face that fulfills all criteria in an ideal way looks like (classification image). Beyond very basic decisions (e.g., male vs. female), and more relevant to this study, reverse-correlation techniques can be used to construct images that reflect the expected or remembered facial appearance of a target person without making any a priori assumptions about relevant features.

Previous studies applied this approach to investigate biased expected facial appearance of out-group members (Dotsch, Wigboldus, Langner, & van Knippenberg, 2008; Dotsch, Wigboldus, & van Knippenberg, 2013; Imhoff & Dotsch, 2013; Imhoff, Dotsch, Bianchi, Banse, & Wigboldus, 2011) and previously encountered individuals (Karremans, Dotsch, & Corneille, 2011). For instance, Karremans et al. (2011) found that people involved in a romantic relationship held a less attractive memory of an attractive alternative’s face than uninvolved individuals. When asked to select a face that best represents a typical member of a certain social group (e.g., manager, nursery teacher), stereotypical beliefs about these groups’ warmth as well as competence are encoded in the face and can be decoded from the classification image by independent perceivers (Imhoff, Woelki, Hanke, & Dotsch, 2013).

Image Creation. Subsequently, participants worked through the reverse-correlation task, which allowed us to obtain visualizations of the participants’ memories of the target face.
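As a rough illustration of how a classification image arises from forced choices between a noise pattern and its negative, consider the following minimal sketch. It is not the stimulus-generation code used in the studies: the 4 × 4 "image", the uniform pixel noise, and the simulated observer (who always picks the version with the brighter top row) are simplifying assumptions made for illustration only.

```python
import random


def classification_image(base, noise_patterns, chose_original):
    """Average the selected noise patterns and superimpose the average
    on the base image. For each trial, the selected pattern is either the
    original noise (+1) or its negative (-1)."""
    height, width = len(base), len(base[0])
    n = len(noise_patterns)
    avg_noise = [[0.0] * width for _ in range(height)]
    for noise, picked_original in zip(noise_patterns, chose_original):
        sign = 1.0 if picked_original else -1.0
        for i in range(height):
            for j in range(width):
                avg_noise[i][j] += sign * noise[i][j] / n
    image = [[base[i][j] + avg_noise[i][j] for j in range(width)]
             for i in range(height)]
    return avg_noise, image


rng = random.Random(0)
height, width, n_trials = 4, 4, 400
base = [[0.5] * width for _ in range(height)]  # flat gray stand-in for a face

# One random noise pattern per trial (hypothetical uniform pixel noise)
patterns = [[[rng.uniform(-0.1, 0.1) for _ in range(width)]
             for _ in range(height)] for _ in range(n_trials)]

# Simulated observer: chooses the original pattern if it brightens the
# top row, otherwise chooses the negative of the pattern
choices = [sum(p[0]) > 0 for p in patterns]

avg_noise, ci = classification_image(base, patterns, choices)
```

Because the simulated observer's criterion concerns only the top row, the averaged noise is systematically positive there and close to zero elsewhere; in the same way, a participant's remembered face leaves a trace in the average of the patterns they select.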
We used a two-image, forced-choice variant of the reverse-correlation paradigm (e.g., Dotsch et al., 2008; Imhoff et al., 2011), in which each participant completed 400 trials of selecting one of two presented faces. In each of these trials, they selected the face that they thought looked more like the target person they had seen before (i.e., during the impression formation task). The stimuli used in the picture classification task were all based on the face they had seen on the page about the target person. To generate the stimuli, this base image had been converted to grayscale and superimposed with random noise resulting in random variations of the facial appearance between the stimuli (for noise generation, see Dotsch & Todorov, 2011). Every trial employed a different noise pattern, displaying the original pattern on the left and the negative of that pattern on the right side of the screen. Participants selected pictures by pressing a left or right button on the keyboard.

By averaging all noise patterns participants had selected separately for each experimental condition and superimposing these classification patterns on the base image, we obtained a classification image for every condition (see Figure 4). Trials with a response time lower than 200 ms were excluded before constructing the classification images (<5% of the trials). The resulting classification images visualized how participants in each of the four experimental groups remembered the target face on average. In addition to the classification images aggregated on a group level, we also analyzed classification images of individual participants in Studies 3b and 3c in order to explore the possibility that derogation of victims could occur on inter-individually different dimensions and hence be reflected in different facial features.

[Figure 4 image: columns show Holocaust mentioned vs. control; rows show Jewish target vs. Christian target.]
Figure 4.
Classification image as a function of information about the Holocaust (Holocaust is mentioned vs. control) and group membership of the target person (Jewish vs. Christian) in Study 3a.

Image Rating. In the second phase of Study 3, the classification images created by every experimental group in the first phase were rated on warmth (Cronbach’s α between .84 and .92) and competence (Cronbach’s α between .72 and .90) by independent participants recruited via Amazon Mechanical Turk (MTurk; Study 3a: N = 56, 30 women and 26 men, age 18 to 75 years, M = 38.57, SD = 14.59; Study 3b: N = 43, 20 women and 23 men, age 20 to 66 years, M = 37.93, SD = 13.04; Study 3c: N = 64, 40 women and 23 men, one person did not indicate, age 18 to 71 years, M = 35.56, SD = 12.68). Five other participants in Study 3a were excluded because they indicated that they had answered randomly or deliberately falsely, or that they would exclude their data if they were the researcher (six exclusions in Study 3b and four in Study 3c). The warmth and competence items were the same as in the first phase of the study. Responses were made using a five-point scale ranging from 1 (strongly disagree) to 5 (strongly agree). Every rater in this second phase of the study rated each of the four group-wise classification images. Accordingly, ratings were analyzed using within-subjects tests. The warmth ratings of the classification images constituted the main dependent variable. The individual classification images from the first phases of Studies 3b and 3c were rated by independent participants by indicating “how likable” they found each of the persons. Participants were paid 25 cents in Study 3a and 50 cents in Studies 3b and 3c.

Additional measures. After completing the reverse-correlation task, participants were probed for suspicion using a funneled debriefing procedure (cf.
Chartrand & Bargh, 1996) and were then asked to indicate their first impression of the target person by a) describing the person in their own words and b) rating the person’s warmth and competence. For the warmth and competence ratings, participants indicated to what extent each of 20 adjectives representing warmth (5 items, e.g., “good-natured”, Cronbach’s α = .82) and competence (4 items, e.g., “competent”, Cronbach’s α = .69; Fiske et al., 2002) characterized the target person on a five-point scale (1 = not at all to 5 = very much). In Studies 3b and 3c, we excluded the question asking participants to describe the target person and replaced the warmth and competence items with ten items assessing likability of the target person (e.g., “How likable do you find David S.?”), which also included five reverse-coded items representing common negative stereotypes about Jews (e.g., “How stingy do you find David S.?”). These ten items were combined into a single explicit likability scale, Cronbach’s α = .84 in Study 3b and .88 in Study 3c. Next, participants answered ten (in Studies 3b and 3c, six) questions about the target person, of which three served as a manipulation check, and gave demographic information. Finally, they completed an antisemitism questionnaire (only in Study 3a) consisting of 14 items taken from the scale used in Study 1 (Imhoff, 2010; Cronbach’s α = .86).

In Studies 3b and 3c, we included a word stem completion task to explore whether representations of the Holocaust were successfully activated in the Holocaust condition. This task was administered after the warmth and competence ratings and asked participants to complete 30 word stems of which ten could be completed to form a word related to the Holocaust (e.g., “Endl_____” could be completed to “Endlösung” [final solution]). Answers on the ten critical items were coded as Holocaust-related or not by a single rater and aggregated to a sum score.
Furthermore, the Positive and Negative Affect Schedule (PANAS; Watson, Clark, & Tellegen, 1988) was added in between the impression formation task and the reverse-correlation image-classification task in Studies 3b and 3c for exploratory purposes.

Materials and Procedure. Participants were seated at a computer in individual cubicles and were randomly assigned to one of four experimental conditions following a 2 (group membership of the target person: Jewish vs. Christian) × 2 (Holocaust is mentioned vs. control information) design. Secondary antisemitism, we reasoned, would be exhibited in a face that independent others would perceive as less warm if the person was introduced as Jewish and the Holocaust was mentioned.

Results

Based on the idea of secondary antisemitism, we expected the classification images created by participants who were both presented with a Jewish target person and reminded of the Holocaust to be rated as less warm or likable than those from the other conditions. Warmth ratings of the group-wise classification images were subjected to a 2 (group membership of the target person: Jewish vs. Christian) × 2 (Holocaust is mentioned vs. control information) repeated measures ANOVA. Contrary to the hypothesis, in Studies 3a and 3b results did not show a significant interaction effect, F(1, 55) = 0.03, p = .872, ηp² = .00, and F(1, 42) = 0.02, p = .897, ηp² = .00, respectively. In Study 3c, a significant interaction effect emerged, F(1, 63) = 10.06, p = .002, ηp² = .14. However, the pattern of means was in contrast to expectations, as the classification image from the Jewish condition was rated as warmer when participants were reminded of the Holocaust (vs. control information).
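For readers who want to relate the reported F statistics to the accompanying effect sizes, partial eta squared can be recovered from F and its degrees of freedom via the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The following minimal sketch applies this identity to the interaction tests reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its
    degrees of freedom (standard identity, not study-specific code)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Interaction effects reported for the group-wise classification images
eta_3a = partial_eta_squared(0.03, 1, 55)   # Study 3a
eta_3c = partial_eta_squared(10.06, 1, 63)  # Study 3c
print(round(eta_3a, 2), round(eta_3c, 2))   # prints 0.0 0.14
```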
For the analysis of individual classification images, likability ratings were averaged across raters, yielding a mean likability rating for every individual classification image. The likability scores were then submitted to a 2 (group membership of the target person: Jewish vs. Christian) × 2 (Holocaust is mentioned vs. control information) between-subjects ANOVA. The ANOVAs revealed no differences between experimental conditions in either Study 3b or Study 3c; tests of the interaction effects yielded F(1, 90) = 0.03, p = .871, ηp² = .00, and F(1, 85) = 0.16, p = .693, ηp² = .00, respectively.

In addition to the primary analyses looking at the classification images reported above, we explored the explicit ratings of the target person’s warmth (Study 3a) and likability (Studies 3b and 3c). Between-subjects ANOVAs did not yield an interaction effect in any of the studies, F(1, 57) = 0.02, p = .879, ηp² = .00 in Study 3a, F(1, 90) = 2.97, p = .088, ηp² = .03 in Study 3b, and F(1, 85) = 0.38, p = .537, ηp² = .00 in Study 3c.

To explore whether representations of the Holocaust were activated to a higher degree in the conditions mentioning the Holocaust than in the control conditions, we compared the number of Holocaust-related answers in the word stem completion task. In contrast to our expectations, the sum of Holocaust-related answers was not significantly higher in the Holocaust conditions (M = 2.22, SD = 1.74 in Study 3b and M = 1.58, SD = 1.18 in Study 3c) than in the control conditions (M = 1.66, SD = 1.58 and M = 1.43, SD = 1.17), t(92) = 1.63, p = .108, Hedges’s gs = 0.33, 95% CI [-0.08, 0.74] and t(87) = 0.59, p = .559, Hedges’s gs = 0.12, 95% CI [-0.29, 0.54], respectively.

Discussion

Studies 3a to 3c failed to provide any evidence for the notion that making the Holocaust salient increases participants’ need to derogate the victim group. If anything, the effect was in the opposite direction in one study, but not reliably in the other studies.
This invites speculation as to whether the chosen measure is indeed immune to social desirability concerns. Although it is not explicitly an evaluation task, participants are of course free to take all the time they need to select images according to whatever impression they want to convey of themselves (e.g., as particularly unprejudiced). It may thus be that the measure taps into participants’ very explicit and elaborate evaluation as much as typical prejudice scales do. The unexpected effect (somewhat reminiscent of the pattern in the no bogus pipeline condition in the original paper) is compatible with this interpretation, but the lack of any effect in the following studies does not corroborate this speculation. At present, there is no consistent effect (in any direction) of making the atrocities of the Holocaust salient.

As perhaps a side effect rather than the focus of the current interest, we also were not able to produce consistent effects on what we perceived as a simple manipulation check: a word stem completion task. The logic was that making the history of the Holocaust salient should increase participants’ tendency to complete ambiguous word stems in a semantically consistent way. Such tasks are highly popular instruments in the field of social cognition to tap into the semantic accessibility of certain constructs (or concept activation). While our failure to find any effect in such measures may raise doubts about their validity, it should be noted that the employed task was constructed ad hoc without proper pilot testing of base rates of word completion tendencies. In our own lab, we have gathered experiences with such tasks in other domains (i.e., to what extent pictures of or real pregnant women make baby-related word completions more likely) with more success (Marhenke & Imhoff, 2018). We would thus caution against throwing the baby out with the bath water based on the failure presented here.
At the same time, we caution that it is bad practice to construct such measures ad hoc as naïvely as we and other colleagues do, interpreting them as valid as long as they produce the desired effects but discarding them as unreliable and invalid if they do not.

Another reason for the failure to replicate the effect could be the population we sampled from in Studies 1–3. Most were student samples from the University of Cologne, more specifically the School of Humanities with a specialization in special needs education. Students from this school have a reputation for being particularly liberal (and their average self-reported political orientation was left of the scale midpoint in both Studies 1 and 2), whereas students in the original study were psychology students who do not necessarily have the same reputation. To increase our chances of finding support for the mechanism of secondary antisemitism, we thus changed the research setting to a less restricted sample that might not have egalitarian norms to the same extent. Accordingly, we conducted two studies in the city center of Cologne with pedestrians from the general population as participants.

Studies 4a and 4b

To include more politically diverse participants, we recruited individuals walking in front of the main station in Cologne, Germany, to fill in a “short survey on opinions on violent conflicts”. Instead of open antisemitic expressions, we used agreement with criticism of Israel as a dependent variable. We assumed that criticism of Israel would be perceived as less taboo and thus be reported openly in a questionnaire so that we would not need a bogus pipeline setup. This approach was built on the notion that not only are anti-Israeli sentiment and antisemitism highly correlated in Europe (Kaplan & Small, 2006), but certain forms of criticism of Israeli politics are construed as a substitute communication.
Demonizing Israel is socially more accepted than demonizing Jews (Steinberg, 2004), but – in the context of secondary antisemitism – serves the same purpose: Portraying the (Jewish) state of Israel as a ruthless perpetrator of human rights violations makes the (German) crimes against Jews less salient (i.e., victim-perpetrator reversal; Imhoff, 2010). In line with the hypothesis of secondary antisemitism, we expected participants to show higher agreement (relative to a control condition) with statements criticizing Israel after being reminded of the Holocaust. This effect might be greater for individuals high in national glorification.

Method

Participants. One hundred passers-by approached in front of the main station of Cologne, Germany, participated in Study 4a (57 women and 43 men). Participants ranged from 17 to 76 years in age (M = 33.87, SD = 14.59). For Study 4b we recruited 196 passers-by (119 women, 73 men, four did not indicate their gender) ranging from 14 to 63 in age (M = 27.55, SD = 12.03). Another four participants were excluded before running the analyses because of missing responses on more than 50% of the items of the main dependent variable. In both studies, we included an attention check by asking participants in the last sentence of the instruction to write an X on the page margin. As a very high proportion of participants failed this attention check (45% in Study 4a and 27% in Study 4b), we decided to keep these participants in the sample. The results reported below do not change when these participants are excluded. Participants received no compensation.

In Study 4b, participants scored higher on national glorification (M = 2.32, SD = 1.22) than our student sample in Study 1 (M = 1.72, SD = 0.88), t(276) = 4.02, p < .001, Hedges’s gs = 0.52, 95% CI [0.26, 0.79], using the same three items. Although a mean of 2.32 is still relatively low on a seven-point scale, our goal to acquire a less liberal sample was achieved.
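As an illustration of how a between-sample comparison like the one above can be recomputed from summary statistics, the following minimal sketch derives t, Hedges’s gs, and an approximate 95% CI from means, standard deviations, and group sizes. It is not the authors' analysis script. The function name is hypothetical, the confidence interval uses a simple normal approximation, and the Study 1 sample size of 82 is an assumption inferred from the reported df of 276 and the Study 4b sample of 196.

```python
import math


def hedges_g_independent(m1, sd1, n1, m2, sd2, n2):
    """Return (t, g, ci_low, ci_high) for two independent groups.

    Hypothetical helper for illustration; inputs are summary statistics.
    """
    df = n1 + n2 - 2
    # Pooled standard deviation across both groups
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / pooled_sd
    # Student's t for two independent groups with pooled variance
    t = d / math.sqrt(1 / n1 + 1 / n2)
    # Small-sample bias correction that turns Cohen's d into Hedges's g
    g = (1 - 3 / (4 * df - 1)) * d
    # Large-sample standard error of g (normal approximation)
    se = math.sqrt((n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2)))
    return t, g, g - 1.96 * se, g + 1.96 * se


# Study 4b (n = 196) versus Study 1; n = 82 is an ASSUMPTION inferred from df = 276
t, g, ci_low, ci_high = hedges_g_independent(2.32, 1.22, 196, 1.72, 0.88, 82)
print(f"t = {t:.2f}, g = {g:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

Because the inputs are rounded to two decimals and the Study 1 sample size is inferred, the output should be close to, but need not exactly match, the reported values.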
Materials and Procedure. The study was conducted in summer 2014 during the 2014 Israel-Gaza conflict. Participants were approached by the experimenter and asked to participate in a “short survey on opinions on wars and violent conflicts” (in Study 4b, “on the Israeli-Palestinian conflict”). They were then handed a two-page paper-and-pencil questionnaire and were randomly assigned to one of two experimental conditions in which they were either reminded of the Holocaust or not. The manipulation of being reminded of the Holocaust vs. a control condition was embedded in the instructions of the questionnaire. In Study 4a, participants in the Holocaust condition read that “70 years have passed since the monstrous German crime, the Holocaust. Within 4 years, the Germans systematically killed 6 million Jews in extermination camps like Auschwitz.” In the control condition, that first part of the instructions read, “History of humanity is a sequence of wars”, making no reference to the Holocaust. Participants then indicated their agreement to 13 statements criticizing Israel (Cronbach’s α = .77), which had been taken from the existing literature (e.g., “Israel is a state that stops at nothing”; Kempf, 2014). To make the cover story (a survey on wars and violent conflicts) more credible, the questionnaire also included ten items on two other wars, five on the war in Ukraine and five on the war in Syria, which were not analyzed.

In Study 4b, we included a more detailed description of the Holocaust in the Holocaust condition emphasizing that a) most Germans from all parts of German society participated in the genocide or willfully ignored the crimes and that b) Jews are still suffering today as a result of the Holocaust. Besides the text, the Holocaust condition included a picture showing corpses of prisoners of the Buchenwald concentration camp.
The information about the Holocaust was introduced by referring to the public discussion in Germany about the role of the Nazi past for the contemporary relations to Israel. The control condition in Study 4b was a baseline measure that simply asked participants to report their opinion on the Israeli-Palestinian conflict. The main dependent variable, criticism of Israel, was assessed using an 18-item scale (as compared to 13 items in Study 4a; Cronbach’s α = .84). This scale comprised the same 13 items that had been used in Study 4a and five more items that were newly created on the basis of actual comments on the 2014 Israel-Gaza conflict from the media (e.g., “The war that Israel initiated against Gaza is completely unjustifiable. It is a crime, whether from the air or on the ground.”).

To test whether Holocaust reminders increase criticism of Israel especially among Germans who glorify their national group, we included three of the items by Roccas et al. (2008) on national glorification in Study 4b (Cronbach’s α = .62). In Study 4b, we also included five items from the antisemitism scale (Imhoff, 2010), four items on group-based shame, three items on national attachment, and, for exploratory purposes, an open-ended question on participants’ understanding of German identity.

Results

Neither Study 4a nor Study 4b provided evidence for an increase in criticism of Israel as a reaction to being reminded of the Holocaust, t(98) = -0.29, p = .776, Hedges’s gs = -0.06, 95% CI [-0.45, 0.33] and t(194) = 0.14, p = .890, Hedges’s gs = 0.02, 95% CI [-0.26, 0.30], respectively. Participants who read about the Holocaust did not criticize Israel more (Study 4a: M = 3.02, SD = 0.72; Study 4b: M = 3.78, SD = 0.85) than those in the control condition (Study 4a: M = 3.06, SD = 0.91; Study 4b: M = 3.76, SD = 0.88).
This result also held for participants high in national glorification as revealed by a hierarchical multiple regression analysis. In a first step, the effect-coded group variable (Holocaust = 1; control = -1), attachment to Germany, and glorification of Germany were entered as predictors of criticism of Israel, followed by three product terms representing all possible two-way interactions between the predictor variables in a second step, and the three-way interaction in a third step. The interaction between experimental condition and glorification of Germany did not significantly predict criticism of Israel in the full model, β = .05, t(187) = 0.60, p = .550.

Study 5

In Study 4, we tried to assess antisemitism in a less blatant form, harsh criticism of Israel. In Study 5, we accounted for the possibility that Germans might be equally sensitized to criticism of Israel as to open antisemitic statements. It might be the case that when confronted with statements criticizing Israel in a questionnaire, Germans retrieve learned answers, leaving little room for situational influences. However, emotional reactions and behavior towards individual persons might be elicited more spontaneously and thus be more responsive to situational influences. Therefore we turned away from antisemitism and investigated a more subtle reaction in the area of intergroup emotions and intergroup behavior: Does reminding Germans of the Holocaust result in decreased empathy and support towards Israeli victims of rocket attacks? Again, we also aimed at testing whether this presumably defensive reaction is dependent on the level of national glorification.

Method

Participants. Ninety-eight students (53 women and 45 men) of different fields of study from the University of Cologne, Germany, participated in exchange for a bar of chocolate and the opportunity to win 50 EUR in a raffle. Participants ranged from 17 to 56 years in age (M = 22.97, SD = 5.68).
Another two participants dropped out during the experiment and were deleted from the data set.

Materials and Procedure. Participants were seated at computers in individual cubicles and were randomly assigned to one of two experimental conditions (Holocaust reminder vs. control). After reading the instructions in which they were informed that the study was about German perceptions of Israel, participants started by reporting their age, gender, educational status, citizenship, and whether their family had a history of migration. Next, they were either reminded of the Holocaust or not. In the Holocaust reminder condition, participants read the same text that already had been used in Study 4b. The control condition was a baseline condition in which participants were simply told that “we are interested in your perception of different aspects of Israel.” Subsequently, participants were presented with a short (47 s) television news video about the Gaza-based rocket attacks on a village in Israel. After presentation of the video, empathy towards the Israeli victims was assessed using six adjectives from the empathy literature (e.g., “compassionate”; Cronbach’s α = .73; Batson, Fultz, & Schoenrade, 1987). Using a seven-point scale (1 = not at all to 7 = extremely), participants indicated for each of the items to what extent they had felt the given emotion while watching the video about the situation of the Israeli victims. The empathy items were presented among ten filler items – eight distress adjectives (e.g., “worried”) and two guilt adjectives (e.g., “guilty”) for exploratory purposes.
In order to increase credibility of the cover story that the study was about German perceptions of different aspects of Israel, participants were presented with another short video, which was a report about young Israelis protesting against housing shortage and high rent prices, and indicated their agreement to six statements about the social protests (e.g., “I can easily identify with the requests of the protesters.”). Subsequently, participants answered the eight-item national attachment (Cronbach’s α = .86) and the eight-item national glorification scale (Cronbach’s α = .79). Finally, they were presented with a screen saying that the study was over and that they could participate in a raffle offering the opportunity to win 50 EUR. Participation in the raffle included a measure of financial support of the Israeli victims as a second dependent variable (in addition to empathy). Participants were invited to pledge to donate a portion of their choice of these 50 EUR (between 0 and 50 EUR) to an organization supporting the Israeli victims in case of them winning the raffle. The text included a description of the charity organization.

Results

Empathy scores and donation pledge amounts were subjected to independent samples t-tests. Empathy scores did not significantly differ between participants who were reminded of the Holocaust (M = 4.03, SD = 1.07) and those in the control group (M = 3.72, SD = 0.94), t(96) = -1.53, p = .129, Hedges’s gs = -0.31, 95% CI [-0.70, 0.09]. Likewise, results revealed no significant group difference in the amounts participants pledged to donate in support of the Israeli victims (Holocaust reminder: M = 18.48, SD = 13.41; control: M = 21.52, SD = 15.26), t(46) = 0.74, p = .466, Hedges’s gs = -0.21, 95% CI [-0.78, 0.36].
To test whether level of national glorification moderated the hypothesized effect of Holocaust reminders on empathy, we conducted a hierarchical multiple regression analysis with the effect-coded group variable (Holocaust = 1; control = -1), attachment to Germany, glorification of Germany, all possible two-way interaction terms, and the three-way interaction term as predictors. In contrast to our hypothesis, the product term representing the interaction between glorification of Germany and experimental condition did not significantly predict empathy in the full regression model, β = -.20, t(90) = -1.29, p = .200.

Table 1
Means, t-tests, and effect sizes for simple effects that would indicate secondary antisemitism across all studies.

Study     Measure                                            N        Condition 1: M (SD)                Condition 2: M (SD)                    t      df   p       Hedges’s g [95% CI]   SEg
Study 1   Change in explicit antisemitism                    22 / 22  Ongoing consequences: 0.11 (0.90)  No ongoing consequences: -0.00 (0.78)  0.44   42   .666    0.13 [-0.45, 0.71]    0.30
Study 2   Change in explicit antisemitism                    21 / 23  Bogus pipeline: -0.47 (0.58)       Control: -0.41 (0.87)                  -0.27  42   .791    -0.08 [-0.66, 0.50]   0.30
Study 3a  Warmth ratings of classification images            56       Jewish/Holocaust: 3.02 (0.84)      Jewish/control: 3.51 (0.96)            2.89   55   .006    0.54 [0.15, 0.93]     0.20
Study 3b  Warmth ratings of classification images            43       Jewish/Holocaust: 3.00 (0.89)      Jewish/control: 3.16 (0.86)            1.11   42   .272    0.17 [-0.14, 0.48]    0.16
Study 3c  Warmth ratings of classification images            64       Jewish/Holocaust: 3.10 (0.84)      Jewish/control: 2.61 (0.78)            -4.49  63   < .001  -0.60 [-0.88, -0.31]  0.15
Study 4a  Criticism of Israel                                50 / 50  Holocaust: 3.02 (0.72)             Control: 3.06 (0.91)                   -0.29  98   .776    -0.06 [-0.45, 0.33]   0.20
Study 4b  Criticism of Israel                                100 / 96 Holocaust: 3.78 (0.85)             Control: 3.76 (0.88)                   0.14   194  .890    0.02 [-0.26, 0.30]    0.14
Study 5   Empathy towards Israeli victims of rocket attacks  50 / 48  Holocaust: 3.72 (0.94)             Control: 4.03 (1.07)                   -1.53  96   .129    -0.31 [-0.70, 0.09]   0.20

Note: Where two values are given for N, they are the sample sizes of condition 1 and condition 2. The effect size measure used in Studies 1, 2, 4a, 4b, and 5 is Hedges’s gs. In Studies 3a through 3c, we used Hedges’s gav for the difference of two correlated measurements as recommended by Lakens (2013).

Discussion

Study 5 did not provide evidence for the notion that being reminded of the Holocaust makes Germans (who score high on national glorification) less empathic with Israeli victims of Palestinian rocket attacks. We thus, again, failed to observe data patterns in line with secondary antisemitism. Although the study did not have particularly high statistical power according to current standards, the effect on empathy was – if anything – in the unexpected direction (Hedges’s gs = -0.31).

Meta-analysis across studies

Before discussing the findings, we want to integrate them to give an overall impression of the obtained evidence or lack thereof. To do so, we calculated simple effects for each study that would indicate secondary antisemitism (Table 1). Although we predicted interactions for the first three studies, we decided to select simple comparisons instead of effect sizes of interaction terms, because simple comparisons are more informative regarding the direction of effects. For Studies 1 and 3 (which have a 2x2 design), we focused on the comparison between conditions with reminders of either ongoing suffering or the Holocaust and conditions without these on the evaluation of Jewish targets. For Study 2, each condition included an ongoing suffering manipulation and thus we compared the degree of (baseline-corrected) antisemitism in the bogus pipeline condition to that in the control condition. We calculated Hedges’s gs or gav (see Lakens, 2013) for each of the studies and fitted a random-effects model using the R metafor package (Viechtbauer, 2010).
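To make the pooling step concrete, the sketch below runs a random-effects meta-analysis on the rounded effect sizes and standard errors listed in Table 1. It is an illustration under stated assumptions, not a reproduction of the metafor analysis: it uses the closed-form DerSimonian-Laird estimator of between-study variance rather than whatever estimator was used in the original R analysis, and the function name is hypothetical.

```python
import math

# Hedges's g and its standard error for the eight studies, copied from Table 1
effects = [0.13, -0.08, 0.54, 0.17, -0.60, -0.06, 0.02, -0.31]
std_errors = [0.30, 0.30, 0.20, 0.16, 0.15, 0.20, 0.14, 0.20]


def random_effects_dl(y, se):
    """DerSimonian-Laird random-effects pooling (hypothetical helper)."""
    w = [1 / s ** 2 for s in se]                      # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                     # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0     # share of heterogeneity
    # Random-effects weights add tau2 to each study's sampling variance
    w_re = [1 / (s ** 2 + tau2) for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    pooled_se = math.sqrt(1 / sum(w_re))
    return {"Q": q, "df": df, "I2": i2, "tau2": tau2,
            "pooled": pooled, "pooled_se": pooled_se}


result = random_effects_dl(effects, std_errors)
print({name: round(value, 3) for name, value in result.items()})
```

With the rounded table values, Q and I² come out in the neighborhood of the reported figures without matching them exactly, which is expected given the rounding and the different estimator; the pooled estimate lies close to zero.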
The data show substantial heterogeneity, Q(7) = 27.14, p < .001, I² = 72.26%, and meta-analytically again no evidence for any secondary antisemitism (Figure 5). Given the large heterogeneity, an average effect size of almost exactly zero may be taken as an indication that – despite underpowered studies – it was not merely small samples that prevented us from replicating the effect.

Figure 5. Forest plot of all studies reported in the manuscript. Simple effects are coded so that positive effects speak to the hypothesis of secondary antisemitism.

General Discussion

Across a research program spanning two years and eight studies, we did not obtain evidence for the notion that reminders of the Holocaust evoke negative responses towards Jews among German participants. None of the studies had particularly strong statistical power and none were a direct replication in exact detail. Nevertheless, their consistency in not producing any result has shaken our confidence in the very basic effect. In light of this, the original goal to better understand the involved processes has lost some of its relevance.

As a caveat, although all of the studies reported here were conducted after the current debate on how to achieve more reproducible and reliable science had already taken off in 2011, the spirit behind this research program is still rooted in the “old way” of doing research. What we did (without success) was hunt down an effect, desperately seeking a way to “make it significant”. What we did not do is systematically plan for compelling evidence – in either direction. In hindsight, many steps and detours we made may seem premature and a single extremely high-powered study might have been more advisable. That was not the typical procedure for conducting social psychological science in our book. Instead, if the effect was not there, the researcher had the wrong method to show it and therefore needed to change that method.
Independent of how we might have obtained more compelling evidence in either direction, all our studies consistently converged in not producing results in line with a very influential theory of post-Holocaust antisemitism.

What can we now make of this pattern? Does it mean that the original study (Imhoff & Banse, 2009) was a false positive? Although virtually everything in that original study fell exactly into place, luck might have played a trick on us and convinced us of something that was never there to begin with (despite a plethora of qualitative writings and discourse analyses along the same lines). In light of the present research, we would argue that this is quite possible. Assuming that the original study was indeed a false positive might mean that either the very idea of secondary antisemitism is wrong or – as a more modest interpretation – that psychologists’ illusions of omnipotence to translate a societal discourse and its dynamics into a pretty, 30-minute experiment are ill-advised. Maybe scholarly interpretations of the effect of continuous Holocaust reminders are indeed to the point but it is naïve to emulate this in a cute little study.

While we can only emphasize again that we are more than open to this possibility, we would – for the sake of the argument – like to entertain alternative explanations. Under the (admittedly speculative) assumption that the original study was a true positive, how could we explain the absence of any effect in that direction in all studies reported here?

One of the most parsimonious (and potentially cheap) explanations could rest on the assertion that – for whatever reasons – the bogus pipeline procedure just never worked as nicely as it did in 2007. As the original study attested, however, this is crucial to find the diverging effects of Holocaust reminders.
The diffusion of psychological research and knowledge about the (missing) validity of “lie detectors” might have made it increasingly difficult to convince people of the operating principle of the bogus pipeline. In fact, lie detectors are debunked on a regular basis not only in undergraduate psychology classes but also in popular media outlets. If that was true, the psychological processes of secondary antisemitism indeed happened within our participants as they did in 2007, but our setup was not potent enough to make participants admit antisemitism. Although we have no direct evidence against this possibility, we would argue that the several other steps we undertook to reduce social desirability should then have produced at least a suggestive pattern in line with the hypothesis.

Another counterpoint to that argument might rest on the observation that psychologists tend to overestimate the power of social desirability, as most people are quite confident in the validity of their beliefs and do not adapt them according to what they think they ought to think and feel instead. In that case bogus pipelines would fail to produce effects not because they are not working, but because there is no hidden “real” belief to be revealed: People already speak frankly without such an apparatus. This would, however, mean that ongoing victim suffering did not increase our participants’ prejudice. If it had, it should have produced a main effect independent of the bogus pipeline manipulation (not an interaction by the – in this logic – obsolete bogus pipeline), which we never observed. The lack of effect in subsequent studies while maintaining the speculation that the original effect was a true positive is addressed by a second potential explanation.

The second explanation is a little more difficult to argue with and is in fact a recurring argument in the replication debate.
Heraclitus’ dictum that you cannot step into the same river twice, as it is not the same river and you are not the same person, serves as an encapsulation of the notion that the objectively same thing is not the same once time has passed. Meanings change and persons change and potentially the effect we sought is subject to Zeitgeist effects to a much greater extent than we anticipated. In fact, many effects, particularly in social psychology, may be more prone to changing times and norms than most of us are typically willing to concede. The year 2007 not only was the year in which our original study took place, but also the year in which the last public negotiation between the Jewish Claims Conference Against Germany and the German federal government came to an agreement. This was a publicly followed event and for many Germans Martin Walser’s (1998) infamous speech, in which he conceded that “I am almost glad when I think I can discover that more often not the remembrance, the not-allowed-to-forget is the motive, but the exploitation of our shame for current goals” still resonated. Very possibly, much like studies on prejudices against African Americans from the 1970s would not necessarily replicate in the 1980s and much less today, the discursive context changed here as well and thus made the seemingly highly similar experimental situation a psychologically very different one.

This is a specific case of Gergen’s (1973) more general argument that psychology is primarily a historical inquiry as it “deals with facts that are largely nonrepeatable and which fluctuate markedly over time” (p. 310). If indeed empirical studies are then mere manifestations of a historically situated effect that neither does nor should be expected to be diachronically robust, what is the use of doing such studies?
First (obviously always under the premise that the original finding was no false positive), it illustrates this historically situated principle. It might be of interest (admittedly primarily for historical reasons) that in the early 21st century a markedly negative reaction towards Jewish suffering was observable among German students. Second, and potentially more interesting, the original study might illustrate a general principle and a methodological approach to it rather than establish a human constant. The principle may be that the emphasis on victim suffering can backfire if respondents are liberated from social desirability concerns. The method might thus provide a potential blueprint of how to tackle such research questions.

Is this enough for a field of science that so desperately seeks to be as hard a science as the physical sciences in which “stable and broad generalizations can be established with a high degree of confidence” (Gergen, 1973, p. 309) and thus allow explanations that can be empirically tested? Does such an understanding of science have a place in the replication era? One positive aspect of the new way of doing psychology is that (hopefully), unlike in the present case, original studies will have been pre-registered and (internally) replicated before publication. The likelihood that such results are true positives is much higher than in the current case of a single underpowered study. Failure to replicate in a different time, location, and/or context might thus inform our theorizing about the situatedness of the effect at hand. Thus, pre-registration will not only increase the trust in published findings, but also provide an informative hint as to whether it is advisable or futile to seek hidden moderators.

In summary, we repeatedly failed to conceptually replicate one of our findings across a relatively large number of studies, with different methodological approaches.
Thus, the claim that confrontation with the Holocaust evokes a backlash of antisemitism among Germans is not empirically well supported. Either the initial finding was a false positive or this process needs to be specifically situated in a given time and context.

Open Science Practices

This article earned the Open Data and the Open Materials badge for making the data and materials available. It has been verified that the analysis reproduced the results presented in the article. The entire editorial process, including the open reviews, is published in the online supplement.

References

Adorno, T. W. (1975). Schuld und Abwehr. In T. W. Adorno, Gesammelte Schriften, Band 9 Soziologische Schriften II.2 (pp. 121-326). Frankfurt: Suhrkamp.
Banse, R., & Gawronski, B. (2003). Die Skala Motivation zu vorurteilsfreiem Verhalten: Psychometrische Eigenschaften und Validität. Diagnostica, 49, 4-13.
Batson, C. D., Fultz, J., & Schoenrade, P. A. (1987). Distress and empathy: Two qualitatively distinct vicarious emotions with different motivational consequences. Journal of Personality, 55, 19-39.
Bergmann, W. (2006). “Nicht immer als Tätervolk dastehen” - Zum Phänomen des Schuldabwehr-Antisemitismus in Deutschland. In D. Ansorge (Ed.), Antisemitismus in Europa und in der arabischen Welt (pp. 81-106). Paderborn-Frankfurt: Bonifatius Verlag.
Branscombe, N. R., Ellemers, N., Spears, R., & Doosje, B. (1999). The context and content of social identity threat. In N. Ellemers, R. Spears, & B. Doosje (Eds.), Social identity: Context, commitment, content (pp. 35-58). Oxford, England: Blackwell Science.
Branscombe, N. R., Schmitt, M. T., & Schiffhauer, K. (2007). Racial attitudes in response to thoughts of White privilege. European Journal of Social Psychology, 37, 203-215.
Buruma, I. (2003, August). How to talk about Israel. New York Times, Section 6, p. 28.
Castano, E., & Giner-Sorolla, R. (2006). Not quite human: Infrahumanization in response to collective responsibility for intergroup killing. Journal of Personality and Social Psychology, 90, 804-818.
Chartrand, T. L., & Bargh, J. A. (1996). Automatic activation of impression formation and memorization goals: Nonconscious goal priming reproduces effects of explicit task instructions. Journal of Personality and Social Psychology, 71, 464-478.
Cohen, T. R., Wolf, S. T., Panter, A. T., & Insko, C. A. (2011). Introducing the GASP scale: A new measure of guilt and shame proneness. Journal of Personality and Social Psychology, 100, 947-966.
Correia, I., & Vala, J. (2003). When will a victim be secondarily victimized? The effect of observer’s belief in a just world, victim’s innocence and persistence of suffering. Social Justice Research, 16, 379-400.
Dalbert, C. (1999). The world is more just for me than generally: About the personal belief in a just world scale's validity. Social Justice Research, 12, 79-98.
Dotsch, R., & Todorov, A. (2012). Reverse correlating social face perception. Social Psychological and Personality Science, 3, 562-571.
Dotsch, R., Wigboldus, D. H. J., & van Knippenberg, A. (2013). Behavioral information biases the expected facial appearance of members of novel groups. European Journal of Social Psychology, 43, 116-125.
Dotsch, R., Wigboldus, D. H., Langner, O., & van Knippenberg, A. (2008). Ethnic out-group faces are biased in the prejudiced mind. Psychological Science, 19, 978-980.
Egloff, B., & Krohne, H. W. (1998). Die Messung von Vigilanz und kognitiver Vermeidung: Untersuchungen mit dem Angstbewältigungs-Inventar (ABI). Diagnostica, 44, 189-200.
Fiske, S. T., Cuddy, A. J. C., Glick, P., & Xu, J. (2002). A model of (often mixed) stereotype content: Competence and warmth respectively follow from perceived status and competition. Journal of Personality and Social Psychology, 82, 878-902.
Friedman, J. S., & Austin, W. (1978). Observers’ reactions to an innocent victim: Effect of characterological information and degree of suffering. Personality and Social Psychology Bulletin, 4, 569-574.
Funke, F. (2005). The dimensionality of right-wing authoritarianism: Lessons from the dilemma between theory and measurement. Political Psychology, 26, 195-218.
Gergen, K. J. (1973). Social psychology as history. Journal of Personality and Social Psychology, 26, 309-320.
Godfrey, B. W., & Lowe, C. A. (1975). Devaluation of innocent victims: An attribution analysis within the just world paradigm. Journal of Personality and Social Psychology, 31, 944-951.
Golec de Zavala, A., Cichocka, A., Eidelson, R., & Jayawickreme, N. (2009). Collective narcissism and its social consequences. Journal of Personality and Social Psychology, 97, 1074-1096.
Heider, F. (1958). The psychology of interpersonal relations. New York: Wiley.
Heitmeyer, W. (2005). Deutsche Zustände (Folge 3) [German circumstances (Vol. 3)]. Frankfurt: Suhrkamp.
Imhoff, R., & Banse, R. (2009). Ongoing victim suffering increases prejudice: The case of secondary anti-semitism. Psychological Science, 20, 1443-1447.
Imhoff, R., & Bruder, M. (2014). Speaking (un-)truth to power: Conspiracy mentality as a generalised political attitude. European Journal of Personality, 28, 25-43.
Imhoff, R., & Dotsch, R. (2013). Do we look like me or like us? Visual projection as self- or ingroup-projection. Social Cognition, 31, 806-816.
Imhoff, R. (2010). Zwei Formen des modernen Antisemitismus? Eine Skala zur Messung primären und sekundären Antisemitismus. Conflict & communication online, 9.
Imhoff, R., Bilewicz, M., & Erb, H. (2012). Collective regret versus collective guilt: Different emotional reactions to historical atrocities. European Journal of Social Psychology, 42, 729-742.
Imhoff, R., Dotsch, R., Bianchi, M., Banse, R., & Wigboldus, D. H. J. (2011). Facing Europe: Visualizing spontaneous in-group projection. Psychological Science, 22, 1583-1590.
Imhoff, R., Woelki, J., Hanke, S., & Dotsch, R. (2013). Warmth and competence in your face! Visual encoding of stereotype content. Frontiers in Psychology, 4, 386. Jost, J. T., & Hunyady, O. (2002). The psychology of system justification and the palliative function of ideology. European Review of Social Psychology, 13, 111-153. Jost, J.T., Banaji, M.R., & Nosek, B.A. (2004). A decade of system justification theory: Accumulated evidence of conscious and unconscious bolstering of the status quo. Political Psychology, 25, 881–919. IN SEARCH OF EXPERIMENTAL EVIDENCE FOR SECONDARY ANTISEMITISM : A FILE DRAWER REPORT 21 Kaplan, E. H., & Small, C. A. (2006). Anti-Israel sentiment predicts anti-Semitism in Europe. Journal of Conflict Resolution, 50, 548-561. Karremans, J. C., Dotsch, R.,& Corneille, O.(2011). Romantic relationship status biases memory of faces of attractive opposite-sex others: Evidence from a reverse-correlation paradigm. Cognition, 121, 422-426. Kempf, W. (2014). Anti-Semitism and criticism of Israel: Methodology and results of the ASCI survey. conflict & communication online, 14. Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in psychology, 4, 863 Langner, O., Dotsch, R., Bijlstra, G., Wigboldus, D. H. J., Hawk, S., & van Knippenberg, A.(2010). Presentation and validation of the Radboud Faces Database. Cognition and Emotion, 24, 1377–1388. Lerner, M. J., & Simmons, C. H. (1966). Observer's reaction to the" innocent victim": Compassion or rejection? Journal of Personality and Social Psychology, 4, 203-210. Lerner, M. J. (1980). Belief in the just world. New York: Plenum Press. Lundqvist, D., Flykt, A., & Öhman, A. (1998). The Karolinska directed emotional faces (KDEF). CD ROM from Department of Clinical Neuroscience, Psychology section, Karolinska Institutet, (1998). Mangini, M. & Biederman, I. (2004). 
Making the ineffable explicit: Estimating the information employed for face classifications. Cognitive Science, 28, 209–226.
Marhenke, T., & Imhoff, R. (2018). Increased accessibility of semantic concepts after (more or less) subtle activation of related concepts: Support for the basic tenet of priming research. Unpublished manuscript.
Miller, D. T. (1977). Altruism and threat to a belief in a just world. Journal of Experimental Social Psychology, 13, 113–124.
Payne, B. K., Cheng, C. M., Govorun, O., & Stewart, B. D. (2005). An inkblot for attitudes: Affect misattribution as implicit measurement. Journal of Personality and Social Psychology, 89, 277–293.
Quirin, M., Kazén, M., & Kuhl, J. (2009). When nonsense sounds happy or helpless: The Implicit Positive and Negative Affect Test (IPANAT). Journal of Personality and Social Psychology, 97, 500–516.
Rammstedt, B., & John, O. P. (2007). Measuring personality in one minute or less: A 10-item short version of the Big Five Inventory in English and German. Journal of Research in Personality, 41, 203–212.
Roccas, S., Klar, Y., & Liviatan, I. (2006). The paradox of group-based guilt: Modes of national identification, conflict vehemence, and reactions to the in-group's moral violations. Journal of Personality and Social Psychology, 91, 698–711.
Rüsch, N., Corrigan, P. W., Bohus, M., Jacob, G. A., Brueck, R., & Lieb, K. (2007). Measuring shame and guilt by self-report questionnaires: A validation study. Psychiatry Research, 150, 313–325.
Schönbach, P. (1961). Reaktionen auf die antisemitische Welle im Winter 1959/1960 [Reactions to the anti-Semitic wave in the winter 1959/1960]. Frankfurt: Europäische Verlagsanstalt.
Selznick, G. J., & Steinberg, S. (1969). The tenacity of prejudice: Anti-Semitism in contemporary America. Oxford, England: Harper & Row.
Sigall, H., & Page, R. (1971). Current stereotypes: A little fading, a little faking. Journal of Personality and Social Psychology, 18, 247–255.
Simons, C. W., & Piliavin, J.
A. (1972). Effect of deception on reactions to a victim. Journal of Personality and Social Psychology, 21, 56–60.
Steinberg, G. (2004). Abusing the legacy of the Holocaust: The role of NGOs in exploiting human rights to demonize Israel. Jewish Political Studies Review, 16, 59–72.
Vala, J., Pereira, C. P., Eugênio, M., Lima, O., & Leyens, J. (2012). Intergroup Time Bias and Racialized Social Relations. Personality and Social Psychology Bulletin, 38, 491–504.
von Collani, G. (2002). Das Konstrukt der Sozialen Dominanzorientierung als generalisierte Einstellung: eine Replikation [The construct of social dominance orientation as a generalized attitude: A replication]. Zeitschrift für Politische Psychologie, 10, 263–282.
Walser, M. (1998). Erfahrungen beim Verfassen einer Sonntagsrede [Experiences while composing an oration]. In Börsenverein des Deutschen Buchhandels (Ed.), Friedenspreis des Deutschen Buchhandels 1998 – Ansprachen aus Anlaß der Verleihung [Peace prize of the German booktrade 1998 – Speeches from the award ceremony]. Frankfurt/Main.
Watson, D., Clark, L. A., & Tellegen, A. (1988). Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology, 54, 1063–1070.
Weil, F. D. (1985). The variable effects of education on liberal attitudes: A comparative historical analysis of anti-Semitism using public opinion survey data. American Sociological Review, 50, 458–474.