Call for Papers

The Journal of Dynamic Decision Making (JDDM) invites submissions on "Political and Social Crises: A Case for Dynamic Decision Making?". JDDM is a community-run open-access journal with no charges for authors or readers, published by Heidelberg University (https://journals.ub.uni-heidelberg.de/index.php/jddm).

Facing problems, challenges, and difficult situations has always been part of human life. However, driven by rapidly advancing industrialization and globalization, modern societies face complex and dynamic problems at an unprecedented speed and scale. Global climate change, the worldwide corona pandemic, and the unexpected war in Ukraine are just three examples of current global problems that unfold dynamically and affect the lives of millions, if not billions, of people all over the world. Despite their diversity, these problems share a central feature: their development is shaped to a considerable degree by the dynamic decision making of a relatively small number of human actors in influential positions. These may be members of governments, administrations, international organizations, or NGOs, executives of large companies, political figures, or media personalities. This special issue of JDDM will focus on how the scholarly study and analysis of dynamic decision-making processes can help to understand, explain, and potentially improve decision processes in political and social problems of this kind. As a multidisciplinary journal, JDDM welcomes submissions from a range of fields, such as psychology, economics, political science, sociology, philosophy, and engineering. JDDM also supports methodological pluralism, including quantitative research, qualitative methodologies, and theoretical analysis. We particularly encourage contributions by early-career researchers and studies focusing on populations traditionally underrepresented in scholarly research.
For further details regarding JDDM's current scope and policies, please refer to our recent editorial (https://doi.org/10.11588/jddm.2021.1.82929). If you are interested in contributing to this special issue of JDDM, please submit an abstract outlining your intended contribution (max. 300 words) by December 31st, 2022, to editor@jddm.org. You will receive feedback on your abstract by February 2023. If your abstract is accepted, we will ask you to submit your final manuscript by June 2023. Compliance with our manuscript guidelines (https://journals.ub.uni-heidelberg.de/index.php/jddm/about/submissions) is explicitly requested. After submission of your manuscript, the peer review process will start, and the result will be communicated to you by August 2023. Any revisions should then be made as soon as possible. The issue is scheduled to be published open access online in October 2023.

Corrigendum: Exploration and Exploitation During Information Search and Consequential Choice

Cleotilde Gonzalez (Dynamic Decision Making Laboratory, Department of Social and Decision Sciences, Carnegie Mellon University, Pittsburgh, PA, USA) and Varun Dutt (School of Computing and Electrical Engineering and School of Humanities and Social Sciences, Indian Institute of Technology Mandi, India)

An error occurred in the paper of Gonzalez and Dutt (2016) that was recently published in JDDM.
The description of the Sampling-H calculation in the Methods section of the paper (page 4, paragraph above the Results section) is inaccurate. The original paragraph reads: "Then, we checked whether the option sampled by a participant was the high expected value option, and coded this as 1; otherwise, the choice was coded as 0. We then aggregated high choices across all participants and problems for different samples and defined the Sampling-H rate per sample." That paragraph should be replaced with the following: "Then, for each sample, we calculated the natural mean (Hertwig & Pleskac, 2008) for each option by summing all the experienced outcomes in the respective option and dividing by the number of samples up to the current one. If the option with the higher natural mean corresponded to the option with the higher expected value, the trial was coded as 1; otherwise it was coded as 0. We then aggregated the codes across all participants and problems for different samples and defined the Sampling-H rate per sample." Following this procedure produces the graph shown in Figure 3. The figure supports learning effects over time (i.e., the effect of sample size on sampling error): the option with the higher natural mean comes to correspond to the option with the higher expected value. However, Sampling-H does not reflect direct sampling behavior of the high expected value option, as the original paragraph implied. The interpretation of Sampling-H throughout the article should therefore follow the meaning stated in the new paragraph. The R and MATLAB scripts that demonstrate the correct procedure for calculating Sampling-H and generate Figure 3 are available from the authors and online as supplementary materials. We thank Jeffrey Chrabaszcz, DDMLab, for producing the R code. We also thank an anonymous commentator for pointing out this error.
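The corrected coding procedure can be sketched in a few lines. The following Python sketch is illustrative only: the authors' actual supplementary scripts are in R and MATLAB, and the function name and data layout here are our own assumptions, not taken from the paper.

```python
from collections import defaultdict

def sampling_h_codes(samples, high_ev_option):
    """Code each sampling trial as 1 if the option with the higher
    natural mean (running average of experienced outcomes) matches
    the high expected value option, else 0.

    samples: list of (option, outcome) pairs in sampling order.
    high_ev_option: label of the option with the higher expected value.
    """
    totals = defaultdict(float)  # sum of experienced outcomes per option
    counts = defaultdict(int)    # number of draws per option
    codes = []
    for option, outcome in samples:
        totals[option] += outcome
        counts[option] += 1
        # natural mean per option, given everything experienced so far
        means = {o: totals[o] / counts[o] for o in counts}
        best = max(means, key=means.get)
        codes.append(1 if best == high_ev_option else 0)
    return codes
```

The Sampling-H rate per sample would then be the mean of these codes at each sample position, aggregated across participants and problems. Note that the sketch leaves ties and the case where only one option has been sampled to Python's `max`, a simplification the original scripts may handle differently.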
Declaration of conflicting interests: The authors declare they have no conflict of interests.
Author contributions: All authors contributed equally to the manuscript.
Supplementary material: Available online.
Handling editor: Andreas Fischer
Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Citation: Gonzalez, C., & Dutt, V. (2016). Corrigendum: Exploration and exploitation during information search and consequential choice. Journal of Dynamic Decision Making, 2, 4. doi:10.11588/jddm.2016.1.33651
Received: 25th August 2016. Accepted: 24th October 2016. Published: 1st November 2016.

References

Gonzalez, C., & Dutt, V. (2016). Exploration and exploitation during information search and consequential choice. Journal of Dynamic Decision Making, 2, 6. doi:10.11588/jddm.2016.1.29308
Hertwig, R., & Pleskac, T. J. (2008). The game of life: How small samples render choice simpler. In N. Chater & M. Oaksford (Eds.), The probabilistic mind: Prospects for Bayesian cognitive science (pp. 209-235). Oxford, UK: Oxford University Press.

Editorial: The First Year of the Journal of Dynamic Decision Making

Andreas Fischer, Daniel V. Holt, and Joachim Funke (Department of Psychology, Heidelberg University)

We are proud to announce the completion of our first volume (2015), which comprises a range of interesting findings about modelling, training, and assessing dynamic decision making (DDM).
Summary of Contributions

The contributions to the first volume of the Journal of Dynamic Decision Making exemplify many relevant aspects of DDM, as well as the multiple perspectives that can be taken to investigate this multi-faceted phenomenon. Dutt and Gonzalez (2015) demonstrate the benefit of using process data in decision modeling to explain both the process and the outcomes of DDM. Güss, Tuason, and Orduña (2015) study decision making in complex dynamic environments and investigate how performance can be predicted from observations of certain strategies, tactics, and errors. Kretzschmar and Süß (2015) present an extensive training study and find that training with multiple complex environments had positive effects on knowledge acquisition but not on knowledge application in a DDM transfer task. Hundertmark et al. (2015) report differential effects of cognitive ability on performance in different kinds of DDM tasks (e.g., effects are smaller in the case of negative feedback), and Fischer and Neubert (2015) propose a model of problem solving competence (composed of knowledge, skills, abilities, and other components) that explains what is required to handle complex dynamic environments. As editors, we are pleased by the breadth and quality of these initial contributions and thank the authors for supporting JDDM!

To visualize the various aspects of DDM and the perspectives represented in our first volume, we built a word cloud based on all the papers of this volume (see Figure 1). Besides characterizing our first volume, Figure 1 illustrates the close connection between DDM and problem solving in complex dynamic environments: "problem", "model", "complex problem solving", and "decision" were among the most frequent terms. Further, the figure highlights the importance of "processes", "knowledge", "strategies", and "abilities" for understanding DDM.
Plans for the Future

In the future we hope to present more research on the aspects highlighted by Figure 1, but we also encourage researchers from different domains (e.g., economics, philosophy, or computer science) to contribute their perspectives.

Figure 1. Word cloud based on the relative frequency of thematically relevant terms in the introduction and discussion sections of the first volume (2015) of JDDM. The relative frequency of each word (i.e., its frequency per length of section, summed across all sections) is represented by font size and shading.

In addition to the articles mentioned above, many researchers followed our call to publish supplementary materials such as data sets or cognitive models. This provides interesting material for the readers of the Journal of Dynamic Decision Making, and it fosters the replicability of research. We hope that the contributions to our first volume will encourage more researchers around the world to contribute interesting and replicable research on DDM, and to help the Journal of Dynamic Decision Making become a well-balanced journal representing a wide range of views on all the different aspects of this fascinating topic. Starting this year, we will also launch a JDDM news blog to aggregate information about interesting papers, new tools or models, conferences, media coverage, and real-world applications of DDM. So, if you have any relevant news about DDM to share, please get in touch!

Appendix: Reviewers and Guest Editors for JDDM in 2015

We want to thank all our reviewers and guest editors, who did a great job in fostering the quality of submissions:
• Joachim Funke,
• Daniel V. Holt,
• Florian Kutzner (guest editor),
• Magda Osman,
• Jan Rummel (guest editor),
• Wolfgang Schoppek,
• Robert Sternberg,
• David Tobinski.
Declaration of conflicting interests: The authors declare they have no conflict of interests.
Author contributions: All authors contributed equally to the manuscript.
Handling editor: Andreas Fischer
Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Citation: Fischer, A., Holt, D. V., & Funke, J. (2016). The first year of the Journal of Dynamic Decision Making. Journal of Dynamic Decision Making, 2, 1. doi:10.11588/jddm.2016.1.28995
Published: 19 March 2016

References

Dutt, V., & Gonzalez, C. (2015). Accounting for outcome and process measures in dynamic decision-making tasks through model calibration. Journal of Dynamic Decision Making, 1, 2. doi:10.11588/jddm.2015.1.17663
Fischer, A., & Neubert, J. C. (2015). The multiple faces of complex problems: A model of problem solving competency and its implications for training and assessment. Journal of Dynamic Decision Making, 1, 6. doi:10.11588/jddm.2015.1.23945
Güss, C. D., Tuason, M. T., & Orduña, L. V. (2015). Strategies, tactics, and errors in dynamic decision making in an Asian sample. Journal of Dynamic Decision Making, 1, 3. doi:10.11588/jddm.2015.1.13131
Hundertmark, J., Holt, D. V., Fischer, A., Said, N., & Fischer, H. (2015). System structure and cognitive ability as predictors of performance in dynamic system control tasks. Journal of Dynamic Decision Making, 1, 5. doi:10.11588/jddm.2015.1.26416
Kretzschmar, A., & Süß, H.-M. (2015). A study on the training of complex problem solving competence. Journal of Dynamic Decision Making, 1, 4.
doi:10.11588/jddm.2015.1.15455

Editorial: On the Future of Complex Problem Solving: Seven Questions, Many Answers?

Wolfgang Schoppek (University of Bayreuth, Germany), Andreas Fischer (Research Institute for Vocational Education and Training (f-bb), Nuremberg, Germany), Daniel Holt (Heidelberg University, Germany), and Joachim Funke (Heidelberg University, Germany)

While research on complex problem solving (CPS) has reached a stage where certain standards have been achieved, its future development is quite open. We were therefore interested in the views of representative authors about the attainments and the future development of the field. We asked the authors to share their points of view on seven questions: about the relevance of (complex) problem solving as a research area, about the contribution of laboratory-based CPS research to solving real-life problems, about the roles of knowledge, strategies, and intuition in CPS, and about the existence of expertise in CPS.

Keywords: complex problem solving, dynamic decision making, research strategy, knowledge acquisition, experts

Research on complex problem solving (CPS) has reached a stage where certain standards have been achieved, whereas its future development is quite open. In this situation, we were interested in the views of representative authors about the attainments. Do we agree on the roles of knowledge and strategies that are important for CPS? Even more, we were interested in collecting ideas about the future development of our field.
To stake out a conceptual framework, we introduce current definitions of the central concepts. "Complex problem solving is a collection of self-regulated psychological processes and activities necessary in dynamic environments to achieve ill-defined goals that cannot be reached by routine actions" (Dörner & Funke, 2017, p. 6). This definition clearly goes beyond the conception of CPS as a narrowly defined competency. For defining knowledge, we refer to the preliminary process model by Schoppek and Fischer (2017): "Structural knowledge is knowledge about the causal relations among the variables that constitute a dynamic system. I-O knowledge (shorthand for 'input-output knowledge') represents instances of interventions together with the system's responses. Strategy knowledge represents abstract plans of how to cope with the . . . problem" (p. 2). The strategy notion may include quite specific approaches that might better be characterized as tactics. However, we discourage the use of the term "strategy" for a mere description of a participant's course of action.

We asked the authors to share their points of view on the seven questions listed below. As we were interested in unfiltered opinions, we did not subject the contributions to peer review, but to an editorial review. Authors were free to select only five or six of the seven questions and to add one or two questions of their own related to CPS.

The Questions

1. Why should there continue to be problem solving research (in addition to research on memory, decision making, motivation, etc.)?
2. What are the connections between current CPS research practice and real problems? Where do you see potential for development towards stronger relations?
3. Given the artificiality of the laboratory situation, do participants really adopt the presented problems? What insights can be gained despite this artificiality, and which cannot?
4. What evidence exists for the influence of other kinds of knowledge besides structural knowledge on the results of CPS? Which of these kinds of knowledge should be examined in future research?
5. What evidence is available for the impact of strategies (except VOTAT) on the results of CPS? Which of these strategies should be examined more closely?
6. Is there intuitive CPS?
7. What distinguishes experts in CPS from laypersons?

Declaration of conflicting interests: The authors declare they have no conflict of interests.
Author contributions: The first author wrote most parts of the manuscript.
Handling editor: Andreas Fischer
Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Citation: Schoppek, W., Fischer, A., Holt, D. V., & Funke, J. (2019). On the future of complex problem solving: Seven questions, many answers? Journal of Dynamic Decision Making, 5, 5. doi:10.11588/jddm.2019.1.69294
Published: 31 Dec 2019
Corresponding author: Wolfgang Schoppek, University of Bayreuth, 95440 Bayreuth, Germany. E-mail: wolfgang.schoppek@uni-bayreuth.de

References

Dörner, D., & Funke, J. (2017). Complex problem solving: What it is and what it is not. Frontiers in Psychology, 8, 1153. doi:10.3389/fpsyg.2017.01153
Schoppek, W., & Fischer, A. (2017). Common process demands of two complex dynamic control tasks: Transfer is mediated by comprehensive strategies. Frontiers in Psychology, 8, 2145. doi:10.3389/fpsyg.2017.02145

Editorial: Looking Back at the Third Volume of the Journal of Dynamic Decision Making

Andreas Fischer, Daniel V.
Holt, and Joachim Funke (Department of Psychology, Heidelberg University)

We are proud to announce the completion of volume 2017, which comprises a range of interesting findings about dynamic decision making (DDM). As the editors of the Journal of Dynamic Decision Making, we consider DDM a multi-faceted phenomenon (see Fischer, Holt, & Funke, 2015, 2016), and we are pleased to see the variety of methods and perspectives on DDM apparent in the papers published in 2017. Emphasizing the value of self-descriptive methods, Wendt (2017) proposes live streaming as a valuable data source for research on DDM. Engelhart, Funke, and Sager (2017) elaborate on the empirical effect of different kinds of feedback on performance in DDM. Sharma and Dutt (2017) compare the performance and generalization of various cognitive models in the sampling paradigm of DDM. Frank and Kluge (2017) elaborate on the effects of general mental ability and memory on the temporal transfer of work-related DDM skills. Vangsness and Young (2017) present two experiments on the effects of task difficulty and experienced losses on the probability of applying risk-defusing operators in a videogame environment.

The broad and multi-faceted approach to investigating decision making and problem solving promoted by JDDM is also reflected in a recent research topic in the journal Frontiers (edited by W. Schoppek, J. Funke, M. Osman, and A. Kluge [1]). While specifically addressing complex problem solving, which is conceptually related to DDM (see Fischer, Greiff, & Funke, 2012), the research topic covers many questions also relevant for cognitive research in DDM. In particular, Dörner and Funke (2017) discuss the central question of validity: What is complex problem solving, and what is it not? They propose that instead of concentrating attention on psychometric issues such as reliable assessment instruments, we should refocus on the content validity of complex problem solving.
Dörner and Funke draw attention to psychological phenomena like the emergency reaction of the cognitive system, the role of context and background knowledge (as demonstrated by cross-cultural differences), the potential for failures, and the context dependency of strategies. Similarly, Funke, Fischer, and Holt (2017) argue that "solving multiple simple problems is not complex problem solving" and that "complex problem solving is, first and foremost, a complex cognitive process, which involves a range of skills, abilities and knowledge" (Funke et al., 2017, p. 8). Processes of this kind, complex problem solving as well as dynamic decision making, require a wide range of tools and perspectives in order to be investigated adequately. We look forward to seeing such a broad scientific approach and the corresponding discourse unfold in future volumes of JDDM.

Reviewers and Guest Editors for JDDM in 2017

We want to thank all our reviewers and guest editors, who did a great job in fostering the quality of submissions:
• Armin Fuegenschuh (BTU Cottbus, Germany),
• Joachim Funke (Heidelberg University, Germany),
• Dominik C. Güß (University of North Florida, USA),
• Daniel V. Holt (Heidelberg University, Germany),
• Oswald Huber (Fribourg University, Switzerland),
• Benjamin Scheibehenne (University of Geneva, Switzerland),
• Wolfgang Schoppek (Bamberg University, Germany),
• Anna-Lena Schubert (Heidelberg University, Germany).

Finally, we would like to encourage researchers working in the field of DDM to contribute their work to our journal.
There are many reasons to choose the Journal of Dynamic Decision Making (JDDM) as your outlet: the short time between submission, peer review, and final publication; currently (and hopefully in the future) no article processing fees; free availability of your articles for researchers worldwide; easy citation and direct access to articles via digital object identifiers (DOIs); and the high visibility of published articles through listing in the Directory of Open Access Journals (DOAJ) and Google Scholar. Moreover, Heidelberg University Library, the host of JDDM, has operated sustainably for more than 625 years. We are therefore confident that articles published in JDDM will remain freely accessible for decades (if not centuries) to come!

Declaration of conflicting interests: The authors declare they have no conflict of interests.
Author contributions: All authors contributed equally to the manuscript.
Handling editor: Andreas Fischer
Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
[1] https://www.frontiersin.org/research-topics/5058/complex-problem-solving-beyond-the-psychometric-approach
Citation: Fischer, A., Holt, D. V., & Funke, J. (2017). Looking back at the third volume of the Journal of Dynamic Decision Making. Journal of Dynamic Decision Making, 3, 6. doi:10.11588/jddm.2017.1.43868
Published: 31 Dec 2017

References

Dörner, D., & Funke, J. (2017). Complex problem solving: What it is and what it is not. Frontiers in Psychology, 8, 1153. doi:10.3389/fpsyg.2017.01153
Engelhart, M., Funke, J., & Sager, S. (2017).
A web-based feedback study on optimization-based training and analysis of human decision making. Journal of Dynamic Decision Making, 3, 2. doi:10.11588/jddm.2017.1.34608
Fischer, A., Greiff, S., & Funke, J. (2012). The process of solving complex problems. Journal of Problem Solving, 4, 19-42.
Fischer, A., Holt, D. V., & Funke, J. (2015). Promoting the growing field of dynamic decision making. Journal of Dynamic Decision Making, 1, 1. doi:10.11588/jddm.2015.1.23807
Fischer, A., Holt, D. V., & Funke, J. (2016). The first year of the Journal of Dynamic Decision Making. Journal of Dynamic Decision Making, 2, 1. doi:10.11588/jddm.2016.1.28995
Frank, B., & Kluge, A. (2017). The effects of general mental ability and memory on adaptive transfer in work settings. Journal of Dynamic Decision Making, 3, 4. doi:10.11588/jddm.2017.1.40004
Funke, J., Fischer, A., & Holt, D. V. (2017). When less is less: Solving multiple simple problems is not complex problem solving. A comment on Greiff et al. (2015). Journal of Intelligence, 5, 5. doi:10.3390/jintelligence5010005
Sharma, N., & Dutt, V. (2017). Modeling decisions from experience: How models with a set of parameters for aggregate choices explain individual choices. Journal of Dynamic Decision Making, 3, 3. doi:10.11588/jddm.2017.1.37687
Vangsness, L., & Young, M. E. (2017). The role of difficulty in dynamic risk mitigation decisions. Journal of Dynamic Decision Making, 3, 5. doi:10.11588/jddm.2017.1.41543
Wendt, A. N. (2017). The empirical potential of live streaming beyond cognitive psychology. Journal of Dynamic Decision Making, 3, 1.
doi:10.11588/jddm.2017.1.33724

Opinion: A New Orientation for Research on Problem Solving and Competencies in Any Domain

Andreas Fischer (Research Institute for Vocational Education and Training (f-bb), Germany)

Research on complex problem solving (CPS) has reached a stage where certain standards have been achieved, whereas its future development is quite open. In this situation, the editors of the Journal of Dynamic Decision Making asked a number of representative authors to share their points of view on seven questions: about the relevance of (complex) problem solving as a research area, about the contribution of laboratory-based CPS research to solving real-life problems, about the roles of knowledge, strategies, and intuition in CPS, and about the existence of expertise in CPS.

Why should there continue to be problem solving research (in addition to research on memory, decision making, motivation, etc.)?

Problem solving, as well as the disposition to solve tasks and problems in a given domain (i.e., "competence", see Fischer & Neubert, 2015), is more than the sum of its parts and interesting in its own right.
In particular, building and testing theories on problem solving may contribute to
• understanding where and why people fall short of the optimum when confronted with complex and dynamic problems,
• deriving and teaching or training useful strategies to help people in need become better problem solvers (Kretzschmar & Süß, 2015),
• providing assistance to people in charge (e.g., by partially automating the process of modelling or solving complex problems).

What are the connections between current CPS research practice and real problems? Where do you see potential for development towards stronger relations?

Current CPS research focuses on interactive toy problems that can be solved by systematically applying simple strategies such as "varying one thing at a time" (VOTAT). This kind of research is interesting and valuable in many regards, but needs to be put in perspective (for an overview, see Fischer, 2015; Funke, Fischer, & Holt, 2018). To establish stronger relations of CPS research to real problems, the heterogeneity inherent in some of the current CPS paradigms (e.g., MicroFIN) could be exploited. Additionally, new paradigms based on fundamental problems and dilemmata of real life may well be worth a try (e.g., Grossmann & Kross, 2014; Grossmann, Kung, & Santos, 2018).

Table 1. Exemplary components of competency, varying in domain-specificity (cf. Fischer & Neubert, 2015)

                   Knowledge          Skills                   Abilities              Other
Domain-general     world knowledge    problem solving skills   general intelligence   frustration tolerance
Domain-specific    domain expertise   psychomotor skills       numerical reasoning    certificates

Given the artificiality of the laboratory situation, do participants really adopt the presented problems? What insights can be gained despite this artificiality, and which cannot?

The artificiality of the laboratory situation is perfectly suited for (and may have contributed to a focus on) research on toy problems.
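The VOTAT strategy mentioned above can be made concrete with a minimal sketch. The Python code below is a hedged illustration under our own assumptions (the names `votat_identify` and `toy_system` are hypothetical, not drawn from any cited study): to identify how each input of an unknown linear system affects each output, vary one input at a time and observe the change relative to doing nothing.

```python
def votat_identify(system, n_inputs, n_outputs, step=1.0):
    """Estimate the input-output weights of `system`, a function mapping
    an input vector to an output-change vector, by probing one input at
    a time (the VOTAT strategy: vary one thing at a time)."""
    baseline = system([0.0] * n_inputs)  # system response to no intervention
    weights = []
    for i in range(n_inputs):
        probe = [0.0] * n_inputs
        probe[i] = step  # vary only input i, hold all others at zero
        response = system(probe)
        # effect of input i on each output, relative to the baseline
        weights.append([(response[j] - baseline[j]) / step
                        for j in range(n_outputs)])
    return weights

# Hypothetical toy system: output changes are a linear function of inputs.
def toy_system(inputs):
    a, b = inputs
    return [2.0 * a + 0.0 * b, -1.0 * a + 3.0 * b]
```

With `votat_identify(toy_system, 2, 2)`, the recovered weights match the hidden coefficients, which is precisely why VOTAT suffices for linear toy problems; in systems with interactions or autonomous dynamics, one-at-a-time probing no longer identifies the structure, which is the limitation the text alludes to.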
This is not necessarily a bad thing: CPS research of the last decade has shown that this kind of research can be fruitful indeed. Presenting more complex and/or realistic problems in the laboratory in an immersive manner is more challenging, but it may be worth the effort (see Schoppek & Fischer, 2017; Grossmann & Kross, 2014).

What evidence exists for the influence of other kinds of knowledge besides structural knowledge on the results of CPS? Which of these kinds of knowledge should be examined in future research?

There is a lot of research on the influence of strategic knowledge, implicit knowledge, instance-based learning, and the potential of case-based reasoning (see Fischer, Greiff, & Funke, 2012). Future research should elaborate on the interplay among these kinds of knowledge, as well as on the non-cognitive factors and circumstances on which this interplay (or its effectiveness) depends (cf. Fischer & Neubert, 2015).

What evidence is available for the impact of strategies (except VOTAT) on the results of CPS? Which of these strategies should be examined more closely?

Problem solving research has elaborated on a variety of heuristics and strategies (see Fischer, Greiff, & Funke, 2017), and all of these heuristics and strategies can be applied to problems of varying complexity. One question that CPS research should elaborate on in more detail is when to apply (or abandon) which strategy.

Is there intuitive CPS?

This depends on the definitions of intuition and of CPS, but I tend to agree. On the one hand, a person is unlikely to have a problem when intuition can provide a solution. On the other hand, and highly characteristic for CPS situations, an expert may well be able to intuitively provide a solution to another person's problem (commonly referred to as "wisdom", cf. Fischer, 2015b; Fischer & Funke, 2016).
In fact, people even tend to reason more wisely about other people's problems, a phenomenon known as "Solomon's paradox" (Grossmann & Kross, 2014).

What distinguishes experts in CPS from laypersons?

Wisdom, i.e., knowledge and deep understanding of the fundamental pragmatics of life (Fischer, 2015b; Baltes & Staudinger, 2000), may be one of the most distinguishing attributes of an expert in CPS. But as the disposition to solve complex problems (i.e., "competence") in any domain is based on a wide range of domain-general and domain-specific kinds of knowledge, skills, abilities, and other components (as explained in more detail in the KSAO model by Fischer & Neubert, 2015), differences are to be expected in each component of the KSAO model (see Table 1 for examples).

Declaration of conflicting interests: The author declares he has no conflict of interests.
Author contributions: The author is completely responsible for the content of this manuscript. The abstract was added by the editors.
Handling editors: Andreas Fischer and Wolfgang Schoppek
Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Citation: Fischer, A. (2019). A new orientation for research on problem solving and competencies in any domain. Journal of Dynamic Decision Making, 5, 9. doi:10.11588/jddm.2019.1.69298
Published: 31 Dec 2019

References

Baltes, P. B., & Staudinger, U. M. (2000). Wisdom: A metaheuristic (pragmatic) to orchestrate mind and virtue toward excellence. American Psychologist, 55(1), 122-136. doi:10.1037//0003-066x.55.1.122
Fischer, A. (2015). Assessment of problem solving skills by means of multiple complex systems: Validity of finite automata and linear dynamic systems (Doctoral dissertation). doi:10.11588/heidok.00019689
Fischer, A. (2015b). Wisdom: The answer to all the questions really worth asking. International Journal of Humanities and Social Science, 5(9), 73-83.
doi: 10.11588/heidok.00019786
Fischer, A., & Funke, J. (2016). Entscheiden und Entscheidungen: Die Sicht der Psychologie [Deciding and decisions: The view of psychology]. In Interdisziplinarität in den Rechtswissenschaften: Ein interdisziplinärer und internationaler Dialog (pp. 217-229).
Fischer, A., Greiff, S., & Funke, J. (2012). The process of solving complex problems. Journal of Problem Solving, 4(1), 19-42. doi: 10.7771/1932-6246.1118
Fischer, A., Greiff, S., & Funke, J. (2017). The history of complex problem solving. In B. Csapó & J. Funke (Eds.), The nature of problem solving: Using research to inspire 21st century learning (pp. 107-121). Paris: OECD Publishing. doi: 10.1787/9789264273955-9-en
Fischer, A., & Neubert, J. C. (2015). The multiple faces of complex problems: A model of problem solving competency and its implications for training and assessment. Journal of Dynamic Decision Making, 1, 6. doi: 10.11588/jddm.2015.1.23945
Funke, J., Fischer, A., & Holt, D. V. (2018). Competencies for complexity: Problem solving in the twenty-first century. In Assessment and teaching of 21st century skills (pp. 41-53). Cham: Springer. doi: 10.1007/978-3-319-65368-6_3
Grossmann, I., Kung, F. Y., & Santos, H. C. (2018). Wisdom as state vs. trait. In R. J. Sternberg & J. Glück (Eds.), The Cambridge handbook of wisdom. Cambridge, UK: Cambridge University Press. doi: 10.1017/9781108568272.013
Grossmann, I., & Kross, E. (2014). Exploring Solomon's paradox: Self-distancing eliminates the self-other asymmetry in wise reasoning about close relationships in younger and older adults. Psychological Science, 25(8), 1571-1580. doi: 10.1177/0956797614535400
Kretzschmar, A., & Süß, H.-M. (2015). A study on the training of complex problem solving competence. Journal of Dynamic Decision Making, 1, 4. doi: 10.11588/jddm.2015.1.15455
Schoppek, W., & Fischer, A. (2017). Common process demands of two complex dynamic control tasks: Transfer is mediated by comprehensive strategies. Frontiers in Psychology, 8, 2145.
doi: 10.3389/fpsyg.2017.02145

Corresponding author: Andreas Fischer, Forschungsinstitut Betriebliche Bildung (f-bb) gGmbH, Rollnerstraße 14, 90408 Nuremberg, Germany. Email: andreas.fischer@f-bb.de

Opinion

Complex problem solving in search for complexity

Joachim Funke
Department of Psychology, Heidelberg University, Germany

Research on complex problem solving (CPS) has reached a stage where certain standards have been achieved, whereas the future development is quite ambiguous. In this situation, the editors of the Journal of Dynamic Decision Making asked me to share my point of view with respect to seven questions about the relevance of (complex) problem solving as a research area, about the contribution of laboratory-based CPS research to solving real-life problems, about the roles of knowledge, strategies, and intuition in CPS, and about the existence of expertise in CPS.

Why should there continue to be problem solving research (in addition to research on memory, decision making, motivation, etc.)?
Problem solving research is more than a combination of research on memory, decision making, or motivation, because it integrates all basic functions of the human brain (and the human body) in the service of proper acting. Therefore, a theory of action is needed that brings together the different partial cognitive functions with emotion regulation and with motivational issues. Effective problem solving in complex situations requires the integration of cognition, motivation, and emotion.

What are the connections between current CPS research practice and real problems? Where do you see potential for development towards stronger relations?

Recent research on problem solving is still working with simple problems (e.g., puzzle problems, see Sanders et al., 2019) – the problems in daily life or with regard to life on planet Earth are quite different from either moving "Towers of Hanoi" or finding puzzle pieces – different in terms of complexity, dynamics, intransparency, and incompatibility (or even contradictoriness) of multiple goals. Even what is subsumed under the heading of CPS in modern research has lost the original complexities of real-life problems (for validity issues, see Dörner & Funke, 2017). That state of affairs needs to be changed.

Given the artificiality of the laboratory situation, do participants really adopt the presented problems? What insights can be gained despite this artificiality, and which cannot?

Laboratory experiments are fine for testing hypotheses – but from my point of view, we are far away from comprehensive theories that would allow for the derivation of specific hypotheses. We are still in need of good field studies (see Brehmer & Dörner, 1993).

What evidence exists for the influence of other kinds of knowledge besides structural knowledge on the results of CPS? Which of these kinds of knowledge should be examined in future research?

Structural knowledge is only one of the ingredients for successful problem solving.
Additionally, there is knowledge necessary for interventions into complex systems, and knowledge for the identification of unknown systems. The use of semantically "poor" systems (with variable labels like "A", "B", or "C") tries to keep knowledge outside the problem solving process. If we allow problems to be semantically "rich", a broad universe of knowledge immediately becomes important. In future research, domain knowledge should be acknowledged as a significant ingredient of any kind of problem solving. The more we allow domain specificity, the more influential domain knowledge becomes.

What evidence is available for the impact of strategies (except VOTAT) on the results of CPS? Which of these strategies should be examined more closely?

VOTAT is an excellent strategy for simple systems, but we need strategy analyses for more complex and realistic problems. Think, for example, of the "thirty-six stratagems" within Chinese culture based on Sun Tzu's Art of War – quite different from the simple strategies discussed in recent research papers (Stadler, Fischer, & Greiff, 2019). The identification of simple systems can be approached by simple strategies, but once real-life complexities enter the stage, a strategy like VOTAT is no longer helpful.

Corresponding author: Joachim Funke, Department of Psychology, Heidelberg University, Hauptstr. 47, 69117 Heidelberg, Germany. Email: joachim.funke@psychologie.uni-heidelberg.de

Is there intuitive CPS?

As Kahneman and Klein (2009) explain, there is good reason for the assumption of intuitive skills. If that is true, it should be valid also for the domain of complex problem solving. It might be related to wisdom (see, e.g., Fischer, 2015).

What distinguishes experts in CPS from laypersons?

Experts in solving complex problems have a good understanding of systems.
Funke, Fischer, and Holt (2018, p. 47) argue for a "systems competency" that consists of the ability to construct mental models of systems, to form and test hypotheses, and to develop strategies for system identification and control. Experts in solving complex problems should be particularly skilled on these dimensions.

Declaration of conflicting interests: The author declares he has no conflict of interests.
Author contributions: The author is completely responsible for the content of this manuscript. The abstract was added by the editors and modified by the author.
Handling editors: Andreas Fischer and Wolfgang Schoppek
Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Citation: Funke, J. (2019). Complex problem solving in search for complexity. Journal of Dynamic Decision Making, 5, 10. doi: 10.11588/jddm.2019.1.69299
Published: 31 Dec 2019

References

Brehmer, B., & Dörner, D. (1993). Experiments with computer-simulated microworlds: Escaping both the narrow straits of the laboratory and the deep blue sea of the field study. Computers in Human Behavior, 9(2-3), 171-184. doi: 10.1016/0747-5632(93)90005-d
Dörner, D., & Funke, J. (2017). Complex problem solving: What it is and what it is not. Frontiers in Psychology, 8(1153), 1-11. doi: 10.3389/fpsyg.2017.01153
Fischer, A. (2015). Wisdom – the answer to all the questions really worth asking. International Journal of Humanities and Social Science, 5(9), 73-83. doi: 10.11588/heidok.00019786
Funke, J., Fischer, A., & Holt, D. V. (2018). Competencies for complexity: Problem solving in the twenty-first century. In E. Care, P. Griffin, & M. Wilson (Eds.), Assessment and teaching of 21st century skills: Research and applications (pp. 41-53). doi: 10.1007/978-3-319-65368-6_3
Kahneman, D., & Klein, G. (2009). Conditions for intuitive expertise: A failure to disagree. American Psychologist, 64(6), 515-526. doi: 10.1037/a0016755
Sanders, K. E. G., Osburn, S., Paller, K. A., & Beeman, M. (2019). Targeted memory reactivation during sleep improves next-day problem solving. Psychological Science, 28. doi: 10.1177/0956797619873344
Stadler, M., Fischer, F., & Greiff, S. (2019). Taking a closer look: An exploratory analysis of successful and unsuccessful strategy use in complex problems. Frontiers in Psychology, 10, 777. doi: 10.3389/fpsyg.2019.00777

Opinion

Quo vadis CPS? Brief answers to big questions

Matthias Stadler (Ludwig-Maximilians-University Munich) and Samuel Greiff (University of Luxembourg)

Research on complex problem solving (CPS) has reached a stage where certain standards have been achieved, whereas the future development is quite ambiguous. In this situation, the editors of the Journal of Dynamic Decision Making asked a number of representative authors to share their points of view with respect to seven questions about the relevance of (complex) problem solving as a research area, about the contribution of laboratory-based CPS research to solving real-life problems, about the roles of knowledge, strategies, and intuition in CPS, and about the existence of expertise in CPS.

Why should there continue to be problem solving research (in addition to research on memory, decision making, motivation, etc.)?

The ability to solve problems (i.e., tasks for which no apparent solution is readily available) has, in our view, become one of the quintessential abilities for both professional and personal life.
By now, machines can complete most repetitive tasks, leaving humans more time to focus on creating new knowledge and applying this knowledge to solve problems. While computers can help us overcome some of our human limitations (e.g., externalize our memory or help with decision processes), ultimately we as humans need to define the problems we want to solve and find ways to use the appropriate tools and strategies. Research on how people approach problems, why they fail to solve them, and how they can be supported to succeed in the future needs to be continued. However, the field is also in need of either a clear delineation from other, often overlapping, fields such as dynamic decision making, or of stronger efforts to synthesize adjacent fields to see what problem solving research can learn from fields such as decision making and vice versa.

What are the connections between current CPS research practice and real problems? Where do you see potential for development towards stronger relations?

Most of the current research on CPS focuses on complex systems with only few variables, or systems that, in some way, do not fully resemble the complexity of the real world (Greiff, Fischer, Stadler, & Wüstenberg, 2015). Despite the justified criticism that reality is far more complex, this limitation in contemporary assessment instruments might still be appropriate to represent "real-world" problem solving. If, for example, your cat is sick, it is certainly appropriate to identify everything it ate (few variables) and then systematically rule out potential causes of the illness. Obviously, some problems are either too complex to fully understand the influence of each individual variable, not stable enough to actually specify any consistent rules, or there is not enough time to explore the system comprehensively. Such systems were used to study CPS in the field's "early days" with the aim of emulating "real-world" problems as closely as possible.
We argue, though, that most people deal with sick cats more frequently than they become almost omnipotent rulers of mid-sized cities (the scenario of one of the most famous CPS tasks; Dörner, Kreuzig, Reither, & Stäudel, 1983). While there is a great deal of research on problem solving in controllable systems (such as the food you feed your cat), research on uncontrollable systems needs to be strengthened. For instance, we face the problem of how to talk to our colleagues during our daily interactions with them. Telling jokes (i.e., an "input variable") may make some people like you more, whereas others may not appreciate it (i.e., "outcome variables"). Systematically isolating colleagues to tell them jokes in order to measure their response is obviously not feasible. However, based on data that has been generated in the past, we could generate knowledge and then use this knowledge to solve future problems. This line of research is exciting and might help us understand "real-world" problem solving in diverse situations.

Given the artificiality of the laboratory situation, do participants really adopt the presented problems? What insights can be gained despite this artificiality, and which cannot?

In our experience, the artificial nature of the problem situation is not problematic as long as the cognitive (and non-cognitive) processes involved are the same. There seems to be no reason to assume that a person who is able to solve a problem in a laboratory situation will not be able to solve a similar problem in a more naturalistic situation. Examples come from the fields of medical, military, and teacher training, where complex skills are usually trained and assessed using simulations (for an overview see Chernikova, Heitzmann, Fink, Timothy, Seidel, & Fischer, 2019).

Corresponding author: Matthias Stadler, Ludwig-Maximilians-University Munich, Geschwister-Scholl-Platz 1, 80539 Munich, Germany.
Email: matthias.stadler@uni-muenchen.de

What evidence exists for the influence of other kinds of knowledge besides structural knowledge on the results of CPS? Which of these kinds of knowledge should be examined in future research?

The amount of evidence on the impact of knowledge on CPS is plentiful. As mentioned above, various fields such as medical, military, and teacher training use complex simulations in which participants need to engage in CPS in knowledge-rich situations. Combining the theories of these fields with the methodology and theory of the more cognitive research on "knowledge-lean" CPS will, in our view, be one of the most exciting challenges for future research.

What evidence is available for the impact of strategies (except VOTAT) on the results of CPS? Which of these strategies should be examined more closely?

Interestingly, the VOTAT ("vary one thing at a time") strategy has, by far, received the highest level of attention when it comes to understanding strategic behavior in CPS. This is mirrored in the field of science inquiry, in which the very same strategy, only with the different label CVS (control of variables), has received a similar amount of attention. Moreover, a study based on the PISA 2012 data found that use of VOTAT in one task was highly predictive of the overall CPS score, which required solving tasks with different strategies (Greiff, Wüstenberg, & Avvisati, 2015). Thus, VOTAT (just as other strategies) might not be limited to a specific behavior but may also indicate a more general level of strategic competence. We know little about what this competence might be, even though some recent studies have looked at other strategic behaviors in CPS research (Beckmann, Birney, & Goode, 2017; Schoppek & Fischer, 2017).
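To illustrate why VOTAT is so effective for the small linear systems used in typical CPS assessments, the following minimal sketch identifies a toy system of this kind by varying one input at a time. The weight matrix and all names are invented for this example; real assessment tasks additionally include autonomous dynamics (eigendynamics) and noise, which this sketch deliberately omits.

```python
# Minimal sketch of the VOTAT ("vary one thing at a time") strategy on a
# toy linear system: the change in each output is a weighted sum of the
# inputs. All values and names here are illustrative assumptions.

def votat_identify(step, n_inputs):
    """Probe each input in isolation to recover its effect on the outputs."""
    effects = []
    for i in range(n_inputs):
        probe = [0.0] * n_inputs
        probe[i] = 1.0                 # vary only input i, hold the rest at zero
        effects.append(step(probe))    # the observed change reveals column i of W
    return effects

# Toy system with a known weight matrix (rows = outputs, columns = inputs).
W = [[2.0, 0.0],
     [1.0, -1.0]]

def step(inputs):
    # One round of the system: output changes as a linear combination of inputs.
    return [sum(w * x for w, x in zip(row, inputs)) for row in W]

recovered = votat_identify(step, n_inputs=2)
print(recovered)                       # each recovered vector is one column of W
```

Because each probe activates a single input, every observed output change maps directly onto one column of the weight matrix; once inputs interact or the system changes on its own, this one-to-one mapping breaks down, which is exactly the limitation discussed above.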
An often neglected way forward might be to look at (longer) sequences of behaviors instead of the single use of strategies, using (educational) data mining techniques to discover those fuzzy relations (Stadler, Fischer, & Greiff, 2019). Another interesting topic is heuristics that are needed in complex environments and that have not been sufficiently focused on from an individual-differences perspective (Gigerenzer & Gaissmaier, 2006).

Is there intuitive CPS? What distinguishes experts in CPS from laypersons?

The term "experts" is often employed in the context of specific domains in which individuals can, partly through practice and experience, achieve an extremely high level of competency and/or knowledge, with chess experts being the classical example (Detterman, 2014). Thus, experts are usually found in specific areas, and we are not sure whether the term equally applies to a broad mental ability such as CPS. In fact, one would not consider highly intelligent people to be experts in intelligence, and a gradual distinction between different levels of CPS might be more appropriate (for an example that divides a continuous scale into distinct levels for ease of communication, see the PISA 2012 problem solving assessment; OECD, 2010). Of course, people with high levels of CPS are likely to differ from people with low levels of CPS, for instance with regard to fundamental cognitive abilities, meta-cognition, or the available set of strategies. To the best of our knowledge, there is no research indicating clear qualitative shifts (e.g., from layperson to expert) beyond what could be described in quantitative models of CPS.

Declaration of conflicting interests: The authors declare they have no conflict of interests.
Author contributions: Both authors contributed to the content of this manuscript. The abstract was added by the editors.
Handling editors: Andreas Fischer and Wolfgang Schoppek
Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Citation: Stadler, M., & Greiff, S. (2019). Quo vadis CPS? Brief answers to big questions. Journal of Dynamic Decision Making, 5, 13. doi: 10.11588/jddm.2019.1.69302
Published: 31 Dec 2019

References

Beckmann, J. F., Birney, D. P., & Goode, N. (2017). Beyond psychometrics: The difference between difficult problem solving and complex problem solving. Frontiers in Psychology, 8, 1739. doi: 10.3389/fpsyg.2017.01739
Chernikova, O., Heitzmann, N., Fink, M., Timothy, V., Seidel, T., & Fischer, F. (2019). Facilitating diagnostic competences in higher education – a meta-analysis in medical and teacher education. Educational Psychology Review, 1-40. doi: 10.1007/s10648-019-09492-2
Detterman, D. K. (2014). Introduction to the intelligence special issue on the development of expertise: Is ability necessary? Intelligence, 45, 1-5. doi: 10.1016/j.intell.2014.02.004
Dörner, D., Kreuzig, H. W., Reither, F., & Stäudel, T. (Eds.). (1983). Lohhausen: Vom Umgang mit Unbestimmtheit und Komplexität [Lohhausen: On the handling of uncertainty and complexity]. Bern: Huber.
Gigerenzer, G., & Gaissmaier, W. (2006). Denken und Urteilen unter Unsicherheit: Kognitive Heuristiken [Thinking and judgment under uncertainty: Cognitive heuristics]. In J. Funke (Ed.), Enzyklopädie der Psychologie, Bd. C, II, 8: Denken und Problemlösen (pp. 329-374). Göttingen: Hogrefe.
Greiff, S., Fischer, A., Stadler, M., & Wüstenberg, S. (2015). Assessing complex problem-solving skills with multiple complex systems. Thinking & Reasoning, 21(3), 356-382.
doi: 10.1080/13546783.2014.989263
Greiff, S., Wüstenberg, S., & Avvisati, F. (2015). Computer-generated log-file analyses as a window into students' minds? A showcase study based on the PISA 2012 assessment of problem solving. Computers & Education, 91, 92-105. doi: 10.1016/j.compedu.2015.10.018
OECD. (2010). PISA 2012 assessment and analytical framework: Mathematics, reading, science, problem solving and financial literacy. Paris: OECD. doi: 10.1787/9789264190511-6-en
Schoppek, W., & Fischer, A. (2017). Common process demands of two complex dynamic control tasks: Transfer is mediated by comprehensive strategies. Frontiers in Psychology, 8, 2145. doi: 10.3389/fpsyg.2017.02145
Stadler, M., Fischer, F., & Greiff, S. (2019). Taking a closer look: An exploratory analysis of successful and unsuccessful strategy use in complex problems. Frontiers in Psychology, 10. doi: 10.3389/fpsyg.2019.00777

Opinion

Complex problem solving research and its contribution to improving work in high reliability organisations

Annette Kluge
Ruhr-Universität Bochum, Germany

Research on complex problem solving (CPS) has reached a stage where certain standards have been achieved, whereas the future development is quite ambiguous.
In this situation, the editors of the Journal of Dynamic Decision Making asked a number of representative authors to share their points of view with respect to seven questions about the relevance of (complex) problem solving as a research area, about the contribution of laboratory-based CPS research to solving real-life problems, about the roles of knowledge, strategies, and intuition in CPS, and about the existence of expertise in CPS.

Why should there continue to be problem solving research (in addition to research on memory, decision making, motivation, etc.)?

CPS research is a very relevant bridge between basic research and applied research, for example in highly complex working contexts, so-called high reliability organizations (HROs). HROs include organizations such as nuclear power plants, petro-chemical and pharmaceutical plants, hospitals, air traffic management, airline operation, disaster and crisis management by first responders, etc. In HROs, all single CPS research aspects such as memory, decision making, building mental models, etc. need to be conjointly applied in and transferred to acute problems in situ, for example incidents and developing accidents, to mitigate risks and hazards for people and the environment. And each HRO is a unique field for CPS research. Central aspects such as non-transparency, dynamics, and interconnectivity need to be analyzed in relation to a specific work context; for example, operations in air traffic management differ a lot from operations in a chemical plant. In that respect, cognitive task analysis methods can be applied to elaborate on the particular quality of dynamics, non-transparency, interconnectivity, etc. for each operator.

What are the connections between current CPS research practice and real problems? Where do you see potential for development towards stronger relations?
As introduced above, working and operating in HROs is the best example where CPS research demonstrates its direct impact on safety and the mitigation of hazards. Safety culture interventions, safety management systems, and safety training for employees, supervisors, and the management are directly affected by CPS research and build on its results. Additionally, CPS research results are very helpful for personnel selection (e.g., which cognitive abilities are extremely important in a particular work context and cannot be trained?) and for training (e.g., which knowledge, skills, and attitudes [KSAs] need to be trained?).

Given the artificiality of the laboratory situation, do participants really adopt the presented problems? What insights can be gained despite this artificiality, and which cannot?

Complex problems are not artificial – they are directly taken from real-life affordances. In HROs such as airline management, chemical plant operations, or nuclear facility management, simulator training is essential. To give an example, look at the work of a control room operator (CROP) who is operating a chemical plant (see Kluge et al., 2014, and Kluge, 2014) and who is interacting with a field operator (FOP). The daily work scenarios are trained in high-fidelity simulator exercises that are not artificial, because they mirror the real situation 1:1 (Kluge et al., 2009): Couplings and interconnections require the operator to simultaneously process the interplay of cross-coupled variables in order to either assess a process state or predict the dynamic evolution of the plant. Dynamic effects require the operator to mentally process and envisage the change rates of cross-coupled variables and to develop sensitivity for the right timing of decisions in order to be successful. Non-transparency requires the operator to work with more or less abstract visual cues that need to be composed into a mental representation and compared with the operator's mental model.
Multiple or conflicting goals require the operators either to balance management intentions or to decide on priorities in case of goal conflicts in the decision making process (e.g., which course of action to take). Comprehension of MPC (model predictive control) and RTO (real-time optimization) philosophies requires making sure that CROPs understand the advanced control and optimization philosophies on which MPC and RTO are based, since they have to validate the proposed results before accepting or rejecting their implementation in the online control strategy. Crew coordination complexity incorporates small crews (for example, CROPs, FOPs, and supervisors) who are responsible for overall system operations, and calls for the operators to concurrently interact with team members in order to orchestrate individual actions into a coordinated flow of actions to either assess the situation or choose a course of action.

Corresponding author: Annette Kluge, Ruhr-Universität Bochum, Department of Work and Organizational Psychology, Universitätsstr. 150, IB 5/185 Gebäude-Postfach 66, 44780 Bochum, Germany. Email: annette.kluge@rub.de

What evidence exists for the influence of other kinds of knowledge besides structural knowledge on the results of CPS? Which of these kinds of knowledge should be examined in future research?

In applied research in HROs, situational awareness is the key knowledge-related construct to focus on, and it should be under investigation. Think of an air traffic controller: situation awareness includes
• knowing and being aware of the elements involved in a particular working context (e.g.,
planes and their different types, flight plans, weather conditions, special dates – is a politician visiting Berlin, so that the airspace is closed to other traffic at a certain time? Is the politician's plane accompanied by military planes?),
• anticipating and monitoring the elements' temporal changes and developments over time (who is flying where, at which speed and altitude; are there problems with planes that are low on fuel; are there emergency landings because of a sick passenger; are planes delayed because of bad weather, etc.),
• possible decision making processes that become necessary due to the temporal changes (how does the airspace need to be "managed" today? Is there a thunderstorm approaching? Are there abnormal situations emerging?).

I propose that in HROs, with regard to the dynamic effects of complex problems, knowledge about the temporal dynamics of the involved variables or elements (planes, pilots, passengers, weather, consequences of technical failure in a chemical plant) is essential. For example, in air traffic management, the air traffic controller needs to consider the speed of the planes and the direction they are heading. But the speed of commercial airplanes in civil aviation is different from the speed of military aircraft. As a military tactical controller, you need to be aware of the higher speed of the fighter jets and their objectives.

What evidence is available for the impact of strategies (except VOTAT) on the results of CPS? Which of these strategies should be examined more closely?

It is known that stress and its physiological consequences for information processing are very significant. There are several training strategies to mitigate and counteract the impact of acute stress on situation awareness and decision making processes.
Three training approaches seem promising: Stress exposure training includes preparatory information about the impact of stress, training skills for maintaining attentional focus, and practice of the acquired skills in a simulated stress environment (Cosenzo et al., 2007; Driskell & Johnston, 2006), in order to maintain control of the stress response that would otherwise affect situation awareness. Decision skill training by Pliske et al. (2001) addresses attentional control exercises to practice flexibility in scanning situations, for example, practicing seeing and assessing cues and their associated patterns. Mindfulness training fosters a state of restful alertness to present-moment experience, stressful or not, in order to reduce stress reactivity and increase situational awareness (Meland et al., 2015a; Meland et al., 2015b).

Is there intuitive CPS?

I assume that persons who act in complex environments early in their lives (who learn to fly a glider at the age of 16, who are apprentices in a chemical plant, or the like) familiarize themselves with the aspects of complex problems early. By directly experiencing dynamic effects, interconnectivity, and non-transparency, I assume that these persons become "intuitive" complex problem solvers in their particular domain. But at the same time, this intuitive CPS expertise is limited to their clearly defined profession, and the transfer to other domains is limited.

What distinguishes experts in CPS from laypersons?

As I introduced above, expertise in CPS relates to situation awareness, the processing of dynamic changes, and the consequences for decision making. Experts in CPS are highly trained and have experienced a lot of routine and non-routine/critical situations in order to enhance their situation awareness. Expertise in CPS requires a very long and extensive training period, in the simulator and in "real" operations under the supervision of an experienced person.
this is the reason why it takes many years to become an airline captain ("der lange weg nach vorne links" [the long way to the front left], rödig, 2000). you would not like to fly with a layperson. declaration of conflicting interests: the author declares she has no conflict of interests. author contributions: the author is completely responsible for the content of this manuscript. the abstract was added by the editors. handling editors: andreas fischer and wolfgang schoppek. copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license. citation: kluge, a. (2019). complex problem solving research and its contribution to improving work in high reliability organisations. journal of dynamic decision making, 5, 6. doi: 10.11588/jddm.2019.1.69295 published: 31 dec 2019 references cosenzo, k. a., fatkin, l. t., & patton, d. j. (2007). ready or not: enhancing operational effectiveness through use of readiness measures. aviation, space, and environmental medicine, 78, b96-106. driskell, j. e., & johnston, j. h. (2006). stress exposure training. in j. a. cannon-bowers & e. salas (eds.), making decisions under stress: implications for individual and team training (pp. 191-217). washington, dc: apa. doi: 10.1037/10278-007 kluge, a. (2014). the acquisition of knowledge and skills for taskwork and teamwork to control complex technical systems: a cognitive and macroergonomics perspective. dordrecht: springer. doi: 10.1007/978-94-007-5049-4 kluge, a., nazir, s., & manca, d. (2014). advanced applications in process control and training needs of field and control room operators. iie transactions on occupational ergonomics and human factors, 2(3-4), 121-136. doi: 10.1080/21577323.2014.920437 kluge, a., sauer, j., schüler, k., & burkolter, d. (2009).
designing training for process control simulators: a review of empirical findings and common practice. theoretical issues in ergonomics science, 10, 489-509. doi: 10.1080/14639220902982192 meland, a., fonne, v., wagstaff, a., & pensgaard, a. m. (2015a). mindfulness-based mental training in a high-performance combat aviation population: a one-year intervention study and two-year follow-up. international journal of aviation psychology, 25, 48-61. doi: 10.1080/10508414.2015.995572 meland, a., ishimatsu, k., pensgaard, a. m., wagstaff, a., fonne, v., garde, a. h., & harris, a. (2015b). impact of mindfulness training on physiological measures of stress and objective measures of attention control in a military helicopter unit. international journal of aviation psychology, 25, 191-208. doi: 10.1080/10508414.2015.1162639 pliske, r. m., mccloskey, m. j., & klein, g. (2001). decision skill training: facilitating learning from experiences. in e. salas & g. klein (eds.), linking expertise and naturalistic decision making (pp. 37-53). mahwah, nj: lawrence erlbaum. rödig, r. (2000). der lange weg nach vorne links: teil 1 und 2 [the long way to the front left: parts 1 and 2]. plochingen: düsendruck. editorial promoting the growing field of dynamic decision making andreas fischer, daniel v. holt, and joachim funke department of psychology, heidelberg university a new journal is starting with this page, and we – the editors – hope that this launch will be a successful one! before we start with the normal course of the editorial business, let us explain why we made the decision to start a new journal. most decisions in our everyday lives are part of dynamic decision making processes.
they usually are not isolated acts but take place in a context, with a history of events leading up to the decision and a future unfolding after the decision has been taken, shaping our options for later decisions. additionally, our preferences about what we consider a desirable outcome may also change over time. it is this emphasis on agency – the effect our decisions have on a situation – and dynamics – the unfolding of a situation over time – that are the hallmarks of dynamic decision making. examples of dynamic decision making can be found virtually everywhere, be it scheduling a workday, managing a company, establishing a medical diagnosis, or complex political negotiations. we therefore note with pleasure that dynamic decision making (ddm) has recently become a quickly growing field of research in the behavioral sciences. while simple single-shot decision making has long been the staple of decision research and ddm the exotic exception, we agree with other decision researchers that it may be time to reverse this view (cf. hertwig & erev, 2009). even our understanding of biases and fallacies in simple single-shot decision making may improve when considered from the more comprehensive ddm perspective.
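the interplay of agency and dynamics can be made concrete with a small simulation sketch (all names and numbers here are hypothetical illustrations, not taken from the editorial or the cited studies): a resource replenishes itself between decisions (dynamics), while each harvest decision changes the state that later decisions face (agency).

```python
# toy dynamic decision making loop (illustrative sketch only):
# a resource grows autonomously each round, and the decision maker's
# harvest alters the state that all later decisions depend on.

def run_episode(policy, rounds=10, stock=100.0, growth=0.05):
    """simulate repeated harvest decisions on a self-replenishing resource."""
    total = 0.0
    for t in range(rounds):
        stock *= 1.0 + growth                    # autonomous change between decisions
        harvest = min(policy(stock, t), stock)   # decision, constrained by current state
        stock -= harvest                         # the decision alters the environment
        total += harvest
    return total, stock

greedy = lambda stock, t: stock * 0.5     # take half of whatever is there
cautious = lambda stock, t: stock * 0.05  # take roughly only the growth

g_total, g_left = run_episode(greedy)
c_total, c_left = run_episode(cautious)
```

the greedy policy quickly depletes the very stock its later decisions will need, while the cautious one preserves it: exactly the interdependence across decisions that a single-shot paradigm cannot capture.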
since the beginning of systematic empirical research on ddm about forty years ago (e.g., dörner, 1975), it has evolved in many different niches of psychology and other disciplines, with fruitful contributions in areas such as experimental research (e.g., berry & broadbent, 1984; dutt, arló-costa, helzner, & gonzalez, 2014; funke, 1995; hertwig & erev, 2009; huber, wider, & huber, 1997), cognitive modelling (e.g., busemeyer & johnson, 2004; gonzalez, lerch, & lebiere, 2003), training (e.g., kluge, 2008), problem solving (e.g., danner et al., 2011; fischer, greiff, & funke, 2012), assessment of real-world skills (e.g., dörner, 1986; fischer, greiff, wüstenberg, fleischer, & buchwald, 2015), education (e.g., klahr & dunbar, 2000), or political decision making (e.g., verweij & thompson, 2006). in philosophy, dynamic choice even contributed to redefining what it may mean to be rational (e.g., hammond, 1976). a literature search in the database "web of science" for the phrase "dynamic decision making" reveals a steep exponential growth in the number of publications (see figure 1), distributed over more than 80 journals. additionally, there are many different labels for research essentially investigating the same phenomenon. [figure 1: growth of publications containing the phrase "dynamic decision making" in the "web of science" database between 1975 and 2014.] seeing that dynamic decision making is a growing field of research without a dedicated platform for exchange, we decided to start the journal of dynamic decision making (jddm) as an outlet for international research in this area. our aim is to offer a home for the growing number of publications that do not always neatly fit traditional disciplinary categories and to act as an exchange hub for the ddm community to share tools, results, and ideas. scope of jddm we are interested in the various kinds of decision making that are assembled under the umbrella term "dynamic decision making".
the defining features of dynamic decision making are: (1) decisions are made at multiple points in time, and (2) between decisions the environment may change as a result of previous decisions, or (3) the environment may change spontaneously as a result of autonomous processes (cf. busemeyer, 2002; edwards, 1962). by analogy to the physical environment, the mental prerequisites for a decision – e.g., the decision maker's preferences – may also be subject to change between decisions. in summary, dynamic decision making refers to decision processes in a series of interdependent decisions at multiple points in time in an environment that may change substantially in between decisions. the main focus of jddm is the multidisciplinary and multi-methodological study of cognitive processes in dynamic decision making. we explicitly encourage research on different aspects of dynamic decision making and expect a wide range of research methods to be applied. dynamic decision making is a broad field of research with contributions from cognitive science, psychology, neuroscience, informatics, economics, and mathematics. while the focus of jddm is on cognitive processes in ddm, we welcome contributions from all disciplines, from classical behavioral research to mathematically driven simulation studies or theoretical analysis. jddm is open to all approaches as long as they follow scientific standards of objectivity, transparency, and reproducibility. journal policies in addition to offering the standard features of an academic journal, such as independent peer review and long-term archiving, jddm implements a range of additional policies. reproducibility.
we want to foster transparency and reproducibility of research results and encourage our authors to publish research tools, code, and datasets to the extent possible. our digital platform ("open journal system", ojs) offers a repository for different types of material that researchers may want to make accessible to their colleagues. the more material we share, the easier it becomes to replicate a study. we also explicitly encourage authors to submit replication studies as part of our commitment to ensure reproducible results. constructive peer review. each paper will be reviewed by at least two peer reviewers with a focus on quality (and without commercial interests). our instruction to reviewers is not primarily to find the weakest spots in a study, but to make good papers even better and to give advice on how to get the most out of the initial submission. together with our reviewers, we want to come to a quick decision about publication, revision, or rejection of a submitted paper. once a paper is accepted, it will be published without delay. sustainable open access. the world of journal publishing is changing fast. open access has become an important advantage for authors: their articles are available freely around the world without a "paywall". we try to run the journal without any fees for authors or readers, hosted by heidelberg university library (hul), a publicly funded institution. our publishing partner, heidelberg university library, guarantees availability of the service for the next 100 years (and because hul has already existed for more than 625 years, we believe this is a credible promise). all published papers will be available online without access restrictions, and independent of commercial interests. at present, there is no fee for publishing in jddm, and we will try our best to always offer a publication option that is free of charge.
the publication infrastructure is provided by hul, and the editorial work and reviews are done by the editorial board and by volunteers. editorial board we are grateful that many colleagues and established researchers from the field of dynamic decision making followed our call to become members of the editorial board, which will give advice on strategic issues and support us in building a high-quality outlet for dynamic decision making. specifically, members of the editorial board will assist in the process of review and quality control, and they will hopefully help to attract interesting papers. of course, the editorial board may undergo substantial changes as the journal develops over the years to come; new members may join and old members may leave. we refer the interested reader to our journal website for the current editorial board. although this list is always a work in progress, it already shows the variety of research on dynamic decision making in terms of personal characteristics (senior researchers as well as younger ones; mixed with respect to gender and nationality), research themes (learning, knowledge, uncertainty, risk, failures, culture), and research methods (cognitive modeling, experiments, psychometric assessment). we are happy to have these friends and colleagues around us! hopefully others will join us in the future, supporting our initiative and helping us to produce a journal that brings together the community by means of interesting research on dynamic decision making. conclusion just in time for the 40th anniversary of the pioneering work of dörner (1975), who initiated computer-based research on dynamic decision making in complex environments in europe, we are glad to present the first issue of the journal of dynamic decision making (jddm) as an outlet for international research in this field.
so, if you have an interesting data set, a good theory, or a convincing simulation study just waiting to get published, get in touch! declaration of conflicting interests: the authors declare they have no conflict of interests. author contributions: all authors contributed equally to the manuscript. handling editor: andreas fischer. copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license. citation: fischer, a., holt, d. v., & funke, j. (2015). promoting the growing field of dynamic decision making. journal of dynamic decision making, 1, 1. doi: 10.11588/jddm.2015.1.23807 published: 29 september 2015 references berry, d. c., & broadbent, d. e. (1984). on the relationship between task performance and associated verbalizable knowledge. quarterly journal of experimental psychology, 36(2), 209-231. doi: 10.1080/14640748408402156 busemeyer, j. r. (2002). dynamic decision making. in n. j. smelser & p. b. baltes (eds.), international encyclopedia of the social and behavioral sciences: methodology, mathematics and computer science (pp. 3903-3908). oxford: elsevier. busemeyer, j. r., & johnson, j. g. (2004). computational models of decision making. in d. j. koehler & n. harvey (eds.), blackwell handbook of judgment and decision making (pp. 133-154). oxford: blackwell. danner, d., hagemann, d., holt, d. v., hager, m., schankin, a., wüstenberg, s., & funke, j. (2011). measuring performance in dynamic decision making: reliability and validity of the tailorshop simulation. journal of individual differences, 32(4), 225-233. doi: 10.1027/1614-0001/a000055 dörner, d. (1975). wie menschen eine welt verbessern wollten [how people wanted to improve a world]. bild der wissenschaft, 12, 48-53.
dörner, d. (1986). diagnostik der operativen intelligenz [assessment of operative intelligence]. diagnostica, 32(4), 290-308. dutt, v., arló-costa, h., helzner, j., & gonzalez, c. (2014). the description–experience gap in risky and ambiguous gambles. journal of behavioral decision making, 27(4), 316-327. doi: 10.1002/bdm.1808 edwards, w. (1962). dynamic decision theory and probabilistic information processing. human factors, 4, 59-73. doi: 10.1177/001872086200400201 fischer, a., greiff, s., & funke, j. (2012). the process of solving complex problems. journal of problem solving, 4(1), 19-42. doi: 10.7771/1932-6246.1118 fischer, a., greiff, s., wüstenberg, s., fleischer, j., & buchwald, f. (2015). assessing analytic and interactive aspects of problem solving competency. learning and individual differences, 39, 172-179. doi: 10.1016/j.lindif.2015.02.008 funke, j. (1995). experimental research on complex problem solving. in p. a. frensch & j. funke (eds.), complex problem solving: the european perspective (pp. 243-268). hillsdale, nj: erlbaum. gonzalez, c., lerch, j. f., & lebiere, c. (2003). instance-based learning in dynamic decision making. cognitive science, 27(4), 591-635. doi: 10.1016/s0364-0213(03)00031-4 hammond, p. j. (1976). changing tastes and coherent dynamic choice. the review of economic studies, 43(1), 159-173. hertwig, r., & erev, i. (2009). the description–experience gap in risky choice. trends in cognitive sciences, 13(12), 517-523. doi: 10.1016/j.tics.2009.09.004 huber, o., wider, r., & huber, o. w. (1997). active information search and complete information presentation in naturalistic risky decision tasks. acta psychologica, 95(1), 15-29. doi: 10.1016/s0001-6918(96)00028-5 klahr, d., & dunbar, k. (2000). a paradigm for investigating scientific discovery in the psychological lab. in d. klahr (ed.), exploring science (pp. 41-61). boston, ma: mit press. kluge, a. (2008). what you train is what you get?
task requirements and training methods in complex problem solving. computers in human behavior, 24(2), 284-308. doi: 10.1016/j.chb.2007.01.013 oecd (2014). pisa 2012 results: creative problem solving: students' skills in tackling real-life problems (volume v). paris: oecd publishing. doi: 10.1787/9789264208070-en verweij, m., & thompson, m. (eds.) (2006). clumsy solutions for a complex world: governance, politics and plural perceptions. new york, ny: palgrave macmillan. opinion complex problem solving: a gem to study expertise, strategic flexibility, culture, and so much more; and especially to advance psychological theory c. dominik güss1 1university of north florida, usa. research on complex problem solving (cps) has reached a stage where certain standards have been achieved, whereas the future development is quite ambiguous. in this situation, the editors of the journal of dynamic decision making asked a number of representative authors to share their point of view with respect to seven questions about the relevance of (complex) problem solving as a research area, about the contribution of laboratory-based cps research to solving real life problems, about the roles of knowledge, strategies, and intuition in cps, and about the existence of expertise in cps. why should there continue to be problem solving research (in addition to research on memory, decision-making, motivation etc.)?
one of sir karl popper's book titles is "all life is problem solving" (popper, 1994). the most intense and crucial situations in life involve dealing with complex problems. thus, the topic of complex problem solving is highly relevant from both a theoretical and an applied perspective. what are the connections between current complex problem solving (cps) research practice and real problems? where do you see potential for development towards stronger relations? it is unclear what current cps practice is, since there are many different research studies on complex problem solving. many studies in the field of cps, however, use microworlds or computer-simulated complex problems to study cps. even the best simulations remain simulations. thus, studying complex problems in the real world would be an interesting area for future research. case studies in particular could help with further development of cps theory, e.g., regarding the interaction of motivation, emotion, and cognition. such further development of cps theory would be highly desirable. given the artificiality of the laboratory situation, do participants really adopt the presented problems? what insights can be gained despite this artificiality and which cannot? brehmer and dörner (1993) showed exactly the strength of computer-simulated problems as a research methodology. this methodology allows the study of cps in the laboratory and provides some experimental control while at the same time representing problems that are dynamic, complex, and non-transparent – characteristics shared with complex problems in the real world. yes, some simulations are quite simple and artificial, but others that simulate hundreds of variables and take several hours are challenging. participants often are fully immersed emotionally, motivationally, and cognitively in these situations. although external validity is still an open question, studies with experts and novices show interesting differences in cps.
what evidence is available for the impact of strategies (except votat) on the results of cps? which of these strategies should be examined more closely? research has identified and analyzed different cps strategies, for example votat (vary-one-thing-at-a-time; molnár & csapó, 2018; wüstenberg, stadler, hautamäki, & greiff, 2014), pulse ("setting all input variables to zero after an intervention and waiting a certain time", schoppek & fischer, 2017), cautious versus proactive strategies, and flexible versus rigid strategies (e.g., güss, tuason, & orduña, 2015). research has also shown errors occurring during cps. application of various strategies over time and strategic adaptation would be interesting topics for future research. (corresponding author: c. dominik güss, department of psychology, university of north florida, jacksonville, fl 32225; e-mail: dguess@unf.edu) is there intuitive cps? at first, one is tempted to say no, there is no intuitive cps, because by definition cps involves actions that go beyond routine actions (dörner & funke, 2017, p. 6; funke, 2012) and involve higher-order cognitive processes. i would argue, however, that there is evidence for intuitive cps. any experience is stored in explicit and/or implicit memory. successful problem-solving behaviors are stored in memory as well, and if a new situation is similar to situations encountered in the past, one might search for and execute such a stored cps behavior pattern or a slightly modified version of it. even if the situation is slightly different, people still execute stored cps behavior patterns, which sometimes leads to failure (e.g., "methodism", dörner, 1996). the impact of intuitive cps can be seen in cross-cultural studies on cps (e.g., güss, 2011).
people from different cultures approach cps situations differently; confronted with a novel, complex, and dynamic situation, people rely first on their previous knowledge and skills. cultural differences show that some groups are more cautious and seek information, others are more pragmatic and jump to making decisions, and still others react emotionally first and show these emotions. these seem to be culturally learned and adequate/acceptable patterns of reacting to novel and complex problems in different cultures. what distinguishes experts in cps from laypersons? the trivial first answer to this question is knowledge. experts have accumulated more knowledge than novices. most experts have engaged in "deliberate practice" for over 10 years or 10,000 hours. deliberate practice can be defined as ". . . engaging in practice activities assigned by a teacher with a clear, specific goal of improvement and where the practice activities provide immediate feedback and opportunities for repetitions to attain gradual improvements" (ericsson, 2014, p. 509). but knowledge alone does not make someone smart. in fact, expert knowledge can lead to foolish decisions if someone simply applies a successful cps behavior "program" in a new situation, not realizing that the conditions have changed and that precisely in the new situation such "old" actions will lead to failure (e.g., methodism). regarding knowledge, research has shown a more web-like structure of stored knowledge in experts compared to a more cause-effect-like structure of knowledge in novices (reither, 1981). one study compared business experts and novices, namely business owners, business students, and psychology students (güss, devore edelstein, badibanga, & bartow, 2017), in the simulation chocofine (dörner & gerdes, 2003), where participants take the role of managers of a chocolate producing company.
results showed that business owners explored the situation in more detail and adjusted their tactics better to the changes in situations compared to novices. they were more sensitive in "reading" the new situations, seeing the key changes, and adapting their behaviors flexibly to these changes. in another study, the effects of a cps training and self-reflection on cps performance were investigated (donovan, güss, & naslund, 2015). participants learned about the steps of cps and filled out a survey on self-reflection. both training and self-reflection predicted performance. high self-reflection was related to more consistency in planning and decision making. thus, experience, self-reflection, and adaptive flexibility seem to be key characteristics of expertise. declaration of conflicting interests: the author declares he has no conflict of interests. author contributions: the author is completely responsible for the content of this manuscript. the abstract was added by the editors. handling editors: andreas fischer and wolfgang schoppek. copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license. citation: güss, c. d. (2019). complex problem solving: a gem to study expertise, strategic flexibility, culture, and so much more; and especially to advance psychological theory. journal of dynamic decision making, 5, 7. doi: 10.11588/jddm.2019.1.69296 published: 31 dec 2019 references brehmer, b., & dörner, d. (1993). experiments with computer-simulated microworlds: escaping both the narrow straits of the laboratory and the deep blue sea of the field study. computers in human behavior, 9, 171-184. doi: 10.1016/0747-5632(93)90005-d dörner, d. (1996). the logic of failure. new york, ny: holt. dörner, d., & funke, j. (2017). complex problem solving: what it is and what it is not. frontiers in psychology, 8, 1153. doi: 10.3389/fpsyg.2017.01153 dörner, d., & gerdes, j. (2003). schokofin. computerprogramm [choco fine.
computer program]. institut für theoretische psychologie, universität bamberg, germany. donovan, s., güss, c. d., & naslund, d. (2015). improving dynamic decision making through training and self-reflection. judgment and decision making, 10, 284-295. ericsson, k. a. (2014). expertise. current biology, 24, 508-510. doi: 10.1016/j.cub.2014.04.013 funke, j. (2012). complex problem solving. in n. m. seel (ed.), encyclopedia of the sciences of learning (pp. 682-685). heidelberg: springer. doi: 10.1007/978-1-4419-1428-6_685 güss, c. d. (2011). fire and ice: testing a model on cultural values and complex problem solving. journal of cross-cultural psychology, 42, 1279-1298. doi: 10.1177/0022022110383320 güss, c. d., devore edelstein, h., badibanga, j., & bartow, s. (2017). comparing business experts and novices in complex problem solving. journal of intelligence, 5(2), 20. doi: 10.3390/jintelligence5020020 güss, c. d., tuason, m. t., & orduña, l. v. (2015). strategies, tactics, and errors in dynamic decision making. journal of dynamic decision making. doi: 10.11588/jddm.2015.1.13131 molnár, g., & csapó, b. (2018). the efficacy and development of students' problem-solving strategies during compulsory schooling: logfile analyses. frontiers in psychology, 9, 302. doi: 10.3389/fpsyg.2018.00302 popper, k. (1994). alles leben ist problemlösen [all life is problem solving].
münchen: piper. reither, f. (1981). thinking and acting in complex situations: a study of experts' behavior. simulation & games, 12(2), 125-140. doi: 10.1177/104687818101200202 schoppek, w., & fischer, a. (2017). common process demands of two complex dynamic control tasks: transfer is mediated by comprehensive strategies. frontiers in psychology, 8, 2145. doi: 10.3389/fpsyg.2017.02145 wüstenberg, s., stadler, m., hautamäki, j., & greiff, s. (2014). the role of strategy knowledge for the application of strategies in complex problem solving tasks. technology, knowledge and learning, 19, 127-146. doi: 10.1007/s10758-014-9222-8 opinion a flashlight on attainments and prospects of research into complex problem solving wolfgang schoppek1 1university of bayreuth, germany. research on complex problem solving (cps) has reached a stage where certain standards have been achieved, whereas the future development is quite ambiguous. in this situation, the editors of the journal of dynamic decision making asked a number of representative authors to share their point of view with respect to seven questions about the relevance of (complex) problem solving as a research area, about the contribution of laboratory-based cps research to solving real life problems, about the roles of knowledge, strategies, and intuition in cps, and about the existence of expertise in cps. why should there continue to be problem solving research (in addition to research on memory, decision-making, motivation etc.)? the virtue of problem solving as an object of research lies in its integrative potential. problem solving is an activity that is characteristic of humans.
as a form of action, it involves the entire person: goals have their roots in personality; knowledge acquisition and use, as well as thinking, are important topics in cognitive science, just as motivation, self-regulation, and emotion are in other areas of psychology. i am convinced that insight into the human mind can only be gained when the interactions of important subsystems are considered. moreover, problem solving research treasures an arsenal of methods that reminds psychologists that not everything is best investigated by way of large-scale studies, which tempt researchers to mistake average effects or correlations for explanations, or even for theories. what are the connections between current cps research practice and real problems? where do you see potential for development towards stronger relations? currently, the mainstream of research about cps, revolving around the multiple (or minimal) complex systems test microdyn (e.g., greiff, fischer, et al., 2013), has little to do with real problems. consequently, there are claims about turning to more realistic microworlds (funke, 2014). while i am sympathetic to these claims and believe that cps research in the narrower sense should be more aware of research about naturalistic decision making (klein, 2008), i do not think that only high-fidelity simulations should be used (see question 5). however, i expect authors to state more thoroughly with what research interest they conduct studies with specific microworlds. just stating that our world is increasingly complex and dynamic, and that therefore we must study how persons deal with such systems, is not sufficient. in particular, i doubt that the requirement to explore a completely new and unknown system is very common in reality. given the artificiality of the laboratory situation, do participants really adopt the presented problems? what insights can be gained despite this artificiality and which cannot?
i think that many of our participants do not adopt the presented problems as their own. and those who do often do not adopt them to the degree they would if these were real and personal problems. for example, i have never seen a participant confronted with the "tailorshop" who conducted an exact cost analysis to set a rational shirt price before starting the game. however, treating things lightly and trying to solve non-existential problems on the fly appears typically human to me. it is just this transition from a half-hearted approach to immersion in a problem – and the conditions that support it – that can well be investigated via simulated complex problems. another research question that can be studied in laboratory situations is how persons reduce complexity. in "the logic of failure", dörner (1996) described a number of ways in which this happens. for example, persons tend to search for a central variable to which they attribute excessive explanatory power. however, as the book title suggests, these observations focus on detrimental attempts to reduce complexity, a focus that might be shifted to a more positive orientation in future research (see also osman, 2010, discussed under question 5).
what evidence exists for the influence of other kinds of knowledge besides structural knowledge on the results of cps? which of these kinds of knowledge should be examined in future research?
besides structural knowledge, the best-studied type is strategy knowledge, which i address under question 5. in the future, we should investigate the significance of knowledge about concepts from systems theory: exponential growth, saturation, and properties of non-linear dynamics such as self-organization, phase transitions, or attractors (schiepek & strunk, 2010). i believe that persons who are familiar with such concepts should be better at controlling complex dynamic systems, because these concepts are helpful for understanding the system at hand and are associated with hints about potential actions or caveats. simple examples are considering side effects or delayed effects.
what evidence is available for the impact of strategies (except votat) on the results of cps? which of these strategies should be examined more closely?
as stated above, i think that the best-studied type of knowledge besides structural knowledge is strategy knowledge. an important exploration tactic whose role in cps has been rediscovered recently is observing the dynamics of a system without interventions, if necessary after a short impulse. i have named this tactic pulse (schoppek & fischer, 2017), but there are a number of other descriptions and demonstrations that this tactic is beneficial (beckmann, 1984; schoppek, 2002; as notat: lotz, scherer, et al., 2017). however, as for votat, the area of application for pulse is narrow: exploring unknown systems. osman (2010) has discussed the necessity of reducing complexity when dealing with the characteristic uncertainty of complex dynamic control tasks, which clearly has a strategic aspect. in my opinion, the proposed monitoring and control framework is too abstract to derive specific strategies from it.
corresponding author: wolfgang schoppek, university of bayreuth, 95440 bayreuth, germany. e-mail: wolfgang.schoppek@uni-bayreuth.de
10.11588/jddm.2019.1.69297 jddm | 2019 | volume 5 | article 8 | 1
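the contrast between votat (vary one thing at a time) and pulse (impulse, then hands-off observation) can be illustrated on a small linear input–output system of the kind used in microdyn-style tasks. the matrices and numbers below are invented for illustration and do not correspond to any actual test item or to the materials used in the cited studies; this is only a minimal python sketch of the two exploration tactics.

```python
import numpy as np

# hypothetical two-input, two-variable linear system x_{t+1} = A x_t + B u_t,
# in the style of a microdyn item (values invented for illustration)
A = np.array([[0.9, 0.0],
              [0.2, 0.5]])   # autonomous dynamics (decay plus a side effect)
B = np.array([[1.0, 0.0],
              [0.0, 1.0]])   # direct effects of the two inputs

def step(x, u):
    return A @ x + B @ u

def votat(i, n_inputs=2):
    """vary one thing at a time from a zero state:
    one trial isolates one column of B (the direct input effect)."""
    u = np.zeros(n_inputs)
    u[i] = 1.0
    return step(np.zeros(2), u)

def pulse(n_steps=3):
    """give a short impulse, then observe with all inputs at zero:
    the free-running trajectory reveals the autonomous dynamics A."""
    x = step(np.zeros(2), np.array([1.0, 1.0]))  # the impulse
    trace = [x.copy()]
    for _ in range(n_steps):
        x = step(x, np.zeros(2))                 # hands off, just watch
        trace.append(x.copy())
    return trace

print(votat(0))    # direct effect of input 1 alone (first column of B)
print(pulse()[1])  # first free-running step after the impulse (shaped by A)
```

the design point matches the text: votat trials expose the direct input effects, while the pulse trial exposes eigendynamics (decay, side effects) that no amount of simultaneous intervention would disentangle.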
although i doubt that we can find much generalizable evidence about the features of a promising strategy for reducing complexity, i consider it worthwhile to study the different ways of reducing complexity in specific domains.
is there intuitive cps?
at first glance, "intuitive cps" sounds like a contradiction. when we consider a problem as being defined by a barrier that precludes direct goal achievement, and intuition as a solution that comes to mind without thinking, there is no intuitive cps. in other words, a task that can be accomplished without thinking is not a problem. however, a problem solving process can comprise varying portions of intuitive components. such components could be, for example, the execution of an exploration tactic, the recognition of a critical system status, the recognition of an opportunity for a certain intervention, or pondering the constraints of different input possibilities. gobet and chassy (2009) have presented a model of expert problem solving in chess that incorporates intuitive and analytic components and their interplay. these authors define intuition as "the rapid understanding shown by individuals, typically experts, when they face a problem" (gobet & chassy, 2009, p. 151), and model it by the formation of a network of increasingly complex chunks. this approach might be fruitfully applied to cps. in addition, i consider a dual-processing framework (evans, 2012) useful for teasing apart intuitive and analytic components of problem solving (schoppek, 2020).
what distinguishes experts in cps from laypersons?
this question implies that there are experts in cps, which is not self-evident. greiff and his colleagues view cps as a general competence (greiff & martin, 2014), whereas tricot and sweller (2014) provided compelling evidence for the primacy of domain-specific knowledge (e.g.
air-traffic controllers can hold enormous amounts of flight-related information in working memory, but are not better at standard working memory tests). however, i think that there is knowledge about complex dynamic systems that can potentially be applied to a wide range of problem situations, regardless of their specific domain. this includes the items addressed under question 4, but also the ability to recognize classes of systems. for example, a person who has understood the oscillating dynamics of the legendary "sugar factory" (berry & broadbent, 1984) and has explored predator-prey systems is probably better prepared for dealing with oscillations in a new domain than a person who has had no such experiences. additionally, experts have a large repertoire of strategies at their disposal (see question 5) and can execute many tactics almost automatically. apart from these knowledge-related characteristics, i would expect cps experts to be more attracted by a complex problem. they are also more likely than laypersons to perceive a failed problem-solving attempt as a challenge to their self-worth. this hypothesis may help explain the replicated finding that science students are better at controlling dynamic systems that are new to them than students of other majors (schoppek, 2004, 2020). it is also an example of the relevance of motivational, self-regulatory, and emotional processes for understanding (complex) problem solving.
declaration of conflicting interests: the author declares he has no conflict of interest.
author contributions: the author is completely responsible for the content of this manuscript. the abstract was added by the editors.
handling editors: andreas fischer and wolfgang schoppek
copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license.
citation: schoppek, w. (2019). a flashlight on attainments and prospects of research into complex problem solving.
journal of dynamic decision making, 5, 8. https://doi.org/10.11588/jddm.2019.1.69297
published: 31 dec 2019
references
berry, d. c., & broadbent, d. e. (1984). on the relationship between task performance and associated verbalizable knowledge. the quarterly journal of experimental psychology section a, 36(2), 209–231. https://doi.org/10.1080/14640748408402156
dörner, d. (1996). the logic of failure: recognizing and avoiding error in complex situations. new york, ny: basic books.
evans, j. st. b. t. (2012). spot the difference: distinguishing between two kinds of processing. mind & society, 11(1), 121–131. https://doi.org/10.1007/s11299-012-0104-2
funke, j. (2014). analysis of minimal complex systems and complex problem solving require different forms of causal cognition. frontiers in psychology, 5(739), 1–3. https://doi.org/10.3389/fpsyg.2014.00739
gobet, f., & chassy, p. (2009). expertise and intuition: a tale of three theories. minds and machines, 19(2), 151–180. https://doi.org/10.1007/s11023-008-9131-5
greiff, s., fischer, a., wüstenberg, s., sonnleitner, p., brunner, m., & martin, r. (2013). a multitrait-multimethod study of assessment instruments for complex problem solving. intelligence, 41(5), 579–596. https://doi.org/10.1016/j.intell.2013.07.012
greiff, s., & martin, r. (2014). what you see is what you (don't) get: a comment on funke's (2014) opinion paper. frontiers in psychology, 5(1120). https://doi.org/10.3389/fpsyg.2014.01120
klein, g. (2008). naturalistic decision making. human factors, 50(3), 456–460. https://doi.org/10.1518/001872008x288385
lotz, c., scherer, r., greiff, s., & sparfeldt, j. r. (2017). intelligence in action – effective strategic behaviors while solving complex problems.
intelligence, 64, 98–112. https://doi.org/10.1016/j.intell.2017.08.002
osman, m. (2010). controlling uncertainty: a review of human behavior in complex dynamic environments. psychological bulletin, 136(1), 65–86. https://doi.org/10.1037/a0017815
schiepek, g., & strunk, g. (2010). the identification of critical fluctuations and phase transitions in short term and coarse-grained time series—a method for the real-time monitoring of human change processes. biological cybernetics, 102, 197–207. https://doi.org/10.1007/s00422-009-0362-1
schoppek, w. (2002). examples, rules, and strategies in the control of dynamic systems. cognitive science quarterly, 2(1), 63–92.
schoppek, w. (2004). teaching structural knowledge in the control of dynamic systems: direction of causality makes a difference. in k. d. forbus, d. gentner, & t. regier (eds.), proceedings of the 26th annual conference of the cognitive science society (pp. 1219–1224). mahwah, nj: lawrence erlbaum associates.
schoppek, w. (2020). tut denken weh? überlegungen zur ökonomietendenz beim komplexen problemlösen. in k. viol & h. schöller (eds.), selbstorganisation – ein paradigma für die humanwissenschaften? (pp. 373–388). berlin: springer.
schoppek, w., & fischer, a. (2017). common process demands of two complex dynamic control tasks: transfer is mediated by comprehensive strategies. frontiers in psychology, 8, 2145. https://doi.org/10.3389/fpsyg.2017.02145
tricot, a., & sweller, j. (2014). domain-specific knowledge and why teaching generic skills does not work. educational psychology review, 26(2), 265–283. https://doi.org/10.1007/s10648-013-9243-1
editorial
supporting open access publishing in the field of dynamic decision making
wolfgang schoppek1, andreas fischer2, joachim funke3, daniel v.
holt3, and alexander nicolai wendt4
1chair of psychology, university of bayreuth, germany; 2research institute for vocational education and training (f-bb), nuremberg, germany; 3department of psychology, heidelberg university, germany; and 4department of human sciences, university of verona, italy
in contrast to the successful previous year, 2020 turned out to be difficult, not only for the earth's population due to covid-19 but also for jddm, with an unusually small sixth volume. looking at these two very different years back-to-back led us to some reflection: as the covid-19 pandemic forcefully illustrates, dynamic decision making (ddm), with all its complications and uncertainty, is a topic of high relevance for modern societies. how can decision science best contribute to enhancing the understanding of such situations? what is the role of a journal like jddm for the research community? and how should we, as editors, adjust the scope and processes of the journal to serve the needs of the community? we will first take a quick look back at the 2019 and 2020 volumes and then outline how we intend to develop the journal in the years to come with a new editor-in-chief, wolfgang schoppek.
looking back at volumes 2019 and 2020
in the "seven questions project" that was part of the fifth volume of jddm (2019), we sent a set of questions to a number of researchers active in the field of complex problem solving (cps). the questions covered the relevance of (complex) problem solving as a research area, the contribution of laboratory-based cps research to solving real-life problems, the roles of knowledge, strategies, and intuition in cps, and the existence of expertise in cps. a brief review and summary of the answers we obtained from eight authors or teams will appear in the current volume of jddm.
in the sixth volume of jddm (2020), romy müller and leon urbas report the results of an experiment they devised to explore the applicability of psychological theories about stability vs. flexibility in decision making in a simulated modular chemical plant. they found that most participants applied a satisficing strategy and showed sequence effects – but in the opposite direction from what was predicted! the conceptual basis (adapt vs. exchange) resembles the explore–exploit dichotomy that has been studied extensively in decision-making research (e.g., osman, glass, & hola, 2015). this work is a good example of how the creative extension of standard paradigms can challenge and ultimately enhance psychological theories. as the questions for this research originated in practical problems, it demonstrates the relevance of studying ddm in settings close to reality. similarly, jason harman, claudia gonzalez-vallejo, and jeffrey b. vancouver extended the sunk cost paradigm by making it dynamic. they created a repeated choice scenario in which participants learned sunk costs over time. in three experiments, they showed that "the sunk cost fallacy depends on the relative a priori importance of the goal being invested in" (harman et al., 2020, p. 1). escalation of commitment only occurred when the sunk cost domain was more important than the alternatives. this extension of the paradigm along the time dimension can give valuable impulses for future research. the contribution of alexander nicolai wendt also aims at extending the narrow scope of laboratory research. in his article, he points to the potential of video material available on the web (e.g., live streaming), which allows "a fairly new access to ecologically valid and unobtrusive observation of problem-solving and decision-making processes" (wendt, 2020, p. 1). he reflects on the epistemological and methodological foundations that need to be considered when trying to exploit those data sources.
one important implication of his considerations is that we would do well to remember the importance of qualitative methods for a range of research questions.
organizational changes and constants
the founding editor-in-chief of jddm, andreas fischer, has dedicated much time and work to building the journal. after such an intensive phase, we are sympathetic to his wish to step back and face new challenges. we thank andreas for his work and are glad that we still have him in our editorial team. wolfgang schoppek (university of bayreuth), who joined the journal in 2019, has now taken over as editor-in-chief from andreas fischer. wolfgang is interested in cognitive modeling and has been working in the field of complex problem solving for many years. moreover, we welcome alexander nicolai wendt as a new member of our editorial team. alexander's work bridges the gap between psychology and philosophy, with a particular emphasis on phenomenological approaches and the validity of laboratory studies in problem solving and decision making research (e.g., wendt, 2020). we take pride in the fact that jddm has always been and will continue to be a "diamond" open access journal that is run entirely by volunteers and charges no fees to either authors or readers. instead, the "price" that we ask of authors in jddm is that they try to produce rigorous scientific work to the best of their ability, in line with the scientific standards in their field and the principles of good scientific practice in general. in this spirit, jddm has now officially adopted an ethics code of conduct based on the committee on publication ethics (cope) best practice guidelines for scientific journal editors.
10.11588/jddm.2021.1.82929 jddm | 2021 | volume 7 | article 1 | 1
these guidelines must be adhered to by all parties involved in the publication process: authors, editors, reviewers, and publishers. the guidelines set the frame for ensuring independence, confidentiality, fairness, and participation, as well as scientific integrity and transparency at all stages of the publication process. one new element will be the routine check for plagiarism; another is that all authors must now explicitly state that they have contributed substantially to the paper and approved the final version for publication. you can find the complete guidelines on our website.
future directions for jddm
the covid-19 pandemic has reinforced our conviction that dynamic decision making is a cognitive and social process of great importance to society. it has also illustrated that decision-making in the real world does not stop at the individual psychological level but typically takes place in complex social and political contexts. for example, epidemiologists may have a useful mathematical model to simulate the spread of the corona virus sars-cov-2 with clear implications for effective behavioral countermeasures.
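as a toy illustration of what such a core-domain model looks like, here is a textbook discrete-time sir (susceptible–infected–recovered) model sketched in python. the parameter values are invented for illustration and are not fitted to sars-cov-2; the point is only that a countermeasure lowering the contact rate flattens the epidemic peak, while all societal, economic, and political factors stay outside the model.

```python
def sir_peak(beta, gamma=0.1, n=1_000_000, i0=100, days=300):
    """discrete-time sir model; returns the peak number of simultaneously
    infected people. beta is the daily transmission rate, gamma the daily
    recovery rate. all values here are illustrative, not fitted to data."""
    s, i, r = n - i0, i0, 0
    peak = i
    for _ in range(days):
        new_inf = beta * s * i / n   # new infections this day
        new_rec = gamma * i          # recoveries this day
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
        peak = max(peak, i)
    return peak

# a countermeasure halving the contact rate markedly lowers the peak
print(sir_peak(beta=0.30) > sir_peak(beta=0.15))  # True
```

the model yields clear within-model implications (reduce beta), yet says nothing about whether the population accepts the measure that achieves this reduction.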
however, if societal factors (how acceptable are different countermeasures in the population?), economic factors (who may lose their jobs because of lockdowns?), and political factors (unpopular measures are less likely to be implemented by politicians aiming to get re-elected) are not incorporated, a model of just the core decision domain will not lead to effective decisions.
multi-disciplinarity and multiple methods
such deliberations reaffirm our original intention that "the main focus of jddm is the multidisciplinary and multi-methodological study of cognitive processes in dynamic decision making" (fischer, holt, & funke, 2015, p. 1). on reconsideration, we would like to complement that statement with "social processes". emphasizing multi-disciplinarity and multi-methodology does not mean that psychological or quantitative work becomes less important in jddm. instead, we want to widen the scope of jddm to include contributions from cognitive science, economics, philosophy, political science, psychology, operations research, management studies, sociology, and other fields of research (e.g., human factors engineering, education). these fields have different methodological traditions. hence, it is important to maintain a critical view in order to uphold the standard of publications. we welcome the strengths of different approaches to quantitative and qualitative research but do not hesitate to critically question their validity and relevance at the same time. the history of cognitive research on decision-making is itself closely tied to various types of mixed-methods research, e.g., the use of think-aloud protocols. if innovative contributions, whether descriptive or inferential, can help advance our field of research, jddm will support them. as striving for multi-disciplinarity entails the risk of becoming arbitrary, we feel the need to state some overarching principles as orientation for authors, editors, reviewers, and readers.
having in mind that such principles need to be discussed and developed further, we suggest the following as a starting point:
• we expect submitted work to be committed to principled argument and a stance of rationality. different levels of description and abstraction, which are characteristic of each academic discipline, should be recognized as equivalent. we welcome critical positions as long as they are based on fair argumentation.
• methods should be used to answer research questions, not deployed as ends in themselves. this holds for quantitative and qualitative methods alike. researchers should be careful not to adhere uncritically to particular methods just because they are common in their field of research (dörner, 1996).
• authors should strive for a good connection between theory and data. this appears self-evident, but unfortunately, it is not. simply referring to an empiricist research tradition is not enough. similarly, theoretical contributions should pursue a high level of conceptual clarity and (where appropriate) identify their relation to empirical approaches.
jddm encourages multi-disciplinary dialogue by discussing the foundational issues of research on decisions and decision-making. the conceptual basis of investigations is of great importance not only for hypothesis generation but also for the interpretation of empirical findings. hence, jddm wishes to encourage a conceptual discourse that refers to psychological traditions of theory and tries to bring them into dialogue. in this vein, we will soon issue a call for papers that targets this context and aims to revitalize the theoretical foundations of the field as well as to incentivize innovation.
increasing worldwide contributions
the covid-19 pandemic showed that dynamic decision making is required all over the world. the same observation holds true for the even more complex and threatening problem of climate change. if ddm is international, should ddm research not be the same?
while jddm features contributions from a range of different locations, articles by authors from english- and german-speaking countries have clearly been the majority so far. we will therefore work harder on making our contributor base more diverse, not only in terms of the scientific disciplines covered but also with regard to a more complete representation of researchers from all over the world. to facilitate this mission, we will continue to charge no article processing fees for any contributions to jddm.
summary
1) jddm is a diamond open access journal, free to both authors and readers, that has clear scientific and ethical standards guiding the publication process and follows a model of "constructive peer review".
2) jddm aims to be a multi-disciplinary journal in decision science, with a focus on dynamic and complex real-world problems. this includes, among others, cognitive science, economics, philosophy, psychology, sociology, and human factors engineering. jddm accepts quantitative and qualitative empirical work as well as theoretical contributions with a clear focus on complex dynamic decision processes.
3) jddm is an international journal and encourages contributions by researchers from outside europe and the us.
declaration of conflicting interests: the authors declare they have no conflict of interests.
author contributions: the editors of jddm have contributed equally to this editorial. editor-in-chief wolfgang schoppek has taken the responsibility of establishing the conceptual outline.
handling editor: wolfgang schoppek
copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license.
citation: schoppek, w., fischer, a., funke, j., holt, d. v., & wendt, a. n. (2021). supporting open access publishing in the field of dynamic decision making.
journal of dynamic decision making, 7, 1. https://doi.org/10.11588/jddm.2021.1.82929
published: 18 aug 2021
references
dörner, d. (1996). the logic of failure: recognizing and avoiding error in complex situations. new york, ny: basic books.
fischer, a., holt, d. v., & funke, j. (2015). promoting the growing field of dynamic decision making. journal of dynamic decision making, 1, 1–3. https://doi.org/10.11588/jddm.2015.1.23807
osman, m., glass, b., & hola, z. (2015). approaches to learning to control dynamic uncertainty. systems, 3, 211–236. https://doi.org/10.3390/systems3040211
wendt, a. n. (2020). the problem of the task. pseudo-interactivity as an experimental paradigm of phenomenological psychology. frontiers in psychology, 11, 855. https://doi.org/10.3389/fpsyg.2020.00855
editorial
web-scraping the jddm database: citations, reads and downloads
andreas fischer, daniel v. holt, and joachim funke
department of psychology, heidelberg university
the journal of dynamic decision making has just launched its fourth volume by the end of 2018. in previous editorials we reflected on the definition of dynamic decision making (fischer, holt, & funke, 2015), on the journal's content (fischer, holt, & funke, 2016), and on related research topics (fischer, holt, & funke, 2017).
today, we want to take the opportunity of writing our fourth editorial to reflect on the citations, reads, and downloads of papers published in the journal of dynamic decision making, given that four years have passed since we started publishing.1
current citation profile
readers interested in the citation profile of the journal of dynamic decision making may refer to the journal's google scholar account (user 9tt0ojaaaaaj). at the moment of writing this editorial, google scholar lists 92 citations over the last four years. figure 1 visualizes how different articles published in the journal of dynamic decision making contribute to this number of citations.
figure 1. citations of jddm articles listed by google scholar until the end of 2018: 16 × kretzschmar & süß (2015), 12 × güß et al. (2015), 12 × fischer & neubert (2015), 9 × editorial (2015), 9 × hundertmark et al. (2015), 7 × dutt (2015), 6 × engelhart et al. (2017), 5 × wendt (2017), 4 × gonzalez et al. (2016), 3 × vangsness et al. (2017), 3 × editorial (2016), 2 × frank & kluge (2017), 2 × sharma et al. (2017), 2 × rohe et al. (2016).
current download profile
the numbers of online reads and article downloads can be retrieved online for each article published in the journal of dynamic decision making. figures 2 and 4 visualize how the continuous growth in reads and downloads over the years is composed of the reads and downloads of different articles. with regard to reads, some papers seem to be increasingly recognized over the years (e.g., rohe et al., 2016; kretzschmar et al., 2015), whereas the number of full-text downloads tends to stabilize or decrease over the years for most papers. however, the journal of dynamic decision making is still a young journal and we are looking forward to seeing how these numbers will develop in the years to come (especially with regard to the more recent volumes).
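the authors' actual scraping code is written in r (using stringr, among others) and is available on the first author's github account. as a rough sketch of the general idea in python: fetch an article's landing page and pull the counters out of its markup. the html snippet and css class names below are invented for illustration and do not reflect the real markup of the jddm site.

```python
import re

# hypothetical fragment of an article landing page; this markup is
# invented for illustration and is NOT the real jddm/ojs html
html = """
<div class="article-stats">
  <span class="reads">1234</span>
  <span class="downloads">567</span>
</div>
"""

def extract_stat(page, css_class):
    """pull a numeric counter out of a span with the given class name."""
    m = re.search(r'<span class="%s">(\d+)</span>' % css_class, page)
    return int(m.group(1)) if m else None

stats = {k: extract_stat(html, k) for k in ("reads", "downloads")}
print(stats)  # {'reads': 1234, 'downloads': 567}
```

in a real scraper, the `page` string would come from an http request to each article url, and the counters would be appended to a table with one row per article and date.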
figures 3 and 5 demonstrate how the reads and downloads of papers published in the journal of dynamic decision making have accumulated since the journal started publishing on september 29th, 2015. as figure 3 shows, some papers have accumulated reads more rapidly than others (especially since march 2018; e.g., kretzschmar et al., 2015; rohe et al., 2016; sharma et al., 2017; berisha et al., 2018). with regard to article downloads, the accumulation tends to be slower (see figure 5) and more uniform over time (with some noticeable peaks right after a paper's publication, e.g., vangsness et al., 2017). again, we have to keep in mind that up to now the journal's history barely covers a time span of four years, and we are really looking forward to monitoring the journal's growth over time. some papers have managed to accumulate a comparatively high number of citations, reads, and downloads (e.g., kretzschmar et al., 2015, and fischer & neubert, 2015 are among the top five in each category), and other, more recent papers also show promising patterns with regard to reads and downloads (e.g., rohe et al., 2016; sharma et al., 2017); we are looking forward to monitoring their future reception within the research community.
1 analyses were done on december 30th by the first author using r (version 3.5.1) and the r software packages stringr (version 1.3.1) and packcircles (version 0.3.3). code for web-scraping the jddm database is available on the first author's github account, andreasfischer1985.
10.11588/jddm.2018.1.57846 jddm | 2018 | volume 4 | article 4 | 1
figure 2. reads since the journal's launch in 2015. [bar chart of article reads per year; yearly totals: 441 (2015), 1917 (2016), 2948 (2017), 4122 (2018)]
figure 3. accumulation of reads since the journal's launch in 2015. [line chart of cumulated reads per article from 2015 through december 2018]
figure 4. downloads since the journal's launch in 2015. [bar chart of article downloads per year; yearly totals: 293 (2015), 1673 (2016), 2405 (2017), 2670 (2018)]
figure 5. accumulation of downloads since the journal's launch in 2015. [line chart of cumulated downloads per article from 2015 through december 2018]
announcements
after many years of research on complex problem solving and dynamic decision making, joachim funke, founding member of the journal of dynamic decision making, will retire. in his stead, wolfgang schoppek from the university of bayreuth will leave the editorial board and join our editorial team.
We would like to thank all our reviewers and this year's guest editor Anna-Lena Schubert, who did a great job in improving the quality of submissions:
• Jens Beckmann (Durham University, United Kingdom),
• Dietrich Dörner (University of Bamberg, Germany),
• Helen Fischer (Heidelberg University, Germany),
• Joachim Funke (Heidelberg University, Germany),
• Cleotilde Gonzalez (Carnegie Mellon University, United States),
• Annette Kluge (University of Bochum, Germany),
• André Kretzschmar (University of Tübingen, Germany),
• Stephan Kröner (University of Erlangen-Nuremberg, Germany),
• Wolfgang Schoppek (University of Bayreuth, Germany),
• Anna-Lena Schubert (guest editor, Heidelberg University, Germany),
• David A. Tobinski (University of Duisburg-Essen, Germany).
We also would like to thank the members of our editorial board for their ongoing support of the Journal of Dynamic Decision Making:
• Jens F. Beckmann (Durham University, United Kingdom),
• Benő Csapó (University of Szeged, Hungary),
• Dietrich Dörner (University of Bamberg, Germany),
• Cleotilde Gonzalez (Carnegie Mellon University, United States),
• C. Dominik Güss (University of North Florida, United States),
• Oswald Huber (Université de Fribourg, Switzerland),
• Annette Kluge (University of Bochum, Germany),
• Stephan Kröner (University of Erlangen-Nuremberg, Germany),
• Gyöngyvér Molnár (University of Szeged, Hungary),
• Magda Osman (Queen Mary University of London, United Kingdom),
• Wolfgang Schoppek (University of Bayreuth, Germany),
• David A. Tobinski (University of Duisburg-Essen, Germany).
Finally, we would like to encourage all researchers working in the field of dynamic decision making to contribute their work to our journal, be it original research, theoretical contributions, replication studies, or opinion articles.
There are many reasons to choose the Journal of Dynamic Decision Making (JDDM) as your outlet, such as
• the short time between submission, peer review, and final publication;
• currently (and hopefully in the future) no article processing fees;
• open access to your articles for researchers worldwide;
• the possibility to add supplementary material;
• easy citation and direct access to articles with digital object identifiers (DOIs); and
• high visibility of published articles through listing in the Directory of Open Access Journals (DOAJ) and Google Scholar.
Moreover, Heidelberg University Library, as the host of JDDM, has operated sustainably for more than 625 years. We are therefore confident that articles published in JDDM will remain freely accessible for decades (if not centuries) to come!

Declaration of conflicting interests: The authors declare they have no conflict of interests.
Author contributions: The first author did the analyses and wrote most parts of the manuscript.
Handling editor: Andreas Fischer
Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Citation: Fischer, A., Holt, D. V., & Funke, J. (2018). Web-scraping the JDDM database: Citations, reads and downloads. Journal of Dynamic Decision Making, 4, 4. doi:10.11588/jddm.2018.1.57846
Published: 31 Dec 2018

References
Berisha, G., Pula, J. S., & Krasniqi, B. (2018). Convergent validity of two decision making style measures. Journal of Dynamic Decision Making, 4, 1. doi:10.11588/jddm.2018.1.43102
Fischer, A., Holt, D. V., & Funke, J. (2015). Promoting the growing field of dynamic decision making. Journal of Dynamic Decision Making, 1, 1. doi:10.11588/jddm.2015.1.23807
Fischer, A., Holt, D. V., & Funke, J. (2016).
The first year of the Journal of Dynamic Decision Making. Journal of Dynamic Decision Making, 2, 1. doi:10.11588/jddm.2016.1.28995
Fischer, A., Holt, D. V., & Funke, J. (2017). Looking back at the third volume of the Journal of Dynamic Decision Making. Journal of Dynamic Decision Making, 3, 6. doi:10.11588/jddm.2017.1.43868
Fischer, A., & Neubert, J. C. (2015). The multiple faces of complex problems: A model of problem solving competency and its implications for training and assessment. Journal of Dynamic Decision Making, 1, 6. doi:10.11588/jddm.2015.1.23945
Kretzschmar, A., & Süß, H.-M. (2015). A study on the training of complex problem solving competence. Journal of Dynamic Decision Making, 1, 4. doi:10.11588/jddm.2015.1.15455
Rohe, M. S., Funke, J., Storch, M., & Weber, J. (2016). Can motto-goals outperform learning and performance goals? Influence of goal setting on performance and affect in a complex problem solving task. Journal of Dynamic Decision Making, 2, 3. doi:10.11588/jddm.2016.1.28510
Sharma, N., & Dutt, V. (2017). Modeling decisions from experience: How models with a set of parameters for aggregate choices explain individual choices. Journal of Dynamic Decision Making, 3, 3. doi:10.11588/jddm.2017.1.37687
Vangsness, L., & Young, M. E. (2017). The role of difficulty in dynamic risk mitigation decisions. Journal of Dynamic Decision Making, 3, 5.
doi:10.11588/jddm.2017.1.41543

Opinion

The future of problem solving research is not complexity, but dynamic uncertainty

Magda Osman and Denis Omar Verduga Palencia
Queen Mary University of London, United Kingdom.

Research on complex problem solving (CPS) has reached a stage where certain standards have been achieved, whereas its future development is quite ambiguous. In this situation, the editors of the Journal of Dynamic Decision Making asked a number of representative authors to share their points of view with respect to seven questions: about the relevance of (complex) problem solving as a research area, about the contribution of laboratory-based CPS research to solving real-life problems, about the roles of knowledge, strategies, and intuition in CPS, and about the existence of expertise in CPS.

Why should there continue to be problem solving research (in addition to research on memory, decision-making, motivation, etc.)?

While it has a long and well-established history, particularly in the European tradition of complex problem solving (hereafter CPS) research, its presence in basic-science research in cognitive psychology has been somewhat muffled by its stronger role in applied work. Most notably, work on CPS has advanced into applications in educational training, intelligence testing, and a variety of public and private sectors (e.g.,
automated systems, power plants, air traffic control, flight decks, medicine, data communications network management) (e.g., Kluge, 2014; Müller & Oehm, 2019; Woods & Hollnagel, 1987). Before we go on to answer the questions posed in this special issue, it is worth looking back to put some historical context to some of the issues we consider. Close to 100 years ago, in Parker's (1920a, 1920b, 1920c, 1920d) papers, he situates problem solving in the context of educational programmes implemented in the late 1800s to help students develop the skills necessary to think about the variety of real-world problems they would face in adult life. Several of the rhetorical questions Parker posed are in exactly the same vein as those presented in this special issue 100 years on. Parker (1920a) states that "a problem is a question involving doubt" (1920a, p. 16), and that "to maintain the state of doubt and to carry on systematic and protracted inquiry – these are the essentials of thinking" (1920d, p. 258). For us, the critical connection between the distant past (Parker, 1920a, 1920d), the recent past (Osman, 2010a, 2010b), and the future of research on CPS is the word "doubt", which is essentially a synonym of "uncertainty". We return to this point at the end of this introduction. The word "complex" in problem solving research has been an albatross around its neck, as the English expression goes. The burden of trying to agree on what a complex problem is, and how to classify problems of different levels of complexity, still follows the field around (e.g., Liu & Li, 2012; Quesada, Kintsch, & Gomez, 2005; Schoppek, Kluge, Osman, & Funke, 2018), and there still doesn't appear to be unity on this subject. What seems to be of key interest to communities beyond the study of CPS is the fact that this research field, like no other, got there first (e.g.,
Toda, 1962) in trying to characterise the various conditions we face in the real world when we coordinate a series of thoughts and actions over time to overcome a problem we hadn't anticipated, or a problem for which there is no obvious single solution, or a problem for which there are no clearly defined goals, or a problem which cannot even be precisely defined. The common theme here is that, as Parker had stated, they all present the problem solver with doubt. We are now at a stage where we can say that this doubt can be more formally described as epistemic uncertainty – the individual's lack of knowledge – and aleatoric uncertainty – the inherent noisiness of the conditions of the problem (e.g., the weather system when deciding where to direct resources for a future hurricane on the Pacific coast). And, just as Parker asserted 100 years ago, a certain level of epistemic uncertainty is required during the problem solving process, because without it we wouldn't search for more information to inform how we can better solve the problem. Thus, our position is a fairly strong one: we champion the advances from the field of CPS, but we do so by setting aside the label "complex" and trading it for uncertainty – in particular, dynamic uncertainty – which also happens to be a key feature of Dörner and Funke's (2017) definition of CPS. We also champion the advances from CPS by anchoring on a psychological process, decision-making under (dynamic) uncertainty, which in turn draws on problem solving, control-based decision-making, judgement, learning, executive memory, and so on. We don't begrudge the value of these other mental activities, because they are bound up with what is needed to solve problems in complex dynamic worlds, but the field needs to move with the times and engage with the topics that have much more to say about research in the cognitive sciences.
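The epistemic/aleatoric distinction can be made concrete with a toy example (ours, not the authors'): for a binary process with unknown success probability – say, whether a storm makes landfall in a given region – epistemic uncertainty about the probability shrinks as observations accumulate, while the aleatoric noise of each individual outcome does not.

```python
# Toy sketch of epistemic vs. aleatoric uncertainty (illustrative only):
# Bernoulli outcomes with unknown probability p, Beta(1, 1) prior,
# conjugate Bayesian updating.

def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return (a * b) / ((a + b) ** 2 * (a + b + 1))

def uncertainties(successes, failures):
    """Return (epistemic, aleatoric) uncertainty after the observations."""
    a, b = 1 + successes, 1 + failures   # Beta posterior parameters
    p_hat = a / (a + b)                  # posterior mean of p
    epistemic = beta_variance(a, b)      # uncertainty about p; shrinks with data
    aleatoric = p_hat * (1 - p_hat)      # outcome noise; irreducible
    return epistemic, aleatoric

few = uncertainties(3, 2)        # 5 observations
many = uncertainties(300, 200)   # 500 observations, same 60/40 split
print(few, many)                 # epistemic term collapses; aleatoric term barely moves
```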
For all the sophisticated modelling that goes on (for review see Holt & Osman, 2017), which tells us how to think about formally capturing the phenomena of interest, what the cognitive science community doesn't have, and the CPS community has in ample supply, is a history of expertise in devising clever paradigms and innovative measurement tools, and a deep understanding of the psychological intricacies of real-world problem solving.

(Corresponding author: Magda Osman, Queen Mary University of London, Mile End Road, London E1 4NS, United Kingdom. Email: m.osman@qmul.ac.uk)
10.11588/jddm.2019.1.69300 | JDDM | 2019 | Volume 5 | Article 11

One parting thought, again inspired by Parker's (1920a) thinking, is that he understood the importance of the distinction between the value of solving problems individually and settings where group processes are likely to be important to problem solution. Moreover, he spent time in several of his articles characterising the different types of social contexts in which problems arise. Collaborative decision-making and problem solving is a serious matter of current interest in many research fields (e.g., behavioural economics, macroeconomics, social psychology, decision sciences, data sciences). Here also we would advocate that the field of CPS needs to be embedded in, and engage with, researchers from these fields. This is because there is much for this field to contribute to advance our understanding of how groups behave and perform in real-world "complex" problems – or, as we might more comfortably say, dynamically uncertain decision-making contexts! We would propose that the cognitive sciences can't operate without dedicated researchers examining how people solve problems; that is a given.
There isn't a need to disband the whole field; its value is obviously understood by many, but not by those who should draw from it (e.g., the cognitive science, decision science, and data science communities). As an illustration, our recent attempts to bridge computer science with cognitive science use CPS as the vehicle – though, as we said earlier, we call it dynamic decision-making in (dynamically) uncertain environments (Verduga & Osman, 2019a, 2019b). Why have we done this? Because we know that different disciplines can benefit from the insights made in CPS, and we want to engage with computer science and cognitive science, and so we have found a way to engage, but at the expense of labelling what we do as CPS. For instance, traditional artificial intelligence (AI) approaches to problem solving, like the "General Problem Solver" (Newell et al., 1959), were intended to solve well-defined problems, with some limited success. Artificial intelligence as a field was aware of this when its researchers looked at the work done in the cognitive sciences and found that capturing cognition required understanding how complex problems are solved. The focus initially was on trying to replicate behaviour in well-defined static tasks (propositional logic, chess, math puzzles, etc.). The advances using these tasks took AI only so far. AI found that research using microworlds as a test bed for training artificial agents, particularly in decision making, proved useful as a way to outperform humans in specific complex environments (Robertson & Watson, 2014; Vinyals et al., 2019). As a by-product of continuing research in problem solving, having a common playground in which human and artificial agents act, such as those developed in the field of CPS, can help enrich psychology and AI, contributing to the design of artificial agents as well as to computational modelling of human behaviour (Leibo et al., 2018).
What are the connections between current CPS research practice and real problems? Where do you see potential for development towards stronger relations? – Given the artificiality of the laboratory situation, do participants really adopt the presented problems? What insights can be gained despite this artificiality, and which cannot?

Yes, there has been work on problem solving in the lab, such as the early work of Dörner's, where participants would spend days solving problems in the lab. Berry's sugar factory task, while obviously not a realistic task, showed us ways of understanding how people make decisions while interacting with a non-linear environment. These paradigms have been instrumental in raising important research questions that have also helped advance the way we develop measurement tools to examine CPS behaviour. There are also field studies, such as Klein's notable work, where problem solving was observed in the wild – naturalistic decision-making. One avenue of work (field study vs. lab study) is no less valid than the other. Both strategies of studying problem solving have provided important insights. For instance, in our recent work (Verduga & Osman, 2019a, 2019b), we developed a gaming environment where players were making dynamic decisions in an alien environment while interacting with other agents (competitors/threats). Is this an artificial environment? Yes. Is there anything to be gained from it despite its being artificial? Yes, obviously. How is this achieved? This is where theorising and modelling help to constrain the psychological phenomena of interest and allow for careful derivation of hypotheses that can be tested and characterised formally.
Moreover, studies of virtual reality (Milella, 2015; Menelas & Benaoudia, 2017), and more specifically the gamification of CPS scenarios, illustrate how the artificiality of lab-based CPS studies is mitigated by demonstrating how they lead to improved experiences of immersion (i.e., believing you are in a real interactive environment). So, we should not have to answer this question by giving up what we have done for the past 60 or more years and doing only field research; we are already addressing fundamental basic-science and applied questions using paradigms that continue to be of value to many.

What evidence exists for the influence of other kinds of knowledge besides structural knowledge on the results of CPS? Which of these kinds of knowledge should be examined in future research?

Here we would recast the question by asking what, other than structural knowledge (or perhaps structural representation) (Schoppek, 2002, 2004), is needed. In other words, what other representations are necessary beyond structural representations (by which we mean causal representations) in problem solving? Actually, we would say that causal representations are key (Holt & Osman, 2017; Osman, 2014, 2017) in and of themselves. If the audience is willing to accept our various sleights of hand, where we exchange "complex" for uncertain and "problem solving" for decision-making, then there is an amassing body of work showing that our cognition, at least higher-order cognition (though many would argue all of our cognitive faculties), is premised on causal representations that connect our actions with observable effects in the world (or even imagined effects in the world – prospective thinking; Osman, 2015).
So, it is not necessarily the case that other types of knowledge supersede causal knowledge. What needs to be better understood is how our causal representations evolve with more experience in dynamically uncertain decision-making contexts, how they can be improved, how we construct causal representations in collaborative decision-making contexts, and whether these are better than those developed individually.

What evidence is available for the impact of strategies (except VOTAT) on the results of CPS? Which of these strategies should be examined more closely?

Again, if the reader is willing to accept our sleights of hand, where we exchange key terms in order to discuss decision-making under dynamic uncertainty, then the field invites a whole raft of strategies that have an important bearing and actually connect with many other disciplines (for review see Mehlhorn et al., 2015; Verduga & Osman, 2019a, 2019b). We are specifically referring to the exploration-exploitation trade-off: searching for new information in order to make a more informed decision vs. utilising the information currently available to make repeated decisions. This trade-off is particularly common in environments of dynamic uncertainty and has practical implications regarding the effective use of resources under time-critical conditions in which the consequences of a sub-optimal decision are high. For instance, many real-world problems involve resolving the exploration-exploitation trade-off, because accuracy in identifying objects of interest carries the cost of waiting for better and more reliable information, weighed against less accuracy from making a decision quickly but based on unreliable data.
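The trade-off just described can be sketched with a toy multi-armed bandit agent (our illustration, with invented payoff values) that mixes random exploration with directed exploration via an uncertainty bonus:

```python
# Hedged sketch of the exploration-exploitation trade-off on a toy bandit.
# Environment and parameter values are invented for illustration.
import math
import random

def choose_arm(counts, values, t, epsilon=0.1, bonus=2.0):
    """Mix random exploration (epsilon) with directed exploration (UCB bonus)."""
    if random.random() < epsilon:                  # random exploration
        return random.randrange(len(values))
    def score(i):
        if counts[i] == 0:                         # try untested arms first
            return float("inf")
        return values[i] + bonus * math.sqrt(math.log(t + 1) / counts[i])
    return max(range(len(values)), key=score)      # directed exploration

def run(true_means, steps=5000, seed=0):
    random.seed(seed)
    k = len(true_means)
    counts, values = [0] * k, [0.0] * k
    for t in range(steps):
        arm = choose_arm(counts, values, t)
        reward = random.gauss(true_means[arm], 1.0)           # noisy payoff
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]   # running mean
    return counts

counts = run([0.2, 0.5, 0.9])   # arm 2 pays best on average
print(counts)                   # exploitation concentrates pulls on arm 2
```

Early on, the uncertainty bonus keeps all arms in play; as estimates firm up, choices concentrate on the best arm while the epsilon term preserves residual exploration.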
How humans deal with this trade-off is still open to debate, but approaches like the hybrid exploration strategy (Gershman, 2018), combining random and directed exploration (looking for options that provide more information about the environment), could give us insight into the topic and therefore should be the subject of further examination in the context of CPS/dynamic decision-making.

Is there intuitive CPS?

While there is a tradition of research examining the role of implicit or intuitive processes in the field of CPS, there is also work that is highly critical of the methods used to demonstrate that these phenomena are actually critical to problem solving (decision-making) or, more radically, whether they are present at all (for review see Osman, 2010). There are many important discoveries that can be made without having to worry about whether the processes are conscious or not, especially given the many current controversies around theories proposing that there are fundamental and dissociated systems/processes that are conscious and unconscious. There is a lot of work contesting the value of this line of examination (Melnikoff & Bargh, 2018; Osman, 2018) in areas outside of problem solving, as well as within the field (e.g., Osman, 2008). If we ask the question "how do we improve CPS (dynamic decision-making)?", then we assume that we can present individuals (and maybe groups) with critical knowledge at critical stages to support learning. If we go down the route of discussing intuition, then we must define what it is, measure it reliably, and then demonstrate the conditions under which it most likely appears (for some and not others) and how that might present a barrier to improvements in CPS above and beyond other well-researched factors (e.g., individual differences, training techniques).
In judgment and decision-making, as well as in reasoning research, this has yet to be achieved in a way that has not been the source of enormous controversy and, now, significant challenge. So the field of CPS ought to learn from the mistakes of others.

What distinguishes experts in CPS from laypersons?

We actually cannot add any more to the rather pithy and highly insightful point that Parker (1920d) made. He was of course referring to both children and adults when he characterised what constitutes the hallmark of great thinking: to raise doubt and use it to inform further inquiry. This doesn't only apply in the context of CPS; it is a property that many have been appealing to, and encouraging more training in, to help us all navigate a complex, information-dense world (Lewandowsky, Mann, Brown, & Friedman, 2016). Without doubt, scepticism, and uncertainty, we are too quick to settle on something that requires challenge and that should encourage us to learn and improve. We leave the final words to Parker (1920d, p. 258): "The easiest way is to accept any suggestion that seems plausible and thereby bring to an end the condition of mental uneasiness. Reflective thinking is always more or less troublesome, because it involves overcoming the inertia that inclines one to accept suggestions at their face value; it involves willingness to endure a condition of mental unrest."

Declaration of conflicting interests: The authors declare they have no conflict of interests.
Author contributions: Both authors contributed to the content of this manuscript. The abstract was added by the editors.
Handling editors: Andreas Fischer and Wolfgang Schoppek
Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Citation: Osman, M., & Palencia, D. O. V. (2019). The future of problem solving research is not complexity, but dynamic uncertainty. Journal of Dynamic Decision Making, 5, 11.
doi:10.11588/jddm.2019.1.69300
Published: 31 Dec 2019

References
Dörner, D., & Funke, J. (2017). Complex problem solving: What it is and what it is not. Frontiers in Psychology, 8, 1153. doi:10.3389/fpsyg.2017.01153
Gershman, S. J. (2018). Deconstructing the human algorithms for exploration. Cognition, 173, 34-42. doi:10.1016/j.cognition.2017.12.014
Kluge, A. (2014). The acquisition of knowledge and skills for taskwork and teamwork to control complex technical systems: A cognitive and macroergonomics perspective. Heidelberg: Springer. doi:10.1007/978-94-007-5049-4
Leibo, J. Z., d'Autume, C. d., Zoran, D., Amos, D., Beattie, C., Anderson, K., Castañeda, A. G., Sanchez, M., Green, S., Gruslys, A., Legg, S., Hassabis, D., & Botvinick, M. M. (2018). Psychlab: A psychology laboratory for deep reinforcement learning agents. arXiv, abs/1801.08116
Lewandowsky, S., Mann, M., Brown, N., & Friedman, H. (2016). Science and the public: Debate, denial, and skepticism. Journal of Social and Political Psychology, 4(2), 537-553. doi:10.5964/jspp.v4i2.604
Liu, P., & Li, Z. (2012). Task complexity: A review and conceptualization framework. International Journal of Industrial Ergonomics, 42(6), 553-568. doi:10.1016/j.ergon.2012.09.001
Mehlhorn, K., Newell, B. R., Todd, P. M., Lee, M. D., Morgan, K., Braithwaite, V. A., ... & Gonzalez, C. (2015). Unpacking the exploration-exploitation tradeoff: A synthesis of human and animal literatures. Decision, 2(3), 191-215. doi:10.1037/dec0000033
Melnikoff, D. E., & Bargh, J. A. (2018). The mythical number two. Trends in Cognitive Sciences, 22(4), 280-293. doi:10.1016/j.tics.2018.02.001
Menelas, B.-A. J., & Benaoudia, R. S. (2017). Use of haptics to promote learning outcomes in serious games. Multimodal Technologies and Interaction, 1, 31. doi:10.3390/mti1040031
Milella, F.
(2015). Problem-solving by immersive virtual reality: Towards a more efficient product emergence process in automotive. Journal of Multidisciplinary Engineering Science and Technology (JMEST), 2(4), 860-867.
Müller, R., & Oehm, L. (2019). Process industries versus discrete processing: How system characteristics affect operator tasks. Cognition, Technology & Work, 21(2), 337-356. doi:10.1007/s10111-018-0511-1
Newell, A., Shaw, J. C., & Simon, H. A. (1959). Report on a general problem-solving program. Proceedings of the International Conference on Information Processing, 256-264.
Osman, M. (2008). Observation can be as effective as action in problem solving. Cognitive Science, 32(1), 162-183. doi:10.1080/03640210701703683
Osman, M. (2010a). Controlling uncertainty: A review of human behavior in complex dynamic environments. Psychological Bulletin, 136(1), 65-86. doi:10.1037/a0017815
Osman, M. (2010b). Controlling uncertainty: Decision making and learning in complex worlds. John Wiley & Sons. doi:10.1002/9781444328226
Osman, M. (2014). Future-minded: The psychology of agency and control. London, UK: Macmillan International Higher Education.
Osman, M. (2015). Future-minded: The role of prospection in agency, control, and other goal-directed processes. Frontiers in Psychology, 6, 154. doi:10.3389/fpsyg.2015.00154
Osman, M. (2017). Planning and control. The Oxford Handbook of Causal Reasoning, 279-293. doi:10.1093/oxfordhb/9780199399550.013.19
Osman, M. (2018). Persistent maladies: The case of two-mind syndrome. Trends in Cognitive Sciences, 276-277. doi:10.1016/j.tics.2018.02.005
Parker, S. C. (1920a). Problem-solving or practice in thinking. I. The Elementary School Journal, 21(1), 16-25. doi:10.1086/454872
Parker, S. C. (1920b). Problem-solving or practice in thinking. II. The Elementary School Journal, 21(2), 98-111. doi:10.1086/454872
Parker, S. C. (1920c). Problem-solving or practice in thinking. III. The Elementary School Journal, 21(3), 174-188.
doi:10.1086/454912
Parker, S. C. (1920d). Problem-solving or practice in thinking. IV. The Elementary School Journal, 21(4), 257-272. doi:10.1086/454933
Quesada, J., Kintsch, W., & Gomez, E. (2005). Complex problem-solving: A field in search of a definition? Theoretical Issues in Ergonomics Science, 6(1), 5-33. doi:10.1080/14639220512331311553
Robertson, G., & Watson, I. (2014). A review of real-time strategy game AI. AI Magazine, 35(4), 75-104. doi:10.1609/aimag.v35i4.2478
Schoppek, W. (2002). Examples, rules, and strategies in the control of dynamic systems. Cognitive Science Quarterly, 2(1), 63-92.
Schoppek, W. (2004). Teaching structural knowledge in the control of dynamic systems: Direction of causality makes a difference. Proceedings of the Annual Meeting of the Cognitive Science Society, 26(26).
Schoppek, W., Kluge, A., Osman, M., & Funke, J. (2018). Complex problem solving beyond the psychometric approach. Frontiers in Psychology, 9, 1224. doi:10.3389/978-2-88945-573-7
Toda, M. (1962). The design of a fungus-eater: A model of human behavior in an unsophisticated environment. Behavioral Science, 7(2), 164-183. doi:10.1002/bs.3830070203
Verduga, D. O., & Osman, M. (2019a). Investigating the exploration-exploitation trade-off in dynamic environments with multiple agents. Proceedings of the 41st Annual Meeting of the Cognitive Science Society, Montreal, Canada.
Verduga, D. O., & Osman, M. (2019b). PsyRTS: A web platform for experiments in human decision-making in RTS environments. IEEE Conference on Games (CoG), London, UK. doi:10.1109/cig.2019.8848101
Vinyals, O., Babuschkin, I., Chung, J., Mathieu, M., Jaderberg, M., et al. (2019). AlphaStar: Mastering the real-time strategy game StarCraft II. URL: https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/
Woods, D. D., & Hollnagel, E. (1987). Mapping cognitive demands in complex problem-solving worlds. International Journal of Man-Machine Studies, 26(2), 257-275.
doi:10.1016/S0020-7373(87)80095-0
Opinion

Heigh-ho: CPS and the seven questions – some thoughts on contemporary complex problem solving research

Jens F. Beckmann
Durham University, United Kingdom.

In this paper, I share some reflections on a set of questions posed by the editors of the Journal of Dynamic Decision Making in relation to research in complex problem solving (CPS). These questions, especially in their combination, suggest problems in CPS research with regard to identity, direction, and purpose. I focus on three issues. The first issue is the diversity in objectives and methodological approaches subsumed under the label of CPS. The resulting conceptual ambiguity makes it challenging to define CPS and thus to identify ways to develop CPS research further. The second issue is the tendency in contemporary CPS research to employ psychometrics for autotelic purposes rather than utilising it as the tool that helps link the conceptual with the empirical in psychological research. The third issue refers to the conceptual vacuum around the essential element of CPS, namely the concept of complexity. The tendency to substitute complexity (as a psychological concept) with difficulty (as a psychometric concept) tends to perpetuate the existing conceptual limitations and to compound the circularity associated with an operational definition of CPS. Indifference towards these issues is a major hindrance to resolving the issues around identity, direction, and purpose, and to making meaningful progress in CPS research.

As Henry Louis Mencken could have said, for every complex problem there is an easy solution that is neat, plausible, and wrong¹. I think it will not come as a surprise to anyone that this also rings true for research in complex problem solving (CPS).
Therefore, I shall not attempt to offer neat answers to the seven questions about CPS as they have been posed by the editors of this special issue. In the well-known fairy tale, each of the seven dwarfs asked a question when they – to their surprise – encountered the presence of Snow White in their abode after returning from a hard day's work in the mine. Whilst their questions led to one unifying answer, I do not expect this to be the case for the CPS-related questions asked here. Also, these CPS-related questions will not come as a surprise to anyone, I am afraid:

(Q1) Why should there be CPS research?
(Q2) How does current CPS research contribute to solving "real life" problems?
(Q3) Can laboratory-based CPS research be relevant?
(Q4) What is the role of knowledge in CPS?
(Q5) What is the role of strategies in CPS?
(Q6) What is the role of intuition in CPS?
(Q7) Do experts solve CPS problems differently to laypersons?

Asking questions is one way of taking stock, allowing us to reflect on what has been achieved, what the status quo is, and where we should go from here. Somewhat paradoxically, part of the answers to those questions lies in the questions themselves. From these questions it seems apparent that CPS research does have problems. These problems, some of them more complex than others, suggest issues with identity, direction, and purpose. It is my impression that many of these issues are largely self-inflicted, which nurtures my optimism that resolving them is in our hands, or, stated more appropriately, in our minds. The distinction between hands and minds shall also serve as a reminder of the importance of the substantive, or the conceptual, and of the role the empirical has as a means to an end in the process of conducting empirical research in the social sciences. It therefore provides some orientation for how to address the problems complex problem solving has to solve.
Diversity

To get anywhere near a meaningful starting point for reflections on the above-stated questions, we first need to determine what CPS stands for. As it turns out, this is a bigger problem than anticipated by some. The label "CPS" has been used in many, quite different contexts and for diverse purposes. For instance, CPS has been used for labelling a research paradigm employed to study psychological concepts such as information processing, learning, decision making, causal reasoning, knowledge acquisition (see Q4), strategy use (see Q5), and more. CPS has also been used as a label for a predominantly cognitive ability (i.e., "complex problem solving ability"). Another, innocuous but also rather uninformative use of CPS would be as a label for a class of observed behaviour that individuals exhibit when confronted with computer-simulated microworlds. These are just a few examples of the diversity in the use of the label CPS (i.e., as a description of a research methodology, as a description of latent psychological constructs, or as a description of observable behaviour). Such diversity and the resulting lack of clarity in meaning create a considerable challenge to passing a verdict on CPS research's raison d'être (see Q1). This diversity problem will not necessarily be solved by focussing on only one use and ignoring the other alternative uses. By looking at the body of empirical research on CPS related to abilities one might get the impression that, depending on the correlations found to scores from other measures of cognitive abilities (e.g., reasoning tests), CPS is "narrowed or downgraded" to a skill, or "widened or upgraded" to a competency, or anything in between.

¹ The exact quote is "... there is always a well-known solution to every human problem – neat, plausible, and wrong" (Mencken, 1921, p. 158).

10.11588/jddm.2019.1.69301 | JDDM | 2019 | Volume 5 | Article 12
Such conceptual flexibility is a side effect of a predominantly operational definition of CPS (i.e., CPS is what one measures with tasks that carry the label CPS). Treating ability, skill, and competency as synonyms, as happens occasionally, makes the problem of vagueness even worse. Or, as they say, what happens in vagueness stays in vagueness. There are, of course, attempts to get a more conceptual handle on the definition problem. In a more recent attempt to navigate the complex landscape of CPS research, Dörner and Funke (2017) have suggested that the term complex problem solving should be reserved for dealings with ill-defined problems. There is some irony in this, as CPS research appears rather ill-defined itself. This seems to create an odd impasse: any effort to define CPS would make it potentially non-CPS. The situation, however, is a bit more complex than that. The distinction between well-defined and ill-defined is not a dichotomy, which creates the challenge of determining where well-defined ends and ill-defined begins on an imaginary definition scale. In addition, problems that were previously considered ill-defined might become ever so slightly better defined as we progress in our conceptual understanding. The fact that we have contemplated the same or similar questions over the past four decades seems to suggest that the pace with which we make conceptual progress is rather slow. Whilst this might reduce the risk of dealing with moving targets when it comes to defining CPS, it should not be mistaken as justification for contentment with the status quo. Another way of addressing the ambiguity problem, and of catering towards a more or less peaceful co-existence of different conceptual foci and methodological approaches in CPS research, would be to declare CPS a mere "umbrella" term.
Emphasising commonalities (e.g., shared interests, goals, and tool use) and downplaying (real or otherwise) differences was necessary and strategically functional in the early days of CPS research, when the primary challenge was to establish a "new idea". Having now reached a state where everyone seems to have staked their claims in this mine (heigh-ho, heigh-ho!), the all-encompassing umbrella has outlived its usefulness. One side effect of continuing to subsume considerable diversity under a common label is that constructive impulses (e.g., via feedback from outside or within CPS research) tend to have limited productive impact, as it seems to be always "the others" who should take notice. It is not too difficult to see how that can stifle conceptual progress in a research area such as CPS.

A non-definition of an ill-defined problem

In seeming contradiction to the above-mentioned notion of "true" CPS being ill-defined, I continue with an attempt to offer some clarifications regarding the term complex problem solving. I start at the end: problem solving. A simple, yet powerful and therefore widely accepted framework of problem solving posits that all problems comprise three components: a current state of affairs; a goal state of affairs; and a set of steps that need to be taken to move from one to the other. I would argue that this applies even to ill-defined problems. Problems vary in the ways in which these components are specified or known, and in which of these components are expected to be identified by the problem solver. That means the actual problem could be either to identify the set of actions needed to reach a goal state, or to find out what the unknown end state is going to be, or to "backward engineer" what the initial state of affairs was, given the end state and the set of transition rules.
For example, in a performance management context, having received feedback that current performance levels are deemed unsatisfactory (i.e., an evaluative framing of the current state of affairs) and being made aware of what the expected performance levels are (i.e., defining the goal state of affairs) leaves one with the problem of identifying how to move from one to the other. Or, in the aftermath of the financial collapse of a company, one might be tasked with identifying "where it all went wrong". The problem solving focus here would be on the transition processes from retrospectively inferred previous states of affairs that have led to the current state of affairs. Another example might be to estimate the likelihood of convalescence (i.e., the goal state of affairs) given the clinical symptoms presented in a patient (i.e., the current state of affairs) and the knowledge about the effect principles of a specific combination of treatments (i.e., a set of probabilistic transitions). Such a rather simple framework, using three basic problem components and varying what is known and what is unknown, has the potential to capture a wide range of (even "real-life", see Q2) problems. Problems, of course, also differ with regard to the level of detail with which these components are or can be specified. "Real-life" problems tend to be more towards the lower end of this spectrum and might therefore be considered ill-defined. Here the effort required in clarifying the current state of affairs and/or the desired state of affairs can be much greater than figuring out the actual steps that need to be taken to move from one to the other (Wood, Cogin, & Beckmann, 2009). When striving for real-life relevance it is worth keeping in mind that not all "real-life" problems are necessarily "ill-defined", and being "ill-defined" does not make a problem a "real-life" problem.
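The three-component framework and its permutations of known and unknown components can be made concrete in a minimal sketch. All names and example states below are illustrative, not taken from the original paper:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Problem:
    """A problem as three components; whichever is None is the unknown
    component the problem solver has to identify."""
    current_state: Optional[str]
    goal_state: Optional[str]
    steps: Optional[Tuple[str, ...]]

    def task(self) -> str:
        if self.steps is None:
            return "identify the actions that reach the goal"
        if self.goal_state is None:
            return "find out what the end state will be"
        return "backward-engineer the initial state"

# Performance management: both states framed, the transition is unknown
print(Problem("unsatisfactory performance", "expected performance", None).task())
# Prognosis: symptoms and treatment effects known, the outcome is unknown
print(Problem("clinical symptoms", None, ("treatment A", "treatment B")).task())
```

Varying which field is `None` reproduces the three problem types in the examples above, including the "where it all went wrong" case, where the initial state is inferred from the end state and the transitions.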
Real-life relevance (see Q3) or "ecological validity" (one of those terms afflicted by inflationary use) of CPS is neither "proven" via correlations with some "real-life" variables (see Q2), nor achieved by "making it look like the real thing". The use of semantically enriched cover stories and accordingly labelled variables in CPS tasks can be seen as one example of such attempts. As has been shown (e.g., Beckmann & Goode, 2014, 2017), this more often than not has detrimental effects on the quality of what is measured in the end. As Borsboom and colleagues remind us, "... the problem of validity cannot be solved by psychometric techniques or models alone. On the contrary, it must be addressed by substantive theory. Validity is the one problem in testing that psychology cannot contract out to methodology" (Borsboom, Mellenbergh, & van Heerden, 2004, p. 1062). Reflections in relation to Q2 could benefit from a better conceptual foundation. One impulse to that effect could come from Lewin's notion of Geschehenstyp (or "principle of the type of event", as an imperfect translation; Lewin, 1927/1992). It posits that the task paradigm used in experimental, laboratory-based research needs to reflect the structure and function of the processes involved in the class of real-life situations that is targeted (see Q3). It is the "common logic of research in the laboratory and the field" that generalisation claims have to be based on (Gadenne, 1976). In the context of CPS research this means that the cognitive (or other) processes triggered by experimental tasks need to overlap with what we expect to take place when dealing with said real-life problems.
As said, this cannot be achieved through attempts to create some form of superficial resemblance of surface features (such as variable labels) between the lab task and the targeted real-life problem. It means, however, that some ex ante ideas are needed about both the real-life problem and the laboratory task (see Q3). Such ideas have to be derived from theory-based task analyses. Consulting what psychological theories have to offer (truly) prior to engaging in data collection reduces the risk of declaring post hoc interpretations of correlation patterns as "proof of a theory". Another advantage of putting the conceptual horse back in front of the empirical carriage is that it will help us to identify where gaps exist in our theories. Meaningful research should focus on strengthening the theoretical foundations of our conceptual understanding of the psychological processes involved when people attempt to deal with complex, dynamically changing challenges in their lives. The empirical side of research is a means to that end. Consulting "the oracle of numbers" instead cannot serve as a substitute for the conceptual work we ought to be doing.

The elephant in the ... problem space

After some clarifications regarding problem solving, I will now focus on the remaining component of the term CPS, which is "complex". As has been discussed previously at length and in detail (Beckmann & Goode, 2017; see also Beckmann, Birney, & Goode, 2017; Beckmann, 2010; Birney, Beckmann, & Seah, 2016), I argue that the lack of a shared understanding of what we mean by complexity in CPS research is the major barrier to any meaningful progress. This problem is rooted in a predominantly data-driven take on CPS, which promotes the tendency to put the (empirical) cart before the (conceptual) horse. In research that starts with data and ends with data, the concept of complexity is substituted by the notion of difficulty.
Difficulty, however, is a psychometric concept with limited conceptual and explanatory reach. It descriptively informs us that one challenge (e.g., an intelligence test item or a CPS task) has been successfully tackled by fewer people than another challenge. The former is then declared to be more difficult than the latter. If one were interested in the reasons for such an outcome (i.e., an explanation), one would be left with a tautological argument (i.e., the first is more difficult because more people solved the second problem). Simply replacing difficulty with complexity, as happens notoriously often in CPS research, will not provide an explanation. As stated previously, "... complexity reflects ex ante considerations of the cognitive demands imposed by the task and the circumstances under which the task is to be performed ..., which makes complexity a primarily cognitive concept. Difficulty, in contrast, is experiential, person-bound and by definition, statistical" (Beckmann, Birney, & Goode, 2017, p. 2). By using difficulty and complexity interchangeably one mistakes descriptions for explanations.

We need to understand, or description ≠ explanation

Some research where certain result patterns are interpreted as indications of intuition (see Q6) or strategy (see Q5) could serve as examples of such a tendency. The term "intuition" seems all too readily used as an "explanation" for situations where perceived success in problem solving is seemingly independent of structural, causal, strategic, or prior "world-knowledge" (see Q4). Such a constellation of knowledge-free success could, however, be an effect of employing ineffective measures of problem-relevant knowledge, which often is a result of a weak conceptual foundation of the operationalisation of knowledge in CPS. There are, of course, other potential reasons for not finding correlational links between knowledge and problem solving performance.
For instance, when problem solvers are asked to bring a system's current state to a set goal state, they might follow a simple, knowledge-free heuristic of intervention-by-intervention optimisation (e.g., Beckmann & Goode, 2017; Beckmann & Guthke, 1995). Labelling such interaction behaviour "intuition" would be misleading. An uncritical use of "intuition" in CPS creates the considerable risk of masking remaining limitations in our conceptualisation (i.e., underpinning theories) and operationalisation (i.e., the variables derived from measures), and the links between the two. The underlying problem is that many labels we tend to use for describing observed phenomena in CPS carry pseudo-explanatory meanings, so that descriptions are mistaken as explanations. This also applies to the use of "strategy" (see Q5) for labelling clusters of interaction patterns that have been identified by statistical routines trawling sets of secondary data (a.k.a. log file analysis). Again, "intuition" would be an attractive candidate for labelling interaction behaviour that has resulted in acceptable outcomes, yet does not exceed the statistically necessary systematicity threshold for constituting a "strategy" cluster. In other words, failing to identify some sort of systematicity pattern (or residual, in statistical terms) in the average problem solver's interaction with the task, which would otherwise be given the label "strategy X", is not necessarily a reason for labelling it "intuition". In the context of strategy-focussed CPS research that starts with a priori considerations of what effective and efficient interaction behaviour for knowledge acquisition should look like, the so-called VOTAT² strategy plays a prominent role. VOTAT, however, has little relevance for dealing with real-world problems, as complex, dynamic ("ill-defined"?) problems faced "in the wild" tend not to afford one the freedom to vary none or only one variable at a time.
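The point about VOTAT (vary-one-variable-at-a-time) and its VONAT refinement (vary-one-or-none-at-a-time) can be illustrated with a toy microworld. The linear dynamics and coefficients below are invented for this sketch: an explorer who never makes a "vary none" (zero) intervention cannot separate an input's effect from the system's autonomic change.

```python
# Toy microworld: per trial, the output changes by A*x1 + B*x2 + DRIFT,
# where DRIFT is an autonomic change that occurs regardless of any input.
A, B, DRIFT = 2.0, -1.0, 0.5  # illustrative values only

def output_change(x1: float, x2: float) -> float:
    return A * x1 + B * x2 + DRIFT

# VOTAT without a zero intervention: the drift contaminates the estimate.
naive_a = output_change(1, 0)            # 2.5, not the true effect of x1
# VONAT: a "vary none" trial first isolates the autonomic change ...
drift_est = output_change(0, 0)          # 0.5
# ... so the single-variable trials can then be corrected for it.
a_est = output_change(1, 0) - drift_est  # 2.0
b_est = output_change(0, 1) - drift_est  # -1.0
```

In this sketch the uncorrected estimate misattributes the autonomic drift to the varied input, which is why the zero intervention is a precondition for VOTAT-style exploration to be informative.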
Hence, strategies used by successful problem solvers in the lab will not necessarily separate the successful from the less successful problem solvers in real life (see Q2, Q3, & Q5). Using expert–novice comparisons as another example of a research topic within CPS (see Q7), conceptual groundwork needs to clarify a few basics before engaging in data collection (or the analysis of existing data). For instance: given that expertise is domain- and knowledge-specific, and given that "true" complex problem solving is "ill-defined" and/or "ill-structured", what would CPS expertise look like? The answer to this question has implications for how to measure CPS expertise. Another issue that should be addressed conceptually prior to engaging in the empirical would be: does it take 10 years of deliberate practice to become an expert in CPS? If so, what would be considered deliberate practice in CPS? Or, do the artificial time horizons typical for computerised CPS tasks or simulations allow for a more time-efficient acquisition of expertise? Does (extensive) experience in dealing with CPS tasks as they are used in CPS research satisfy the criterion of expertise? Given the diversity of approaches to measure CPS, predictions (i.e., made prior to "knowing the outcome") of where expertise is expected to shine and where not would be needed.

² VOTAT, which stands for vary-one-variable-at-a-time, is only functional if such interaction behaviour is preceded by a zero intervention, which is needed to identify autonomic changes in the to-be-explored system. Therefore, the "desired" strategy should more appropriately be labelled VONAT (vary-one-or-none-at-a-time; Beckmann, Birney, & Goode, 2017, p. 4; Beckmann & Goode, 2014, p. 279; Beckmann & Goode, 2017, p. 9).
Otherwise one is confronted with a situation where those who outperform others are post hoc declared experts, which, of course, will have limited explanatory value. Unexpected (or undesired?) correlation patterns can be dealt with by either expressing doubts regarding the status of expertise achieved by the problem solvers involved in the study, or by referring to the limited accessibility of experts' knowledge – be it declarative or strategic (see Q4) – due to their higher levels of cognitive automatization, which, consequently, might even lead to the "conclusion" that experts come to problem solutions intuitively (see Q6). Post hoc interpretations such as these are readily available to "protect" a potentially inadequate conceptual foundation from being challenged (and eventually improved).

Structure in the ill-structured

Research in CPS (as is the case for any empirical research in the social sciences) can be characterised as a hierarchy of three objectives: description, explanation, and intervention. "... a proper description of the phenomena of interest is a necessary (but not sufficient) precondition for developing adequate levels of understanding of the causal mechanisms that underlie them. An adequate understanding or explanation of the phenomena under question is, again, a necessary, yet not sufficient precondition for research to have meaningful impact in the 'real world,' for example, in form of effective interventions." (Beckmann, 2018, p. 121). Each objective calls upon specific sets of methodologies, including research design and sampling. For instance, the analysis of secondary observational data – as it is the foundation of log file analyses – would represent a mismatch to ambitions of establishing an understanding of causal effects. Or, if we were to consult the seven dwarfs in the well-known fairy tale: while going about their mining work enthusiastically, the short folks sing "We dig dig dig ...
up everything in sight; we dig up diamonds by the score; a thousand rubies, sometimes more; but we don't know what we dig 'em for; we dig dig dig a-dig dig ...". I am inclined to interpret these lyrics as a reminder of the potential risks that lie in data mining. Based on descriptive data we cannot make (post hoc) conceptual, explanatory claims. Descriptions of observed effects should not be interpreted as evidence of understanding. To avoid misunderstandings: log file analyses are certainly useful approaches to describe problem solvers' (average) behaviour when interacting with computerised CPS tasks. Results of log file analyses can be effective in forming preliminary speculations about the cognitive, conative, or other psychological processes involved. Such speculations, translated into hypotheses, however, need to be properly tested in controlled experiments before we can claim to have gained some conceptual insights regarding their role in CPS.

Conclusion = solution?

In this paper I have shared thoughts regarding some issues in CPS research that were prompted by a set of seven questions raised by the editors of this special issue. I wish these reflections to be perceived as an encouragement to revert to the conceptual foundations of problem solving and decision making. If CPS research were to do so, I suspect two things would emerge. On the one hand, I expect a realisation that we already know more than our current research practices seem to suggest. On the other hand, however, we will notice that our theoretical foundation has substantial limitations in describing, explaining, and prescribing real-world problem solving. Our efforts should primarily be directed towards the development or refinement of problem solving theories. Deficits in theory cannot be compensated for by, say, larger sets of data.
Such a necessary reorientation would have to start with research geared towards a thorough description of the phenomena of interest (but without misinterpreting this as explanation), which then creates the foundation for research that aims to establish an understanding of these phenomena. The methodology necessary for that differs considerably from the one suitable for the purpose of description. Only if subsequent research efforts to prescribe interventions or pedagogies are based on such understanding will we be in a promising position to devise effective strategies for the development and acquisition of problem solving competencies that enable individuals or groups of individuals to tackle the complex challenges in real life. This stricture also applies to using CPS tasks as assessment tools. Their meaningfulness and usefulness cannot be established by primarily relying on correlation matrices; rather, it requires sufficient levels of conceptual understanding first. The linchpin of such a reprioritisation of research efforts in CPS is a solid conceptual understanding of complexity. I started with the expectation that the seven questions stated at the beginning cannot be addressed by one single answer (as was the case for the questions asked by the seven dwarfs as they tried to establish who had broken into their house). I would like to argue, however, that these seven CPS-related questions should be addressed with a single question: why do we not have a better understanding of what complexity means in complex problem solving? If we continue to fail to address this question, we are likely to mull over the same seven questions stated at the beginning of this paper in, say, ten years' time, should CPS research not already have fallen into disregard by then. As it was a stumble that brought Snow White back to life in the fairy tale, I hope that an overdue step change in CPS research, namely to start with theory and not with data, will breathe fresh life into it too.
Declaration of conflicting interests: The author declares he has no conflict of interests.

Author contributions: The author is completely responsible for the content of this manuscript.

Handling editors: Andreas Fischer and Wolfgang Schoppek

Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Citation: Beckmann, J. F. (2019). Heigh-ho: CPS and the seven questions – some thoughts on contemporary complex problem solving research. Journal of Dynamic Decision Making, 5, 12. doi: 10.11588/jddm.2019.1.69301

Published: 31 Dec 2019

References

Beckmann, J. F. (2010). Taming a beast of burden – on some issues with the conceptualization and operationalisation of cognitive load. Learning and Instruction, 20, 250-264. doi: 10.1016/j.learninstruc.2009.02.024

Beckmann, J. F., & Goode, N. (2014). The benefit of being naïve and knowing it: The unfavourable impact of perceived context familiarity on learning in complex problem solving tasks. Instructional Science, 42(2), 271-290. doi: 10.1007/s11251-013-9280-7

Beckmann, J. F., & Guthke, J. (1995). Complex problem solving, intelligence, and learning ability. In P. A. Frensch & J. Funke (Eds.), Complex problem solving: The European perspective (pp. 177-200). Hillsdale, NJ: Erlbaum. doi: 10.4324/9781315806723

Beckmann, J. F. (2018). Deferential trespassing: Looking through and at an intersectional lens. New Directions for Child and Adolescent Development, 161, 119-123. doi: 10.1002/cad.20243

Beckmann, J. F., & Goode, N. (2017). Missing the wood for the wrong trees: On the difficulty of defining the complexity of complex problem solving scenarios. Journal of Intelligence, 5, 15. doi: 10.3390/jintelligence5020015

Beckmann, J. F., Birney, D. P., & Goode, N. (2017).
Beyond psychometrics: The difference between difficult problem solving and complex problem solving. Frontiers in Psychology, 8, 1739. doi: 10.3389/fpsyg.2017.01739

Birney, D. P., Beckmann, J. F., & Seah, Y. Z. (2016). More than the eye of the beholder: The interplay of person, task and situation factors in evaluative judgments of creativity. Learning and Individual Differences, 51, 400-408. doi: 10.1016/j.lindif.2015.07.007

Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061-1071. doi: 10.1037/0033-295x.111.4.1061

Dörner, D., & Funke, J. (2017). Complex problem solving: What it is and what it is not. Frontiers in Psychology, 8, 1153. doi: 10.3389/fpsyg.2017.01153

Gadenne, V. (1976). Die Gültigkeit psychologischer Untersuchungen [The validity of psychological inquiry]. Stuttgart: Kohlhammer.

Lewin, K. (1927). Gesetz und Experiment in der Psychologie [Law and experiment in psychology]. Symposium, 1, 375-421.

Lewin, K. (1992). Law and experiment in psychology. Science in Context, 5, 385-416. doi: 10.1017/s0269889700001241

Mencken, H. L. (1921). Prejudices: Second series. London: Jonathan Cape.

Wood, R. E., Cogin, J., & Beckmann, J. F. (2009). Managerial problem solving: Frameworks, tools & techniques. McGraw Hill Australia.

Corresponding author: Jens F. Beckmann, School of Education, Durham University, Durham DH1 1TA, United Kingdom.
Email: j.beckmann@durham.ac.uk

Original Research

Dynamic sunk costs: Importance matters when opportunity costs are explicit

Jason Harman¹, Claudia González-Vallejo², and Jeffrey B. Vancouver²
¹Louisiana State University, Department of Psychology, USA, and ²Ohio University, Department of Psychology, USA

The sunk cost fallacy is a well-established phenomenon whereby decision makers continue to commit resources, or escalate commitment, because of previously committed efforts, even when they know that their returns will not outweigh their investment. Most research on the sunk cost fallacy uses hypothetical scenarios in which participants make a single decision to continue with a project or to abandon it. This paradigm has several limitations, resulting in a relatively limited understanding of sunk cost behavior. To address some of these limitations, we created a dynamic repeated-choice paradigm in which sunk costs are learned over time and opportunity costs are explicit. Over three experiments we show that the sunk cost fallacy depends on the relative a priori importance of the goal being invested in.
We observed escalation of commitment only when the sunk cost domain was more important than the alternatives (explicit opportunity costs). Participants showed de-escalation of commitment to the sunk cost domain otherwise.

Keywords: sunk cost fallacy; opportunity costs; repeated decisions; resource allocation; multiple goal pursuit

Introduction

Time may be our most valuable resource, and balancing our time among multiple pursuits is a pervasive and recurrent decision. From simple choices, whether to answer an email or chat with a colleague, to more difficult ones such as whether to continue to pursue a college major that is unsatisfying, how we choose to spend our time may be one of the most important decisions we make on a daily basis. One complicating factor in larger allocation decisions, such as changing a college major, is the time and energy already spent pursuing that goal. Though normative economic theories posit that only incremental costs (the cost/time spent moving forward) should influence decisions, a large body of evidence has shown that previously invested resources influence future decisions. Most notably, the sunk cost effect (Arkes & Blumer, 1985) refers to the tendency to continue to invest in a project that clearly has no future benefit, driven by the influence of previously invested resources. In a typical sunk cost experiment, a person is presented with a hypothetical project that is nearly completed when it becomes apparent that the project will not produce any benefit. The person is then asked if he or she should commit resources to complete the project. Although additional investment would mean throwing good money after bad, a majority of participants choose to complete the project, presumably because they view the already invested resources as wasted if the project is not finished (Arkes, 1996; Arkes & Hutzel, 1997).
A majority of sunk cost studies use the hypothetical one-time choice scenario outlined previously (notable exceptions include Arkes & Blumer, 1985, Experiment 2, and Strough, Schlosnagle, Karns, Lemaster, & Pichayayothin, 2013). Two limitations of this paradigm are that the choices are one-time, whereas real-life decisions of this type are made repeatedly over time, and that opportunity costs, i.e., other choice options that the resources could be put toward, are often not made explicit (see Northcraft & Neale, 1986, for one exception). For example, a decision to change your college major is a one-time choice; however, if you choose not to change, the choice is available every day, and your preference for continuing a commitment could change over time. A related point is that sunk costs are not always known with certainty. Dissatisfaction with a college major today could indicate that a future provided by another major would be more rewarding, or it could be a passing phase. In this case, one would expect certainty to grow with accumulating evidence of dissatisfaction, leading to stronger confidence that a course of action is becoming a sunk cost over time. The second shortcoming of typical scenario-based sunk cost studies is the lack of explicit opportunity costs. When presented with a choice to continue a project or not, the framing of the choice is between a sure loss (not continuing) and a probable larger loss (continuing). In reality, not continuing a project frees resources to be used elsewhere and should actually be seen as a sure loss with probable gains. Northcraft and Neale (1986) confirmed this framing, showing that participants did not consider opportunity costs when these were not made explicit. When opportunity costs were made explicit, 12 of 20 participants switched to discontinue earlier choices. A related note is that with no explicit opportunity costs, the relative importance of a domain is unclear.
Sunk costs may be easy to avoid in unimportant domains but more difficult to avoid in important ones. No previous research has examined the a priori importance of a domain in a sunk cost experiment.

Corresponding author: Jason Harman, e-mail: jharman@lsu.edu
10.11588/jddm.2020.1.71968 | JDDM | 2020 | Volume 6 | Article 2

Over three studies, we examined how people dynamically allocate their time among different pursuits when one pursuit does not improve sufficiently with investment (i.e., becomes a sunk cost). The paradigm we created is unique in that sunk costs are not explicitly stated; instead, they are learned over time through passive feedback. Additionally, our paradigm makes opportunity costs explicit (domains to which time is not allocated decay), and we collected data on the perceived importance of the domains beforehand. We find a dynamic sunk cost effect in two of the three studies, with the key feature being the a priori importance of a domain. When a sunk cost domain was clearly more important than the opportunity costs, participants escalated commitment toward that domain. In all other cases, participants de-escalated commitment.

Experiment 1

Methods

To explore time allocation decisions, we created a computer game called Sim-Life (see Figure 1) in which participants made repeated choices about how to spend their free time among three domains. The status of each domain (i.e., the cumulative rewards or losses) is displayed throughout the task; the status of a domain improves over time when it is selected and decays over time when it is not. Participants made 100 choices, simulating the number of days in an academic quarter. After each choice, participants were shown a 5-second slide show of pictures representing the domain they had just selected, to simulate the passage of time.
The status of each domain was calculated on a 100-point scale; however, participants were not aware of their exact status at any point. Instead, visual feedback was presented on a scale with ten categorical levels (i.e., the scale only moved when the underlying score crossed a ten-point threshold; see Figure 1). Table 1 displays the feedback functions for the task used in all three experiments. Because a domain's status decreases when it is not chosen, the opportunity costs of choosing only one or two options are learned over time. All domains were set to 90 on trial one, visually displayed as the next-to-highest point on the scale, and changed according to the choices participants made. We varied the manner in which a domain was a sunk cost. In Condition 1, choosing the sunk cost domain yielded only relatively small point increments. In Condition 2, not choosing the sunk cost domain produced relatively large point decrements. That is, selection produces relatively little reward, or non-selection produces relatively large losses. Although we predicted learning effects between these conditions (sunk costs would be identified faster in Condition 1 than in Condition 2), these differences did not affect any of the global results reported here and are thus omitted for brevity. Before starting the game, participants indicated how important each domain was to them in three separate questions: first, a rank ordering of the three domains; second, a rating of each domain's importance on a scale from 1 to 5; and finally, an allocation of 100 points among the three domains representing their relative importance. Results from these three measures were consistent, and we present the mean ratings for all three measures in Table 2.
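The feedback dynamics just described can be sketched in a few lines of code. This is our reconstruction, not the authors' implementation, and it makes two assumptions the text does not state: the chosen domain's gain replaces (rather than adds to) its decay on that trial, and scores are clamped to the 0-100 scale. All names and the Condition 1 values are taken from Table 1.

```python
# Sketch of the Sim-Life status dynamics (Table 1, Condition 1).
# Assumptions (not stated in the paper): the chosen domain gains instead of
# decaying on that trial, and scores are clamped to the 0-100 scale.

GAINS = {"sunk": 4, "x": 7, "y": 7}  # points when a domain is chosen
DECAY = -3                           # points when a domain is not chosen

def run_trials(choices):
    """Update all three domain statuses after every choice; all start at 90."""
    status = {d: 90 for d in GAINS}
    for choice in choices:
        for d in status:
            status[d] += GAINS[d] if d == choice else DECAY
            status[d] = max(0, min(100, status[d]))  # keep on the 100-point scale
    return status

def displayed_level(score):
    """Participants saw only ten categorical levels, not the exact score."""
    return min(score // 10, 9)

# A participant who always picks the sunk cost domain starves the other two:
final = run_trials(["sunk"] * 10)  # sunk stays near the top; x and y drop
```

The clamp and the ten-level display function show why opportunity costs had to be learned: the coarse scale only moves after a ten-point threshold is crossed, so several trials of decay can pass before the loss becomes visible.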
In the game, participants were told that they have a set schedule of classes, studying, and work that leaves them about two hours of free time each day, which they can choose to spend on one of the three domains. In Experiment 1, the three domains were friends, their romantic partner, and academics. In addition to the reason for the sunk cost (weak reward vs. strong loss), our other between-subjects factor was which domain was instantiated as the sunk cost (i.e., a losing domain). Based on pilot data, we expected academics to be considered more important than the other two domains.

Figure 1. Screenshot of the Experiment 1 paradigm.

Participants

Fifty-nine undergraduate participants (37 female, median age = 20) were randomly assigned to one of three conditions. The only difference between conditions was which domain was instantiated as a sunk cost: relationship, friends, or academics.

Table 1. Status points as a function of selection or non-selection for the three domains.

                     Sunk cost   Domain X   Domain Y
Condition 1
  Decay per trial       -3          -3         -3
  If chosen             +4          +7         +7
Condition 2
  Decay per trial       -6          -3         -3
  If chosen             +7          +7         +7

Figure 2. Experiment 1 choice frequency. The mean choice frequency across blocks of ten trials for each domain is displayed for each condition. The sunk cost domain, indicated for each condition above the graph, is represented by the dashed line.

Results

Mean importance ratings for each experiment are shown in Table 2 along with standard errors. For simplicity, we present only the results of paired-samples t-tests based on the 1-5 importance ratings, using a Bonferroni correction. Consistent with pilot studies, participants rated academics as more important than relationship, t(58) = 7.28, p < .001, and friends, t(58) = 8.23, p < .001. There was no difference between relationship and friends, t(58) = 0.146, p = .88.
To analyze choice over time, we split the data into 10 sequential blocks of trials and calculated the choice frequency for each domain in each block. Mean choice proportions for each condition are shown in Figure 2, with the domain instantiated as a sunk cost highlighted. To examine choice behavior, individual change scores across the ten blocks of trials were calculated for each participant in each domain. Negative scores indicate de-escalation of resource allocation across trials; positive scores indicate increased resource allocation. To test whether choices for the sunk cost domain and the other two domains differed, we performed a repeated-measures ANOVA on the change scores from each domain across the three conditions, revealing a significant interaction between domain change scores and which domain was the sunk cost, F(4, 110) = 2.658, p = .04, η² = .081. When academics (the most important domain) was the sunk cost, participants escalated their resource allocation to academics as trials progressed (mean change score = 2.03, SD = 2.83, significantly different from zero, t(30) = 3.99, p < .001). When the friends domain was the sunk cost, participants on average de-escalated commitment, allocating fewer choices to the domain as trials progressed (mean change score = -1, SD = 2.68), though this was not significantly different from zero, t(15) = 1.49, p = .15. When relationship was the sunk cost, participants neither escalated nor de-escalated resource allocation to that domain across trials (mean change score = 0.1765, SD = 2.5, t(15) = 1.49, p = .15). To quantify the detrimental effect of escalating the commitment of resources to a dynamic sunk cost, we examined the status of each domain at the end of the task. The status of each domain is a direct function of choice frequency (given the feedback structure in Table 1). The mean final status on a scale of 1 to 100 for each condition in Experiment 1 is shown in Table 3.
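The blocked analysis above can be illustrated concretely. A minimal sketch, assuming a change score is simply the last-block choice count minus the first-block count (the paper does not spell out the exact computation, and the function and variable names are ours):

```python
def block_frequencies(choices, domain, block_size=10):
    """Choice counts for `domain` in each sequential block of trials."""
    return [choices[i:i + block_size].count(domain)
            for i in range(0, len(choices), block_size)]

def change_score(choices, domain, block_size=10):
    """Positive values indicate escalation across blocks, negative values
    de-escalation (one plausible operationalization: last minus first block)."""
    freqs = block_frequencies(choices, domain, block_size)
    return freqs[-1] - freqs[0]

# A hypothetical participant who escalates commitment to academics:
trials = (["friends"] * 8 + ["academics"] * 2    # block 1: 2 academic picks
          + ["friends"] * 2 + ["academics"] * 8  # block 2: 8 academic picks
          + ["academics"] * 80)                  # blocks 3-10: all academic
```

For this hypothetical participant, the academics change score is positive (escalation) and the friends score negative (de-escalation), mirroring the direction of the effects reported above.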
When participants cut their losses, that is, stopped selecting domains that were not advantageous (friends and relationship), they did better overall, with a mean of 2.05 in each of the sunk cost domains and an overall mean of 32.45 across all domains. In contrast, when the domain they cared most about (academics) had a sunk cost structure, they escalated commitment to that domain and earned fewer points altogether, with a final mean of 24.4 in that domain and an overall mean of 11.96 across all three domains (compared to 32.45 in the other conditions).

Discussion

Results from Experiment 1 show that participants will escalate commitment in the face of sunk costs when the choice involves an important domain, in this case academics. When the sunk cost was relationship or friends, participants did not appear to have much difficulty de-escalating their time allocation to that domain. Although our intuition is that the importance of the domain drives the observed results, any number of alternative factors could have produced the difference between the academics condition and the others. For example, participants were college students performing an experiment in an academic building and could easily have been responding in a manner consistent with presentation bias. Additionally, academics is a more quantifiable domain than the social domains of relationship and friends, and the labeling of the feedback reflects this difference (see Figure 1). To control for these possible confounds, Experiment 2 was designed to make the three domains more equivalent and comparable. Analyses using the other two measures of domain importance are all consistent with the reported results.

Table 2. Mean importance ratings for each domain in Experiments 1 to 3 (standard errors in parentheses).
Importance was measured with three different scales: rank order, a five-point Likert scale, and a 100-point allocation.

                      Experiment 1                           Experiment 2                          Experiment 3
                      Relationship Friends     Academics     English     History     Math          Sociology   History     Psychology
Rank order            2.2 (.64)    2.4 (.70)   1.2 (.50)     2.2 (.79)   1.7 (.71)   1.9 (.81)     2.5 (.72)   2.2 (.53)   1.2 (.57)
Importance (1-5)      3.3 (1.1)    3.3 (1.1)   4.5 (.62)     3.2 (1.3)   4.1 (1.0)   3.8 (1.2)     2.7 (1.2)   3.2 (1.0)   4.5 (.99)
100-point allocation  25.8 (11.3)  23.6 (11.5) 48.9 (14.1)   27.2 (11.7) 38.2 (12.6) 34.5 (11.1)   24.0 (13.6) 26.5 (10.3) 49.5 (15.2)

Experiment 2

To control for differences between domains, this experiment used the same paradigm but with three academic classes as the domains, putting all feedback on the same scale of grades (see Figure 3). The classes were chosen so that there would be no clear a priori differences in domain importance. We predicted that participants in Experiment 2 would de-escalate commitment in the face of sunk costs, regardless of which domain was instantiated as the sunk cost domain.

Methods

The same procedure as in Experiment 1 was used with 72 undergraduate participants (44 female, median age = 20). All participants provided informed consent and received course credit for participating. The three domains in this experiment were three academic classes: history, English, and math. Participants were instructed to allocate their studying time among the three classes however they wanted over the course of 100 simulated days.

Figure 3. Screenshot from Experiment 2.

Results and Discussion

Mean importance ratings show that English was rated less important than history, t(71) = 4.95, p < .001, and math, t(71) = 2.67, p < .01. There was no difference in importance ratings between history and math, t(71) = 2.076, p = .05. Figure 4 shows the mean choice proportion for each domain across trials.
In each condition, participants initially allocated more resources to the sunk cost domain (presumably attempting to keep the three domains equal before realizing it was a sunk cost) but decreased their choices of the sunk cost domain as trials progressed. There was a significant interaction between the change scores from each domain and the three sunk cost conditions, F(4, 138) = 10.55, p < .001, η² = 2.15. The decrease in choices of the sunk cost domain was significant for English (M = -1.42, t(23) = -3.729, p < .001), history (M = -1.33, t(26) = -2.66, p = .013), and math (M = -1.95, t(20) = -4.34, p < .001) in their respective sunk cost conditions. Experiment 2 showed that when the three domains were comparable, participants cut their losses in the face of sunk costs; that is, they behaved normatively. We designed the stimuli so that the three domains would be equally important. Though the importance ratings differed, this was driven primarily by English being rated less important than history or math; the difference in ratings between history and math was very small and non-significant, with no class clearly more important than the other two. To further test the explanation that domain importance is crucial to escalating commitment in the face of sunk costs, Experiment 3 extended both of the previous experiments, using three academic classes but manipulating the importance of one.

Experiment 3

In this experiment we used the academic classes of history, sociology, and psychology. The importance of one domain, psychology, was manipulated by telling participants that "although you would like to have the highest GPA at the end of the quarter, psychology is required for your major and you need to obtain a C or better to avoid retaking the class."
Methods

With the exception of the additional instructions, Experiment 3 followed the same procedure as the first two experiments, with 96 undergraduate participants (55 female, median age = 20) who provided informed consent and received course credit for participating.

Figure 4. Experiment 2 choice frequencies. The mean choice frequency across blocks of ten trials for each domain is displayed for each condition. The sunk cost domain, indicated for each condition above the graph, is represented by the dashed line.

Results and Discussion

Table 2 lists the mean importance ratings for each domain. The manipulation of domain importance was successful, with participants rating psychology as more important than history, t(95) = 10.31, p < .001, and sociology, t(95) = 9.06, p < .001. Mean choice frequencies are shown in Figure 5. In the conditions where sociology or history was the sunk cost domain, choices mirrored the results of Experiment 2. Testing choice-frequency change scores against zero showed that participants de-escalated commitment for both sociology (M = -1.94, t(32) = -4.31, p < .001) and history (M = -1.94, t(32) = -5.02, p < .001) in their respective sunk cost conditions. In the condition where the manipulated domain, psychology, was the sunk cost, participants escalated commitment, increasing their allocation of time to psychology, M = 1.4, t(32) = 2.99, p < .01. As shown in Table 3, the mean value of each domain (essentially the grade point average at the end of the game) reflects the optimality of the choice strategy.
In the sociology and history conditions, participants on average failed the sunk cost domain but passed their other two classes (mean scores of 49 and 25 were represented as grades of C and D, respectively, during the game), whereas in the psychology condition, where participants escalated commitment in the face of sunk costs, the mean ending values indicate that the average participant spent so much time on psychology that he or she failed all three classes. Averaging the final status of an individual's three domains is analogous to a semester GPA. On this measure, participants in the psychology condition scored significantly lower than participants in both the history condition, t(61) = 5.49, p < .01, d = 1.4, and the sociology condition, t(61) = 6.18, p < .01, d = 1.6. There was no difference between the history and sociology conditions, t(61) = 0.393, p = .696, d = .09.

Table 3. Final status of each domain in Experiments 1-3. The mean (SD) final status (0-100) of each domain (row) is shown for each sunk cost condition (column).

Experiment 1
                Sunk cost condition
Domain          Relationship   Friends       Academics
Relationship    2.1 (4.9)      31.9 (33.1)   8.3 (17.7)
Friends         19.6 (29.2)    2.0 (4.9)     3.2 (9.4)
Academics       69.8 (30.0)    69.3 (33.8)   24.2 (21.3)

Experiment 2
                Sunk cost condition
Domain          English        History       Math
English         1.2 (3.5)      17.3 (21.8)   26.8 (31.1)
History         29.8 (21.2)    1.9 (5.3)     31.7 (29.5)
Math            24.1 (27.5)    21 (26.5)     1.04 (3.0)

Experiment 3
                Sunk cost condition
Domain          Sociology      History       Psychology
Sociology       1.6 (5.9)      24.3 (23.7)   4.9 (12.2)
History         25.4 (26.4)    0.48 (2.1)    5.7 (11.1)
Psychology      46.2 (24.7)    51.9 (20.7)   18.9 (26.1)

General Discussion

There is a large body of work in judgment and decision making detailing how human choices deviate from what are considered rational or normative standards (e.g., Shafir & LeBoeuf, 2002; Sleesman, Conlon, McNamara, & Miles, 2012).
Our work follows, expands, and qualifies some of these basic findings by showing that, in repeated choices, people escalate commitment to a domain in the face of sunk costs when the domain is important. In other situations, individuals are able to cut their losses, de-escalating commitment. In contrast to typical sunk cost studies, the present tasks made opportunity costs available through experience: participants saw decrements in the domains they did not choose. Additionally, this study adds to our knowledge of sunk costs by creating sunk costs that are dynamic (learned over time) and by measuring the perceived importance of each domain. This information was sufficient for individuals to adjust (de-escalate commitment) in domains they cared about equally, but not when one domain was more important.

Figure 5. Experiment 3 choice frequencies. The mean choice frequency across blocks of ten trials for each domain is displayed for each condition. The sunk cost domain, indicated for each condition above the graph, is represented by the dashed line.

Clearly, a more important domain provides greater utility per unit of gain; however, in the long run those gains could not compensate for the overall decrements observed in other aspects of the situations we created. In the last experiment, for example, when the most important course was psychology, participants earned on average only half of the points earned in the other subjects. Such a gain would make sense from a utility perspective only if psychology were weighted 75% or more above the other subjects. Even though we successfully manipulated the importance of the psychology course, the importance ratings do not reflect such an extreme split of weights.
We grant that the measures are imperfect; however, we believe that utility considerations alone do not explain the findings. The motivational reasons behind the unwillingness to give up on an important domain may be traced to individuals' life history and a culture that places great value on not giving up on things that matter. 'Waste not, want not' and 'finish what you start' are phrases that capture what children are taught to build character and achieve long-term goals (Arkes, 1996). Additionally, important domains have special emotional significance and could influence feedback processes in dynamic choice tasks, leading to choices that reflect recency or primacy (Hogarth & Einhorn, 1992; González Vallejo et al., 2013). Follow-up work is needed to more fully explore the processes that lead to dynamic sunk costs in important domains but not others. The implications of escalating commitment can be serious, both financially and personally. Aging, negative life events, and economic forces are but a few factors that can place people in a position where they would be better off disengaging from an activity once enjoyed (Wrosch, Scheier, Miller, Schulz, & Carver, 2003; Dohrenwend & Dohrenwend, 1974; Held, 1986; Wrosch & Freund, 2001). For example, new parents may find that they are unable to spend as much time on leisure activities as they once could. Beyond external changes, personal choices about time allocation can have detrimental effects on well-being; expanded work hours, for example, have negative effects on marital relations (White & Keith, 1990). Our results could inform research on interventions designed to improve well-being, including making opportunity costs more salient and reframing a problematic domain to minimize its perceived importance.

Declaration of conflicting interests: The authors declare no conflicts of interest.

Author contributions: All authors contributed equally to this work.
Handling editor: Wolfgang Schoppek

Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Citation: Harman, J., González-Vallejo, C., & Vancouver, J. B. (2020). Dynamic sunk costs: Importance matters when opportunity costs are explicit. Journal of Dynamic Decision Making, 6, 2. doi:10.11588/jddm.2020.1.71968

Received: 16.03.2020 | Accepted: 17.07.2020 | Published: 17.12.2020

References

Arkes, H. R. (1996). The psychology of waste. Journal of Behavioral Decision Making, 9(3), 213-224. https://doi.org/10.1002/(sici)1099-0771(199609)9:3%3C213::aid-bdm230%3E3.0.co;2-1

Arkes, H. R., & Blumer, C. (1985). The psychology of sunk cost. Organizational Behavior and Human Decision Processes, 35(1), 124-140. https://doi.org/10.1016/0749-5978(85)90049-4

Arkes, H. R., & Hutzel, L. (1997). Waste heuristics. In M. Bazerman, D. Messick, A. Tenbrunsel, & K. Wade-Benzoni (Eds.), Environment, ethics, and behavior: The psychology of environmental valuation and degradation (pp. 154-168). San Francisco, CA: New Lexington Press.

Baron, J. (2008). Thinking and deciding. New York: Cambridge University Press.

Dohrenwend, B. S., & Dohrenwend, B. P. (1974). Stressful life events: Their nature and effects. Oxford, England: John Wiley & Sons.

González Vallejo, C., Cheng, J., Phillips, N., Chimeli, J., Bellezza, F., Harman, J., Lassiter, G. D., & Lindberg, M. J. (2013). Early positive information impacts final evaluations: No deliberation-without-attention effect and a test of a dynamic judgment model.
Journal of Behavioral Decision Making, 27(3), 209-225. https://doi.org/10.1002/bdm.1796

Held, T. (1986). Institutionalization and deinstitutionalization of the life course. Human Development, 29(3), 157-162. https://doi.org/10.1159/000337845

Hogarth, R. M., & Einhorn, H. J. (1992). Order effects in belief updating: The belief-adjustment model. Cognitive Psychology, 24(1), 1-55. https://doi.org/10.1016/0010-0285(92)90002-j

Northcraft, G. B., & Neale, M. A. (1986). Opportunity costs and the framing of resource allocation decisions. Organizational Behavior and Human Decision Processes, 37(3), 348-356. https://doi.org/10.1016/0749-5978(86)90034-8

Plous, S. (1993). The psychology of judgment and decision making. McGraw-Hill.

Shafir, E., & LeBoeuf, R. A. (2002). Rationality. Annual Review of Psychology, 53(1), 491-517. https://doi.org/10.1146/annurev.psych.53.100901.135213

Shin, J., & Ariely, D. (2004). Keeping doors open: The effect of unavailability on incentives to keep options viable. Management Science, 50(5), 575-586. https://doi.org/10.1287/mnsc.1030.0148

Strough, J., Schlosnagle, L., Karns, T., Lemaster, P., & Pichayayothin, N. (2014). No time to waste: Restricting life-span temporal horizons decreases the sunk-cost fallacy. Journal of Behavioral Decision Making, 27(1), 78-94. https://doi.org/10.1002/bdm.1781

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124-1131. https://doi.org/10.1126/science.185.4157.1124

Wrosch, C., & Freund, A. M. (2001). Self-regulation of normative and non-normative developmental challenges. Human Development, 44(5), 264-283. https://doi.org/10.1159/000057066

Wrosch, C., Scheier, M. F., Miller, G. E., Schulz, R., & Carver, C. S. (2003). Adaptive self-regulation of unattainable goals: Goal disengagement, goal reengagement, and subjective well-being. Personality and Social Psychology Bulletin, 29(12), 1494-1508.
https://doi.org/10.1177/0146167203256921

White, L., & Keith, B. (1990). The effect of shift work on the quality and stability of marital relations. Journal of Marriage and the Family, 52(2), 453-462. https://doi.org/10.2307/353039

Opinion Article

Illuminating divergence in perceptions in natural resource management: A case for the investigation of the heterogeneity in mental models

Karlijn van den Broek
Heidelberg University

Much research has been dedicated to mapping mental models of natural resources to aid their effective management. The variety of approaches produces a variety of outputs, but most research in this domain reports mental models that have been aggregated across participants. This misrepresents mental models, as it overlooks valuable variance in understanding between individuals that could be key to effective decision-making. This paper illustrates such variance in mental models through a case study that explored mental models of the Nile perch fisheries at Lake Victoria. The case study suggests that divergence in mental models presents a barrier to effective management of the fisheries. Hence, this paper proposes avenues to further investigate and report the heterogeneity of mental models between and within individuals. Such research uncovers divergence in understanding, which can be addressed to aid decision-making in natural resource management.
Keywords: cognitive maps; decision-making; divergent perceptions; fisheries; Lake Victoria; method; mental models; natural resource management; stakeholders; system understanding

Investigating mental models of natural resources

Mental models are internal constructs that structure an external environment, facilitate interpretation, and function as an important factor in individual decision making (Denzau & North, 1994). These cognitive representations can reflect complex dynamic systems and their functioning, the components of the system (the driving forces), and its dynamics. Mental models allow a person to describe, explain, and predict system states, and allow decision-makers to adopt strategies for interacting with that system (Rouse & Morris, 1986; Veldhuyzen & Stassen, 1977). A comprehensive literature is available on mental models in relation to natural resources. These include people's mental models of climate change and vulnerability to natural hazards (Amelung, Fischer, Kruse, & Sauerborn, 2016; Bostrom, 2016; Dutt & Gonzalez, 2012; Gigerenzer & Gaissmaier, 2011; Halbrendt et al., 2014; Henly-Shepard, Gray, & Cox, 2015; Kumar & Dutt, 2018; Leiserowitz, Smith, & Marlon, 2010; Otto-Banaszak, Matczak, Wesseler, & Wechsung, 2011; Sterman, 2008; Tschakert & Sagoe, 2009; Weber, 2006), agricultural dynamics (Gray et al., 2015; Halbrendt et al., 2014; Hoffman, Lubell, & Hillis, 2014; Vanwindekens, Baret, & Stilmant, 2014), water management (Jones, Ross, Lynam, & Perez, 2014; Kolkman, Kok, & van der Veen, 2005; Lynam et al., 2012), forest management (Kearney, Bradley, Kaplan, R., & Kaplan, S., 1999; Tikkanen, Isokääntä, Pykäläinen, & Leskinen, 2006), eutrophication (Cloern, 2001; Janssen, 2001), lake ecosystems (Downing et al., 2014; Hobbs et al., 2016), and fisheries (Garavito-Bermúdez, Lundholm, & Crona, 2016; Gray, Chan, Clark, & Jordan, 2012; Gray, Hilsberg, McFall, & Arlinghaus, 2015; Henly-Shepard et al., 2015; Li, Gray, & Sutton, 2016; Radomski & Goeman, 1996).
Mental models are assessed through a range of methods, including (semi-structured) interviews with open-ended questions (Abel, Ross, & Walker, 1998; Findlater, Donner, Satterfield, & Kandlikar, 2018; Garavito-Bermúdez, 2018; Jones, Ross, Lynam, & Perez, 2014; Otto-Banaszak, Matczak, Wesseler, & Wechsung, 2011); the fuzzy cognitive mapping approach, in which participants draw a cognitive map reflecting the dynamic processes of the subject at hand (Gray et al., 2015; Henly-Shepard et al., 2015; Özesmi & Özesmi, 2003; Tschakert & Sagoe, 2009); the conceptual content cognitive map, in which concepts are identified and organised along certain dimensions (Kearney & Kaplan, 1999); and the ARDI method, in which participants identify the actors, resources, dynamics, and their interactions (Etienne, Du Toit, & Pollard, 2011; Mathevet, Etienne, Lynam, & Calvet, 2011). Most of these approaches are employed with groups of participants co-constructing the representation of the dynamic system of the natural resource, while few researchers have applied these methods at the individual level (Findlater et al., 2018; Gray et al., 2015; Jones et al., 2014; Otto-Banaszak et al., 2011). The outcomes of such methods are also presented in different ways. While only a few studies measure mental models at the individual level, even fewer report mental models at this level. The latter type of research presents selected individual cognitive maps (Findlater et al., 2018) or describes the interviews and concepts generated per person (Otto-Banaszak et al., 2011). However, the majority of research on mental models, including most of the research that measured mental models at the individual level, aggregates the mental models across participants. Research that presents mental models for different groups (e.g.,
different stakeholders) presents separate cognitive maps (Abel, Ross, & Walker, 1998), statistics of the mental models (Findlater et al., 2018; Gray et al., 2015; Mathevet et al., 2011), or a textual description of the patterns in the mental models (Garavito-Bermúdez, 2018). In such studies, qualitative interview data is coded (through word search or consensus analysis) into existing categories (Findlater, Donner, Satterfield, & Kandlikar, 2018) or into a coding system derived from the data (Mathevet et al., 2011), which in turn allows for a statistical description of the models.

Corresponding author: Karlijn van den Broek, Alfred-Weber-Institute for Economics, Heidelberg University, Bergheimer Straße 58, DE-69115 Heidelberg, e-mail: karlijn.vandenbroek@awi.uni-heidelberg.de
10.11588/jddm.2018.1.51316 | JDDM | 2018 | Volume 4 | Article 2

Another popular approach is to combine the responses from all participants (and thus all groups) into one all-encompassing model. Such an aggregated model includes all the variables initiated by each participant (Gray et al., 2015; Gray et al., 2012; Mathevet et al., 2011; Tschakert & Sagoe, 2009). Individual cognitive maps of environmental issues have been found to include 23 variables on average, and combining just 20 individual mental models may result in a collective mental model that includes 120 variables (Özesmi & Özesmi, 2004). Such complex models are hardly helpful for decision-makers who want to identify opportunities to better manage the natural resource. Therefore, in this paper we propose that mental model data can be better exploited by considering the variance in mental models across individuals.
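Why aggregation inflates model size can be made concrete with a toy calculation. The maps and variable counts below are invented for illustration (only the arithmetic matters): if each of 20 stakeholders lists 23 variables, of which 18 are shared and 5 are idiosyncratic, the union already holds 118 variables, close to the figure Özesmi and Özesmi report.

```python
def aggregate(maps):
    """Combine individual cognitive maps (sets of variables) into one model."""
    collective = set()
    for m in maps:
        collective |= m
    return collective

# Invented example: 18 shared concepts plus 5 personal ones per stakeholder.
shared = {f"shared_{i}" for i in range(18)}
maps = [shared | {f"personal_{p}_{i}" for i in range(5)} for p in range(20)]

sizes = [len(m) for m in maps]   # every individual map has 23 variables
collective = aggregate(maps)     # but the aggregate holds 18 + 20*5 = 118
```

The point of the sketch is that every idiosyncratic concept survives into the aggregate, so the collective model grows with sample size even though no individual's understanding is that complex.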
exploring mental models of nile perch fish stock at lake victoria
this paper presents a case study exploring the heterogeneity of mental models of the nile perch fish stock among different stakeholders at lake victoria. a thorough stakeholder analysis resulted in a sample of 76 participants from 33 different institutions in uganda, kenya and tanzania. these included 9 governmental organisations, 9 ngos, 5 business organisations, 3 research institutions and 7 community groups. to ensure a wide variety of approaches, matching the exploratory aim, mental models were assessed through a combination of interviews and cognitive mapping, at both the individual and the group level. the interactions with the stakeholders during field trips at lake victoria showed great heterogeneity in their mental models in terms of 1) the state of the nile perch stock, and 2) the causes of changes to the stock. the issue in relation to the nile perch fishery was characterised differently, and at different degrees of specificity, among stakeholders. while most participants reported that the nile perch stock had decreased, others thought that it had increased. some stakeholders reported that the nile perch fish stock had declined, whilst others mentioned a reduction in fish catch. still others reported that the reduction in catch was specific to mature nile perch fit for export. not only did stakeholders provide different accounts of this problem, heterogeneity was also apparent in perceptions of the drivers of changes in fish catch. examples of the drivers discussed include: fishing pressure, illegal fishing, climate change, fishing in breeding grounds, presence of water hyacinth, floods, growing populations, local demand for immature nile perch, corruption, the open-access nature of the lake, commercialization of the fishing industry in the region, a lack of enforcement, and a lack of ownership or responsibility to conserve.
see figure 1 for an example of a cognitive map drawn by a group of fishers. similarly, discussions with stakeholders about the future of the nile perch stock demonstrated diverse views. some stakeholders were convinced the nile perch stock was steadily increasing, some perceived the stock to be highly volatile, others assumed a stable stock flow, and some perceived the stock to be decreasing rapidly. stakeholders who believed that the nile perch would decrease rapidly also strongly differed in the envisioned period until a tipping point: whilst some expected this to happen as soon as the next five years, others expected a 50-year period. these differences in mental models may be highly problematic for collaboration between stakeholders toward the management of the lake's resources. indeed, in the discussions, different types of stakeholders (including fishing communities, businesses, and governments) emphasized that there was insufficient collaboration between the stakeholders to manage the lake's resources.
misrepresentations of mental models
from this exploratory work, it is clear that mental models of dynamic natural resource systems may differ widely across stakeholders. conservation issues may be interpreted differently, including the driving forces and processes that lead to the issue. since it is likely that these differences in perceptions prohibit effective decision-making between stakeholders to manage the natural resource, it is this difference between individuals that is of interest. that is, differences in mental models may underpin challenges in natural resource management. nevertheless, it is this variance in mental models between individuals that is often overlooked in mental model research. many approaches in the mental model literature report aggregated models, including the elicitation of mental models in group settings and the aggregation of individual models.
however, it is unlikely that such an aggregated model can be found in any single participant. this assumption of homogeneous models therefore results in a misrepresentation of mental models in the natural resource literature. mental models are often elicited to demonstrate how a certain dynamic system works and to directly infer management solutions from the mental models. for example, the fuzzy cognitive mapping approach is often used to conduct a scenario analysis, which serves to inform decision-making to address the conservation issue (gray, gray, et al., 2015; gray, hilsberg, et al., 2015). such approaches assume that the participants will (jointly) produce a mental model that reflects the processes accurately. however, the divergence in mental models suggests that it is unlikely that all participants (individually or jointly) will have an accurate understanding of the system. the mental model approach could, alternatively, provide an opportunity to map out differences in understanding between individuals, thereby illuminating the divergence in perceptions of the environmental problem.
investigating heterogeneity of mental models to address dynamic natural resource issues
many of the current approaches in mental model research disregard valuable information by not inspecting the variance in mental models that can underpin challenges in decision-making and in addressing conservation issues effectively. investigating this heterogeneity in mental models may therefore be key to improving decision-making processes. that is, divergence in mental models has been found to
affect communication processes between decision-makers (blickensderfer, cannon-bowers, & salas, 1997; marks, zaccaro, & mathieu, 2000; waller, gupta, & giambatista, 2004), coordination among decision-makers (marks, sabella, burke, & zaccaro, 2002), collective efficacy (the belief among group members that the required action can be organised and executed; mathieu, rapp, maynard, & mangos, 2010) and strategy implementation (gurtner, tschan, semmer, & nägele, 2007). convergence between the mental models of individuals within a group can be even more important for group performance than the accuracy of their mental models. for example, in a study where basketball players rated the effectiveness of strategic actions for basketball scenarios (which had also been rated by subject matter experts), the accuracy of the team members' ratings (their agreement with the subject matter experts) did not predict the team's performance in the previous season, while the agreement between team members on the actions did (webber, chen, payne, marsh, & zaccaro, 2000). mapping out the heterogeneity in understanding can therefore provide a first step to enhance convergence in mental models between individuals to aid decision-making. furthermore, the identification of the divergence of mental models facilitates tailoring conservation campaigns to the stakeholders' mental models, since messages tailored to recipients' characteristics are most effective (van den broek, bolderdijk, & steg, 2017). the field of natural resource management would particularly benefit from this, as conservation of natural resources requires the collaboration of diverse stakeholders.
figure 1. cognitive map created by a group of 6 ugandan fishers. (node labels: overpopulation, fishing regulations, corruption, high demand, poverty, use of illegal fishing gear, availability of illegal gear, overfishing, water pollution, reduced water level, migration of fishers, use of poison for fishing, nile perch stock, large fish industry, use of artificial fish feed.)
heterogeneity in mental models can be measured by examining the variance (standard deviations, ranges, etc.) in the complexity of the mental models (number of variables included, number of links included, the ratio of these two, the density) and in the concepts in the mental models (variance in central, forcing and receiving variables across participants; gray, gray, et al., 2015). for example, the variables from all mental models can be listed, together with how frequently each was included in an individual mental model. furthermore, individual mental models that together represent the heterogeneity of the mental models can be reported. reporting such findings in addition to commonalities across mental models (e.g. mean number of variables, most common links) will ensure the presentation of a complete picture of the heterogeneity of the mental models. an aggregated mental model of the perceptions of the nile perch stock at lake victoria would have disregarded key nuances. such an aggregated model would include only 4.4 concepts, with 4.5 links. the typical mental model would show that the stakeholders think the nile perch stock has declined, and that this is due to corruption, which is linked to the use of illegal fishing gear, climate change and water pollution. however, when we consider the full range of mental models, we see that stakeholders have diverse perceptions of the causes of the decline of the nile perch. the number of concepts included ranges from 1 to 16, with a standard deviation of 3.4, and the number of links ranges from 1 to 19, with a standard deviation of 4.0.
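the descriptive statistics discussed here (counts of concepts and links, density, and per-variable inclusion frequencies) are straightforward to compute once each individual map is stored as a set of directed links. a minimal sketch, with made-up maps whose link contents are illustrative rather than the study's data:

```python
from collections import Counter
from statistics import mean, stdev

# three made-up individual cognitive maps, each a set of (cause, effect) links
maps = [
    {("overfishing", "nile perch stock")},
    {("corruption", "illegal gear"), ("illegal gear", "nile perch stock")},
    {("high demand", "overfishing"), ("overfishing", "nile perch stock"),
     ("pollution", "nile perch stock")},
]

def complexity(links):
    """number of variables, number of links, and density of one map."""
    variables = {v for link in links for v in link}
    n_vars, n_links = len(variables), len(links)
    # density: links present relative to all possible directed links
    density = n_links / (n_vars * (n_vars - 1)) if n_vars > 1 else 0.0
    return n_vars, n_links, density

n_vars, n_links, density = zip(*(complexity(m) for m in maps))
print("variables:", "mean", mean(n_vars), "sd", stdev(n_vars),
      "range", (min(n_vars), max(n_vars)))
print("links:   ", "mean", mean(n_links), "sd", stdev(n_links))

# how many individual maps each variable appears in
freq = Counter(v for m in maps for v in {x for link in m for x in link})
print(freq.most_common(2))  # e.g. the stock node appears in every map here
```

reporting these spreads alongside the means is exactly the "complete picture" argued for above: the means alone would hide that the three toy maps range from one link to three.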
inspecting the variety of the concepts and links, we now see that some stakeholders focus on the responsibility of the fisher (attributing the use of illegal gear to a lack of awareness, or a lack of ownership of the lake's resources), or the consumer (high demand for nile perch leading to overfishing), or the government (lack of monitoring, lack of effective policy), while still others focus on demographic factors (overpopulation leading to overfishing, poverty causing fishers to use illegal fishing gear). such divergence in perceptions may be explained by a number of individual differences between stakeholders. research has shown that differences in the number of target species of fishers, and their dependency on the species, influence fishers' perception of the ecosystem structure and the complexity of their mental model of the ecosystem (garavito-bermúdez, 2018; garavito-bermúdez et al., 2016; gray, hilsberg, mcfall, & arlinghaus, 2015). moreover, many stakeholders expressed that they expect significant differences in mental models between migratory fishers and indigenous fishers because of differences in perceived ownership of the natural resources between the two groups. furthermore, research has demonstrated that eliciting a mental model near the natural resource results in more specific mental models with lower density, compared to elicitation conducted at people's homes, which yields more generic and denser models (jones et al., 2014). similarly, the interaction with the lake (type of interaction, frequency) is likely to affect mental models and may cause systematic differences between stakeholder groups. knowing which factors underpin such variance may provide an indication of how to harmonize mental models to aid decision-making. mental models of complex systems inevitably leave room
for disagreement, but few studies report the variance in mental models of their sample, and the heterogeneity of mental models has therefore not yet received sufficient research attention. such research would demonstrate the divergence in understanding, which can then be addressed to aid decision-making among individuals. besides this heterogeneity in mental models between individuals, it is also important to investigate the variance in mental models within individuals. that is, future research should also consider the development of mental models over time. little research on natural resource mental models has investigated whether these mental models are static or change over time. since the latter is more likely, due to changing environments and the updating of mental models with new information, it is important to understand how these mental models change and how this affects decision-making. through repeated measures of mental models, the stable components of mental models can be distinguished from the dynamic components. such research would further our understanding of the heterogeneity of the mental models that inform decision-making processes.
acknowledgements: the author would like to thank the multitip team for their contributions to the development of this project. in particular, the author would like to thank prof. funke for inspiring this paper and for his contributions to its conceptualization. this project was funded by the bundesministerium für bildung und forschung.
declaration of conflicting interests: the author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
handling editor: andreas fischer
copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license.
citation: van den broek, k. (2018).
illuminating divergence in perceptions in natural resource management: a case for the investigation of the heterogeneity in mental models. journal of dynamic decision making, 4, 2. doi:10.11588/jddm.2018.1.51316
received: 23 aug 2018 | accepted: 28 nov 2018 | published: 07 dec 2018
references
abel, n., ross, h., & walker, p. (1998). mental models in rangeland research, communication and management. the rangeland journal, 20(1), 77–91. doi:10.1071/rj9980077
amelung, d., fischer, h., kruse, l., & sauerborn, r. (2016). defogging climate change communication: how cognitive research can promote effective climate communication. frontiers in psychology, 7(1340), 1–4. doi:10.3389/fpsyg.2016.01340
blickensderfer, e., cannon-bowers, j. a., & salas, e. (1997). training teams to self-correct: an empirical investigation. in 12th annual meeting of the society of industrial and organizational psychology. st. louis, mo.
bostrom, a. (2016). mental models and risk perceptions related to climate change. oxford research encyclopedia of climate science, 1–31. doi:10.1093/acrefore/9780190228620.013.303
cloern, j. e. (2001). our evolving conceptual model of the coastal eutrophication problem. marine ecology progress series, 210, 223–253. doi:10.3354/meps210223
denzau, a. t., & north, d. c. (1994). shared mental models: ideologies and institutions. kyklos, 47(1), 3–31. doi:10.1111/j.1467-6435.1994.tb02246.x
downing, a. s., van nes, e. h., balirwa, j. s., beuving, j., bwathondi, p., chapman, l. j., . . . mooij, w. m. (2014). coupled human and natural system dynamics as key to the sustainability of lake victoria's ecosystem services. ecology and society, 19(4): 31. doi:10.5751/es-06965-190431
dutt, v., & gonzalez, c. (2012). decisions from experience reduce misconceptions about climate change. journal of environmental psychology, 32(1), 19–29. doi:10.1016/j.jenvp.2011.10.003
etienne, m., du toit, d. r., & pollard, s. (2011).
ardi: a co-construction method for participatory modeling in natural resources management. ecology and society, 16(1): 44. doi:10.5751/es-03748-160144
findlater, k. m., donner, s. d., satterfield, t., & kandlikar, m. (2018). integration anxiety: the cognitive isolation of climate change. global environmental change, 50, 178–189. doi:10.1016/j.gloenvcha.2018.02.010
garavito-bermúdez, d. (2018). learning ecosystem complexity: a study on small-scale fishers' ecological knowledge generation. environmental education research, 24(4), 625–626. doi:10.1080/13504622.2016.1269877
garavito-bermúdez, d., lundholm, c., & crona, b. (2016). linking a conceptual framework on systems thinking with experiential knowledge. environmental education research, 22(1), 89–110. doi:10.1080/13504622.2014.936307
gigerenzer, g., & gaissmaier, w. (2011). heuristic decision making. annual review of psychology, 62(1), 451–482. doi:10.1146/annurev-psych-120709-145346
gray, s. a., gray, s., de kok, j. l., helfgott, a. e. r., o'dwyer, b., jordan, r., & nyaki, a. (2015). using fuzzy cognitive mapping as a participatory approach to analyze change, preferred states, and perceived resilience of social-ecological systems. ecology and society, 20(2): 11. doi:10.5751/es-07396-200211
gray, s., chan, a., clark, d., & jordan, r. (2012). modeling the integration of stakeholder knowledge in social–ecological decision-making: benefits and limitations to knowledge diversity. ecological modelling, 229, 88–96. doi:10.1016/j.ecolmodel.2011.09.011
gray, s., hilsberg, j., mcfall, a., & arlinghaus, r. (2015). the structure and function of angler mental models about fish population ecology: the influence of specialization and target species. journal of outdoor recreation and tourism, 12, 1–13. doi:10.1016/j.jort.2015.09.001
gurtner, a., tschan, f., semmer, n. k., & nägele, c. (2007).
getting groups to develop good strategies: effects of reflexivity interventions on team process, team performance, and shared mental models. organizational behavior and human decision processes, 102(2), 127–142. doi:10.1016/j.obhdp.2006.05.002
halbrendt, j., gray, s. a., crow, s., radovich, t., kimura, a. h., & tamang, b. b. (2014). differences in farmer and expert beliefs and the perceived impacts of conservation agriculture. global environmental change, 28(1), 50–62. doi:10.1016/j.gloenvcha.2014.05.001
henly-shepard, s., gray, s. a., & cox, l. j. (2015). the use of participatory modeling to promote social learning and facilitate community disaster planning. environmental science and policy, 45, 109–122. doi:10.1016/j.envsci.2014.10.004
hobbs, b. f., ludsin, s. a., knight, r. l., ryan, p. a., & ciborowski, j. j. h. (2002). fuzzy cognitive mapping as a tool to define management objectives for complex ecosystems. ecological applications, 12(5), 1548–1565.
doi:10.2307/3099990
hoffman, m., lubell, m., & hillis, v. (2014). linking knowledge and action through mental models of sustainable agriculture. proceedings of the national academy of sciences, 111(36), 13016–13021. doi:10.1073/pnas.1400435111
janssen, m. a. (2001). an exploratory integrated model to assess management of lake eutrophication. ecological modelling, 140(1–2), 111–124. doi:10.1016/s0304-3800(01)00260-5
jones, n. a., ross, h., lynam, t., & perez, p. (2014). eliciting mental models: a comparison of interview procedures in the context of natural resource management. ecology and society, 19(1): 13. doi:10.5751/es-06248-190113
kearney, a. r., bradley, g., kaplan, r., & kaplan, s. (1999). stakeholder perspectives on appropriate forest management in the pacific northwest. forest science, 45(1), 62–73.
kolkman, m. j., kok, m., & van der veen, a. (2005). mental model mapping as a new tool to analyse the use of information in decision-making in integrated water management. physics and chemistry of the earth, 30(4–5), 317–332. doi:10.1016/j.pce.2005.01.002
kumar, m., & dutt, v. (2018). experience in a climate microworld: influence of surface and structure learning, problem difficulty, and decision aids in reducing stock-flow misconceptions. frontiers in psychology, 9, 1–19. doi:10.3389/fpsyg.2018.00299
leiserowitz, a., smith, n., & marlon, j. r. (2010). americans' knowledge of climate change. yale project on climate change communication. retrieved from http://environment.yale.edu/climate/files/climatechangeknowledge2010.pdf
li, o., gray, s. a., & sutton, s. g. (2016). mapping recreational fishers' informal learning of scientific information using a fuzzy cognitive mapping approach to mental modelling. fisheries management and ecology, 23(3–4), 315–329. doi:10.1111/fme.12174
lynam, t., mathevet, r., etienne, m., stone-jovicich, s., leitch, a., jones, n., . . . perez, p. (2012).
waypoints on a journey of discovery: mental models in human-environment interactions. ecology and society, 17(3): 23. doi:10.5751/es-05118-170323
marks, m. a., zaccaro, s. j., & mathieu, j. e. (2000). performance implications of leader briefings and team-interaction training for team adaptation to novel environments. journal of applied psychology, 85(6), 971. doi:10.1037//0021-9010.85.6.971
marks, m. a., sabella, m. j., burke, c. s., & zaccaro, s. j. (2002). the impact of cross-training on team effectiveness. journal of applied psychology, 87(1), 3–13. doi:10.1037/0021-9010.87.1.3
mathevet, r., etienne, m., lynam, t., & calvet, c. (2011). water management in the camargue biosphere reserve: insights from comparative mental models analysis. ecology and society, 16(1): 43. doi:10.5751/es-04007-160143
mathieu, j. e., rapp, t. l., maynard, m. t., & mangos, p. m. (2010). interactive effects of team and task shared mental models as related to air traffic controllers' collective efficacy and effectiveness. human performance, 23(1), 22–40. doi:10.1080/08959280903400150
otto-banaszak, i., matczak, p., wesseler, j., & wechsung, f. (2011). different perceptions of adaptation to climate change: a mental model approach applied to the evidence from expert interviews. regional environmental change, 11(2), 217–228. doi:10.1007/s10113-010-0144-2
özesmi, u., & özesmi, s. (2003). a participatory approach to ecosystem conservation: fuzzy cognitive maps and stakeholder group analysis in uluabat lake, turkey. environmental management, 31(4), 518–531. doi:10.1007/s00267-002-2841-1
özesmi, u., & özesmi, s. l. (2004). ecological models based on people's knowledge: a multi-step fuzzy cognitive mapping approach. ecological modelling, 176(1–2), 43–64. doi:10.1016/j.ecolmodel.2003.10.027
radomski, p. j., & goeman, t. j. (1996). decision making and modeling in freshwater sport-fisheries management. fisheries, 21(12), 14–21.
doi:10.1577/1548-8446(1996)021<0014:dmamif>2.0.co;2
rouse, w., & morris, n. (1986). on looking into the black box: prospects and limits in the search for mental models. psychological bulletin, 100(3), 349–363. doi:10.1037//0033-2909.100.3.349
sterman, j. d. (2008). risk communication on climate: mental models and mass balance. science, 322(5901), 532–533. doi:10.1126/science.1162574
tikkanen, j., isokääntä, t., pykäläinen, j., & leskinen, p. (2006). applying cognitive mapping approach to explore the objective-structure of forest owners in a northern finnish case area. forest policy and economics, 9(2), 139–152. doi:10.1016/j.forpol.2005.04.001
tschakert, p., & sagoe, r. (2009). mental models: understanding the causes and consequences of climate change. in community-based adaptation to climate change (pp. 154–159). retrieved from http://www.iadb.org/intal/intalcdi/pe/2010/04833.pdf#page=156
van den broek, k., bolderdijk, j. w., & steg, l. (2017). individual differences in values determine the relative persuasiveness of biospheric, economic and combined appeals. journal of environmental psychology, 53, 145–156. doi:10.1016/j.jenvp.2017.07.009
vanwindekens, f. m., baret, p. v., & stilmant, d. (2014). a new approach for comparing and categorizing farmers' systems of practice based on cognitive mapping and graph theory indicators. ecological modelling, 274, 1–11. doi:10.1016/j.ecolmodel.2013.11.026
veldhuyzen, w., & stassen, h. g. (1977). the internal model concept: an application to modeling human control of large ships. human factors, 19(4), 367–380. doi:10.1177/001872087701900405
waller, m. j., gupta, n., & giambatista, r. c. (2004). effects of adaptive behaviors and shared mental models on control crew performance. management science, 50(11), 1534–1544. doi:10.1287/mnsc.1040.0210
webber, s. s., chen, g., payne, s. c., marsh, s. m., & zaccaro, s. j. (2000). enhancing team mental model measurement with performance appraisal practices.
organizational research methods, 3(4), 307–322. doi:10.1177/109442810034001
weber, e. u. (2006). experience-based and description-based perceptions of long-term risk: why global warming does not scare us (yet). climatic change, 77(1–2), 103–120. doi:10.1007/s10584-006-9060-3
original research
exploration and exploitation during information search and consequential choice
cleotilde gonzalez1 and varun dutt2
1dynamic decision making laboratory, department of social and decision sciences, carnegie mellon university, pittsburgh, pa, usa and 2school of computing and electrical engineering and school of humanities and social sciences, indian institute of technology mandi, india
before making a choice we often search and explore the options available. for example, we try clothes on before selecting the one to buy, and we search among career options before deciding on a career to pursue. although the exploration process, where one is free to sample available options, is pervasive, we know little about how and why humans explore an environment before making choices. this research contributes to the clarification of some of the phenomena that describe how people search during free sampling: we find a gradual decrease of exploration and, in parallel, a tendency to explore and choose options of high value. these patterns provide support for the existence of learning and of an exploration-exploitation tradeoff that may occur during free sampling. thus, exploration in free sampling is not led by the purely epistemic value of the available options. rather, exploration during free sampling is a learning process that is influenced by memory effects and by the value of the options available, where participants pursue options of high value more frequently. these parallel processes predict the consequential choice.
keywords: choice, decisions from experience, exploration-exploitation, sampling, instance-based learning theory
an important aspect of decision-making in many daily situations involves a process of exploration of the available options before making a choice for real.
such is the case when we search for information on the web before making a purchase (pirolli & card, 1999), when we search for possible partners before making a dating selection (todd, penke, fasolo, & lenton, 2007), and when a radiologist examines a patient's scan for a possible diagnosis before deciding on the treatment (wolfe, 2012). despite the relevance of the exploration process in many naturalistic tasks, we know relatively little about how and why humans explore an environment and how the information obtained from exploration is used in making choices. this research contributes to clarifying some of the aspects of search and exploration in experiential binary choice. in principle, a rational explorer should sample all available options for as long as possible before making a choice, given that new information may be collected from exploration, which is expected to lead to better choices. however, previous studies involving free sampling in binary choice reveal at least five patterns of exploration that do not conform to the rational explorer: (1) people rely on surprisingly small samples (hertwig & pleskac, 2008, 2010); (2) they tend to sample more when higher and more variable payoffs are involved (hau, pleskac, kiefer, & hertwig, 2008; mehlhorn, ben-asher, dutt, & gonzalez, 2014); (3) they generally follow one of two exploration policies (piecewise or comprehensive strategies) (hills & hertwig, 2010); (4) they reduce their rate of exploration over time (gonzalez & dutt, 2011, 2012); and (5) they tend to choose the option that they sampled more often (gonzalez & dutt, 2012). the main contribution of the current research is the clarification of the relationship between the rate of exploration over time and the tendency to explore and choose the high-value option that would lead to the best result.
in a binary choice task with free sampling, we demonstrate that a reduction in exploration occurs in parallel with a tendency to select the option with the higher experienced mean more often, regardless of the exploration policy that participants take.
free sampling and the exploration-exploitation tradeoff
in the study of decisions from experience, researchers have developed an experimental paradigm to study the process of exploration and subsequent choice in a binary choice task. this paradigm, the sampling paradigm (see figure 1), provides a way for participants to explore the two options freely, for as long as they desire and in the order they desire, before making one choice for real (camilleri & newell, 2011; hertwig & erev, 2009; rakow & newell, 2010). although most of the studies in this paradigm have concentrated on contrasting the choice after exploration with traditional choice from description (e.g., hertwig, barron, weber, & erev, 2004), the paradigm opens a window for investigating the behavior, processes, and strategies that people pursue during exploration, before making a choice. the exploration rate in this task has been found to decrease over an increasing number of repeated samples (gonzalez & dutt, 2011, 2012; teodorescu & erev, 2014), and people tend to choose the option that they sampled more frequently and more recently (gonzalez & dutt, 2011, 2012). a few studies have suggested that the decrease in exploration rate might be related to the process of discovering an option that maximizes outcomes (gonzalez & dutt, 2011, 2012), while some find a more extreme effect: a decrease of exploration occurs even when it is optimal to keep exploring (teodorescu & erev, 2014).
hills and hertwig (2012) questioned the robustness of the observation in gonzalez and dutt (2011) about the reduction of exploration rate over time and their suggestion that participants explore options that correspond to the highest value (sampling-h).

corresponding author: cleotilde gonzalez, dynamic decision making laboratory, carnegie mellon university, pittsburgh, pa 15213, usa. email: coty@cmu.edu
10.11588/jddm.2016.1.29308 | jddm | 2016 | volume 2 | article 2

figure 1. the sampling paradigm of decisions from experience. during the sampling phase, people select options freely. selecting an option draws an outcome from a distribution and presents it as a result. the figure shows a problem of choice between two options, a: a .8 chance of earning $4 and a .2 chance of earning $0; and b: earning $3 for sure. participants first sample the two options a and b to discover their values and, once they are satisfied with the information, choose one of the two options (a or b) for real.

the heart of their argument is that, in the sampling paradigm, an impression of reduced exploration over time (alternation rate, or a-rate, in binary choice) is produced by an inverse relationship between sample size and a-rate, together with the aggregation of participants with different sample sizes. gonzalez and dutt (2012) showed that the reduction of the a-rate during sampling occurs even when sample length is controlled for, and that people tend to explore and choose according to the value of the options. they argued that a decrease of exploration in the sampling paradigm might be related to an implicit goal of discovering which of the two options maximizes rewards, suggesting that an exploration-exploitation tradeoff may be occurring during free sampling. what happens during free sampling is still unclear.
one possibility is that exploration is simply random (hertwig & erev, 2009; rakow & newell, 2010). the assumption of a random search is very reasonable for a rational explorer who wants to maximize the information obtained, and it is very commonly used in a large variety of cognitive models attempting to account for the choice after sampling (see gonzalez and dutt, 2011, for a review of these models). however, this assumption of randomness does not explain the patterns of exploration found in human data (fiedler, 2000; fiedler & kareev, 2006; gonzalez & dutt, 2011). a random sampling assumption would presuppose the stability of other factors, such as learning from the sampling process and being influenced by memory effects (e.g., frequency and recency of experienced outcomes). some argue that in the sampling paradigm there cannot be exploration-exploitation tradeoffs, because the sampling process is separated from choice and can only be used to obtain information, without any concerns about costs and rewards, with the sole purpose of informing the consequential choice after sampling (hills & hertwig, 2012). psychologists would generally suggest that more informed decisions result from larger sample sizes (i.e., the value of information increases with more samples) (fiedler, 2000). thus, the strong and robust finding that people draw small samples (a median of between 11 and 19 samples) before making a choice (e.g., gonzalez & dutt, 2011; hau et al., 2008; hertwig & erev, 2009; hills & hertwig, 2010) makes the "information acquisition" possibility less likely. if exploration were used to obtain information without concerns about identifying the option that provides the maximum rewards, the number of samples would be larger. however, the search process during sampling is, in fact, costly (fiedler & kareev, 2006; kareev, 2000; hau et al., 2008), and there might be some advantages to fewer samples.
studies have shown that fewer samples may render the choice simpler and surprisingly good (hertwig & pleskac, 2008, 2010), because fewer samples lead to larger initial differences between the two options being considered, compared to the differences given by their objective probabilities (i.e., the "amplification effect"). furthermore, this differentiation between the two options with small samples may ease the choice process and may lead to choices that, although not optimal, are good enough (hertwig & pleskac, 2008, 2010). hau et al. (2008) demonstrated that people consider and account for perceived costs during sampling. their studies show that sampling is costly in terms of opportunity costs; for example, sampling might take time during which people cannot pursue other activities. furthermore, they demonstrated that people consider the magnitude of outcomes when deciding whether or not to continue sampling the options: when the values of the outcomes were increased (resulting in higher opportunity costs from not choosing the option with the higher expected value), the sample size doubled (a median of 33) compared to the same problems in hertwig et al. (2004). thus, the amount of search does depend on the value of the outcomes involved. moreover, gonzalez and dutt (2012) found a pattern of decreased exploration with increased sampling in hau et al.'s (2008) data. this pattern occurred at both the average and individual participant levels. they demonstrated that the pattern of decreased exploration during sampling occurs regardless of sample length and that the frequency of sampling-h was indicative of the final choice.
these patterns of reduced exploration in a sampling paradigm are very similar to those found in consequential choice paradigms, leading the authors (gonzalez & dutt, 2011, 2012) to suggest the presence of an exploration-exploitation tradeoff during free sampling similar to that found in consequential choice (biele, erev, & eyal, 2009; camilleri & newell, 2011; gonzalez & dutt, 2011; mehlhorn et al., 2014). the studies reviewed above support the idea that people explore options during free sampling in a process led by the economic value of the options rather than by pure epistemic value. that is, the search process may serve to discover and pursue the maximizing option, in which case the pattern of decreasing exploration rate should be inversely related to a pattern of increasing sampling-h rate over more samples.

search strategies: how do people explore in a free sampling binary task?

hills and hertwig (2010) used data from experiments in the sampling paradigm to investigate the strategies that humans may use during sampling. they used the alternation rate between the two options to investigate two prominent sampling strategies: piecewise, where options are explored very rapidly and participants alternate back-and-forth between them in a zigzag manner (see figure 2, left panel); and comprehensive, where participants explore one option more deeply before making a switch to the other, and switching back-and-forth between the options is less frequent (see figure 2, right panel). hills and hertwig (2010) discovered that although our search may reveal the same information, the strategies we use during sampling influence the subsequent choice.
for example, the piecewise strategy more often resulted in the underweighting of rare outcomes (e.g., $0 in the example shown in figure 2) than the comprehensive strategy. they also found that the piecewise strategy resulted in less consistency (agreement) between the predictions from sampling behavior and the consequential choice. however, hills and hertwig (2010) left one important question unanswered (p. 5): "why is the way people search indicative of the final decisions they make?" we expect the dynamics of exploration and exploitation, and the inverse relationship between the exploration rate and the sampling-h rate, to answer this question. gonzalez and dutt (2011, 2012) suggested that the main features of decisions from experience can be captured with the hypothesis that people tend to select the option that led to the best value in similar situations in the past. a formalization of this underlying process is provided by instance-based learning theory (iblt) (gonzalez, lerch, & lebiere, 2003). in essence, iblt proposes that decisions are made by retrieving experiences from past similar situations and selecting the option that led to the best outcomes. in agreement with other instance-based theories of learning (dienes & fahey, 1995; logan, 1988) and reinforcement-learning processes (erev & roth, 1998), iblt proposes that, depending on the consistency or variability of environmental conditions, there is a gradual transition from exploration to exploitation of the options that have provided the best outcomes based on experience. the main choice rule in iblt is to select the option with the maximum experienced value (called blending) (gonzalez & dutt, 2011; lejarraga, dutt, & gonzalez, 2012): when the blended value of one option (a) is higher than the blended value of the other option (b), choose a; otherwise, choose b. in the example of figure 1, there are three instances, one corresponding to each possible outcome: (a, 4), (a, 0), and (b, 3).
each option (a and b) has a blended value, calculated as the sum of each experienced outcome weighted by its cognitive probability (the probability of recalling that outcome from memory). the cognitive probability of an instance is determined by several memory factors, including frequency and recency (memory decay); these are components of an activation mechanism obtained from the act-r theory of cognition (anderson & lebiere, 1998). if an outcome has been experienced more often and more recently, that instance has higher activation, which increases its probability of retrieval (see the formalization of these mechanisms in gonzalez & dutt, 2011, and lejarraga et al., 2012). iblt's mechanisms predict a process in which there is a gradual transition from more exploration of the available options towards exploitation of the options that have resulted in the best outcomes through experience (gonzalez et al., 2003). given the possibility that people strategize about how to explore the options, selecting a preferred strategy over others (hills & hertwig, 2010), we ask whether the same or different dynamics emerge during sampling when piecewise or comprehensive strategies are used. note that the piecewise and comprehensive strategies are "idealized"; that is, they are two extremes of a continuum of alternation or exploration processes (hills & hertwig, 2010). in what follows, we analyze the dynamics of exploration and sampling-h overall and under piecewise and comprehensive strategies, relying on a large, publicly available data set of the sampling paradigm (erev et al., 2010).

method

two data sets from the technion prediction tournament's (tpt) sampling competition were put together: an estimation set (60 problems) and a competition set (60 new problems derived using the same algorithm as the estimation set), both of which are available online (erev et al., 2010).
all problems involved sampling between two unlabeled buttons, one associated with a safe option that offered a medium (m) outcome with certainty and the other associated with a risky option that offered a high (h) outcome with some probability (ph) and a low (l) outcome with the complementary probability (1-ph) (see erev et al., 2010, regarding the problem generation algorithm and data collection methods). in each of the estimation and competition sets, 40 participants were randomly assigned to two groups of 20 participants each, and each group completed 30 of the 60 problems in a random order.1 participants were allowed to sample the options freely, for as long as they wanted and in their desired order, before making a consequential final choice. although participants could sample freely, the median sample size across the two options was small (9 samples). using the same assumption as hills and hertwig (2010), we only considered those problems where participants saw all the outcomes of both options,2 obtaining a data set with 74 participants and 120 problems, encompassing 18,113 sampling decisions and 988 observations (an observation is a unique combination of participant, problem, and set, and is the unit of our analysis). we calculated the a-rate as done by gonzalez and dutt (2011, 2012) and hills and hertwig (2010): for each observation, starting with the second sample, we coded whether the participant switched the choice (=1) or not (=0) relative to the previous sample (the very first sample was marked as a missing value, as there was no sample preceding it). the alternation rate was then defined as the average of these codes at each sample, computed across observations. to calculate the sampling-h rate, we first identified the option with the high expected value in each problem.
based on the definition of those problems, in 63 problems the high expected value option was the safe option, and in 57 problems it was the risky option. then, we checked whether the option sampled by a participant was the high expected value option and coded this as 1; otherwise, it was coded as 0. we then aggregated these codes across all participants and problems at each sample and defined the sampling-h rate per sample.

1 when we downloaded the sample-by-sample dataset from the technion prediction tournament's website, we found the dataset only contained 79 participants (thus, one participant's sampling data was absent from the estimation set).
2 mehlhorn et al. (2014) found that the variability of outcomes in options during sampling has an effect on people's choices. by considering only participants who saw all possible outcomes of the options, we disregard the role variability may play in influencing human choice.

figure 2. examples of piecewise (left panel) and comprehensive (right panel) strategies. when a person decides to stop the search, they are asked to make a consequential choice. in this example, risky option a represents a .8 chance to get $4 and a .2 chance to get $0, and safe option b gets $3 for sure. choices are influenced by the strategy of search according to hills and hertwig (2010).

results

figure 3 shows the overall a-rate and sampling-h rate across samples, including all observations in the data set. this figure shows all sample trials up to the point at which there were at least two observations left in the data set (sample number 133). the figure shows a gradual increase in the sampling-h rate and, in parallel, a gradual decrease in the a-rate with increasing sample trials.
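per observation, the two rates described in the method can be sketched as follows. this is a minimal illustration: the option labels and the example sequence are hypothetical, and the paper additionally averages these codes across observations at each sample position.

```python
def alternation_flags(samples):
    """code each sample as 1 if it switches option relative to the previous
    sample, 0 otherwise; the first sample has no predecessor (None)."""
    return [None] + [int(prev != cur) for prev, cur in zip(samples, samples[1:])]

def a_rate(samples):
    """alternation rate for one observation: observed switches divided by
    the maximum number of possible switches (n - 1)."""
    flags = [f for f in alternation_flags(samples) if f is not None]
    return sum(flags) / len(flags)

def sampling_h_flags(samples, high_option):
    """code each sample as 1 if the sampled option is the high expected
    value option, 0 otherwise."""
    return [int(s == high_option) for s in samples]

# a hypothetical observation: a participant alternates early, then settles
# on option "a" (assumed here to be the high expected value option)
samples = ["a", "b", "a", "b", "a", "a", "a"]
print(a_rate(samples))                    # 4 switches over 6 opportunities
print(sampling_h_flags(samples, "a"))
```

the example sequence also illustrates the pattern reported above: its switches concentrate in the early samples, while the later samples repeatedly select the high value option.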
given that people rely on small samples (hills & hertwig, 2010; hau et al., 2008), the number of observations decreases rapidly with increasing samples. this explains the noisy averages as sample numbers increase, given that they involve fewer participants (gonzalez & dutt, 2011, 2012; hills & hertwig, 2012). using the median sample size of the overall data set (median = 10) with a cochran's q test, we found a significant difference in the a-rate across the first 10 samples, χ2(8) = 81.66, p < .001, and a significant difference in the sampling-h rate across the first 10 samples, χ2(8) = 39.84, p < .001. a pairwise comparison revealed a decrease in the a-rate from .40 in sample #2 to .23 in sample #10, a 43% drop (z = -7.14, p < .001), and an increase of 13% in the sampling-h rate, from .52 in sample #2 to .60 in sample #10 (z = -1.98, p < .05). this result suggests that across samples, participants explore between the two buttons less while increasingly selecting the option with the higher expected value. in fact, the sampling-h rate was significantly and negatively correlated with the a-rate, rs = –.48, p < .01.

figure 3. the average sampling-h rate and a-rate across samples.

sampling-h and a-rate for piecewise and comprehensive search strategies

to analyze behavior for the different sampling strategies, we first analyzed the distribution of the alternation rate between the two options (see figure 4) and followed hills and hertwig's (2010) procedure of classifying participants according to their a-rate. the a-rate in the tpt data set varied widely, from a minimum of 0.07 to a maximum of 1.0. the median in the tpt data set (.27, shown by the dotted line in figure 4) was higher than in hills and hertwig's (2010) data (.16), but the distribution was similarly bimodal, with peaks in the 0.15-0.20 and 0.45-0.50 a-rate intervals.
accordingly, all participants with an a-rate below 0.27 were categorized as following a comprehensive strategy, whereas all participants with an a-rate above 0.27 were categorized as following the piecewise strategy.

figure 4. histogram of participants' a-rate (averaged across all problems played by a participant in a set). the a-rate in a problem for a participant is expressed as the ratio of observed switches to the maximum number of allowable switches (n-1, where n is the number of samples) in a problem. the dotted line represents the median value of 0.27.

figure 5 shows the a-rate and sampling-h rate for the piecewise (left panel) and comprehensive (right panel) strategies. the maximum trial in which there was more than one observation left in the data set was 72 for the piecewise strategy and 133 for the comprehensive strategy. that is, people who alternated more often tended to take fewer samples than people who alternated less often. the median sample size for the piecewise group was 8, while the median sample size for the comprehensive group was 12. this result supports similar observations by hills and hertwig (2010), and also rakow, demes, and newell (2008). a general observation of these patterns indicates a decrease in the a-rate over increasing trials and an increase in the sampling-h rate, regardless of the sampling strategy. for participants following a piecewise strategy, there was a significant decrease in the a-rate (χ2(6) = 82.17, p < .001) and a significant increase in the sampling-h rate (χ2(6) = 21.69, p < .01) over the first 8 samples.
for participants following a comprehensive strategy, there was a significant decrease in the a-rate (χ2(10) = 42.44, p < .001), but no significant increase in the sampling-h rate across 12 samples (χ2(10) = 2.29, p = .99). although the sampling-h rate trends upward over time on average, this null result may be due to the different orders in which participants sample one or the other option (see the discussion of results). for example, it is possible that in the comprehensive strategy some participants start by exploring the high expected value option and then move to the low expected value option, while others follow the reverse order. although such clean patterns of exploration are only idealized in the comprehensive strategy, what matters in this research is that for both the comprehensive and piecewise strategies, the sampling-h rate was significantly and negatively correlated with the a-rate (comprehensive: rs = –.35, p < .01; piecewise: rs = –.24, p < .05).

consistency between sampling and final choice

figure 6 reports the proportion of total agreement between the final choice predicted from the sampling-h rate during sampling and the participant's actual final choice. for this analysis, we classified participants based upon the median sampling-h rate (similar to how participants were classified as following the piecewise and comprehensive strategies). the median sampling-h rate during sampling was 0.50. observations below this rate were classified as infrequent sampling-h, and those at or above 0.5 were classified as frequent sampling-h. among the frequent and infrequent sampling-h observations, we also identified those that followed the piecewise strategy (median alternation rate > 0.27) and the comprehensive strategy (median alternation rate < 0.27). within each of the four combinations of sampling strategy and sampling-h rate, we calculated the average of the outcomes obtained during sampling for each option.
as per iblt, the option with the highest average is the one predicted to be chosen at the final choice. we matched the final choice predicted from the highest average with the actual final choice. next, we calculated the proportion of agreement by averaging such matches across all observations in each of the four combinations. as observed in figure 6, regardless of the alternation strategy and the frequency of sampling-h, there is high consistency (> 50%) between the final choices predicted at the end of sampling and the actual final choices made by participants. the consequential choice after sampling was equally well predicted for piecewise and comprehensive strategies, and for both infrequent (z = -0.913, p = .38) and frequent sampling-h participants (z = -0.642, p = .53).

figure 6. consistency between sampling behavior and consequential choice for frequent and infrequent sampling-h participants following the piecewise and comprehensive strategies.

figure 5. the a-rate and sampling-h rate across samples for the piecewise and comprehensive strategies.

discussion

our results clarify the relationship between the rate of exploration and the tendency to explore the high value option during free sampling. we find a decrease in the exploration rate and an increase in the rate of sampling of the high value option. our results show significant inverse dynamics between the a-rate and the sampling-h rate in binary-choice problems. furthermore, these negatively correlated dynamics appear regardless of the search strategies people might adopt during sampling. finally, we also show that the final consequential choice can be accurately predicted by the frequency of selection of the high option during sampling.
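the prediction rule used in the consistency analysis (choose the option with the highest mean of the experienced outcomes) can be sketched as follows; the observation tuples below are hypothetical and only illustrate the matching procedure.

```python
from statistics import mean

def predicted_choice(samples, outcomes):
    """predict the final choice as the option with the highest mean of the
    outcomes experienced during sampling (the highest-average rule)."""
    experienced = {}
    for option, outcome in zip(samples, outcomes):
        experienced.setdefault(option, []).append(outcome)
    return max(experienced, key=lambda option: mean(experienced[option]))

def agreement(observations):
    """proportion of observations whose predicted final choice matches the
    participant's actual final choice."""
    matches = [predicted_choice(s, o) == actual for s, o, actual in observations]
    return sum(matches) / len(matches)

# hypothetical observations: (sample sequence, experienced outcomes, actual choice)
obs = [
    (["a", "a", "b", "a"], [4, 0, 3, 4], "b"),  # mean a = 2.67 < mean b = 3
    (["a", "a", "a", "b"], [4, 4, 4, 3], "a"),  # mean a = 4 > mean b = 3
]
print(agreement(obs))  # both predictions match the actual choice
```

averaging such matches within each strategy-by-frequency cell gives the proportions plotted in figure 6.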
these results are important because they support an initial suggestion (gonzalez & dutt, 2011, 2012) that a decrease in the exploration rate during sampling is related to the process of discovering the option that maximizes expected value. our results indicate that free sampling is not a simple random process in which participants explore the options with the sole goal of informing their future decisions. rather, we show that learning during free sampling is a gradual discovery of the best option, with exploration effort decreasing over more samples. as suggested by theoretical accounts of decisions from experience, participants seem to gradually move from a process of exploration to the exploitation of the best option, and they end up choosing the option that accords with this pattern of sampling (gonzalez et al., 2003). as in hills and hertwig's (2010) findings, we also identified two idealized search strategies: piecewise and comprehensive. however, regardless of which strategy was used, the search process seemed to serve the same purpose, as demonstrated by a similar increase in the rate of sampling the high value option, inversely related to a gradual decrease in exploration. these phenomena are explained by iblt's learning process, which suggests that choice is led by a dynamic formation of the value of the options through experience (blending). the process of discovering the most valuable option starts with more exploration (reflected in higher alternation between the two options in binary choice), but as the better option becomes evident through experience, the amount of exploration is reduced. we find that regardless of the exploratory strategy, people exhibit similar dynamics of decreased exploration and increased selection of the high value option over time.
for example, with a piecewise strategy and using the example of figure 1, the activation of the outcome for the safe option ($3 in the example) will be high, given the frequency of selection of this option, while the activation of the outcomes for the risky option ($4 and $0) will vary according to the frequency with which these outcomes are observed. the $4 instance is experienced more often and could result in a higher activation than the $0 instance, which is a rare event (the activation equation has some stochastic noise; see gonzalez & dutt, 2011). when $0 and $4 are combined through blending, the blended value of the risky option is expected to be slightly higher than that of the safe option more often than not; as a result, the risky option is expected to be chosen increasingly over the safe option, resulting in a gradual reduction of alternation between the two options in which the risky option is often selected (here, the risky option is also the high value option, and selecting it increases the sampling-h rate). with a comprehensive strategy and using the example of figure 1, the activation of the three instances would depend greatly on the order in which the options are selected and on the number of times that an option is consecutively selected. if the risky option is selected first (as in the example of figure 2, right panel) and then a switch is made to the safe option, the activation of the outcomes for the risky option ($4 and $0) would decay during the longer exploration of the safe option. this order would increase the chances of choosing the safe option over the risky one (a low sampling-h rate) and decrease alternation between the two options. the reverse order of exploration would predict an increased chance of choosing the risky over the safe option, resulting in a higher sampling-h rate.
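the activation and blending mechanisms walked through above can be sketched as follows. this is a minimal illustration, not the full ibl model of gonzalez and dutt (2011): the decay and temperature parameter values, the absence of activation noise, and the example sampling trials are simplifying assumptions.

```python
import math

def activation(observation_times, t_now, d=0.5):
    """act-r-style base-level activation: outcomes observed more often and
    more recently receive higher activation (d is a decay parameter)."""
    return math.log(sum((t_now - t) ** (-d) for t in observation_times))

def retrieval_probabilities(activations, tau=0.25):
    """boltzmann soft-max over activations: the cognitive probability of
    recalling each instance from memory (tau is a temperature parameter)."""
    weights = [math.exp(a / tau) for a in activations]
    total = sum(weights)
    return [w / total for w in weights]

def blended_value(instances, t_now):
    """blended value of an option: each experienced outcome weighted by its
    probability of retrieval; instances maps outcome -> observation times."""
    outcomes = list(instances)
    acts = [activation(instances[o], t_now) for o in outcomes]
    probs = retrieval_probabilities(acts)
    return sum(p * o for p, o in zip(probs, outcomes))

# the figure 1 options after some hypothetical sampling: the risky option's
# $4 outcome was seen at trials 1, 2, 4 and 5, its rare $0 outcome at trial 3
risky = blended_value({4: [1, 2, 4, 5], 0: [3]}, t_now=6)
safe = blended_value({3: [6, 7]}, t_now=8)
# the frequently and recently experienced $4 dominates the blend, so the
# risky option's blended value lies close to 4, above the safe option's 3
```

with these illustrative parameters, the rare $0 instance receives little retrieval weight, reproducing the tendency described above for the risky (high value) option to be selected increasingly often.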
in conclusion, this research contributes towards understanding the relationships between exploration-exploitation processes during free sampling, their dynamics, and their consequences for choice. our results indicate that regardless of explicit search strategies, a decrease in exploration is observed in parallel with an increase in the selection of the high value option. however, a general conclusion regarding these phenomena is expected to depend on the dynamics of the probabilities and outcomes of the environment over the course of free sampling. in highly dynamic environments, the diversity of options would make it more challenging for humans to discriminate among familiar classes of objects, and more exploration would be required. although decisions might become increasingly similar with task practice, the rise of the sampling-h rate might be slower in dynamic and diverse environments. understanding and predicting the rate at which exploration decreases and the sampling-h rate increases has important implications for training and learning from experience. presumably, one could strategically manipulate the speed of these transitions by introducing surprising outcomes during sampling, which may keep people interested in alternating between options (thus inviting increased exploration and delaying exploitation). another likely way of influencing the speed of these transitions is introducing more options: when there are more than two options to choose from, transitions are likely to be delayed compared to binary choice, because more options in the choice set would make it more difficult for people to find the options of high value.
some of these ideas form the immediate next steps for us to investigate in the near future.

acknowledgements: a significant portion of this research was undertaken while varun dutt was at the ddmlab, carnegie mellon university. this research was supported by the national science foundation award ses-1530479 to cleotilde gonzalez.

declaration of conflicting interests: the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

author contributions: the authors contributed equally to this work.

supplementary material: no supplementary material available.

copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license.

citation: gonzalez, c. & dutt, v. (2016). exploration and exploitation during information search and consequential choice. journal of dynamic decision making, 2, 2. doi:10.11588/jddm.2016.1.29308

received: 06 april 2016 | accepted: 12 july 2016 | published: 29 july 2016

references

anderson, j. r., & lebiere, c. (1998). the atomic components of thought. hillsdale, nj: erlbaum.
biele, g., erev, i., & eyal, e. (2009). learning, risk attitude and hot stoves in restless bandit problems. journal of mathematical psychology, 53(3), 155-167. doi:10.1016/j.jmp.2008.05.006
camilleri, a. r., & newell, b. r. (2011). when and why rare events are underweighted: a direct comparison of the sampling, partial feedback, full feedback and description choice paradigms. psychonomic bulletin & review, 18(2), 377-384. doi:10.3758/s13423-010-0040-2
dienes, z., & fahey, r. (1995). role of specific instances in controlling a dynamic system. journal of experimental psychology: learning, memory and cognition, 21(4), 848-862. doi:10.1037/0278-7393.21.4.848
erev, i., ert, e., roth, a. e., haruvy, e., herzog, s., hau, r., hertwig, r., stewart, t., west, r., & lebiere, c. (2010).
a choice prediction competition for choices from experience and from description. journal of behavioral decision making, 23, 15-47. doi:10.1002/bdm.683
erev, i., & roth, a. e. (1998). predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria. the american economic review, 88(4), 848-881.
fiedler, k. (2000). beware of samples! a cognitive-ecological sampling approach to judgment biases. psychological review, 107(4), 659-676. doi:10.1037/0033-295x.107.4.659
fiedler, k., & kareev, y. (2006). does decision quality (always) increase with the size of information samples? some vicissitudes in applying the law of large numbers. journal of experimental psychology: learning, memory, & cognition, 32(4), 883-903. doi:10.1037/0278-7393.32.4.883
gonzalez, c., & dutt, v. (2011). instance-based learning: integrating sampling and repeated decisions from experience. psychological review, 118(4), 523-551. doi:10.1037/a0024558
gonzalez, c., & dutt, v. (2012). refuting data aggregation arguments and how the ibl model stands criticism: a reply to hills and hertwig (2012). psychological review, 119(4), 893-898. doi:10.1037/a0029445
gonzalez, c., lerch, j. f., & lebiere, c. (2003). instance-based learning in dynamic decision making. cognitive science, 27(4), 591-635. doi:10.1016/s0364-0213(03)00031-4
hau, r., pleskac, t. j., kiefer, j., & hertwig, r. (2008). the description-experience gap in risky choice: the role of sample size and experienced probabilities. journal of behavioral decision making, 21(5), 493-518. doi:10.1002/bdm.598
hertwig, r., barron, g., weber, e. u., & erev, i. (2004). decisions from experience and the effect of rare events in risky choice. psychological science, 15(8), 534-539. doi:10.1111/j.0956-7976.2004.00715.x
hertwig, r., & erev, i. (2009). the description-experience gap in risky choice. trends in cognitive sciences, 13(12), 517-523.
doi:10.1016/j.tics.2009.09.004 hertwig, r., & pleskac, t. j. (2008). the game of life: how small samples render choice simpler. in n. chater & m. oaksford (eds.), the probabilistic mind: prospects for bayesian cognitive science (pp. 209-235). oxford, uk: oxford university press. hertwig, r., & pleskac, t. j. (2010). decision from expe10.11588/jddm.2016.1.29308 jddm | 2016 | volume 2 | article 2 | 7 http://dx.doi.org/10.1016/j.jmp.2008.05.006 http://dx.doi.org/10.3758/s13423-010-0040-2 http://dx.doi.org/10.1037/0278-7393.21.4.848 http://dx.doi.org/10.1002/bdm.683 http://dx.doi.org/10.1002/bdm.683 http://dx.doi.org/10.1037/0033-295x.107.4.659 http://dx.doi.org/10.1037/0278-7393.32.4.883 http://dx.doi.org/10.1037/a0024558 http://dx.doi.org/10.1037/a0029445 http://dx.doi.org/10.1016/s0364-0213(03)00031-4 http://dx.doi.org/10.1002/bdm.598 http://dx.doi.org/10.1111/j.0956-7976.2004.00715.x http://dx.doi.org/10.1111/j.0956-7976.2004.00715.x http://dx.doi.org/10.1016/j.tics.2009.09.004 https://journals.ub.uni-heidelberg.de/index.php/jddm/10.11588/jddm.2016.1.29308 gonzalez & dutt: exploration and exploitation during sampling rience: why small samples? cognition, 115(2), 225-237. doi:10.1111/j.0956-7976.2004.00715.x hills, t. t., & hertwig, r. (2010). information search in decisions from experience: do our patterns of sampling foreshadow our decisions? psychological science, 21(12), 17871792. doi:10.1177/0956797610387443 hills, t. t., & hertwig, r. (2012). two distinct exploratory behaviors in decisions from experience: comment on gonzalez and dutt (2011). psychological review, 119(4), 888-892. doi:10.1037/a0028004 kareev, y. (2000). seven (indeed, plus or minus two) and the detection of correlations. psychological review, 107(2), 397402. doi:10.1037/0033-295x.107.2.397 lejarraga, t., dutt, v., & gonzalez, c. (2012). instancebased learning: a general model of repeated binary choice. journal of behavioral decision making, 25(2), 143-153. doi:10.1002/bdm.722 logan, g. d. 
(1988). toward an instance theory of automatization. psychological review, 95(4), 492-527. doi:10.1037/0033295x.95.4.492 mehlhorn, k., ben-asher, n., dutt, v., & gonzalez, c. (2014). observed variability and values matter: towards a better understanding of information search and decisions from experience. journal of behavioral decision making, 27(4), 328-339. doi:10.1002/bdm.1809 pirolli, p., & card, s. (1999). information foraging. psychological review, 106(4), 643-675. doi:10.1037/0033-295x.106.4.643 rakow, t., demes, k. a., & newell, b. r. (2008). biased samples not mode of presentation: re-examining the apparent underweighting of rare events in experience-based choice. organizational behavior and human decision processes, 106(2), 168-179. doi:10.1016/j.obhdp.2008.02.001 rakow, t. & newell, b. r. (2010). degrees of uncertainty: an overview and framework for future research on experience-based choice. journal of behavioral decision making, 23(1), 1-14. doi:10.1002/bdm.681 teodorescu, k., & erev, i. (2014). on the decision to explore new alternatives: the coexistence of underand overexploration. journal of behavioral decision making, 27(2), 109123. doi:10.1002/bdm.1785 todd, p. m., penke, l., fasolo, b., & lenton, a. p. (2007). different cognitive processes underlie human mate choices and mate preferences. proceedings of the national academy of sciences, 104(38), 15011. doi:10.1073/pnas.0705290104 wolfe, j. m. (2012). saved by a log: how do humans perform hybrid visual and memory search? psychological science, 23(7), 698–703. 
doi:10.1177/0956797612443968 10.11588/jddm.2016.1.29308 jddm | 2016 | volume 2 | article 2 | 8 http://dx.doi.org/10.1111/j.0956-7976.2004.00715.x http://dx.doi.org/10.1177/0956797610387443 http://dx.doi.org/10.1037/a0028004 http://dx.doi.org/10.1037/0033-295x.107.2.397 http://dx.doi.org/10.1002/bdm.722 http://dx.doi.org/10.1037/0033-295x.95.4.492 http://dx.doi.org/10.1037/0033-295x.95.4.492 http://dx.doi.org/10.1002/bdm.1809 http://dx.doi.org/10.1037/0033-295x.106.4.643 http://dx.doi.org/10.1016/j.obhdp.2008.02.001 http://dx.doi.org/10.1002/bdm.681 http://dx.doi.org/10.1002/bdm.1785. http://dx.doi.org/10.1073/pnas.0705290104 http://dx.doi.org/10.1177/0956797612443968 https://journals.ub.uni-heidelberg.de/index.php/jddm/10.11588/jddm.2016.1.29308 original research dynamic mouselabweb: individualized information display matrixes zeliha yıldırım1 and semra erpolat taşabat2 1 middle east technical university, ankara, turkey 2 mimar sinan university, istanbul, turkey this paper introduces dynamic mouselabweb, a computerized process tracing tool that was designed to create flexible decision-making settings that are similar to real life. while dynamic mouselabweb is an extension of mouselabweb, it differs in that it creates individualized information display matrixes (idms) rather than presenting predetermined idms, so participants decide on the attributes and alternatives before the decision task. this structure can improve the involvement of decisionmakers in the decision process and it gives researchers a chance to observe decision-making behaviors and to explore new decision-making strategies when the decision task has only the decision-makers’ important attributes and appealing alternatives. in order to measure the effect of this dynamic structure, two groups of students worked on job selection tasks, one in the dynamic mouselabweb program (n = 32) and one in the traditional mouselabweb program (n = 20). 
Results indicate a significant difference between the decision-making behaviors of the two groups. The students who chose a job in Dynamic MouselabWeb acquired more information and spent more time on the task than the other group; in other words, they were more involved in the decision process.

Keywords: decision-making, process tracing, information display matrix, MouselabWeb

This article consists of three main sections: the first section is dedicated to a literature review to familiarize readers with the background, the second covers the technical aspects of the software, and the last section describes the experiment.

Literature review

Decision models

There are two main approaches to studying decision-making behavior: 1) the outcome-based approach and 2) the process-tracing approach (Harte & Koele, 2001). The outcome-based approach attempts to construct mathematical models of the relationship between input (information) and output (decisions) to reveal the cognitive patterns underlying decision-making processes. This approach has been applied for over two centuries (Abelson & Levi, 1985; Brehmer, 1994; Dawes, 1979; Einhorn, Kleinmuntz, & Kleinmuntz, 1979; Ford, Schmitt, Schechtmann, Hults, & Doherty, 1989; Westenberg & Koele, 1994), but it has substantial limitations. For example, it implies fitting mathematical models to data on decisions (output) in various situations (input) in order to infer the underlying decision-making processes without taking process data into account. According to Svenson (1979), these models provide surface descriptions of the processes rather than detailed information about the stages of actual decision processes. As a result, researchers have argued that cognitive processes cannot be sufficiently understood by studying input-output relationships alone, and that other research methods need to be included (Ford et al., 1989; Payne, Braunstein, & Carroll, 1978; Svenson, 1979).
Researchers can address these concerns by employing the process-tracing approach, which focuses on patterns of information acquisition rather than on the output (i.e., the decisions). In most process-tracing studies, information is presented in an information display matrix with at least two alternatives characterized by at least two attributes (Ford et al., 1989). By observing the information acquisition process, such as the amount of time spent making decisions, the sequence of acquisitions, and the amount of information accessed, researchers have gained insights into cognitive processes and developed more accurate predictive models (Einhorn & Hogarth, 1981).

Process tracing methods

Several methods have been used to monitor information acquisition processes, such as verbal protocols (Jarvenpaa, 1989), eye movement recording (Russo & Dosher, 1983), information display boards (Payne, 1976), and the Mouselab system (Payne, Bettman, & Johnson, 1988). In earlier process-tracing studies, such as those carried out toward the end of the 1970s, participants were asked to make a decision by looking at information on cards located in envelopes on an information display board (Payne, 1976). In order to access information about a particular attribute of a particular alternative, the participant had to open the associated envelope and read the card, which would then be placed back in the envelope. In the meantime, the researcher would write down the sequence of acquisitions and the number of times the envelopes were opened.

Corresponding author: Zeliha Yıldırım, Middle East Technical University, Üniversiteler Mahallesi Dumlupınar Bulvarı No:1, Ankara, Turkey; e-mail: zlhyldrm@gmail.com. doi: 10.11588/jddm.2019.1.63149 | JDDM | 2019 | Volume 5 | Article 4
Computerized process tracing, such as the Mouselab system, uses computer graphics to present information in an information display matrix (Payne et al., 1988). With this system, the values of attributes are concealed in boxes on a screen instead of in envelopes. Each box is opened when the mouse cursor is moved over it (mouseover) and remains open until the cursor is moved away (mouseout). Only one box can be open at a time (Payne et al., 1988). In addition to offering all the features of Mouselab, MouselabWeb makes it possible to carry out research on the internet (Willemsen & Johnson, 2004, 2011). MouselabWeb (www.mouselabweb.org), which uses web technology (JavaScript, HTML, PHP, and MySQL), is open source, released under the GNU General Public License v3.0, so researchers can add new features. Moreover, participants do not need to download plugins or other software; all they need to use a MouselabWeb page is internet access and a mouse. Alongside MouselabWeb there are other computerized process-tracing tools, such as ISCube (Tabatabai, 1998), MouseTrace (Jasper & Shapiro, 2002), Phased Narrowing (Jasper & Levin, 2001; Levin & Jasper, 1995), Active Information Search (Huber, Wider, & Huber, 1997; Williamson, Ranyard, & Cuthbert, 2000), P1198 (Andersson, 2001), and ComputerShop (Huneke, Cole, & Levin, 2004).

Decision strategies

According to Payne, Bettman, and Johnson (1992), a decision strategy is a "sequence of mental and effector (actions on the environment) operations that transform some initial state of knowledge into a final knowledge state so that the decision maker perceives that the particular decision problem is solved" (p. 109). Knowing about decision makers' cognitive processes enables us to infer the decision strategies used, to predict future decisional behavior, and to anticipate decision outcomes (Payne, Braunstein, & Carroll, 1978).
Furthermore, if we are better informed about people's decision strategies, we can design better decision support systems (Browne, Pitts, & Wetherbe, 2007; Montgomery, Hosanagar, Krishnan, & Clay, 2004; Bettman et al., 1993). Although it is seldom possible to identify a particular decision strategy precisely, researchers can identify types of strategies by using process measures (Ford et al., 1989). In general, decision strategies can be classified as either compensatory or non-compensatory. With compensatory strategies, decision makers make tradeoffs between the values of multiple attributes and engage in extensive information processing (Stevenson et al., 1990). When a large amount of information is gathered in the process, this usually indicates that participants are employing compensatory decision-making strategies (Ford et al., 1989). The weighted adding strategy, for example, is a compensatory strategy that requires decision makers to multiply each attribute's subjective value by its importance weight to obtain an overall value for each alternative (Abelson & Levi, 1985). Non-compensatory strategies are selective in the sense that decision makers restrict their attention to only part of the available information (Beach & Mitchell, 1978) and eliminate alternatives that do not meet their requirements. One example is the lexicographic strategy (LEX), which selects the option with the best value on the most important attribute. According to the cost/benefit framework, the choice of a decision strategy is a function of both costs, i.e., the effort required to use a rule, and benefits, i.e., the ability of a strategy to select the best alternative (Beach & Mitchell, 1978; Russo & Dosher, 1983).
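The two strategy families just described can be sketched in a few lines of code. This is a minimal illustration only; the job names, attribute names, and values below are hypothetical and not taken from the study.

```python
def weighted_adding(options, weights):
    # Compensatory: overall value = sum of importance weight * attribute value,
    # so a weak attribute can be compensated by a strong one.
    score = lambda opt: sum(weights[a] * v for a, v in opt["values"].items())
    return max(options, key=score)["name"]

def lexicographic(options, importance_order):
    # Non-compensatory (LEX): choose the best option on the most important
    # attribute, consulting later attributes only to break ties.
    key = lambda opt: tuple(opt["values"][a] for a in importance_order)
    return max(options, key=key)["name"]

jobs = [
    {"name": "Job A", "values": {"salary": 7, "balance": 2}},
    {"name": "Job B", "values": {"salary": 5, "balance": 6}},
]
weighted_adding(jobs, {"salary": 0.5, "balance": 0.5})  # -> "Job B"
lexicographic(jobs, ["salary", "balance"])              # -> "Job A"
```

Note how the two rules can disagree on the same data: the tradeoff made by weighted adding is exactly the tradeoff the lexicographic rule refuses to make.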
By using this framework, the decision maker has a large repertoire of decision strategies and chooses among them contingent on the characteristics of the decision task, such as its complexity and time pressure, to adapt to the decision setting (Bettman et al., 1993). In addition to task-based variables, another important variable that affects the use of decision-making strategies is the decision maker's level of involvement. Involvement is considered an important mediating variable of consumer behavior (Mitchell, 1979). For example, an extensive information search might indicate a higher level of involvement (Mittal, 1989).

Process measures

When participants process information in a decision task, computerized process-tracing tools automatically record timestamped mouse events, e.g., mouseover (opening of boxes), mouseout (closing of boxes), the order in which boxes are opened, the amount of time boxes remain open, the selected options, and the total time elapsed since the display first appeared on the screen. From this event-based data, several measures can be computed to characterize decision-making behavior; they are generally divided into depth, content, and sequence of search (Ford et al., 1989; Jacoby, Jaccard, Kuss, Troutman, & Mazursky, 1987; Jasper & Shapiro, 2002, p. 370). Depth measures capture how extensively participants attempted to acquire information. They include decision-making time (Hogarth, 1975; Pollay, 1970), the proportion of information sought (Payne, 1976), reacquisition rates (Jacoby, Chestnut, Weigl, & Fisher, 1976), the number of acquisitions (Bettman et al., 1993), and the average amount of time spent per item of information acquired (Bettman et al., 1993). Content measures quantify the relative weights assigned to the various types of information; they refer to exactly what information was acquired and which options were chosen.
Content measures include the total amount of time spent on the information in the boxes (Bettman et al., 1993), the time spent on the most important attribute(s), and the proportion of time spent on the most important attribute. It is common to ask subjects, after the decision task, to rate the importance they assign to the attributes on a Likert scale; the attribute with the highest rating is then considered the most important one. Sequence measures describe whether information selection behavior was primarily attribute- or alternative-based. An alternative-based search means that decision makers consider several attributes of the same alternative before proceeding to the next one. In contrast, when they first compare alternatives across attributes, this is referred to as an attribute-based approach. A common metric for defining search patterns is the search index (SI; Payne, 1976), which measures the relative use of alternative-based versus attribute-based search. The index ranges from -1 to +1, indicating a completely attribute-based or a completely alternative-based information search, respectively; if both types of search are used equally, the index is 0. A positive index value is often assumed to indicate the use of compensatory strategies (e.g., weighted adding), whereas a negative index value is interpreted as an indicator of more non-compensatory strategies (e.g., elimination-by-aspects). Böckenholt and Hynan (1994) proposed the strategy measure (SM), another index for search patterns, after finding in a simulation study that the SI is unreliable when the numbers of alternatives and attributes differ.
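The SI is computed from transitions in the logged acquisition sequence. A minimal sketch follows; the treatment of diagonal transitions (where both the alternative and the attribute change) varies across studies, and ignoring them is our own simplifying assumption here, not something fixed by the SI itself.

```python
def search_index(acquisitions):
    # Search index over a sequence of (alternative, attribute) box
    # acquisitions. A transition staying within one alternative counts as
    # alternative-based; one staying within one attribute counts as
    # attribute-based. Diagonal transitions (both change) are ignored.
    alt_wise = att_wise = 0
    for (a1, x1), (a2, x2) in zip(acquisitions, acquisitions[1:]):
        if a1 == a2 and x1 != x2:
            alt_wise += 1
        elif x1 == x2 and a1 != a2:
            att_wise += 1
    total = alt_wise + att_wise
    return (alt_wise - att_wise) / total if total else 0.0

# An alternative-wise inspection pattern (read a whole row, then the next)
# yields SI = +1:
seq = [("A", "pay"), ("A", "hours"), ("B", "pay"), ("B", "hours")]
search_index(seq)  # -> 1.0
```

Reversing the pattern (comparing both alternatives on one attribute before moving to the next attribute) yields -1.0, the fully attribute-based extreme.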
When the number of attributes is larger than the number of alternatives, a positive SI is more likely, and when the number of attributes is smaller than the number of alternatives, a negative SI is more probable. The SM index also ranges from -1 to +1. Other sequence variables are the variability in the amount of information searched per option (Payne, 1976) and the variability in the amount of information searched per attribute (Klayman, 1982). Lower values on these two measures indicate that processing is spread more evenly across the alternatives or attributes, respectively.

A new tool: Dynamic MouselabWeb

MouselabWeb

MouselabWeb focuses on the acquisition of information in order to make inferences about the underlying cognition and to test existing decision models. To do so, it presents information in an attributes-by-alternatives matrix. In most computer-based displays, the attribute values are hidden behind boxes. Participants navigate through a task by moving the mouse and can reveal as much information as they need to make a decision. Most computers record the time and sequence of acquisitions with sufficient precision (1/60th of a second), resulting in a record of box openings and closings. The interpretation of information-acquisition data rests on two testable assumptions about the relationship between search and cognition (Costa-Gomes, Crawford, & Broseta, 2001): the first, occurrence, states that if information is used by a decision maker, it must have been seen by opening a box. The second, adjacency, assumes that information is acquired rather than memorized, because of the limitations of short-term memory and the low cost of (re-)acquisition.

Motivation behind Dynamic MouselabWeb

Dynamic MouselabWeb was motivated by the following aims:
1. Overcome limitations of predetermined informational structures. One limitation of IDMs is that the researcher has to determine the options, attributes, and information available (Jacoby et al., 1987). This can be alleviated by using an individualized structure, which Aschemann-Witzel and Hamm (2011) suggest as one possibility for the further development of IDMs: "Another method suggested by Jacoby et al. (1987, p. 155) is an individualized IDM; this takes account of the criticism of IDMs based on overly stark predefined informational structures: before the IDM survey is carried out, each individual participant is asked which of the product attributes are important to him or her, so that only these criteria are then offered in the matrix." (p. 4)

2. Create stimuli close to real-life decision settings. In the age of the internet, consumers tend to narrow down the options and criteria via online shopping websites so they can choose the best option. In light of this, researchers should prepare IDMs with this kind of dynamic structure. Allowing decision makers to create their own IDMs rather than working with predefined ones resembles certain real-life information environments more closely.

3. Encourage researchers to discover new decision strategies. As Huber (1980) states, future research should not be limited to the decision strategies already discussed in the literature: "The list of simple strategies discovered so far covers only a subset of all possible simple strategies. ... At least some of the simple strategies are incomplete. For example, the lexicographic-ordering strategy assumes that the decision maker first selects the most important dimension. It is not clear what happens if there are two or more most important dimensions." (pp. 187–188)
4. Increase the involvement of participants in decision processes. In decision research, involvement is directly related to the motivation to acquire information. We believe that if participants can decide what they want to know about the options under consideration, they will be more motivated to gather information during the decision process.

5. Make it freely available to other researchers. Dynamic MouselabWeb is free software like MouselabWeb, so researchers can redistribute and/or modify it under the terms of the GNU General Public License v3.0.

Technical aspects

Dynamic MouselabWeb is an extended version of MouselabWeb and uses all of its features. In this section, we list the basic features of MouselabWeb and discuss its technical requirements, and then we explain the structure of Dynamic MouselabWeb.

Basic features

An experiment based on MouselabWeb generally consists of several HTML pages, such as an introduction page, pages for warm-up tasks, a decision task page, and a survey page. Each page is linked to the following page by defining "nexturl" in the source code. MouselabWeb pages can be generated with the MouselabWeb Designer (http://www.mouselabweb.org/designer/index.html), an online editor. The main purpose of the Designer is to generate the MouselabWeb box structure, which can be configured in various ways. Depending on the aim of the study, researchers can use the default features or create new ones. MouselabWeb's basic features are: 1) there are different options for box openings and closings, 2) boxes can be fixed, counterbalanced, or randomized, 3) a header row/column for alternatives and attributes can be included, 4) the layout of boxes and selection buttons can be changed by altering the CSS files, 5) background scripting automatically saves the data, and the process data is sent to and stored in a database, 6) a questionnaire can be included before/after the decision task.
Dynamic features

MouselabWeb uses MySQL, a relational database management system. To make MouselabWeb dynamic, we created three tables in addition to the table that stores the process data. Their functions are as follows:

• talternatives stores information about the alternatives,
• tattributes stores information about the attributes,
• tvalues stores the values of the attributes.

Electronic supplementary material (ESM) 1 explains the columns of these three tables and the information they store. In order to use the Dynamic MouselabWeb code, you have to create these tables in a database; the tables for our study can be created by running the SQL script in ESM 2. In Dynamic MouselabWeb, two additional screens precede the decision task: 1) the "choose the alternatives" screen, which uses information from the talternatives table (Figure 1 shows a screen capture; see ESM 3 for the page source code), and 2) the "choose the attributes" screen, which uses information from the tattributes table (Figure 2 shows a screen capture; the page source code is in ESM 4). Although we chose to display these two screens in that order, it is possible to change the order, or to omit one of them and use predefined attributes/alternatives instead. After these two selections, the names of columns and rows and the contents of the boxes are obtained from the database through the tables' associations; for this, we added steps to the source code of the decision task screen. The source code of the decision task is provided in ESM 5; for a screen capture, see Figure 3. To create a dynamic decision task, researchers should determine the alternatives, the attributes, and the attribute values in advance. They then need to create the three database tables to hold this information and make the necessary changes in the code for the sets of alternatives/attributes.
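The three-table layout can be illustrated with a small self-contained sketch. It uses SQLite rather than the MySQL backend the software actually uses, and the column names below are our own assumptions for demonstration only; the authoritative schema is the SQL script in ESM 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE talternatives (alt_id INTEGER PRIMARY KEY, alt_name TEXT);
CREATE TABLE tattributes  (att_id INTEGER PRIMARY KEY, att_name TEXT);
CREATE TABLE tvalues (
    alt_id INTEGER REFERENCES talternatives(alt_id),
    att_id INTEGER REFERENCES tattributes(att_id),
    cell   TEXT,                  -- the value shown inside the IDM box
    PRIMARY KEY (alt_id, att_id)
);
""")
conn.executemany("INSERT INTO talternatives VALUES (?, ?)",
                 [(1, "Data analyst"), (2, "Actuary")])
conn.executemany("INSERT INTO tattributes VALUES (?, ?)",
                 [(1, "Salary"), (2, "Workload")])
conn.executemany("INSERT INTO tvalues VALUES (?, ?, ?)",
                 [(1, 1, "5"), (1, 2, "4"), (2, 1, "7"), (2, 2, "3")])

# After the two selection screens, the decision-task page only needs the
# rows matching the participant's chosen alternatives and attributes:
chosen_alts, chosen_atts = (1, 2), (1,)
cells = conn.execute(f"""
    SELECT a.alt_name, t.att_name, v.cell
    FROM tvalues v
    JOIN talternatives a ON a.alt_id = v.alt_id
    JOIN tattributes  t ON t.att_id = v.att_id
    WHERE v.alt_id IN ({",".join("?" * len(chosen_alts))})
      AND v.att_id IN ({",".join("?" * len(chosen_atts))})
    ORDER BY a.alt_id, t.att_id
""", chosen_alts + chosen_atts).fetchall()
# cells -> [("Data analyst", "Salary", "5"), ("Actuary", "Salary", "7")]
```

The join in the final query is the step the decision-task screen performs to fill the individualized IDM from the participant's selections.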
Figure 4 illustrates the process of creating a study in Dynamic MouselabWeb.

Figure 1. A screen capture of the set-of-alternatives screen.
Figure 2. A screen capture of the set-of-attributes screen.

Technical requirements

Dynamic MouselabWeb has the same technical requirements as MouselabWeb. We recommend visiting the download page (http://www.mouselabweb.org/download.php), where you can find the program files and a guide explaining how to install the software and launch an experiment. We encourage you to create a MouselabWeb page using the MouselabWeb Designer to get familiar with the software. You can design an experiment offline or with an online version of MouselabWeb. Basically, you need a web server with PHP/MySQL capabilities on your system, as well as a database to store the study data. All the necessary files that must be placed in the study folder are in ESM 6.

Experiment

In order to illustrate the dynamic structure of Dynamic MouselabWeb, we conducted an experiment that compares two decision tasks with the same amount of information, presented either in Dynamic MouselabWeb or in MouselabWeb (the control). We formulated three hypotheses based on the process measures:

H1: The dynamic structure of a decision task has an effect on decision-making behavior.
H2: Participants using a dynamic structure will access more information and spend more time on the decision task, which indicates a difference in involvement.
H3: The Dynamic MouselabWeb group will pay less attention to the most important attribute than the control group.

We mainly wanted to make a general statement about the effect of the dynamic structure, as stated in H1.
Even though H1 is related to H2, we used different analysis techniques to test these two hypotheses: to test H1, we used more than one dependent variable to represent decision-making behavior. H3, on the other hand, suggests that the two groups approach the most important attribute differently. Since the dynamic decision task has more than one important dimension, we assume that the group working on this task pays less attention to the most important attribute than the other group.

Method

Participants

The target group was Turkish undergraduates in the final year of their statistics degrees. 32 undergraduates (19 female, 9 male, 4 not answered; Mage = 22.9, SD = 0.71) from various universities selected a job in Dynamic MouselabWeb, and 20 undergraduates (10 female, 10 male; Mage = 23.6, SD = 1.47) selected a job in MouselabWeb.

Figure 3. A screen capture of a decision task.
Figure 4. The flowchart of a study in Dynamic MouselabWeb. The dotted lines represent the background scripts that show screens and obtain information from the database, whereas the unbroken lines denote researchers'/participants' activities.

Programs

We designed two job-selection tasks with four alternatives and five attributes. The stimuli in MouselabWeb have a traditional setup in which the alternatives and attributes are selected by the researcher, whereas in the dynamic decision task participants choose the alternatives and attributes from the sets before the task.

Design

For the Dynamic MouselabWeb study, to define the set of alternatives we determined sectors in which statistics graduates might work: IT, banking, insurance, industry, market research, and education. We then used job search websites to determine the particular occupations.
In the preparation phase, we selected 21 jobs, prepared a study link that included the decision task, sent this link to 10 undergraduates, and asked for their feedback. Based on their responses, we omitted 11 jobs and added 2 new jobs to the set. The set of attributes contains 20 attributes, collected from various studies. To determine the attribute values, we turned to eight specialists who had worked for many years in their sectors and asked them to rate the 20 attributes on a 7-point scale (from worst to best) for each job and each sector (overall rating). After the specialists had assessed all the jobs, we consolidated their assessments into a single table with 20 attributes and 12 jobs.

Procedure

In the Dynamic MouselabWeb task, participants were asked to choose four jobs from the set of alternatives and five attributes from the set of attributes. The option "I do not want to choose" was added to the sets; if it was selected, predetermined alternatives/attributes were shown in the main decision task. In the MouselabWeb task, participants were asked to choose one job in a main decision task in which the alternatives and attributes were predefined; this task also had four jobs and five attributes. Both studies included introduction pages and two warm-up tasks before the stimuli. The survey links, which included the decision tasks, were sent to students between March and May 2018. The students were informed about the project by email and invited to the study; they did not receive any compensation in return.

Table 1. Means and standard deviations of the dependent variables and t-test results.

Process measure | Dynamic MouselabWeb group (n = 32), M (SD) | MouselabWeb group (n = 20), M (SD) | t | df | p
Percent of unique boxes examined | 0.84 (0.18) | 0.70 (0.26) | 2.33 | 50 | .02
Number of acquisitions | 42.16 (19.25) | 26.60 (15.66) | 3.04 | 50 | <.01
Number of reacquisitions | 25.41 (17.14) | 12.70 (11.66) | 3.18 | 50 | <.01
Total time in decision (s) | 42 (24) | 31 (16) | 2.10 | 50 | .04
Time spent on the most important attribute (s) | 8.32 (5.90) | 8.86 (7.20) | -0.29 | 46 | .78
Proportional time on the most important attribute | 0.20 (0.10) | 0.32 (0.24) | -2.19 | 24 | .04
Strategy measure | -0.41 (3.93) | -0.95 (3.20) | 0.52 | 50 | .61
VarAlt | 0.03 (0.03) | 0.05 (0.06) | -1.59 | 50 | .12
VarAtt | 0.02 (0.02) | 0.03 (0.05) | -1.46 | 22 | .16

Note. VarAlt: variance in the proportion of time spent processing each alternative; VarAtt: variance in the proportion of time spent processing each attribute. The significance level is .05.

Data preparation

The process data was used to create nine variables reflecting decision-making behavior. To measure the total amount of processing, we used four depth measures: 1) total time spent on the decision (s), 2) number of acquisitions, 3) number of reacquisitions, 4) percentage of unique boxes examined. To assess selectivity in processing, we used two content measures: 1) time spent on the most important attribute (s), 2) proportional time spent on the most important attribute. To define the pattern of search, we used three sequence measures: 1) the strategy measure, 2) variance in the proportion of time spent processing each alternative (VarAlt), 3) variance in the proportion of time spent processing each attribute (VarAtt).

Results

Since we expected the dependent variables to be correlated (0.2 < r < 0.8), we initially performed a multivariate analysis of variance (MANOVA) with the dependent variables that meet MANOVA's assumptions.
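The t statistics in Table 1 reported with df = n1 + n2 - 2 = 50 correspond to the pooled-variance form of the independent-samples t-test, which can be reproduced from the reported summary statistics alone. A quick check follows; small discrepancies from the printed values are expected because the means and SDs in Table 1 are rounded, and the rows reported with smaller df presumably reflect an unequal-variance (Welch) correction or missing data.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    # Two-sample t statistic with pooled variance (equal variances assumed),
    # df = n1 + n2 - 2.
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# "Percent of unique boxes examined" row of Table 1:
t, df = pooled_t(0.84, 0.18, 32, 0.70, 0.26, 20)
# t comes out near the reported 2.33, with df = 50
```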
these dependent variables are: total time in decision making, number of acquisitions, time spent on the most important attribute, and variance in the proportion of time per alternative. the result of the manova shows that the decision-making behavior of participants who selected their attributes and alternatives before the decision task is statistically different from that of the control group, f(4, 43) = 3.18, p = .02. as indicated in h1, we can conclude that the dynamic structure of the decision task has an impact on decision-making behavior. independent-samples t-tests were used to examine the effects in more detail. table 1 presents the descriptive statistics (means and sds) of the dependent variables and the t-test results. table 1 shows that the two groups differ significantly on all the depth measures: percent of unique boxes examined, number of acquisitions, number of reacquisitions, and total time in decision. as predicted in h2, the dynamic group processed more information (i.e., a higher number of acquisitions, a higher number of reacquisitions, and a higher percentage of unique cells examined) and spent more time in the decision process, so we conclude that this group was more involved in decision making. lastly, the two groups differed in the proportion of time spent on the most important attribute (p = .04). as stated in h3, it can be concluded that when participants define the attributes and the alternatives of the decision task, they spend less time on the most important attribute. the data file is provided in electronic supplementary material 7.

discussion

the idea underlying dynamic mouselabweb emerged from the observation that no open-source tools are available for creating individualized idms. in this paper, we have described dynamic mouselabweb, which was designed to create individualized idms that make it possible for participants to choose the attributes and the alternatives before the decision task.
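a note on the statistics in table 1: most rows have df = 32 + 20 - 2 = 50, which is the pooled-variance (equal-variance) form of the independent-samples t-test; the rows with smaller df suggest an unequal-variance correction was applied there. a minimal sketch of the pooled case, not the authors' analysis code:

```python
import math

def pooled_t(x, y):
    """equal-variance independent-samples t statistic and its df."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)  # sample variances
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    # pooled variance weights each group's variance by its degrees of freedom
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    t = (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2
```

with group sizes 32 and 20 this yields df = 50, matching the majority of the table's rows.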
by using dynamic mouselabweb, researchers can avoid the problem of predetermined informational structures and generate more realistic decision settings. additionally, we believe that if participants can decide what they want to know about the options they will consider, they will be more motivated to gather information during the decision process. to define dynamic mouselabweb's position in decision studies, we can compare it with the method of active information search (ais). huber, wider, and huber (1997) introduced the method of ais, in which the subject gets a basic description of the task and has to ask questions to receive additional information. these questions are recorded and the answers are provided in printed form. in an ais experiment, the researcher has to prepare possible questions in advance in order to present their answers during the experiment. similarly, in dynamic mouselabweb the researcher has to define all attributes and alternatives of the task in order to display them on the screens. additionally, both methods put the participant in an active position. the main difference between the two approaches is that in an ais experiment the participants have to ask questions about what they need to know to solve the problem, whereas in dynamic mouselabweb the participants see all possible attributes and alternatives at once and choose among them to create the task. in order to test the program, a study was conducted to compare the decision-making behavior of two groups that selected a job from same-sized decision tasks presented in dynamic mouselabweb versus mouselabweb.
the results of the study show that 1) the dynamic structure of a decision task has an effect on decision-making behavior; 2) the dynamic group processed more information and spent more time on the decision process; and 3) the dynamic group spent less time on the most important attribute. the job choice task is just one example of many possible decision-making contexts, and this new program can be used for many decision-making tasks with larger sample sizes. we encourage researchers to use dynamic mouselabweb to observe decision-making behavior, because individualized idms include only those criteria that are important to decision makers and only those options that have the potential to be selected. thus, dynamic mouselabweb creates a decision-making environment with more than one important dimension, an issue that huber (1980) raised. naturally, the outcomes of this should be explored in future research. since it is an open-source program, other researchers can easily access the code and adapt it to their research.

electronic supplementary material

supplementary material available online at https://doi.org/10.11588/jddm.2019.1.63149
– esm1. databasetables (doc).
– esm2. createdmwtables (sql).
– esm3. choosealterscreen (php).
– esm4. chooseattriscreen (php).
– esm5. decisionscreen (php).
– esm6. mouselabweb_files (zip).
– esm7. data (sav).

acknowledgements: the authors would like to thank the participants and the experts who contributed to this study without receiving any reward.

declaration of conflicting interests: the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

author contributions: zeliha was the lead researcher who designed the experiment, collected the data, and analysed it for this work. semra was the supervisor who served as a constant guiding light for this work.
handling editor: andreas fischer

copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license.

citation: yıldırım, z., & taşabat, s. e. (2019). dynamic mouselabweb: individualized information display matrixes. journal of dynamic decision making, 5, 4. doi:10.11588/jddm.2019.1.63149

received: 23 june 2019. accepted: 7 october 2019. published: 28 december 2019.

references

abelson, r., & levi, a. (1985). decision making and decision theory. in g. lindzey & e. aronson (eds.), the handbook of social psychology (vol. 1, pp. 231–309). munich: random house.
andersson, p. (2001). p1198: software for tracing decision behavior in lending to small businesses. behavior research methods, instruments, & computers, 33(2), 234–242. doi:10.3758/bf03195370
aschemann-witzel, j., & hamm, u. (2011). measuring consumers' information acquisition and decision behavior with the computer-based information-display-matrix. methodology. doi:10.1027/1614-2241/a000018
beach, l. r., & mitchell, t. r. (1978). a contingency model for the selection of decision strategies. academy of management review, 3(3), 439–449. doi:10.5465/amr.1978.4305717
bettman, j. r., johnson, e. j., luce, m. f., & payne, j. w. (1993). correlation, conflict, and choice. journal of experimental psychology: learning, memory, and cognition, 19, 931–951. doi:10.1037//0278-7393.19.4.931
bettman, j. r., luce, m. f., & payne, j. w. (1998). constructive consumer choice processes. journal of consumer research, 25, 187–217. doi:10.1086/209535
böckenholt, u., & hynan, l. s. (1994). caveats on a process-tracing measure and a remedy. journal of behavioral decision making, 7(2), 103–117. doi:10.1002/bdm.3960070203
brehmer, b. (1994). the psychology of linear judgement models. acta psychologica, 87(2–3), 137–154. doi:10.1016/0001-6918(94)90048-5
browne, g. j., pitts, m. g., & wetherbe, j. c. (2007). cognitive stopping rules for terminating information search in online tasks.
mis quarterly, 89–104. doi:10.2307/25148782
cohen, j. (1988). statistical power analysis for the behavioral sciences (2nd ed.). hillsdale, nj: lawrence erlbaum associates. doi:10.4324/9780203771587
costa-gomes, m., crawford, v. p., & broseta, b. (2001). cognition and behavior in normal-form games: an experimental study. econometrica, 69(5), 1193–1235. doi:10.1111/1468-0262.00239
dawes, r. m. (1979). the robust beauty of improper linear models in decision making. american psychologist, 34(7), 571–582. doi:10.1037//0003-066x.34.7.571
einhorn, h. j., kleinmuntz, d. n., & kleinmuntz, b. (1979). linear regression and process-tracing models of judgment. psychological review, 86(5), 465–485. doi:10.1037//0033-295x.86.5.465
einhorn, h. j., & hogarth, r. m. (1981). behavioral decision theory: processes of judgment and choice. annual review of psychology, 32, 53–88. doi:10.1146/annurev.ps.32.020181.000413
ford, j. k., schmitt, n., schechtman, s. l., hults, b. m., & doherty, m. l. (1989). process tracing methods: contributions, problems, and neglected research questions. organizational behavior and human decision processes, 43(1), 75–117. doi:10.1016/0749-5978(89)90059-9
harte, j. m., & koele, p. (2001). modelling and describing human judgement processes: the multiattribute evaluation case. thinking & reasoning, 7(1), 29–49.
doi:10.1080/13546780042000028
hogarth, r. m. (1975). cognitive processes and the assessment of subjective probability distributions. journal of the american statistical association, 70(350), 271–289. doi:10.1080/01621459.1975.10479858
huber, o. (1980). the influence of some task variables on cognitive operations in an information-processing decision model. acta psychologica, 45(1–3), 187–196. doi:10.1016/0001-6918(80)90031-1
huber, o., wider, r., & huber, o. w. (1997). active information search and complete information presentation in naturalistic risky decision tasks. acta psychologica, 95(1), 15–29. doi:10.1016/s0001-6918(96)00028-5
huneke, m. e., cole, c., & levin, i. p. (2004). how varying levels of knowledge and motivation affect search and confidence during consideration and choice. marketing letters, 15(2–3), 67–79. doi:10.1023/b:mark.0000047385.01483.19
jacoby, j., jaccard, j., kuss, a., troutman, t., & mazursky, d. (1987). new directions in behavioral process research: implications for social psychology. journal of experimental social psychology, 23(2), 146–175. doi:10.1016/0022-1031(87)90029-1
jacoby, j., chestnut, r. w., weigl, k. c., & fisher, w. (1976). pre-purchase information acquisition: description of a process methodology, research paradigm, and pilot investigation. acr north american advances, 3, 306–314.
jarvenpaa, s. l. (1989). the effect of task demands and graphical format on information processing strategies. management science, 35(3), 285–303. doi:10.1287/mnsc.35.3.285
jasper, j. d., & shapiro, j. (2002). mousetrace: a better mousetrap for catching decision processes. behavior research methods, instruments, & computers, 34(3), 364–374. doi:10.3758/bf03195464
jasper, j. d., & levin, i. p. (2001). validating a new process tracing method for decision making. behavior research methods, instruments, & computers, 33(4), 496–512. doi:10.3758/bf03195408
klayman, j. (1982).
simulations of six decision strategies: comparisons of search patterns, processing characteristics, and response to task complexity. center for decision research, graduate school of business, university of chicago.
levin, i. p., & jasper, j. d. (1995). phased narrowing: a new process tracing method for decision making. organizational behavior and human decision processes, 64(1), 1–8. doi:10.1006/obhd.1995.1084
mittal, b. (1989). measuring purchase-decision involvement. psychology & marketing, 6(2), 147–162. doi:10.1002/mar.4220060206
mitchell, a. a. (1979). involvement: a potentially important mediator of consumer behavior. acr north american advances.
montgomery, a. l., hosanagar, k., krishnan, r., & clay, k. b. (2004). designing a better shopbot. management science, 50(2), 189–206. doi:10.1287/mnsc.1030.0151
payne, j. w. (1976). task complexity and contingent processing in decision making: an information search and protocol analysis. organizational behavior and human performance, 16(2), 366–387. doi:10.1016/0030-5073(76)90022-2
payne, j. w., bettman, j. r., & johnson, e. j. (1988). adaptive strategy selection in decision making. journal of experimental psychology: learning, memory, and cognition, 14(3), 534–552. doi:10.1037//0278-7393.14.3.534
payne, j. w., braunstein, m. l., & carroll, j. s. (1978). exploring predecisional behavior: an alternative approach to decision research. organizational behavior and human performance, 22(1), 17–44. doi:10.1016/0030-5073(78)90003-x
payne, j. w., bettman, j. r., & johnson, e. j. (1992). behavioral decision research: a constructive processing perspective. annual review of psychology, 43(1), 87–131. doi:10.1146/annurev.ps.43.020192.000511
pollay, r. w. (1970). the structure of executive decisions and decision times. administrative science quarterly, 15(4), 459–471. doi:10.2307/2391339
russo, j. e., & dosher, b. a. (1983). strategies for multiattribute binary choice.
journal of experimental psychology: learning, memory, and cognition, 9(4), 676. doi:10.1037//0278-7393.9.4.676
stevenson, m. k., busemeyer, j. r., & naylor, j. c. (1990). judgment and decision-making theory. in m. d. dunnette & l. m. hough (eds.), handbook of industrial and organizational psychology (pp. 283–374). palo alto, ca: consulting psychologists press.
svenson, o. (1979). process descriptions of decision making. organizational behavior and human performance, 23, 86–112. doi:10.1016/0030-5073(79)90048-5
tabatabai, m. (1998). investigation of decision making process: a hypermedia approach. interacting with computers, 9(4), 385–396. doi:10.1016/s0953-5438(97)00009-x
westenberg, m. r. m., & koele, p. (1994). multi-attribute evaluation processes: methodological and conceptual issues. acta psychologica, 87(2–3), 65–84. doi:10.1016/0001-6918(94)90044-2
willemsen, m. c., & johnson, e. j. (2004). mouselabweb: performing sophisticated process tracing experiments in the participants' home. in society for computers in psychology annual meeting, minneapolis.
willemsen, m. c., & johnson, e. j. (2011). visiting the decision factory: observing cognition with mouselabweb and other information acquisition methods. in m. schulte-mecklenbeck, a. kuehberger, & j. g. johnson (eds.), a handbook of process tracing methods for decision research (pp. 21–42). abingdon: routledge.
williamson, j., ranyard, r., & cuthbert, l. (2000). risk management in everyday insurance decisions: evidence from a process tracing study. risk, decision and policy, 5(1), 19–38.
doi:10.1017/s1357530900000090

original research

convergent validity of two decision making style measures

gentrit berisha, justina shiroka pula and besnik krasniqi
university of prishtina "hasan prishtina", prishtina, kosovo

decision making research has witnessed a growing number of studies on individual differences and decision making styles, yet the lack of comprehensive frameworks and widely accepted measures has long hindered research.
there is an ongoing debate on whether individuals' styles change dynamically across time and situations according to circumstances. furthermore, it is an open question whether these styles are mutually exclusive. decision style measures seek to determine one's dominant style as well as less used styles. to our knowledge, this is the first study of the convergent validity of two widely used decision making style measures: the decision style inventory (dsi) and the general decision making style (gdms). the direction and strength of correlations between the directive, analytical, conceptual and behavioral styles as measured by the dsi and the rational, intuitive, dependent, avoidant and spontaneous styles as measured by the gdms have been tested. results of the current study are compared with previous studies that have used one or both of the instruments. correlations between styles are consistent with findings from other studies using one of the decision style measures, but the strength of the correlations indicates that there is no convergent validity between the dsi and the gdms.

keywords: decision making style, decision style inventory, general decision making style, convergent validity

the aim of this study is to test whether two of the most-used measures in the decision making style literature show concurrent validity. a growing body of literature uses either the decision style inventory (dsi) or the general decision making style (gdms) to measure decision making style. the foundation of all rigorous research designs is the use of measurement tools that are psychometrically sound (devon et al., 2007). schwarzer and schwarzer (1996) point out that the weakness of many measures lies in their unsatisfactory psychometric properties, unstable factor structures, and lack of cross-validation. the apa committee on psychological tests (apa, 1953) divides validity studies into predictive validity, concurrent validity, content validity, and construct validity (also see cronbach & meehl, 1955).
bäckström and holmes (2001) note that construct validity can partly be shown by demonstrating that an instrument correlates in a theoretically meaningful way with other validated instruments measuring either the same, related, or different constructs. campbell and fiske (1959) postulate that validation is typically convergent. they go on to demand that before testing relationships between constructs, researchers should test the relationship between independent measures of the same construct. campbell and fiske (1959) propose employing different methods of data collection rather than using the same method to test for convergent validity. bryman (1989) adds to this debate by stressing that the multiple-method convergent validation mostly encountered in organizational research consists of the simultaneous use of different self-administered questionnaire measures of the same underlying construct. convergent validation can be assessed by correlating a measure of some construct with other measures of the same construct, under the assumption that the latter are themselves valid (bohrnstedt, 2010). nunnally and bernstein (1994) claim that validation is a never-ending process. they suggest that most psychological measures need to be constantly evaluated and re-evaluated to derive modifications or propose new approaches. dewberry, juanchich, and narendran (2013) claim that the relationships between the various scales measuring decision making styles have been poorly understood. leonard, scholl, and kowalski (1999) note that with the growing number of measures of cognitive style, personality type, decision making style, and learning style, it has become unclear whether researchers are measuring the same or different factors.
they examined the relationship between four commonly used measures of cognitive style, the myers-briggs type indicator (mbti), the group embedded figures test, the learning styles inventory, and the decision style inventory, only to conclude that they are not strongly interrelated and appear to measure different aspects of information processing and decision making. rowe and boulgarides (1992) suggest that two decision making dimensions of the myers-briggs are linked to each of the decision making styles as measured by the decision style inventory (dsi). this hypothesis was not supported by leonard et al. (1999), who found that only one of the mbti dimensions was linked to each of the decision making styles. spicer and sadler-smith (2005) have examined the psychometric properties of the general decision making style (gdms), confirming the instrument's soundness as well as its validity, mainly by considering relationships with other instruments. bavol'ár and orosová (2015) report that studies which have utilized the gdms have mostly reported the factor structure and internal consistency of the instrument, whereas concurrent and predictive validity have not been sufficiently studied. apart from reporting on construct validity, content validity, and face validity, several studies (rowe & mason, 1987; mech, 1993; martinson, 2001) report relationships between decision making style instruments as criteria for the instruments' soundness.

corresponding author: gentrit berisha, university of prishtina "hasan prishtina", agim ramadani, nn, 10 000, prishtina, republic of kosovo, email: gentrit.berisha@uni-pr.edu

10.11588/jddm.2018.1.43102 jddm | 2018 | volume 4 | article 1 | 1
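in practice, the convergent validation discussed above comes down to correlating scores that two instruments assign to the same respondents. a minimal sketch with hypothetical score vectors (not the study's data); a strong positive correlation between corresponding subscale totals would indicate convergence, a weak one would argue against it.

```python
import math

def pearson_r(x, y):
    """plain pearson correlation between two score vectors of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# hypothetical per-respondent subscale totals, e.g. dsi analytical vs.
# gdms rational (names and numbers are illustrative only)
dsi_analytical = [72, 85, 60, 90, 78]
gdms_rational = [18, 21, 15, 23, 19]
r = pearson_r(dsi_analytical, gdms_rational)
```

in a full analysis each pair of subscales gets such a coefficient, yielding the correlation matrix that convergent-validity arguments are built on.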
there have been several studies focusing on the gdms's relationships with the style of learning and thinking (solat; gambetti, fabbri, bensi, & tonetti, 2008), the zimbardo time perspective inventory (ztpi; carelli, wiberg, & wiberg, 2011), and the attitudes toward thinking and learning scale (attls; galotti et al., 2006), and on the dsi's relationships with the mbti (rowe & mason, 1987; leonard et al., 1999) and the administrative styles questionnaire (asq; alomari, 2013). to our knowledge, this is the first attempt to evaluate the relationship between the decision style inventory (rowe & mason, 1987) and the general decision making style (scott & bruce, 1995). this paper contributes to the validation of the gdms and the dsi by testing their convergent validity in a sample of business school students.

decision making style

decision style provides a means for understanding the way the human mind operates in making decisions (rowe & davis, 1996). according to nutt (1990), "style offers a way to understand why managers, faced with seemingly identical situations, use such different decision processes." rowe and mason (1987) suggest the term decision making style to refer to the way a person uses information in decision formulation. conceptualizing decision making styles along the dimensions of information gathering and processing is the starting point for multiple studies (mckenney & keen, 1974; robey & taggart, 1981; mitroff, 1983; driver, 1983; rowe & mason, 1987; kinicki & williams, 2013). over time, several waves of research have been conducted concerning stylistic aspects of decision making, yet as tatum, eberlin, kottraba, and bradberry (2003) point out, there is no universally accepted classification of decision making styles. leykin and derubeis (2010) emphasize that several questionnaires assessing decision styles have been developed, each containing a small set of decision styles.
many researchers use more than one inventory or questionnaire to measure decision making styles (kozhevnikov, 2007; bruine de bruin, parker, & fischhoff, 2007), but this is not practical due to considerable overlap in some subscales of these questionnaires (leykin & derubeis, 2010). rowe and mason (1987) hold that kurt lewin is the key contributor to managerial applications of decision styles. lewin (1936) is perhaps the first to introduce the dynamic relationship between person and environment. he claims that the style of the person and the environment govern behavior, and introduces the concept of life space to comprehend both. lewin's notion of a life space added a dynamic dimension to the workings of an individual's mind (rowe & mason, 1987). taggart, robey, and kroeck (1985) claim that in addition to the effects of the elements of the situation (the decision maker's task and environment), the decision maker's style may explain a substantial amount of the variation in managerial decision making. the decision making styles literature has yet to develop a comprehensive conceptual framework (mohammed & schwall, 2009; hamilton, shih, & mohammed, 2016). conceptions range from decision making styles being interchangeable with cognitive styles (anderson, 2000) to being a subset of cognitive style (kozhevnikov, 2007). thunholm (2004), in contrast, claims cognitive styles are a subset of decision making styles. there have been several attempts to establish comprehensive conceptual models of decision making styles (dewberry et al., 2013; leykin & derubeis, 2010; appelt et al., 2011) that have yet to be thoroughly encompassed by theory and research. decision making styles are usually assessed by means of self-report instruments, with which respondents introspectively describe the way in which they perform certain tasks, check personal habits or preferences, or endorse statements about what they think of themselves (raffaldi, iannello, vittani, & antonietti, 2012).
gati, landman, davidovitch, asulin-peretz, and gadassi (2010) claim that there are over 160 instruments that aim to distinguish among the various ways people make decisions. there is a lack of consensus among researchers on whether decision making styles are stable over time or easily and frequently alterable. whereas for kahneman (2003) and epstein (1994) decision making styles are not continuous traits but rather cognitive systems, for scott and bruce (1995) and thunholm (2004) people have a dominant style that can change across situations based on individual characteristics. baron (1985) confronts the traditional view that styles are stable dispositions by claiming that styles are situation specific. furnham (2002) outlines this debate by stating that "despite some variability, individuals tend to exhibit consistent patterns of behavior across situation and over time. however they can choose to change those styles and learn other forms of behavior. it is relatively easy to develop another style." while many researchers (e.g., penino, 2002; gambetti et al., 2008) claim that decision making styles differ by situation and as such are different from cognitive styles and psychological types, which remain unchanged across situations, other researchers (e.g., rowe & boulgarides, 1983; betsch & iannello, 2010) refer to them as personality traits. scott and bruce (1995) posit that decision making style is not a personality trait but a habit-based propensity to react in a certain way in a specific decision context. driver, brousseau, and hunsaker (1990) propose a model of dynamic decision style. the rationale behind it is that most people have more than one style, adopting styles to suit environmental and personal conditions.
streufert and nogami (1989) speculate about why some employees perform well even when transferred between jobs or tasks, whereas others (with the same level of intelligence, experience, and training) perform reasonably in one task environment but fail to perform well when transferred to a different environment. they suggest that a reason for these surprising differences in performance could be traced to cognitive styles. brousseau, driver, hourihan, and larsson (2006) argue that decision style is influenced by circumstances. in order to respond to a dynamic environment, managers need the ability to call on all styles (brousseau et al., 2006). they conclude that for leaders to succeed, their behaviors and styles must evolve over the course of their careers. decision making style determines how each individual responds to the external world. alignment of an individual's style with environmental requirements is a key element in managerial effectiveness and executive success (rowe & mason, 1987). decision making style is a term used more often in the career development and occupational behavior literature than in the decision making literature (scott & bruce, 1995; highhouse, dalal, & salas, 2013). according to rowe and boulgarides (1983), decision styles have had wide application in many fields, such as training in decision making, person-job fit, personnel selection and development, career planning and education, and creativity development. understanding decision style characteristics provides the basis for improving communication, planning, goal setting, leadership style and team building (driver et al., 1990). the following section focuses on decision making style measures that are widely used in the literature.
decision making style measures

together with the mbti, the dsi and the gdms are the most utilized instruments in the decision making style literature. whereas the mbti is more of a personality indicator, the dsi and the gdms (see figure 1) are exclusively decision making style measures, based on previous instruments, integrating earlier work on decision making styles (scott & bruce, 1995), encompassing their taxonomies (galotti et al., 2006), or incorporating many of the attributes of other decision style models (rowe & boulgarides, 1983). the decision style inventory classifies people into four styles depending on their cognitive complexity and their value orientation. rowe and mason (1987) summarize the characteristics of the styles: the directive style has a low tolerance for ambiguity and is oriented to task and technical concerns; the analytical style has a high tolerance for ambiguity and is oriented to task and technical concerns; the conceptual style has a high tolerance for ambiguity and is oriented to people and social concerns; the behavioral style has a low tolerance for ambiguity and is oriented to people and social concerns. rowe and mason (1987) claim that the dsi has shown excellent face validity, with well over 90 percent of the people who took the test agreeing with its findings. test-retest reliability lies at .7, making it a recommendable psychometric instrument. alongside the developers of the decision style inventory, who have reported on its psychometric properties (boulgarides, 1984; rowe & mason, 1987; rowe & boulgarides, 1992; boulgarides & cohen, 2001), several authors (shackleton, pitt, & marks, 1990; connor & becker, 2003; fox & wayne, 2005) conclude that the dsi is a valid instrument for measuring decision style.
scott and bruce (1995) created the gdms in response to "the lack of generally available, psychometrically sound instrument for measuring decision style." much of their conceptualization of decision style was shaped by the work of driver et al. (1990) in their flagship book the dynamic decision maker. from prior theorizing and empirical research, they identified four decision making styles: rational, intuitive, dependent and avoidant. after administering the 37-item questionnaire to a first sample for evaluation, scott and bruce (1995) reduced it to 25 items and tested it in three subsequent samples. a fifth style, spontaneous, was identified from the first sample. scott and bruce (1995) defined the decision making styles in behavioral terms: the rational decision making style is characterized by a thorough search for information and logical evaluation of alternatives; the intuitive style by a reliance on hunches and feelings; the dependent style by a search for advice and direction from others; the avoidant style by attempts to avoid decision making; and the spontaneous style by a feeling of immediacy and a desire to come through the decision making process as quickly as possible. apart from the authors, several researchers (russ, mcneilly, & comer, 1996; sager & gastil, 1999; loo, 2000; spicer & sadler-smith, 2005; galotti et al., 2006; sylvie & huang, 2008; allwood & salo, 2012; curşeu & schruijer, 2012) have tested the gdms's validity and endorse it as one of the most used and sound instruments in the decision making literature. the dsi and the gdms differ significantly in method.
whereas dsi items are scenario-based and determine the relative propensity to make use of four decision making styles (martinsons & davison, 2007), the gdms consists of statements describing how individuals go about making important decisions (thunholm, 2004). testing for relationships between styles, scott and bruce (1995) assert that the pattern of correlations suggests conceptual independence among the five scales of the gdms. correlations among the subscales of the gdms would imply that the decision making styles are not mutually exclusive and that individuals do not rely on a single decision making style (scott & bruce, 1995). when taking the dsi, individuals are forced to rank four possible responses, each representing a decision making style, even when they may appear equally desirable or undesirable (rowe & mason, 1987). these two instruments also differ in terminology, with none of the styles of one instrument having an analogue in the other (see figure 1). the dsi furthermore combines styles into different patterns.

figure 1. decision style inventory and general decision making style. source: rowe et al. (1984); scott & bruce (1995)

rowe et al. (1984) indicate that the directive and analytical styles create the rational style pattern (dsi_rational), and the behavioral and conceptual styles create the intuitive style pattern (dsi_intuitive). individuals with a rational style are logical, abstract and focused; individuals with an intuitive style are creative, relational and empathetic (rowe & boulgarides, 1983; rowe & mason, 1987). the rational and intuitive patterns of the dsi fit with the rational and intuitive decision making styles of the gdms.

method

participants

a sample of 152 management undergraduates, 88 female and 64 male, with an age range from 20 to 23, participated in the study. students were in the phase of finalizing their studies and choosing their future career paths.
they participated in career decision making workshops as part of a course, and the two instruments were part of a larger questionnaire. although reliance on student participants in social studies has been criticized, gordon, slade, & schmitt (1986) suggest that this practice is prudent when it comes to identifying causal relations among general behavioral constructs where social and cultural characteristics of the subjects do not influence the research.

10.11588/jddm.2018.1.43102 jddm | 2018 | volume 4 | article 1 | 3 https://doi.org/10.11588/jddm.2018.1.43102 berisha, pula & krasniqi: convergent validity

procedure

this research has been part of a larger study conducted upon completion of a decision making course. questionnaires were administered anonymously in a classroom setting to ensure students' sense of engagement with the task. afterwards, students were accommodated in a large auditorium and debriefed about the self-report questionnaire. since students were albanian speakers, back translation was used to verify the translation of the questionnaires (brislin, 1970). convergent validity was tested with standard tools using spss 23.

instruments

the decision style inventory (rowe & mason, 1987) and general decision making style (scott & bruce, 1995) are the decision making style instruments used in the study. the dsi consists of 20 items, each with four behavioral descriptions, each representing one of the four decision styles (rowe & mason, 1987). respondents were asked to assign one of the mutually exclusive numbers (1 – least like me; 2 – slightly like me; 4 – moderately like me; and 8 – most like me) to each description. once all responses are ranked, the scores in each column are totaled. each column represents one of the decision making styles.
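the dsi column-total scoring just described is simple to operationalize. the sketch below is ours, for illustration only; the function name and the two example items are assumptions, and a real administration has 20 items with four behavioral descriptions each:

```python
def score_dsi(rankings):
    """total each style column from forced ranks.

    `rankings` holds one dict per item, assigning each of the four styles
    one of the mutually exclusive values 1, 2, 4 or 8.
    """
    totals = {"directive": 0, "analytical": 0, "conceptual": 0, "behavioral": 0}
    for item in rankings:
        # the dsi forces every item to use each value exactly once
        assert sorted(item.values()) == [1, 2, 4, 8]
        for style, value in item.items():
            totals[style] += value
    return totals

# two hypothetical items (a full dsi has 20)
items = [
    {"directive": 8, "analytical": 4, "conceptual": 2, "behavioral": 1},
    {"directive": 4, "analytical": 8, "conceptual": 1, "behavioral": 2},
]
print(score_dsi(items))  # {'directive': 12, 'analytical': 12, 'conceptual': 3, 'behavioral': 3}
```

because the four values always sum to 15 per item, every respondent's four column totals sum to the same constant, which matters for the correlational results discussed later.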
using a scoring system the authors have developed, the intensity of each style is labeled as very dominant, dominant, back-up or least-preferred. the gdms instrument consists of 25 items, scored on a five-point likert-type scale ranging from "strongly agree" to "strongly disagree" (scott & bruce, 1995). each style is represented by 5 items, with a maximum score of 25 and a minimum score of 5.

results and discussion

the descriptive statistics and correlations for the scales used in the study are presented in table 1. cronbach's alphas for the gdms scales lie between .54 and .7, with only one style (dependent) having an acceptable internal reliability at .7, whereas the rational, intuitive, avoidant and spontaneous styles scored .54, .67, .6 and .68, respectively. when assessed together, the scales give an overall alpha of .71, making it a reliable instrument. the dsi was tested for face validity, with 93 percent of respondents agreeing with the style suggested by the instrument. this is in line with the levels reported by rowe and mason (1987). cronbach's alpha for the dsi was not calculated due to the specific nature of the instrument. most of the respondents (67/152) had a behavioral style, with the directive style being the second most frequent (44/152), whereas the conceptual (23/152) and analytical (18/152) styles were less frequent. other studies (mech, 1993; jamian, sidhu, & aperapar, 2013) report the behavioral style being the predominant style among different samples. in the decision style inventory each style is negatively correlated with the other styles, which is consistent with the instrument's mutually exclusive scoring system. the strongest negative correlations exist between the directive and conceptual styles (-.52) and between the analytical and behavioral styles (-.49), which was expected since they are the extreme opposites on the dimensions of tolerance for ambiguity and value orientation.
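internal-consistency figures of the kind reported above can be reproduced from raw item scores with the standard cronbach's alpha formula. this helper is a generic sketch, not tied to the study's data; the toy matrix is a made-up example:

```python
import numpy as np

def cronbach_alpha(item_matrix):
    """cronbach's alpha for an (n_respondents x n_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    x = np.asarray(item_matrix, dtype=float)
    k = x.shape[1]
    item_var_sum = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var_sum / total_var)

# perfectly consistent items yield the maximum alpha of 1.0
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # 1.0
```

applied to the five items of one gdms scale across all respondents, this would yield the per-scale alphas; applied to all 25 items, the overall alpha.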
the low correlation values between decision making style as measured with the dsi and gender and age support previous studies (rowe & boulgarides, 1983; rowe & mason, 1987; mech, 1993) suggesting there is no significant correlation between these two individual differences and decision styles. in the gdms instrument only the rational and avoidant styles are negatively correlated, which supports scott and bruce's (1995) finding with an undergraduate sample. moreover, the current study supports the most consistent correlation pattern, the positive correlation between the intuitive and spontaneous styles (.41 in this study; between .32 and .53 in all four of scott and bruce's samples). all styles correlate negatively with age, yet insignificantly, with the rational style having the highest negative correlation. since the age range of respondents spans just four years (20–23), no conclusions can be drawn from this. correlations of styles with gender are insignificantly low. pearson correlations between the styles measured by the dsi and gdms were used to determine whether they have convergent validity. 13 of the 20 correlations between styles are negative, suggesting that when comparing the dsi and gdms there is no convergent validity. the strongest negative correlation (-.18) is evidenced between the intuitive style and the analytical style. this is consistent with what the styles stand for, as people with an analytical style use considerable information and are very careful in the examination of alternatives (rowe & boulgarides, 1983), whereas people with an intuitive style use hunches and feelings in decision making (scott & bruce, 1995). as for the positive correlations, the strongest (.28) is between the dependent style and the behavioral style.
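a convergent validity check of this kind reduces to correlating each dsi scale score with each gdms scale score. a minimal sketch, with made-up array shapes and toy data of our own (four dsi columns would replace the two used here):

```python
import numpy as np

def cross_correlations(dsi, gdms):
    """pearson correlations between every dsi scale and every gdms scale.

    dsi: (n_respondents x n_dsi_scales) scores; gdms: (n_respondents x n_gdms_scales).
    returns a matrix with r[i, j] = corr(dsi[:, i], gdms[:, j]).
    """
    dsi, gdms = np.asarray(dsi, float), np.asarray(gdms, float)
    full = np.corrcoef(dsi, gdms, rowvar=False)  # all scales stacked as variables
    return full[: dsi.shape[1], dsi.shape[1]:]   # keep the cross-instrument block

# toy data: the first gdms column copies the first dsi column exactly,
# so their correlation must be 1.0
dsi = np.array([[1, 4], [2, 3], [3, 5], [4, 2]])
gdms = np.array([[1, 0], [2, 1], [3, 1], [4, 0]])
r = cross_correlations(dsi, gdms)
print(round(float(r[0, 0]), 2))  # 1.0
```

counting how many entries of the cross-instrument block are negative, and how strong the positive ones are, is exactly the evidence the paragraph above reports.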
this is supported by their similar characteristics, given that dependent style people search for advice and guidance from others before making important decisions (scott & bruce, 1995) and behavioral style people focus on people in the decision making process (leonard, scholl, & kowalski, 1999). moderately low reliabilities, and the inability to determine reliability and validity with the same tests due to the difference in method, impede correcting the correlations for attenuation. the correlations between styles (below the diagonal in table 1) suggest that there is no convergent validity between the decision style inventory and general decision making style. the rational style pattern has a positive correlation with the rational style, and the intuitive style pattern has a positive correlation with the intuitive style. their correlations with the opposite styles are negative at the same strength. the dsi intuitive style pattern has a stronger correlation with the dependent and avoidant styles than it does with the intuitive style. these two styles correlate positively with the rational style in all of scott and bruce's (1995) samples, but this does not explain the low correlation between the intuitive style and the intuitive style pattern. the dsi rational style pattern has a stronger correlation with the spontaneous style than with the rational style. one possible explanation could be that the rational style pattern contains scores from the directive style, which is characterized by making fast decisions; this complies with the spontaneous style's "snap" and "spur of the moment" decisions (scott & bruce, 1995). the rationale behind this lack of strength in the correlations, albeit in the right direction, may be found in differences in the instruments' methods, or what bryman (1989) calls "common method variance." scott and bruce (1995) concluded that the decision making styles are independent but not mutually exclusive and that people seem to use a combination of decision making styles in making important decisions. the correlation analysis from the sample and the forced-choice, mutually exclusive scoring of the dsi imply that this instrument determines style dominance, whereas gdms styles are independent and a respondent can score simultaneously high or low on every style.

table 1. means, standard deviations and correlations between gdms and dsi and dsi style patterns

                         mean     sd     1      2      3      4      5      6       7       8      9     10     11     12    13
1. gender                 .42    .50     1
2. age                  20.92    .95   .21*     1
3. directive (dsi)      77.78  14.62  -.02    .16      1
4. analytical (dsi)     82.32  11.53  -.04   -.11   -.17      1
5. conceptual (dsi)     73.91  12.82   .12   -.14   -.53** -.21*     1
6. behavioral (dsi)     65.99  14.49  -.06    .05   -.44** -.49** -.19*      1
7. dsi_rational        160.10  17.44  -.04    .06    .76**  .56** -.58**  -.69**     1
8. dsi_intuitive       139.90  17.44   .04   -.06   -.76** -.56**  .58**   .69** -1.00**     1
9. rational (gdms)      21.47   2.59  -.07   -.21** -.03    .14   -.11     .02     .07    -.07     1
10. intuitive (gdms)    17.63   3.62   .06   -.10    .09   -.18*  -.07     .11    -.04     .04   -.02     1
11. dependent (gdms)    19.67   3.62  -.07    .01   -.05   -.11   -.16*    .28**  -.11     .11    .22**  .09     1
12. avoidant (gdms)     11.38   3.81   .14    .08   -.01   -.13   -.07     .18*   -.09     .09   -.12    .15   .27**    1
13. spontaneous (gdms)  13.43   3.96   .06   -.03    .22** -.04   -.14    -.06     .15    -.15   -.10    .38** -.01   .38**   1

note: dsi_rational is the pattern style determined when the directive and analytical style scores are summed; dsi_intuitive is the pattern style determined when the conceptual and behavioral style scores are summed. gender was coded as 0 (female) and 1 (male). * p < .05. ** p < .01.

conclusion

there is a growing interest in studying decision making style. notwithstanding, the lack of a comprehensive framework and measures is perceived as the main reason why this field of study is not well established in decision making research.
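the perfect -1.00 correlation between the two dsi style patterns, and the prevalence of negative correlations among the dsi styles, follow mechanically from ipsative (forced-choice) scoring: every respondent's four style totals sum to the same constant, 20 items x (8+4+2+1) = 300. a small simulation under the assumption of purely random rankings (simulated data, not real respondents) makes this visible:

```python
import numpy as np

rng = np.random.default_rng(0)
N_RESPONDENTS, N_ITEMS = 500, 20
VALUES = np.array([8, 4, 2, 1])  # the four forced dsi ranks per item

# each simulated respondent assigns the four rank values to the four
# styles completely at random on every item
scores = np.zeros((N_RESPONDENTS, 4))
for i in range(N_RESPONDENTS):
    for _ in range(N_ITEMS):
        scores[i] += VALUES[rng.permutation(4)]

# every row sums to 300, so the style scales must be negatively
# correlated on average even with zero substantive signal
r = np.corrcoef(scores, rowvar=False)
off_diag = r[~np.eye(4, dtype=bool)]
print(off_diag.mean() < 0)  # True
```

with equal scale variances the average off-diagonal correlation approaches -1/(k-1) = -1/3 for k = 4 scales, purely as an artifact of the constant row sum, which is why negative dsi inter-style correlations alone say little about convergent validity.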
there is a quandary as to whether decision making style instruments measure the same construct. this research paper has shed light on the validity of two of the most used decision making style measures by testing their convergent validity. although the direction of the correlations is consistent with prior research and theory, their lack of strength suggests that there is no convergent validity between the dsi and gdms. the scoring systems of the two instruments are different, but this study nevertheless shows that they do not measure the same constructs. the practical implication of this research is that the two decision making style measures should be used with caution by both practitioners and researchers, in particular when comparing decision making style constructs with other constructs.

limitations and further research

self-report questionnaires like decision making style measures rely on respondents making accurate judgments about themselves and therefore have the potential for bias and unreliability. future research on the validity of these and other decision making style instruments should therefore employ different samples. moreover, future research should have managers and other decision makers in organizational settings as respondents to these questionnaires, as their wording is not intended specifically for students. the study falls short of meeting nunnally's (1978) rule of ten respondents per item. as sample size could influence the strength of correlations, future research on convergent or other construct validity should seek to increase sample size and test across instruments and different samples. the forced-choice scoring system of the decision style inventory produces a perfect negative correlation between the style patterns. the conceptualization of decision making styles has been influenced by the contributions of driver et al.
(1990), who proposed a dynamic decision style model. both dynamic decision making and decision making style consider the person-environment relationship. decision making style measures have seen extensive application in the decision making literature and should also be integrated into dynamic decision making studies.

declaration of conflicting interests: the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

author contributions: the authors contributed equally to this work.

handling editor: andreas fischer

copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license.

citation: berisha, g., pula, j. s., & krasniqi, b. (2018). convergent validity of two decision making style measures. journal of dynamic decision making, 4, 1. doi:10.11588/jddm.2018.1.43102

received: 30 nov 2017. accepted: 16 jan 2018. published: 24 apr 2018

references

allwood, c. m., & salo, i. (2012). decision-making styles and stress. international journal of stress management, 19(1), 34–47. doi:10.1037/a0027420

al-omari, a. a. (2013). the relationship between decision making styles and leadership styles among public school principals. international education studies, 6(7), 100–110. doi:10.5539/ies.v6n7p100

anderson, j. a. (2000). intuition in managers: are intuitive managers more effective? journal of managerial psychology, 15(1), 46–63. doi:10.1108/02683940010305298

apa (1953). ethical standards of psychologists. washington: american psychological association.

appelt, k. c., milch, k. f., handgraaf, m. j., & weber, e. u. (2011).
the decision making individual differences inventory and guidelines for the study of individual differences in judgment and decision-making research. judgment and decision making, 6(3), 252–262.

bäckström, m., & holmes, b. m. (2001). measuring adult attachment: a construct validation of two self-report instruments. scandinavian journal of psychology, 42(1), 79–86. doi:10.1111/1467-9450.00216

baron, j. (1985). rationality and intelligence. cambridge: cambridge university press.

bavol'ár, j., & orosová, o. (2015). decision-making styles and their associations with decision-making competencies and mental health. judgment and decision making, 10(1), 115–122.

betsch, c., & iannello, p. (2010). measuring individual differences in intuitive and deliberative decision making styles: a comparison of different measures. in a. glöckner, & c. witteman (eds.), foundations for tracing intuition: challenges and methods (pp. 251–271). london: psychology press.

bohrnstedt, g. w. (2010). measurement models for survey research. in p. v. marsden, & j. d. wright (eds.), handbook of survey research (pp. 347–404). bingley: emerald group publishing.

boulgarides, j. d. (1984). the decision style inventory: o.d. applications. management research news, 7(4), 17–20. doi:10.1108/eb027850

boulgarides, j. d., & cohen, w. a. (2001). leadership style vs. leadership tactics. journal of applied management and entrepreneurship, 6(1), 59–73.

brislin, r. w. (1970). back-translation for cross-cultural research. journal of cross-cultural psychology, 1(3), 185–216. doi:10.1177/135910457000100301

brousseau, k. r., driver, m. j., hourihan, g., & larsson, r. (2006). the seasoned executive's decision-making style. harvard business review, 84(2), 1–10. doi:10.1225/r0602f

bruine de bruin, w., parker, a. m., & fischhoff, b. (2007). individual differences in adult decision-making competence. journal of personality and social psychology, 92(5), 938–956. doi:10.1037/0022-3514.92.5.938

bryman, a. (1989).
research methods and organization studies. new york: routledge.

campbell, d. t., & fiske, d. w. (1959). convergent and discriminant validation by the multitrait-multimethod matrix. psychological bulletin, 56(2), 81–105. doi:10.1037/h0046016

carelli, m. g., wiberg, b., & wiberg, m. (2011). development and construct validation of the swedish zimbardo time perspective inventory. european journal of psychological assessment, 27(4), 220–227. doi:10.1027/1015-5759/a000076

connor, p. e., & becker, b. w. (2003). personal value systems and decision-making styles of public managers. public personnel management, 32(1), 155–180. doi:10.1177/009102600303200109

cronbach, l. j., & meehl, p. e. (1955). construct validity in psychological tests. psychological bulletin, 52(4), 281–302. doi:10.1037/h0040957

curşeu, p. l., & schruijer, s. g. (2012). decision styles and rationality: an analysis of the predictive validity of the general decision-making style inventory. educational and psychological measurement, 72(6), 1053–1062. doi:10.1177/0013164412448066

daud, n. g., adnan, w. a., & noor, n. l. (2008). information visualization techniques and decision style: the effects in decision support environments. international journal of digital content technology and its applications, 2(2), 20–24.

davis, d. l., grove, s. j., & knowles, p. a. (1990). an experimental application of personality type as an analogue for decision-making style. psychological reports, 66(1), 167–175. doi:10.2466/pr0.1990.66.1.167

devon, h. a., block, m. e., moyle-wright, p., ernst, d. m., hayden, s. j., lazzara, d. j., . . . & kostas-polston, e. (2007). a psychometric toolbox for testing validity and reliability. journal of nursing scholarship, 39(2), 155–164. doi:10.1111/j.1547-5069.2007.00161.x

dewberry, c., juanchich, m., & narendran, s. (2013). the latent structure of decision styles. personality and individual differences, 54(5), 566–571. doi:10.1016/j.paid.2012.11.002

driver, m. (1983).
decision style and organizational behavior: implications for academia. the review of higher education, 6(4), 387–406. doi:10.1353/rhe.1983.0014

driver, m. j., brousseau, k. r., & hunsaker, p. l. (1990). the dynamic decision maker: five decision styles for executive and business success. new york: harper & row.

epstein, s. (1994). integration of the cognitive and the psychodynamic unconscious. american psychologist, 49(8), 709–724. doi:10.1037/0003-066x.49.8.709

fox, t. l., & wayne, s. j. (2005). the effect of decision style on the use of a project management tool: an empirical laboratory study. the data base for advances in information systems, 36(2), 28–42. doi:10.1145/1066149.1066153

furnham, a. (2002). personality, style preference and individual development. in m. pearn (ed.), individual differences and development in organisations (pp. 89–103). chichester: john wiley & sons.

galotti, k. m., ciner, e., altenbaumer, h. e., geerts, h. j., rupp, a., & woulfe, j. (2006).
decision-making styles in a real-life decision: choosing a college major. personality and individual differences, 41(4), 629–639. doi:10.1016/j.paid.2006.03.003

galotti, k. m., tandler, j. m., & wiener, h. j. (2014). real-life decision making in college students ii: do individual differences show reliable effects? american journal of psychology, 127(1), 33–42. doi:10.5406/amerjpsyc.127.1.0033

gambetti, e., fabbri, m., bensi, l., & tonetti, l. (2008). a contribution to the italian validation of the general decision-making style inventory. personality and individual differences, 44(4), 842–852. doi:10.1016/j.paid.2007.10.017

gati, i., & saka, n. (2001). internet-based versus paper-and-pencil assessment: measuring career decision-making difficulties. journal of career assessment, 9(4), 397–416. doi:10.1177/106907270100900406

gati, i., landman, s., davidovitch, s., asulin-peretz, l., & gadassi, r. (2010). from career decision-making styles to career decision-making profiles: a multidimensional approach. journal of vocational behavior, 76(2), 277–291. doi:10.1016/j.jvb.2009.11.001

germeijs, v., luyckx, k., notelaers, g., goossens, l., & verschueren, k. (2012). choosing a major in higher education: profiles of students' decision-making process. contemporary educational psychology, 37(3), 22–239. doi:10.1016/j.cedpsych.2011.12.002

gordon, m. e., slade, l. a., & schmitt, n. (1986). the "science of the sophomore" revisited: from conjecture to empiricism. the academy of management review, 11(1), 19–207. doi:10.5465/amr.1986.4282666

hagan, a. j., & rogers, j. c. (1988). the role of decision style in selecting marketing managers: a comparison of peruvian and u.s. styles. in k. d. bahn (ed.), proceedings of the 1988 academy of marketing science (ams) annual conference (pp. 527–529). new york: springer. doi:10.1007/978-3-319-17046-6_125

hamilton, k., shih, s.-i., & mohammed, s. (2016). the development and validation of the rational and intuitive decision styles scale.
journal of personality assessment, 98(5), 523–535. doi:10.1080/00223891.2015.1132426

hansson, p. h., & andersen, j. a. (2007). the swedish principal: leadership style, decision-making style, and motivation profile. international electronic journal for leadership in learning, 11(8), 1–13.

hariri, h. (2011). leadership styles, decision-making styles, and teacher job satisfaction: an indonesian school context. townsville: james cook university.

hariri, h., monypenny, r., & prideaux, m. (2016). teacher-perceived principal leadership styles, decision-making styles and job satisfaction: how congruent are data from indonesia with the anglophile and western literature? school leadership & management, 36(1), 41–62. doi:10.1080/13632434.2016.1160210

harren, v. a. (1979). a model of career decision making for college students. journal of vocational behavior, 14(2), 119–133. doi:10.1016/0001-8791(79)90065-4

highhouse, s., dalal, r., & salas, e. (2013). judgment and decision making at work. new york: routledge.

jamian, l. s., sidhu, g. k., & aperapar, p. s. (2013). managerial decision styles of deans in institutions of higher learning. procedia social and behavioral sciences, 90, 278–287. doi:10.1016/j.sbspro.2013.07.092

kahneman, d. (2003). maps of bounded rationality. american economic review, 93(5), 1449–1475. doi:10.1257/000282803322655392

kimberlin, c. l., & winterstein, a. g. (2008). validity and reliability of measurement instruments used in research. american journal of health systems pharmacy, 65(23), 2276–2284. doi:10.2146/ajhp070364

kinicki, a., & williams, b. (2013). management: a practical introduction (6th ed.). new york: mcgraw-hill.

kozhevnikov, m. (2007). cognitive styles in the context of modern psychology: toward an integrated framework of cognitive style. psychological bulletin, 133(3), 464–481. doi:10.1037/0033-2909.133.3.464

leonard, n. h., scholl, r. w., & kowalski, k. b. (1999). information processing style and decision making.
journal of organizational behavior, 20(3), 407–420. doi:10.1002/(sici)1099-1379(199905)20:3<407::aid-job891>3.0.co;2-3

lewin, k. (1936). principles of topological psychology. new york: mcgraw-hill.

leykin, y., & derubeis, r. j. (2010). decision-making styles and depressive symptomatology: development of the decision styles questionnaire. judgment and decision making, 5(7), 506–515.

loo, r. (2000). a psychometric evaluation of the general decision-making style inventory. personality and individual differences, 29(5), 895–905. doi:10.1016/s0191-8869(99)00241-x

martinsons, m. g., & davison, r. m. (2007). strategic decision making and support systems: comparing american, japanese and chinese management. decision support systems, 43(1), 284–300. doi:10.1016/j.dss.2006.10.005

mckenney, j. l., & keen, p. g. (1974). how managers' minds work. harvard business review, 52(3), 79–90.

mcshane, s. l., & glinow, m. a. (2010). organizational behaviour: emerging knowledge and practice for the real world (5th ed.). new york: mcgraw-hill/irwin.

mech, t. f. (1993). the managerial decision styles of academic library directors. college & research libraries, 54(5), 375–386. doi:10.5860/crl_54_05_375

mitroff, i. (1983). stakeholders of the organizational mind. san francisco, ca: jossey-bass publishers.

mohammed, s., & schwall, a. (2009). individual differences and decision making: what we know and where we go from here. in g. p. hodgkinson, & j. k. ford (eds.), international review of industrial and organizational psychology (vol. 24, pp. 249–312). chichester: wiley-blackwell.

nunnally, j. (1978). psychometric methods. new york: mcgraw-hill.

nunnally, j. c., & bernstein, i. h. (1994). psychometric theory (3rd ed.). new york: mcgraw-hill.

nutt, p. c. (1990). strategic decisions made by top executives and middle managers with data and process dominant styles. journal of management studies, 27(2), 173–194. doi:10.1111/j.1467-6486.1990.tb00759.x

osipow, s. h. (1999). assessing career indecision.
journal of vocational behavior, 55(1), 147–154. doi:10.1006/jvbe.1999.1704

park, d. (1996). gender role, decision style and leadership style. women in management review, 11(8), 13–17. doi:10.1108/09649429610148737

penino, c. m. (2002). is decision style related to moral development among managers in the us? journal of business ethics, 41(4), 337–347. doi:10.1023/a:1021282816140

raffaldi, s., iannello, p., vittani, l., & antonietti, a. (2012).
decision-making styles in the workplace: relationships between self-report questionnaires and a contextualized measure of the analytical-systematic versus global-intuitive approach. sage open, 2(2), 1–11. doi:10.1177/2158244012448082

robey, d., & taggart, w. (1981). measuring managers' minds: the assessment of style in human information processing. academy of management review, 6(3), 375–393. doi:10.5465/amr.1981.4285773

rowe, a. j., & boulgarides, j. d. (1983). decision styles: a perspective. leadership & organization development journal, 4(4), 3–9. doi:10.1108/eb053534

rowe, a. j., & boulgarides, j. d. (1992). managerial decision making. new york: macmillan.

rowe, a. j., & davis, s. a. (1996). intelligent information systems: meeting the challenge of the knowledge era. westport: quorum books.

rowe, a. j., & mason, r. o. (1987). managing with style: a guide to understanding, assessing, and improving decision making. san francisco, ca: jossey-bass.

rowe, a. j., boulgarides, j. d., & mcgrath, m. r. (1984). managerial decision making. chicago, il: science research associates.

rowe, a. j., mason, r. o., dickel, k. e., mann, r. b., & mockler, r. j. (1994). strategic management: a methodological approach (4th ed.). new york: addison-wesley.

rubinton, n. (1980). instruction in career decision making and decision-making styles. journal of counseling psychology, 27(6), 581–588. doi:10.1037/0022-0167.27.6.581

russ, f. a., mcneilly, k. m., & comer, j. m. (1996). leadership, decision making and performance of sales managers: a multilevel approach. journal of personal selling & sales management, 16(3), 1–15. doi:10.1080/08853134.1996.10754060

sager, k. l., & gastil, j. (1999). reaching consensus on consensus: a study of the relationships between individual decision-making styles and use of the consensus decision rule. communication quarterly, 47(1), 67–79. doi:10.1080/01463379909370124

schwarzer, r., & schwarzer, c. (1996). a critical survey of coping. in m. e.
zeidner (ed.), handbook of coping: theory, research, applications (pp. 107–132). new york: wiley.

scott, s. g., & bruce, r. a. (1995). decision-making style: the development and assessment of a new measure. educational and psychological measurement, 55(5), 818–831. doi:10.1177/0013164495055005017

shackleton, d., pitt, l., & marks, a. s. (1990). managerial decision styles and machiavellianism: a comparative study. journal of managerial psychology, 5(1), 9–16. doi:10.1108/02683949010139492

spicer, d. p., & sadler-smith, e. (2005). an examination of the general decision making style questionnaire in two uk samples. journal of managerial psychology, 20(2), 137–149. doi:10.1108/02683940510579777

streufert, s., & nogami, g. y. (1989). cognitive style and complexity: implications for i/o psychology. in c. l. cooper, & i. t. robertson (eds.), international review of industrial and organizational psychology (pp. 93–143). oxford: john wiley & sons.

sylvie, g., & huang, j. s. (2008). value systems and decision-making styles of newspaper front-line editors. journalism & mass communication quarterly, 85(1), 61–82. doi:10.1177/107769900808500105

taggart, w., robey, d., & kroeck, g. (1985). managerial decision styles and cerebral dominance: an empirical study. journal of management studies, 22(2), 175–192. doi:10.1111/j.1467-6486.1985.tb00071.x

tatum, b. c., eberlin, r., kottraba, c., & bradberry, t. (2003). leadership, decision making, and organizational justice. management decision, 41(10), 1006–1016. doi:10.1108/00251740310509535

thunholm, p. (2004). decision-making style: habit, style or both? personality and individual differences, 36(4), 931–944.
Original Research

Accounting for Outcome and Process Measures in Dynamic Decision-Making Tasks Through Model Calibration

Varun Dutt(1) and Cleotilde Gonzalez(2)

(1) School of Computing and Electrical Engineering and School of Humanities and Social Sciences, Indian Institute of Technology Mandi, India; (2) Dynamic Decision Making Laboratory, Department of Social and Decision Sciences, Carnegie Mellon University, Pittsburgh, PA, USA

Computational models of learning, and the theories they represent, are often validated by calibrating them to human data on decision outcomes. However, only a few models explain the process by which these decision outcomes are reached. We argue that models of learning should reflect the process through which decision outcomes are reached, and that validating a model on the process is likely to help explain both the process and the decision outcome simultaneously. To demonstrate the proposed validation, we use a large dataset from the Technion Prediction Tournament and an existing instance-based learning model.
We present two ways of calibrating the model's parameters to human data: on an outcome measure and on a process measure. In agreement with our expectations, we find that calibrating the model on the process measure helps to explain both the process and outcome measures, compared to calibrating the model on the outcome measure. These results hold when the model is generalized to a different dataset. We discuss implications for explaining the process and the decision outcomes in computational models of learning.

Keywords: outcome and process measures, computational models of learning, instance-based learning, dynamic decisions, binary choice, calibration

Unlike models in disciplines such as economics, models of decision making in psychology often incorporate theories of the underlying cognitive processes that lead to specific outcomes in a decision task. For example, instance-based learning theory (IBLT; Gonzalez & Dutt, 2011), a theory of how people make dynamic decisions, commonly includes assumptions about how people search for information (i.e., the process) and how this information search helps people arrive at a decision (i.e., the outcome). However, many of the process theories and corresponding models are tested only on the outcome level, rather than on the process level itself (Johnson et al., 2008). Accounting for both the decision outcomes and the process through which these outcomes are reached is important in mathematical models (Scheres & Sanfey, 2006): accounting for both will enable such models to provide a better account of the observed phenomena. Furthermore, it is also important to account for process and decision outcomes in computational models of learning that try to explain human decisions (Busemeyer & Diederich, 2009; Erev & Barron, 2005; Rapoport & Budescu, 1992).
For example, researchers investigating choice behavior are often interested in explaining both overall maximization behavior (an outcome measure) and exploratory behavior (e.g., alternation between alternatives, a process measure) through cognitive models that explain how people learn to maximize long-term rewards (Biele, Erev, & Ert, 2009; Erev, Ert, Roth, Haruvy, et al., 2010; Gonzalez & Dutt, 2011). Given the importance of accounting for both the decision outcome and the process, the literature has revealed a strong relationship between the two, where the resulting outcome is consistent with the adopted process (Erev & Barron, 2005; Green, Price, & Hamburger, 1995; Hills & Hertwig, 2010). According to Erev and Barron (2005), one expects a strong relationship between process and decision outcomes in cases where the decision environment is dynamic (i.e., repeated) and where the decision outcome is contingent upon the process. For example, consider a repeated binary-choice task, where choices are made repeatedly between two alternatives. One alternative is risky, with a high outcome and a low outcome; these two outcomes occur with certain pre-defined probabilities when the risky alternative is chosen. The other alternative is safe, with a medium outcome that occurs with a sure (100%) chance when this alternative is chosen. Now, if the expected value of the risky alternative is greater than that of the safe alternative (i.e., the risky alternative is maximizing), then participants who alternate a lot while selecting alternatives would end up maximizing their choices only half of the time. In fact, Hills and Hertwig (2010) show that people seem to rely on two distinct alternation processes while making binary choices, and these processes achieve different amounts of maximization behavior. These arguments are relevant not only to human decisions but also to decision making in animals. For example, Green et al.
(1995) have shown that pigeons can learn to maximize their outcomes only by alternating between available alternatives in a probabilistic environment involving repeated choices between safe and risky alternatives. Calibrating models to both process and outcome measures from one-time sequential-sampling tasks is already common in the literature (Ratcliff, 1978; Ratcliff & Smith, 2004). For example, Ratcliff (1978) calibrated models to both outcome and process measures in an old-new recognition memory task. In this task, the outcome measure was the proportion of correct responses, and the process measure was the accumulation of evidence to a threshold for making a response. In fact, calibrating models to both outcome and process measures in one-time choice tasks is so common that a suite of software called the Diffusion Model Analysis Toolbox (DMAT; Vandekerckhove & Tuerlinckx, 2007) has been developed for this purpose. In contrast, to the authors' best knowledge, except for one study (mentioned below), no one has explicitly calibrated models to outcome and process measures simultaneously in dynamic decision-making tasks (Johnson, Schulte-Mecklenbeck, & Willemsen, 2008). Johnson et al. (2008) demonstrated via computational modeling that the priority heuristic, which provides a novel account of how people make risky choices, captures the decision outcomes; yet, this heuristic fails to account for the process measures.

Corresponding author: Varun Dutt, School of Computing and Electrical Engineering and School of Humanities and Social Sciences, Indian Institute of Technology, Mandi, PWD Rest House, Near Bus Stand, Mandi – 175 001, Himachal Pradesh, India. E-mail: varun@iitmandi.ac.in

JDDM | 2015 | Volume 1 | Article 2 | doi:10.11588/jddm.2015.1.17663
The general finding is that although certain behavioral results reveal a strong connection between the decision outcome and the process, existing models of learning in dynamic decision tasks rarely show any relationship between them (Dember & Fowler, 1958; Erev & Barron, 2005; Erev, Ert, Roth, Haruvy, et al., 2010; Rapoport & Budescu, 1992; Rapoport, Erev, Abraham, & Olson, 1997; Tolman, 1925). For example, although the outcome results (i.e., maximization) in a symmetrical zero-sum matching-pennies game were consistent with predictions from a reinforcement-learning algorithm, the process results (i.e., alternations between alternatives) could not be accounted for by the algorithm (Erev & Barron, 2005; Rapoport & Budescu, 1992). Similarly, according to Johnson et al. (2008), the priority heuristic, a strategy to account for risky choices, fails to account for the process measures in dynamic decision tasks. In one study, Gonzalez and Dutt (2011) calibrated cognitive models in the sampling paradigm (a dynamic task), where participants are asked to sample options free of cost before making a consequential choice for real. Gonzalez and Dutt (2011) demonstrate that a computational model based upon IBLT (Gonzalez, Lerch, & Lebiere, 2003) ("IBL model" hereafter), when calibrated on the outcome measure, was also able to explain the process measure better than the best models known in two different experimental paradigms. Gonzalez and Dutt (2011), however, did not calibrate their model on the process measure as well. Thus, it remains unclear what effect calibrating a model to the process measure, compared to the outcome measure, has on the model's predictions of both measures. In general, one expects the decision outcome to be the result of the process (Johnson et al., 2008). Thus, calibrating models on process measures rather than outcome measures should have benefits in explaining both measures at the same time.
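To make the expected-value argument from the repeated binary-choice example concrete, here is a small numerical sketch. The payoffs and probabilities are purely illustrative assumptions, not values drawn from the TPT problems:

```python
# Illustrative (hypothetical) problem: a risky alternative paying 32 with
# probability 0.1 and 0 otherwise, versus a safe alternative that always pays 3.
p_high, high, low = 0.1, 32, 0
safe = 3

ev_risky = p_high * high + (1 - p_high) * low  # expected value of the risky option
ev_safe = safe                                 # the safe option pays 3 for sure

# Here the risky alternative maximizes (3.2 > 3), so a participant who
# alternates on every trial picks the maximizing option on only half the trials.
maximizing = "risky" if ev_risky > ev_safe else "safe"
```

Even though the risky option has the higher expected value in this sketch, a strict alternator selects it on only 50% of trials, which is why the alternation process constrains the attainable maximization rate.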
Although it is hard to find models calibrated to outcome and process measures in dynamic tasks, past studies have made certain qualitative predictions from dynamic decision models (Busemeyer, 1985; Hertwig, Barron, Weber, & Erev, 2004; Lee, Zhang, Munro, & Steyvers, 2011) on outcome and process measures. However, a quantitative empirical investigation of these models on both measures is currently lacking and much needed in the literature. This paper contributes to this area by investigating the benefit of calibrating cognitive models to outcome and process data in a dynamic decision task. We evaluate the role of calibrating a computational model to either the decision outcome or the process in explaining and predicting both measures. Specifically, we calibrate an IBL model (Gonzalez & Dutt, 2011) to a risk-taking measure (decision outcome) or to an alternation measure (process), and we evaluate the model's fits to human data (through parameter calibration in one dataset) and its predictions (through generalization to a dataset different from the calibration one). Given the hypothesized benefits of calibrating models on process measures (Camerer & Ho, 1999; Suppes & Atkinson, 1959), we expect that calibrating the IBL model to the alternation measure will improve its explanation of both risk-taking and alternations, compared to calibrating it on the risk-taking measure. We use two large human datasets, estimation and competition, that were collected for the 2008 Technion Prediction Tournament (TPT; Erev, Ert, Roth, Haruvy, et al., 2010). We chose the TPT datasets because the main focus of the tournament was on outcome measures, and no attention was given to process measures (Erev, Ert, Roth, Haruvy, et al., 2010).
That is because it was felt that paying less attention to the process measures could actually help the prediction of the outcome measures (Erev & Haruvy, 2005; Estes, 1962), which is contrary to the hypothesis under test in this paper. Thus, this dataset is an ideal choice for testing a process-measure-calibrated model's ability to perform on the outcome measure. In what follows, we first discuss the role of the calibration process in computational models. Next, we present the effects of calibrating an existing IBL model on the outcome measure or the process measure on the explanations and predictions of one or both measures in the TPT's datasets. We close the paper by discussing the role of model calibration in accounting for both the process and decision outcomes.

The role of model calibration in explaining different measures of performance

Calibrating a model to human data means finding the values of its parameters that minimize the deviation between the model's predictions and observations on a dependent measure. In the TPT, several influential models(1) of learning in binary choice were calibrated and evaluated only on the outcome measure (risk-taking) and not on the process measure (alternations). These models were able to account for risk-taking very well; however, many of them did not provide any way of computing alternations (Gonzalez & Dutt, 2011). In fact, most of the competing models did not provide any way to explain the learning process (see an extended discussion of these models in Gonzalez and Dutt (2011)). For example, a number of models submitted to the TPT used prospect theory (Tversky & Kahneman, 1992) to predict choices based upon calibrated mathematical functions. Prospect theory does not provide any mechanism that would predict the sequential selection of options over time.
In fact, only a few recent models of repeated binary choice may account for both the risk-taking and alternation measures simultaneously: one of these models is the inertia sampling and weighting (I-SAW) model (Chen et al., 2011; Nevo & Erev, 2012; Erev, Ert, Roth, Haruvy, et al., 2010), and the other is an IBL model (Gonzalez & Dutt, 2011; Gonzalez, Dutt, & Lejarraga, 2011; Lejarraga, Dutt, & Gonzalez, 2012). However, these models were calibrated on both the outcome and process measures at the same time, which makes it difficult to evaluate the utility of calibrating models to one of these measures. We expect that calibrating a model to the process measure should generally be beneficial for the model's ability to explain both the process and outcome measures upon generalization to novel conditions. Next, we provide details about the TPT datasets that we use to evaluate the IBL model.

(1) Some of these models included the two-stage sampler model, the normalized reinforcement learning with inertia model, and the explorative sampler with recency model (Erev, Ert, Roth, Haruvy, et al., 2010).

Method

Risk-taking and alternations in the Technion Prediction Tournament

Competing models submitted to the TPT were evaluated according to the generalization criterion method (Busemeyer & Wang, 2000), by which models were calibrated on choices made by participants in 60 problems (the estimation set) and later tested on a new set of 60 problems (the competition set) with the parameters obtained from the calibration process in the estimation set. The generalization criterion method was believed to be a true test of a model's ability to explain observed choice decisions.
Although the TPT involved three different experimental paradigms, we only use data from the "E-repeated" paradigm, which involved consequential choices in a repeated binary-choice task with immediate outcome feedback on the chosen alternative. For each of the 60 problems in the estimation and competition sets in this paradigm, a sample of 100 participants was randomly assigned to 5 groups of 20 participants each, and each group completed 12 of the 60 problems. Each participant was instructed to repeatedly and consequentially select between two unlabeled buttons on a computer screen in order to maximize long-term rewards over a block of 100 trials per problem (this end point was not known to participants). One button was associated with a risky alternative and the other with a safe alternative. Selecting an alternative, safe or risky, generated an outcome for the selected alternative (the foregone outcome on the unselected alternative was not shown). Selecting the alternative with the higher expected value, which could be either the safe or the risky button, would maximize a participant's long-term rewards; therefore, choosing the maximizing alternative across all the repeated trials constitutes the optimal strategy in the task. Other details about the E-repeated paradigm are reported in Erev, Ert, Roth, Haruvy, et al. (2010). The models submitted to the TPT were not provided with human data on alternation between options (i.e., the A-rate, or the process measure), and they were evaluated only according to their ability to account for risk-taking behavior (i.e., the R-rate, or the outcome measure) (Erev, Ert, Roth, Haruvy, et al., 2010). We calculated the A-rate for analyses of alternations from the TPT data (see results in Gonzalez and Dutt, 2011).
First, alternations are coded as 1 if the respondent switched from a risky choice in the last trial to a safe choice in the current trial (or vice versa), and as 0 if the respondent simply repeated the last trial's choice. The proportion of alternations in each trial is computed by averaging the alternations over 20 participants per problem and the 60 problems in each dataset. The R-rate is the proportion of risky choices in each trial, averaged over 20 participants per problem and the 60 problems in each dataset. A problem is defined as consisting of two alternatives, risky and safe. The risky alternative has two possible outcomes, high and low, whose occurrence is determined by corresponding probability values. The safe alternative has one possible outcome, medium, which occurs with a 100% chance. For calculating the A-rate and R-rate, the averaging is done over 20 participants because that is how many participants were collected per problem in the TPT (Erev, Ert, Roth, Haruvy, et al., 2010). Figure 1 shows the overall R-rate and A-rate over 99 trials, from trial 2 to trial 100, in the estimation and competition sets. In both datasets, the R-rate is relatively constant across trials, in contrast to the sharp decrease in the A-rate. The sharp decrease in the A-rate shows a transition in the pattern of information search across trials (Gonzalez & Dutt, 2011). Overall, these R-rate and A-rate curves suggest that risk-taking remains relatively steady across trials, while participants learn to alternate less and choose one of the two alternatives more often. Thus, the A-rate (process) is more dynamic than the R-rate (decision outcome), and due to these differences it is likely to be harder for a model to account for the A-rate than for the R-rate. We use the R-rate and A-rate curves in Figure 1 to evaluate the role of model calibration later in this paper.
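The per-participant A-rate and R-rate computations described above can be sketched as follows. The function names and the toy choice sequence are illustrative; the measures reported in the paper additionally average over 20 participants per problem and 60 problems:

```python
# 'R' = risky choice, 'S' = safe choice, one entry per trial.
def a_rate(choices):
    """Proportion of trials (from trial 2 on) where the choice switched
    relative to the previous trial (alternation coded 1, repetition 0)."""
    switches = [int(prev != cur) for prev, cur in zip(choices, choices[1:])]
    return sum(switches) / len(switches)

def r_rate(choices):
    """Proportion of risky choices across trials."""
    return sum(c == 'R' for c in choices) / len(choices)

# Toy sequence: 7 trial-to-trial transitions, 4 of them are switches.
seq = ['R', 'S', 'R', 'R', 'S', 'S', 'S', 'R']
```

A decreasing `a_rate` across blocks of trials, with a roughly flat `r_rate`, is the qualitative pattern shown in Figure 1.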
An instance-based learning model of repeated binary choice

IBLT (Gonzalez et al., 2003) has been used as the basis for developing computational models that capture human behavior in a wide variety of dynamic decision-making tasks. These include dynamically complex tasks like the water purification plant task (Gonzalez & Lebiere, 2005; Gonzalez et al., 2003; Martin, Gonzalez, & Lebiere, 2004), training paradigms for simple and complex tasks (Gonzalez, Best, Healy, Bourne, & Kole, 2010), simple stimulus-response practice and skill-acquisition tasks (Dutt, Yamaguchi, Gonzalez, & Proctor, 2009), and repeated binary-choice tasks (Gonzalez & Dutt, 2011; Gonzalez et al., 2011; Lebiere, Gonzalez, & Martin, 2007; Lejarraga et al., 2012), among others. The different computational applications of IBLT illustrate its generality and its ability to capture decisions from experience in multiple contexts. A recent IBL model has showcased the theory's robustness across multiple choice tasks: a probability-learning task, a repeated binary-choice task with fixed probabilities, and a repeated binary-choice task with changing probabilities (Lejarraga et al., 2012). We use this model to evaluate the effects of model calibration to different outcome or process measures.

Figure 1. (a) The R-rate and A-rate across trials observed in human data in the estimation set of the TPT between trial 2 and trial 100. (b) The R-rate and A-rate across trials observed in human data in the competition set of the TPT between trial 2 and trial 100.

The model's formulations and decision-making process are further explained in other publications (Gonzalez & Dutt, 2011; Lejarraga et al., 2012) and summarized in the appendix.
The model makes choice selections between alternatives in a trial by comparing weighted averages of the observed outcomes on each alternative, called "blended values." A blended value for an alternative, safe or risky, is a function of the probability of retrieving instances from memory multiplied by their respective outcomes observed on previous selections of that alternative (Lebiere, 1999; Lejarraga et al., 2012). Each instance consists of a label that identifies a decision alternative in the task and the outcome obtained. For example, (risky, $32) is an instance where the decision was to choose the risky alternative and the outcome obtained was $32. The probability of retrieving an instance from memory, which is used to compute the blended value, is a function of its activation (Anderson & Lebiere, 1998). Each observed outcome (represented by a corresponding instance in memory) has an activation value that is a function of the recency and frequency of observing the outcome, plus a noise term. This simplified activation equation has been shown to be sufficient for explaining human choices in several experiential tasks (Gonzalez & Dutt, 2011; Lejarraga et al., 2012). The activation is influenced by the decay parameter d, which captures the rate of forgetting, or the reliance on the recency and frequency of observed outcomes: the higher the value of the d parameter, the greater the model's reliance on recently experienced outcomes. The activation is also influenced by a noise parameter s that is important for capturing the variability in human behavior from one participant to another. The IBL model borrows the d and s parameters and the activation equation from a popular cognitive framework called ACT-R (Atomic Components of Thought – Rational; Anderson & Lebiere, 1998). However, unlike in ACT-R, where the d and s parameters are kept fixed, we calibrate the values of these parameters in the IBL model to account for choices in human data.
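A minimal sketch of the blending mechanism just described, assuming the simplified base-level activation (recency/frequency term plus logistic noise) and a Boltzmann retrieval probability in the spirit of the cited IBL work; the exact equations are in the appendix, and the data layout here (a map from each observed outcome to the trials on which it occurred) is an illustrative assumption:

```python
import math
import random

def activation(occurrences, t, d, s, rng):
    """Activation of an instance at trial t: a recency/frequency term
    governed by decay d, plus noise scaled by s (simplified ACT-R form)."""
    base = math.log(sum((t - ti) ** (-d) for ti in occurrences))
    # Logistic noise; gamma is clipped away from 0 and 1 to keep the log finite.
    gamma = min(max(rng.random(), 1e-12), 1 - 1e-12)
    noise = s * math.log((1 - gamma) / gamma)
    return base + noise

def blended_value(instances, t, d, s, rng):
    """Blended value of one alternative: outcomes weighted by retrieval
    probabilities (softmax over activations)."""
    tau = s * math.sqrt(2)
    acts = {out: activation(ts, t, d, s, rng) for out, ts in instances.items()}
    denom = sum(math.exp(a / tau) for a in acts.values())
    return sum(out * math.exp(acts[out] / tau) / denom for out in acts)
```

Because the weights form a probability distribution over observed outcomes, a blended value always lies between the smallest and largest outcome experienced on that alternative; a higher d shifts the weights toward recently observed outcomes.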
The model equations for blending and activation are included in the appendix.

Results

Model calibration to different measures

We used a genetic algorithm to calibrate the model's parameters to minimize the mean squared deviation (MSD) between its predictions and the observed average A-rate per problem or the average R-rate per problem. The average R-rate per problem and the average A-rate per problem were computed by averaging the risky choices and alternations in each problem over 20 participants per problem and 100 trials per problem (for the definition of a problem, see above). The MSDs were then calculated across the 60 estimation-set problems using the average R-rate per problem and the average A-rate per problem from the model and the human data. For calibration, both the s and the d parameters were varied between 0.0 and 10.0, and the genetic algorithm was run for 500 generations (crossover rate = 50%; mutation rate = 10%). The assumed range of variation for the s and d parameters and the number of generations in the genetic algorithm are large, which ensures that the optimization process does not miss the minimum MSD value due to a small range of parameter variation (for more details about genetic-algorithm optimization, see Gonzalez & Dutt, 2011). We calibrated the IBL model separately on the R-rate and the A-rate measures, and the optimized values of the d and s parameters were determined for each calibration. The model calibrated on the R-rate produced the smallest MSD for d = 5.00 and s = 1.50. These are the same optimal values as reported by Lejarraga et al. (2012), who had also calibrated this IBL model on the R-rate measure on the same dataset. As documented by Lejarraga et al. (2012), the values of both the d and s parameters are high compared to the ACT-R defaults of d = 0.5 and s = 0.25 (Anderson & Lebiere, 1998).
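The calibration objective can be sketched as follows. A coarse grid search stands in for the genetic algorithm actually used in the paper, and the model-runner below is a hypothetical stand-in for simulating the IBL model over the 60 problems:

```python
def msd(model, human):
    """Mean squared deviation between model and human per-problem measures."""
    return sum((m - h) ** 2 for m, h in zip(model, human)) / len(model)

def calibrate(run_model, human, grid):
    """Return the (d, s) pair on the grid minimizing MSD to the human measure."""
    return min(grid, key=lambda params: msd(run_model(*params), human))

# Illustrative stand-in: three "problems" of human data and a fake model
# whose per-problem output depends only on d (not a real IBL simulation).
human = [0.4, 0.35, 0.3]
fake_model = lambda d, s: [0.4 * d / 5.0] * 3
grid = [(d, s) for d in (1.0, 5.0, 9.0) for s in (0.5, 1.5)]
best = calibrate(fake_model, human, grid)  # picks d = 5.0 here
```

Swapping `human` between the average R-rate per problem and the average A-rate per problem yields the two calibrations compared in this section.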
Furthermore, the model calibrated on the A-rate produced the smallest MSD for d = 9.74 and s = 0.96. Thus, calibrating the model on the A-rate produces a greater value for the d parameter and a slightly smaller value for the s parameter. The greater d value suggests a high dependency on recently experienced outcomes for making choice decisions. Figure 2 shows the MSDs for the R-rate and the A-rate from the IBL model calibrated on the R-rate or the A-rate in the estimation set. When the model's parameters were calibrated on the R-rate (i.e., d = 5.0 and s = 1.5), the model explained the R-rate quite well (MSD = 0.008), but it explained the A-rate less well (MSD = 0.063). Thus, the model explains the outcome measure well when calibrated on the outcome measure, but it explains the process measure less well. In contrast, when the IBL model's parameters were calibrated on the A-rate, the model explained the A-rate much better (MSD = 0.002) and the resulting R-rate also relatively well (MSD = 0.023). Thus, the benefit of calibrating the model on the A-rate measure (= 0.061) is larger than the detriment of calibrating the model on the R-rate measure (= 0.015). Overall, these results show that by calibrating the IBL model to the process measure, one is able to explain both the process and outcome measures better than by calibrating it to the outcome measure. These results suggest that the components of the IBL model are good representations of the A-rate process as well as the R-rate decision outcomes, especially since accounting for the A-rate is more challenging than accounting for the R-rate, the A-rate being more dynamic (Gonzalez & Dutt, 2011).

Figure 2. The MSD for the R-rate per problem and A-rate per problem in the estimation set of the TPT.
The model was calibrated either on the R-rate per problem or on the A-rate per problem in the estimation set. The calibrated values of the d and s parameters obtained for each measure (R-rate or A-rate per problem) are shown in brackets. The differences due to calibrating on the A-rate measure (respectively, the R-rate measure) are shown by two vertical arrows.

Figure 3 presents the human and model R-rate and A-rate across trials when the model was calibrated to the R-rate (Figure 3a) and when it was calibrated to the A-rate (Figure 3b). Here, it can be observed that the model explains the human learning data best for the measure used to calibrate it.

Generalizing the calibrated IBL model to the competition set

The demonstration that calibrating a model to a process measure helps explain both the process and outcome measures is an important way to corroborate the consistency of predictions from cognitive models. A robust model should be able to explain the learning process as well as the outcomes resulting from that very process. According to Lebiere, Gonzalez, and Warwick (2009), models that explain only the outcome and not the process might find it difficult to generalize their predictions to novel conditions. Here, we used the generalization criterion test (Ahn, Busemeyer, Wagenmakers, & Stout, 2009; Busemeyer & Wang, 2000) to investigate the predictions that the different calibration procedures make on novel datasets: we ran the calibrated models in novel conditions to evaluate and compare their performance. The model calibrated to the TPT's estimation set on the R-rate or the A-rate was generalized to the TPT's competition set by keeping the same parameter values derived during calibration. The model was run using 20 participants per problem and 60 problems in the competition set. Different sets of problems were used in the estimation and competition sets.
Also, these problems were run as part of two separate experiments involving different human participants. Given these differences, one expects poorer performance from both models in the competition set than in the estimation set. However, as the algorithm used to generate problems in the competition set was the same as that used in the estimation set, one also expects both models to show results similar to those found for the estimation set: the model calibrated to the process measure should explain both the process and outcome measures better than the model calibrated to the outcome measure. Figure 4 shows the resulting MSDs from generalizing the IBL model to the competition set. The model calibrated on the estimation set's R-rate produced the best predictions for the same measure in the competition set (MSD = 0.006); however, its predictions for the A-rate were relatively inferior (MSD = 0.074). Furthermore, the model calibrated on the A-rate produced the best predictions for the same measure in the competition set (MSD = 0.006), with reasonably good predictions for the R-rate (MSD = 0.032). Thus, again, the improvement in MSD for the A-rate (= 0.068) is larger than the decrement in MSD for the R-rate (= 0.026). Also note that the results in the competition set (Figure 4) show poorer performance (higher MSDs) from the models, in general, than those in the estimation set (Figure 2). As in the estimation set, these results translate to the process of learning over trials (see Figure 5). The model's predictions are best for the measure on which it was calibrated in the estimation set: the model calibrated on the R-rate in the estimation set predicted the R-rate better than the A-rate (Figure 5a); however, the model calibrated on the A-rate in the estimation set predicted both the R-rate and A-rate over time quite well (Figure 5b).

Figure 3. The R-rate and A-rate across trials predicted by the IBL model and observed in human data in the TPT's estimation set. Panels A and B show the results of calibrating the IBL model to the R-rate per problem and the A-rate per problem, respectively.

Figure 4. The MSD for the R-rate per problem and A-rate per problem in the competition set of the TPT. The model was calibrated either on the R-rate per problem or on the A-rate per problem in the estimation set. The calibrated values of the d and s parameters obtained for each measure (R-rate or A-rate per problem) in the estimation set are shown in brackets. The differences due to calibrating on the A-rate measure (respectively, the R-rate measure) are shown by two vertical arrows.

Discussion

We argue that strong and robust models of human behavior need to explain both the decision outcome and the process from which that outcome came about. We suggest that many models of human behavior, particularly in the context of repeated choice and dynamic decisions from experience, have focused only on predicting outcomes, not the process. Furthermore, most existing computational models of experiential decisions explain the decision outcomes while completely ignoring, or failing to account for, the process through which these decision outcomes are reached (see a review of models in Gonzalez & Dutt, 2011). This observation is perhaps not a coincidence, because predicting the outcome as the result of a process is very challenging (Erev & Barron, 2005; Rapoport et al., 1997). Our findings demonstrated the robustness of explaining and predicting outcome and process measures through an IBL model. We demonstrated a method for assessing a cognitive model's ability to explain both the process and the decision outcomes.
The model's calibration on the process measure reduced the MSD for the A-rate (process) by a large amount without a large deterioration in the MSD for the R-rate (decision outcome). The proposed calibration was also helpful in accounting for both measures after the model was generalized to a novel condition. Explaining both the process and the decision outcomes is important, because doing so will improve our understanding of how people maximize long-term goals through the process of sequential choices from experience.

Figure 5. The generalization of the IBL model in the TPT's competition set. (A) The model's parameters were calibrated on the R-rate per problem measure in the TPT's estimation set. (B) The model's parameters were calibrated on the A-rate per problem measure in the TPT's estimation set.

Several recent model-comparison competitions have suggested the use of different dependent measures for calibrating models, without a clear motivation for choosing one measure over another. For example, the measure of model evaluation in the TPT was solely risk-taking, i.e., decision outcomes (Erev & Barron, 2005); however, the measure of evaluation in the recently concluded market-entry competition (Erev, Ert, & Roth, 2010) was a combination of risk-taking (outcome) and alternations (process). Our analysis suggests that stronger and more robust models of learning should be able to explain both the decision outcomes and the process by which these outcomes came about. Future model-comparison efforts should enforce both types of measures. In this paper, we used one IBL model to showcase the benefits of calibrating models on a process measure compared to an outcome measure. This attempt may be limited in its ability at present, as we only used one model, the IBL model, on two datasets.
However, this attempt does showcase the wider generalizability of the theory, IBLT, which has been used in the literature to derive a number of models across a number of decision tasks (see Gonzalez, in press; Gonzalez, 2013, for more arguments). As part of our future research, we would like to build on our current findings by calibrating and evaluating models on both the outcome and process measures in various tasks that differ in their outcome feedback and dynamics. Also as part of future research, we would like to consider the mutual benefits of calibrating models to both process and decision outcomes, especially when there are more than two measures. It would be interesting to observe the extent to which the benefits of calibrating models to different kinds of process measures carry over to different kinds of decision outcomes. In case there are more than two measures, one could combine multiple process and outcome measures through a weighted sum of the mean-squared deviations calculated on these measures, keeping the weights at values such that all combined measures are weighted equally during optimization. Furthermore, it would be interesting to observe how calibrating models to the process measures carries over to the outcome measures when the calibration is done at the individual level rather than at the aggregate level. These evaluations would help extend our existing knowledge on this topic and help us explore the benefits and limitations of computational models in explaining both the decision outcomes and the process through which these outcomes are reached.

Acknowledgements: This research was partially supported by the following funding sources: Defense Threat Reduction Agency (DTRA) grant number HDTRA1-09-1-0053 to Dr. Cleotilde Gonzalez, and Department of Science and Technology (DST) grant number SR/CSRI/28/2013(G) to Dr. Varun Dutt. We would also like to thank Dr. Ido Erev of the Technion–Israel Institute of Technology for making the data from the Technion Prediction Tournament available.

Declaration of conflicting interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Author contributions: The authors contributed equally to this work.

Supplementary material: Supplementary material available online.

Handling editor: Andreas Fischer

Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Citation: Dutt, V., & Gonzalez, C. (2015). Accounting for outcome and process measures in dynamic decision-making tasks through model calibration. Journal of Dynamic Decision Making, 1, 2. doi:10.11588/jddm.2015.1.17663

Received: 15 December 2014. Accepted: 13 July 2015. Published: 29 September 2015.

References

Ahn, W. Y., Busemeyer, J. R., Wagenmakers, E. J., & Stout, J. C. (2009). Comparison of decision learning models using the generalization criterion method. Cognitive Science, 32, 1376–1402. doi:10.1080/03640210802352992

Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought. Mahwah, NJ: Erlbaum.

Biele, G., Erev, I., & Ert, E. (2009). Learning, risk attitude and hot stoves in restless bandit problems. Journal of Mathematical Psychology, 53(3), 155–167. doi:10.1016/j.jmp.2008.05.006

Busemeyer, J. R. (1985). Decision making under uncertainty: A comparison of simple scalability, fixed sample, and sequential sampling models. Journal of Experimental Psychology, 11, 538–564. doi:10.1037/0278-7393.11.3.538

Busemeyer, J. R., & Diederich, A. (2009). Cognitive modeling. New York, NY: Sage Publications.

Busemeyer, J. R., & Wang, Y. M. (2000).
Model comparison and model selections based on generalization criterion methodology. Journal of Mathematical Psychology, 44(1), 171–189. doi:10.1006/jmps.1999.1282

Camerer, C., & Ho, T. H. (1999). Experience-weighted attraction learning in normal form games. Econometrica, 67(4), 827–874. Retrieved from http://www.jstor.org/stable/2999459

Chen, W., Liu, S. Y., Chen, C. H., & Lee, Y. S. (2011). Bounded memory, inertia, sampling and weighting model for market entry games. Games, 2, 187–199. doi:10.3390/g2010187

Dember, W. N., & Fowler, F. (1958). Spontaneous alternation behavior. Psychological Bulletin, 55, 412–428. doi:10.1037/h0045446

Dutt, V., Yamaguchi, M., Gonzalez, C., & Proctor, R. W. (2009). An instance-based learning model of stimulus-response compatibility effects in mixed location-relevant and location-irrelevant tasks. In A. Howes, D. Peebles, & R. Cooper (Eds.), 9th International Conference on Cognitive Modeling – ICCM2009. Manchester, UK. Retrieved from http://act-r.psy.cmu.edu/wordpress/wp-content/uploads/2012/12/863paper115.pdf

Erev, I., & Barron, G. (2005). On adaptation, maximization and reinforcement learning among cognitive strategies. Psychological Review, 112(4), 912–931. doi:10.1037/0033-295x.112.4.912

Erev, I., Ert, E., & Roth, A. E. (2010). A choice prediction competition for market entry games: An introduction. Games, 1(2), 117–136. doi:10.3390/g1020117

Erev, I., Ert, E., Roth, A. E., Haruvy, E., Herzog, S. M., Hau, R., Hertwig, R., Stewart, T., West, R., & Lebiere, C. (2010). A choice prediction competition: Choices from experience and from description. Journal of Behavioral Decision Making, 23(1), 15–47. doi:10.1002/bdm.683

Erev, I., & Haruvy, E. (2005). Generality, repetition, and the role of descriptive learning models. Journal of Mathematical Psychology, 49(5), 357–371. doi:10.1016/j.jmp.2005.06.009

Estes, W. K. (1962). Learning theory. Annual Review of Psychology, 13, 107–144. doi:10.1146/annurev.ps.13.020162.000543

Gonzalez, C.
(in press). Decision making: A cognitive science perspective. Chapter 6 (pp. TBD). In S. Chipman (Ed.), The Oxford Handbook of Cognitive Science. New York, NY: Oxford University Press.

Gonzalez, C. (2013). The boundaries of instance-based learning theory for explaining decisions from experience. Chapter 5 (pp. 73–98). In Pammi & Srinivasan (Eds.), Decision Making: Neural and Behavioural Approaches (Vol. 202, Progress in Brain Research). New York, NY: Elsevier.

Gonzalez, C., Best, B. J., Healy, A. F., Bourne, L. E., Jr., & Kole, J. A. (2010). A cognitive modeling account of simultaneous learning and fatigue effects. Journal of Cognitive Systems Research, 12(1), 19–32. doi:10.1016/j.cogsys.2010.06.004

Gonzalez, C., & Dutt, V. (2011). Instance-based learning: Integrating sampling and repeated decisions from experience. Psychological Review, 118, 523–551. doi:10.1037/a0024558

Gonzalez, C., Dutt, V., & Lejarraga, T. (2011). A loser can be a winner: Comparison of two instance-based learning models in a market entry competition. Games, 2(1), 136–162. doi:10.3390/g2010136

Gonzalez, C., & Lebiere, C. (2005). Instance-based cognitive models of decision making. In D. Zizzo & A. Courakis (Eds.), Transfer of Knowledge in Economic Decision-Making (pp. 148–165). New York, NY: Palgrave Macmillan.

Gonzalez, C., Lerch, F. J., & Lebiere, C. (2003). Instance-based learning in real-time dynamic decision making. Cognitive Science, 27(4), 591–635. doi:10.1016/s0364-0213(03)00031-4

Green, L., Price, P. C., & Hamburger, M. E. (1995). Prisoner's dilemma and the pigeon: Control by immediate consequences. Journal of the Experimental Analysis of Behavior, 64, 1–17. doi:10.1901/jeab.1995.64-1

Hertwig, R., Barron, G., Weber, E. U., & Erev, I. (2004). Decisions from experience and the effect of rare events in risky choice. Psychological Science, 15, 534–539. doi:10.1111/j.0956-7976.2004.00715.x

Hills, T. T., & Hertwig, R. (2010).
Information search in decisions from experience: Do our patterns of sampling foreshadow our decisions? Psychological Science, 21(12), 1787–1792. doi:10.1177/0956797610387443

Johnson, E. J., Schulte-Mecklenbeck, M., & Willemsen, M. (2008). Process models deserve process data: Comment on Brandstätter, Gigerenzer, & Hertwig (2006). Psychological Review, 115(1), 263–272. doi:10.1037/0033-295x.115.1.263

Lebiere, C. (1999). Blending: An ACT-R mechanism for aggregate retrievals. Paper presented at the Sixth Annual ACT-R Workshop at George Mason University. Retrieved from http://act-r.psy.cmu.edu/wordpress/wp-content/themes/act-r/workshops/1999/talks/blending.pdf

Lebiere, C., Gonzalez, C., & Martin, M. (2007). Instance-based decision making model of repeated binary choice. In Proceedings of the 8th International Conference on Cognitive Modeling (pp. 67–72).
Oxford, UK: Psychology Press. Retrieved from http://repository.cmu.edu/cgi/viewcontent.cgi?article=1083&context=sds

Lebiere, C., Gonzalez, C., & Warwick, W. (2009). A comparative approach to understanding general intelligence: Predicting cognitive performance in an open-ended dynamic task. In B. Goertzel, P. Hitzler, & M. Hutter (Eds.), Proceedings of the Second Conference on Artificial General Intelligence (pp. 103–107). Amsterdam–Paris: Atlantis Press. doi:10.2991/agi.2009.2

Lee, M. D., Zhang, S., Munro, M., & Steyvers, M. (2011). Psychological models of human and optimal performance in bandit problems. Cognitive Systems Research, 12, 164–174. doi:10.1016/j.cogsys.2010.07.007

Lejarraga, T., Dutt, V., & Gonzalez, C. (2012). Instance-based learning: A general model of repeated binary choice. Journal of Behavioral Decision Making, 25(2), 143–153. doi:10.1002/bdm.722

Martin, M. K., Gonzalez, C., & Lebiere, C. (2004). Learning to make decisions in dynamic environments: ACT-R plays the beer game. In Proceedings of the Sixth International Conference on Cognitive Modeling (pp. 178–183). Mahwah, NJ: Erlbaum. Retrieved from http://repository.cmu.edu/cgi/viewcontent.cgi?article=1087&context=sds

Nevo, I., & Erev, I. (2012). On surprise, change, and the effect of recent outcomes. Frontiers in Psychology, 3, 1–9. doi:10.3389/fpsyg.2012.00024

Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108. doi:10.1037/0033-295x.85.2.59

Ratcliff, R., & Smith, P. (2004). A comparison of sequential sampling models for two-choice reaction time. Psychological Review, 111, 333–367. doi:10.1037/0033-295x.111.2.333

Rapoport, A., & Budescu, D. V. (1992). Generation of random series in two-person strictly competitive games. Journal of Experimental Psychology: General, 121, 352–363. doi:10.1037/0096-3445.121.3.352

Rapoport, A., Erev, I., Abraham, E. V., & Olson, D. E. (1997). Randomization and adaptive learning in a simplified poker game.
Organizational Behavior and Human Decision Processes, 69(1), 31–49. doi:10.1006/obhd.1996.2670

Scheres, A., & Sanfey, A. G. (2006). Individual differences in decision-making: Drive and reward responsiveness affects strategic bargaining in economic games. Behavioral and Brain Functions, 2, 35. doi:10.1186/1744-9081-2-35

Suppes, P., & Atkinson, R. C. (1959). Markov learning models for multiperson situations, I: The theory. Technical report prepared under Contract Nonr 255(17) (NR 171-034), 21, 1–78. Retrieved from http://suppes-corpus.stanford.edu/techreports/imsss_21.pdf

Tolman, E. C. (1925). Purpose and cognition: The determiners of animal learning. Psychological Review, 32, 285–297. doi:10.1037/h0072784

Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 9, 195–230. doi:10.1007/bf00122574

Vandekerckhove, J., & Tuerlinckx, F. (2007). Fitting the Ratcliff diffusion model to experimental data. Psychonomic Bulletin & Review, 14, 1011–1026. doi:10.3758/pbr.15.6.1229

Appendix

Decision rule

A choice is made by the model in trial t+1 as the selection of the alternative with the highest blended value, as per Equation 1 (below).

Blending and activation mechanisms

The blended value of alternative j is defined as

V_j = \sum_{i=1}^{n} p_i x_i  (1)

where x_i is the value of the observed outcome in the outcome slot of an instance i corresponding to alternative j, and p_i is the probability of that instance's retrieval from memory (for the case of our binary-choice task in the experience condition, j in Equation 1 can be either risky or safe). The blended value of an alternative is thus the sum of all observed outcomes x_i in the outcome slots of the corresponding instances, weighted by the instances' probabilities of retrieval.
Probability of retrieving instances

In any trial t, the probability of retrieving instance i from memory is a function of that instance's activation relative to the activation of all other instances corresponding to that alternative, given by

p_{i,t} = e^{A_{i,t}/\tau} / \sum_{j} e^{A_{j,t}/\tau}  (2)

where \tau is random noise defined as \tau = s \times \sqrt{2}, and s is a free noise parameter. The noise parameter s captures the imprecision of retrieving instances from memory.

Activation of instances

The activation of each instance in memory depends upon the activation mechanism originally proposed in ACT-R (Anderson & Lebiere, 1998). According to this mechanism, for each trial t, the activation A_{i,t} of instance i is

A_{i,t} = \ln\left( \sum_{t_i \in \{1, ..., t-1\}} (t - t_i)^{-d} \right) + s \times \ln\left( \frac{1 - y_{i,t}}{y_{i,t}} \right)  (3)

where d is a free decay parameter, and t_i is a previous trial in which the instance i was created or its activation was reinforced due to an outcome observed in
the task (the instance i is the one that has the observed outcome as the value in its outcome slot). The summation includes a number of terms that coincides with the number of times an outcome has been observed in previous trials and the corresponding instance i's activation has been reinforced in memory (by encoding a timestamp of the trial t_i). Therefore, the activation of an instance corresponding to an observed outcome increases with the frequency and the recency of those observations. The decay parameter d affects the activation of an instance directly, as it captures the rate of forgetting, or reliance on recency.

Noise in activation

The y_{i,t} term is a random draw from a uniform distribution U(0, 1), and the s \times \ln((1 - y_{i,t}) / y_{i,t}) term represents Gaussian noise important for capturing the variability of human behavior.

Pre-populated instances in memory

For the first trial, the IBL model does not have any instances in memory from which to calculate blended values. Therefore, the model is made to make a selection between instances that are pre-populated in memory. Lejarraga, Dutt, and Gonzalez (2012) used a value of +30 in the outcome slots of the two alternatives' instances. The +30 value is arbitrary but, most importantly, greater than any possible outcome in the TPT problems, and so triggers an initial exploration of the two alternatives. We use these pre-populated values in the model in this paper.
Theoretical contribution

The empirical potential of live streaming beyond cognitive psychology

Alexander Nicolai Wendt
Department of Psychology, Heidelberg University, Germany

Empirical methods of self-description, think-aloud protocols and introspection, have been extensively criticized or neglected in behaviorist and cognitivist psychology. Their methodological value has been fundamentally questioned, since there apparently is no sufficient proof of their validity. However, the major arguments against self-description can be critically reviewed by theoretical psychology. This way, these methods' empirical value can be redeemed. Furthermore, self-descriptive methods can be updated by the use of contemporary media technology. In order to support promising perspectives for future empirical research in the field of cognitive psychology, live streaming is proposed as a viable data source. Introducing this new paradigm, this paper presents some of the formal constituents and accessible contents of live streaming and relates them to established forms of empirical research. By its structure and established usage, live streaming bears remarkable resemblance to the traditional methods of self-description, yet it also adds fruitful new features of use. On the basis of its qualities, the possible benefits that appear feasible in comparison with the traditional methods of self-description are elaborated, such as live streaming's ecological validity. Ultimately, controversial theoretical concepts, such as those in phenomenology and cultural-historical psychology, are adopted to sketch further potential benefits of the utility of live streaming in current psychology debates.
Keywords: live streaming, think-aloud protocol, introspection, cognitive psychology, phenomenology

Live streaming is a multimedia technology which originates in the advances of the Web 2.0 (Li & Yin, 2007). It is constituted by user-created digital video streams that are transmitted via hosting platforms, most prominently Twitch.tv. Unlike video-on-demand formats, live streaming is transmitted in real time (Karat et al., 2002). The immense requirement of bandwidth capacity did not allow for reliable usage of the format by mass audiences before the second decade of the twenty-first century. As a result, the technology of live streaming has to be regarded as completely up-to-date in 2017. Its formal components consist of video and audio recordings of content that usually relates indirectly to the user (called the "streamer"), e.g., by showing their digital video gameplay, or contains them directly, especially via webcam recordings. The material is transmitted as a single frame and a single audio track to the audience, who can choose whether or not to interact with the streamer via a written real-time group chat (Barasch & Berger, 2014; Franquet i Calvet, Villa Montoya, & Bergillos García, 2013; Ko, Chang, & Chu, 2013). Despite the great variety of possible contents and contexts for initiating the usage of live streaming in psychology, its most pertinent format seems to be the transmission of stationary single streamers who maintain a single content for a sufficiently long duration of time, especially video games. Video games are structurally characterized by their similarity to established paradigms of experimental psychology, e.g., problem-solving tasks and dynamic decision making. To investigate dynamic decision making, empirical psychology employs situation simulation in virtual environments that are structurally equivalent to video games. E.
g., Güss, Tuason, and Orduña (2015) investigate the possibility of observing strategies, tactics and errors by the use of a digital microworld. They state that "complex and dynamic computer-simulated problem scenarios" (p. 3) serve the investigation of the fields of complex problem solving and dynamic decision making. Another example is the study concerning the influence of personality on dynamic decision making by Nicholson and O'Hare (2014). Just as Güss et al. (2015), they founded their research on the use of computer simulations that contain an interface which is conceptually analogous to video games. This diversity of research shows that computer simulations can be used in the cognitive sciences in various ways. However, live streaming offers a more elaborate opportunity to study participants' behavior than research based on computer simulations has provided so far, because the material obtained by live streaming enables a more detailed opportunity for observation and a fruitful analogy to the methodology of think-aloud protocols, since both observe individuals in their behavior, attending to singular tasks in a comparable manner (Funke & Spering, 2006). Yet there are discrepancies between the two data sources, live streaming and think-aloud protocols, that indicate an advantage of using data obtained through live streaming. In this paper, the innovative application of live streaming as a data source in experimental psychology is proposed. This approach will be founded on the basis of classical epistemological and methodological debates.

Corresponding author: Alexander N. Wendt, Department of Psychology, Heidelberg University, Hauptstr. 47-51, 69117 Heidelberg, Germany. Email: alexander@puwendt.de

10.11588/jddm.2017.1.33724 · JDDM | 2017 | Volume 3 | Article 1 | 1

Wendt: The empirical potentials of live streaming
Recent controversy in experimental psychology has indicated that the discipline's current methodological principles cannot guarantee further advances in understanding psychological behavior and experience (Funke, 2014; Jäkel & Schreiber, 2013; Ohlsson, 2012). Within experimental psychology, the currently predominant paradigm can be called cognitive psychology (Neisser, 2014). Its foundation may be criticized as a "mechano-representationalist approach" (which consists of regarding cognition as information processing, for example in computational modelling, and of indirect realism), and it has been questioned from external points of view, e.g., by phenomenology (Hutto, 2008). Following these critical approaches, the methodological exclusion of classical empirical concepts, such as introspection or think-aloud protocols (Graumann, 1991; Fahrenberg, 2015), that was recommended by Nisbett and Wilson (1977) may be scrutinized theoretically. In opposition to the standpoint of mere positivist methodology, this paper supports the re-integration of these classical methods on the basis of a theoretical discussion of the discipline's principles. In order to enrich epistemological debates within psychology, theoretical standpoints from a number of other fields are adopted, such as cultural-historical psychology and phenomenology. Crucially, the paper proposes that the methods' usability can be enhanced by combining them with a recent technological development, namely live streaming.

The epistemological debate

The intuitive relevance of self-descriptive access to one's own or others' cognition has been palpable throughout the history of both naïve and empirical interest in behavior and experience. In the early conceptual stages of empirical psychology, these approaches were used most regularly and initially appeared to be more reliable than any experimental methods (Fahrenberg, 2015; Galliker, 2016; Walach, 2013).
Yet, over the course of the paradigm shifts of the twentieth century, the methods' pertinence was fundamentally questioned within the discipline, to an equal degree in behaviorism and cognitive psychology. While the former denied the methods' objectives, the latter disclaimed their validity. In the second half of the twentieth century, neither introspection nor thinking aloud influenced the discipline's development to any noteworthy extent, as summarized by Lyons (1986). Yet the methods' apparent face validity was not obfuscated by this extensive critique; describing one's own experiences, or regarding others' descriptions of their experiences, remains the most intuitive form of psychology. Therefore, despite being harshly questioned, the core of these methods cannot lose its fundamental relevance, even if cognitive psychology might regard them only as empirical phenomena instead of reliable sources of data. However, recent comments in psychology question the significance of the cognitive paradigm from standpoints external to empirical psychology (Hutto, 2008; Petitmengin & Bitbol, 2009; Zahavi & Gallagher, 2008) as well as internal to the cognitive sciences (Funke, 2014; Jäkel & Schreiber, 2013; Ohlsson, 2012). Subsequently, the critique of self-descriptive methods as developed by cognitive psychology is weakened, and the possibility arises to review previous controversies within experimental psychology. These circumstances favor not only methodological deliberation about the use of self-descriptive methods, but also discussion about psychological theory's epistemological foundations. Accordingly, a viewpoint beyond cognitive psychology can advocate the structurally scrutinized methods of self-description by considering alternative approaches to the objectives of psychology. Yet, in order to access this deliberation, a sufficient representation of the epistemological status quo is required.
The epistemological and methodological foundations of the debate about self-descriptive methods are characterized by three problems. First, the so-called subject-object problem. As put by Jaspers (1953, p. 25, translation by the author), "when we regard ourselves as the object, we become another one for ourselves while at the same time maintaining to be the thinking I itself". The problem's origin resides within the separation of subject and object made in indirect realism. While the dualistic assumption of a separation between the perceiving subject and the perceived object is maintained, the self appears to be forced into a chimerical position of subject and object at the same time. Still, this statically egological conception of consciousness, as established in the Cartesian tradition, cannot claim exclusive prevalence. Consider here, e.g., Gurwitsch (1941), who demonstrates the existence of various non-egological conceptions of consciousness in the phenomenological tradition, in Husserl or Sartre. The subject-object problem can only be seen as a disqualification of self-descriptive methods if the standpoint of indirect realism is radically maintained. Notwithstanding this epistemological reduction, the issue of the subject-object problem can be avoided, if not resolved.

The debate's second aspect can be entitled the problem of methodology. It considers the question of whether self-description (introspection or thinking aloud) can be sufficiently justified as a source of empirical data in psychology. For behaviorism and the computational theory of mind in cognitive psychology, self-description has been judged to be an insufficient source of behavioral data (Aanstoos, 1987). Therefore, skepticism towards introspection and thinking aloud has been prevalent. Following Petitmengin and Bitbol (2009) and Jäkel and Schreiber (2013), the concerns of these dominant paradigms in empirical psychology can yet be answered.
First, the critique of self-description claims that the instruction to observe oneself contaminates the original behavior. However, this constraint presupposes knowledge of the behavior that, in fact, cannot be determined before observation. Therefore, this critique is based on a conception of sterile subjectivity that cannot be maintained in the light of an elaborate understanding of consciousness (Zahavi, 2005). Second, the skeptical critique claims that the observed object of thought does not remain the same in the case of self-description. Nevertheless, this comment presupposes a naïve correspondence theory of truth that is ignorant of the subjective constitution of understanding. Third, the claim that the data of self-description cannot be reproduced can rather be applied to all sources of behavioral data instead of being biased against self-descriptive methods, a more neutral perspective which has been applied recently (Open Science Collaboration, 2015). In this sense, the issue of reproducibility is not merely an issue for methods of self-description, and they do not appear any less applicable than other sources of data. Ultimately, the problem of methodology has been discussed as psychology seeks to position itself between two extreme attitudes: either proposing self-description's infallibility, as did classical philosophers (e.g., Descartes, Locke, Husserl), or else completely rejecting the method's applicability. Yet Petitmengin and Bitbol (2009) offer a compromise that abandons neither the methods nor the skepticism. They approach self-description carefully by acknowledging its epistemologically problematic nature while sustaining the valuable perspective of an immediate access to behavior and experience.
In their case, they adopt a procedure of controlled, continuous remembrance of the self-described episode in order to expose more psychological data to empirical observation, an attempt that resembles the classical concepts of Brentano (1874). The third problem regarding self-description's epistemological conditions can be called the problem of method. It regards the relation between the self-descriptive act (the subject's experience itself), the self-descriptive predication (the subject's verbalization) and the self-descriptive message (the verbalization's understanding). The two extreme positions towards self-description, its infallibility on the one side and the rejection of its applicability on the other, can be characterized by these three elements. With regard to infallibility, the self-descriptive act evokes its adequate predication, so that the message can be understood if accurate measures are applied; from this point of view, the interpretation of self-description becomes a hermeneutical matter. With regard to rejection, the self-descriptive act cannot be predicated adequately (as Nisbett and Wilson, 1977, claim to expose), with the result that its message remains unrelated to the original act; from this standpoint, self-description is a futile endeavor. The compromise proposed by Petitmengin and Bitbol (2009) can be expressed through the relation of these three aspects, too: the self-descriptive predication has a contingent access to the act that can be activated by a procedure resembling the phenomenological epoché, a continuous bracketing of those experiences which are not essential to the act. However, these three approaches to the problem of method, infallibility, rejection and the (phenomenological) compromise, are equally reliant on a genetic subjectivity, which regards the act as the primary and unidirectional source of the self-description.
On the contrary, a dialogical understanding of the situation of self-description, as can be rendered on the basis of cultural-historical psychology (e.g., Vygotsky, 1986), offers an alternative view. Whereas the previous approaches require one to comprehend the self-descriptive act as the spontaneous experience of an independent subject that unidirectionally evokes its predication, a dialogical understanding considers the possibility of a bidirectional influence between the self-descriptive predication and message. On this theoretical approach, higher cognition is available only because it relates to the symbolic order, which can be accessed solely through dialogical exchange (Werani, 2011). In other words, from a cultural-historical standpoint, the self-descriptive message precedes the self-descriptive act, as can be observed in infants' egocentric speech or in clinical cases (Morin, 2009; Shengold, 1978). The decision to include this theoretical alternative, although not necessarily to adopt it, enriches the problem's controversy by a significant aspect. The self-descriptive act no longer remains the posited internal cause of all self-description, but is understood equally as cause and effect of the experience, which itself originates from the cultural situation of the dialogue. Allowing for this understanding means that the utility of self-description as a method in empirical psychology is not exclusively determined by access to a merely speculative instance of subjectivity. Understanding individuals' behavior and experience becomes a matter of understanding entirely observable processes. From the cultural-historical perspective, self-description thus advances from a questionable to a more promising concept (Alderson-Day & Fernyhough, 2015). Ultimately, the method's viability is not as simple to judge as cognitive psychology normally assumes, because cognitive psychology does not regard the dialogical approach's premises.
Instead of dismissing these methods automatically within cognitive psychology, the discussion of their anthropological conditions becomes relevant. A critical standpoint towards cognitive psychology's "mechano-representationalist approach" and its two compounds, namely (1) cognition as information processing, as exemplified in the computational theory of mind, and (2) indirect realism, which yet remain predominant as the current epistemological paradigm in experimental psychology, allows us to advocate the methods of self-description, introspection and thinking aloud, with respect to the three depicted problems: the subject-object problem, the problem of methodology and the problem of method. Clearly, the points made above do not entirely dismiss the critique (especially concerns of standardization, reliability and objectivity remain), but they can nullify the apparent banishment of self-description from experimental psychology, since these remaining concerns are generally relevant for all kinds of data sources. As it were, a discursive field of anthropological debate in theoretical psychology can be exploited again that had already been fruitfully tilled in the discipline's history. On this basis, the possibility of eliciting live streaming's viability as a method of empirical psychology comes into being.

What is live streaming?

In its current application, live streaming is not designed to be a paradigm of empirical psychology but to be a medium of communication and entertainment. The streamer decides to connect to a digital host for her content and provides the streamed material for as long as she plans. There are few editorial limitations.
For example, in the case of the hosting platform Twitch.tv, the content is supposed to relate to video games, yet the platform equally allows for different content, such as streamers presenting crafting activities, footage of conventions or art. Evaluation of live streaming's viability therefore requires an initial analysis of its general structure, to enable a basic understanding of the medium's relevant details that can be availed as its empirical design features. In the following section, a description by form and by content will be given. In terms of media linguistics (Schmitz, 2015), live streaming is a transient, current, oral form of communication based on dynamic images. The requisite compounds are the video capture of the streamer by webcam, the video capture of the content by camera or computer screen capturing, the streamer's audio track captured by microphone, the content's audio track captured by microphone or direct computer audio capturing, and (optionally) the written interaction with the audience. Live streaming's most common function of communication is entertainment from the audience's point of view, while the streamer may relate to the stream either professionally or for the purposes of leisure. Its mode of communication is characterized as multimodal, since it integrates different modes such as audio, video and written text. The combination of these modes varies between individual streams: sometimes the webcam image has a prominent appearance, whereas in other cases the focus might lie merely on the content, and the action can also continuously shift between sources. A further interesting aspect of media linguistics is orality. As live streaming is a medium of the internet, its language communication is occasionally self-referential. In linguistics, naïve, direct oral communication which does not reflect upon its own status as language is called primary orality, as can be found in infants before they learn to write and read. Secondary orality, on the other hand, is aware of its own conditions, for example its grammar. In the case of live streaming, media linguistics observes tertiary orality, which, by the means of new media, begins to relieve the boundaries established in secondary orality. For live streaming, this can be stated by noting various characteristics of jargon in oral and written communication. Ultimately, the analysis of media linguistics provides a comparison with more established forms of communication, such as television. Television is a form of communication that is unidirectional, institutionalized, edited and directed towards an audience, while live streaming is bidirectional, decentralized, autonomous and contingently produced for an audience. These aspects not only differentiate the two media, but are also highly relevant for their ecological validity as potential paradigms in empirical psychology. The analysis of media linguistics depicts live streaming as a form of communication with a consistent pattern of action. In its core characteristics, live streaming can be seen as a viable data source for field research, because the general setup includes constant conditions which allow for comparisons both between subjects and within a single subject. Furthermore, the structural similarity to laboratory settings used by experimental psychology reinforces this viability and provides perspectives for the integration of the data. The stream meets the formal criteria for think aloud protocols (Funke & Spering, 2006), making the streamer the research's subject. Moreover, live streaming already implicitly contains constituents that are assessed as desirable for future research in experimental psychology's think aloud protocols, such as observation by webcam (Elling, Lentz, & de Jong, 2012).
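For readers who intend to treat streams as empirical material, the requisite compounds enumerated above can be collected in a simple record structure when archiving recordings for analysis. The following Python sketch is purely illustrative; the class and field names are invented here and do not belong to any existing tool or to the article itself:

```python
from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class StreamRecording:
    """Hypothetical record of the formal compounds of a live stream:
    webcam video, content video, two audio tracks, optional chat log."""
    webcam_video: str                       # capture of the streamer by webcam
    content_video: str                      # camera or screen capture of the content
    streamer_audio: str                     # streamer's microphone track
    content_audio: Optional[str] = None     # direct audio capture of the content
    chat_log: List[str] = field(default_factory=list)  # written audience interaction

    def is_multimodal(self) -> bool:
        # The stream always combines video and audio; once written audience
        # interaction is present, text is added as a third mode.
        return bool(self.chat_log)

rec = StreamRecording(
    webcam_video="webcam.mp4",
    content_video="screen.mp4",
    streamer_audio="mic.wav",
    chat_log=["viewer1: hello!"],
)
print(rec.is_multimodal())
```

Such a record makes explicit which compounds a given archive contains, and thereby which comparisons (e.g., with webcam-supported think aloud protocols) a recording can support.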
Therefore, in its form, live streaming can altogether be presumed to be a reliable and effective source of empirical data, one that is able to deal with the crucial concern of think aloud protocols' ecological validity while being intuitively accessible for comparisons with established self-descriptive laboratory research. Regarding its content, however, live streaming is in theory not restricted to a certain domain of situations for the streamer. The only limitations posed on the stream are bound to the technology's limits (which are de facto less restrictive than those of any laboratory research) and the cultural dynamics that evoke the streamer's decisions (which are a priori no less creative than researchers' designs). Still, certain situations are more favorable for the interests of experimental psychology's research. For example, insofar as the comparison with psychological research into problem solving appears promising, live streams that present scenarios comparable to problem solving behavior or involve dynamic decision making are evidently preferable. In this case, the preference can be matched easily, because the most established and common live streams contain video games, a content that is fairly similar to most problem solving paradigms (Monjelat, Zaballos, & Lacasa, 2012). As this relationship between video games as the content of live streaming and problematic situations as the content of empirical paradigms in experimental psychology shows, live streaming's content contributes effectively to matters of psychology. Various and most typical cases of empirical research deal with games, be it board games such as chess (Aanstoos, 1983; de Groot, 1965; Huizinga, 1949; McGonigal, 2011) or video games (Adachi & Willoughby, 2013; Günzel, 2016; Sturz, Bodily, & Katz, 2009).
By their design, video games present highly standardized situations that can be independently repeated, which qualifies their contents as independent variables. Yet the variety of different video games at the same time creates an opportunity and poses a challenge for their scientific interpretation. The situations in which the streamer can engage depend on the genre, the particular video game and even the play mode she selects. Therefore, the scientific interpretation of these games requires a certain knowledge of their structure before it is sufficient for comparison with established data sources in the cognitive sciences. To facilitate the psychological interpretation, the cognitive sciences employ descriptions of video games drawing on media theory. For example, Günzel (2013, 2014, 2016) offers a terminology and ontology to classify video games in general, and especially the genre of the first-person shooter. He regards the material compounds of the spatial and temporal dimensions within the simulation as well as the subject's perception. By referring to media theory, psychology can decipher the structure of video games so as to understand their setup sufficiently to determine the elements that compare with established experimental paradigms. However, crucial analogies between video games and empirical paradigms are already obvious at first sight. The difference between a digital simulation of chess and the board game itself, as it has been made a topic of psychology, is minimal; equally, there is a significant similarity between card games and digital simulations of card games, even when the latter include additional graphic animations. Even more complex cases, such as strategic or action real-time simulations that integrate the dimension and experience of time into the game, still show manifest consistency with empirical paradigms, which becomes visible when the games are described by their fundamental algorithmic compounds. Clearly, video games offer a great variety of scenarios that can be made the subject of discussion in the cognitive sciences. Psychological research has already utilized this feature of computer simulations (Dörner, Kreuzig, Reither, & Stäudel, 1983; Greiff, Holt, Wüstenberg, Goldhammer, & Funke, 2013).

Live streaming's contribution to empirical psychology

To sum up the arguments gathered about live streaming's empirical potential so far: first, the epistemological context has been outlined. It has been shown that the rejection of self-descriptive methods relies on the presumptions of cognitive psychology, so that a phenomenological perspective on the subject-object problem, the problem of methodology and the problem of method could rule out the formal necessity of this rejection. Following cultural-historical psychology, an alternative theoretical evaluation of self-description based on the primacy of social interaction can be employed that supports the viability of thinking aloud and introspection, which can consequently be applied to the data source of live streaming. Second, the introduction of live streaming's formal compounds allowed the comparison with established forms of empirical observation and their objects; in the case of live streaming, the exemplary content of video games has been outlined. Now, in a third step, live streaming's possible contribution to psychological research shall be sketched out. Overall, besides the formal viability of live streaming as a data source for the empirical sciences, its content, through the example of video games (and beyond it), also bears sufficient similarity to prominent psychological matters.
Moreover, the medium offers further, if contingent, possibilities, such as overcoming the stationary data acquisition of laboratories by portable devices like smartphones that allow streaming in spatially dynamic contexts. Also, despite the ostensible advantage of greater ecological validity, live streams cover extensive amounts of material: the example of the streamer Octavian Morosan (https://www.twitch.tv/nl_kripp) demonstrates coverage of five years of almost daily recordings, with an average of more than six hours of uninterrupted streaming in a fairly standardized setup with sufficiently repeated video game content: about 9,000 hours of recordings. This amount of data is practically impossible to generate in a laboratory. Developing a reliable way to analyze and interpret these data in the context of the above-described debate about self-description opens up various uncharted empirical contents, such as long-term developments in biographies or detailed observations of the lifeworld. Nevertheless, the decisive concern remains how live streaming can serve the purposes of current psychological study. Although apparently an applicable data source and qualitative observation method, it has to contribute to a critical content-related controversy in psychology in order to be a valuable addition to empirical methods. In this regard, the above-mentioned field of problem solving and dynamic decision-making research indicates a direction in which live streaming's material can be of use. Twentieth-century research on problem solving has been based on the approach provided by Newell and Simon (1972), which employs the computational theory of mind. Problems are conceptualized as the relation between an initial state and a goal state, inhibited by barriers.
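This classical conception lends itself to a compact operationalization: a problem is a search through a space of states connected by operators, and a barrier amounts to the absence of an applicable operator leading onward. The following Python sketch is an illustrative reconstruction of that idea under my own assumptions, not code or notation from Newell and Simon (1972); the toy states and operator names are invented:

```python
from collections import deque

def solve(initial, goal, operators):
    """Breadth-first search through a problem space: states are nodes,
    named operators generate successor states (or None when blocked)."""
    frontier = deque([(initial, [])])
    visited = {initial}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path                     # sequence of operator names
        for name, op in operators.items():
            nxt = op(state)
            if nxt is not None and nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [name]))
    return None                             # barrier cannot be overcome

# Toy example: reach the goal state 10 from the initial state 1.
ops = {
    "double": lambda s: s * 2 if s * 2 <= 10 else None,
    "increment": lambda s: s + 1 if s + 1 <= 10 else None,
}
print(solve(1, 10, ops))  # shortest operator sequence from 1 to 10
```

On this formal view, the solver's identity is irrelevant: exactly the point the phenomenological critique discussed below takes issue with, since whether the searcher is a human in distress or a program makes no difference to the state-space description.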
Yet this approach has been criticized both in its foundations in cognitive psychology (Aanstoos, 1983; Radley, 1991; Wertz, 1993) and in its empirical applicability (Funke, 2014; Getzels, 1982; Ohlsson, 2012; Quesada, Kintsch, & Gomez, 2005). A fundamental revision of problem solving therefore bears promising potential. However, this revision demands an essential expansion of the scope of previous reductionist theories. For Newell and Simon (1972), a problem is determined exclusively by its formal relations; it thereby makes no difference, for example, whether it is a problem for a human or a computer, or whether it is solved in existential distress or as a matter of routine. The subjectivity involved in having a problem is not factored in by the computational theory of mind; the subject's perspective is merely embedded in the constellation of particular elements. A phenomenological approach to the notion of the problem, on the other side, would succeed in recovering the totality that is present in the experience of having a problem. Phenomenology can discern between the formal relation of the problem material's elements and the subject's personal situation when facing the problem. Yet these phenomenological thoughts are methodologically insufficiently founded when analyzed in the terms of contemporary psychology: they refer to occurrences that are hardly observed in the laboratory's sterile environment. To support this critical side of the controversy about problem solving and dynamic decision making, new means of empirical methodology are required. Live streaming is a viable and promising candidate for this role, since it transcends the structural limitations of the empirical laboratory.
In this context, Koro-Ljungberg, Douglas, Therriault, Malcolm and McNeill (2013) ask whether think aloud protocols are viable in settings that were not generated by the investigator but are rather determined autonomously by the subject. Adopting a constructivist point of view, they opt for an expansion of traditional think aloud protocols by follow-up interviews to improve the observation of subjects' generation of knowledge, whereas traditional think aloud protocols neglect the plasticity of actual behavior. This perspective applies equally to live streaming: the scope of observation can be expanded because the subjects act autonomously. In this respect, the apparent weakness of absent laboratory control can even be seen as an advantage. Another important perspective for live streaming is the above-mentioned ecological validity. The empirical material's authenticity depends on the design's susceptibility to the subjects' voluntary behavior and on the subjects' consciousness as well as their attitude. Self-descriptive methods are error-prone with respect to these factors, because the explicit awareness of being a test subject may (although by no means must) distort genuine behavioral tendencies, for example through demand characteristics. Live streaming bears a manifest, yet not unmitigated, advantage over laboratory designs, since the streamer's role is not that of a test subject. However, the particular social constellation of exposing oneself to an internet audience still influences behavioral tendencies. Nevertheless, this influence, in contrast to the laboratory's observation, is a natural one that can be compared to the role interests that are always present in human behavior by virtue of its social nature, as analyzed by sociology (Cooley, 1902; Lindesmith & Strauss, 1983). Equally in this social-psychological and sociological context, a further remark about the difference between traditional self-description and live streaming should be highlighted.
As Goffman (1980) has pointed out, the individual social situation is characterized by a framework, independent of how many protagonists engage in it. In the case of laboratory research, this framework is dominated by the instructions and the artificial circumstances. In live streaming, however, the genuine complexity of behavior can be observed, because the circumstances are not manipulated and are therefore plentiful; in other words, the behavior's variance is vastly increased. Certainly, this poses a challenge for quantitative attempts at interpretation, but live streaming's conceptual standardization still allows basic access for approaching the streamers' behavior scientifically within a most naturalistic setting. To demonstrate the potential use of live streaming, reference to an exemplary study in a similar domain is illustrative. Rach and Kirsch (2016) investigated the possibility of modelling human problem solving with data from an online game. They implemented the well-established traveling salesperson problem (TSP) as the underlying structure of a casual online video game in order to obtain behavioral data about problem solving and dynamic decision making. In comparison with classical laboratory examinations, they expected benefits in ecological validity ("create an appealing game experience", p. 416) and in the efficiency of data acquisition. Among their observations, they state certain aspects that indicate noteworthy behavioral accommodations, such as different attitudes towards the gameplay ("just curious" vs. "really ambitious"). These observations reflect a relevant situational difference between laboratory and online settings. However, Rach and Kirsch (2016) also highlight shortcomings of their design that can be compensated by adopting live streaming as an (additional) data source. In their discussion, they mention a low level of controllability, such as "the environment and distraction level" (p. 425), and possible misunderstanding of the instructions as sources of noise in the data. Moreover, they mention certain independent variables, e.g., the time invested in problem solving, which they were not able to interpret reliably, since they had no access to the participants' immediate behavior. All in all, the authors still recommend online games as an experimental method, a conclusion that coincides with the approach at hand. Beyond this general affirmation of the setting, live streaming would be able to compensate for the shortcomings of the mere usage of online games while also inviting more detailed observations and interpretations. Yet the level of detail applied by Rach and Kirsch (2016) to their simulation is very basic and structurally based on the TSP. To access the complexity of actual video games, which opens the horizon of research towards dynamic decision making, an elaborated approach to understanding their structure is required. For instance, the terminology by Günzel mentioned above can serve this purpose in determining the problem space with the same precision that applies to simple problems such as the TSP. Another suitable example of the utility of live streaming research can be found in the social sciences. Reeves, Greiffenhagen, and Laurier (2016) explore the possible insights from the observation of video games, drawing on previous analyses of different gameplay settings, such as playing in a group, as a couple, or playing by oneself but "not alone" (p. 5) online. Their considerations are based on ethnomethodological thinking, an approach that originated in the development of phenomenological sociology and therefore resembles the endeavor at hand.
The authors introduce different perspectives for analyzing gameplay, such as the consideration of communication with other players, "multiactivity" (being "interwoven in other activities", p. 7) or the player's "placement" (p. 10). In one case, the authors highlight that "the player in talking to the spectator formulates (for the spectator, but thereby also for the researchers) many aspects of the game that usually remain implicit, tacit, or unspoken" (p. 16), a perspective that equally nourishes the psychological interest in live streaming. In their interpretations, the authors relate their observations to experiential features, such as "sequentiality" and "situatedness" (p. 21) or the "orderly character of everyday activities" (p. 23). Considering the resemblance to potential live streaming research, it can be outlined how the concept elaborated above may be of use. Just as in the case of Rach and Kirsch (2016), Reeves et al. (2016) state, quoting Sacks, that the application of video material would favor the depth of analysis by enabling one to "start with things that are not currently imaginable, by showing that they happened" (p. 28). Live streaming offers an ideal potential to meet this demand. Moreover, the relationship between live streaming and self-descriptive methodology that has been advocated above can expand the use of live streaming's material beyond the ethnomethodological approach of Reeves et al. (2016), who admit the methodological discrepancy between their work and the cognitive sciences. In other words, the application of psychological research to live streaming as a data source bears promising potential to integrate methods of the social and cognitive sciences by surpassing the limitations of cognitive psychology.
In conclusion, using live streaming as an empirical data source not only circumvents the methodological limitations that traditional self-description faces with regard to ecological validity, but also introduces greater behavioral variance to psychological observation. In order to realize these potentials for experimental psychology, first, the theoretical discourse about self-descriptive methodology ought to be continued. Second, the new data source has to be explored through appropriate designs and established as a method by generating effective and reliable ways of interpreting it, as has been done for introspection and think aloud protocols over the course of psychology's history. It has to be said that this investigation of the empirical potentials of live streaming rests on a theory-laden foundation. The approaches of phenomenology and cultural-historical psychology, as they have been advocated here, are not immune to critique. Consequently, the data source of live streaming might appear ideal to conceptual approaches such as ethnomethodology, as can be said with Bergmann (1991), who notes "that advocates of ethnomethodology and the subsequent conversation analysis rely on audio-visual recordings of natural courses of interaction as primary data material" (p. 89, translation by the author), while it might at the same time encounter skepticism within cognitive psychology. However, this methodological contradiction has to be laid out when reviewing the theoretical controversy in experimental psychology, and it indicates a vivid process of development in the sciences.

Acknowledgements: The author would like to thank the two anonymous reviewers and the supervising editor Andreas Fischer for providing fruitful critique and substantial advice. Further thanks go to Jonathan Griffiths for carefully proofreading the manuscript.
declaration of conflicting interests: the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be constructed as a potential conflict of interest. handling editor: andreas fischer author contributions: the article’s concept derives from the author’s graduation thesis: "die tauglichkeit von live-streaming für die empirische psychologie des problemlösens" (2016), psychology department of the university of heidelberg, supervising professors joachim funke and momme von sydow. supplementary material: no supplementary material available. copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license. citation: wendt, a. n. (2017). the empirical potential of live streaming beyond cognitive psychology. journal of dynamic decision making, 3, 1. doi:10.11588/jddm.2017.1.33724 received: 04 november 2016 accepted: 26 february 2017 published: 08 march 2017 references aanstoos, c. m. (1983). the think aloud method in descriptive research. journal of phenomenological psychology, 14(2), 243266. aanstoos, c. m. (1987). a critique of the computational model of thought: the contribution of merleau-ponty. journal of phenomenological psychology, 18(2), 187-200. adachi, p. j., & willoughby, t. (2013). more than just fun and games: the longitudinal relationships between strategic video games, self-reported problem solving skills, and academic 10.11588/jddm.2017.1.33724 jddm | 2017 | volume 3 | article 1 | 7 https://journals.ub.uni-heidelberg.de/index.php/jddm/10.11588/jddm.2017.1.33724 wendt: the empirical potentials of live streaming grades. journal of youth and adolescence, 42(7), 10411052. doi:10.1007/s10964-013-9913-9 alderson-day, b. & fernyhough, c. (2015). inner speech: development, cognitive functions, phenomenology, and neurobiology. psychological bulletin, 141(5), 931965. doi:10.1037/bul0000021 barasch, a., & berger, j. (2014). 
Original research

Collective risk social dilemma: Role of information availability in achieving cooperation against climate change

Medha Kumar and Varun Dutt
Indian Institute of Technology Mandi, India

Behaviour change via monetary investments is one way of fighting climate change. Prior research has investigated the role of climate-change investments using a collective-risk-social-dilemma (CRSD) game, where players have to collectively reach a target by contributing to a climate fund; failing this, they lose their investments with some probability. However, little is known about how variability in the availability of information about players' investments influences investment decisions in CRSD. In an experiment involving CRSD, 480 participants were randomly assigned to different conditions that differed in the availability of investment information among players. Half of the players possessed a higher starting endowment (rich) compared to the other players (poor). Results revealed that investments against climate change were higher when investment information was available to all players compared to when this information was available only to a few players or to no one.
Similarly, investments were higher among rich players compared to poor players when information was available to all players compared to when it was available only to a few players or to no one. Again, the average investment was significantly greater than the Nash investment when investment information was available to all players compared to when this information was available only to a few players or to no one. We highlight some implications of our laboratory experiment for human decision-making against climate change.

Keywords: collective risk social dilemma, climate fund, information availability, investments, Nash equilibrium

Climate change has been a topic of growing concern for the entire world (Roberts, 2015). Earth's average surface temperature has already risen about 1.8 degrees Fahrenheit (1.0 degree Celsius) since the late 19th century, a change largely driven by increased greenhouse-gas (GHG) emissions into the atmosphere (IPCC, 2015). Amidst increasing temperatures, real-world evidence shows that people continue to take a wait-and-see approach towards climate change (Dutt & Gonzalez, 2012a; Dutt & Gonzalez, 2012b; Ricke & Caldeira, 2014). Monetary investments against climate change, which are one indicator of behaviour change, provide an important way for society to fight climate change (Webb, 2012). Climate negotiations are a means of deciding monetary investments against climate change, and they enable us to reduce society's impact on climate change (Doulton & Brown, 2009; Sterman & Sweeney, 2007; Sterman, 2008). During the negotiation process, there may be lower investments among negotiators. A likely reason for these lower investments could be socio-political or geo-political motivations (Barnett, 2007). For example, the United States pulled out of the recent Paris Agreement and the Green Climate Fund due to certain political motivations (Zhang, Chao, Zheng, & Huang, 2017).
However, another reason for lower investments could be the information asymmetries present among negotiators. Due to information asymmetries, some negotiators may possess untrue or imprecise information about the investments of other negotiators, whereas other negotiators may possess accurate investment information. An extreme form of information asymmetry arises when it becomes difficult to obtain any information on a party's climate actions. For example, in the recent Paris Agreement, there was considerable debate over China's stance of not letting international inspectors access information about its carbon-dioxide emissions (Zhang et al., 2017). An investigation of this extreme form of information asymmetry, where information may be withheld and not known to certain negotiators, is the primary focus of this paper. Prior research has investigated climate negotiations in the laboratory using a collective-risk-social-dilemma (CRSD) game (Milinski, Sommerfeld, Krambeck, Reed, & Marotzke, 2008; Tavoni, Dannenberg, Kallis, & Löschel, 2011). In CRSD, negotiating players are provided initial endowments, and they need to contribute money from their endowments to reach a pre-defined collective goal over several rounds of negotiations. If players fail to reach the collective goal, then climate change could occur with a known probability, and negotiating players lose their leftover endowments completely (Milinski et al., 2008). Understanding negotiations in the CRSD game has been an active area of research (Burton, May, & West, 2013; Tavoni et al., 2011; Milinski, Röhl, & Marotzke, 2011). However, the existing literature involving CRSD has assumed that negotiating players possess complete information about the investments made by opponents (i.e., no information asymmetry was assumed to exist among players), which may not be true in the real world. As discussed above, nations may withhold information about their investments against climate change in the real world (Zhang et al., 2017).
Motivated by this observation, we investigate the influence of such information asymmetries among players on decision-making in the CRSD game in the laboratory. Corresponding author: Medha Kumar, Indian Institute of Technology Mandi, Mandi, India, e-mail: medha751@gmail.com (JDDM | 2019 | Volume 5 | Article 2 | doi:10.11588/jddm.2019.1.57360).

Furthermore, in real-world climate change negotiations, it is likely that income inequalities exist between negotiators (UNO, 2018). For example, some negotiators may belong to low-income nations and others to high-income nations (UNO, 2018). These income-level differences may influence decision-making during negotiations (Burton et al., 2013; Milinski et al., 2011; Dennig, Budolfson, Fleurbaey, Siebert, & Socolow, 2015). Motivated by this literature, in this paper we also investigate how income-level differences among players influence their decisions in CRSD. In what follows, we first discuss prior research involving the CRSD framework. Then, we discuss certain theories of decision-making that motivate our hypotheses concerning information asymmetries and income-level differences. Next, we detail an experiment in which we test our hypotheses in the CRSD game. Finally, we present our results, discuss their theoretical underpinnings, and derive implications of our findings for the real world.

Collective risk social dilemma (CRSD) game

Prior research involving the collective risk social dilemma (CRSD) game has tested the effects of the probability of climate change on investments made by negotiators (Milinski et al., 2008; Tavoni et al., 2011). These studies have revealed that people invest more against climate change when they are convinced that failure to invest will cause grave financial losses (Milinski et al., 2008).
Furthermore, people invest more against climate change in the CRSD game when the probability of experiencing a climate catastrophe is high compared to low (Hagel, Milinski, & Marotzke, 2017; Milinski et al., 2008). Studies have also investigated how individuals behave if a collective target is missed under different risk situations. Results revealed that the assessment of risk arising from missing a collective target reduced contributions; however, risk reduction caused players to maximize their individual contributions (Hagel et al., 2017). Barrett and Dannenberg (2012) showed that when players are presented with a dangerous scenario of rising global temperature in the CRSD game, climate negotiations turn into a coordination game. Research has also revealed that the presence of small groups can help achieve collective goals under stringent conditions (Santos, Vasconcelos, Santos, Neves, & Pacheco, 2012). In addition, prior research has evaluated the effects of inequalities in initial endowments and of players' pledges on investments against climate change in the CRSD game (Tavoni et al., 2011). Results showed that initial endowment inequality made it harder to succeed in the CRSD game; however, players' pledges increased success dramatically (Tavoni et al., 2011). In this paper, we build upon this literature to investigate the effects of information asymmetries and income-level differences among players in CRSD. Thus, in some conditions of the CRSD game, all players possess investment information about other players. In other conditions, either none of the players or only a subset of players possess investment information about other players. In addition, we create income-level differences between players by making some players invest against climate change in the initial rounds in CRSD (poor players), while other players do not invest against climate change (rich players).
We believe that both information asymmetries and income-level differences are likely to influence people's investment decisions in the CRSD game.

Theoretical underpinnings of decision-making in CRSD

A number of theories in the decision-making literature may provide the theoretical underpinnings needed to understand the resulting behaviour in CRSD in the presence of information asymmetries (Gonzalez, Ben-Asher, Martin, & Dutt, 2015; Kumar & Dutt, 2015; Mitchell, 1995; Schultz, Nolan, Cialdini, Goldstein, & Griskevicius, 2007; Voulevi & Van Lange, 2012) and income-level differences (Burton et al., 2013; Dennig et al., 2015; Kahneman & Tversky, 1979; Milinski et al., 2011; Tversky & Kahneman, 1992). These theories may be connected at the cognitive level; however, they may also provide non-overlapping explanations of the resulting behaviour. The influence of information asymmetries on climate change investments may be explained by certain cognitive theories (Gonzalez et al., 2015; Kumar & Dutt, 2015). For example, on account of instance-based learning theory (IBLT; Gonzalez et al., 2015; Kumar & Dutt, 2015), a cognitive theory of decisions from experience, we expect to find lower investments when information asymmetries are present among negotiators compared to when they are absent. That is because, in classical games like the prisoner's dilemma, cognitive models of decision-making based upon IBLT exhibit lower investments when information asymmetries are present compared to when they are absent (Gonzalez et al., 2015). Such models combine not only personal investments but also the investments of other negotiating partners (Gonzalez et al., 2015). When information asymmetries are present, model players may not be able to systematically combine their investments with those of their opponents, and they may be able to maximize only their personal savings and not their public investments.
Furthermore, the influence of information asymmetries on climate change investments may be explained by the theory of social norms (TSN; Schultz et al., 2007; Voulevi & Van Lange, 2012). According to TSN, social norms are a double-edged sword (Schultz et al., 2007; Voulevi & Van Lange, 2012; Dutt, 2011): investments could be higher or lower when players possess information about the investments of others in their group compared to when they lack this information. For example, if opponents invest against climate change, then one expects the availability of this investment information among players to increase the overall investments of the group. However, if opponents do not invest against climate change, then one expects the availability of this information to decrease the overall investments of the group. That is because, according to TSN, people tend to follow others when deciding their own actions (Schultz et al., 2007; Voulevi & Van Lange, 2012). The influence of information asymmetries on climate change investments may also be explained by picture theory (Mitchell, 1995) and by the observation that people are conscious of their public image (Fenigstein, Scheier, & Buss, 1975; Tajfel & Turner, 1979). According to picture theory (Mitchell, 1995), visuals are believed to have great power to influence people's decisions. Also, one's public image may cause people to act differently than their private self would (Fenigstein et al., 1975). Overall, on account of the theories of cognition and social norms, and of picture theory, people are likely to become consistent investors when investment information about others is made available to them.
Thus, we expect H1: higher investments when information about the investments of other players in a group is present compared to when this information is absent.

Furthermore, certain theories may explain the influence of income-level differences between rich and poor players on decision-making during negotiations (Burton et al., 2013; Milinski et al., 2011; Dennig et al., 2015). For example, using laboratory experiments, Milinski et al. (2011) showed that rich players are willing to substitute for missing contributions by the poor, provided the players collectively face intermediate climate targets. Also, Dennig et al. (2015) have demonstrated that poor people are more vulnerable to climate change impacts than rich people. Furthermore, a number of economic theories (Kahneman & Tversky, 1979; Tversky & Kahneman, 1992) and ethical theories (IPCC, 2015; Fleurbaey, 2008; Brown, 2013) may also help explain the effects of income inequality on people's decision-making during negotiations. Due to economic theories on differences in the reference levels of low- and high-income negotiators (Kahneman & Tversky, 1979; Tversky & Kahneman, 1992), as well as ethical theories of responsibility and fairness (IPCC, 2015; Fleurbaey, 2008; Brown, 2013), one expects H2: higher investments from high-income (rich) negotiators compared to low-income (poor) negotiators in the CRSD game. In addition, when investment information is known to all players, we expect rich negotiators to contribute more than poor negotiators on account of the phenomenon of reference dependence in prospect theory (Kahneman & Tversky, 1979; Tversky & Kahneman, 1992). According to reference dependence, in the presence of investment information, those with higher reference levels (or higher incomes) will likely invest more than those with lower reference levels (or lower incomes).
In the presence of investment information, higher-income negotiators may also contribute more than lower-income negotiators due to a feeling of responsibility towards society as well as a societal perception of fairness (Brown, 2013). Overall, we also expect H3: higher investments from rich players compared to poor players when information about the investments of other players in a group is present compared to when this information is absent. Finally, players possessing pro-environmental dispositions have been shown to contribute more against climate change (Burton et al., 2013). Pro-environmental dispositions may measure people's agreement or disagreement with different statements about the environment. Overall, we expect H4: players with greater pro-environmental dispositions to invest higher amounts against climate change compared to players with smaller pro-environmental dispositions. In the next section, we detail an experiment involving CRSD in which we evaluated the hypotheses stated above.

Method

Participants

Students were recruited through an email advertisement for a climate change study at the Indian Institute of Technology Mandi, India. There were 480 participants (54 females; 426 males), who were divided into 80 groups (20 per condition) with 6 participants per group. Participants comprised undergraduate and graduate students in computer engineering, mechanical engineering, electrical engineering, basic sciences, and humanities and social sciences. Ages ranged from 18 to 30 years (M = 20 years; SD = 1.56 years). The groups took 45-50 minutes to finish the study. Participants were paid INR 30 (~USD 0.5) as the base payment for participation. In addition, participants could earn a performance incentive based upon the units left in their private account at the end of the 13th round. The performance incentive was calculated as 1 unit in the private account = INR 0.5 in real money.
On average across all conditions, participants earned 27 units (INR 13) as payment.

Procedure

The experiment comprised three sequential sections: questionnaire; instructions and demographic information; and game play. In the questionnaire section, which preceded the game play section, participants were given survey questionnaires that tested their pro-environmental predisposition (New Ecological Paradigm; Dunlap et al., 2000). In the instructions and demographics section, participants were asked to self-report basic demographic information (such as age, gender, and major) and then to read instructions concerning the study. The instructions were adapted from Tavoni et al. (2011), which formed the basis for our study. In the game play section, participants were asked to play the CRSD game within their group for 13 repeated rounds.

Experimental design

Four hundred and eighty participants were randomly assigned to one of four between-subjects conditions that differed in the amount of information possessed by negotiating players (20 groups per condition): info-all, no-info, info-rich, and info-poor. In each condition, a group of 6 randomly matched players made monetary investments in a climate fund to avert climate change across 13 repeated rounds. All players in a group started with an equal endowment of 52 units in their private account. In each round, participants decided on an investment of 0, 2, or 4 units to put into the climate fund, with a goal of reaching 156 units by the end of the 13th round.

Collective risk social dilemma (CRSD) game

In CRSD, negotiating players are provided initial endowments. Players need to contribute money from their endowments to reach a pre-defined collective goal over several rounds of negotiations.
If players fail to reach the collective goal, then climate change could occur with a known probability, and negotiating players may lose their leftover endowments completely. In order to reach the collective target, players need to make individual sacrifices, with benefits to all but no guarantee that others will also contribute. From the players' point of view, it seems tempting to contribute less so as to save money and induce others to contribute more. Hence, there is a dilemma and a risk of failure (Milinski et al., 2008). Figure 1 shows the investment screen used across all conditions. As shown in Figure 1, the investment screen displayed the current trial number, the total endowment left with the player, a timer, and the different investment options. The timer indicated the time left for players to make their investment decisions. The timer lasted for 30 seconds, but the screen did not switch after the timer expired until players had made their decisions. Players had to select one of the three options to indicate the amount they wanted to invest in climate protection. Once players had selected the amount, they pressed the Next button to proceed to the next round. The first three rounds were automated: the computer randomly made 3 players contribute 4 units per round (poor) and made the remaining 3 players contribute 0 units (rich). The different conditions are described in the next section.

Information availability

In the info-all (no-info) condition, at the end of each round, all players (none of the players) in the group received feedback about the other players' individual investments to the climate fund since the start of the game and in the preceding round. In the info-rich (info-poor) condition, at the end of each round, only the 3 rich (poor) players received feedback about the other players' individual investments to the climate fund since the start of the game and in the preceding round.
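The game mechanics described above (52-unit endowments, 13 rounds, investments of 0, 2, or 4 units, a 156-unit collective target, and a 50% loss probability on failure) can be sketched in a few lines of Python. This is an illustrative simulation only: the scripted strategies and the random seed are our own assumptions, not the experimental software used in the study.

```python
import random

ROUNDS, GROUP_SIZE, ENDOWMENT, TARGET = 13, 6, 52, 156
OPTIONS = (0, 2, 4)          # allowed per-round investments
LOSS_PROBABILITY = 0.5       # chance of climate change if the target is missed

def play_crsd(policies, rng):
    """Simulate one CRSD group; each policy maps (round, balance) -> 0/2/4."""
    balances = [ENDOWMENT] * GROUP_SIZE
    fund = 0
    for rnd in range(ROUNDS):
        for i, policy in enumerate(policies):
            bet = min(policy(rnd, balances[i]), balances[i])  # cannot overspend
            balances[i] -= bet
            fund += bet
    success = fund >= TARGET
    if not success and rng.random() < LOSS_PROBABILITY:
        balances = [0] * GROUP_SIZE   # climate change: leftover endowments lost
    return fund, success, balances

# Example: the first three rounds are automated (3 "poor" players invest 4,
# 3 "rich" players invest 0); afterwards everyone plays the fair split of 2.
def scripted(poor):
    def policy(rnd, balance):
        if rnd < 3:
            return 4 if poor else 0
        return 2
    return policy

rng = random.Random(42)
policies = [scripted(poor=True)] * 3 + [scripted(poor=False)] * 3
fund, success, balances = play_crsd(policies, rng)
```

With these scripted strategies the group exactly reaches the 156-unit target, and the poor players end with smaller private accounts (20 units) than the rich players (32 units), mirroring the endowment asymmetry the design induces.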
Figures 2 and 3 show the feedback screens presented to players in the different conditions of the CRSD game. For example, as shown in Figure 2, in the info-all condition, at the end of a round, all players in the group received feedback about the other players' individual investments to the climate fund since the start of the game and in the preceding round. Players were also given information about the total investment made by their group to the climate fund in the preceding round, along with the total cumulative investment made by their group since the start of the game. In the info-rich condition, the rich players could see the investments made by all other players (see Figure 2), but the poor players could not see the investments made by other players (see Figure 3). Similarly, in the info-poor condition, the poor players could see the investments made by all other players (see Figure 2), but the rich players could not (see Figure 3). Across all conditions, if the collective investment of a group to the climate fund remained less than 156 units, then the group failed to reach the collective goal and climate change occurred with a 50% chance. If climate change occurred, everyone lost the income that they had not invested in the climate fund by the last round.

NEP-R questionnaire

Before playing the CRSD game, participants were given the New Ecological Paradigm-Revised (NEP-R) questionnaire, which tested their pro-environmental predisposition (Dunlap et al., 2000). The NEP-R consists of 15 statements that test people's environmental predisposition on different issues. Among the 15 statements, agreement with eight statements reflects endorsement of the ecological paradigm, and agreement with the remaining seven statements reflects endorsement of the dominant worldview. In addition to the NEP-R questionnaire, participants were given questions that tested their reasoning for making decisions.
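Scoring of the 15 NEP-R items can be sketched as follows. This sketch assumes the common convention that the eight pro-ecological statements are the odd-numbered items and the seven dominant-worldview statements are the even-numbered, reverse-scored items on a 5-point Likert scale; consult Dunlap et al. (2000) for the exact item wording and direction.

```python
# Sketch of NEP-R scoring. Assumption: odd-numbered items are worded
# pro-ecologically; even-numbered items are reverse-scored (Dunlap et al., 2000).
LIKERT_MAX = 5  # 1 = strongly disagree ... 5 = strongly agree

def nep_r_score(ratings):
    """Return a total pro-environmental score from 15 Likert ratings."""
    if len(ratings) != 15:
        raise ValueError("NEP-R has exactly 15 statements")
    total = 0
    for item_number, rating in enumerate(ratings, start=1):
        if item_number % 2 == 1:          # pro-ecological statement: keep as-is
            total += rating
        else:                             # dominant-worldview statement: reverse
            total += (LIKERT_MAX + 1) - rating
    return total

# A respondent who strongly agrees with every pro-ecological item (5) and
# strongly disagrees with every reversed item (1) obtains the maximum of 75.
assert nep_r_score([5, 1] * 7 + [5]) == 75
```

Under this convention, higher totals indicate a stronger pro-environmental predisposition, which is the quantity correlated with cumulative investments in the analyses below.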
For more information on these questions, please refer to the supplementary material.

Nash investment

Nash equilibrium is a term used in game theory to describe an equilibrium in which each player's strategy is optimal given the strategies of all other players (Osborne & Rubinstein, 1994). Thus, a Nash equilibrium is a proposed solution of a non-cooperative game involving two or more players in which each player is assumed to know the equilibrium strategies of the other players, and no player has anything to gain by changing only their own strategy (Osborne & Rubinstein, 1994). In the CRSD game, given 13 rounds, 6 players, and a target of 156 units, a number of Nash equilibria are possible, as the contributions from players in a group could be unequal: some may contribute 0s, some 2s, and others 4s. However, a fair Nash equilibrium in CRSD could be one that is symmetric, i.e., one where all players are assumed to contribute equally and optimally to reach the target investment. The symmetric Nash investment in the CRSD game is 2 units per player per round: when each of the 6 players in a group contributes 2 units per round across 13 rounds, the cumulative investment amounts to 156 units.

Dependent variables and statistical analyses

We used the average cumulative investments across groups in the different information conditions as one of the dependent variables. For each group, the average cumulative investment after a given round was computed by averaging the cumulative investments made by all players in the group up to that round. Figure 1. Investment screen across the different information conditions in the CRSD game.
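The symmetric Nash investment follows from simple arithmetic, which a two-line sketch can verify:

```python
ROUNDS, GROUP_SIZE, TARGET = 13, 6, 156

# The symmetric (fair) Nash investment: every player contributes the same
# amount each round, and the group exactly reaches the collective target.
symmetric_nash = TARGET / (GROUP_SIZE * ROUNDS)
assert symmetric_nash == 2.0   # 2 units per player per round

# Sanity check: 6 players x 2 units x 13 rounds accumulate to the target.
assert GROUP_SIZE * symmetric_nash * ROUNDS == TARGET
```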
the investment screen displayed the endowment from which players could invest 0, 2, or 4 units into climate protection.

figure 2. feedback screen presented to all players in the info-all condition, rich players in the info-rich condition, and poor players in the info-poor condition, respectively.

figure 3. feedback screen presented to all players in the no-info condition, rich players in the info-poor condition, and poor players in the info-rich condition, respectively.

figure 4. success rates and average cumulative investments across different information conditions. success rate and average cumulative investment over 13 rounds by successful and failure groups in avoiding dangerous climate change. the blue section indicates success rates of successful groups, whereas the red section indicates success rates of failure groups. the numbers within each section indicate average cumulative investments. numbers after the "±" symbol indicate the standard deviation (the n/a value in the info-all condition is because only one failure group existed in this condition).

for example, if the first, second, and third players in a group contributed 0, 2, and 4 units, respectively, in the first two rounds in crsd, then after 2 rounds, the cumulative investments of these players would be 0, 4, and 8 units, respectively. thus, the average cumulative investment would be 4 units [= (0 + 4 + 8) / 3]. if a group's cumulative investment was greater than or equal to 156 units at the end of the 13th round, then the group was termed successful; otherwise, the group was termed a failure. success rate was defined as the proportion of groups in a condition whose cumulative investments were greater than or equal to 156 units at the end of the 13th round.
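these dependent-variable computations can be sketched in code. the three-player data below reproduce the worked example from the text; the function names and the 20-group totals are ours, chosen for illustration.

```python
# sketch of the two dependent-variable computations described above,
# using the hypothetical three-player, two-round example from the text
def average_cumulative_investment(per_player_investments):
    """mean of the players' cumulative (summed) investments so far."""
    cumulative = [sum(p) for p in per_player_investments]
    return sum(cumulative) / len(cumulative)

def success_rate(group_totals, target=156):
    """share of groups whose collective investment reached the target."""
    return sum(1 for t in group_totals if t >= target) / len(group_totals)

players = [[0, 0], [2, 2], [4, 4]]  # rounds 1-2: players invest 0, 2, 4 each round
print(average_cumulative_investment(players))  # 4.0, as in the worked example

totals = [160] * 10 + [120] * 10    # 10 of 20 hypothetical groups reach 156
print(success_rate(totals))         # 0.5
```

the same `success_rate` computation, applied per condition, yields the group-level proportions analysed in the results below.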
for example, if 10 groups out of a total of 20 groups in the info-all condition had cumulative investments greater than or equal to 156 units at the end of the 13th round, then the success rate would be 0.50 (= 10 / 20).

results

in order to test our expectations regarding the investments across different conditions and rounds, we performed one-way and mixed-factorial anovas with different dependent measures, and conditions and rounds as the independent measures. we also compared the average investment per player (found by averaging the investments of all players) against the nash investment per player. to test the expectations regarding rich and poor players, we performed one-way anovas with different dependent measures, and rich and poor groups as the independent measure. furthermore, we performed correlation analyses in which we correlated nep-r scores with cumulative investments. all statistical analyses were performed at an alpha level of .05 and a power threshold of 0.8.

success rates and average cumulative investments across successful and failure groups

in order to test hypothesis h1, we performed a one-way anova to evaluate whether success rates were influenced by the different information conditions. information availability had a significant effect on success rates (f(3, 32) = 9.52, p < .05, ηp² = .47). figure 4 shows the success rates by successful and failure groups in avoiding dangerous climate change.

table 1. post-hoc tests for success rates across different information conditions.
success rate in one condition versus the other condition (mean, sd); p:
info-all (0.95, 0.22) > no-info (0.20, 0.41); p < .05
info-all (0.95, 0.22) > info-rich (0.40, 0.50); p < .05
info-all (0.95, 0.22) > info-poor (0.25, 0.44); p < .05
no-info (0.20, 0.41) ∼ info-rich (0.40, 0.50); p = .18
no-info (0.20, 0.41) ∼ info-poor (0.25, 0.44); p = .71
info-rich (0.40, 0.50) ∼ info-poor (0.25, 0.44); p = .32

table 1 shows the post-hoc tests comparing success rates in different conditions. post-hoc tests revealed that success rates were significantly higher in the info-all condition than in the no-info, info-rich, and info-poor conditions. there was no significant difference in success rates between the info-rich and info-poor conditions or between the info-rich and no-info conditions. similarly, success rates were similar in the no-info and info-poor conditions. as per our expectation in h1, these results show that groups had higher success rates when all players possessed information about others' investments than when this information was partially present with only some players in the group or completely absent. success rates were similar when the investment information was possessed by only the rich or only the poor players.

figure 5. average cumulative investments over rounds across different information conditions. average cumulative investment across 13 rounds by successful groups (a) and failure groups (b) in avoiding dangerous climate change. the horizontal line shows the collective goal of 156 units to be achieved by the end of the 13th round in the task.

figure 6. average cumulative investments by rich and poor players. average cumulative investments were calculated across 10 rounds (rounds 4 to 13).

furthermore, we performed a one-way anova to check whether the average cumulative investments were influenced by the different information conditions. figure 4 shows the average cumulative investments by successful and failure groups. information availability had a significant effect on the average cumulative investments for successful groups (f(3, 32) = 9.52, p < .05, ηp² = .47) but not for failure groups (f(3, 40) = 1.80, p = .16, ηp² = .12). table 2 shows the post-hoc tests for average cumulative investments among successful groups. post-hoc tests revealed that the average investment in the info-all condition was significantly higher than in the info-rich, info-poor, and no-info conditions. furthermore, the average cumulative investment in the no-info condition was similar to that in the info-rich and info-poor conditions. there was no significant difference in average cumulative investments between the info-rich and info-poor conditions. as per our expectation in h1, these results show that average cumulative investments were higher when all players possessed information about others' investments than when this information was only partially available to some players. furthermore, average cumulative investments were similar when investment information was available to only the rich or only the poor players.

table 2. post-hoc tests for average cumulative investments for successful groups across different information conditions.
average cumulative investment in one condition versus the other condition (mean, sd); p:
info-all (186.21, 19.10) > no-info (171.00, 8.72); p < .05
info-all (186.21, 19.10) > info-rich (163.75, 9.59); p < .05
info-all (186.21, 19.10) > info-poor (166.00, 6.16); p < .05
no-info (171.00, 8.72) ∼ info-rich (163.75, 9.59); p = .40
no-info (171.00, 8.72) ∼ info-poor (166.00, 6.16); p = .55
info-rich (163.75, 9.59) ∼ info-poor (166.00, 6.16); p = 1.00

average cumulative investments across rounds among successful and failure groups

we investigated the average cumulative investments across rounds among successful and failure groups. we analysed the pattern of average cumulative investments across rounds using one-way repeated-measures anovas (see figures 5a and 5b). average cumulative investments increased over rounds for both successful groups (f(3, 12) = 1461.96, p < .05, ηp² = .98) and failure groups (f(3, 12) = 204.13, p < .05, ηp² = .84). furthermore, we performed mixed-factorial anovas to evaluate whether the average cumulative investments across rounds among successful and failure groups differed between information conditions. anova results revealed that the average cumulative investments across rounds indeed differed between information conditions among both successful groups (f(36, 384) = 6.97, p < .05, ηp² = .40) and failure groups (f(36, 480) = 2.15, p < .05, ηp² = .14). as seen in figure 5a, among successful groups, although investments increased across all conditions, the rate of increase was greater in the info-all condition than in all other conditions. on average, participants reached the goal in 10 rounds in the info-all condition, compared to a higher number of rounds in the other conditions. furthermore, as seen in figure 5b, among failure groups, the rate of increase of average cumulative investment was similar in the info-all, no-info, and info-rich conditions.
however, average cumulative investments were lower in the info-poor condition than in the other conditions. thus, in agreement with h1, the best case for achieving the collective goal was when investment information was present among all players. when groups failed, however, the worst case was when investment information was available to only the poor players.

average cumulative investments among poor and rich players

we expected rich players to invest more against climate change than poor players (h2). we analysed average cumulative investments between rounds 4 and 13 by poor and rich players (see figure 6). in agreement with h2, average cumulative investments for rich players were significantly higher than those for poor players (58.4 > 51.2; f(1, 156) = 7.26, p < .05, ηp² = .04).

average cumulative investments among poor and rich players across different information conditions

we expected information availability to influence the investments of rich and poor players (h3). we performed one-way anovas to investigate whether information availability influenced the average cumulative investments among rich and poor players in the different information conditions. figure 7 shows the average cumulative investments between rounds 4 and 13 by poor players (blue) and rich players (red) across the different information conditions. average cumulative investment was significantly higher among rich players than among poor players in the info-all condition (79.20 > 68.50; f(1, 39) = 4.70, p < .05, ηp² = .11).
however, average cumulative investments for rich and poor players were similar in all other conditions: no-info (49.20 ∼ 47.20; f(1, 39) = 0.17, p = .68, ηp² = .00), info-rich (57.70 ∼ 49.90; f(1, 39) = 2.79, p = .10, ηp² = .07), and info-poor (47.50 ∼ 39.10; f(1, 39) = 1.65, p = .21, ηp² = .04). thus, overall, these results agree with our expectation in h3 that rich players contribute more than poor players when information is available to all players.

figure 7. average cumulative investments by poor (blue) and rich (red) players across different information conditions. average cumulative investment was calculated between the 4th round and the 13th round in the game.

average cumulative investment and nep-r across different information conditions

we expected players' pro-environmental attitudes to influence their investments against climate change (h4). we analysed players' pro-environmental attitudes using the new ecological paradigm-revised (nep-r) scale (dunlap et al., 2000). overall, in agreement with h4, the nep-r score was significantly and positively correlated with cumulative investments across 13 rounds (r(78) = .42, p < .001). correlations between nep-r and cumulative investments were not significant in the info-all condition (r(18) = .30, p = .19), the no-info condition (r(18) = .22, p = .36), or the info-rich condition (r(18) = .42, p = .06). however, this correlation was significant in the info-poor condition (r(18) = .50, p = .03). the correlation between nep-r and cumulative investments was positive and significant for both poor players (r(78) = .31, p = .01) and rich players (r(78) = .27, p = .02). overall, these results agree with our expectation in h4.

deviations of average investment per player from nash predictions

we analysed the deviations in players' investments from their nash predictions between rounds 4 and 13.
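a one-sample comparison of this kind can be sketched as follows. this is a pure-python computation of the t statistic against the symmetric nash benchmark of 2 units; the investment values are hypothetical illustrations, not the study's data.

```python
import math

# one-sample t statistic against the symmetric nash benchmark of 2 units
# (hypothetical per-player average investments, not the study's data)
def one_sample_t(values, mu):
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return (mean - mu) / math.sqrt(var / n)               # df = n - 1

investments = [2.4, 2.2, 2.6, 2.1, 2.5, 2.3]
t = one_sample_t(investments, mu=2.0)
print(round(t, 2))  # positive t: average investment above the nash benchmark
```

a positive t indicates over-investment relative to the symmetric nash prediction (as reported for the info-all condition below), while a negative t indicates under-investment.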
in the info-all condition, the average investment per player was significantly greater than the symmetric nash investment per player (2.35 > 2.00; t(119) = 5.73, p < .05, r = .46). however, in the other conditions, the average investment per player was significantly lower than the symmetric nash prediction: no-info (1.70 < 2.00; t(119) = −4.66, p < .05, r = .39), info-rich (1.84 < 2.00; t(119) = −2.31, p < .05, r = .21), and info-poor (1.57 < 2.00; t(119) = −5.73, p < .05, r = .46). the average investment per player was significantly lower than the symmetric nash investment per player for both rich players (1.95 < 2.00; t(239) = −.91, p < .05, r = .06) and poor players (1.70 < 2.00; t(239) = −5.14, p < .05, r = .31).

discussion and conclusion

in today's world, climate change is a pressing problem, and behaviour change is critically needed to fight it (webb, 2012). monetary investments against climate change are important indicators of the needed behaviour change (doulton & brown, 2009; sterman & sweeney, 2007; sterman, 2008). our results revealed that possessing information about the investments of other players produced higher investments against climate change and higher success rates among successful groups (h1). investments and success rates were similar when the investment information was possessed by only a subset of players (only the rich or only the poor). also, the contributions by rich players were greater than those by poor players when investment information was present among all players (h2). furthermore, nep-r scores were positively correlated with people's investments against climate change (h4). a likely explanation for higher investments when information was present among all players comes from the theory of social norms (tsn; schultz et al., 2007).
as per tsn, peer pressure plays a significant role in driving monetary investments towards climate change: people are willing to contribute when they can see others contribute. the influence of information asymmetries on climate change investments may also be explained by picture theory (mitchell, 1995) and by people's consciousness of their public image (fenigstein et al., 1975; tajfel & turner, 1979). according to picture theory (mitchell, 1995), visuals have great power to influence people's decisions. thus, when people can visualize the investment information of other players during feedback, this visualization may cause them to invest more against climate change. also, one's public image may cause people to act differently from their private self (fenigstein et al., 1975). in general, players may not want to be portrayed publicly as contributing less, as that is likely to hurt their public image. overall, players may tend to invest in ways that reduce the possibility of hurting their public image. yet another reason for higher investments in the presence of information could be learning from the investment outcomes of other players (gonzalez et al., 2015; kumar & dutt, 2015). as per instance-based learning theory (iblt), players maximize investments when they are able to combine their own investment outcomes with the investment outcomes of other players (gonzalez et al., 2015). players are likely able to activate investment instances in their memory when they observe the contributions of other players. when information is present among all players, the activation of instances is relatively easy, and this activation may cause people to invest significantly more.
interestingly, almost all groups were successful when investment information was available to all players. this result contrasts with those of tavoni et al. (2011) and milinski et al. (2008), where only 20% and 10% of the groups, respectively, were successful when information was present among all players. although we can only speculate about the reasons, one likely explanation is that this study was run in india, whereas the studies of tavoni et al. (2011) and milinski et al. (2008) were run in the european union (eu) with a different population. recent research has shown that people in developing countries (like india) perceive climate change as a much greater threat to themselves and their families than do respondents in developed countries (in the eu; lee et al., 2015). perhaps the feeling of threat from climate change made our participants contribute more against climate change. furthermore, we found that rich players' investments were higher than poor players' investments. this result can be explained on the basis of reference-level dependence as part of prospect theory (pt; kahneman & tversky, 1979; tversky & kahneman, 1992). according to pt, poor players' smaller incomes likely pushed their reference levels lower than rich players' reference levels. a higher reference level may cause rich players to invest more than poor players. another likely reason for rich players contributing more than poor players lies in ethical theories of responsibility and fairness (ipcc, 2015; fleurbaey, 2008; brown, 2013). the higher income levels of rich players may give them a feeling of responsibility towards reducing climate change. also, the societal perception that rich players should contribute more portrays them as fair (fleurbaey, 2008; brown, 2013).
in this paper, we used the collective-risk social dilemma (crsd) framework (burton-chellew et al., 2013; dannenberg et al., 2015; hagel et al., 2017; jacquet et al., 2013; milinski, hilbe, semmann, sommerfeld, & marotzke, 2016; milinski et al., 2008; tavoni et al., 2011) in a laboratory setting, and our results regarding negotiations against climate change should be seen with this limitation in mind. our experimental design in this preliminary study was canonical, and the situation in which investment information is withheld from other players may be less common in the real world. in real-world negotiations, information about investments is likely shared among negotiators; however, this information may not be true. thus, we plan to undertake future studies in which we vary the truthfulness of investment information while people invest against climate change. our lab-based findings are promising for negotiations against climate change. overall, investments are likely to be higher when investment information is shared among all negotiating players. in the real world, people are most likely to possess investment information about their opponents. in such situations, based upon our results, we expect investments against climate change to be maximized. in addition, real-world negotiations are likely to include negotiators from nations with both higher and lower income levels. based upon our findings, the news is again promising: we expect that in a mixed income-level environment, higher-income negotiators will contribute more than lower-income negotiators. in fact, higher-income negotiators are expected to be closer to their optimal nash investment levels. also, we found that pro-environmental attitudes were positively correlated with investments. thus, for real-world negotiations, investments are likely to be higher if negotiators possess pro-environmental attitudes.
thus, choosing negotiators with pro-environmental attitudes may be key to the success of climate negotiations. overall, our results revealed that information asymmetry is an important factor impacting investments against climate change. however, several other factors are also likely to influence investments and negotiations against climate change. for example, penalties for those contributing less are likely to increase people's investments. one way to increase investments could be to make under-investment costly, i.e., by giving monetary penalties to players who invest little compared to those who do not show this behaviour. another way to increase investments could be to reward people's contributory behaviours (i.e., rewarding those who invest more). a third way could be to combine both: rewarding those who invest against climate change and penalizing those who do not. we plan to pursue some of these ideas as part of our future work involving crsd. another factor likely to influence investments against climate change is the presence or absence of income disparity among players. in this paper, we did not vary this factor, as income disparity was present among players across all information conditions. however, as part of our future research, we plan to systematically vary income disparity across information conditions to understand the interaction of these factors. in this paper, we adapted instructions from tavoni et al. (2011) and used them across all information conditions. however, the instructions provided to participants may influence their investment decisions in certain ways (zizzo, 2010).
thus, as part of our future research, we plan to frame instructions in different ways to evaluate their influence on investments against climate change in conditions involving information asymmetry. some of these ideas form the immediate next steps in our research program involving negotiations against climate change.

acknowledgements: this research was partially supported by the indian institute of technology mandi and a seed grant (iitm/sg/vd/32), which provided the necessary computational and financial resources for this work.

declaration of conflicting interests: the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

author contributions: medha kumar was the research lead who designed the experiment and carried out data collection for this work. varun dutt was the principal investigator who served as a constant guiding light for this work.

supplementary material: supplementary material available online.

copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license.

citation: kumar, m. & dutt, v. (2019). collective risk social dilemma: role of information availability in achieving cooperation against climate change. journal of dynamic decision making, 5, 2. doi:10.11588/jddm.2019.1.57360

received: 03 dec 2018. accepted: 03 may 2019. published: 18 may 2019.

references

barnett, j. (2007). the geopolitics of climate change. geography compass, 1(6), 1361–1375. doi:10.1111/j.1749-8198.2007.00066.x

barrett, s., & dannenberg, a. (2012). climate negotiations under scientific uncertainty. proceedings of the national academy of sciences, 109(43), 17372–17376. doi:10.1073/pnas.1208417109

brown, d. a. (2013). climate change ethics. london: routledge.

burton-chellew, m. n., may, r. m., & west, s. a. (2013). combined inequality in wealth and risk leads to disaster in the climate change game.
climatic change, 120(4), 815–830. doi:10.1007/s10584-013-0856-7

dannenberg, a., löschel, a., paolacci, g., reif, c., & tavoni, a. (2015). on the provision of public goods with probabilistic and ambiguous thresholds. environmental and resource economics, 61(3), 365–383. doi:10.1007/s10640-014-9796-6

dennig, f., budolfson, m. b., fleurbaey, m., siebert, a., & socolow, r. h. (2015). inequality, climate impacts on the future poor, and carbon prices. proceedings of the national academy of sciences, 112(52), 15827–15832. doi:10.1073/pnas.1513967112

doulton, h., & brown, k. (2009). ten years to prevent catastrophe?: discourses of climate change and international development in the uk press. global environmental change, 19(2), 191–202. doi:10.1016/j.gloenvcha.2008.10.004

dunlap, r. e., van liere, k. d., mertig, a. g., & jones, r. e. (2000). new trends in measuring environmental attitudes: measuring endorsement of the new ecological paradigm: a revised nep scale. journal of social issues, 56(3), 425–442. doi:10.1111/0022-4537.00176

dutt, v. (2011). why do we want to defer actions on climate change? a psychological perspective (doctoral dissertation), carnegie mellon university. retrieved from https://kilthub.cmu.edu/articles/why_do_we_want_to_defer_actions_on_climate_change_a_psychological_perspective/6724268/1

dutt, v., & gonzalez, c. (2012a). decisions from experience reduce misconceptions about climate change. journal of environmental psychology, 32(1), 19–29. doi:10.1016/j.jenvp.2011.10.003

dutt, v., & gonzalez, c. (2012b). human control of climate change. climatic change, 111(3–4), 497–518. doi:10.1007/s10584-011-0202-x

fenigstein, a., scheier, m. f., & buss, a. h. (1975). public and private self-consciousness: assessment and theory. journal of consulting and clinical psychology, 43(4), 522–527. doi:10.1037/h0076760

fleurbaey, m. (2008). fairness, responsibility, and welfare. oxford university press.

gonzalez, c., ben-asher, n., martin, j. m., & dutt, v. (2015).
a cognitive model of dynamic cooperation with varied interdependency information. cognitive science, 39(3), 457–495. doi:10.1111/cogs.12170

hagel, k., milinski, m., & marotzke, j. (2017). the level of climate-change mitigation depends on how humans assess the risk arising from missing the 2°c target. palgrave communications, 3, 17027. doi:10.1057/palcomms.2017.27

ipcc. (2015). climate change 2014: mitigation of climate change (vol. 3). cambridge university press.

jacquet, j., hagel, k., hauert, c., marotzke, j., röhl, t., & milinski, m. (2013). intra- and intergenerational discounting in the climate game. nature climate change, 3(12), 1025–1028. doi:10.1038/nclimate2024

kahneman, d., & tversky, a. (1979). prospect theory: an analysis of decision under risk. econometrica, 263–291. doi:10.2307/1914185

kumar, m., & dutt, v.
(2015). understanding cooperative behavior against climate change through a public goods game. climate change, 1(2), 68–71.

lee, t. m., markowitz, e. m., howe, p. d., ko, c. y., & leiserowitz, a. a. (2015). predictors of public climate change awareness and risk perception around the world. nature climate change, 5(11), 1014–1020. doi:10.1038/nclimate2728

milinski, m., hilbe, c., semmann, d., sommerfeld, r., & marotzke, j. (2016). humans choose representatives who enforce cooperation in social dilemmas through extortion. nature communications, 7, 10915. doi:10.1038/ncomms10915

milinski, m., röhl, t., & marotzke, j. (2011). cooperative interaction of rich and poor can be catalyzed by intermediate climate targets. climatic change, 109(3–4), 807–814. doi:10.1007/s10584-011-0319-y

milinski, m., sommerfeld, r. d., krambeck, h. j., reed, f. a., & marotzke, j. (2008). the collective-risk social dilemma and the prevention of simulated dangerous climate change. proceedings of the national academy of sciences, 105(7), 2291–2294. doi:10.1073/pnas.0709546105

mitchell, w. t. (1995). picture theory: essays on verbal and visual representation. university of chicago press.

nash, j. (2016). the essential john nash. princeton university press.

osborne, m. j., & rubinstein, a. (1994). a course in game theory (p. 15). cambridge, mass.: mit press.

ricke, k. l., & caldeira, k. (2014). natural climate variability and future climate policy. nature climate change, 4(5), 333–338. doi:10.1038/nclimate2186

roberts, d. (2015, may 15). the awful truth about climate change no one wants to admit. retrieved from https://www.vox.com/2015/5/15/8612113/truth-climate-change

santos, f. c., vasconcelos, v. v., santos, m. d., neves, p. n. b., & pacheco, j. m. (2012). evolutionary dynamics of climate change under collective-risk dilemmas. mathematical models and methods in applied sciences, 22(supp01), 1140004. doi:10.1142/s0218202511400045

schultz, p. w., nolan, j. m., cialdini, r. b., goldstein, n.
j., & griskevicius, v. (2007). the constructive, destructive and reconstructive power of social norms. psychological science, 18(5), 429–434. doi:10.1111/j.1467-9280.2007.01917.x

sterman, j. d. (2008). risk communication on climate: mental models and mass balance. science, 322(5901), 532–533. doi:10.1126/science.1162574

sterman, j. d., & sweeney, l. b. (2007). understanding public complacency about climate change: adults' mental models of climate change violate conservation of matter. climatic change, 80(3–4), 213–238. doi:10.1007/s10584-006-9107-5

tajfel, h., & turner, j. c. (1979). an integrative theory of intergroup conflict. organizational identity: a reader, 56–65.

tavoni, a., dannenberg, a., kallis, g., & löschel, a. (2011). inequality, communication, and the avoidance of disastrous climate change in a public goods game. proceedings of the national academy of sciences, 108(29), 11825–11829. doi:10.1073/pnas.1102493108

tversky, a., & kahneman, d. (1992). advances in prospect theory: cumulative representation of uncertainty. journal of risk and uncertainty, 5(4), 297–323. doi:10.1007/bf00122574

uno (2018, september 26). climate change, economic inequality, systemic bias among issues underlined by world leaders as general assembly continues debate. retrieved from https://www.un.org/press/en/2018/ga12064.doc.html

vuolevi, j. h., & van lange, p. a. (2012). boundaries of reciprocity: incompleteness of information undermines cooperation. acta psychologica, 141(1), 67–72. doi:10.1016/j.actpsy.2012.07.004

webb, j. (2012). climate change and society: the chimera of behaviour change technologies. sociology, 46(1), 109–125. doi:10.1177/0038038511419196

zhang, y. x., chao, q. c., zheng, q. h., & huang, l. (2017). the withdrawal of the us from the paris agreement and its impact on global climate change governance. advances in climate change research, 8(4), 213–219. doi:10.1016/j.accre.2017.08.005

zizzo, d. j. (2010). experimenter demand effects in economic experiments.
Original Research

System Structure and Cognitive Ability as Predictors of Performance in Dynamic System Control Tasks

Jan Hundertmark, Daniel V. Holt, Andreas Fischer, Nadia Said, and Helen Fischer
Department of Psychology, Heidelberg University, Heidelberg, Germany

In dynamic system control, cognitive mechanisms and abilities underlying performance may vary depending on the nature of the task. We therefore investigated the effects of system structure and its interaction with cognitive abilities on system control performance. A sample of 127 university students completed a series of different system control tasks that were manipulated in terms of system size and recurrent feedback, either with or without a cognitive load manipulation.
Cognitive abilities assessed included reasoning ability, working memory capacity, and cognitive reflection. System size and recurrent feedback affected overall performance as expected. Overall, the results support that cognitive ability is a good predictor of performance in dynamic system control tasks, but predictiveness is reduced when the system structure contains recurrent feedback. We discuss this finding from a cognitive processing perspective as well as its implications for individual differences research in dynamic systems.

Keywords: dynamic system control, complex problem solving, reasoning ability, working memory, cognitive reflection

It is a central question in problem solving and decision-making research which task properties and situational factors determine the difficulty of a problem and how these demands interact with the abilities of a problem solver. On the most general level, intelligence is useful for many types of problems, and indeed, problem solving ability is often considered a defining aspect of general intelligence (e.g., Sternberg, 1982). However, while in some problem domains the value of cognitive abilities is well established, in other domains they do not help much and occasionally even have adverse effects (e.g., Wiley & Jarosz, 2012). In dynamic system control paradigms intelligence has generally been shown to be beneficial (Stadler, Becker, Gödker, Leutner, & Greiff, 2015), but it is still largely an open question in which way different aspects of dynamic systems (e.g., the number of variables or the types of functional relations) contribute to problem difficulty, and why some dynamic systems show high correlations with cognitive abilities while others do not. We therefore investigated the main effects of two characteristics of dynamic systems, system size and the presence of oscillatory eigendynamics, and how they moderate the influence of cognitive abilities on control performance. Additionally, we assessed the effects of cognitive load.
Taken together, we cover three groups of determinants of performance in dynamic system control tasks (as classified by Funke, 1991): (a) system characteristics, (b) personal factors, and (c) context factors. Systematically combining this range of factors in a single study allowed us to analyze their interaction, in particular how system characteristics moderate the effect of cognitive abilities and context factors in determining task performance. To investigate these questions, we employed a computer-simulated microworld paradigm. In microworld tasks participants interact with computer-simulated dynamic systems of varying size and complexity (Kluge, 2008). Systems are usually presented with a semantic framing such as managing a business, operating a complex machine, or carrying out chemistry experiments. The semantic framing may or may not give cues about the internal structure of the system. The task goal usually consists of exploring and successfully controlling the system to reach a target state. Systems used in research vary widely in terms of complexity, realism, and the prior knowledge required for successful control. The core idea of the microworld paradigm is to mimic essential characteristics of real-world dynamic systems in a controlled laboratory environment (Brehmer & Dörner, 1993; Gray, 2002).

System characteristics

Early research on semantic aspects of complex problem solving investigated the extent to which prior knowledge could be applied to a given problem. This line of research demonstrated that misleading semantics are a major impediment to successful system control (Beckmann, 1994) and that prior knowledge accounts for a large proportion of performance in some common microworld tasks (Wittmann & Süß, 1999).
Driven by the desire to create psychometrically reliable assessment procedures, a more recent wave of research introduced semantically lean systems with highly reduced complexity, an approach termed "minimal complex systems" (e.g., Greiff, Wüstenberg, & Funke, 2012). It emphasizes formal aspects of problem difficulty by describing systems in a linear structural equation framework. The main determinant of difficulty is assumed to be the number of variables and system relations. Studies using this approach report item difficulties roughly corresponding to this construction principle (e.g., Greiff et al., 2012; Wüstenberg, Greiff, & Funke, 2012), but the relation between specific system characteristics and difficulty is usually not analyzed in detail. Building on Berry and Broadbent's (1984, 1987, 1988) seminal sugar factory and person interaction tasks, which are conceptually similar to minimal complex systems (cf. Fischer et al., 2015), we focused on the formal system characteristics of system size and the presence or absence of oscillatory eigendynamics (OED).

Corresponding author: Jan Hundertmark, Center for Psychosocial Medicine, Heidelberg University Hospital, Im Neuenheimer Feld 672, 69120 Heidelberg, Germany. E-mail: jan.hundertmark@med.uni-heidelberg.de

10.11588/jddm.2015.1.26416 | JDDM | 2015 | Volume 1 | Article 5

While system size may seem an obvious determinant of difficulty, surprisingly few studies have systematically investigated its effect in a controlled experimental design (e.g., Funke, 1985). Although Berry and Broadbent used small and large systems (e.g., 1984, 1987), they never compared the difficulty of these size variations in the same study. We operationalize system size as the number of variables and relations within a system.
We expect large systems to be more difficult, as the increased number of target variables and relations makes system exploration and control cognitively more demanding. Dynamic change over time is another crucial property of complex problems (Dörner, 1980, 1983). One frequently encountered type of dynamics in system control tasks is a form of recurrent feedback termed eigendynamics, in which an output variable feeds back on itself. The feedback can be implemented either with a constant positive multiplier, leading to exponential growth or decay, or with a negative sign on the feedback term. The latter may result in an oscillatory pattern, with the output variable autonomously jumping between two values from one turn to the next. The underlying equation is still linear, although the system's behavior is not. In the present study, we applied the same OED as Berry and Broadbent (1984): we either included or excluded system relations with an output variable negatively feeding back on itself, in the form y(t+1) = 2 × x(t) − y(t), where y(t+1) is the new output, x(t) the input given by the participant, and y(t) the previous trial's output. OED are common in many real-world scenarios containing negative feedback mechanisms, e.g., predator-prey systems or economic boom-and-bust cycles. Using a cold store control scenario, Dörner (1996) and Güss (2010) have shown that systems with oscillatory behavior caused by negative feedback are indeed difficult to control, possibly due to the limited utility of simple exploration strategies such as the systematic variation of isolated variables to discover contingencies (e.g., Chen & Klahr, 1999). Oscillation due to negative feedback may be more difficult to discern than simple time-based oscillation, e.g., one based on a sine function, as it can be irregular and change with different inputs. We therefore expect a main effect of OED on task difficulty.
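To make the contrast concrete, here is a minimal simulation sketch (illustrative only, not the authors' code) of the two 1 × 1 system types, with the random noise term omitted for clarity:

```python
# Sketch of the 1x1 systems used in the study (cf. Berry & Broadbent, 1984).
# The noise term r is omitted so the characteristic dynamics are visible.

def step_sta(x, y_prev):
    # Stable 1x1 system: y = x - 2 (plus noise in the actual task)
    return x - 2

def step_oed(x, y_prev):
    # Oscillatory eigendynamics: y(t+1) = 2 * x(t) - y(t)
    return 2 * x - y_prev

def simulate(step, x, y0, turns):
    """Apply a constant input x for a number of turns; return the outputs."""
    y, trace = y0, []
    for _ in range(turns):
        y = step(x, y)
        trace.append(y)
    return trace

# Constant input x = 5, starting output y0 = 4
print(simulate(step_sta, 5, 4, 6))  # [3, 3, 3, 3, 3, 3]
print(simulate(step_oed, 5, 4, 6))  # [6, 4, 6, 4, 6, 4]
```

Under a constant input the stable system settles immediately, whereas the OED output keeps alternating around its equilibrium (here y = x = 5), so a participant must readjust the input on every turn to hit a target exactly.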
As the structure of systems containing OED is apparently difficult to discern and verbalize, they have been labeled "non-salient" by Berry and Broadbent (1988). This term stems from implicit learning research, which postulates two distinct learning systems (e.g., Berry & Broadbent, 1988, 1995; Reber, 1989; Sun, Slusarz, & Terry, 2005): an explicit system responsible for forming a conceptual representation and an implicit system that stores events and contingencies in the form of subsymbolic associative links. In this approach's language, "salient" relations are amenable to explicit, analytic reasoning, while implicit, automatic learning processes are better suited for acquiring knowledge about "non-salient" relations. What makes system features more or less salient may depend on a range of factors, such as whether they have an immediate effect or are time-delayed, whether random noise makes the system more intransparent, or to what extent the system structure matches participants' expectations (see Funke, 2003, for an overview). OED have been used as one paradigmatic manipulation to reduce a system's salience (e.g., Berry & Broadbent, 1984). As the meaning of "salience" is only loosely specified, we focus on the specific system characteristic of OED.

Cognitive abilities

Personal factors relevant for dynamic system control may include a broad range of characteristics, from cognitive ability to motivation and personality (Funke, 1991). Here, we investigate the aspect of cognitive abilities. While initially the evidence was mixed (Stadler et al., 2015), by now it can be considered a well-established finding that intelligence (often operationalized as reasoning ability) is a good predictor of performance for many dynamic system control tasks. In a recent meta-analysis, Stadler et al. (2015) report a mean effect size of Hedges' g = .43 for the relation of intelligence and performance in a set of 62 studies.
However, except for the attenuation of effect sizes due to measurement error, little is known about moderating factors and boundary conditions of this relation (Stadler et al., 2015). We expect that systems including OED are not only harder to control but also that reasoning ability is less predictive of performance in this case. This may seem counter-intuitive, as superior intelligence and reasoning ability are generally associated with excelling at difficult tasks. However, reasoning does not operate in a vacuum; it adds value to existing knowledge by transforming and recombining it according to the rules of logic. Therefore, without explicit knowledge about the problem at hand, reasoning processes lack the "raw material" to operate on (Goode & Beckmann, 2010). If we combine this insight with the observation by Berry and Broadbent (1984) that OED restrict the amount of explicit system knowledge acquired, it follows that reasoning cannot unfold its full potential in this case. This interpretation is in line with the Elshout-Raaheim hypothesis, according to which the utility of reasoning may be limited by the amount of knowledge available (Leutner, 2002). Studies in which explicit information about system structure is provided consistently found that reasoning ability and control performance are correlated (e.g., Putz-Osterloh & Lüer, 1981; Kröner, Plass, & Leutner, 2005; Wüstenberg et al., 2012). However, the most convincing line of evidence for the moderating effect of structural knowledge stems from Goode and Beckmann (2010; also Goode, 2011). In these studies, the amount of structural knowledge available to participants was experimentally manipulated. Goode and Beckmann (2010) observed a notable difference in the correlation of intelligence and control performance depending on the amount of information provided. Due to a relatively small sample in combination with a conservative analysis strategy, this difference was not statistically significant.
In a later study using a larger sample, the pattern of correlations was replicated and clearly reached statistical significance (Goode, 2011). System size, in contrast, should not play a major role for the effects of reasoning, provided that structural system knowledge can be acquired. Again, this is supported by the results reported in Goode (2011), as modifying system complexity by adding variables and relations did not result in an interaction of intelligence and complexity for predicting performance. Larger systems may be more difficult to control, but the cognitive processes required do not fundamentally differ from those required for controlling smaller systems. We therefore expect no effect of system size on the predictiveness of reasoning for control performance. The validity of this analysis is of course contingent on the absence of artificial restrictions by ceiling or floor effects, but there were no indications of such restrictions in Goode and Beckmann (2010) or Goode (2011). We further included cognitive reflection (Frederick, 2005) in our study, due to its good predictiveness for various judgment and decision making tasks (e.g., Toplak, West, & Stanovich, 2014; Weber & Johnson, 2009). As cognitive reflection is a reasoning-related disposition, we expect a pattern similar to reasoning ability, i.e., a main effect on control performance and interactions with OED. Additionally, we investigated the effects of working memory on control performance. Although reasoning and working memory are highly correlated, we expect that the predictors are not completely exchangeable. Gonzalez, Thomas, and Vanyukov (2005) found that both constructs were good predictors of performance in the "water purification plant" scenario and showed statistically separable unique contributions to performance.
However, we expect the effect of working memory on performance to be moderated less by OED and more by system size and by concurrent dual tasking (see below).

Context factors

Context factors relate neither to the structure or semantics of the system to be controlled, nor directly to characteristics of the person working on the task (cf. Funke, 1991). They can, for example, include to what extent additional information about the system is provided (e.g., causal relation diagrams) or the goals given to participants (e.g., understanding system structure versus reaching given control goals). In the present study, we investigated the effect of concurrent cognitive load on task performance as a relevant context factor. To this end, we introduced a dual task manipulation using a concurrent 2-back working memory task (cf. Kirchner, 1958). A comparable manipulation using a random letter generation task has previously been used with variants of the person interaction task by Hayes and Broadbent (1988). They hypothesized that dual tasking should interfere with the working-memory-intensive selective processing in the salient condition more strongly than with the nearly automatic unselective learning process in the non-salient condition. Contrary to expectations, Hayes and Broadbent did not find such a selective impairment of learning in the "salient" condition under dual tasking, although response times were slowed down significantly. Dual tasking only had an effect when learned responses had to be adapted for transfer to a modified second task. The authors suggest that the secondary task might not have been demanding enough to impair performance in the original system control task. However, another possibility is the very small sample (n = 18), resulting in low statistical power. For more robust evidence on this question, we included such a dual task manipulation in the present study.
We follow Hayes and Broadbent's original hypothesis and expect dual-task conditions not only to be more difficult, but also to specifically impair the selective learning processes necessary to successfully control the stable, non-oscillatory systems.

Summary

In this comprehensive study we aim to analyze three types of performance determinants in dynamic system control and their interactions. First, we quantify the relative effect of system size and oscillatory eigendynamics (OED) system relations on control performance. Second, we analyze the predictive validity of reasoning ability and working memory capacity for control performance, particularly the interaction of these predictors with system size and the presence of OED. Third, we study the effect of a cognitive load manipulation on control performance, again with a view toward its interactions with system characteristics.

Method

Participants

One hundred and twenty-eight university students volunteered to participate in the study. One participant did not complete the system control tasks and was excluded from analysis. Of the remaining participants, 103 were female; age ranged from 18 to 35 years with a median of 21 years; all were native German speakers. The experiment took about 90 minutes on average. Participants received either €12 or course credit as compensation. For multivariate and repeated-measures analyses, missing values were imputed using the expectation maximization procedure (2.7% of the data for the system control tasks).

Design

Each participant completed eight dynamic system control tasks and several tests of cognitive ability. In the system control tasks, three experimental factors were manipulated within-subjects (two levels each: system size, presence of OED, cognitive load) in a fully crossed design. Serial order of conditions was balanced using a Latin square design. The cognitive load manipulation was applied block-wise, i.e., either to tasks one to four or to tasks five to eight.
As an exploratory intervention, we gave half of the participants a brief instruction encouraging either explicit, rule-based exploration or an intuitive strategy. Cognitive abilities measured included working memory, cognitive reflection, and reasoning. Using a within-subjects design with 127 participants yields 97% power to detect medium-sized effects at α = .05 (according to Cohen, 1988).

Materials

We designed four different basic types of dynamic system control scenarios in two parallel versions, for a total of eight tasks. All scenarios were semantically framed as experiments in a biology laboratory where different substances (input variables) with fictitious labels, e.g., "Dilarin" or "Berophal", could be added to cell cultures to produce different cell characteristics (output variables), e.g., nutrient requirement or temperature sensitivity (see Figure 1). The scenarios were turn-based, i.e., participants first changed the value of input variables using increment and decrement buttons (12 steps per variable) and then clicked a button to proceed to the next turn. The value of input variables remained stable unless manipulated by the participant; the value of output variables was determined by a set of simple linear equations (cf. Funke, 2001) with a small random component (see Table 1 for the equations). Values of system variables were capped at predefined minimum/maximum values to prevent participants from maneuvering systems into irrecoverable states. Each scenario consisted of an exploration phase of 1.5 minutes followed by two control phases with different target values for 20 turns (or at most 2 minutes). Successful system control required participants to first experiment with different input values and their effects on the output variables during the exploration phase.
In the subsequent control phases, they had to apply their knowledge and manipulate the input variables to reach given target values.

Figure 1. Task environment for a 2 × 2 mixed system (STA/OED, see Table 1) during the exploration phase with dual tasking.

System size and the presence of OED were experimentally manipulated with two levels each. System size was either small, with one input and one output variable (1 × 1 systems), or large, with two input and two output variables (2 × 2 systems). The OED factor was manipulated by either excluding or including OED in the system (cf. Berry & Broadbent, 1984, 1988). In the large systems, the OED was implemented for one of the output variables only. We refer to output variables excluding OED as stable (STA), because their values remain constant without the participant's intervention (except for a small random term). Taken together, the factors size and OED resulted in four basic system types, for each of which two parallel versions were constructed using different labels and numerical ranges (see Table 1). The structure of the small OED system was identical to Berry and Broadbent's (1984) tasks. We employed a 2-back parallel task to create a constant but not overwhelming load on working memory in half of the system control tasks. Participants saw a sequence of large random letters at the top of the screen. Each letter was presented for 2.5 seconds, followed by a 500 ms inter-stimulus interval. Every time the current letter was the same as the letter presented two positions earlier in the sequence, participants had to press the space key. We configured the task in such a way that a positive response was required in 30% of the trials. On errors, i.e., a false positive or a missed response (after 2500 ms), an acoustic beep was sounded.
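The response rule of the 2-back task can be sketched as follows (an illustrative sketch only; presentation timing, letter rendering, and the feedback beep are omitted):

```python
# Sketch of the 2-back target rule: a response (space key) is required
# whenever the current letter matches the letter two positions earlier.

def two_back_targets(sequence):
    """Return the indices at which a response is required."""
    return [i for i in range(2, len(sequence))
            if sequence[i] == sequence[i - 2]]

# Example: position 2 ('A' matches the 'A' two steps back) and
# position 5 ('C' matches the 'C' two steps back) are targets.
print(two_back_targets("ABACBC"))  # [2, 5]
```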
Taken together, every participant completed eight scenarios: small and large systems including and excluding OED, once with and once without a parallel dual task (a 2 × 2 × 2 fully crossed within-subjects design, controlled for effects of task order).

Scoring of control performance

The control score was calculated as the proportion of turns during a control phase in which all variables of a system were within the target range. We chose the target range so that perfect control was in principle possible in every turn despite the random fluctuations. Scores were averaged over the two control phases of each system control task.

Cognitive tests

We assessed working memory capacity using an adapted version of the memory updating (MU) task described in Lewandowsky, Oberauer, Yang, and Ecker (2010). The task requires participants to simultaneously encode a set of three to five digits and sequentially apply simple arithmetic operations to them. Participants need to replace the memorized numbers with the results of the operations and recall them in a subsequent retrieval phase. In three validation experiments, the authors obtained high internal consistencies (average α = .87) and showed that MU was the best single predictor of general working memory capacity in a battery of commonly used WM tests. The correlation with intelligence was found to be r = .67. As an indicator of general reasoning ability, we used a short form of the Raven Advanced Progressive Matrices test (APM; Raven, Court, & Raven, 1985) developed by Arthur and Day (1994). In the present study we administered the short form with a time limit of 10 minutes. The original APM has been argued to be one of the purest available measures of analytical (fluid) intelligence (e.g., Raven, 1989; Carpenter, Just, & Shell, 1990). The short form shows an internal consistency of α = .72, its retest reliability is rtt = .75, and it is strongly correlated with the APM long version, r = .90 (Arthur & Day, 1994).
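The scoring rule described under "Scoring of control performance" can be sketched as follows (an illustrative sketch; the variable names and the tolerance parameter are assumptions, as the paper reports only the verbal rule):

```python
# Sketch of the control score: the proportion of turns in which ALL output
# variables lie within the target range, averaged over two control phases.

def phase_score(outputs, targets, tolerance):
    """outputs: one dict of output values per turn; targets: target values."""
    hits = sum(
        all(abs(turn[v] - targets[v]) <= tolerance for v in targets)
        for turn in outputs
    )
    return hits / len(outputs)

def control_score(phase1, phase2, targets, tolerance):
    return (phase_score(phase1, targets, tolerance)
            + phase_score(phase2, targets, tolerance)) / 2

# Toy example with one output variable and target 5 +/- 1
p1 = [{"y": 5.2}, {"y": 6.4}, {"y": 4.9}, {"y": 5.0}]  # 3 of 4 turns in range
p2 = [{"y": 7.0}, {"y": 5.1}]                          # 1 of 2 turns in range
print(control_score(p1, p2, {"y": 5}, 1.0))  # 0.625
```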
The original Cognitive Reflection Test (CRT; Frederick, 2005) is a three-item questionnaire measuring the tendency to override a prepotent but incorrect response alternative and to engage in further reflection that leads to the correct response. The three questions are designed to make an intuitive yet erroneous answer spring to mind. For instance, the first question is: "A bat and a ball cost $1.10. The bat costs $1.00 more than the ball. How much does the ball cost?" The correct answer (5 ct) requires the suppression of the impulsive answer (10 ct). The CRT was designed to assess a cognitive style related to the readiness to engage in deliberate reflection, as postulated by dual process theories (see Stanovich & West, 2000; also Evans & Frankish, 2009). It has been shown that the CRT is closely related to measures of fluid intelligence (Frederick, 2005; Toplak et al., 2014) and particularly numerical reasoning ability (e.g., Campitelli & Gerrans, 2014). We used the expanded 7-item version proposed by Toplak et al. (2014). Its correlation with the original CRT is r = .86 and its internal consistency is α = .72.

Table 1. Basic equations used in the system control tasks.

System size | OED absent (STA)                 | OED present (OED)
1 × 1       | y = x − 2 + r                    | y = 2x − y′ + r
2 × 2       | y1 = 1.8 × x1 − 0.45 × x2 + r    | y1 = 1.8 × x1 − 0.45 × x2 + r
            | y2 = 0.8 × x2 + 0.45 × x1 + r    | y2 = 1.3 × x2 + 0.95 × x1 − y′2 + r

Note. x = input value; y = current trial's output value; y′ = preceding trial's output value; r = random noise. Equations adapted from Berry and Broadbent (1984, 1987, 1988).

Procedure

After participants gave written informed consent, they received a short oral instruction explaining the tasks and the set-up of the experiment.
System control tasks were presented first, followed by the assessment of the cognitive predictor variables. As an exploratory manipulation, we varied the instruction type by presenting one of two different task descriptions to participants. In the rule-based instruction condition, we instructed participants to "carefully observe the experiments' results and try to form a rule in order to predict them accurately". In contrast, in the intuition-based instruction condition, we encouraged them to "just take the presented results in and [...] not try to calculate or form a rule" and to instead "observe the results attentively and use [their] intuition". This was repeated before every block in both conditions. The instructions aimed at eliciting a more selective (explicit) or unselective (implicit) learning mode, respectively. After completing all system control tasks, we employed a manipulation check and asked participants to rate how they had processed the tasks on a one-item nine-level Likert scale ranging from entirely intuitive to entirely rule-based. Past research has shown similar wordings to affect participants' approach to learning in dynamic system control tasks (cf. Berry & Broadbent, 1988; Gebauer & Mackintosh, 2007). Furthermore, all participants completed a computer-based serial reaction time task (Robertson, 2007), which was intended as a measure of implicit learning ability. Due to technical problems, the data from this task were unusable and had to be excluded from analysis.

Results

Exploration

The median exploration time per task was 80.9 seconds (IQR = 95.8), with a median of 26 exploration turns (IQR = 43). Exploration was completed more quickly for the small systems (median 68.1 and 75.1 seconds for STA and OED) than for the large systems (median 95.9 and 96.5 seconds for STA/STA and STA/OED). The median number of exploration turns was comparable (24 and 25 turns versus 28.5 and 25 turns).
Dual tasking had no effect on exploration time (median 78.23 seconds with dual tasking, 83.7 seconds without), Wilcoxon W(127) = 4300, p = .57, but had a detrimental effect on the number of exploration turns (20 turns with dual tasking, 33 turns without), Wilcoxon W(127) = 7001.5, p < .001.

System characteristics and context factors

The effect of system characteristics and context factors on control performance was analyzed using a four-factor mixed ANOVA with system size (small or large) and OED (present or absent) as system characteristics, which were varied within-subjects. The context factors were dual tasking (present or absent, within-subjects) and instruction (rule-based or intuition-based, between-subjects). To reduce the inflation of Type I errors in multifactorial designs, we only report main effects and interactions for which hypotheses had been formulated. Figure 2 illustrates the characteristic behavior of systems with STA (stable) or OED (oscillatory) dynamics. The mean control performance scores for the four different system types were .69 (SD = .16) for STA, .27 (SD = .08) for OED, .24 (SD = .12) for STA/STA, and .09 (SD = .05) for STA/OED. As displayed in Figure 3, system size showed a strong main effect, F(1, 125) = 1673.06, p < .001, η²G = .55, as did OED, F(1, 125) = 870.55, p < .001, η²G = .52. Both factors interacted, F(1, 125) = 310.77, p < .001, η²G = .19, indicating that the effect of OED partially depended on system size. Comparing performance for the two target variables within the large mixed system (STA/OED) replicated the pattern of the separate STA and OED systems, F(1, 126) = 197.28, p < .001, η²G = .38. The context factor dual tasking exerted a small but statistically significant main effect on performance in the expected direction, F(1, 125) = 5.55, p = .02, η²G = .01.
Contrary to expectation, it did not interact with OED, F(1, 125) = 0.45, p = .50. Mean control performance was .33 (SD = .09) without dual tasking and .31 (SD = .08) with dual tasking. Different instructions (encouraging rule formation or an intuitive approach) also showed a small but statistically significant effect on performance, F(1, 125) = 6.87, p = .01, η²G = .01, and no interaction with OED, F(1, 125) = 1.94, p = .17. Mean control performance was .34 (SD = .06) with rule-based instructions and .31 (SD = .08) with intuition-based instructions. The self-rated processing style was not affected by the type of instruction given, t(125) = 0.15, p = .88.

Figure 2. Dynamics of a 1 × 1 STA (stable) and a 1 × 1 OED (oscillatory) system showing the development of the target variable over 20 control turns. The horizontal line indicates the given target value. Each dotted line represents the output values of one participant.

Cognitive abilities

A regression analysis predicting overall system control (averaged over all tasks) from the cognitive ability variables APM, CRT, and MU showed that in total 24.5% of the performance variance could be explained by these predictors, F(3, 123) = 13.29, p < .001. CRT was the strongest overall predictor, β = .41, p < .001, followed by APM, β = .20, p = .05, while MU did not contribute significantly, β = −.11, p = .25. Table 2 lists the bivariate correlations between the individual predictors and the different system types, supporting that CRT was a good predictor throughout, while MU was comparatively weak. To test whether the predictiveness of the cognitive variables interacts with the presence or absence of OED in the tasks as hypothesized, we conducted Williams' tests for comparing dependent correlation coefficients for the small STA and OED systems.
crt showed the expected difference, t(127) = 2.85, p < .01, with a lower correlation in the oed condition, but apm and mu did not, t(127) = 1.64, p = .10, and t(127) = 0.08, p = .93. combining all cognitive variables into a single general ability score by averaging z-standardized scores revealed that this overall ability variable also interacted with the absence or presence of oed, t(127) = 2.03, p = .04. correlations between cognitive ability variables and control performance may be attenuated by low reliabilities of the system control tasks. cronbach's α for the two small sta systems was only .42, and .28 for the two small oed systems. we therefore repeated the williams' tests, applying a one-sided correction for attenuation to the control performance scores before comparing correlation coefficients. the results support the initial analysis, even accentuating the interaction effects. crt showed the expected difference, t(127) = 6.94, p < .01, and with correction for attenuation so did apm, t(127) = 3.94, p < .001, while there still was no effect for mu, t(127) = 0.86, p = .39. for the combined general ability score this analysis also yielded a significant effect, t(127) = 4.83, p < .001. for the large systems, correlations of performance with cognitive abilities were not significantly different between those including or excluding oed, t's < .40, p's > .69. however, the analysis of whole systems may mask differences between the two target variables in the mixed system (sta/oed). we therefore conducted the comparisons of correlations of cognitive predictors and control performance separately for the sta and oed variables within the mixed system (see table 2). similar to the results for the independent sta and oed systems, we found that the variable involving oed showed significantly lower correlations with two of the three cognitive predictors, t(127) = 2.16, p = .03 for the crt and t(127) = 2.09, p = .04 for mu.
for apm, correlations did not significantly differ, t(127) = .41, p = .68. again, these results were accentuated when correcting correlations for attenuation due to low reliabilities of the system control tasks (cronbach's α = .41 for sta variables and .17 for oed variables).

figure 3. performance by target variable, averaged over parallel task versions. error indicators represent standard deviations. the value of each target variable is either controlled by a salient (light gray) or a non-salient relation (dark gray). (sta = 1 × 1 stable system, oed = 1 × 1 system containing oscillatory eigendynamics, sta/sta = 2 × 2 system with two stable target variables, sta/oed = 2 × 2 mixed system with one stable and one oscillatory target variable, see table 1.)

to investigate whether dual tasking moderates the predictiveness of working memory in this task, we compared the correlation of mu and performance between dual-tasking conditions. for all four system types, the correlation coefficients did not differ with and without dual tasking, t's < 1.46, p's > .14.

further analyses

performance in the 2-back secondary task was generally low, with an average hit rate of .34 (sd = .20), although consistent (cronbach's α = .86). a 2 × 2 anova showed a strong effect of system size on secondary task performance, f(1, 126) = 68.00, p < .001, but no effect of oed, f(1, 126) = 0.06, p = .80, and no interaction, f(1, 126) = 0.01, p = .93. this suggests that the larger systems were more working-memory demanding, thereby reducing cognitive resources for the secondary task.
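for readers unfamiliar with the paradigm, scoring a 2-back hit rate amounts to checking each stimulus against the one presented two positions earlier. a minimal sketch with an invented letter sequence (not the study's materials or scoring code):

```python
def two_back_hits(stimuli, responses):
    """Score a 2-back task: a 'target' trial is one whose stimulus matches
    the stimulus two positions earlier; the hit rate is the proportion of
    target trials to which the participant responded."""
    targets = [i for i in range(2, len(stimuli)) if stimuli[i] == stimuli[i - 2]]
    if not targets:
        return 0.0
    hits = sum(1 for i in targets if responses[i])
    return hits / len(targets)

# hypothetical run: letters presented, True where the participant pressed
seq = list("ABABCACAC")
resp = [False, False, True, True, False, False, False, True, False]
rate = two_back_hits(seq, resp)  # 3 of 5 targets detected -> 0.6
```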
in addition to the effect on system control performance reported above, we also observed a clear effect of dual tasking on response latency: without dual tasking participants took an average of 2.20 (sd = 0.92) seconds per control turn, and 2.77 (sd = 1.22) seconds with dual tasking, f(1, 126) = 38.40, p < .001.

discussion

we observed that manipulating the presence of oscillatory eigendynamics (oed) and system size changed difficulty as expected, while manipulating cognitive load and the instructions had only a small effect on control performance. furthermore, we found that oed not only make system control more difficult, but can also reduce the effect of cognitive abilities on control performance. regarding system characteristics, we found that oed apparently were difficult to discern and control for most participants, in line with the results of berry and broadbent (1988). the small oed system was about as difficult as a stable system twice its size (sta/sta). what makes this finding particularly striking is that the mathematical change to the system structure was minimal, just an additional negative term in the linear equation. the difficulty pattern was replicated for the different target variables in the large mixed systems (sta/oed). the target variables behaved very similarly to the small sta and oed systems, with the oed variable being much harder to control. these results show that operationalizing system complexity merely in terms of the number of variables and relations does not fully cover complexity from a cognitive perspective. the emergent dynamic complexity of the system as a whole seems to be just as important, if not more so (e.g., brehmer & dörner, 1993; gonzalez et al., 2005). in their seminal work, berry and broadbent (1984) and reber (1967) referred to systems that are easy or difficult, respectively, to explore and control using deliberate reasoning strategies as “salient” or “non-salient”.
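the point about the minimal mathematical change can be made concrete with a toy simulation: in a first-order linear difference equation, a positive feedback coefficient below 1 gives smooth convergence, while a negative coefficient makes the output alternate around its equilibrium. the coefficients below are invented for illustration and are not the equations used in the study:

```python
def simulate(a, b, x, y0=0.0, turns=20):
    """Iterate the linear difference equation y[t+1] = a*y[t] + b*x over a
    fixed number of control turns with a constant input x. A negative
    coefficient a adds a negative term that makes successive outputs
    alternate around the equilibrium b*x/(1 - a), i.e., oscillate."""
    y, trajectory = y0, []
    for _ in range(turns):
        y = a * y + b * x
        trajectory.append(y)
    return trajectory

stable = simulate(a=0.5, b=1.0, x=10.0)        # rises monotonically toward 20
oscillating = simulate(a=-0.5, b=1.0, x=10.0)  # alternates around 20/3
```

changing a single sign flips the system from smooth convergence to ringing, which is one way to picture why so small a structural change can have so large an effect on controllability.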
however, we think that the effects of dynamic complexity produced by negative feedback go beyond berry and broadbent's suggestion that low salience simply makes it less likely that participants focus their exploration on the relevant parts of the system. even when non-salient relations are detected and perhaps even partially understood (e.g., that there is oscillation), the system still may be more difficult to explore and control. simple exploration strategies such as control-of-variables (chen & klahr, 1999) are harder to apply due to the prior system state's influence and the resulting unstable system behavior. furthermore, for the same reason it is difficult to derive the correct control interventions even if the system structure is understood. we replicated the finding that cognitive ability is a good predictor of control performance (stadler et al., 2015). considering specific abilities, we found cognitive reflection to be the strongest overall predictor, followed by reasoning ability, while working memory capacity was a comparatively weak predictor. this result is somewhat surprising, given the conceptual overlap between reasoning and working memory. as the measure of working memory employed, memory updating, is a well-established and reliable indicator, one possible explanation is that the relatively simple systems used in this study do not pose high working memory demands. this explanation is supported by the fact that the concurrent working memory load had only a small effect on performance.

table 2. correlations of reasoning ability, cognitive reflection, and working memory with control performance.
                      system type                         target variable in sta/oed
                      sta     oed     sta/sta  sta/oed    sta     oed
reasoning ability     .33     .17     .23      .27        .32     .18
working memory        .16     .09     .17      .20        .30     .08
cognitive reflection  .45     .16     .28      .31        .39     .18

note. n = 127. correlations for the target variables in the 2 × 2 mixed system (sta/oed) are shown separately. correlations with p < .05 shown in bold. coefficients above .23 are significant at p < .01, above .29 at p < .001.

beyond their overall effects, cognitive abilities interacted with specific task characteristics. as expected, the predictors most closely related to abstract reasoning (apm, crt) interacted with the presence or absence of oed. specifically, these predictors were less correlated with performance in the small systems including oed. we found the same pattern in the large mixed system (sta/oed) when both target variables were analyzed separately. working memory capacity, in contrast, did not show an interaction with the presence of oed, possibly due to its generally low predictiveness in this paradigm. these results also hold when statistically controlling for the low measurement reliability of the systems including oed. the only predictor interacting with system size was cognitive reflection, a statistically significant but very small effect. the interaction of cognitive abilities and system characteristics is in line with previous findings by goode (2011), who showed that reasoning ability is less predictive for highly complex systems. the explanation given by goode (2011; also goode & beckmann, 2010) is that reasoning ability can only unfold its effect if structural knowledge is acquired. however, as berry and broadbent (1984) have shown before, the presence of oed dramatically reduces the amount of structural knowledge acquired. consequently, reasoning ability should be less predictive in systems including oed.
given that our results confirm this hypothesis, we conclude that, somewhat paradoxically, reasoning ability may be more helpful for relatively simple dynamic problems with an obvious structure. however, this result was obtained under laboratory conditions with a strict time limit and may be different when further opportunity for exploration or additional information sources are available. another conceivable criticism is that control performance in the oed systems is simply less reliable in psychometric terms and correlations with other constructs are therefore limited. we calculated corrections for attenuation as one approach to rule out this possibility. furthermore, this criticism is based on the assumption that there is a stable trait or ability reflected in performance, which does not need to be the case. alternatively, the performance scores can be considered a formative measure, i.e., they directly represent the degree of successful system control, which is the criterion to be predicted. an alternative candidate for an ability underlying performance in tasks with oed would have been implicit learning ability, as suggested by the observation that implicit learning takes place in these systems (berry & broadbent, 1984). although our measure of implicit learning ability was unusable for technical reasons, it is uncertain whether it would have added much explanatory value as a predictor. in a study using a relatively complex dynamic system, danner et al. (2011) showed that the latent correlation (corrected for measurement error) between implicit learning ability and control performance was just r = .26 compared to r = .86 for intelligence. furthermore, implicit learning as a unitary ability is not undisputed (gebauer & mackintosh, 2007) and its reliability seems to be generally low (reber & allen, 2000). 
moreover, the time restrictions in our study and the tasks' superficial similarity despite their structural differences may have prevented implicit, instance-based learning (cf. kaufman, 2011). the correlations between reasoning and system control performance in our study suggest that mainly explicit, deliberate learning was required. this interpretation is supported by studies that similarly found such correlations in explicit learning conditions but not in implicit learning conditions (shown for intelligence by gebauer & mackintosh, 2007, and for working memory capacity by unsworth & engle, 2005). supporting earlier findings by hayes and broadbent (1988), our results show that dual tasking slowed participants down but had only a negligible effect on control performance. while in some reasoning tasks cognitive load affects both accuracy and response latency (e.g., gilhooly, logie, wetherick, & wynn, 1993), a dissociation of the two is also sometimes observed (e.g., baddeley & hitch, 1974). our findings imply that in the present task it is possible to compensate for experimentally reduced mental capacity by proceeding more slowly, and that participants seem to give priority to accuracy over speed. this pattern of results was the same for sta and oed systems. if performance in oed conditions were purely based on implicit learning, it should have been less affected by dual tasking. however, this was not the case, further supporting the interpretation that explicit learning may have been relevant in all conditions. in summary, the present study demonstrates that the presence of oscillatory eigendynamics in a system has a strong effect on difficulty and can act as a moderator of the effect of reasoning and cognitive reflection on control performance. system size has an effect on difficulty, but shows only limited interaction with cognitive abilities.
furthermore, we found that analyzing the target variables in the mixed (sta/oed) large systems separately mirrors the pattern obtained from comparing the separate small sta and oed systems. we therefore recommend the separate analysis of system parts for future cognitive research in dynamic system control. our results may also be informative for the psychometric application of dynamic system control tasks, as they contribute towards a more differentiated understanding of the effects of system characteristics and cognitive abilities on task performance.

acknowledgements: this research was partially supported by grant no. fu 173/14 of the german research foundation (dfg). the authors would like to thank katharina berger and antje spiertz for assistance with data collection and three reviewers for their helpful comments on the manuscript.

declaration of conflicting interests: the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. dh and af are co-editors of jddm.

author contributions: all authors contributed to the design of the study. af, dh, and ns prepared the materials. jh coordinated the data collection. jh and dh analyzed the data and wrote the manuscript. all authors commented on and approved the final version of the manuscript.

supplementary material: supplementary material available online.

handling editor: florian kutzner

copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license.

citation: hundertmark, j., holt, d. v., fischer, a., said, n., & fischer, h. (2015). system structure and cognitive ability as predictors of performance in dynamic system control tasks. journal of dynamic decision making, 1, 5.
doi: 10.11588/jddm.2015.1.26416
received: 10 december 2015
accepted: 29 january 2016
published: 9 february 2016

references

arthur, w., & day, d. v. (1994). development of a short form for the raven advanced progressive matrices test. educational and psychological measurement, 54, 394-403. doi: 10.1177/0013164494054002013
baddeley, a. d., & hitch, g. j. (1974). working memory. in g. bower (ed.), the psychology of learning and motivation (vol. 8, pp. 47-89). new york, ny: academic press.
beckmann, j. f. (1994). lernen und komplexes problemlösen: ein beitrag zur konstruktvalidierung von lerntests [learning and complex problem solving: a contribution to the construct validation of tests of learning potential]. bonn: holos.
berry, d. c., & broadbent, d. e. (1984). on the relationship between task performance and associated verbalizable knowledge. the quarterly journal of experimental psychology, 36a, 209-231. doi: 10.1080/14640748408402156
berry, d. c., & broadbent, d. e. (1987). the combination of explicit and implicit learning processes in task control. psychological research, 49, 7-15. doi: 10.1007/bf00309197
berry, d. c., & broadbent, d. e. (1988). interactive tasks and the implicit-explicit distinction. british journal of psychology, 79, 251-272. doi: 10.1111/j.2044-8295.1988.tb02286.x
berry, d. c., & broadbent, d. e. (1995). implicit learning in the control of complex systems. in p. a. frensch & j. funke (eds.), complex problem solving: the european perspective (pp. 131-150). hillsdale, nj: erlbaum.
brehmer, b., & dörner, d. (1993). experiments with computer-simulated microworlds: escaping both the narrow straits of the laboratory and the deep blue sea of the field study. computers in human behavior, 9, 171-184. doi: 10.1016/0747-5632(93)90005-d
campitelli, g., & gerrans, p. (2014). does the cognitive reflection test measure cognitive reflection? a mathematical modeling approach. memory & cognition, 42, 434-447. doi: 10.3758/s13421-013-0367-9
carpenter, p. a., just, m. a., & shell, p. (1990). what one intelligence test measures: a theoretical account of the processing in the raven progressive matrices test. psychological review, 97, 404-431. doi: 10.1037/0033-295x.97.3.404
chen, z., & klahr, d. (1999). all other things being equal: acquisition and transfer of the control of variables strategy. child development, 70, 1098-1120. doi: 10.1111/1467-8624.00081
cohen, j. (1988). statistical power analysis for the behavioral sciences (2nd ed.). hillsdale, nj: erlbaum.
danner, d., hagemann, d., holt, d. v., bechthold, m., schankin, a., wüstenberg, s., & funke, j. (2011). measuring performance in a complex problem solving task: reliability and validity of the tailorshop simulation. journal of individual differences, 32, 225-233. doi: 10.1027/1614-0001/a000055
dörner, d. (1980). on the difficulties people have in dealing with complexity. simulation & gaming, 11, 87-106. doi: 10.1177/104687818001100108
dörner, d. (1983). heuristics and cognition in complex systems. in r. groner, m. groner, & w. f. bischof (eds.), methods of heuristics. hillsdale, nj: erlbaum.
dörner, d. (1996). the logic of failure: recognizing and avoiding error in complex situations. new york, ny: perseus.
evans, j., & frankish, k. (2009). in two minds: dual processes and beyond. oxford: oxford university press.
fischer, a., greiff, s., wüstenberg, s., fleischer, j., buchwald, f., & funke, j. (2015). assessing analytic and interactive aspects of problem solving competency. learning and individual differences, 39, 172-179. doi: 10.1016/j.lindif.2015.02.008
frederick, s. (2005). cognitive reflection and decision making. journal of economic perspectives, 19, 25-42. doi: 10.1257/089533005775196732
funke, j. (1985). steuerung dynamischer systeme durch aufbau und anwendung subjektiver kausalmodelle [control of dynamic systems by generating and applying subjective causal models]. zeitschrift für psychologie, 193, 443-465.
funke, j. (1991). solving complex problems: exploration and control of complex systems. in r. j. sternberg & p. a. frensch (eds.), complex problem solving: principles and mechanisms (pp. 185-222). hillsdale, nj: erlbaum.
funke, j. (2001). dynamic systems as tools for analysing human judgement. thinking & reasoning, 7, 69-89. doi: 10.1080/13546780042000046
funke, j. (2003). problemlösendes denken [problem solving thinking]. stuttgart: kohlhammer.
gebauer, g. f., & mackintosh, n. j. (2007). psychometric intelligence dissociates implicit and explicit learning. journal of experimental psychology: learning, memory, and cognition, 33, 34-54. doi: 10.1037/0278-7393.33.1.34
gilhooly, k. j., logie, r. h., wetherick, n. e., & wynn, v. (1993). working memory and strategies in syllogistic-reasoning tasks. memory & cognition, 21, 115-124. doi: 10.3758/bf03211170
gonzalez, c., thomas, r. p., & vanyukov, p. (2005). the relationships between cognitive ability and dynamic decision making. intelligence, 33, 169-186. doi: 10.1016/j.intell.2004.10.002
goode, n. (2011). determinants of the control of dynamic systems: the role of structural knowledge (doctoral thesis, university of sydney, sydney, australia). retrieved from http://ses.library.usyd.edu.au/handle/2123/8967
goode, n., & beckmann, j. f. (2010). you need to know: there is a causal relationship between structural knowledge and control performance in complex problem solving tasks. intelligence, 38, 345-352. doi: 10.1016/j.intell.2010.01.001
gray, w. d. (2002). simulated task environments: the role of high-fidelity simulations, scaled worlds, synthetic environments, and laboratory tasks in basic and applied cognitive research. cognitive science quarterly, 2, 205-227.
greiff, s., wüstenberg, s., & funke, j. (2012). dynamic problem solving: a new assessment perspective. applied psychological measurement, 36, 189-213. doi: 10.1177/0146621612439620
güss, c. d. (2010). fire and ice: testing a model on culture and complex problem solving. journal of cross-cultural psychology, 1-20. doi: 10.1177/0022022110383320
hayes, n. a., & broadbent, d. e. (1988). two modes of learning for interactive tasks. cognition, 28, 249-276. doi: 10.1016/0010-0277(88)90015-7
kaufman, s. b. (2011). intelligence and the cognitive unconscious. in r. j. sternberg & s. b. kaufman (eds.), the cambridge handbook of intelligence (pp. 442-467). new york: cambridge university press.
kirchner, w. k. (1958). age differences in short-term retention of rapidly changing information. journal of experimental psychology, 55, 352-358. doi: 10.1037/h0043688
kluge, a. (2008). performance assessments with microworlds and their difficulty. applied psychological measurement, 32, 156-180. doi: 10.1177/0146621607300015
kröner, s., plass, j. l., & leutner, d. (2005). intelligence assessment with computer simulations. intelligence, 33, 347-368. doi: 10.1016/j.intell.2005.03.002
leutner, d. (2002). the fuzzy relationship of intelligence and problem solving in computer simulations. computers in human behavior, 18, 685-697. doi: 10.1016/s0747-5632(02)00024-9
lewandowsky, s., oberauer, k., yang, l.-x., & ecker, u. k. h. (2010). a working memory test battery for matlab. behavior research methods, 42, 571-585. doi: 10.3758/brm.42.2.571
putz-osterloh, w., & lüer, g. (1981). the predictability of complex problem solving by performance on an intelligence test. zeitschrift für experimentelle und angewandte psychologie, 28, 309-324.
raven, j. (1989). the raven progressive matrices: a review of national norming studies and ethnic and socioeconomic variation within the united states. journal of educational measurement, 26, 1-16. doi: 10.1111/j.1745-3984.1989.tb00314.x
raven, j. c., court, j. h., & raven, j. (1985). manual for raven's progressive matrices and vocabulary scales (revised ed.). london: lewis.
reber, a. s. (1967). implicit learning of artificial grammars. journal of verbal learning and verbal behavior, 6, 855-863. doi: 10.1016/s0022-5371(67)80149-x
reber, a. s. (1989). implicit learning and tacit knowledge. journal of experimental psychology: general, 118, 219-235. doi: 10.1037/0096-3445.118.3.219
reber, a. s., & allen, r. (2000). individual differences in implicit learning: implications for the evolution of consciousness. in r. g. kunzendorf & b. wallace (eds.), advances in consciousness research (pp. 227-247). amsterdam: john benjamins.
robertson, e. m. (2007). the serial reaction time task: implicit motor skill learning? the journal of neuroscience, 27, 10073-10075. doi: 10.1523/jneurosci.2747-07.2007
stadler, m., becker, n., gödker, m., leutner, d., & greiff, s. (2015). complex problem solving and intelligence: a meta-analysis. intelligence, 53, 92-101. doi: 10.1016/j.intell.2015.09.005
stanovich, k. e., & west, r. f. (2000). individual differences in reasoning: implications for the rationality debate? behavioral and brain sciences, 23, 645-665. doi: 10.1017/s0140525x00003435
sternberg, r. j. (1982). handbook of human intelligence. cambridge: cambridge university press.
sun, r., slusarz, p., & terry, c. (2005). the interaction of the explicit and the implicit in skill learning: a dual-process approach. psychological review, 112, 159-192. doi: 10.1037/0033-295x.112.1.159
toplak, m. e., west, r. f., & stanovich, k. e. (2014). assessing miserly information processing: an expansion of the cognitive reflection test. thinking & reasoning, 20, 147-168. doi: 10.1080/13546783.2013.844729
unsworth, n., & engle, r. w. (2005). individual differences in working memory capacity and learning: evidence from the serial reaction time task. memory & cognition, 33, 213-220. doi: 10.3758/bf03195310
weber, e. u., & johnson, e. j. (2009). mindful judgment and decision making. annual review of psychology, 60, 53-85. doi: 10.1146/annurev.psych.60.110707.163633
wiley, j., & jarosz, a. f. (2012). working memory capacity, attentional focus, and problem solving. current directions in psychological science, 21, 258-262. doi: 10.1177/0963721412447622
wittmann, w. w., & süß, h.-m. (1999). investigating the paths between working memory, intelligence, knowledge, and complex problem-solving performances via brunswik symmetry. in p. l. ackerman, p. c. kyllonen, & r. d. roberts (eds.), learning and individual differences (pp. 77-108). washington, dc: american psychological association.
wüstenberg, s., greiff, s., & funke, j. (2012). complex problem solving: more than reasoning? intelligence, 40, 1-14. doi: 10.1016/j.intell.2011.11.003

theoretical contribution

how we conceptualize climate change: revealing the force-dynamic structure underlying stock-flow reasoning

kurt stocker (1) and joachim funke (2)
(1) university of zürich, switzerland; (2) heidelberg university, germany

how people understand the fundamental dynamics of stock and flow (sf) is an important basic theoretical question with many practical applications. such dynamics can be found, for example, in monitoring one's own private bank account (income versus expenditures), the state of a birthday party (guests coming versus leaving), or in the context of climate change (co2 emissions versus absorption). understanding these dynamics helps in managing everyday life and in controlling behavior in an appropriate way (e.g., stopping expenditures when the balance of a bank account approaches zero). in this paper, we present a universal framework for understanding stock-flow reasoning in terms of the theory of force dynamics. this deep-level analysis is then applied to two different presentation formats of sf tasks in the context of climate change.
we can explain why misunderstandings occur with a coordinate-graphic presentation (the so-called “sf failure”), whereas a verbal presentation yields better understanding. we conclude with recommendations for presentation formats that we predict will help people to better understand sf dynamics. better public sf understanding might in turn also enhance corresponding public action – such as pro-environmental action in relation to climate change.

keywords: force dynamics, stock-flow reasoning, climate change, causation, presentation format, dynamic problems

the dynamics of stock and flow represent a fundamental, abstract principle of nature: incoming objects are accumulated in a stock that keeps these objects for a certain period of time before they leave the stock by an outflow process. this abstract process description can be applied to species in a certain region (e.g., a given stock of birds has births as inflows and deaths as outflows), to the customers in a warehouse (e.g., a certain number of customers are in the store at a given point in time, new customers enter as inflows and satisfied customers leave as outflows), or to the daily food intake of persons (the current body weight representing the stock). one stock-flow dynamic is of special interest for the survival of most species on planet earth: the greenhouse gases, especially the carbon dioxide (co2) emissions that contribute to global warming. concerning co2 emissions into the atmosphere, a certain amount of emissions is added as inflow to the given stock in the atmosphere; the outflow dissolves mostly in the oceans or is removed by means of photosynthesis. for more than 50 years, the balance between in- and outflow seems to have been disturbed by extreme increases of greenhouse gas emissions due to human activities (intergovernmental panel on climate change [ipcc], 2013).
understanding such stock-flow (sf) processes is of great importance when one wants to control a given system (and not merely to keep a bathtub from overflowing). it is therefore alarming to hear about the “sf failure” that has been reported repeatedly (cronin, gonzalez, & sterman, 2009; sterman & sweeney, 2002, 2007): “results from the experiments reported here demonstrate an important and pervasive problem in human reasoning: our inability to understand stocks and flows, that is, the process by which the flows into and out of a stock accumulate over time” (cronin et al., 2009, p. 128). within the context of the climate change debate, the sf failure has important implications for the presentation of results (like those from the ipcc reports) and for the possible implementation of educational measures (clayton et al., 2016). a deeper understanding of the sf failure therefore seems necessary, particularly because recent work by fischer, degen, and funke (2015) suggests that the sf failure might be an effect of the type of representation, i.e., how the information about stock and flow is presented to the participants. our paper offers an explanation of the representation effects found by fischer and colleagues in terms of a more fundamental analysis of sf reasoning processes. this analysis fleshes out the underlying causal structure that is inherent to sf reasoning. specifically, we (in section a) show for the first time how an sf problem (using atmospheric co2 accumulation as an example) can be described in terms of force dynamics, a prominent theory of causal cognition (e.g., stocker, 2014; talmy, 2000). moreover, we examine (in sections b–d) how well scientific presentation formats comprising coordinate systems and graphs, verbal formats, and pictorial-schematic formats of presenting an sf problem (exemplified by atmospheric co2 accumulation) represent the force-dynamic structure underlying basic sf reasoning.
our central argument is: the better the formats represent the underlying force-dynamic structure of the sf problem, the better people can understand it. finally (in section e), the implications of these differences in how well the underlying force-dynamic structure of sf co2 accumulation is represented in the different formats are discussed.

corresponding author: kurt stocker, university of zurich, department of psychology, neuropsychology, binzmühlestrasse 14, po box 25, ch-8050 zürich, switzerland, e-mail: kurt.stocker@gess.ethz.ch

10.11588/jddm.2019.1.51357 jddm | 2019 | volume 5 | article 1

a. the force-dynamic structure of conceptualizing stock-flow co2 accumulation

a most basic sf system consists of one inflow, one outflow, and one stock (sterman & sweeney, 2002, 2007; fischer et al., 2015). in this section, the basic force-dynamic structure underlying sf reasoning is exemplified with the atmospheric co2 level. people often prefer to make use of the underlying causal structure to understand sf reasoning (brehmer, 1976; fischer et al., 2015; garcia-retamero, wallin, & dieckmann, 2007; gonzalez, 2004). as a starting point to reveal the basic causal – force-dynamic – structure of atmospheric co2 level sf relations, consider the following two sentences adapted from fischer and colleagues (2015, p. 13). these sentences highlight the basic causal relations involved in the increase and decrease of the atmospheric co2 level:

(1) examples of informal causal verbalizations of sf reasoning (exemplified with atmospheric co2 level)
a. co2 emissions are caused by the burning of fossil fuels and increase atmospheric co2 concentration.
b. co2 absorptions are caused by forests and oceans and decrease atmospheric co2 concentration.
naturally, the stock will increase if the inflow is greater than the outflow, decrease if the outflow is greater than the inflow, and remain constant if there are equal amounts of in- and outflow. if one puts these increase/decrease sf reasoning mechanisms into a slightly formalized verbal format, one notices that causality is involved at two different hierarchical levels: (2) the basic causal structure of sf reasoning in slightly formalized verbal format (exemplified with atmospheric co2 level) a. the burning of fossil fuels causes_level-1 atmospheric co2 concentration to increase. b. absorption mechanisms of forests and oceans cause_level-1 atmospheric co2 concentration to decrease. c. larger increase of atmospheric co2 concentration [increase which is caused_level-1 by the burning of fossil fuels] than decrease of atmospheric co2 concentration [decrease which is caused_level-1 by absorption mechanisms of forests and oceans] causes_level-2 atmospheric co2 concentration to increase. d. larger decrease of atmospheric co2 concentration [decrease which is caused_level-1 by absorption mechanisms of forests and oceans] than increase of atmospheric co2 concentration [increase which is caused_level-1 by the burning of fossil fuels] causes_level-2 atmospheric co2 concentration to decrease. e. equal increase of atmospheric co2 concentration [increase which is caused_level-1 by the burning of fossil fuels] and decrease of atmospheric co2 concentration [decrease which is caused_level-1 by absorption mechanisms of forests and oceans] causes_level-2 atmospheric co2 concentration to remain constant.
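the two-level accumulation mechanism of (2) can be sketched as a minimal simulation (a sketch only, with hypothetical, unit-free numbers; the function name `update_stock` is ours and not part of the sf literature):

```python
def update_stock(stock, inflow, outflow):
    """one time step of a basic sf system: the stock
    accumulates the net flow (inflow minus outflow)."""
    return stock + (inflow - outflow)

# hypothetical illustrative values (not actual climate data)
co2_level = 100.0   # stock: atmospheric co2 concentration at a certain level
emission = 8.0      # inflow, caused (level-1) by the burning of fossil fuels
absorption = 5.0    # outflow, caused (level-1) by forests and oceans

# inflow > outflow, so the stock increases (level-2 causality, cf. 2c)
co2_level = update_stock(co2_level, emission, absorption)
print(co2_level)  # 103.0
```

swapping the flow values illustrates (2d) and (2e): with outflow larger than inflow the stock falls, and with equal flows it stays constant.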
taking (2c) as an example, we can notice the embedded causal hierarchy that underlies basic sf reasoning: in order to capture basic sf reasoning in a full-fledged way, one must not only understand the causes of the increase and decrease (here termed level-1 causality), but one must also understand that the relationship between increase and decrease (e.g., increase is higher than decrease) of course also has causal consequences (here termed level-2 causality; e.g., increase being higher than decrease causes the stock to rise). abstracting away from (2), we can also outline the generic causal structure of sf reasoning: (3) the generic causal structure of sf reasoning in slightly formalized verbal format a. a causes_level-1 the stock to increase. b. b causes_level-1 the stock to decrease. c. larger amounts of inflow [inflow which is caused_level-1 by a] than outflow [outflow which is caused_level-1 by b] cause_level-2 the stock to increase. d. larger amounts of outflow [outflow which is caused_level-1 by b] than inflow [inflow which is caused_level-1 by a] cause_level-2 the stock to decrease. e. equal amounts of inflow [inflow which is caused_level-1 by a] and outflow [outflow which is caused_level-1 by b] cause_level-2 the stock to remain constant. what are the basic mental elements (basic building blocks) that make up sf reasoning? the conceptualization of cause and effect has long been treated (and often still is) as if cause and effect were themselves conceptual primitives – two conceptual “atoms” that do not consist of still smaller elements (e.g., goldvarg & johnson-laird, 2001; pearl, 2000; spirtes, glymour, & scheines, 2000). however, with the advent of the theory of force dynamics – a by now prominent theory of causal cognition initially proposed by talmy (e.g., stocker, 2014; talmy, 2000) – it has become possible to demonstrate that cause and effect can be described as consisting of still smaller conceptual elements (see below).
in analogy, cause and effect are more like conceptual “molecules” that consist of conceptual “atoms”, rather than being atoms themselves. that causation represents force-dynamic thinking patterns has also been supported by experimental findings (barbey & wolff, 2007; wolff, 2003, 2007; wolff & song, 2003). stocker (2014) has offered a substantial revision of the original force-dynamic account as developed by talmy (1985, 1988, 2000). this revision makes it possible to apply force dynamics to all causation, no matter how concrete or abstract the entities involved in the causation are (with the framework developed by talmy it was not clear how force dynamics could be applied to certain abstract entities). this modern version of force dynamics is called elementary force dynamics (stocker, 2014) because it places a strong emphasis on identifying the elements (conceptual primitives) that make up a cause and the elements that make up an effect. first we introduce the basics of elementary force dynamics with a classic example from talmy. then we demonstrate how elementary force dynamics can also be used to describe basic “elementary” sf reasoning, using atmospheric co2 accumulation as an example. a.1. force dynamics: general elements of the theory both classic force dynamics (talmy, 2000) and elementary force dynamics (stocker, 2014) involve the assumption that the mental representation of causation involves the conceptualization of two opposed entities that are engaged in a force interaction. all general elements of the force-dynamic theory explained in this section stem from theoretical cognitive-linguistic work (stocker, 2014; talmy, 2000). for a linguistic exemplification, consider the following sentence adapted from talmy (2000, p. 416): (4) the ball started rolling because of the wind.
in force-dynamic terms, the ball functions as the agonist (ago).^1 in elementary force dynamics (stocker, 2014), ago is a cognitive entity which can take on any given state or action. in (4) ago is the ball, which initially takes on the state value stationariness (if there were no wind, the ball would not be moving). the opposed conceptualized force entity is referred to as the antagonist (ant) in force-dynamic theory. in (4) the wind takes on the ant function. in elementary force dynamics, ant is always conceptualized as attempting to impose a state or action onto ago that is different from ago's initial action or state. thus, when ago is initially associated with stationariness, ant will by definition try to impose a value or state other than stationariness. hence in (4) the wind carries a different force value than the ball: ant carries the force value motion, trying to impose this onto the inert ball. the resultant (effect) of a force-dynamic interaction always relates to ago. the resultant depends on which of the force entities – ago or ant – is conceptualized as being stronger. in causation types that can be phrased with because (of), as in (4), ant is always conceptualized as being stronger.^2 consequently, in (4) the force of ant (the wind's motion force) is stronger than the force of ago (the ball's inert force). thus, the ball is conceptualized as having been moved by the wind. stocker (2014) has introduced a specific formal system to describe force-dynamic interactions (an abstraction of talmy's original force-dynamic diagramming system). for a force-dynamic interaction underlying the linguistic use of words like “because” or “caused” (formally termed cause or successful causation), the notational system looks as follows: (5) c: ago-x, ant-xdiff (++) → e: ago-xdiff c stands for cause and e for effect. in elementary force dynamics, ago can initially be associated with any given state or action value (value “x”).
in contrast, ant's value must by definition be a state or action value that differs from x (“xdiff”, where diff stands for different). ant attempts to impose its different value onto ago. as ant's force is stronger than ago's force in because-causation (“++” stands for stronger^3), the effect e is that ago has to take on the different force value of ant (→ e: ago-xdiff). the content of (4) is force-dynamically distributed as follows (following notational convention, the content of (4) is added to the force-dynamic formula of (5) as subscripts, rendered here in brackets): (6) c: ago[ball]-x[stationariness], ant[wind]-xdiff[motion] (++) → e: ago[ball]-xdiff[motion] (6) reads: the cause c involves conceptualizing the ball in the function of ago with the initial state value (x) of being stationary. ago is weaker than ant (the wind). ant is conceptualized as imposing the different value (xdiff) of motion onto ago (the ball). as ant's imposing motion is conceptualized as being the stronger (++) force than ago's inert force, the cognized effect e is indeed that the wind (ant) causes the ball (ago) to move. a.2. force dynamics applied to stock-flow reasoning as a novel contribution to the study of stock-flow reasoning, it is now shown that the force-dynamic elements in (5) readily offer themselves to capture the basic causal mechanisms that are at work in stock-flow reasoning. (7) lays out all the force-dynamic elements that are involved in sf reasoning – exemplified in relation to how people conceptualize climate change (the details of (7) will become clear as the analysis proceeds). (7) the force-dynamic elements of sf reasoning (exemplified with atmospheric co2 level) a. agonist (ago): stock (atmospheric co2 level) b. antagonist1 (ant1): stock increaser (burning of fossil fuels) c. antagonist2 (ant2): stock “decreaser” (absorption mechanisms of forests and oceans) d. x-state: at a certain level e. xdiff-action1: increase level f. xdiff-action2: decrease level g. xdiff-action3: remain level h.
stronger than (++) i. larger than (>) j. equal to (=) looking at (7), we may note that it is proposed that the stock can always be conceptualized as ago (7a; atmospheric co2 level), and the forces that increase or decrease the stock always as ant (7b–c; the burning of fossil fuels as an atmospheric-co2-level increaser, and absorption mechanisms of forests and oceans as an atmospheric-co2-level “decreaser”). the initial conceptualized state of ago (atmospheric co2 level) before the intervention of ant is always to be at a certain level (7d). a force interaction that results in increase, decrease, or no increase/decrease of the stock (of atmospheric co2 level) always involves ant trying to intervene with ago's state (7e–g), and depending on whether ago or ant is conceptualized as stronger (7h), the end result relates either to a change of stock amount – more increase than decrease or more decrease than increase (7i) – or to no change of stock amount (7j). ^1 talmy (2000) borrowed the terms agonist and antagonist from physiology, where these terms stand for members of specific opposed muscle pairs. in force dynamics, these terms are used in a different sense than in physiology (see stocker, 2014). ^2 ant is always stronger than ago when the causal relationship can be phrased with words such as “caused”, “because”, “therefore”, and the like. there are other forms where ago is stronger than ant, e.g., in “the ball did not roll despite the wind” (see stocker, 2014). ^3 talmy (2000) uses the symbol “+” to represent the force-dynamic meaning stronger than. however, since this symbol is already used in mathematics to represent the meaning plus, and since a few basic mathematical symbols (>, =) will later on in this article be used in a force-dynamic context, we use “++” to represent the force-dynamic meaning stronger than so as to avoid potential confusion with the mathematical meaning plus. it is now shown how the elements of (7) can capture level-1 causality and
level-2 causality in their conceptual entirety. (8a) and (8b) represent force-dynamic reformulations of the slightly formalized verbal formats of (2a) and (2b), respectively (they represent level-1 causality). (8) level-1 causality of sf reasoning in force-dynamic format, exemplified with atmospheric co2 level a. c: ago[atmospheric co2 concentration]-x[at a certain level], ant[burning of fossil fuels]-xdiff[increase level] (++) → e: ago[atmospheric co2 concentration]-xdiff[increase level] verbalized: the burning of fossil fuels causes_level-1 atmospheric co2 concentration to increase. abbreviation: emission-causes-co2-increase b. c: ago[atmospheric co2 concentration]-x[at a certain level], ant[forests and oceans]-xdiff[decrease level] (++) → e: ago[atmospheric co2 concentration]-xdiff[decrease level] verbalized: absorption mechanisms of forests and oceans cause_level-1 atmospheric co2 concentration to decrease. abbreviation: absorption-causes-co2-decrease (8a) suggests that the atmospheric co2 concentration is force-dynamically conceptualized as ago. naturally, before any co2-concentration-increasing or -decreasing force interferes, the co2 concentration is conceptualized as being at a certain level – thus “at a certain level” functions as the x-value of ago. in (8a), the role of ant is taken on by the burning of fossil fuels. recall that in elementary force dynamics, ant is by definition conceptualized as having a value different from that of ago, which it tries to impose onto ago. here, ant carries within it the force value (xdiff) of increasing ago's (the co2 concentration's) initial state of remaining at a certain level. the burning-of-fossil-fuels' (ant's) force of increasing the co2 concentration is conceptualized as being stronger (++) than the co2 concentration's basic state of remaining at a certain level.
all this – cognizing the co2 concentration's (ago's) initial state (the state before any forces act upon ago) as being at a certain level and cognizing a stronger ant force that imposes an increasing-level force onto ago – represents the force-dynamic cause c. thus, given that ant is conceptualized as being stronger than ago, the cognized effect e is indeed that the burning of fossil fuels (ant) causes the co2 concentration (ago) to rise. as will be seen, the whole force-dynamic conceptualization of (8a) will become part of still larger force-dynamic conceptualizations in sf reasoning. in relation to this, it will be convenient to abbreviate (8a); thus an abbreviation has already been introduced: emission-causes-co2-increase (recall that this abbreviation stands for the entire force-dynamic structure of (8a)). force-dynamically, (8b) reads very similarly to (8a), as only some content values change. the ago at the outset is the same in (8b) as in (8a): the cause (c) again involves cognizing the co2 concentration's (ago's) initial state (the state before any forces act upon ago) as being at a certain level. but this time ago's initial state is conceptualized as being intervened upon by a different ant: the forest and ocean absorption mechanisms function as ant, and as such ant carries within it the force to decrease ago's initial state of remaining at a certain level. as ant is again stronger (++) than ago, the conceptualized effect (e) is this time that the co2 concentration decreases. formally, (8b) is abbreviated to: absorption-causes-co2-decrease (recall that this abbreviation stands for the entire force-dynamic structure of (8b)). as already shown in (2), the basic causal structure of sf reasoning involves a two-level causal hierarchy. so far, we have examined level-1 causality from a force-dynamic viewpoint.
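purely as an illustration (not part of the force-dynamic formalism itself), the resolution rule shared by (5), (8a), and (8b) – the resultant relates to ago and takes on ant's xdiff value whenever ant is the stronger (++) force – can be sketched in code; all names below are ours:

```python
from dataclasses import dataclass

@dataclass
class ForceEntity:
    name: str
    value: str  # the state or action value (x for ago, xdiff for ant)

def resultant(ago: ForceEntity, ant: ForceEntity, ant_stronger: bool = True) -> str:
    """the resultant (effect) of a force-dynamic interaction always
    relates to ago: if ant is the stronger (++) force, ago takes on
    ant's different value (xdiff); otherwise ago keeps its value (x)."""
    return ant.value if ant_stronger else ago.value

# example (6): the ball started rolling because of the wind
ball = ForceEntity("ball", "stationariness")
wind = ForceEntity("wind", "motion")
print(resultant(ball, wind))  # motion

# example (8a): emission-causes-co2-increase
co2 = ForceEntity("atmospheric co2 concentration", "at a certain level")
emission = ForceEntity("burning of fossil fuels", "increase level")
print(resultant(co2, emission))  # increase level
```

setting `ant_stronger=False` corresponds to despite-type interactions (footnote 2), where ago keeps its initial value.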
level-2 causality now involves force-dynamically conceptualizing atmospheric co2 increase with the following formulation for (2c), where co2 increase is larger than decrease: (9) level-2 causality (increase > decrease) of sf reasoning in force-dynamic format, exemplified with increasing atmospheric co2 level (with abbreviations) c: ago[atmospheric co2 concentration]-x[at a certain level], ant: [emission-causes-co2-increase > absorption-causes-co2-decrease]-xdiff[increase level] (++) → e: ago[atmospheric co2 concentration]-xdiff[increase level] verbalized: larger increase of atmospheric co2 concentration [increase which is caused_level-1 by the burning of fossil fuels] than decrease of atmospheric co2 concentration [decrease which is caused_level-1 by absorption mechanisms of forests and oceans] causes_level-2 atmospheric co2 concentration to increase. thus, ago in (9) is still (as in all other examples) the atmospheric co2 concentration. the role of ant is now force-dynamically complex: ant now stands for “increase is larger than decrease” (where increase and decrease themselves also have a cause, as captured by the formal abbreviations; the larger-than relation is represented with the standard mathematical symbol for it, >). this semantically complex ant (increase > decrease) naturally carries within it the force of increasing ago's (the co2 concentration's) initial state of remaining at a certain level. thus, given that ant is conceptualized as being stronger than ago, the cognized effect e is that increase > decrease causes the co2 concentration (ago) to rise. the two remaining sf level-2-causality possibilities (co2 decrease or co2 remaining constant) can be formally captured in very similar ways to (9): (10) level-2 causality (decrease > increase; increase = decrease) of sf reasoning in force-dynamic format, exemplified with decreasing and constant atmospheric co2 level a.
c: ago[atmospheric co2 concentration]-x[at a certain level], ant: [absorption-causes-co2-decrease > emission-causes-co2-increase]-xdiff[decrease level] (++) → e: ago[atmospheric co2 concentration]-xdiff[decrease level] verbalized: larger decrease of atmospheric co2 concentration [decrease which is caused_level-1 by absorption mechanisms of forests and oceans] than increase of atmospheric co2 concentration [increase which is caused_level-1 by the burning of fossil fuels] causes_level-2 atmospheric co2 concentration to decrease. b. c: ago[atmospheric co2 concentration]-x[at a certain level], ant: [emission-causes-co2-increase = absorption-causes-co2-decrease]-xdiff[remain level] (++) → e: ago[atmospheric co2 concentration]-xdiff[remain level] verbalized: equal increase of atmospheric co2 concentration [increase which is caused_level-1 by the burning of fossil fuels] and decrease of atmospheric co2 concentration [decrease which is caused_level-1 by absorption mechanisms of forests and oceans] causes_level-2 atmospheric co2 concentration to remain constant. thus, in (10a) the complex ant represents the relationship decrease > increase, which imposes onto ago (atmospheric co2 level) the force of decreasing the co2 level, and in (10b) ant represents the relationship increase = decrease, which imposes onto ago (atmospheric co2 level) the force of keeping the co2 level constant. in sum, this section has shown how the force-dynamic elements of (7) can fully flesh out the complex causal mechanisms that underlie sf reasoning. in the following sections, different presentation formats for stock-flow co2 accumulation will be investigated to see how well they represent (3), the two-level causality, and (7), the basic causal elements of sf reasoning.
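the three level-2 resultants of (9), (10a), and (10b) amount to a comparison of the two level-1-caused flows; a minimal sketch (the function name is ours):

```python
def level2_effect(increase: float, decrease: float) -> str:
    """level-2 causality: the relation between the level-1-caused
    increase (emission-causes-co2-increase) and decrease
    (absorption-causes-co2-decrease) acts as a complex ant that
    imposes its xdiff value onto the stock (ago)."""
    if increase > decrease:
        return "increase level"   # (9): increase > decrease
    if decrease > increase:
        return "decrease level"   # (10a): decrease > increase
    return "remain level"         # (10b): increase = decrease

print(level2_effect(8.0, 5.0))  # increase level
print(level2_effect(5.0, 5.0))  # remain level
```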
we will argue that the better a presentation format for stock-flow reasoning represents the underlying force-dynamic structure, the better people can understand stock-flow reasoning (in the following sections exemplified with co2 accumulation). b. pictorial-schematic presentation format for stock-flow co2 accumulation as fischer and colleagues write: the structure of sf systems is often explained by a bathtub analogy: the water level (stock) in a bathtub increases if the inflow of water through the faucet exceeds the outflow through the drain; the water level drops if the outflow exceeds the inflow (2015, p. 2). this bathtub analogy readily lends itself to a pictorial-schematic presentation format for sf co2 accumulation. thus, in order to investigate how well a pictorial bathtub analogy of sf reasoning can represent the underlying force-dynamic structure, we have devised such a pictorial-schematic presentation (see figure 1). this is how we would intuitively present the bathtub analogy in a pictorial-schematic presentation format; future studies would have to determine whether there is an optimal way to represent the bathtub analogy for climate change in a pictorial-schematic format. all pictorial-schematic elements of stock-flow co2 accumulation in figure 1 have their ready analogues in potential water increase or decrease in a bathtub: instead of an increasing, decreasing, or constant water level in a bathtub container, we have an increasing, decreasing, or constant co2 level in an “atmosphere container”; instead of water entering and leaving a bathtub, we now have co2 “entering and leaving” the atmosphere; instead of larger/smaller amounts of inflow and outflow of water, we have larger/smaller amounts of inflow and outflow of co2 (larger and smaller amounts of in- and outflow are depicted by larger and smaller arrows, respectively). as a basic guide to analyzing the causal force-dynamic content of figure 1 we first use (3).
this will allow us to check whether the pictorial-schematic representation of sf reasoning in figure 1 explicitly represents two-level causality. then (7) – the force-dynamic elements of sf reasoning – will be used as a force-dynamic check to examine whether a pictorial bathtub analogy of sf reasoning (figure 1) can represent all force-dynamic elements that are involved in basic sf reasoning. as examined in (3), sf level-1 causality involves two basic causal relationships. first, in level-1 causality (3a), a (i.e., ant1) increases the stock (ago). figure 1(a–c) depicts this with an arrow pointing into the stock (labeled “co2 in (emission: burning of fossil fuels)”), suggesting co2 increase in the atmosphere. second, in level-1 causality (3b), b (i.e., ant2) decreases the stock (ago). figure 1(a–c) depicts this with an arrow pointing out of the atmosphere (labeled “co2 out (absorption: forests, oceans)”), suggesting co2 “leaving” the atmosphere. as also examined in (3), sf level-2 causality involves three basic causal relationships: (3c) increase > decrease causes stock increase; (3d) decrease > increase causes stock decrease; and (3e) increase = decrease causes the stock to remain constant. in figure 1, the > and = relationships are captured by the different sizes of the two solid-line arrows. thus, for instance, if (depicting (3c)) the inward-pointing arrow is larger than the outward-pointing arrow, then it can be deduced that this results in co2 increase (symbolized with the upward-pointing dotted arrow). force-dynamically we may furthermore note that figure 1 depicts all ten force-dynamic elements that are involved in basic sf reasoning (cf. (7)): (11) the force-dynamic elements of sf reasoning in the pictorial bathtub-analogy format (exemplified with atmospheric co2 level) a. ago: stock (here co2 level) is pictorial-schematically represented by a dotted line symbolizing the co2 level, complemented by the verbal label co2 level b.
ant1: stock increaser (emission) is pictorial-schematically represented by an inward-pointing arrow, complemented by the verbal labels co2 in and emission: burning of fossil fuels c. ant2: stock “decreaser” (absorption) is pictorial-schematically represented by an outward-pointing arrow, complemented by the verbal labels co2 out and absorption: forests, oceans d. x-state: being at a certain given level; pictorial-schematically represented by the horizontal orientation of the dotted co2 level line and by adding the verbal label level next to the verbal label co2 e. xdiff-action1: increase level is pictorial-schematically represented by the dotted upward arrow, complemented by the verbal label increase f. xdiff-action2: decrease level is pictorial-schematically represented by the dotted downward arrow, complemented by the written label decrease figure 1. pictorial-schematic presentation format for stock-flow co2 accumulation (based on the bathtub analogy). (a) more co2 gets into the atmosphere (symbolized by the larger inward-pointing arrow) than out (smaller outward-pointing arrow), resulting in co2 increase (upward-pointing arrow). (b) more co2 leaves the atmosphere (larger outward-pointing arrow) than comes in (smaller inward-pointing arrow), resulting in co2 decrease (downward-pointing arrow). (c) the same amount of co2 enters and leaves the atmosphere (symbolized by equal-sized inward- and outward-pointing arrows), resulting in a co2 concentration that remains constant (symbolized by a horizontal non-pointed strip). g. xdiff-action3: remain level is pictorial-schematically represented by the dotted horizontal non-pointed strip, complemented by the written label constant h.
stronger than (++): is pictorial-schematically represented by assigning the larger flow arrow to the stronger force entity and the smaller flow arrow to the weaker force entity (the force relationship itself has no additional verbal label; it is thus only symbolized by the convention that stronger = larger and weaker = smaller). i. larger than (>): is (similarly to h) pictorial-schematically represented by assigning the larger flow arrow to the larger flow quantity and the smaller flow arrow to the smaller flow quantity j. equal to (=): is pictorial-schematically represented by assigning equally sized flow arrows to the inflow and outflow quantities summing up the analysis of figure 1, one can notice that the pictorial-schematic presentation format for stock-flow reasoning, together with its complementing verbal labels, depicts the causal structure of sf reasoning about the co2 level in the atmosphere in its entirety: it depicts both levels of the causality hierarchy (level-1 and level-2 causality, cf. (3)) as well as all force-dynamic elements that make up basic sf reasoning (cf. (7)). c. verbal presentation format for stock-flow co2 accumulation consider the following verbal format (from fischer et al., 2015) that is used to assess lay people's sf understanding in the context of climate change: (12) verbal presentation format for stock-flow co2 accumulation co2 emissions are caused by the burning of fossil fuels and lead to an increase of atmospheric co2 concentration. co2 absorptions are caused by forests and oceans and decrease atmospheric co2 concentration. co2 emissions are currently twice as high as co2 absorptions. imagine that emissions were reduced by 30%: how would the atmospheric co2 concentration react? a. atmospheric co2 concentration would increase. b. atmospheric co2 concentration would decrease. c. atmospheric co2 concentration would remain constant.
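the arithmetic behind the scenario in (12) can be checked directly (unit-free numbers of our own choosing, not actual emission data):

```python
absorption = 1.0              # co2 absorptions (arbitrary units)
emission = 2.0 * absorption   # "emissions are currently twice as high"
emission *= 1 - 0.30          # "emissions were reduced by 30%"

# emission is now 1.4, still larger than absorption (1.0),
# so answer (a) is correct: co2 concentration would still increase
print(emission > absorption)  # True
```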
this format yielded high understanding of sf problems: the majority of the participants correctly inferred that as long as there is more co2 increase than decrease (which is what the scenario above suggests), the co2 level rises. let us now examine how extensively (12) covers the underlying causal (force-dynamic) structure of basic sf reasoning. as with the pictorial-schematic format of the bathtub analogy examined in the previous section, (3) first serves again as a guide to analyze whether both levels of the causality hierarchy (level-1 and level-2 causality) are represented; then (7) is again used to see how extensively the force-dynamic elements are represented. the sf level-1 causal relation (3a) is clearly covered in (12) by the sentence “co2 emissions are caused by the burning of fossil fuels and lead to an increase of atmospheric co2 concentration”. sf level-1 causal relation (3b) is also covered: “co2 absorptions are caused by forests and oceans and decrease atmospheric co2 concentration”. in (12), the level-2-causality relationship (3c) increase > decrease is covered by the two sentences “co2 emissions are currently twice as high as co2 absorptions. imagine that emissions were reduced by 30%” as well as, of course, by the corresponding correct answer “atmospheric co2 concentration would increase”. the causal thinking behind the other two theoretically possible basic relationships – (3d: decrease > increase) and (3e: increase = decrease) – is covered by the (wrong) answers “atmospheric co2 concentration would decrease” and “atmospheric co2 concentration would remain constant”, respectively. figure 2. coordinate-graphic presentation format for stock-flow co2 accumulation. the graph plots time on the x-axis and hypothesized amounts of atmospheric co2 in giga-tons on the y-axis. the predicted emission and absorption curves are plotted. in (12), how extensively are the force-dynamic elements that are involved in basic sf reasoning (cf. (7)) represented? this is shown in (13). if force-dynamic elements are missing, this is symbolized with “—”; “n.a.” stands for not applicable. (13) the force-dynamic elements of sf reasoning in verbal format (exemplified with atmospheric co2 level) a. ago: “co2 concentration” b. ant1: “burning of fossil fuels” c. ant2: “forests and oceans” d. x-state: — (x[at a certain level] has to be inferred) e. xdiff-action1: “co2 concentration would increase” f. xdiff-action2: “co2 concentration would decrease” g. xdiff-action3: “concentration would remain constant” h. stronger than (++): n.a. i. larger than (>): n.a. j. equal to (=): n.a. hence, we may note that (12) explicitly represents almost all force-dynamic elements that are involved in basic sf reasoning. what is not explicitly expressed in (12) is that ago's (the co2 concentration's) state is conceptualized as initially being at a certain level (before any intervention takes place). however, given the high sf understanding that (12) has produced (fischer et al., 2015), one may assume that this omitted piece of information can easily be inferred. that relational information such as increase > decrease (13h–j) is not represented in (12) lies in the nature of the task: after all, the very point of this task is for the participants to infer relevant relational information, such as increase > decrease, from the other given information. in sum, this section has shown that the causal force-dynamic structure of sf reasoning about the atmospheric co2 level is captured almost in its entirety by the verbal presentation format of fischer and colleagues (2015), and that the one force-dynamic value missing – the x-state at a certain level – can probably be easily inferred. d.
coordinate-graphic presentation format for stock-flow co2 accumulation figure 2 (as used in fischer et al., 2015) shows one of the most common ways of representing stock-flow co2 accumulation in studies on understanding sf phenomena: a coordinate-graphic presentation format based on standard scientific representation norms (plotting a relation of two variables on an x- and y-axis). it is now examined how well the coordinate-graphic presentation format for stock-flow co2 accumulation represents the underlying causal structure of sf reasoning. we may note that the graphic presentation of figure 2 depicts neither level-1 causality (3a) nor level-1 causality (3b). hence, neither the actual causal interplay between emission and co2 increase (how more emission leads to more co2) nor that between absorption and co2 decrease (how more absorption leads to less co2) is depicted in the coordinate-graphic presentation. similarly, one also does not find level-2 causality relationships directly represented in the graph of figure 2. only part of the relationship (3c) “increase > decrease causes stock increase” can be logically inferred; for instance for 2050, by noticing that emission is still higher on the y-axis than absorption (this, of course, allows one to infer “increase > decrease”). however, the causal consequence of this (“causes stock to increase”) can no longer be inferred from the visual information given by the graph of figure 2. hence, while parts of level-2 causality can be logically inferred by inspecting the visual information in the graph, the actual causal consequences cannot be directly derived by solely inspecting the visual information in the graph. finally, we turn to the question: how extensively are the force-dynamic elements that are involved in basic sf reasoning (cf. (7)) represented in figure 2? (14) the force-dynamic elements of sf reasoning in the coordinate-graphic format (exemplified with atmospheric co2 level) a. ago: “co2 in giga tons” b.
ant1: the plotted line labeled "emission"
c. ant2: the plotted line labeled "absorption"
d. x-state: — (x at a certain level has to be inferred)
e. xdiff-action1: — (in accompanying text only)
f. xdiff-action2: — (in accompanying text only)
g. xdiff-action3: — (in accompanying text only)
h. stronger-than (++): n.a.
i. larger-than (>): n.a.
j. equal-to (=): n.a.
10.11588/jddm.2019.1.51357 jddm | 2019 | volume 5 | article 1 | 7
figure 3. pictorial-schematic presentation format for stock-flow co2 accumulation where increase/decrease/constancy has to be inferred. could be combined with a verbal format.
the force-dynamic analysis of the graph in (14) is summarized quickly: the force-dynamic interaction between the ago and the ant is not represented, because the action-values of the antagonists (the powers that lead to increase, decrease, or constancy) are not represented in the graph. therefore, none of the actual causal interplay between the atmospheric co2 level and emission/absorption that leads to co2 increase is graphically displayed. while the coordinate-graphic presentation format (14) displays the underlying causal relationships only marginally, it does of course display information that the pictorial-schematic presentation format (figure 1) and the verbal presentation format (12) do not: predictions of how levels of emission and absorption develop over time. such information is of course vital and indispensable, also for the public. thus, what might be called for in order to optimize public understanding of climate change (cf. introduction) is a synthesis of the coordinate-graphic, the verbal, and the pictorial-schematic formats (see discussion).
e. discussion
in this article two main endeavors were undertaken.
first, we developed a force-dynamic account of stock-flow reasoning and showed that our new theoretical approach to stock-flow reasoning can explain all basic cognitive aspects that are involved in stock-flow reasoning. second, we carried out an analysis in terms of this new force-dynamic account of stock-flow reasoning in relation to three different modes of presentation of co2 accumulation: (a) the pictorial-schematic format of the bathtub analogy (as shown in figure 1), (b) the verbal format (as shown in (12) and (13)), and (c) the coordinate-graphic format (as shown in (14) and in figure 2). force-dynamic theory explains why (b) produces better understanding than (c): the force-dynamic elements necessary for a correct understanding are present only in (b). thus, our force-dynamic account has the theoretical power to explain the empirical finding of fischer and colleagues (2015) that a verbal format of stock-flow reasoning produces a better understanding than studies using the coordinate-graphic format (cronin, gonzalez, & sterman, 2009; sterman & sweeney, 2002, 2007) have found. additionally – as a new empirical question – we can also turn to the question of how the bathtub-analogy-based pictorial-schematic presentation format could be used to enhance understanding of sf reasoning (including sf reasoning about climate change). so far (in section b) we have only shown (see figure 1) pictorial-schematic presentation formats that "give away" whether there will be stock increase, decrease, or constancy (because the purpose was to identify the underlying causal and force-dynamic structure, not to present ways in which sf reasoning abilities might be tested).
however, any investigation into basic sf reasoning skills must of course not give away what has been termed level-2 causality in this paper (increase > decrease causes stock increase; decrease > increase causes stock decrease; and increase = decrease causes the stock to remain constant), because it is just the correctness of these inferences that is usually tested when investigating how well people perform in sf reasoning (cronin, gonzalez, & sterman, 2009; fischer et al., 2015; sterman & sweeney, 2002, 2007). figure 3 gives one example of a pictorial-schematic presentation format for stock-flow co2 accumulation that does not give away level-2 causality. the pictorial-schematic format in figure 3 could, for instance, be combined with the verbal format of fischer and colleagues (2015), with (12), to assess basic sf reasoning skills. while answering the questions in (12), participants in corresponding studies could use a graph along the lines of figure 3 to think their answers through. as participants read the addition ("twice as high") and subtraction ("reduced by 30%") information, they might for instance inspect the graph and mentally project the corresponding higher and lower co2 levels into the picture.
figure 4. pictorial-schematic complementation of the coordinate-graphic presentation format for stock-flow co2 accumulation. furthermore, this combined format could also be presented together with an additional coordinate graph that shows the real, actual increase over the years.
such a visual aid could possibly lead to still higher understanding of sf reasoning about climate change. schematic visual information – like that in figure 3, which highlights spatial relations rather than actual visual content – might indeed assist reasoning skills.
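the level-2 causality that such formats must leave for participants to infer reduces, at bottom, to a simple accumulation rule. a minimal sketch (in python; the co2 figures below are invented purely for illustration and are not data from any source) makes the rule explicit:

```python
def step(stock, inflow, outflow):
    """one time step of stock-flow accumulation: the stock changes
    by the difference between inflow and outflow."""
    return stock + (inflow - outflow)

# invented, illustrative figures (gigatons); not data from any source
stock = 850.0
emission, absorption = 9.0, 5.0

# level-2 causality: as long as emission > absorption, the stock increases,
# even while emission itself declines year by year
for year in range(3):
    stock = step(stock, emission, absorption)
    emission -= 1.0
# stock has risen from 850 to 859 (+4, +3, +2)
```

the loop is exactly the inference that sf tasks probe: declining emissions still increase the stock as long as they exceed absorption.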
people, for instance, do better in logical reasoning when little or no imagery is required but the problem can be grasped in a clear spatial layout (knauff, 2013). finally: what could be done to counteract so-called "sf failure" – the widespread misinterpretation of sf problems that involve emission/absorption coordinate graphs (cronin, gonzalez, & sterman, 2009; sterman & sweeney, 2002, 2007)? this is a very important question, as such coordinate graphs feature prominently in official reports about global warming (e.g., intergovernmental panel on climate change [ipcc], 2013). a simple attempt to counteract sf failure would be to add the pictorial-schematic presentation format of figure 1 to the coordinate-graphic presentation format of figure 2. a text that could be added to fig. 1 is, for instance: "2050. still co2 increase. as long as there is more co2 emission than absorption, there is co2 increase". this is represented in figure 4. as shown in figure 4, bringing together different presentation formats (coordinate-graphic and pictorial-schematic, which could furthermore be accompanied by a verbal format) could form a powerful synthesis that could help (a) researchers to more adequately assess basic stock-flow reasoning skills and (b) the public to better understand reasoning about climate change. together with coordinate graphs showing the real increase over the years, this synthesis would allow future predictions of emission and absorption developments to be represented (sf coordinate format) while at the same time making the underlying force-dynamic (causal) structure transparent (sf verbal and pictorial-schematic formats). given the "relentless rise of carbon dioxide" (as nasa puts it; see note 4), and given the dangers that come with it for our planet, we should not too easily accept the "sf failure" of people as a scientific fact.
quite the opposite: we have offered a theoretical (force-dynamic) interpretation of an empirical finding (of fischer and colleagues, 2015) that makes a strong case that – as long as the presentation format for sf reasoning represents force-dynamic causal thinking patterns – people actually show considerable sf competence. sf competence might be a much more promising basis than sf failure for promoting pro-environmental behavior in order to protect our planet.
declaration of conflicting interests: the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
author contributions: the authors contributed equally to this work.
handling editor: andreas fischer
copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license.
citation: stocker, k., & funke, j. (2018). how we conceptualize climate change: revealing the force-dynamic structure underlying stock-flow reasoning. journal of dynamic decision making, 5, 1. doi:10.11588/jddm.2019.1.51357
received: 24 aug 2018 | accepted: 03 may 2019 | published: 07 may 2019
4 https://climate.nasa.gov/climate_resources/24/
references
barbey, a. k., & wolff, p. (2007). learning causal structure from reasoning. in proceedings of the 29th annual conference of the cognitive science society (pp. 713–718). nashville, tn: erlbaum.
brehmer, b. (1976). learning complex rules in probabilistic inference tasks. scandinavian journal of psychology, 17(1), 309–312. doi:10.1111/j.1467-9450.1976.tb00245.x
clayton, s., devine-wright, p., swim, j., bonnes, m., steg, l., whitmarsh, l., & carrico, a. (2016).
expanding the role for psychology in addressing environmental challenges. american psychologist, 71(3), 199–215. doi:10.1037/a0039482
cronin, m. a., gonzalez, c., & sterman, j. d. (2009). why don't well-educated adults understand accumulation? a challenge to researchers, educators, and citizens. organizational behavior and human decision processes, 108(1), 116–130. doi:10.1016/j.obhdp.2008.03.003
fischer, h., degen, c., & funke, j. (2015). improving stock-flow reasoning with verbal formats. simulation & gaming, 46(3-4), 255–269. doi:10.1177/1046878114565058
garcia-retamero, r., wallin, a., & dieckmann, a. (2007). does causal knowledge help us be faster and more frugal in our decisions? memory & cognition, 35(6), 1399–1409. doi:10.3758/bf03193610
goldvarg, e., & johnson-laird, p. n. (2001). naive causality: a mental model theory of causal meaning and reasoning. cognitive science, 25(4), 565–610. doi:10.1207/s15516709cog2504_3
gonzalez, c. (2004). learning to make decisions in dynamic environments: effects of time constraints and cognitive abilities. human factors, 46(3), 449–460. doi:10.1518/hfes.46.3.449.50395
intergovernmental panel on climate change [ipcc] (2013). summary for policymakers. in t. f. stocker, d. qin, g.-k. plattner, m. tignor, s. k. allen, j. boschung, ... p. m. midgley (eds.), climate change 2013: the physical science basis. cambridge, uk: cambridge university press. retrieved from http://www.ipcc.ch/report/ar5/wg1/#.uuqh1bdgooc
knauff, m. (2013). space to reason: a spatial theory of human thought. cambridge, ma: mit press.
pearl, j. (2000). causality: models, reasoning, and inference. cambridge, england: cambridge university press.
spirtes, p., glymour, c., & scheines, r. (2000). causation, prediction, and search (2nd ed.). cambridge, ma: mit press.
sterman, j. d., & sweeney, l. b. (2002). cloudy skies: assessing public understanding of global warming. system dynamics review, 18(2), 207–240. doi:10.1002/sdr.242
sterman, j. d., & sweeney, l. b.
(2007). understanding public complacency about climate change: adults' mental models of climate change violate conservation of matter. climatic change, 80(3-4), 213–238. doi:10.1007/s10584-006-9107-5
stocker, k. (2014). the elements of cause and effect. international journal of cognitive linguistics, 5(2), 121–145.
talmy, l. (1985). force dynamics in language and thought. in papers from the twenty-first regional meeting of the chicago linguistic society (pp. 293–337). chicago, il: chicago linguistic society.
talmy, l. (1988). force dynamics in language and cognition. cognitive science, 12(1), 49–100. doi:10.1016/0364-0213(88)90008-0
talmy, l. (2000). toward a cognitive semantics. vol. 1: concept structuring systems. cambridge, ma: mit press.
wolff, p. (2003). direct causation in the linguistic coding and individuation of causal events. cognition, 88(1), 1–48. doi:10.1016/s0010-0277(03)00004-0
wolff, p. (2007). representing causation. journal of experimental psychology: general, 136(1), 82–111. doi:10.1037/0096-3445.136.1.82
wolff, p., & song, g. (2003). models of causation and the semantics of causal verbs. cognitive psychology, 47(3), 276–332.
doi:10.1016/s0010-0285(03)00036-7
original research
the role of difficulty in dynamic risk mitigation decisions
lisa vangsness and michael e. young
kansas state university, department of psychological sciences
previous research suggests that individuals faced with risky choices seek ways to actively reduce their risks. the risk defusing operators (rdos) that are identified through these searches can be used to prevent or compensate for (here, pre- and post-event rdos, respectively) negative outcomes. although several factors that affect rdo selection have been identified, they are limited to static decisions conducted during descriptive tasks. the factors that influence rdo selection in dynamically unfolding environments are unknown, and the relationship between task characteristics and rdo selection has yet to be mapped. we used a videogame environment to conduct two experiments to address these issues and found that experienced losses impacted risk mitigation strategy: when the task was difficult, participants experienced greater losses and were more likely to select preventive rdos (experiment 1).
additionally, risk mitigation behavior stabilized as participants gained experience with the task (experiments 1 and 2) and could be shifted by making an rdo easier to use (experiment 2). exploratory analyses suggested that these risk mitigation choices were not driven by judgments of difficulty (jods), even though participants' jods were accurate and aligned with task difficulty. this research suggests that while people seek preventive rdos when tasks are difficult and risky, risk mitigation strategy is shaped by experienced losses; decision makers do not use jods to anticipate future risks and inform risk mitigation decisions.
keywords: difficulty, risk mitigation, risk defusing operators, judgments of difficulty, dynamic environments
most people brush their teeth before work in the morning. when repeated twice a day, this small preventive measure can significantly reduce the risk of cavities and improve overall oral hygiene. despite the benefits of tooth brushing, more than half of americans report forgetting to brush their teeth at least once in the past year (delta dental, 2014). failing to brush your teeth invites risk but generally does not result in a negative outcome: unless you consistently fail to brush or are particularly susceptible to cavities, you will not require a filling. this everyday decision is a simplified example of the choices made in high-risk medical, defense, and educational situations (to name a few) around the world: is it better to expend time and energy on preventive measures, or should we wait and minimize the costs of prevention by taking action only if a negative outcome occurs? our studies sought to identify the factors that contribute to when and how risk mitigation strategies are chosen, specifically within dynamic environments that rapidly change and respond to people's actions.
previous research involving the use of risk mitigation strategies has focused on the conditions under which people will search for risk defusing operators (rdos) – actions or tools that can be used to reduce the risks associated with a decision (huber, beutter, montoya, & huber, 2001). in these experiments, participants seek rdos by asking questions about a vignette. these questions may emphasize preventive or compensatory strategies that could be employed before (pre-event rdos) or after (post-event rdos) a decision is made to reduce the likelihood or severity of a negative outcome (for a review see huber, 2012). more than a decade of research with these vignette-based descriptive tasks suggests that participants' willingness to seek rdos depends on information availability and environmental pressures. that is, risk mitigation depends not only on the environment but also on a person's ability to detect and interpret environmental cues. while vignette-based tasks fail to capture the dynamic nature of some real-world decisions, this research illustrates an important concept: people will actively engage with the environment to reduce their risks when they perceive the opportunity to do so (huber, beutter, montoya, & huber, 2001). for this reason, we will review research that uses vignette-based tasks before exploring the implications these findings have for choices made in dynamic decision-making tasks and the current studies.
risk mitigation in vignette-based decision-making tasks
within the context of vignette-based tasks, individuals initiate a search for rdos when they recognize that their desired choice is associated with an unacceptable level of risk and discontinue this search when an acceptable rdo is found (bär & huber, 2008). this search hinges on their experience with a task as well as their knowledge of risks and rdo availability.
when a scenario is unfamiliar and includes explicit cues about the detection of a negative event (e.g., a test that detects the negative side effects of a medication) or about rdo availability (e.g., access to an expert who may be aware of successful risk mitigation strategies), individuals are more likely to ask questions about these factors and use this information to make decisions. this search is less likely to occur when information cues are absent (huber & huber, 2008; huber & huber, 2003) and when individuals have background knowledge that precludes the need for additional information (huber & macho, 2001).
corresponding author: lisa vangsness, kansas state university, department of psychological sciences, 492 bluemont hall, manhattan, ks 66506, usa, e-mail: lvangsness@ksu.edu
10.11588/jddm.2017.1.41543 jddm | 2017 | volume 3 | article 5 | 1
environmental pressures also appear to play a role in the search for risk mitigation strategies. under time pressure, questions about rdos become more strategic and focused on rdo availability (huber & kunz, 2007), while requiring people to justify their choices discourages them from taking risks even when rdos are available (huber, bär, & huber, 2009). to summarize, individuals use environmental cues to determine when and how to search for risk mitigation strategies in the context of vignette-based tasks. while the factors influencing rdo search are well-studied in the context of vignette-based tasks, less is known about how rdos are used during situations that are continuously unfolding.
although participants indicate an interest in using preventive strategies when negative outcomes are difficult to detect and severe losses are expected (e.g., a symptomless virus; huber & huber, 2003), these preferences may change when decisions are encountered repeatedly within a single context (e.g., camilleri & newell, 2013; hau, pleskac, kiefer, & hertwig, 2008; hau, pleskac, & hertwig, 2010). repeated decisions allow participants to receive feedback in the form of experienced or avoided losses, which can be used to inform future risk mitigation decisions. thus, an rdo's role in a dynamic task can change depending on the event to which the decision-maker anchors their choice. for example, insurance is often considered a compensatory, pre-event rdo because it is purchased in advance but only pays out if a negative outcome occurs (e.g., huber, 2012); yet some people wait to purchase insurance until after they have sustained a loss (zaleskiewicz, piskorz, & borkowska, 2002), making insurance a post-event compensatory rdo. more research is needed to determine how individuals' willingness to seek and employ rdos is affected by the repeated choices made within dynamic environments.
learning from experience in repeated-choice tasks
initially, repeated-choice tasks involve uncertainty (knight, 1921) in that their risks are not yet known: new homeowners have not yet encountered a flood, nor has a novice physician seen the effects of a deadly virus. as individuals gain experience in a novel environment, they sample context-specific information which can be used to estimate the probability of risk (hertwig & erev, 2009) and calibrate subsequent judgments (cf. brunswik, 1952). that is, people track experienced losses and become aware of task-related cues that signal increased risks. these cues can then be used to inform subsequent risk mitigation decisions. one class of cues that signal risk encompasses those related to task difficulty.
on a broad level, difficulty correlates with risk. the more challenging the task, the more likely a person is to make errors and experience losses; the easier the task, the more likely a person is to succeed. this relationship can be used to inform decision-making: individuals may use experienced losses to help identify and calibrate their use of cues to difficulty (e.g., the visual complexity present on a radar screen), which can then be used to infer their level of risk. taken together, experienced losses and cues to difficulty should affect the probability with which an individual will elect to use either type of rdo. if a task is perceived to be risky, an individual may become more likely to use an rdo, especially when the magnitude of the loss associated with that risk is high (cf. huber & huber, 2003). ideally, preventive strategies will be employed when risks and losses are large, such as during a challenging task (huber, 2012); however, upfront costs (e.g., time, money, effort) may dissuade people from adopting preventive strategies (sigurdsson, taylor, & wirth, 2013). this is particularly true of demanding tasks that require many resources to complete. for instance, the chex decision aid, a tool intended to help air traffic controllers and tactical coordinators improve their situation awareness, does not improve performance partly because it distracts people from their primary task of monitoring the airspace. in this case, participants rarely used the preventive rdo because it occupied resources that could be used to complete the task itself (vallières, hodgetts, vachon, & tremblay, 2016). thus, compensatory strategies may be preferred during challenging tasks because they do not require any effort unless they must be used.
converging evidence from the cognitive, comparative, and motivational literatures supports the notion that people weigh the trade-offs between effort and reward when making decisions (for reviews see mitchell, 2017; walton, kennerley, bannerman, phillips, & rushworth, 2006; and locke & latham, 2002, respectively). if effort is not commensurately rewarded, people will minimize resource allocation by abandoning tasks in favor of easier or more rewarding endeavors. in this way, they strive to maximize the utility of their limited resources (kurzban, duckworth, kable, & myers, 2013) and may use the effort-reward trade-off to inform risk mitigation decisions.
the role of expertise. an individual's ability to judge task difficulty and estimate risk may be mediated by task-specific knowledge that is acquired through practice. a single task can be made difficult in many ways (e.g., the enemies in a videogame can move more quickly or more slowly; alternatively, these same enemies could take more or fewer shots to destroy). through practice, people become sensitive to the risk-reward relationships present in their environments (pleskac & hertwig, 2014) and will actively exploit them by selecting strategies that maximize their successes and rewards (lovett & anderson, 1996). this sensitivity is particularly pronounced when cues to difficulty are easily discriminable and frequently encountered (gaeth & shanteau, 1984; shanteau, 1992). when difficulty is determined by multiple task dimensions or when decision makers receive limited feedback, expertise may negatively impact decision making. under these conditions, experts are more likely to attend to irrelevant cues and may be poorly calibrated in their estimates of difficulty and risk (for a review, see koehler, brenner, & griffin, 2002).
studying risk mitigation decisions within a controlled dynamic task will allow us to determine whether people use cues to difficulty to evaluate risks and whether these relationships can be learned over time.
judgments of difficulty as a measure of resource demands
because difficulty, risk, and loss are closely intertwined, judgments of difficulty (jods) should reflect people's awareness of changing cues to difficulty and inform the strategies they pursue as they engage in a challenging task (kahneman, 1973; kanfer & ackerman, 1989; for a review see kurzban, 2016). jods should also predict the risk mitigation strategies that will allow people to achieve their goals (kurzban et al., 2013). to be successful, people must evaluate task difficulty frequently enough to detect changes in the environment that may affect their ability to effectively allocate resources toward their goals (brunswik, 1956). once a task is underway, jods can be used to reallocate resources in response to changing task demands (flavell, 1979). historically, researchers interested in metacognitive evaluations of difficulty have used knowledge assessments (e.g., multiple-choice questionnaires; comprehension) to determine the degree to which people accurately estimate the disparity between their abilities and those that the situation requires (e.g., ozuru, kurby, & mcnamara, 2012). however, jods made in static environments differ significantly from those that must be made repeatedly as situations rapidly unfold over time. recent research involving dynamic tasks suggests that people integrate multiple cues to difficulty when making jods, and that the weighting of these cues can change over time (desender, van opstal, & van den bussche, 2017; koriat, 1997).
people's ability to identify, integrate, and update cues is an integral part of selecting appropriate problem-solving strategies (lovett & schunn, 1999); thus, jods may be related to rdo selection in dynamic tasks. an illustration of this purported relationship can be found in figure 1.
figure 1. while it is likely that cues to difficulty, level of risk, and the magnitude of losses directly influence risk mitigation decisions, it is also possible that these factors are captured by individuals' judgments of difficulty (jods). evidence from two experiments revealed that jods are unaffected by the magnitude of losses incurred by an individual, and that jods do not impact risk mitigation decisions.
studying rdos in a dynamic environment
we wished to understand the influence that task difficulty and jods have on risk mitigation strategies during a dynamic task. our dynamic task was a third-person shooter videogame designed using the unity game engine (unity, 2016). interested parties can find a video and brief description of this task at http://youtu.be/q6ahswfayyy. previous literature suggested competing hypotheses regarding the relationship between task difficulty and risk mitigation. if experienced losses and anticipated risks underlie rdo selection, preventive measures (pre-event rdos) should be selected more frequently when performance is expected to worsen. in the context of this task, pre-event rdos might be selected more often when a player's in-game losses are greater and more frequent. however, it is also possible that current resource availability dominates risk mitigation decisions. if this holds true, compensatory strategies (post-event rdos) should be favored during difficult tasks that negatively impact performance. that is, players should allocate greater attention and working memory to improving their performance during a difficult level of the videogame rather than invest these resources in a pre-event rdo.
these perspectives can be summarized in the following way:
h1a (risk estimation): participants will be more likely to select pre-event rdos as their task performance becomes impaired.
h1b (resource minimization): participants will be less likely to select pre-event rdos as their task performance becomes impaired.
if risk estimation underlies rdo selection, then people should be cognizant of the relationship between difficulty and risk and can track this relationship through repeated decisions. support for this hypothesis would suggest that risk mitigation in dynamic tasks mirrors that in vignette-based tasks and can be conceptualized as a form of problem-solving. if resource minimization underlies rdo selection, then individuals should be less capable of minimizing the risks they encounter due to task difficulty. this finding would provide a simple explanation for people's tendency to violate workplace safety precautions when tasks are difficult (e.g., sigurdsson, taylor, & wirth, 2013), even though such behavior is suboptimal. however, such behavior could also be explained by h1a if risk estimation is driven by experienced losses and cannot be anticipated; an exploratory model comparison will address this issue if h1a is supported. because the cues to difficulty within our environment were relatively straightforward and varied along a single dimension, we anticipated that previous videogame experience would change the way in which these strategies were adopted. namely, if expertise enhances participants' ability to identify and use cues to difficulty and informs the selection of rdos, participants who report significant experience with videogame tasks should display early risk mitigation preferences and be less likely to sample alternative strategies at the onset of the task (h2). in other words, participants with previous videogame experience should adopt the rdo strategies outlined by h1a and h1b more quickly.
we also believed that risk mitigation behavior would stabilize as time-on-task increased, regardless of participants' previous experience with videogame tasks (h3). if these hypotheses are supported, it would suggest that expertise improves the calibration of risk mitigation activities.
experiment 1
participants
seventy-nine participants (43 female) from the general psychology pool at kansas state university completed the experimental task and received 1 hr of research credit to fulfill a course requirement. one participant experienced a computer malfunction and needed to restart the videogame; the remaining data from this participant were included in the analyses.
design and procedure
participants completed a 40-min session of a third-person shooter videogame in which they controlled the avatar of a young boy who had shrunk to miniature size and was pursued by stuffed-animal zombies inside his bedroom. during each level, stuffed-animal zombies appeared at semi-random locations and pursued the boy throughout the room. participants guided their avatar across the bedroom floor using the arrow/wasd keys and eliminated enemies with a laser cap gun that was controlled by moving and clicking the computer mouse. the goal of the videogame was to pass through as many levels as possible before the session finished. experimenters encouraged participants to pursue this goal by stating that "most participants clear eight levels before the session ends." successful completion of this goal required participants to prioritize their performance in the game because death was a time-costly event that occurred once the avatar's health was fully depleted by enemy attacks, which occurred each time a stuffed-animal zombie touched the avatar.
each enemy attack depleted 20 hit points of the avatar's 100 hit points of health. when the avatar's hit points dropped to 0, the avatar died and the game was paused for 30 s while a loading screen appeared. the purpose of this waiting period was to serve as an aversive consequence that discouraged players from using death as a gameplay strategy to avoid enemy characters. following this delay, the avatar was restored to full health and placed at a random location within the game space. participants advanced to a new level by eliminating enemies. each enemy elimination earned the player 1 point. once participants eliminated 30 enemies from the game space (raised their score from 0 to 30 points), their score reset and the game advanced to a new level with an identical layout that could be easier or harder than the last (how this was accomplished is detailed later): unlike a traditional videogame, the degree of difficulty was randomly assigned at the beginning of each level. participants' ability to track changes in task difficulty was assessed using a pop-up window that appeared at the beginning of each level and every 2 minutes during the game. this pop-up window contained two buttons that allowed participants to indicate whether the videogame was "easier" or "harder" than it was before. this format allowed participants to make comparative assessments without interpreting scale anchors and without making assumptions about the scaling of jods (for additional information, see böckenholt, 2004). once participants selected an option with the computer mouse, the pop-up window disappeared from the screen. gameplay remained paused for 3 s before and after the pop-up window appeared to reduce the performance costs associated with task interruption (altmann & trafton, 2007). after 40 mins of gameplay, the videogame ended and participants completed a demographic questionnaire that included questions about sex and videogame experience.
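the health, death-penalty, and level-advancement rules described above can be summarized in a short sketch. this is purely illustrative and is not the authors' implementation; all names (AvatarState, take_attack, eliminate_enemy) are hypothetical.

```python
# illustrative sketch of the gameplay rules described above (hypothetical
# names; not the authors' code)
MAX_HP = 100
DAMAGE_PER_ATTACK = 20
DEATH_PENALTY_S = 30
KILLS_PER_LEVEL = 30

class AvatarState:
    def __init__(self):
        self.hp = MAX_HP
        self.score = 0
        self.level = 1
        self.penalty_s = 0.0  # time lost to death loading screens

    def take_attack(self):
        """an enemy touch costs 20 hit points; at 0 hp the avatar dies,
        waits 30 s, and respawns at full health."""
        self.hp -= DAMAGE_PER_ATTACK
        if self.hp <= 0:
            self.penalty_s += DEATH_PENALTY_S
            self.hp = MAX_HP

    def eliminate_enemy(self):
        """each elimination earns 1 point; 30 points resets the score and
        advances to a new level with a freshly randomized difficulty."""
        self.score += 1
        if self.score >= KILLS_PER_LEVEL:
            self.score = 0
            self.level += 1
```

under these rules, five consecutive attacks exactly deplete the avatar's health, which is why death imposes a well-defined 30-s cost per cycle.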
participants also completed a modified version of the game engagement questionnaire (brockmyer et al., 2009). rdo selection. in an effort to ensure that all participants anchored their risk mitigation actions to the same event, rdos were made available at the beginning of each level and every subsequent 5 mins. at these times, a pop-up window invited participants to “select a tool” that they could use to improve their performance during the game. participants could select one of two tools, a shield (a pre-event rdo) or a health pack (a post-event rdo). either tool could be used to mitigate 20 hit points of damage from an enemy character by preventing an enemy attack (shield) or restoring the avatar’s health (health pack). additionally, these tools differed in how difficult they were to use. while post-event rdos could be used at any point following an enemy attack, pre-event rdos needed to be timed to the enemy attack because they only shielded the avatar for up to 5 s and needed to be redeployed once an enemy character touched the shield. selecting a tool placed five of these items into the avatar’s inventory, which was indicated by a set of icons in the lower left corner of the screen. although participants received an opportunity to restore their inventory every 5 mins, they could neither stockpile items nor could they hold items of more than one type. thus, participants needed to use their experiences in the game to develop a risk mitigation strategy that considered the strengths and weaknesses of both the tools and themselves. participants could use these tools at their discretion by pressing the f key on the keyboard. each time participants used a tool, they received notification by visual and auditory cues: a 250-ms sound and a 3-d bubble accompanied each rdo use. both the sound and the bubble were specific to the tool and could be used to differentiate tool choice. 
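the inventory rules above (five items per selection, restocking every 5 min, no stockpiling, one tool type at a time) can be sketched as follows. the class and method names are hypothetical, not taken from the authors' software.

```python
# hypothetical sketch of the rdo inventory rules described above
class Inventory:
    def __init__(self, tool):
        self.tool = tool    # "shield" (pre-event) or "health_pack" (post-event)
        self.count = 5      # each selection places five items in the inventory

    def restock(self, tool):
        """every 5 min the player chooses again; because items cannot be
        stockpiled and only one tool type can be held, the inventory is
        simply replaced."""
        self.tool = tool
        self.count = 5

    def use(self):
        """pressing f consumes one item; returns the tool used, or None
        when the inventory is empty."""
        if self.count == 0:
            return None
        self.count -= 1
        return self.tool
```

note how restocking discards any unused items, which is what forces participants to commit to a single risk mitigation strategy for each 5-min window.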
after a tool restored hit points or deflected an enemy attack, one of the five icons disappeared from the bottom of the screen. for a screenshot of the videogame task, see figure 2.

figure 2. a screenshot from the videogame task depicts the player's avatar surrounded by three enemy characters. the player's health and remaining shields are depicted in the lower left corner.

task difficulty and risk. task difficulty was manipulated as a between-subjects variable (difficulty type) by adjusting one characteristic of the enemy characters' behavior at the start of each level. this characteristic was automatically adjusted within-subjects by a programmed algorithm that randomly selected a value from a uniform distribution that represented a wide range of difficulty, as determined through participants' performance during pilot testing (vangsness, 2017). this randomly selected value was held throughout the level, while all other characteristics of the enemy characters' behavior remained constant during the session. for example, participants assigned to the "speed" condition saw the enemy characters' rate of movement change between levels but did not experience changes in the enemy characters' hit points or population rate. similarly, participants assigned to the "population" condition experienced changes in how quickly enemy characters appeared in the level, but did not see changes in the enemy characters' speed or hit points. a brief description of the characteristics and their sampling values can be found in table 1. previous analyses of gameplay data showed that difficulty was inversely related to gameplay performance (vangsness, 2017). that is, participants were attacked more frequently and experienced greater losses when the manipulated difficulty parameter took values near the upper limit of the range.
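the per-level difficulty assignment described above can be sketched in a few lines: one enemy characteristic is drawn uniformly from its condition's range while the other two stay at constant values. the ranges and constants come from table 1 of this paper; the function and variable names are ours.

```python
# illustrative sketch of the per-level difficulty assignment: one
# characteristic varies uniformly, the rest stay constant (values from
# table 1; names are hypothetical)
import random

RANGES = {                        # condition: (low, high) of the varied trait
    "population": (1.0, 25.0),    # enemy appearance interval, s
    "speed": (0.2, 15.0),         # enemy speed, unity units
    "strength": (20.0, 400.0),    # enemy hit points
}
CONSTANTS = {"population": 10.0, "speed": 5.0, "strength": 115.0}

def level_settings(condition, rng=random):
    """returns the three enemy parameters for a freshly started level."""
    settings = dict(CONSTANTS)                    # start from the constants
    low, high = RANGES[condition]
    settings[condition] = rng.uniform(low, high)  # vary only one trait
    return settings
```

because the value is held fixed within a level and redrawn between levels, difficulty is decoupled from player skill, unlike the adaptive difficulty of a commercial game.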
conversely, participants experienced fewer losses when this parameter took on smaller values. these analyses suggest that risk is higher during more difficult levels and lower during easier levels. while it is theoretically possible to estimate the moment-by-moment risks incurred by each participant, this estimation would require knowledge of many factors (e.g., the skill of the individual player; location, velocity, and enemies' expected time of arrival; etc.) that fluctuate considerably during the task. as we were interested in broad, robust patterns of behavior that transcend a single, specific context, we defined risk as it varied with task difficulty. tutorial level. the videogame included a tutorial level to familiarize participants with the layout and controls of the game. the tutorial level was identical to the videogame task in all respects but contained only three enemy characters, which participants were required to eliminate before progressing to the first level of the game. because the tutorial level differed significantly from the remainder of the videogame, data from this portion are excluded from subsequent analyses.

results

risk mitigation strategy. we explored the factors underlying participants' risk mitigation strategies with a multilevel logistic regression model that predicted the probability that a participant would select either tool (health pack, shield) using participants' game performance, time-on-task, previous videogame experience, and difficulty type (population, speed, strength) in the fixed effect structure. game performance was defined as the rate of damage accrued from enemy characters since the most recent rdo selection (total damage since last rdo choice ÷ time since last rdo choice), videogame experience as the summed responses to relevant items from the demographic questionnaire, and time-on-task as the amount of time that had elapsed since the beginning of the first level of the videogame task.
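the game-performance predictor defined above (damage since the most recent rdo selection divided by the time since that selection) can be computed as in the sketch below; the function name and data layout are our assumptions, not the authors' code.

```python
# minimal sketch of the game-performance predictor: damage accrued since
# the last rdo selection divided by elapsed time (hypothetical names)
def damage_rate(damage_events, last_rdo_time, now):
    """damage_events: list of (timestamp_s, hit_points_lost) tuples;
    returns hit points lost per second over the current window."""
    window = [hp for t, hp in damage_events if last_rdo_time <= t <= now]
    elapsed = now - last_rdo_time
    return sum(window) / elapsed if elapsed > 0 else 0.0
```

dividing by elapsed time matters because rdo selections occur at irregular intervals; raw damage totals would confound exposure time with performance.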
the random effect structure was selected using aic comparisons (akaike, 1973), which supported a structure that included the intercept, game performance slope, and time-on-task slope. this specification allowed the model to account for participant differences in overall ability, perceptions of difficulty, and rate of learning. a full disclosure of random effect comparisons can be found in the appendix. the findings from this analysis are illustrated by figure 3. the slope in each time-slice panel illustrates that risk mitigation strategy was significantly affected by participants' damage rate since last rdo selection, and that this relationship changed over time. early in the game, participants had little preference for either rdo, but as time-on-task increased they learned to use preventive rdo strategies to compensate for heavy losses. when participants performed well during the later stages of the game, they became increasingly likely to select post-event rdos. this pattern of behavior aligns with our hypothesis that risk estimation underlies rdo selection (h1a).

table 1. both experiments included a between-subjects manipulation in which participants experienced different difficulty types.

condition                | description                                                              | randomly selected values | constant value
population rate (n = 26) | the rate at which enemies appeared in the game space                     | 1–25 s                   | 10 s
speed (n = 23)           | the speed at which enemy characters could travel                         | 0.2–15.0 unity units     | 5.0 unity units
strength (n = 30)        | the number of hit points enemies had when they first appeared in a level | 20–400 hit points        | 115 hit points

note. unity units are an arbitrary measure that can be used to scale game objects with respect to one another.
specifically, participants selected risk mitigation strategies that would prevent losses when they were likely to occur rather than choosing to conserve resources for task completion by selecting the less-effortful post-event rdo. this relationship became more pronounced over time, suggesting that risk mitigation strategies stabilize as individuals become more familiar with available rdos (h3). the other main effects included in the model were nonsignificant (p's > .05), suggesting that there is not a strong relationship between previous videogame experience and rdo selection (h2). all estimates and significance values are disclosed in table 2.

table 2. model estimates from experiment 1 reveal that game performance and time-on-task significantly predict participants' risk mitigation strategy during gameplay.

predictor                     | b     | se   | z     | p
intercept                     | 0.77  | 0.28 | 2.74  | .01
game performance              | -0.34 | 0.16 | -2.11 | .04
time-on-task                  | 0.82  | 0.29 | 2.82  | .005
previous videogame experience | 0.03  | 0.02 | 1.32  | .19
population                    | 0.22  | 0.22 | 1.01  | .31
speed                         | -0.09 | 0.23 | -0.39 | .69
performance x time-on-task    | -0.40 | 0.19 | -2.13 | .03

note. performance (m = 1.32, sd = 1.53) was centered around 1.17, a value halfway between the means of experiments 1 and 2. previous videogame experience (m = 8.93, sd = 7.40) was centered around its mean, and time-on-task (m = 1088.25, sd = 723.64) was centered around its mean and scaled by dividing by 1,000 prior to analysis. experimental condition was effect coded, with strength serving as the -1, -1 baseline.

to evaluate participants' ability to use jods as a measure of resource demands, we used aic values to compare the existing model with one that included participants' perceptions of task difficulty as a main effect. adding this predictor did not significantly improve the predictions of our earlier model (∆aic = -2.87).
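the predictor coding described in the table 2 note (centering performance at 1.17, mean-centering experience, centering and rescaling time-on-task, and effect coding condition with strength as the -1, -1 baseline) can be sketched for a single observation as below. the function name and argument layout are hypothetical; the constants come from the note itself.

```python
# hypothetical sketch of the predictor coding in the table 2 note
# (centering, scaling, effect coding); not the authors' analysis code
def prepare_predictors(perf, experience, time_s, condition, exp_mean):
    """returns model-ready predictors for one observation in experiment 1."""
    perf_c = perf - 1.17                  # centered between the two experiments' means
    exp_c = experience - exp_mean         # centered at the sample mean
    time_c = (time_s - 1088.25) / 1000.0  # centered at the mean, scaled to ks
    # effect coding with strength as the (-1, -1) baseline
    codes = {"population": (1, 0), "speed": (0, 1), "strength": (-1, -1)}
    pop, spd = codes[condition]
    return perf_c, exp_c, time_c, pop, spd
```

with this coding, the intercept estimates the log-odds of a shield choice for an average participant in the strength condition at the mean time-on-task, which is what makes the coefficients in table 2 directly interpretable.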
we interpreted this finding to have one of two meanings: either participants' jods were highly correlated with damage rate, suggesting that participants used the magnitude of their losses as a cue to game difficulty, or participants did not incorporate their jods in risk mitigation decisions. exploratory analysis. to address the multiple interpretations of our model comparison, we conducted an exploratory analysis to determine whether damage rate was responsible for participants' jods, or if perceptions of difficulty were based on additional unmeasured factors. this was accomplished by comparing two multilevel logistic regression models that included either a measure of participants' game performance (damage rate since last jod question) or of objective game difficulty (task difficulty parameter standardized across experimental condition). both models included time-on-task, previous videogame experience, and experimental condition (population, speed, strength) in the fixed effect structure. aic comparisons supported a random effect structure that included intercept, standardized difficulty slope, and time-on-task slope to account for participant differences in ability, experiences of difficulty, and rate of learning. a full disclosure of random effect comparisons can be found in the appendix.

table 3. model estimates from an exploratory analysis reveal that an objective measure of difficulty and time-on-task predict participants' jods in experiment 1.

predictor                     | b     | se   | z     | p
intercept                     | 0.43  | 0.15 | 2.76  | .01
standardized difficulty       | 3.50  | 0.37 | 9.41  | <.001
time-on-task                  | -0.47 | 0.14 | -3.33 | <.001
previous videogame experience | 0.01  | 0.02 | 0.48  | .63
population                    | 0.001 | 0.19 | 0.01  | .99
speed                         | -0.12 | 0.21 | -0.55 | .58

note. standardized difficulty (m = 0.51, sd = 0.30) and previous videogame experience (m = 9.16, sd = 7.22) were centered around their means. time-on-task (m = 1214.01, sd = 713.03) was centered around its mean and scaled by dividing by 1,000 prior to analysis.
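the aic-based comparisons used throughout these analyses reduce to a simple computation: akaike's criterion penalizes each estimated parameter, and the model with the lower value is preferred. a minimal sketch (function names are ours):

```python
# sketch of the aic model-comparison logic used above: lower aic wins,
# and delta-aic quantifies the margin (hypothetical helper names)
def aic(log_likelihood, n_params):
    """akaike's information criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def delta_aic(aic_candidate, aic_reference):
    """positive values favor the reference model; negative values mean
    the candidate predictor did not earn its extra parameter."""
    return aic_candidate - aic_reference
```

this is why a ∆aic of -2.87 above counts against adding jods as a predictor: the added parameter increased aic rather than reducing it.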
experimental condition was effect coded, with strength serving as the -1, -1 baseline. model comparisons using aic strongly supported a model that included objective game difficulty as a fixed effect (∆aic = 82.95). the findings from this model (see figure 4) suggest that cues to difficulty unrelated to the magnitude of losses (e.g., the number of enemies visible on the screen; how quickly enemy characters move) underlie participants' jods. despite this, the positive slope in each time-slice reveals that participants' jods were well-calibrated to the difficulty level of the game. participants were more likely to indicate the game was "harder than before" when they were playing levels that were objectively harder, and were more likely to indicate the game was "easier than before" when playing levels that were objectively easier. we also found that participants' jods were influenced by time-on-task such that they became less likely to say that the game was "harder than before" later in the game; however, the size of this effect was small. the other predictors included in the model were not significant (p's > .05). all estimates and significance values are disclosed in table 3.

figure 3. during experiment 1, participants' risk mitigation strategies were not initially sensitive to changes in task difficulty. as the experimental session continued, participants began to compensate for changes in task difficulty by selecting preventive tools (i.e., the shield) when they experienced greater losses and compensatory tools (i.e., the health pack) when they experienced fewer losses. error ribbons represent one standard error above and below the model estimates.

discussion

participants' risk mitigation strategies were affected by the interaction between their experienced losses and time-on-task.
initially, participants' risk mitigation strategies were unaffected by experienced losses, but over time, pre-event rdos (i.e., the shield) were preferred following heavy losses. these results suggest that people respond to environmental changes by adopting risk mitigation strategies that reflect experienced losses (here, damage rate since last rdo choice) and that these strategies change as people gain experience with a task. this behavior lends support to the hypothesis (h1a) that risk estimation drives the selection of risk mitigation strategies because participants actively compensated for their losses with a more costly pre-event rdo rather than allocating all their resources toward task completion. participants' behavior was unaffected by their level of videogame experience (h2), but did stabilize over time, lending support to hypothesis h3. our results also demonstrated that people actively and accurately monitor the environment for cues that reflect changes in task difficulty, but that these cues are not determined by the magnitude of participants' losses and may instead reflect cues to difficulty within the videogame itself (e.g., the number of on-screen enemies). because participants' risk mitigation strategies were predicted by experienced losses while jods were predicted by cues to difficulty, we believe that the shifts in risk mitigation strategy are caused by individuals' awareness of experienced losses, and that the cues used to select a risk mitigation strategy differ from those used to make jods. this would seem to suggest that individuals' risk mitigation strategies do not anticipate risks but respond to them after they have occurred. although our results support the risk estimation hypothesis (h1a), they do not completely discount the resource optimization account of human behavior (e.g., kurzban et al., 2013; vallières, hodgetts, vachon, & tremblay, 2016).
while losses led participants to select resource-intensive pre-event rdos, they did shift toward selecting post-event rdos when losses were infrequent. perhaps participants recognized that preventing losses, while strategic, came with inherent costs and therefore effectively navigated the trade-off between effort and reward. we reasoned that if participants engaged in trading off effort and reward, they would shift toward preventive risk mitigation strategies when this tool was made easier to use (h4a). however, if resource optimization did not underlie participants' behavior, tool selection would not be influenced by the pre-event rdo's ease-of-use (h4b). we tested these competing hypotheses in experiment 2 by manipulating the coordination required to effectively use the shield tool and measuring the impact this had on tool selection throughout the videogame task.

figure 4. participants' judgments of difficulty (jods) were well-calibrated to the difficulty level of the videogame (parameter values standardized across difficulty types). jods were also consistent across both experiments. error ribbons represent one standard error above and below the model's estimates.

experiment 2

participants

eighty-eight participants (41 female) from the general psychology pool at kansas state university completed the experimental task and received 1 hr of research credit compensation to fulfill a course requirement.

design and procedure

participants completed a 40-min session of the videogame task in which we manipulated the difficulty of the shield's use as a between-subjects condition variable (rdo type), but held the reward for using this tool (avoiding an enemy attack) constant. in the steady condition, pre-event rdos were less costly: participants who selected the shield needed only to deploy it a single time.
once active, the shield protected the participants' avatar from five enemy attacks. in the sporadic condition, the shield was more costly because it behaved as it did in experiment 1. that is, it remained active for five seconds and participants needed to deploy it multiple times to remain protected from enemy attacks. furthermore, the timed activation window required participants to coordinate the shield's deployment with an anticipated attack. because the between-subjects difficulty manipulation (population, speed, strength) was not a significant predictor in the experiment 1 analyses, we included only two levels of the difficulty manipulation (difficulty type: strength, speed) in experiment 2. we counterbalanced the four possible combinations of difficulty type and rdo type across experimental sessions. in all other respects, the videogame task was identical to that used in experiment 1.

results

we again used multilevel logistic regression to predict the probability that a participant would select either tool (health pack, shield) using participants' game performance, time-on-task, previous videogame experience, difficulty manipulation (strength, speed), and rdo type (sporadic, steady). game performance, videogame experience, and time-on-task were included in the fixed effect structure and operationalized using the measures outlined in experiment 1. aic comparisons supported a random effect structure that included the intercept, game performance, and time-on-task, which allowed the model to account for participant differences in overall ability, perceptions of difficulty, and rate of learning. because we were interested in replicating the effects found in experiment 1, we included the three-way interaction between game performance, time-on-task, and rdo type. the results of our analysis are depicted in figure 5.
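the contrast between the two shield variants can be captured in a loose sketch: a steady shield stays up until its five blocks are spent, while a sporadic shield protects for only 5 s per deployment and must be retimed. this is a simplified illustration under our own assumptions (it models one deployment's charges, not the full five-item inventory), with hypothetical names throughout.

```python
# loose, hypothetical sketch of the steady vs. sporadic shield variants
class Shield:
    def __init__(self, variant):
        self.variant = variant  # "steady" or "sporadic"
        # a steady shield absorbs five attacks per deployment; a sporadic
        # shield (as in experiment 1) absorbs one before needing redeployment
        self.charges = 5 if variant == "steady" else 1
        self.active_until = None

    def deploy(self, now):
        """a sporadic shield protects for only 5 s and must be timed to an
        anticipated attack; a steady shield stays up until spent."""
        self.active_until = now + 5.0 if self.variant == "sporadic" else float("inf")

    def blocks(self, attack_time):
        """returns True if an attack at attack_time is deflected."""
        if self.active_until is None or attack_time > self.active_until or self.charges == 0:
            return False
        self.charges -= 1
        return True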
the stark difference in risk mitigation patterns between the sporadic and steady rdo types is clear; only rdo type and its two-way interaction with time affected participants' risk mitigation strategies during the game (see table 4). this effect intensified as time-on-task increased and became most apparent in the final time-slice panel. including participants' perceptions of task difficulty as a main effect again did not significantly improve our model's predictions (∆aic = -2.97), complementing our results from experiment 1. the results of the two- and three-way interactions involving game performance, time, and rdo type also align with our previous analysis. although these effects did not reach significance, the model estimates for the "sporadic" rdo type fall within the 95% confidence intervals established in experiment 1. as this subset of the data represents only half of that included in our previous experiment, we expect that the increasing sensitivity to damage rate observed in experiment 1 would have replicated had we included more participants.

table 4. model estimates from experiment 2 demonstrate that the ease-of-use manipulation overshadowed all other factors in predicting participants' risk mitigation strategy.

predictor                             | b     | se   | z     | p
game performance                      | -0.03 | 0.18 | -0.19 | .85
time-on-task                          | -0.43 | 0.36 | -1.19 | .23
previous videogame experience         | 0.02  | 0.03 | 0.71  | .48
sporadic                              | 1.49  | 0.43 | 3.46  | <.001
speed                                 | -0.28 | 0.19 | -1.53 | .13
performance x time-on-task            | -0.13 | 0.18 | -0.71 | .48
performance x sporadic                | 0.001 | 0.18 | 0.01  | .99
time x sporadic                       | 1.15  | 0.38 | 3.07  | .002
performance x time-on-task x sporadic | -0.07 | 0.19 | -0.39 | .69

note. performance (m = 17.53, sd = 34.01) was centered around 1.17, a value halfway between the means of experiments 1 and 2.
previous videogame experience (m = 6.30, sd = 9.76) was centered around its mean, and time-on-task (m = 1285.87, sd = 846.32) was centered around its mean and scaled by dividing by 1,000 prior to analysis. rdo type and difficulty type were effect coded, with steady and strength coded as -1. exploratory analyses. we again conducted an exploratory analysis to determine whether participants' jods reflected changes in damage rate, or if a different factor was responsible for their perceptions of difficulty. we used aic values to compare two multilevel logistic regressions that included either game performance (damage rate since last jod question) or objective game difficulty (task difficulty parameter standardized across experimental condition). both models included time-on-task, previous videogame experience, difficulty type (speed, strength), and rdo type (steady, sporadic) in the fixed effect structure. aic comparisons supported a random effect structure that included intercept, standardized difficulty, and time-on-task slope to account for participant differences in ability, experiences of difficulty, and rate of learning. a full disclosure of random effect comparisons can be found in the appendix. model comparisons again supported the second model (∆aic = 171.99), replicating our finding that participants did not use the magnitude of losses to make jods. as before, positive slopes across each time-slice (see figure 4) reveal that participants' jods were well-calibrated to the objective difficulty of the game. time-on-task again affected participants' jods: participants became less likely to say the game was "harder than before" as time progressed (see table 5). table 5. model estimates from an exploratory analysis reveal that an objective measure of difficulty and time-on-task predict participants' jods in experiment 2.
predictor                     | b     | se   | z     | p
intercept                     | 0.09  | 0.14 | 0.61  | .54
standardized difficulty       | 4.07  | 0.40 | 10.24 | <.001
time-on-task                  | -0.29 | 0.11 | -2.69 | .01
previous videogame experience | -0.03 | 0.02 | -1.30 | .19
sporadic                      | 0.10  | 0.13 | 0.73  | .46
speed                         | 0.13  | 0.15 | 0.89  | .37

note. standardized difficulty (m = 0.71, sd = 0.31) and previous videogame experience (m = 6.39, sd = 5.69) were centered around their means. time-on-task (m = 1242.38, sd = 775.39) was centered around its mean and scaled by dividing by 1,000 prior to analysis. rdo type and difficulty type were effect coded, with steady and strength coded as -1.

discussion

the results of experiment 2 strongly support the hypothesis that people attempt to balance effort and reward during challenging tasks (h4a). indeed, when we manipulated the effort-reward trade-off and included the pre-event rdo's ease-of-use as a predictor in our model, it attenuated the effects of many other predictors, including game performance. this suggests that people prioritize the immediate conservation of resources only when it does not negatively impact their performance goals: unlike the participants in experiment 1, participants in experiment 2 were willing to use pre-event rdos exclusively because they were easier to use and no longer presented a resource cost. the findings from our exploratory analysis, which revealed that jods were affected by the difficulty manipulation but not by the ease-of-use manipulation, further illustrate that the factors used to select rdos are different from those used to make overall judgments of task difficulty.

figure 5. participants' behavior in experiment 2 differed as a function of rdo type. although participants in the sporadic condition behaved similarly to those in experiment 1 (where the shield behaved identically), participants in the steady condition developed a strong preference for the shield, which was easier to use in this condition. error ribbons depict one standard error above and below model estimates.

general discussion

our study provides strong evidence that decision-makers balance effort and reward to select appropriate risk mitigation strategies. in experiment 1, participants developed risk mitigation preferences as the task progressed. later in the session, participants selected more resource-intensive pre-event rdos when losses were likely and preferred easier-to-use post-event rdos when losses occurred less frequently. this preference shifted in experiment 2 among participants for whom pre-event rdos were made easier to use. in both experiments, behavior stabilized over time as participants gained familiarity with each tool. together, this evidence suggests that while experienced losses influence the risk mitigation strategy an individual pursues, preferences can also be affected by how difficult an rdo is to use. although people recognize and respond to elevated risks and severe consequences by adopting pre-event rdos (cf. huber, 2012; huber & huber, 2003), they are sensitive to the effort-reward trade-off presented by the rdo's ease-of-use (cf. sigurdsson, taylor, & wirth, 2013). while jods do not contribute to people's risk mitigation strategies, people are affected by how easy rdos are to use. harder-to-use pre-event rdos, which require an upfront investment of effort to employ, were only favored when they were necessary to reduce experienced losses. when pre-event rdos were made easier to use, people relied upon them more often regardless of their experienced losses. this finding supports the theoretical position of kurzban et al. (2013), in that participants will avoid unnecessary risk mitigation strategies if they are difficult to use.
this finding is particularly relevant to situations that involve infrequent but costly risks, during which preventive actions may be undervalued with respect to the efforts they require, such as natural disaster preparedness (douglas, leigh, & david, 2005) and responding to variations in air traffic control workload (desmond & hoyes, 1996). the specificity of cues to difficulty and jods was further revealed in our analysis of participants' jods. although objective measures of task difficulty predicted jods, damage rate (a measure of a participant's experienced losses) did not produce a good model fit. this suggests that participants used other cues to produce jods (see the right side of figure 1), an assertion that is supported by the difference across rdo manipulations in experiment 2. thus, it is likely that the magnitude of losses was responsible for or mediated the relationship between level of risk and rdo selection but did not provide a cue to overall task difficulty; however, this relationship should be explored more directly before strong claims are made. unlike previous research, which showed that participants discontinued their search for rdos when they had previous experience in an area (huber & macho, 2001), we found that participants' behavior was unaffected by domain-specific background knowledge (videogame experience). however, participants developed a systematic adoption of risk mitigation strategies over time, supporting previous findings that successful strategies are pursued once they are learned (cf. lovett & anderson, 1996). this result also supports huber and huber's (2008) assertion that people use their expectations to determine the availability and efficacy of rdos, as evidenced by the shifts in behavior that occurred over time and resulted in stabilization of risk mitigation strategy.
although general aspects of risk mitigation behavior appear to be consistent, behavior in experiential tasks does differ from that of descriptive tasks in important ways. recent research has suggested that people can be trained to attend to certain task-related cues more strongly than others when making jods (desender et al., 2017). it may be possible to encourage individuals to use task-related cues to select risk mitigation strategies and to down-weight the influence of an rdo's ease-of-use. similar ends might be achieved by architecting an environment that emphasizes certain task cues above others. together, these lines of research will clarify the factors that influence risk mitigation decisions and help people mitigate risks strategically.

acknowledgements: we would like to thank abigail basham, sierra davila, landon fossum, naomi mwebaza, and jacob sanderson for their assistance in running this study. portions of the work were presented at the november 2017 meeting of the psychonomic society.

declaration of conflicting interests: the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

handling editor: andreas fischer

author contributions: the authors contributed equally to this work.

copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license.

citation: vangsness, l., & young, m. e. (2017). the role of difficulty in dynamic risk mitigation decisions. journal of dynamic decision making, 3, 5. doi:10.11588/jddm.2017.1.41543

received: 29 september 2017
accepted: 7 december 2017
published: 15 december 2017

references

akaike, h. (1973). maximum likelihood identification of gaussian autoregressive moving average models.
biometrika, 60(2), 255–265. doi:10.2307/2334537 altmann, e. m., & trafton, j. g. (2007). timecourse of recovery from task interruption: data and a model. psychonomic bulletin & review, 14(6), 1079–1084. doi:10.3758/bf03193094 bär, a. s., & huber, o. (2008). successful or unsuccessful search for risk defusing operators: effects on decision behaviour. european journal of cognitive psychology, 20(4), 807–827. doi:10.1080/09541440701686227 böckenholt, u. (2004). comparative judgments as an alternative to ratings: identifying the scale origin. psychological methods, 9(4), 453–465. doi:10.1037/1082-989x.9.4.453 brockmyer, j. h., fox, c. m., curtiss, k. a., mcbroom, e., burkhart, k. m., & pidruzny, j. n. (2009). the development of the game engagement questionnaire: a measure of engagement in video game-playing. journal of experimental social psychology, 45(4), 624–634. doi:10.1016/j.jesp.2009.02.016 brunswik, e. (1956). perception and the representative design of psychological experiments. berkeley, ca: university of california press. camilleri, a. r., & newell, b. r. (2013). the long and short of it: closing the description-experience "gap" by taking the long-run view. cognition, 126(1), 54–71. doi:10.1016/j.cognition.2012.09.001 delta dental (2014). 2014 oral health and well-being survey. retrieved from https://www.deltadental.com/ddpaoralhealthwellbeingsurveybrochure2014.pdf desender, k., van opstal, f., & van den bussche, e. (2017). subjective experience of difficulty depends on multiple cues. scientific reports, 7, 1–14. doi:10.1038/srep44222 desmond, p. a., & hoyes, t. w. (1996). workload variation, intrinsic risk and utility in a simulated air traffic control task: evidence for compensatory effects. safety science, 22(1–3), 87–101. doi:10.1016/0925-7535(96)00008-2 douglas, p., leight, s., & david, j. (2005). when good intentions turn bad: promoting natural hazard preparedness. australian journal of emergency management, 20(1), 25–30. flavell, j. h. (1979). 
metacognition and cognitive monitoring: a new area of cognitive–developmental inquiry. american psychologist, 34(10), 906–911. doi:10.1037/0003-066x.34.10.906 gaeth, g. j., & shanteau, j. (1984). reducing the influence of irrelevant information on experienced decision makers. organizational behavior & human performance, 33(2), 263–282. doi:10.1016/0030-5073(84)90024-2 hau, r., pleskac, t. j., & hertwig, r. (2010). decisions from experience and statistical probabilities: why they trigger different choices than a priori probabilities. journal of behavioral decision making, 23(1), 48–68. doi:10.1002/bdm.665 hau, r., pleskac, t. j., kiefer, j., & hertwig, r. (2008). the description-experience gap in risky choice: the role of sample size and experienced probabilities. journal of behavioral decision making, 21(5), 493–518. doi:10.1002/bdm.598 hertwig, r., & erev, i. (2009). the description-experience gap in risky choice. trends in cognitive science, 13(12), 517–523. doi:10.1016/j.tics.2009.09.004 huber, o. (2012). risky decisions: active risk management. current directions in psychological science, 21(1), 26–30. doi:10.1177/0963721411422055 huber, o., & huber, o. w. (2003). detectability of the negative event: effect on the acceptance of pre- or post-event risk-defusing actions. acta psychologica, 113(1), 1–21. doi:10.1016/s0001-6918(02)00148-8 huber, o., & huber, o. w. (2008). gambles vs. quasi-realistic scenarios: expectations to find probability and risk-defusing information. acta psychologica, 127(2), 222–236. doi:10.1016/j.actpsy.2007.05.002 huber, o., & kunz, u. (2007). time pressure in risky decision-making: effect on risk defusing. psychology science, 49(4), 415–426. huber, o., & macho, s. (2001). probabilistic set-up and the search for probability information in quasi-naturalistic decision tasks. risk decision and policy, 6(1), 1–16. doi:10.1017/s1357530901000230 huber, o., bär, a. s., & huber, o. w. (2009). 
justification pressure in risky decision making: search for risk defusing operators. acta psychologica, 130(1), 17–24. doi:10.1016/j.actpsy.2008.09.009 huber, o., beutter, c., montoya, j., & huber, o. w. (2001). risk-defusing behaviour: towards an understanding of risky decision making. european journal of cognitive psychology, 13(3), 409–426. doi:10.1080/09541440125915 kahneman, d. (1973). attention and effort. englewood cliffs, nj: prentice-hall. kanfer, r., & ackerman, p. l. (1989). motivation and cognitive abilities: an integrative/aptitude-treatment interaction approach to skill acquisition. journal of applied psychology, 74(4), 657–690. doi:10.1037/0021-9010.74.4.657 knight, f. h. (1921). risk, uncertainty, and profit. boston: houghton mifflin. koehler, d. j., brenner, l., & griffin, d. (2002). 
the calibration of expert judgment: heuristics and biases beyond the laboratory. in t. gilovich, d. griffin, & d. kahneman (eds.), heuristics and biases: the psychology of intuitive judgment (pp. 686–715). cambridge, uk: cambridge university press. koriat, a. (1997). monitoring one’s own knowledge during study: a cue-utilization approach to judgments of learning. journal of experimental psychology: general, 126(4), 349–370. doi:10.1037/0096-3445.126.4.349 kurzban, r. (2016). the sense of effort. current opinion in psychology, 7, 67–70. doi:10.1016/j.copsyc.2015.08.003 kurzban, r., duckworth, a., kable, j. w., & myers, j. (2013). an opportunity cost model of subjective effort and task performance. behavioral and brain sciences, 36(6), 661–679. doi:10.1017/s0140525x12003196 locke, e. a., & latham, g. p. (2002). building a practically useful theory of goal setting and task motivation. american psychologist, 57(9), 705–717. doi:10.1037/0003-066x.57.9.705 lorist, m. m., boksem, m. a., & ridderinkhof, k. r. (2005). impaired cognitive control and reduced cingulate activity during mental fatigue. cognitive brain research, 24(2), 199–205. doi:10.1016/j.cogbrainres.2005.01.018 lovett, m. c., & anderson, j. r. (1996). history of success and current context in problem solving: combined influences on operator selection. cognitive psychology, 31(2), 168–217. doi:10.1006/cogp.1996.0016 lovett, m. c., & schunn, c. d. (1999). task representations, strategy variability, and base-rate neglect. journal of experimental psychology: general, 128(2), 107–130. doi:10.1037/0096-3445.128.2.107 mitchell, s. (2017). devaluation of outcomes due to their cost: extending discounting models beyond delay. in j. r. stevens (ed.), nebraska symposium on motivation: impulsivity (vol. 64, pp. 145–161). basel, switzerland: springer international publishing. ozuru, y., kurby, c. a., & mcnamara, d. s. (2012). 
the effect of metacomprehension judgment task on comprehension monitoring and metacognitive accuracy. metacognition and learning, 7(2), 113–131. doi:10.1007/s11409-012-9087-y pleskac, t. j., & hertwig, r. (2014). ecologically rational choice and the structure of the environment. journal of experimental psychology: general, 143(5), 2000–2019. doi:10.1037/xge0000013 shanteau, j. (1992). competence in experts: the role of task characteristics. organizational behavior and human decision processes, 53(2), 252–266. doi:10.1016/0749-5978(92)90064-e sigurdsson, s. o., taylor, m. a., & wirth, o. (2013). discounting the value of safety: effects of perceived risk and effort. journal of safety research, 46, 127–134. doi:10.1016/j.jsr.2013.04.006 unity game engine (2016). [computer software]. (version 5.4). san francisco, ca: unity. vallières, b. r., hodgetts, h. m., vachon, f., & tremblay, s. (2016). supporting dynamic change detection: using the right tool for the task. cognitive research: principles and implications, 1(1), 32–52. doi:10.1186/s41235-016-0033-4 vangsness, l. (2017). perceptions of effort and risk assessment. (unpublished master’s thesis). kansas state university, manhattan, ks. walton, m. e., kennerley, s. w., bannerman, d. m., phillips, p. e., & rushworth, m. f. (2006). weighing up the benefits of work: behavioral and neural analyses of effort-related decision making. neural networks, 19(8), 1302–1314. doi:10.1016/j.neunet.2006.03.005 zaleskiewicz, t., piskorz, z., & borkowska, a. (2002). fear or money? decisions on insuring oneself against flood. risk, decision, and policy, 7(3), 221–233. doi:10.1017/s1357530902000662 appendix aic comparisons suggested that the random effect structures for the models used to analyze experiment 1 data could include intercept and time-on-task or intercept, performance, and time-on-task. 
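the aic values reported in the appendix follow akaike’s criterion, aic = 2k − 2 ln l, where k is the number of estimated parameters and l the maximized likelihood; the random effect structure with the lowest aic is preferred. a minimal sketch of such a comparison (the parameter counts and log-likelihoods below are hypothetical illustrations, not the study’s fitted values):

```python
def aic(k, log_likelihood):
    # akaike information criterion: 2k - 2 ln L; lower values are preferred
    return 2 * k - 2 * log_likelihood

# hypothetical candidate random-effect structures:
# (number of estimated parameters, maximized log-likelihood)
candidates = {
    "intercept only": (3, -350.0),
    "intercept and time-on-task": (5, -329.0),
}

scores = {name: aic(k, ll) for name, (k, ll) in candidates.items()}
best = min(scores, key=scores.get)  # structure with the lowest aic
```

note that the richer structure wins here only because its log-likelihood gain outweighs the 2-per-parameter penalty, which is exactly the trade-off the appendix table summarizes.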
random effect structure                          aic
experiment 1 – rdo selection
  intercept only                                 706.33
  intercept and performance                      706.09
  intercept and time-on-task                     668.57
  intercept, performance, and time-on-task       674.18
experiment 1 – jods
  intercept only                                 1199.64
  intercept and performance                      1181.73
  intercept and time-on-task                     1199.13
  intercept, performance, and time-on-task       1180.63
experiment 2 – rdo
  intercept only                                 877.10
  intercept and performance                      861.21
  intercept and time-on-task                     769.85
  intercept, performance, and time-on-task       767.62
experiment 2 – jods
  intercept only                                 1767.43
  intercept and performance                      1755.55
  intercept and time-on-task                     1764.33
  intercept, performance, and time-on-task       1753.60
original research the effects of general mental ability and memory on adaptive transfer in work settings barbara frank and annette kluge ruhr-university bochum, department of work, organizational and business psychology to handle complex technical operations, operators acquire skills in vocational training. most of these skills are not used immediately but at some point later; this is called temporal transfer. 
our previous research showed that cognitive abilities such as general mental ability (gma) and memory are good predictors of temporal transfer. in addition to temporal transfer, operators also have to solve non-routine and abnormal upcoming problems using their skill set; this type of transfer is called adaptive transfer. based on previous findings, it is assumed that gma and memory will affect adaptive transfer as well. thirty-three engineering students learned how to operate a complex technical system in normal operation with either a fixed or a contingent sequence. after two weeks, all participants had to adapt their learned skills to handle the adaptive transfer task, which was not initially trained. it was shown that high gma positively predicted adaptive transfer, but no effect of memory was found. this implies that gma is required to solve new complex tasks using a learned skill set. the findings are in line with studies that showed an effect of gma on temporal transfer. keywords: adaptive transfer, mental abilities, complex task, process control, complex skills operators, pilots or surgeons have to perform complex tasks in their daily work. such complex tasks include controlling the systems of refineries or chemical plants, controlling and flying an airplane, or performing surgeries. these complex tasks can be described by a number of sub-tasks, sub-sequences of sub-tasks, and integration of sub-tasks, and require the coordination and realization of predefined objectives, attention and simultaneous information processing (kluge, 2014). employees have to handle complex tasks in both routine and non-routine situations (kluge, 2014): in routine situations, complex tasks are performed regularly, e.g. operators monitor, control and adjust the system and follow often-used standard operating procedures (sops), or surgeons suture surgical wounds after performing a well-known appendectomy. 
non-routine situations, on the other hand, can be divided into situations which require temporal or adaptive transfer: 1) usually, non-routine situations occur in the medium or long term after the initial skill acquisition and require temporal transfer, meaning that such situations have to be handled after longer periods of non-use. 2) non-routine situations that require adaptive transfer are novel to the operator. adaptive transfer is needed if skill components have to be applied in dynamic, complex and unpredictable situations that have not been previously encountered (bolstad, cuevas, costello, & babbitt, 2008; ivancic & hesketh, 2000; kluge & burkolter, 2013; kluge, sauer, burkolter, & ritzmann, 2010). adaptive transfer in a novel situation requires operators to understand the upcoming event that has never occurred before, quickly generate an appropriate reaction to ensure system safety on the basis of their knowledge, and adapt acquired skills to the novel situation (vicente & rasmussen, 1990). such an adaptive transfer is required, for instance, when a surgeon has to handle a complication that has never arisen before. a similar description of adaptive transfer can be found for complex problem solving, which requires the coordination of complex cognitive operations like action planning, strategic development, knowledge acquisition and evaluation in order to achieve a goal (funke, 2010). temporal transfer requires recognition of the situation, system and upcoming events, and the selection and execution of correct sops in terms of memory of rule-based knowledge based on “if-then” associations. adaptive transfer, by contrast, requires operators to solve problems which consist of situations that are opaque, dynamic and interconnected and call for complex problem solving skills (fischer & neubert, 2015; wüstenberg, greiff, & funke, 2012). 
in conclusion, non-routine situations require adaptive transfer if operators have to handle complex tasks for which they are not specifically trained and with which they have no personal previous experience. in this case, operators have to retrieve and use an existing skill set in order to have a basic understanding of the task and a starting point for finding new solutions. previous research findings on the relation between solving new problems and general mental ability have been conflicting (beckmann & guthke, 1995; wittmann & hattrup, 2004; wüstenberg et al., 2012). however, in the area of applied complex process control tasks, only a small amount of research has investigated the effect of general mental ability on adaptive transfer (gonzalez, thomas, & vanyukov, 2005), despite the fact that such findings could be directly applied for personnel selection. therefore, it is a highly relevant topic in this area: past accidents in nuclear power plants or refineries indicate that operators were often unable to transfer their skills to a novel situation. on the other hand, the absence of many more accidents suggests that adaptive transfer has often gone very well in the past (e.g. u. s. chemical safety and hazard investigation board, 2007). for a better understanding of why adaptive transfer goes well in some cases and badly in others, the present paper investigates the effect of general mental ability. corresponding author: annette kluge, department of work, organizational and business psychology, ruhr-university bochum, bochum, e-mail: annette.kluge@rub.de jddm | 2017 | volume 3 | article 4 | doi:10.11588/jddm.2017.1.40004 
as recent studies have demonstrated the effect of general mental ability and memory on temporal transfer, the present paper investigates whether participants are able to solve a new problem (adaptive transfer) in a process control task using their present skill set, and whether their performance is affected by general mental ability and/or memory (hülsheger, maier, & stumpp, 2007; kluge, frank, maafi, & kuzmanovska, 2015). as such, the study contributes to existing research by analysing the effect of general mental ability and memory in a complex task embodied by a simulated process control task. in the following, we outline how operators acquire the set of skills required for adaptive transfer. subsequently, determinants of adaptive transfer and the effect of general mental ability and memory on performance in complex tasks and adaptive transfer are described. acquisition of complex cognitive skills to operate in a control room skilfully, operators acquire complex cognitive skills (van merriënboer, 1997). complex cognitive skills are described as a combination of cognitive and motor sub-skills, although most of them are cognitive sub-skills and at least some skills require conscious processing (van merriënboer, 1997). the concept of complex skill is similar to dörner and güss’ (2013) concept of action schema as the basic unit of action. similar to complex skills, an action schema consists of a sensor input schema (e.g. information on computer screen), a motor schema (e.g. how to open a valve to fill a tank), and a sensor output schema (e.g. to see whether the tank fills). an action schema is a realization of a so-called tote unit (miller, galanter, & pribram, 1960) and includes an action sequence of test (whether conditions are given), operate (execute action), test (whether expectation is met), and exit (or continue; dörner & güss, 2013, p. 304). acquiring a skill is the basis for a competent, expert, rapid and conscious performance (anderson, 1982). 
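the tote unit described above is essentially a feedback loop and can be sketched in code. a minimal sketch, assuming a hypothetical tank-filling action schema (the tank, valve and threshold values are illustrative inventions, not part of watrsim):

```python
def tote(test, operate, max_iterations=100):
    """test-operate-test-exit (miller, galanter, & pribram, 1960):
    test whether the goal condition holds; if not, operate and test again;
    exit once the expectation is met."""
    for _ in range(max_iterations):
        if test():          # test: is the goal condition already satisfied?
            return True     # exit
        operate()           # operate: execute the action
    return False            # goal not reached within the iteration budget

# illustrative action schema: open a valve until a tank is full
tank = {"level": 0}

def tank_full():
    return tank["level"] >= 10   # sensor output schema: see whether the tank fills

def open_valve():
    tank["level"] += 2           # motor schema: open the valve to fill the tank

reached = tote(tank_full, open_valve)
```

here each pass through the loop corresponds to one test-operate cycle, and the exit corresponds to the expectation (a full tank) being met.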
for the acquisition of complex cognitive skills, an extensive learning phase (in terms of time and effort) is necessary. for the adaptive transfer, the operators not only need to know “what” to do and “how” to do a task, but they also need to have an underlying understanding of the task, to know “why” things happen (kimball & holyoak, 2000). the understanding of the task can be acquired by information- and practice-based learning (salas & cannon-bowers, 1997), which can take the form of simulator-based learning (wexley & latham, 2002). this type of learning is important for the acquisition and application of learned skills in a realistic setting, which facilitates learning transfer (kluge, 2014; wexley & latham, 2002). simulator-based learning can help to transfer the learned skills to a realistic setting with a very high number of identical elements (thorndike, 1904; salas et al., 2012). in the present paper, it is assumed that complex cognitive skills in terms of knowing “what”, “how”, and “why” are acquired during initial training through information- and practice-based learning (salas & cannon-bowers, 1997). determinants of temporal transfer since most of the time, automated control loops regulate the technical systems, fault-finding skills as well as system control skills may face long periods of non-use (stammers, 1996; kluge, sauer, burkolter, & ritzmann, 2010). furthermore, process control involves low-frequency tasks (e.g., start-up and shut-down procedures), which are also at risk of skill decrements and loss of the ability to recall an action schema (dörner & güss, 2013) due to non-use (arthur, bennett, stanush, & mcnelly, 1998). a meta-analysis of research on skill retention revealed that procedural skills in particular are very vulnerable to forgetting (arthur et al., 1998). temporal transfer is affected by achieving a high level of proceduralisation of skills through repeated practice of a task (kluge et al., 2009). farr (1987) and arthur et al. 
(1998) pointed out that the degree of initial learning can be increased by rehearsal and repetition. this is supported by merrill (2001) and by further research evidence provided by foss, fabiani, mané and donchin (1989), kontogiannis and shepherd (1999), mattoon (1994), morris and rouse (1985), hesketh (1997), and hagman and rose (1983), who concluded that repetitions are effective when applied both before and after task proficiency has been achieved. similarly, goldstein and ford (2002) referred to automaticity of task completion as a powerful means to maintain performance over extended lay-off periods. research has found that satisfactory skill retention can be achieved even after considerable lay-off periods if appropriate training methods are used. however, differences in the effectiveness of different methods for temporal transfer have also been shown (e.g. kluge et al., 2009; kluge & frank, 2014). determinants of adaptive transfer adaptive transfer can be explained by analogy transfer (gentner, 1983), which means that analogical content, e.g. a complex skill and action schema (dörner & güss, 2013), has to be recalled from memory, aligned and mapped to the target scenario (gentner, holyoak, & kokinov, 2001). a learned complex skill cannot be applied 1:1 but needs to be adapted to fit the particular purpose for which no schema exists. a selection of learned skills that need to be adapted is recalled from memory in order to compare their usefulness in the new context. recall can be guided by surface similarity (“do i know a similar context? e.g. from process control? can i use my skills learned in a different plant?”) or by structural similarity (“do i remember a solution that i once applied (independent of the context), which might be useful here?”). 
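the two retrieval routes described above can be sketched as a toy memory query. the stored skills, context labels and structural signatures below are invented for illustration only:

```python
# toy memory store of learned skills; "context" stands in for surface features
# and "structure" for the shape of the solution (both labels are hypothetical)
memories = [
    {"skill": "start up plant a", "context": "process control", "structure": "ordered steps"},
    {"skill": "bake bread", "context": "kitchen", "structure": "ordered steps"},
    {"skill": "plan a route", "context": "navigation", "structure": "search"},
]

def recall(target, memories):
    # surface-guided recall: "do i know a similar context?"
    surface = [m["skill"] for m in memories if m["context"] == target["context"]]
    # structure-guided recall: "do i remember a solution with the same shape,
    # independent of the context?"
    structural = [m["skill"] for m in memories if m["structure"] == target["structure"]]
    return surface, structural

target = {"context": "process control", "structure": "ordered steps"}
surface, structural = recall(target, memories)
```

the structural query retrieves the bread-baking skill as well, even though its context is unrelated, which is the point of structure-guided recall.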
empirical findings suggest that recall of analogical content is supported by surface similarity of the target scenario (catrambone, 2002), and the alignment and mapping of learned content for a new target scenario is supported by structural similarity (gentner, rattermann, & forbus, 1993). past research has shown that methodological and person-related variables determine adaptive transfer. methodological factors such as teaching methods (e.g. case-based learning, comparing examples, discovering), degree of learning, learning strategy, similarity of transfer context and whether the transfer is informed or uninformed have been found to be important for acquiring the competence to handle adaptive transfer (gentner, loewenstein, & thompson, 2003, 2004; kimball & holyoak, 2000). person-related factors that can affect adaptive transfer are, for example, domain-specific knowledge, prior knowledge and task specific knowledge, problem-solving competence and general mental ability (abele et al., 2012; noke, 2005). as mentioned above, adaptive transfer can be described similarly to complex problem solving (wüstenberg et al., 2012): complex problem solving means the successful interaction with dynamic task environments and a successful exploration and gathering of information to reveal the environments’ regularities (buchner, 1995). considering the similarities between adaptive transfer and complex problem solving, some studies have shown an effect of general mental ability on complex problem solving, while others have not (e.g. beckmann & guthke, 1995; wittmann & hattrup, 2004). moreover, results regarding the importance of general mental ability for solving new tasks have been inconsistent (abele et al., 2012). 
general mental ability can be described as the capacity “that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience” (gottfredson, 1997, p. 13). in line with the modified model of primary mental abilities (kersting, althoff, & jäger, 2008), general mental ability can be described as the combination of fluid and crystallized intelligence. fluid intelligence is the given intelligence, which cannot be influenced by the environment, provides the basis for crystallized intelligence, and is necessary in order to learn new information and to solve new problems (cattell, 1963; mcgrew, 2009; primi, ferrão, & almeida, 2010). crystallized intelligence consists of knowledge and abilities that are learned during the lifetime, and also depends on cultural background (cattell, 1963; maltby, day, & macaskill, 2011). general mental ability is assumed to be an important prerequisite for learning, meaning that high general mental ability alone will not qualify operators to complete tasks or solve a problem. rather, operators must also acquire complex cognitive skills that enable them to handle the system’s processes, procedures and objectives, and learn, for instance, how to time actions and allocate attention (fischer & neubert, 2015). for this reason, operators first have to acquire skills on what to do and how to do the task, as well as to understand the underlying mechanisms of the task in order to solve an adaptive transfer. a meta-analysis by colquitt, lepine, and noe (2000) showed that general mental ability predicts learning and learning transfer to a medium extent. 
in summary, various studies have found that general mental ability predicts learning and learning transfer of complex skills (blume, ford, baldwin, & huang, 2010; burke & hutchins, 2007; day, arthur, & gettman, 2001; day et al., 2013; hülsheger et al., 2007; rosander, bäckström, & sternberg, 2011; schmidt & hunter, 2004). as the transfer of tasks is also determined by the acquisition of knowledge and the retention of such knowledge, it therefore depends on memory (kimball & holyoak, 2000). additionally, the retrieval of analogical content is the starting point for modifying learned content for new situations (gentner et al., 1993). thus, memory might also affect adaptive transfer. memory is defined as the ability to memorise and reproduce information and associations that were learned a short time ago (kersting et al., 2008). it is the ability to store information in the short and medium term as well as to recall it (jäger, süß, & beauducel, 1997; kersting et al., 2008; thurstone, 1938). memory can be described as one component of fluid intelligence referring to the modified model of primary mental abilities (kersting et al., 2008). it is divided into three content abilities: verbal memorisation, numerical memorisation and figural memorisation. verbal memorisation describes, e.g., communication skills; numerical memorisation describes, e.g., mathematical skills; and figural memorisation describes, e.g., spatial skills. with regard to the objectives of this paper, it is assumed that, although it is likely that many of the basic mechanisms of memory are common across individuals, the encoding and organisation of information varies as a function of individuals’ memory (beauducel & kersting, 2002). moreover, despite the fact that, as early as 1938, thurstone called for memory to be exhaustively studied due to its significance for education and training, research in this area is still lacking. 
the present study further analyses the relation between general mental ability, memory and solving the adaptive transfer in a process control task. it is examined whether participants who are trained in an information- and practice-based manner are able to solve an adaptive transfer, and whether their performance depends on general mental ability and/or memory. information-based learning focuses on learning knowledge transfer, whereas practice-based learning focuses on learning by experience (salas & cannon-bowers, 1997). the effect of general mental ability has been widely analysed in the context of skill acquisition (e.g. ackerman, 1992; burkolter, kluge, sauer, & ritzmann, 2009; matthews, davies, westerman, & stammers, 2000). however, only a few studies have investigated the effect of general mental ability on temporal transfer and adaptive transfer in the context of complex cognitive skills in industrial tasks (e.g. day et al., 2013; gonzalez et al., 2005). so far, our own studies have shown that general mental ability and memory influence temporal transfer (frank & kluge, 2015; kluge et al., 2015). based on the theoretical outline given above, we assume that persons with higher general mental ability are better able to perform the adaptive transfer because of their greater ability to process and understand complex ideas and to learn from experience. to investigate the impact of general mental ability on complex cognitive skills, the following hypothesis is formulated: h1: general mental ability positively affects adaptive transfer. as it is important to be able to use analogical content as a starting point in new situations (kimball & holyoak, 2000), the recall of the memorised learning environment, learning interface and the learned skills is expected to affect adaptive transfer. therefore, the single effect of memory is analysed to ascertain whether memory alone might affect adaptive transfer. 
accordingly, we propose the following hypothesis for the effect of memory on adaptive transfer: h2: memory positively affects adaptive transfer. method sample the results of the present sample originate from a larger dfg-funded project (dfg kl2207/3-3) on skill retention and its influencing factors, with a particular focus on refresher training methods and their interaction with person-related variables. the overall sample comprised 200 participants across 10 different experimental conditions ((4 refresher interventions + 1 control group) x 2 types of sequences; see appendix a). the two groups without any refresher training methods are analysed to investigate the effect of general mental ability and memory on adaptive transfer in the present study. the study was conducted from october 2014 to december 2015. all participants were randomly assigned to the different experimental conditions. from october 2014 to december 2015, 40 engineering students (12 female) were part of the described subsample of the study. seven participants were excluded based on a predefined selection criterion (the participants were requested to produce >= 200 litres of purified gas). the participants were recruited by postings on social networking sites and flyers handed out to engineering students. to ensure technical understanding, which was required for the technical task, only students from faculties of engineering were recruited. participants received 25 euros for taking part. the study was approved by the local ethics committee. participants were informed about the purpose of the study and told that they could discontinue participation at any time (in terms of informed consent). all participants were novices in learning the process control task used in the study. 
the process control task the complex cognitive skill in the present study was acquired in a simulated process control task: the participants had to learn the content of the particular start-up procedure (sop) and how to interact with the interface in order to operate the microworld waste water treatment simulation (watrsim, figure 1). watrsim represents a typical task of a so-called control room operator in process industries such as chemical plants, refineries, steel production etc., in which operators work in control rooms and are required to observe, monitor, control, and optimize the process variables with the help of synoptic diagrams (kluge, nazir & manca, 2014). control room operators control material and energy flows, which are made to interact with and transform each other (kluge, 2014). by means of physical or chemical transformation, the “process control industry” incorporates the continuous and batch processing of materials and energy in their operations (moray, 1997). “examples include the generation of electricity in conventional fuel and nuclear power plants, the separation of petroleum by fractional distillation in refineries into gas, gasoline, oil, and residue, hot strip rolling in steel production, chemical pulping in the production of paper; pasteurization of milk, and high-pressure synthesis of ammonia” (woods, o’brien & hanes, 1987, p. 1726). watrsim represents a complex technical system, as it includes: couplings and interconnections (kluge, 2008; moray, 1997; vicente, 2007; wickens & hollands, 2000), which require the operator to simultaneously process the interplay of cross-coupled variables in order to either assess a process state or predict the dynamic evolution of the plant.
dynamic effects (kluge, 2008; vicente, 1999; walker, stanton, salmon, jenkins, & rafferty, 2010), which require the operator to mentally process and envisage the change rates of cross-coupled variables and to develop sensitivity for the right timing of decisions in order to be successful (kluge, 2014). non-transparency (funke, 2010; kluge, 2014; vicente, 1999; woods, roth, stubler, & mumaw, 1990), which requires the operator to work with more or less abstract visual cues that need to be composed into a mental representation and compared with the operator’s mental model (kluge, 2014). multiple or conflicting goals (brehmer & dörner, 1993; funke, 2010; kluge, 2008; reason, 2008; verschuur, hudson, & parker, 1996; wickens & hollands, 2000), which require the operator either to balance management intentions or to decide on priorities in the case of goal conflicts in the decision-making process, e.g. which course of action to take (kluge, 2014). in watrsim, the operator’s task is to separate waste water into fresh water and gas by starting up, controlling and monitoring the plant. watrsim was developed by colleagues from the technical university dresden who are specialised in complex technical systems and automation (burkolter, kluge, german & grauel, 2009). the operation goal is to maximize the amount of purified water and gas and to minimize the amount of waste water. this goal is achieved by considering the timing of actions and following the start-up procedure. the type of start-up procedure is an independent variable in the present study. a demonstration of the procedure can be found here: http://www.aow.ruhr-uni-bochum.de/fue/gazeguiding.html.de the operation of watrsim includes the start-up of the plant in 13 steps in the fixed sequence and 18 steps in the contingent sequence (see appendix a). a start-up procedure is assumed to be a non-routine situation that requires skill retention (wickens & hollands, 2000).
for safety reasons, in order to avoid a deflagration, the start-up procedure is predefined. usually, the start-up process of a large chemical process plant takes several hours, up to days or even weeks. in watrsim, processes are speeded up, with one simulation step equalling one second. the operator receives direct feedback on his/her actions. the operator’s actions are executed on 6 valves, 4 tanks and a heating system in the fixed sequence (see figure 1) and 9 valves and 2 heating systems in the contingent sequence (figure 1 and appendix a). the handling of watrsim can usually be learned within 2 hours. watrsim has been used for experimental studies since 2009 in different versions, depending on the purpose of the respective study (e.g. burkolter et al., 2009; kluge, burkolter & frank, 2012; kluge & frank, 2014; kluge, frank & miebach, 2014; von der heyde, brandhorst & kluge, 2015a/b). the adaptive transfer task two weeks after the initial training, the participants had to perform a) the temporal transfer of the initially learned task and b) the adaptive transfer task. the adaptive transfer task consisted of controlling and adjusting the plant operation in response to an unknown technical fault: the participants were told that due to a technical fault, two tank trucks can only deliver waste water with a volume of 900 l/h instead of 1200 l/h. additionally, the tank trucks cannot deliver waste water at inflow z1 until simulation step 240, because of reconstructions of inflow z1. figure 1. watrsim interface. the production outcome tanks for purified gas are labelled in red.
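As a toy illustration of the fault (not the actual watrsim plant model), the waste water delivered at inflow z1 over the 480 one-second simulation steps can be integrated step by step; the assumption that the l/h rate is simply divided across one-second steps is ours:

```python
# toy integration of inflow z1 only; the rates and the step-240 block follow
# the task description, but this is not the real watrsim dynamics
INTACT_RATE = 1200.0   # l/h, normal delivery
FAULTED_RATE = 900.0   # l/h, reduced delivery due to the technical fault
STEPS = 480            # one simulation step = one second

def delivered_volume(faulted):
    volume = 0.0
    for step in range(STEPS):
        if faulted and step < 240:
            rate = 0.0           # z1 blocked until simulation step 240
        elif faulted:
            rate = FAULTED_RATE
        else:
            rate = INTACT_RATE
        volume += rate / 3600.0  # convert l/h to litres per one-second step
    return volume

print(delivered_volume(False))  # 160.0 litres without the fault
print(delivered_volume(True))   # 60.0 litres under the fault
```

Under these assumptions the fault cuts the delivered volume to less than half, which is why the participants had to adjust the plant operation rather than simply re-run the learned procedure.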
the task took 480 seconds of operating watrsim, and participants controlled the plant with already filled tanks and a broken inflow. adaptive transfer was measured by the production outcome. an example solution procedure is given in appendix b. research design a within-subjects design was implemented. the performance of one group was analysed at two measurement times (week 1: initial training and week 3: transfer assessment). procedure the participants attended the initial training (week 1) and two weeks later took part in the transfer assessment (operation without help; week 3; figure 2). initial training: in week 1, the participants were trained in an information- and practice-based manner on how to start up the plant using the start-up procedure. this initial training lasted for 120 minutes. first, the participants were welcomed and introduced to watrsim. after completing tests concerning variables measuring individual differences relevant for the study (general mental ability and memory) and prior knowledge, participants explored and familiarised themselves with the simulation twice. they were then given information and instructions about the start-up procedure and practised performing the 13-step start-up procedure (see appendix a) four times. during these first four trials, participants were allowed to use and consult the manual. following this, they had to perform the start-up procedure four times without the manual and were required to produce 200 litres of purified gas at least once. the best trial of this series was used as the reference level of skill mastery after training. the amount of purified gas was used as the selection criterion: the participants were requested to produce ≥ 200 litres of purified gas. transfer assessment: two weeks after the initial training, the transfer assessment took place, which lasted for approximately 30 minutes (week 3).
after the participants had been welcomed, they were asked to start up the plant up to five times without help (temporal transfer) and were then asked to operate a new task (adaptive transfer). variables and measures independent variables: general mental ability and memory general mental ability: general mental ability was measured at the beginning of the initial training with a german version of the wonderlic personnel test (wonderlic, 2002). participants answered 50 items in twelve minutes, including analogies, analysis of geometric figures, logic tasks, mathematical tasks, similarities and word definitions (e.g. "a boy is five years old and his sister is twice his age. when the boy is eight, how old will his sister be?"). correct answers were summed (range: 0 to 50; α=.93; wonderlic, 2002). the average score was 26.52 (sd=5.11), which is comparable to scores from other german-speaking studies (cf. blickle & kramer, 2012). memory: memory was measured with the wilde intelligence test-2, consisting of verbal, numerical and figural information (kersting et al., 2008). the participants had to memorise the presented information for four minutes. after a disruption phase of 17 minutes, they answered 21 reproduction tasks on the memorised information, choosing one of six response options (range: 0 to 21; α=.78; kersting et al., 2008). sequence: participants executed either the fixed- or the contingent-sequence start-up procedure. the operation included the start-up procedure of the plant as a fixed sequence comprising 13 steps (described in appendix a) or a contingent sequence with 13 steps and five consecutive steps. figure 2. procedure of the study. performing the watrsim
start-up procedure with a fixed sequence correctly and in a timely manner led to a production outcome of a minimum of 200 litres of purified gas. the start-up time was max. 180 seconds. the operation of the contingent sequence included the start-up procedure of the plant as a contingent-sequence task comprising 13 steps and a subsequent five steps for each condition. the subsequent five steps had to be executed depending on the conditions: heating w1 > 15 °c or heating w2 < 70 °c. after one of the conditions (step 1 of 5) had occurred, the correct following four steps had to be executed (described in appendix a). performing the watrsim start-up procedure correctly and in a timely manner led to a production outcome of a minimum of 200 litres of purified gas. the start-up time was max. 240 seconds, due to the fact that the start-up procedure consisted of five additional steps. compared to the fixed sequence, when executing the contingent sequence, the correct collection, selection and interpretation of information is critical in order to correctly understand the status of the plant. based on the correctly inferred status of the plant (the “if” condition in the present study: whether w1 > 15 °c or w2 < 70 °c), the operator decides (“then”) which steps of the procedures need to be applied – a or b. this process requires selective attention (wickens & mccarley, 2008) and visual scanning of the interface in order to gather the required data from the screen. dependent variables: adaptive transfer adaptive transfer: the adaptive transfer was measured by the produced amount of purified gas at the transfer assessment (week 3). control variables: temporal transfer and prior knowledge temporal transfer: the baseline temporal transfer was measured by the produced amount of purified gas at initial training (week 1) and transfer assessment (week 3), which was logged by the simulation program.
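The if-then contingency of the contingent sequence described above can be sketched as a branch on the inferred plant state. The function name and step labels are illustrative, and the mapping of the two heating conditions to branches a and b is an assumption for this example, not taken from the sop:

```python
# illustrative sketch of the contingent-sequence decision; which heating
# condition triggers branch a vs. branch b is an assumption in this sketch
BRANCH_A = ["a1", "a2", "a3", "a4"]  # hypothetical labels for the four steps
BRANCH_B = ["b1", "b2", "b3", "b4"]

def next_steps(w1_temp, w2_temp):
    # "if": infer the plant state from the heating temperatures on the screen
    if w1_temp > 15.0:        # condition heating w1 > 15 °c
        return BRANCH_A       # "then": execute the four steps of branch a
    if w2_temp < 70.0:        # condition heating w2 < 70 °c
        return BRANCH_B
    return []                 # neither condition has occurred yet

print(next_steps(20.0, 80.0))
```

The point of the sketch is that, unlike the fixed sequence, the correct continuation cannot be memorised as a single chain: the operator must first read the state off the interface and only then select the matching four steps.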
to ensure that all participants began with a similar set of skills, they were required to produce a minimum of 200 litres at initial training. the best trial of initial training and the first trial of transfer assessment were used for the calculations. the best trial was used as a reference level for the best performance shown during initial training, because participants were required to produce 200 litres at least once, and the first trial of transfer was used to assess participants’ skill level after two weeks of not using the skill, as would be necessary, for example, after a real-world shutdown. prior knowledge: as previous studies have shown an effect of domain or task knowledge on solving new problems (abele et al., 2012; kimball & holyoak, 2000), domain knowledge was assessed with a prior knowledge test. this test included seven questions about wastewater and general technical knowledge, and assessed knowledge about wastewater treatment and basic chemical understanding (range: 0 to 7; α=.65). results to ensure that all participants were able to operate the task correctly and started under the same conditions, only participants with a production outcome of ≥ 200 litres (selection criterion) during initial training were included in the calculations. thirty-three of the 40 participants were included in the following calculations. descriptive statistics are given in table 1. to check whether prior knowledge affected adaptive transfer, a spearman correlation was calculated; it was non-significant (r=.261, p=.142; see table 1). hypothesis-testing spearman correlations between general mental ability, memory and adaptive transfer were calculated to test the hypotheses. the results are shown in table 1. the correlations showed a significant effect of general mental ability (r=.385, p=.027) on adaptive transfer but no effect of memory (r=.142, p=.432).
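A Spearman correlation is a Pearson correlation computed on ranks. A minimal self-contained sketch (average ranks for ties, no p-value, toy data rather than the study's) is:

```python
import math

def average_ranks(values):
    """rank the values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# perfectly monotone toy data gives rho = 1.0
print(spearman([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0
```

In practice a library routine (e.g. scipy.stats.spearmanr) would also supply the p-values reported above.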
hypothesis 1, regarding the correlation of general mental ability and adaptive transfer, was therefore supported, but hypothesis 2, regarding the correlation of memory and adaptive transfer, was not supported. additionally, a significant correlation between adaptive transfer and temporal transfer at week 3 was found (r=.350, p=.046).

table 1. descriptive statistics and correlations.

   variable                                               statistics                    1      2      3      4      5      6      7
 1 sex                                                    10 female, 23 male
 2 age                                                    22.36 (3.06, 18–30)          .186
 3 prior knowledge                                        5.42 (1.37, 2–7)             .189   .342
 4 baseline for temporal transfer (week 1; min. 200 l)    383.48 (96.17, 236–607.96)   .232   .113  -.055
 5 temporal transfer (week 3)                             55.95 (98.95, 0–299.93)      .043   .074   .133  -.031
 6 general mental ability (0–50)                          26.52 (5.11, 18–36)          .108   .241   .385* -.004   .219
 7 memory (0–21)                                          14.18 (2.88, 6–20)          -.028   .194   .408*  .042  -.031   .536**
 8 adaptive transfer (week 3)                             337.95 (291.38, 0–948)       .098   .229   .261   .228   .350*  .385*  .142

note: statistics are m (sd, range). * p < .05, ** p < .01.

in the following, the sequence of the start-up procedure (fixed or contingent) was considered as an independent variable to analyse whether general mental ability, entered as a covariate, had an effect on adaptive transfer. an ancova with the dependent variable adaptive transfer, the independent variable sequence (fixed or contingent) and the covariate general mental ability was calculated. the calculations showed a significant main effect of the covariate general mental ability on adaptive transfer (f(1,32)=6.68, p=.015, ηp²=.18). no effect of the type of sequence was found (f(1,32)=1.19, p=.284, ηp²=.04). this indicates an effect of general mental ability but no effect of sequence.
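Conceptually, the ancova fits a linear model with the sequence as a dummy-coded predictor and general mental ability as a covariate. A minimal least-squares sketch via the normal equations (Gaussian elimination, no f-tests) is shown below with fabricated toy numbers, not the study data:

```python
def solve(A, b):
    """solve A x = b by gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(rows, y):
    """ordinary least squares for design rows that already include the
    intercept column; returns the coefficient vector via X'X b = X'y."""
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)

# fabricated data generated from y = 1 + 2*sequence + 3*ability (no noise),
# so the fit recovers these coefficients; columns: intercept, sequence dummy,
# covariate
design = [[1, g, c] for g, c in [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 3)]]
outcome = [1 + 2 * g + 3 * c for _, g, c in design]
print(ols(design, outcome))  # ≈ [1.0, 2.0, 3.0]
```

A statistics package (e.g. statsmodels in Python) would additionally produce the f-values and partial eta squared reported above; the sketch only shows the underlying model structure.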
moreover, when analysing the effect of memory on adaptive transfer, no significant effect of memory or type of sequence was found (memory: f(1,32)=1.34, p=.403, ηp²=.77; sequence: f(1,32)=1.33, p=.309, ηp²=.24). discussion the present study showed that general mental ability and adaptive transfer correlated positively, with a medium effect size. however, no effect of memory on adaptive transfer was found. for further understanding, and to ensure that the different learning conditions did not cause any biases, it was analysed whether the task itself had an effect on the presented results, but no effect of task was found. this leads to the conclusion that performance in an adaptive transfer task correlates with general mental ability regardless of the initially learned task. the correlation between general mental ability and adaptive transfer provides a slight indication that participants with higher general mental ability levels have a higher chance of solving complex problems. the results on the effect of general mental ability are in line with previous research demonstrating a relation of general mental ability with performance, complex problem solving or temporal transfer (buchner, 1995; burke & hutchins, 2007; cattell, 1987; day et al., 2001; gentner et al., 1993; hülsheger et al., 2007). the present findings are also supported by other research using a similar process control task for temporal transfer (frank & kluge, submitted). nevertheless, it was also found that the cognitive ability of memory is not required for adaptive transfer. this can be explained by the fact that to solve a complex task, it is not sufficient to remember how a once-learned task has to be performed; rather, it also seems necessary to understand the new task and to combine and develop new strategies.
this is also supported by previous studies on memory, which found memory to be an important factor for the temporal transfer of complex tasks (frank & kluge, 2015; kluge et al., 2015). however, as the present study shows, memory alone might not be as important as general mental ability for adaptive transfer. thus, memory seems to be more important for temporal transfer than for adaptive transfer. the results showed a substantial correlation between general mental ability and memory, and also a correlation between temporal transfer at week 3 and adaptive transfer. this indicates that the performance before the adaptive transfer influenced the performance at the adaptive transfer. the low temporal transfer at week 3 suggests that if participants had shown a higher temporal transfer at week 3, they might have performed the adaptive transfer even better than in the present study. this can be attributed to the fact that with a better performance in temporal transfer, the skill is retrained, and the task, operation and underlying procedures might be better understood by the participants (beckmann & guthke, 1995). as past studies have shown inconsistent results regarding the effect of general mental ability on complex problem solving, and due to the small sample size, these results should be replicated in future studies with a larger number of participants and a wide range of tasks (beckmann & guthke, 1995; wittmann & hattrup, 2004). it would also be interesting to analyse sub-processes of the task in order to investigate the effects of rule identification, rule knowledge and rule application, as components of complex problem solving, on adaptive transfer.
to ensure that all participants had a similar background and similar previous knowledge, the sample in the present study comprised only engineering students. moreover, only participants with a minimum production outcome were included. this might have affected the variance of the study sample, but on the other hand, it was necessary to be certain that all participants had the same skill level before performing the adaptive transfer task. the effect of prior knowledge was assessed and showed no effect on adaptive transfer, which is in line with previous research (kluge & frank, 2014; kluge, frank, & miebach, 2014). however, future studies could also control for task-specific knowledge. as past studies found effects of domain- or task-specific knowledge on problem-solving skills (abele et al., 2012), the present results would benefit from an analysis of the moderating effect of such concepts. additionally, the skill acquisition method used was designed to teach the temporal transfer of the complex task. future studies could apply a more strategic learning method with a focus on a deeper understanding of the task and the use of complex problem solving, which could help to gain more specific domain knowledge (anderson, 2005; kimball & holyoak, 2000). as past studies showed an impact of general mental ability on learning and of learning on performance, it might be interesting to analyse the moderation or mediation effects of learning strategies on the relationship between mental ability and adaptive transfer. in summary, the present study gives first indications that in a complex task work environment, the handling and operation of an adaptive transfer task can be affected by employees’ general mental ability.
the findings also indicate that memory is not required for solving complex problems. however, in order to take into account variables other than general mental ability, future studies could analyse the potential of different learning strategies to counteract general mental ability effects on adaptive transfer. acknowledgements: this research was supported by the german research foundation (dfg, kl2207/3-3). declaration of conflicting interests: the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. handling editor: andreas fischer author contributions: the authors contributed equally to this work. supplementary material: supplementary material is available online: http://www.aow.ruhr-uni-bochum.de/fue/gazeguiding.html.de copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license. citation: frank, b., & kluge, a. (2017). the effects of general mental ability and memory on adaptive transfer in work settings. journal of dynamic decision making, 3, 4. doi:10.11588/jddm.2017.1.40004 received: 03 july 2017 accepted: 10 september 2017 published: 06 october 2017 references abele, s., greiff, s., gschwendtner, t., wüstenberg, s., nickolaus, r., & funke, j. (2012). dynamische problemlösekompetenz – ein bedeutsamer prädiktor von problemlöseleistungen in technischen anforderungskontexten? [dynamic problem-solving competency – a significant predictor of problem-solving performance in technical contexts?]. zeitschrift für erziehungswissenschaft, 15(2), 363–391. doi:10.1007/s11618-012-0277-9 ackerman, p. l. (1992). predicting individual differences in complex skill acquisition: dynamics of ability determinants. journal of applied psychology, 77(5), 598–614. doi:10.1037//0021-9010.77.5.598 anderson, j. r. (1982). acquisition of cognitive skill. psychological review, 89(4), 369–406.
doi:10.1037//0033-295x.89.4.369 anderson, j. r. (2005). cognitive psychology and its implications (6th ed.). new york: worth publishers. arthur, w., bennett, w., stanush, p., & mcnelly, t. (1998). factors that influence skill decay and retention: a quantitative review and analysis. human performance, 11(1), 57–101. doi:10.1207/s15327043hup1101_3 beauducel, a., & kersting, m. (2002). fluid and crystallized intelligence and the berlin model of intelligence structure (bis). european journal of psychological assessment, 18(2), 97. doi:10.1027//1015-5759.18.2.97 beckmann, j. f., & guthke, j. (1995). complex problem solving, intelligence, and learning ability. in p. a. frensch & j. funke (eds.), complex problem solving: the european perspective (pp. 177–200). hillsdale: erlbaum. blickle, g., & kramer, j. (2012). intelligenz, persönlichkeit, einkommen und fremdbeurteilungen der leistung in sozialen berufen [intelligence, personality, income and external assessment of performance in social professions]. zeitschrift für arbeits- und organisationspsychologie, 56(1), 1–10. doi:10.1026/0932-4089/a000070 blume, b. d., ford, j. k., baldwin, t. t., & huang, j. l. (2010). transfer of training: a meta-analytic review. journal of management, 36(4), 1065–1105. doi:10.1177/0149206309352880 bolstad, c. a., cuevas, h. m., costello, a. m., & babbitt, b. (2008). predicting cognitive readiness of deploying military medical teams. proceedings of the human factors and ergonomics society annual meeting, 52(14), 970–974. doi:10.1177/154193120805201404 brehmer, b., & dörner, d. (1993). experiments with computer-simulated microworlds: escaping both the narrow straits of the laboratory and the deep blue sea of the field study. computers in human behavior, 9(2–3), 171–184.
doi:10.1016/0747-5632(93)90005-d buchner, a. (1995). basic topics and approaches to the study of complex problem solving. in p. a. frensch & j. funke (eds.), complex problem solving: the european perspective (pp. 27–63). hillsdale: erlbaum. burke, l. a., & hutchins, h. m. (2007). training transfer: an integrative literature review. human resource development review, 6(3), 263–296. doi:10.1177/1534484307303035 burkolter, d., kluge, a., sauer, j., & ritzmann, s. (2009). the predictive qualities of operator characteristics for process control performance: the influence of personality and cognitive variables. ergonomics, 52(3), 302–311. doi:10.1080/00140130802376067 catrambone, r. (2002). the effects of surface and structural feature matches on the access of story analogs. journal of experimental psychology: learning, memory, and cognition, 28(2), 318–334. doi:10.1037//0278-7393.28.2.318 cattell, r. b. (1963). theory of fluid and crystallized intelligence: a critical experiment. journal of educational psychology, 54(1), 1–22. doi:10.1037/h0046743 cattell, r. b. (1987). intelligence: its structure, growth and action. amsterdam: elsevier. colquitt, j.
a., lepine, j. a., & noe, r. a. (2000). toward an integrative theory of training motivation: a meta-analytic path analysis of 20 years of research. journal of applied psychology, 85(5), 678–707. doi:10.1037//0021-9010.85.5.678 day, e. a., arthur, w., & gettman, d. (2001). knowledge structures and the acquisition of a complex skill. journal of applied psychology, 86(5), 1022. doi:10.1037//0021-9010.86.5.1022 day, e. a., arthur, w., jr., villado, a. j., boatman, p. r., kowollik, v., bhupatkar, a., et al. (2013). relating individual differences in ability, personality, and motivation to the retention and transfer of skill on a complex command-and-control simulation task. in w. j. arthur, e. a. day, w. j. bennett & a. m. portrey (eds.), individual and team skill decay (pp. 283–301). new york: routledge. dörner, d., & güss, c. d. (2013). psi: a computational architecture of cognition, motivation, and emotion. review of general psychology, 17(3), 297–317. doi:10.1037/a0032947 farr, m. j. (1987). the long-term retention of knowledge and skills. new york: springer. fischer, a., & neubert, j. c. (2015). the multiple faces of complex problems: a model of problem solving competency and its implications for training and assessment. journal of dynamic decision making, 1(6), 1–13. doi:10.11588/jddm.2015.1.23945 foss, m. a., fabiani, m., mané, a. m., & donchin, e. (1989). unsupervised practice: the performance of the control group. acta psychologica, 71(1–3), 23–51. doi:10.1016/0001-6918(89)90004-8 frank, b., & kluge, a. (2015). the predictive quality of retentivity for skill acquisition and retention in a simulated process control task. in d. de waard, j. sauer, s. röttger, a. kluge, d. manzey, c. weikert, a. toffetti, r. wiczorek, k. brookhuis, & h. hoonhout (eds.), proceedings of the human factors and ergonomics society europe chapter 2014 annual conference. available from http://hfes-europe.org. frank, b., & kluge, a. (submitted).
complex cognitive skill retention: the roles of general mental ability and refresher interventions in a simulated vocational setting. funke, j. (2010). complex problem solving: a case for complex cognition? cognitive processing, 11(2), 133–142. doi:10.1007/s10339-009-0345-0 gentner, d. (1983). structure mapping: a theoretical framework for analogy. cognitive science, 7(2), 155–170. doi:10.1016/s0364-0213(83)80009-3 gentner, d., holyoak, k. j., & kokinov, b. n. (2001). the analogical mind: perspectives from cognitive science. cambridge: mit press. gentner, d., loewenstein, j., & thompson, l. (2003). learning and transfer: a general role for analogical encoding. journal of educational psychology, 95(2), 393–408. doi:10.1037/0022-0663.95.2.393 gentner, d., loewenstein, j., & thompson, l. (2004). analogical encoding: facilitating knowledge transfer and integration. proceedings of the twenty-sixth annual meeting of the cognitive science society. available online: http://groups.psych.northwestern.edu/gentner/papers/gentnerloewensteinthompson04.pdf gentner, d., rattermann, m. j., & forbus, k. d. (1993). the roles of similarity in transfer: separating retrievability from inferential soundness. cognitive psychology, 25(4), 524–575. doi:10.1006/cogp.1993.1013 goldstein, i. l., & ford, j. k. (2002). training in organizations. belmont: wadsworth. gonzalez, c., thomas, r. p., & vanyukov, p. (2005). the relationships between cognitive ability and dynamic decision making. intelligence, 33(2), 169–186. doi:10.1016/j.intell.2004.10.002 gottfredson, l. s. (1997). why g matters: the complexity of everyday life. intelligence, 24(1), 79–132. doi:10.1016/s0160-2896(97)90014-3 hagman, j. d., & rose, a. m. (1983). retention of military tasks: a review. human factors, 25(2), 199–213. doi:10.1177/001872088302500207 hesketh, b. (1997). dilemmas in training for transfer and retention. applied psychology: an international review, 46(4), 317–386. doi:10.1111/j.1464-0597.1997.tb01234.x hülsheger, u.
r., maier, g. w., & stumpp, t. (2007). validity of general mental ability for the prediction of job performance and training success in germany: a meta-analysis. international journal of selection and assessment, 15(1), 3–18. doi:10.1111/j.1468-2389.2007.00363.x ivancic, i. k., & hesketh, b. (2000). learning from errors in a driving simulation: effects on driving skill and self-confidence. ergonomics, 43(12), 1966–1984. doi:10.1080/00140130050201427 jäger, a. o., süß, h.-m., & beauducel, a. (1997). berliner intelligenzstruktur-test, form 4. göttingen: hogrefe. kersting, m., althoff, k., & jäger, a. o. (2008). wilde-intelligenztest 2 (wit-2). göttingen: hogrefe. kimball, d. r., & holyoak, k. j. (2000). transfer and expertise. in e. tulving & f. i. m. craik (eds.), the oxford handbook of memory (pp. 109–122). new york: oxford university press. kluge, a. (2008). what you train is what you get? task requirements and training methods in complex problem solving. computers in human behavior, 24(2), 284–308. doi:10.1016/j.chb.2007.01.013 kluge, a. (2008). performance assessments with microworlds and their difficulty. applied psychological measurement, 32(2), 156– 180. doi:10.1177/0146621607300015 kluge, a. (2014). the acquisition of knowledge and skills for taskwork and teamwork to control complex technical systems. heidelberg: springer. kluge, a., & burkolter, d. (2013). training for cognitive readiness: research issues and experimental designs. journal of cognitive engineering and decision making, 7(1), 96–118. 
doi:10.1177/1555343412446483 kluge, a., burkolter, d., & frank, b. (2012). “being prepared for the infrequent”: a comparative study of two refresher training approaches and their effects on temporal and adaptive transfer in a process control task. proceedings of the human factors and ergonomics society annual meeting, 56(1), 2437–2441. doi:10.1177/1071181312561496 kluge, a., & frank, b. (2014). counteracting skill decay: four refresher interventions and their effect on skill and knowledge retention in a simulated process control task.
ergonomics, 57(2), 175–190. doi:10.1080/00140139.2013.869357 kluge, a., frank, b., maafi, s., & kuzmanovska, a. (2015). does skill retention benefit from retentivity and symbolic rehearsal? two studies with a simulated process control task. ergonomics, 59(5), 641–656. doi:10.1080/00140139.2015.1101167 kluge, a., frank, b., & miebach, j. (2014). measuring skill decay in process control results from four experiments with a simulated process control task. in d. de waard, k. brookhuis, r. wiczorek, f. di nocera, r. brouwer, p. barham, c. weikert, a. kluge, w. gerbino, & a. toffetti (eds.), proceedings of the human factors and ergonomics society europe chapter 2013 annual conference. available from http://hfes-europe.org. kluge, a., nazir, s. & manca, d. (2014). advanced applications in process control and training needs of field and control room operators. iie transactions on occupational ergonomics and human factors, 2(3–4), 121–136, doi: 10.1080/21577323.2014.920437 kluge, a., sauer, j., burkolter, d., & ritzmann, s. (2010). designing training for temporal and adaptive transfer: a comparative evaluation of three training methods for process control tasks. journal of educational computing research, 43(3), 327– 353. doi:10.2190/ec.43.3.d kluge, a., sauer, j., schüler, k. & burkolter, d. (2009). designing training for process control simulators: a review of empirical findings and common practice. theoretical issues in ergonomic science, 10(6), 489–509. doi:10.1080/14639220902982192 kontogiannis, t., & shepherd, a. (1999). training conditions and strategic aspects of skill transfer in a simulated process control task. human-computer interaction, 14(4), 355–393. doi:10.1207/s15327051hci1404_1 maltby, j., day, l., & macaskill, a. (2011). personality, individual differences and intelligence. upper saddle river, nj: pearson education. matthews, g., davies, d. r., westerman, s. j., & stammers, r. b. (2000). human performance: cognition, stress, and individual differences. 
hove: psychology press. mattoon, j. s. (1994). designing instructional simulations: effects of instructional control and type of training task on displayinterpretation skills. the international journal of aviation psychology, 4(3), 189–209. doi:10.1207/s15327108ijap0403_1 mcgrew, k. s. (2009). chc theory and the human cognitive abilities project: standing on the shoulders of the giants of psychometric intelligence research. intelligence, 37(1), 1–10. doi:10.1016/j.intell.2008.08.004 merrill, m. d. (2002). first principles of instruction. educational technology research and development, 503, 43–59. doi:10.1007/bf02505024 miller, g. a., galanter, e., & pribram, k. h. (1960). plans and the structure of behavior. new york, ny: holt, rinehart & winston. doi:10.1037/10039–000 moray, n. (1997). human factors in process control. in g. salvendy (ed.), handbook of human factors and ergonomics (pp. 1944–1971). new york: wiley morris, n. m. & rouse, w. b. (1985). review and evaluation of empirical research in troubleshooting. human factors, 27(5), 503–530. doi:10.1177/001872088502700502 noke, t. j. (2005). an investigation into adaptive shifting in knowledge transfer. proceedings of the twenty-seventh annual conference of the cognitive science society. primi, r., ferrão, m. e., & almeida, l. s. (2010). fluid intelligence as a predictor of learning: a longitudinal multilevel approach applied to math. learning and individual differences, 20(5), 446– 451. doi:10.1016/j.lindif.2010.05.001 reason, j. (2008). the human contribution. unsafe acts, accidents, and heroic recoveries. surrey: ashgate. rosander, p., bäckström, m., & sternberg, g. (2011). personality traits and general intelligence as predictors of academic performance: a structural equation modelling approach. learning and individual differences, 21(5), 590–596. doi:10.1016/j.lindif.2011.04.004 salas, e., & cannon-bowers, j. a. (1997). methods, tools, and strategies for team training. in m. a. quiñones & a. 
ehrenstein (eds.), training for a rapidly changing workplace: applications of psychological research (pp. 249–279). washington d.c.: american psychological association. salas, e., tannenbaum, s. i., kraiger, k., & smith-jentsch, k. a. (2012). the science of training and development in organizations: what matters in practice. psychological science in public interest, 13(2), 74–101. doi:10.1177/1529100612436661 schmidt, f. l., & hunter, j. (2004). general mental ability in the world of work: occupational attainment and job performance. journal of personality and social psychology, 86(1), 162. doi:10.1037/0022-3514.86.1.162 stammers, r. b. (1996). training issues. in n. stanton (ed.), human factors in nuclear safety (pp. 189–197). london: taylor & francis. thorndike, e. l. (1904). an introduction to the theory of mental and social measurements. new york: teachers college. thurstone, l. l. (1938). primary mental abilities. chicago: university of chicago press. u.s. chemical safety and hazard investigation board (2007). investigation report: refinery explosion and fire. retrieved from http://www.csb.gov/assets/1/19/csbfinalreportbp .pdf (09.01.2017). van merriënboer, j. j. g. (1997). training complex cognitive skills: a four-component instructional design model for technical training. englewood cliffs: educational technology. verschuur, w., hudson, p., parker, d. (1996). violations of rules and procedures: results of item analysis and test of the behavioural model. field study nam and shell expro aberdeen. report leiden university of sip: leiden. vicente, k. j. (1999). cognitive work analysis: toward safe, productive, and healthy computer-based work. mahwah: lawrence erlbaum assoc. vicente, k.j. (2007). monitoring a nuclear power plant. in a. f. kramer, d.a. wiegmann, & a. kirlik (eds.), attention. from theory to practice (pp. 90-99). oxford: oxford university press. vicente, k. j., & rasmussen, j. (1990). 
the ecology of human-machine systems ii: mediating direct perception in complex work domains. ecological psychology, 2(3), 207–249. doi:10.1207/s15326969eco0203_2 von der heyde, a., brandhorst, s. & kluge a. (2015a). the impact of the accuracy of information about audit probabilities on safetyrelated rule violations and the bomb crater effect. safety science, 74, 160–171. doi:10.1016/j.ssci.2014.12.004 walker, g. h., stanton, n. a., salmon, p. m., jenkins, d. p., & rafferty, l. (2010). translating the concepts of complexity to the field of ergonomics. ergonomics, 53(10), 1175–1186. 10.1080/00140139.2010.513453 10.11588/jddm.2017.1.40004 jddm | 2017 | volume 3 | article 4 | 10 https://doi.org/10.1177/1071181312561496 https://doi.org/10.1080/00140139.2013.869357 https://doi.org/10.1080/00140139.2015.1101167 http://hfes-europe.org https://doi.org/10.1080/21577323.2014.920437 https://doi.org/10.2190/ec.43.3.d https://doi.org/10.1080/14639220902982192 https://doi.org/10.1207/s15327051hci1404_1 https://doi.org/10.1207/s15327108ijap0403_1 https://doi.org/10.1016/j.intell.2008.08.004 https://doi.org/10.1007/bf02505024 http://dx.doi.org/10.1037/10039--000 https://doi.org/10.1177/001872088502700502 https://doi.org/10.1016/j.lindif.2010.05.001 https://doi.org/10.1016/j.lindif.2011.04.004 https://doi.org/10.1177/1529100612436661 https://doi.org/10.1037/0022-3514.86.1.162 http://www.csb.gov/assets/1/19/csbfinalreportbp.pdf http://www.csb.gov/assets/1/19/csbfinalreportbp.pdf https://doi.org/10.1207/s15326969eco0203_2 https://doi.org/10.1016/j.ssci.2014.12.004 https://doi.org/10.1080/00140139.2010.513453 http://journals.ub.uni-heidelberg.de/index.php/jddm/article/view/40004 frank & kluge: adaptive transfer in work settings wexley, k. n., & latham, g. p. (2002). developing and training of human resources in organizations. upper saddle river, nj: prentice hall. wickens, c. d., & hollands, j. g. (2000). engineering psychology and human performance. upper saddle river: prentice hall. 
Wickens, C. D., & McCarley, J. S. (2008). Applied attention theory. Boca Raton: CRC Press.
Wittmann, W. W., & Hattrup, K. (2004). The relationship between performance in dynamic systems and intelligence. Systems Research and Behavioral Science, 21(4), 393–409. doi:10.1002/sres.653
Wonderlic, I. (2002). Wonderlic Personnel Test. Libertyville: Wonderlic Inc.
Woods, D. D., O'Brien, J. F., & Hanes, L. F. (1987). Human factors challenges in process control: The case of nuclear power plants. In G. Salvendy (Ed.), Handbook of human factors (pp. 1725–1770). New York: John Wiley & Sons.
Woods, D. D., Roth, E. M., Stubler, W. F., & Mumaw, R. J. (1990). Navigating through large display networks in dynamic control applications. Proceedings of the Human Factors and Ergonomics Society 34th Annual Meeting, 396–399. doi:10.1177/154193129003400435
Wüstenberg, S., Greiff, S., & Funke, J. (2012). Complex problem solving: More than reasoning? Intelligence, 40(1), 1–14. doi:10.1016/j.intell.2011.11.003

Appendix

Appendix A: Procedures for Study 1 and Study 2

For further understanding, see also: http://www.aow.ruhr-uni-bochum.de/fue/gazeguiding.html.de

Study 1, fixed-sequence task, start-up procedure (13 steps):

Step  Description
1     LIC V9: flow rate 500 l/h
2     V2: deactivate follower control
3     Valve V1: flow rate 500 l/h
4     Wait until R1 > 200 l
5     Valve V2: flow rate 500 l/h
6     Wait until R1 > 400 l
7     Valve V3: flow rate 1000 l/h
8     Wait until HB1 > 100 l
9     Activate heating HB1
10    Wait until HB1 > 60 °C
11    Activate column K1
12    Valve V4: flow rate 1000 l/h
13    Valve V6: flow rate 400 l/h

Study 2, contingent-sequence task, start-up procedure (13 steps plus 2 × 5 contingent steps): steps 1–13 are identical to Study 1. The continuation at step 14 depends on which condition is reached:

Step  If W1 > 15 °C           If W2 > 70 °C
14    W1 > 15 °C              W2 > 70 °C
15    LIC V8: deactivate      LIC V8: deactivate
16    LIC V9: 700 l/h         LIC V9: 600 l/h
17    LIC V8: 500 l/h         LIC V8: 400 l/h
18    Heating W1: 15 °C       Heating W2: 70 °C

Appendix B: Adaptive transfer example solution procedure

Appendix B shows one possible strategy for solving the adaptive task after two weeks. This procedure is, however, only one of many possible ways to handle the adaptive transfer.

Step  Description
1     LIC V9: flow rate 500 l/h
2     V2: deactivate follower control
3     Valve V3: flow rate 500 l/h
4     Activate heating HB1
5     Activate column K1
6     Valve V4: flow rate 500 l/h
7     Valve V2: flow rate 700 l/h
8     Valve V3: flow rate 700 l/h
9     Valve V4: flow rate 700 l/h
10    Simulation step 240: wait until BA > 400 l
11    Valve V1: flow rate 600 l/h
12    Valve V3: flow rate 1400 l/h
13    Valve V4: flow rate 1080 l/h
14    Valve V4: flow rate 700 l/h
15    Valve V6: flow rate 400 l/h

Original Research

A study on the training of complex problem solving competence

André Kretzschmar¹ and Heinz-Martin Süß²
¹ECCS Research Unit, University of Luxembourg, Luxembourg, and ²Institute of Psychology I, Otto von Guericke University Magdeburg, Germany

This study examined whether experience with different computer-based complex problem situations would improve complex problem solving (CPS) competence in an unknown problem situation.
We had N = 110 university students take part in a control group study. They were trained in five different complex problem situations for up to 7 hr, and their performance was tested in a sixth complex problem situation. The data analyses revealed that the training influenced the CPS process of knowledge acquisition. However, the CPS process of knowledge application was not impacted by experience with other problem situations. Implications for the concept of CPS as a trainable competence as well as the training of CPS in general are discussed.

Keywords: complex problem solving, cognitive training, transfer, flexibility training, experience, FSYS

Complex problem solving (CPS)¹ was introduced into European psychology by Dörner and colleagues (e.g., Dörner, Kreuzig, Reither, & Stäudel, 1983) and immediately attracted attention as a new cognitive ability that was applicable to real-life demands (Dörner, 1986). The handling of these real-life demands (e.g., the interconnectedness of problem areas or the dynamic development of a problem situation) has been an integral part of CPS performance and was thus included in Buchner's definition of CPS:

the successful interaction with task environments that are dynamic (i.e., change as a function of user's intervention and/or as a function of time) and in which some, if not all, of the environment's regularities can only be revealed by successful exploration and integration of the information gained in that process. (Frensch & Funke, 1995, p. 14)

Decades later, in a time of rapid technological and scientific advances, CPS became an increasingly fundamental issue in science, industry, and education (e.g., Funke, 1999; Neubert, Mainert, Kretzschmar, & Greiff, 2015; OECD, 2014).
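To make Buchner's definition concrete: a task environment is "dynamic" when its state changes both through the user's interventions and on its own over time. The following minimal sketch is purely illustrative and not taken from the article; the two-variable system, its coefficients, and all names are invented for demonstration.

```python
# Hypothetical two-variable microworld: state changes as a function of the
# user's intervention AND as a function of time (decay), with one variable
# driving the other (interconnectedness). All coefficients are illustrative.

def simulate_round(state, intervention, decay=0.9, coupling=0.5):
    """One turn: a receives the user's input directly; b changes only via a."""
    a, b = state
    new_a = decay * a + intervention   # user acts directly on a; a also decays
    new_b = decay * b + coupling * a   # b is influenced indirectly, through a
    return (new_a, new_b)

state = (0.0, 0.0)
for _ in range(3):                     # three rounds of the same intervention
    state = simulate_round(state, intervention=10.0)
print(state)                           # b rises even though it was never touched
```

Even a system this small shows why exploration matters: the effect of an intervention on b can only be discovered by acting on a and observing the system over several rounds.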
For example, in the educational context, CPS has shown its utility in several respects (e.g., Greiff et al., 2013; Kretzschmar, Neubert, & Greiff, 2014; Scherer & Tiemann, 2012; Sonnleitner, Keller, Martin, & Brunner, 2013), and it is currently an important part of national and international large-scale assessments such as the Programme for International Student Assessment (PISA; OECD, 2014). Especially in this context, CPS is considered to be a cross-curricular and knowledge-based competence (OECD, 2014). According to this understanding of CPS as a competence, CPS differs in one considerable feature from related cognitive abilities such as intelligence (see Süß, 1996; Wittmann & Hattrup, 2004; Wüstenberg, Greiff, & Funke, 2012): whereas cognitive abilities are relatively stable over time and are not viewed as trainable, competencies are by definition modifiable through interventions (Weinert, 2001). This article adopts the perspective of viewing CPS as a competence and aims to examine its trainability.

The training of complex problem solving competence

In general, the training and transfer of different cognitive achievements has been an exciting research topic that is highly relevant to real life. For example, recent studies have demonstrated that training with video games may improve cognitive performance outside the game context (e.g., basic visual attention or executive control; e.g., Anguera et al., 2013; Green & Bavelier, 2003; Strobach, Frensch, & Schubert, 2012). Although the findings, and especially the transfer to untrained tasks, have been discussed critically (e.g., Boot, Blakely, & Simons, 2011), the rationale behind such cognitive training approaches is that training does not improve performance on only a single task, but rather improves achievement on other cognitive tasks or with regard to real-life criteria, too. Naturally, a CPS competence training should also reflect such an approach.
This means that the training should not be limited to better performance only in the problem situations that are trained but should rather manifest in better performance in unknown problem situations. Surprisingly, after almost 40 years of research on CPS, the inclusion of CPS as a competence in educational large-scale assessments (OECD, 2014), and a steadily increasing interest in the trainability of CPS competence (Dörner, 1976; Funke, 2003), there is a remarkable lack of research providing an understanding and empirical examination of CPS competence training, especially with regard to the transfer of CPS skills to unknown problem situations. This is all the more astonishing as more or less explicit recommendations for how to increase an individual's CPS competence, and how to change school practices and educational policies in order to foster it, have been documented (see OECD, 2014). Therefore, the purpose of this study was to reduce this gap in research and to shed some light on the extent to which CPS competence is trainable.

Corresponding author: André Kretzschmar, ECCS Unit, University of Luxembourg, 11, Porte des Sciences, 4366 Esch-Belval, Luxembourg; phone: +352-466644-9245; fax: +352-466644-5376. E-mail: kretzsch.andre@gmail.com; ORCID iD: 0000-0002-7290-1145.

10.11588/jddm.2015.1.15455 JDDM | 2015 | Volume 1 | Article 4 | 1

Kretzschmar & Süß: Training of complex problem solving competence

The flexibility training approach

Deliberate practice is defined as engagement in activities that are specifically designed to improve performance in a domain (Meinz & Hambrick, 2010). In this sense, Dörner (1989) proposed the flexibility training approach in accordance with the assumption that a CPS competence training should also have an effect on untrained complex problems.
The aim of the flexibility training is to develop general problem solving knowledge (GPSK) about how to solve complex problems. GPSK (or heuristic knowledge; Schaub & Strohschneider, 1992) is knowledge about the need to explore the problem (e.g., to acquire situation-specific knowledge), how to conduct interventions (e.g., careful interventions in unstable systems), and how to reach goals (e.g., how to effectively organize a series of interventions). GPSK can therefore be considered meta-problem-solving knowledge that is applicable across different situations. According to Dörner and other researchers, GPSK can be developed by gaining experience with different problem situations (Dörner, 1976; Schaub & Strohschneider, 1992; Strohschneider, 1990). This means that comprehensive experience with problem situations involving heterogeneous demands leads to the successive abstraction of problem solving procedures. Due to this abstraction process, it can be assumed that problem solving knowledge becomes less dependent on concrete problem situations and, thus, generally applicable (Weinert & Waldmann, 1988).

For example, imagine that you have just bought a computer with a new operating system (e.g., Ubuntu Linux). A problem situation may develop if you want to install a software program. Your (situation-specific) knowledge about how to install new software in your old operating system (e.g., Microsoft Windows) would be of little use because the procedures for installing software differ between the two operating systems. Therefore, an appropriate way to proceed would be first to acquire knowledge about the new operating system's installation procedure by consulting the help pages. Consulting the help pages or, more generally, knowing how to acquire essential information about a problem is an example of GPSK.
Experience with different problem situations would lead to the GPSK: "If I do not know the specific procedure for solving the problem, then I should use the support that is offered to get the information." Consequently, if your office computer runs yet another operating system (e.g., OS X), neither your specific knowledge about Linux nor about Microsoft Windows will be sufficiently helpful. However, the GPSK of using the help pages will increase your chances of successfully installing new software on your office computer, too. Moreover, having experience with different operating systems would also increase a person's knowledge about common solutions (i.e., how to apply knowledge about software installation). Although the specific procedures differ between operating systems, the problem solver might recognize the similarity of the principal steps (e.g., start the procedure, configure some features, check the success of the procedure). Experience with different problem situations should, therefore, increase GPSK in numerous ways. Such experience is not limited to the context of software installation in different operating systems but should also be useful in other contexts (e.g., using an unfamiliar mobile phone).

Basically, Dörner's (1989) flexibility training approach follows the assumption that a person's experience with different complex problem situations will help the person develop higher GPSK, which in turn leads to better CPS performance in an unknown problem. However, as always in training contexts, the essential question is how to teach GPSK, especially with regard to its transferability to unknown problem situations. Previous research has shown that the direct teaching of GPSK (e.g., general problem solving strategies) is rather unrewarding, whereas a learning-by-doing approach (Anzai & Simon, 1979) seems to be generally more efficient (e.g., Friedrich & Mandl, 1992; Putz-Osterloh, 1988; Stern, 1993).
Consequently, the flexibility training should be based primarily on direct interactions between the problem solver and different complex problem situations (i.e., learning by doing). In summary, the flexibility training approach follows the principle of developing GPSK by gaining experience with different problem situations. Whereas specific problem solving knowledge (e.g., how to install a software program in a particular operating system) is of little advantage in unknown situations, GPSK (e.g., how to acquire knowledge and follow common procedures) promotes the solving of new and unknown problem situations.

Previous empirical findings on CPS competence training

The impact of experience on complex problem situations has been examined with expert-novice comparisons (e.g., Putz-Osterloh, 1987; Schaub & Strohschneider, 1992). However, to our knowledge, no studies have aimed to investigate the training of general CPS competence (i.e., the development of GPSK). In fact, the vast majority of studies that have had the goal of increasing CPS competence have used only a single complex problem situation without considering whether competence in CPS could be transferred to novel problem situations. The consistent, albeit quite trivial, finding of these studies is that people who are trained in a specific problem situation perform better when confronted with the same one (Funke, 2006). Unfortunately, such studies cannot answer the question of whether CPS competence is trainable. The effects of practice have been shown several times in different contexts (e.g., Kulik, Kulik, & Bangert, 1984), but this does not necessarily imply an improved competence.
The few studies that have effectively focused on effects that can be successfully transferred to new problem situations have used only two problem situations (i.e., one for the training and one for evaluating the transfer) with limited training time (Bakken, 1993; Jensen, 2005; Putz-Osterloh & Lemme, 1987). For example, in Jensen's (2005) study, people were trained with the rabbit-and-fox task, and transfer performance was evaluated with the reindeer-and-lichen task. The results indicated that people were able to learn from experience with complex problem situations and use that knowledge in a new problem situation (Jensen, 2005). However, the two tasks were quite similar, and thus, the results could not be used to determine whether the participants had developed GPSK that would be applicable to a less similar problem situation or whether they had acquired only task-specific knowledge instead.

Some indirect evidence for the development of GPSK through experience has come from research on CPS assessment tools using the multiple-item approach (e.g., MicroDYN; Greiff et al., 2012). These tools evaluate the use of exploratory behavior in a sequence of more or less different problem situations. With these tools, problem solvers are given no feedback concerning their exploration behavior and learn only from their experience with the tasks.

¹ There are several synonyms for CPS in the literature, for example, dynamic decision making (e.g., Gonzalez, Thomas, & Vanyukov, 2005), interactive problem solving (e.g., Fischer et al., 2015), dynamic problem solving (e.g., Greiff, Wüstenberg, & Funke, 2012), and creative problem solving (e.g., OECD, 2014).
The empirical findings (e.g., Schweizer, Wüstenberg, & Greiff, 2013; Wüstenberg et al., 2012) have clearly shown an increase in the use of the specific exploration strategy VOTAT (i.e., vary one thing at a time; Vollmeyer, Burns, & Holyoak, 1996) across the task sequences. Although the use of VOTAT was not taught during the assessment and no feedback was provided, problem solvers discovered the advantages of that strategy and used it more often at the end than at the beginning of the assessment. However, the tasks presented in MicroDYN are again highly similar, and thus, it is unknown whether problem solvers would be able to apply the knowledge they learned (i.e., use of the VOTAT strategy) to solve other, less familiar CPS tasks. To sum up, the previous findings point to the possibility of learning through experience with problem situations as well as to the ability to transfer one's experience to similar complex problem situations. However, the question of whether experience with different problem situations leads to an improved CPS competence that is applicable to new and unknown problem situations remains open.

The present study

The aim of the current study was to examine the extent to which CPS competence is trainable. We therefore chose Dörner's (1989) flexibility training approach, which can theoretically be applied to develop GPSK. We specifically hypothesized that problem solvers who were allowed to gain experience from different problem situations would perform better in an unknown complex problem situation than a control group.

Method

Participants

One hundred fifty-nine students from a German university participated in the experimental study. Participants in both the training and control groups were given the same incentives: psychology students were given partial credit for course requirements, whereas all other participants took part in a book raffle. Furthermore, all were given individual feedback on their performance.
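As an aside, the VOTAT strategy discussed above can be sketched in a few lines of code. The toy system below (a linear input-output network in the spirit of MicroDYN-style tasks) and all of its coefficients are invented for illustration; the sketch only shows why varying exactly one input per round isolates each input's effect.

```python
# Illustrative sketch of VOTAT ("vary one thing at a time";
# Vollmeyer, Burns, & Holyoak, 1996) on a hypothetical linear system.
# The weight matrix is the hidden structure the problem solver must uncover.

def observe(inputs, weights):
    """One round of the toy system: each output is a weighted sum of inputs."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def votat_explore(weights, n_inputs):
    """Probe the system by setting one input to 1 and all others to 0."""
    effects = []
    for i in range(n_inputs):
        probe = [0] * n_inputs
        probe[i] = 1                        # vary exactly one thing at a time
        effects.append(observe(probe, weights))
    return effects                          # effects[i][j]: input i -> output j

# Hidden structure (unknown to the problem solver): 2 inputs, 2 outputs.
hidden = [[2, 0],   # output 1 reacts only to input 1
          [1, 3]]   # output 2 reacts to both inputs

print(votat_explore(hidden, n_inputs=2))    # [[2, 1], [0, 3]]
```

Because only one input changes per round, each observed output change can be attributed unambiguously, which is exactly the advantage problem solvers discover for themselves in multiple-item assessments.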
One hundred ten students completed the entire training and were included in the analyses. A screening of the available information indicated selective dropout: those who dropped out were mainly nonpsychology students (92%), which might indicate that the incentive of partial course credit was stronger than the book raffle incentive. Further evidence of selective dropout came from significantly better performance on some cognitive measures by the participants in comparison with the dropouts. The possible consequences of this selection process are discussed below. Of the final sample, 47% studied psychology, 22% mechanical engineering, 17% economics, and the rest another field of study. The mean age of the final sample was 23.28 years (SD = 4.01), and 49% were female. Gender was equally distributed (50% female) within each group.

Design and general procedure

Participants were recruited under identical conditions; the study was described to all of them as requiring up to 12 hr. Half of the participants completed the training, whereas the others formed a no-contact control group. Participants were randomly allocated to the training or control group when they registered for the study. With regard to group equivalence, we screened for important determinants of CPS competence (e.g., Bühner, Kröner, & Ziegler, 2008; Greiff, Kretzschmar, Müller, Spinath, & Martin, 2014; Süß, 1996; Wittmann & Hattrup, 2004). We used several subtests from a comprehensive test of the Berlin intelligence structure model (BIS test; Jäger, Süß, & Beauducel, 1997; for a description in English, see Süß & Beauducel, 2015) to measure processing capacity (i.e., reasoning; 9 tasks) and perceptual speed (9 tasks). Three different tasks from Oberauer, Süß, Wilhelm, and Wittmann (2002) and Sander (2005) were used to ascertain working memory capacity.
In addition, we assessed computer skills with the short version of the computer knowledge questionnaire STARTC (Wagener, 2007), a questionnaire on computer experience, and a computer-based simple reaction-time task by Sander (2005). Finally, we administered a new questionnaire (18 multiple-choice items) to gather prior domain-specific knowledge of the complex problem situation that was used to evaluate the success of the training.

Each group was tested separately. In the first session, participants filled out a questionnaire asking for personal data and were tested on the baseline measures in groups of 10 to 20 persons. In the second session, the control group (on the following day) and the training group (after 1 week) were tested for CPS and domain-specific knowledge. All participants spent about 2.5 hr in each test session.

Training intervention phase

Material. On the basis of the flexibility training approach, we used five different microworlds (i.e., computer-based complex problem solving situations) as training tools in order to provide problem-situation demands that were as heterogeneous as possible. The selection of microworlds was guided by a literature review on training purposes, but no computer program specifically designed for training was available. Therefore, we consulted the CPS research literature to choose the following microworlds, which differed sufficiently among themselves (a prerequisite for the flexibility training) and from the preselected microworld used for the training evaluation (i.e., no equal semantic embedding, no equal user interface, etc.). In general, there is not yet a convincing way to objectively compare microworlds. Wagener (2001) provided a very elaborate and comprehensive framework with 43 features for classifying a broad range of microworlds. Unfortunately, most of the features are not applicable to every microworld and, even more important, the classification is rather subjective. However, Table 1 provides an overview of formal features roughly based on Wagener's framework in order to provide additional information about the microworlds we chose and their comparability.

ColorSim. The microworld ColorSim (Kluge, 2008; see Figure 1) functioned as the initial simulation and was intended to get participants started. It is easy to understand and thus provided a good demonstration of the basic principles of solving complex problem situations. ColorSim has no real-world embedding. Problem solvers have to manipulate three slide switches to reach the target values of three output parameters. The different levels of the microworld are implemented through an increasingly complex network of the different variables. We used the easiest level as well as a moderate level according to Kluge (2008). In our study, each level consisted of nine tasks. ColorSim is turn-based and does not have a strict time limit.

K4. The aim of K4 (Wagener, 2001; see Figure 2) is to manage a publishing house, in particular, to produce and sell magazines. The problem solver has to control the price, the quality, the circulation, and so forth, depending on demand and customer satisfaction. This microworld has three levels, realized through the numbers of variables and manipulable parameters and the interconnections between them. We used all three levels; a level took between 15 and 75 min.

PowerPlant.
The microworld PowerPlant (Wallach, 1998; see Figure 3) provides a realistic simulation of a coal-fired power station. Problem solvers have to control the system by manipulating two actuating elements: the supply of coal and the opening of a steam valve. They have to follow a target energy demand curve. The difficulty changes between sessions through different target profiles of the demand curve. In our study, each session consisted of an introductory phase (15 decisions) followed by a performance phase of approximately 20 min.

Tailorshop. We used the version of this microworld presented by Süß (1996; see Figure 4). In the Tailorshop, problem solvers have to achieve economic success by manipulating prices, buying raw materials, controlling wages, and so forth. Two different Tailorshop start conditions were used. In the current study, each of the two sessions consisted of 2 (simulated) months of tutorials, an exploration phase of 30 min, and a turn-based performance phase of 12 months.

Networked Fire Chief. In Networked Fire Chief (Omodei & Wearing, 1998; see Figure 5), problem solvers have to fight fires and coordinate their task forces. The task is to move units to the location of the fire while simultaneously managing water resources. For this study, we developed a new level that considered the special demands of CPS (e.g., assigning priorities, observing different targets). The duration of the level was approximately 20 min.

Procedure. At the end of the first session, participants in the training group received a short introduction to the technical details of the computer training and then completed the training individually at home. Table 2 shows the training plan, consisting of 10 sessions with five different microworlds. Each microworld had to be handled at least twice, with each session including an exploration phase and a performance phase. In the exploration phase, the problem solvers had no task-related objectives.
Thus, subjects had the opportunity to acquire task-related knowledge, try out different strategies, learn from their mistakes, and evaluate the process without any pressure to perform (see Osman, 2010; Vollmeyer & Funke, 1999). No additional hints or feedback were given during the training phase; that is, the training was based exclusively on learning by doing (see Anzai & Simon, 1979). Furthermore, the training was designed with increasing difficulty and complexity and with an alternating presentation of microworlds (see Goettl, Yadrick, Connolly-Gomez, Regian, & Shebilske, 1996). The training plan spanned a period of 1 week with daily training of between 40 and 90 min. On average, participants spent about 6.5 hr on the training. However, this is only a rough estimate of the total training time because the training software (see below) did not provide precise time-on-task information.

We designed a training system that permitted the participants to complete all of the training sessions at home via the Internet at any time. To this end, every trainee received a training manual with general and software-related information describing the basic functions used in each microworld. For every training session, the participants had to log into an online environment with their personal log-in data.² Next, a software program that coordinated and recorded the entire training automatically started a microworld according to the individual's training progress. No training session could be skipped or repeated, but under certain circumstances (e.g., a disrupted Internet connection), it was possible to restart the previous session. After completing the last session, no further training was possible.

² The microworlds used for the present training were initially developed to run on a local computer system. In order to provide training that could be administered online, we emulated such a local computer system with the help of virtual machines.
Participants used the Remote Desktop Protocol (RDP) to connect to the virtual machines via the Internet.

Figure 1. Screenshot of the ColorSim microworld.
Figure 2. Screenshot of the K4 microworld.
Figure 3. Screenshot of the PowerPlant microworld.
Figure 4. Screenshot of the Tailorshop microworld.
Figure 5. Screenshot of the Networked Fire Chief microworld.

Table 1. Selection of formal features of the microworlds, roughly based on Wagener's (2001) framework. Values are listed in the order ColorSim; K4; PowerPlant; Tailorshop; Networked Fire Chief; FSYS ("–" marks values not reported):
- Semantic embedding: none/abstract; publishing company (management); energy production (engineering); clothing factory (management); fire fighting; forestry (management)
- Impact of prior knowledge: none; moderate; low to moderate; moderate; low; low
- Content presentation: numerical, in part figural; numerical; figural, in part numerical; numerical; figural; numerical, in part figural
- Turn-based: yes; yes; no; yes; no; yes
- Time limit: no; no; yes; yes; yes; no
- Real-time simulation: no; no; no; no; yes; no
- Number of variables: 6; 23/31/56; 11; 24; 85; –
- Connections between variables: linear, multiplicative, logistic; linear, multiplicative, logistic; differential equations; linear, exponential; –; linear, exponential, logistic
- Eigendynamics: yes; yes; no; yes; yes; yes
- Time delay of feedback: yes; yes; yes; no; no; yes
- Hidden/indirect effects: yes; yes; yes; yes; yes; yes
- Random influences: no; no; no; yes (pseudo); no; no
Note. Impact of prior knowledge was estimated.
All other information is based on the literature and test descriptions.

Training evaluation phase

Material. We used the microworld FSYS 2.0 (Wagener, 2001; see Figure 6) to measure the training outcome. FSYS is a reliable and well-validated CPS competence assessment tool that is based on the theoretical complex problem solving framework of Dörner and colleagues (Dörner, 1986; Dörner et al., 1983). The semantic embedding of FSYS is a forestry enterprise. However, no previous knowledge is required to achieve good performance because the program uses fantasy names, an integrated information system, and very general cause-effect relations. The aim is to achieve economic success within 50 simulated months. To do so, the problem solver has to manage five different wooded areas by planting and felling trees, fertilizing, and fighting vermin. Wagener and Wittmann (2002) demonstrated that FSYS offered incremental predictive validity beyond general intelligence with regard to job-related performance indicators (e.g., in-basket exercises, case studies). Stadler, Becker, Greiff, and Spinath (2015) reported that FSYS explained incremental variance in university grade point average when high-school GPA and a short test of general intelligence were controlled for, even though their sample size was rather small.

Furthermore, we used Wagener's (2001) questionnaire to assess FSYS-specific knowledge after the participants had completed the 50 simulated months. The questionnaire addressed all relevant fields in the microworld and was composed of 11 multiple-choice questions covering factual and action knowledge. For each question, five answer options were provided (i.e., four distractors in addition to the correct answer). Example questions are: "Which tree has the highest yield?"
and "A forest is infested by vermin XY. Which procedure would you apply?" Wagener (2001) reported an internal consistency of .41, but this is an inaccurate estimate of the reliability because of the heterogeneity of the scale.

Dependent variables. The total score on the FSYS-specific knowledge test (Wagener, 2001) was used as an indicator of the process of knowledge acquisition. Each item was scored dichotomously, resulting in a sum score ranging from 0 to 11. For the process of knowledge application, we used the control performance in FSYS,³ that is, the total amount of property amassed by the end of the simulation (original name: SKAPKOR). According to the standard scoring procedure, the property value was logistically transformed onto a scale ranging from 0 to 100 such that higher values indicated better performance (Wagener, 2001). Wagener (2001) reported substantial manifest correlations (r = .41 to .44) between knowledge acquisition (i.e., the knowledge test) and knowledge application (i.e., control performance) for FSYS. The correlation in the present study was comparable (r = .53, p < .001) and fell within the range commonly obtained for these two processes (Goode & Beckmann, 2010). The CPS processes of knowledge acquisition and knowledge application were analyzed separately to examine potential differential effects of the training.

Results

An alpha level of .05 was used for all statistical tests. All significance tests were one-tailed except where noted otherwise. In addition to the significance levels, Cohen's (1988) effect sizes are reported. The sample size was adequate for detecting at least medium-sized effects (d = 0.5) with a power of .80 for simple mean comparisons (Faul, Erdfelder, Lang, & Buchner, 2007). Participants' gender (50% female within each group), age, perceptual speed, working memory capacity, computer skills, and domain-specific knowledge did not differ between the two groups (ts ≤ 1.5, two-tailed ps > .10).
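The reported power claim can be checked from the summary information alone. The sketch below uses a normal approximation (G*Power itself uses the noncentral t distribution, so the value is approximate) and assumes n = 55 per group, which we infer from the degrees of freedom (df = 108) reported below:

```python
from statistics import NormalDist

# Normal-approximation check of the power analysis: with n = 55 per group
# (inferred from df = 108), a one-tailed two-sample test with alpha = .05
# should have power > .80 to detect d = 0.5. G*Power uses the noncentral t
# distribution; this normal approximation is a close but inexact sketch.
def approx_power(d, n_per_group, alpha=0.05):
    z_crit = NormalDist().inv_cdf(1 - alpha)      # one-tailed critical value
    noncentrality = d * (n_per_group / 2) ** 0.5  # expected z under H1
    return NormalDist().cdf(noncentrality - z_crit)

power = approx_power(0.5, 55)
```

The result is slightly above .80, consistent with the statement that the sample was adequate for detecting medium-sized effects.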
However, there was a significant difference in reasoning, t(108) = 2.65, two-tailed p = .01, d = 0.51, with superior performance in the training group. Therefore, group equivalence was not completely achieved, and all further analyses were additionally run with reasoning as a covariate. The descriptive statistics are shown in Table 3.

We had expected better CPS performance in the training group than in the control group. We first analyzed CPS competence with regard to the process of knowledge acquisition. The training group (M = 5.91, SD = 1.93) showed better performance than the control group (M = 4.89, SD = 1.79). The difference was statistically significant, t(108) = 2.87, one-tailed p < .01, with a moderate effect size (d = 0.55). The results did not change when reasoning was controlled for. In a second step, we analyzed CPS competence with regard to the process of knowledge application. The two groups showed almost identical performance in controlling FSYS. The mean score was 58.36 (SD = 20.77) for the training group and 57.14 (SD = 24.86) for the control group. Thus, there was no statistically significant difference, t(108) = 0.28, one-tailed p = .39, d = 0.05. Again, controlling for reasoning did not change the findings.

In summary, our findings only partly supported our hypothesis. After the flexibility training, the training group participants were significantly better at acquiring knowledge than the control group participants. However, no significant difference was found in knowledge application (i.e., in solving the problem).

³ FSYS also provides additional behavioral scales that were not used in this study. Most of the behavioral scales did not exhibit acceptable psychometric quality, and thus, we could not interpret them without reservations (see Wagener & Conrad, 2001).

Figure 6. Screenshot of the FSYS microworld.

Table 2. Training program (session: microworld, version/level of difficulty, estimated duration in minutes):
1: ColorSim, level 1, 35
2: ColorSim, level 2, 45
3: K4, level 1, 35
4: PowerPlant, version A, 30
5: Tailorshop, version A, 90
6: K4, level 2, 60
7: PowerPlant, version B, 20
8: Networked Fire Chief, 20
9: K4, level 3, 75
10: Tailorshop, version B, 60
Note. Each session was composed of an exploration phase that did not involve a performance evaluation and a performance phase. The letters A and B indicate different versions; the numbers indicate the level of difficulty. The estimated duration in minutes is based on information from the associated literature.

Table 3. Descriptive statistics for both groups. Values are given as M [95% CI] (SD) for the control group, then for the training group, followed by McDonald's ω:
- Age: 23.96 [22.73, 25.19] (4.55); 22.80 [21.90, 23.70] (3.34); ω: —
- Reasoning: -0.14 [-0.30, 0.01] (0.58); 0.13 [0.00, 0.27] (0.51); ω = .88
- Perceptual speed: -0.07 [-0.25, 0.10] (0.65); 0.10 [-0.07, 0.27] (0.63); ω = .79
- Working memory capacity: 0.04 [-0.50, 0.57] (1.97); 0.58 [0.07, 1.09] (1.88); ω = .89¹
- Computer knowledge: 18.36 [17.37, 19.36] (3.67); 18.71 [17.73, 19.68] (3.60); ω = .79
- Computer experience in years: 10.27 [9.50, 11.04] (2.84); 10.51 [9.54, 11.47] (3.57); ω: —
- Simple reaction time: 255.08 [249.92, 260.23] (19.07); 256.24 [251.10, 261.38] (19.01); ω = .92
- Domain-specific prior knowledge: 8.18 [7.52, 8.85] (2.46); 8.40 [7.56, 9.24] (3.09); ω = .59
- CPS: knowledge acquisition: 4.89 [4.41, 5.38] (1.79); 5.91 [5.39, 6.43] (1.93); ω = .53
- CPS: knowledge application: 57.14 [50.42, 63.86] (24.86); 58.36 [52.74, 63.97] (20.77); ω: —
Note. For reasoning (z scores), perceptual speed (z scores), working memory capacity (z scores), computer knowledge, and domain-specific prior knowledge, the sum scores of the single tasks were used. For simple reaction time, the mean score was used. ¹ Only for the subtask reading span; for the other two tasks (dot span and memory updating numerical), a single total score was used. ω: McDonald's omega.

Discussion

Solving nonroutine problems is highly relevant in our rapidly changing world, and consequently, CPS competence plays an important role, especially in the educational context (OECD, 2014). Although CPS competence is considered to be trainable (OECD, 2014), empirical support for this supposition has been rather weak. The purpose of the current study was to deepen the understanding of the extent to which CPS competence can be improved through a training intervention. The findings of this study showed that CPS competence was only partly influenced by the flexibility training. Whereas the training significantly improved the CPS process of acquiring relevant knowledge about the problem situation, the process of knowledge application was not affected. Aside from the differential effects of the training on the two CPS processes, the general implications for the trainability of CPS competence will be discussed.

According to Fischer, Greiff, and Funke (2012), CPS involves at least two consecutive processes.
First, the problem solver has to acquire problem-specific knowledge (e.g., how to install software in a new operating system), and second, the problem solver has to apply this knowledge in order to solve the problem (e.g., complete the software installation). Apparently, handling different problem situations improves the first of these steps: in our study, trained problem solvers were able to acquire more knowledge about an unknown problem situation. According to the flexibility training approach, this finding can be interpreted as an improvement in GPSK concerning the utility of knowledge acquisition. When frequently confronted with different problem situations in which no prior knowledge is applicable, problem solvers come to recognize that they must acquire knowledge about a problem in order to solve it. Moreover, the processes through which people effectively acquire knowledge (e.g., exploration strategies such as VOTAT; Vollmeyer et al., 1996) might also be improved through experience with different problem situations. This issue is especially important when considering problem solving in real life. Real-life problems differ widely, and thus situation-specific knowledge might not be available for every problem (e.g., an unfamiliar mobile phone or a new operating system). However, when problem solvers can rely on GPSK, their chances of acquiring the situation-specific knowledge that will help them solve the problem (e.g., through comprehensive exploration) increase. In this sense, the flexibility training can be considered successful with regard to the prerequisite of actually solving complex problems, that is, acquiring knowledge about an unknown problem situation.

On the other hand, GPSK concerning how to apply the acquired information in order to solve the problem does not seem to have been improved by the training. This finding is crucial when evaluating CPS training in general.
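The VOTAT strategy mentioned above ("vary one thing at a time") can be sketched for a static linear input-output system; the example system below is invented for illustration, and real microworlds additionally carry state and eigendynamics:

```python
# Illustrative sketch of the VOTAT exploration strategy: by changing a
# single input per round, each input's effect on each output can be
# isolated. The example system with its weights is hypothetical.

def explore_votat(system, n_inputs, n_outputs, delta=1.0):
    """Estimate each input's effect by varying one input per round."""
    baseline = system([0.0] * n_inputs)
    effects = []
    for i in range(n_inputs):
        inputs = [0.0] * n_inputs
        inputs[i] = delta                         # vary only input i
        out = system(inputs)
        effects.append([out[k] - baseline[k] for k in range(n_outputs)])
    return effects  # effects[i][k]: change in output k per unit of input i

# A hypothetical two-input, two-output system with weights unknown to the explorer.
def system(u):
    return [2.0 * u[0] + 0.5 * u[1], 1.0 * u[1]]

effects = explore_votat(system, 2, 2)
```

After two rounds the explorer has recovered the full effect structure, which is exactly the kind of situation-specific knowledge the acquisition process targets.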
Although acquiring relevant information is necessary for solving an unknown problem, the actual goal is to solve it. Returning to the introductory example, knowing how to obtain information that explains how to install a software program but still being unable to actually install it will not lead to a satisfactory outcome. The lack of impact of the CPS training might be due to several issues. In general, the flexibility training is aimed at improving CPS competence independent of a particular problem situation, in terms of GPSK. However, it might be the case that the process of knowledge application is rather situation-specific. This means that every problem situation primarily requires specific knowledge about how to act in that situation, and thus GPSK that is independent of the actual problem situation might play only a minor role in a person's ability to actually solve the problem. The importance of previous knowledge in a specific problem situation (see Süß, 1996) might support this perspective. Furthermore, our flexibility training was less focused on addressing the special demands of knowledge application (e.g., careful interventions in unstable systems). In fact, trainees were encouraged to extensively explore each problem situation during the training in order to understand how to solve the problem. Although the process of knowledge application (i.e., actually solving the problem) was covered in each training session, it is unknown whether trainees primarily used the performance phase to solve the problem (e.g., to be a successful Tailorshop manager) or whether they used the performance phase to also learn more about the problem situation.
The latter would mainly lead to an increase in GPSK related to the process of knowledge acquisition rather than the process of knowledge application. In this respect, it is important to note that the training was unguided; that is, we provided no feedback, no explicit teaching of problem solving strategies, and no similar guidance. Instead, the training was completely based on the learning-by-doing approach (Anzai & Simon, 1979). A more guided training (e.g., emphasizing training goals, problem solving phases and processes, etc.) might increase the training effect for the process of knowledge application as well. Therefore, training studies that are tailored to address the specific demands of applying knowledge, combined with a more explicitly guided training approach, might shed some light on the question of whether GPSK can be improved with respect to the problem solver's ability to actually solve the given problem.

Another explanation for the findings can be found in the assessment of the CPS processes. The assessment of knowledge application often has limitations in terms of reliability; that is, assessment using microworlds is effectively a single-item measure with limited reliability (Beckmann & Goode, 2014; Greiff et al., 2012; Süß, 1999). Although the estimated internal consistency of FSYS is rather high (.80; Wagener, 2001), its test-retest reliability and parallel-test reliability remain unknown. Thus, limitations in the reliability of FSYS might have prevented it from detecting the success of the training with respect to the process of knowledge application. Recently developed measurement tools focusing in particular on psychometric criteria (e.g., Neubert, Kretzschmar, Wüstenberg, & Greiff, 2014; Sonnleitner et al., 2012) might be used to resolve this issue, as they provide reliable scores for both knowledge acquisition and knowledge application.
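As a check on the reported results, the focal comparison and the Table 3 confidence intervals can be reproduced from the summary statistics alone. The sketch below assumes n = 55 per group (our inference from the reported df = 108) and a two-sided 95% t critical value of about 2.005 for df = 54:

```python
import math

# Reproduce key statistics from the reported summary data.
# n = 55 per group is inferred from df = 108 (an assumption, not stated as such).

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d with pooled standard deviation."""
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled

def t_from_d(d, n1, n2):
    """Independent-samples t statistic implied by d."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))

# Knowledge acquisition: training (M = 5.91, SD = 1.93) vs. control (M = 4.89, SD = 1.79).
d = cohens_d(5.91, 1.93, 55, 4.89, 1.79, 55)   # ≈ 0.55
t = t_from_d(d, 55, 55)                        # ≈ 2.87

# A Table 3 confidence interval, e.g., control-group age (M = 23.96, SD = 4.55);
# t critical value for df = 54 at the two-sided 95% level is about 2.005.
half = 2.005 * 4.55 / math.sqrt(55)
ci = (round(23.96 - half, 2), round(23.96 + half, 2))   # ≈ (22.73, 25.19)
```

Both values match the figures reported in the Results and in Table 3, which also confirms the inferred group sizes.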
Limitations and recommendations for further research

Some limitations of this study need to be discussed, especially with respect to future CPS training studies. First, the no-contact control group did not complete any pseudo-training. We had expected that pseudo-training would not affect the training outcome: the few previous training studies had not shown substantial transfer effects regardless of whether participants had received training of any kind or a control intervention. Furthermore, the intensive use of computers by many students has become commonplace (Prensky, 2001), which was confirmed by analyses of the background questions used in this study. Thus, we did not expect the training to produce any improvements due solely to the use of computers. Finally, both groups were informed up front that the study duration could be up to 12 hr, so motivation in the two groups could be assumed to be equal. In summary, an additional intervention for the control group seemed dispensable, but it might be beneficial to include an active control group in future research.

Furthermore, despite the use of random assignment, group equivalence was not achieved. The training group outperformed the control group in reasoning, which has repeatedly been shown to be a substantial predictor of CPS performance (e.g., Kretzschmar, Neubert, Wüstenberg, & Greiff, 2016; Sonnleitner et al., 2013; Süß, 1996; Wittmann & Süß, 1999; Wüstenberg et al., 2012). Although we statistically controlled for the difference in reasoning on the pretest, we have to consider that the training group may have benefited from their higher reasoning ability (Matthew effect; e.g., Walberg & Tsai, 1983). However, we can only speculate about the reasons for the differences between the groups. Random assignment to the training or control group does not ensure group equivalence, especially when sample sizes are small or moderate (e.g., Saint-Mont, 2015).
Thus, we have to consider the possibility that additional group differences (e.g., in motivation) may have influenced participants' performance in our study.

Another issue involves the sample characteristics. The participants were recruited from a population with above-average cognitive performance. Although university students are often used in psychological studies, such findings should be generalized only with caution (Henrich, Heine, & Norenzayan, 2010). In fact, the selective dropout in the present study further reduced the heterogeneity of the sample, with notable consequences: the effect of the present training may have been underestimated due to a ceiling effect for participants with above-average cognitive abilities. Therefore, it is possible that the effect of the training would be stronger in a less selective sample. Future CPS research should aim to avoid such selection biases in order to capture the full range of CPS competence in a variety of contexts and populations.

Finally, the assessment of CPS competence in general needs to be discussed. Typical CPS assessment tools, as used in this study, indicate whether a problem solver has acquired specific knowledge about a problem situation (i.e., the CPS process of knowledge acquisition) or whether a goal was reached (i.e., the CPS process of knowledge application). But strictly speaking, participants' behavior in a complex problem situation goes beyond these two core processes (see Dörner, 1986). In fact, in addition to the processes of knowledge acquisition and knowledge application, more specific processes are also considered important for solving complex problems.
Some of these processes consist of engaging in strategic exploration to gather information, reducing and integrating information into a representation of knowledge, anticipating future developments and making plans, setting priorities and balancing goals, and evaluating and modifying problem solving behavior (e.g., Fischer et al., 2012; Funke, 2001; Greiff & Fischer, 2013; Wagener, 2001). Although there are already theoretical considerations about how to further develop the assessment of CPS to gain deeper insights into CPS competence (e.g., Greiff & Fischer, 2013), their practical implementation is still outstanding. In fact, more research on the evaluation of CPS performance beyond the two processes of knowledge acquisition and knowledge application is still needed (e.g., based on logfile analyses).

Moreover, no direct measurement of GPSK was used in this study. That is, we only assumed that better knowledge acquisition performance was based on higher GPSK, but strictly speaking, we do not know whether this is truly the case. As a consequence, possible effects of the flexibility training beyond the two CPS processes of knowledge acquisition and knowledge application (e.g., whether trainees had increased GPSK, resulting in a larger number of tested hypotheses than for participants without training) could not be evaluated. Future CPS training studies will therefore benefit considerably from the further development of CPS assessment tools that can capture additional and more specific CPS processes (e.g., Müller, Kretzschmar, & Greiff, 2013; Wüstenberg, Stadler, Hautamäki, & Greiff, 2014). Furthermore, in order to examine whether an increase in CPS competence is based on GPSK, future studies should develop and include a corresponding measure.
Conclusion

As noted earlier, the trainability of CPS competence is an exciting issue and has become even more important since CPS competence was added to international educational large-scale assessments. However, it is not sufficient to theoretically assume its trainability and to give advice about how to improve CPS competence (e.g., OECD, 2014) while the empirical evidence is still missing. This study aimed to shed some light on the extent to which CPS is trainable. The improvement of CPS competence through experience with different problem situations might be possible. However, differential effects concerning the different CPS processes have to be considered. If CPS competence training were found to be effective only for the CPS process of knowledge acquisition (i.e., obtaining relevant information about an unknown problem) but not for the process of knowledge application (i.e., actually solving the problem), then it would be highly questionable whether CPS competence training would improve problem solving in real life (i.e., where the aim is to buy a ticket, not just to know about the functions of a ticket machine). Most important, further studies are needed to deepen our knowledge, especially in terms of the long-term effects of the training, the extent to which the training can be transferred to real-world problems, and the determinants of training success. In this respect, some crucial points for future research were highlighted in this study.

Acknowledgements: We thank Dietrich Wagener, Anette Kluge, and Mary M. Omodei for providing the software programs and Eigbert Riewald for his remarkable technical support. Furthermore, we thank Maik Böttcher, Samuel Greiff, Marcus Mund, and Cornelia Vogt for their helpful comments on earlier versions of this article.
Declaration of conflicting interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Author contributions: The authors contributed equally to this work.

Supplementary material: The data are publicly available via the Open Science Framework and can be accessed at https://osf.io/n2jvy.

Handling editor: Andreas Fischer

Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Citation: Kretzschmar, A., & Süß, H.-M. (2015). A study on the training of complex problem solving competence. Journal of Dynamic Decision Making, 1, 4. doi:10.11588/jddm.2015.1.15455

Received: 07 August 2015. Accepted: 03 December 2015. Published: 12 December 2015.

References

Anguera, J. A., Boccanfuso, J., Rintoul, J. L., Al-Hashimi, O., Faraji, F., Janowich, J., & Gazzaley, A. (2013). Video game training enhances cognitive control in older adults. Nature, 501(7465), 97–101. doi: 10.1038/nature12486

Anzai, Y., & Simon, H. A. (1979). The theory of learning by doing. Psychological Review, 86, 124–140. doi: 10.1037/0033-295X.86.2.124

Bakken, B. E. (1993). Learning and transfer of understanding in dynamic decision environments. Massachusetts Institute of Technology.

Beckmann, J. F., & Goode, N. (2014). The benefit of being naïve and knowing it: The unfavourable impact of perceived context familiarity on learning in complex problem solving tasks. Instructional Science, 42(2), 271–290. doi: 10.1007/s11251-013-9280-7

Boot, W. R., Blakely, D. P., & Simons, D. J. (2011). Do action video games improve perception and cognition? Frontiers in Psychology, 2, 226. doi: 10.3389/fpsyg.2011.00226

Bühner, M., Kröner, S., & Ziegler, M. (2008). Working memory, visual-spatial intelligence and their relationship to problem-solving. Intelligence, 36, 672–680. doi: 10.1016/j.intell.2008.03.008

Cohen, J. (1988).
Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.

Dörner, D. (1976). Problemlösen als Informationsverarbeitung [Problem solving as information processing]. Stuttgart: Kohlhammer.

Dörner, D. (1986). Diagnostik der operativen Intelligenz [Assessment of operative intelligence]. Diagnostica, 32, 290–308.

Dörner, D. (1989). Die Logik des Mißlingens: Strategisches Denken in komplexen Situationen [The logic of failure: Recognizing and avoiding error in complex situations]. Reinbek bei Hamburg: Rowohlt.

Dörner, D., Kreuzig, H. W., Reither, F., & Stäudel, T. (1983). Lohhausen: Vom Umgang mit Unbestimmtheit und Komplexität [Lohhausen: Dealing with uncertainty and complexity]. Bern: Huber.

Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. doi: 10.3758/bf03193146

Fischer, A., Greiff, S., & Funke, J. (2012). The process of solving complex problems. Journal of Problem Solving, 4(1), 19–41. doi: 10.7771/1932-6246.1118

Fischer, A., Greiff, S., Wüstenberg, S., Fleischer, J., Buchwald, F., & Funke, J. (2015). Assessing analytic and interactive aspects of problem solving competency. Learning and Individual Differences, 39, 172–179. doi: 10.1016/j.lindif.2015.02.008

Frensch, P. A., & Funke, J. (1995). Definitions, traditions, and a general framework for understanding complex problem solving. In P. A.
frensch & j. funke (eds.), complex problem solving: the european perspective (pp. 3–25). hillsdale, nj: erlbaum. friedrich, h. f., & mandl, h. (1992). lern- und denkstrategien: ein problemaufriß [learning and thinking strategies: an outline of the problem]. in h. f. friedrich & h. mandl (eds.), lern- und denkstrategien (pp. 3–54). göttingen: hogrefe. funke, j. (ed.). (1999). themenheft “komplexes problemlösen” [special issue: complex problem solving]. psychologische rundschau, 50(4). funke, j. (2001). dynamic systems as tools for analysing human judgement. thinking & reasoning, 7(1), 69–89. doi: 10.1080/13546780042000046 funke, j. (2003). problemlösendes denken [problem-solving thinking]. stuttgart: kohlhammer. funke, j. (2006). komplexes problemlösen [complex problem solving]. in j. funke (ed.), denken und problemlösen (pp. 375–446). göttingen: hogrefe. goettl, b. p., yadrick, r. m., connolly-gomez, c., regian, j. w., & shebilske, w. l. (1996). alternating task modules in isochronal distributed training of complex tasks. human factors: the journal of the human factors and ergonomics society, 38, 330–346. doi: 10.1177/001872089606380213 gonzalez, c., thomas, r. p., & vanyukov, p. (2005). the relationships between cognitive ability and dynamic decision making. intelligence, 33(2), 169–186. doi: 10.1016/j.intell.2004.10.002 goode, n., & beckmann, j. f. (2010). you need to know: there is a causal relationship between structural knowledge and control performance in complex problem solving tasks. intelligence, 38(3), 345–352. doi: 10.1016/j.intell.2010.01.001 green, c. s., & bavelier, d. (2003). action video game modifies visual selective attention. nature, 423(6939), 534–537. doi: 10.1038/nature01647 greiff, s., & fischer, a. (2013). measuring complex problem solving: an educational application of psychological theories. journal for educational research online, 5(1), 38–58. retrieved from: http://nbn-resolving.de/urn:nbn:de:0111-opus-80196 greiff, s., kretzschmar, a., müller, j.
c., spinath, b., & martin, r. (2014). the computer-based assessment of complex problem solving and how it is influenced by students’ information and communication technology literacy. journal of educational psychology, 106(3), 666–680. doi: 10.1037/a0035426 greiff, s., wüstenberg, s., & funke, j. (2012). dynamic problem solving: a new assessment perspective. applied psychological measurement, 36, 189–213. doi: 10.1177/0146621612439620 greiff, s., wüstenberg, s., molnár, g., fischer, a., funke, j., & csapó, b. (2013). complex problem solving in educational contexts—something beyond g: concept, assessment, measurement invariance, and construct validity. journal of educational psychology, 105, 364–379. doi: 10.1037/a0031856 henrich, j., heine, s. j., & norenzayan, a. (2010). the weirdest people in the world? behavioral and brain sciences, 33(2-3), 61–83. doi: 10.1017/s0140525x0999152x jäger, a. o., süß, h.-m., & beauducel, a. (1997). berliner intelligenzstruktur-test. form 4 [berlin intelligence-structure test. version 4]. göttingen: hogrefe. jensen, e. (2005). learning and transfer from a simple dynamic system. scandinavian journal of psychology, 46, 119–131. doi: 10.1111/j.1467-9450.2005.00442.x kluge, a. (2008). what you train is what you get? task requirements and training methods in complex problem solving. computers in human behavior, 24, 284–308. doi: 10.1016/j.chb.2007.01.013 kretzschmar, a., neubert, j. c., & greiff, s. (2014). komplexes problemlösen, schulfachliche kompetenzen und ihre relation zu schulnoten [complex problem solving, school competencies and their relation to school grades]. zeitschrift für pädagogische psychologie, 28(4), 205–215. doi: 10.1024/1010-0652/a000137 kretzschmar, a., neubert, j. c., wüstenberg, s., & greiff, s. (2016). construct validity of complex problem solving: a comprehensive view on different facets of intelligence and school grades. intelligence, 54, 55–69. doi: 10.1016/j.intell.2015.11.004 kulik, j. a., kulik, c.-l.
c., & bangert, r. l. (1984). effects of practice on aptitude and achievement test scores. american educational research journal, 21(2), 435–447. doi: 10.3102/00028312021002435 meinz, e. j., & hambrick, d. z. (2010). deliberate practice is necessary but not sufficient to explain individual differences in piano sight-reading skill: the role of working memory capacity. psychological science, 21(7), 914–919. doi: 10.1177/0956797610373933 müller, j. c., kretzschmar, a., & greiff, s. (2013). exploring exploration: inquiries into exploration behavior in complex problem solving assessment. in s. k. d’mello, r. a. calvo, & a. olney (eds.), proceedings of the 6th international conference on educational data mining (pp. 336–337). neubert, j. c., kretzschmar, a., wüstenberg, s., & greiff, s. (2014). extending the assessment of complex problem solving to finite state automata: embracing heterogeneity. european journal of psychological assessment, 31(3), 181–194. doi: 10.1027/1015-5759/a000224 neubert, j. c., mainert, j., kretzschmar, a., & greiff, s. (2015). the assessment of 21st century skills in industrial and organizational psychology: complex and collaborative problem solving. industrial and organizational psychology, 8(2), 238–268. doi: 10.1017/iop.2015.14 oberauer, k., süß, h.-m., wilhelm, o., & wittmann, w. w. (2002). the multiple facets of working memory: storage, processing, supervision, and coordination. intelligence, 140, 1–27.
doi: 10.1016/s0160-2896(02)00115-0 oecd. (2014). pisa 2012 results: creative problem solving: students’ skills in tackling real-life problems (volume v). paris: oecd publishing. omodei, m. m., & wearing, a. j. (1998). network fire chief. la trobe university. osman, m. (2010). controlling uncertainty: a review of human behavior in complex dynamic environments. psychological bulletin, 136, 65–86. doi: 10.1037/a0017815 prensky, m. (2001). digital natives, digital immigrants. part i. on the horizon, 9(5), 1–6. doi: 10.1108/10748120110424816 putz-osterloh, w.
(1987). gibt es experten für komplexe probleme? [are there experts for complex problems?]. zeitschrift für psychologie, 195, 63–84. putz-osterloh, w. (1988). wissen und problemlösen [knowledge and problem solving]. in h. mandl, h. spada, & h. aebli (eds.), wissenspsychologie (pp. 247–263). münchen: psychologie verlags union. putz-osterloh, w., & lemme, m. (1987). knowledge and its intelligent application to problem solving. german journal of psychology, 11, 286–303. saint-mont, u. (2015). randomization does not help much, comparability does. plos one, 10(7), e0132102. doi: 10.1371/journal.pone.0132102 sander, n. (2005). inhibitory and executive functions in cognitive psychology: an individual differences approach examining structure and overlap with working memory capacity and intelligence. aachen: shaker. schaub, h., & strohschneider, s. (1992). die auswirkungen unterschiedlicher problemlöseerfahrungen auf den umgang mit einem unbekannten komplexen problem [the impact of expertise on the solving of an unknown, complex problem]. zeitschrift für arbeits- und organisationspsychologie, 36, 117–126. scherer, r., & tiemann, r. (2012). factors of problem-solving competency in a virtual chemistry environment: the role of metacognitive knowledge about strategies. computers & education, 59, 1199–1214. doi: 10.1016/j.compedu.2012.05.020 schweizer, f., wüstenberg, s., & greiff, s. (2013). validity of the microdyn approach: complex problem solving predicts school grades beyond working memory capacity. learning and individual differences, 24, 42–52. doi: 10.1016/j.lindif.2012.12.011 sonnleitner, p., brunner, m., greiff, s., funke, j., keller, u., martin, r., & latour, t. (2012). the genetics lab: acceptance and psychometric characteristics of a computer-based microworld assessing complex problem solving. psychological test and assessment modeling, 54(1), 54–72. doi: 10.1037/e578442014-045 sonnleitner, p., keller, u., martin, r., & brunner, m. (2013).
students’ complex problem-solving abilities: their structure and relations to reasoning ability and educational success. intelligence, 41(5), 289–305. doi: 10.1016/j.intell.2013.05.002 stadler, m. j., becker, n., greiff, s., & spinath, f. m. (2015). the complex route to success: complex problem-solving skills in the prediction of university success. higher education research & development, 1–15. doi: 10.1080/07294360.2015.1087387 stern, e. (1993). kognitives training: was verändert sich? fragestellungen, methoden und neuere ergebnisse [cognitive training: what changes? questions, methods and recent findings]. in l. montada (ed.), bericht über den 38. kongreß der deutschen gesellschaft für psychologie in trier 1992 (vol. 2, pp. 975–977). göttingen: hogrefe. strobach, t., frensch, p. a., & schubert, t. (2012). video game practice optimizes executive control skills in dual-task and task switching situations. acta psychologica, 140(1), 13–24. doi: 10.1016/j.actpsy.2012.02.001 strohschneider, s. (1990). wissenserwerb und handlungsregulation [knowledge acquisition and action regulation]. wiesbaden: deutscher universitäts-verlag. süß, h.-m. (1996). intelligenz, wissen und problemlösen: kognitive voraussetzungen für erfolgreiches handeln bei computersimulierten problemen [intelligence, knowledge and problem solving: cognitive prerequisites for successful behavior in computer-simulated problems]. göttingen: hogrefe. süß, h.-m. (1999). intelligenz und komplexes problemlösen: perspektiven für eine kooperation zwischen differentiell-psychometrischer und kognitionspsychologischer forschung [intelligence and complex problem solving: perspectives for a cooperation between differential-psychometric and cognitive-psychological research]. psychologische rundschau, 50(4), 220–228. doi: 10.1026//0033-3042.50.4.220 süß, h.-m., & beauducel, a. (2015). modeling the construct validity of the berlin intelligence structure model. estudos de psicologia (campinas), 32(1), 13–25.
doi: 10.1590/0103-166x2015000100002 vollmeyer, r., burns, b. d., & holyoak, k. j. (1996). the impact of goal specificity on strategy use and the acquisition of problem structure. cognitive science, 20(1), 75–100. doi: 10.1207/s15516709cog2001_3 vollmeyer, r., & funke, j. (1999). personen- und aufgabenmerkmale beim komplexen problemlösen [person and task effects within complex problem solving]. psychologische rundschau, 50, 213–219. wagener, d. (2001). psychologische diagnostik mit komplexen szenarios: taxonomie, entwicklung, evaluation [psychological assessment with complex scenarios: taxonomy, development, evaluation]. lengerich: pabst. wagener, d. (2007). start-c. testbatterie für berufseinsteiger: computer [test battery for young professionals: computers]. göttingen: hogrefe. wagener, d., & conrad, w. (2001). fsys 2.0. testmanual [fsys 2.0. test manual]. university of mannheim. wagener, d., & wittmann, w. w. (2002). personalarbeit mit dem komplexen szenario fsys [human resource management using the complex scenario fsys]. zeitschrift für personalpsychologie, 1, 80–93. walberg, h. j., & tsai, s.-l. (1983). matthew effects in education. american educational research journal, 20(3), 359–373. doi: 10.3102/00028312020003359 wallach, d. (1998). komplexe regelungsprozesse: eine kognitionswissenschaftliche analyse [complex control processes: a cognitive analysis]. wiesbaden: deutscher universitäts-verlag. weinert, f. e. (2001). concept of competence: a conceptual clarification. in d. s. rychen & l. h. salganik (eds.), defining and selecting key competencies (pp. 45–65). seattle, wa: hogrefe. weinert, f. e., & waldmann, m. r. (1988). wissensentwicklung und wissenserwerb [development and acquisition of knowledge]. in h. mandl & h. spada (eds.), wissenspsychologie (pp. 161–202). münchen: psychologie verlags union. wittmann, w. w., & hattrup, k. (2004). the relationship between performance in dynamic systems and intelligence. systems research and behavioral science, 21(4), 393–409.
doi: 10.1002/sres.653 wittmann, w. w., & süß, h.-m. (1999). investigating the paths between working memory, intelligence, knowledge, and complex problem-solving performances via brunswik symmetry. in p. l. ackerman, p. c. kyllonen, & r. d. roberts (eds.), learning and individual differences: process, trait and content determinants (pp. 77–104). washington, d.c.: apa. wüstenberg, s., greiff, s., & funke, j. (2012). complex problem solving — more than reasoning? intelligence, 40(1), 1–14. doi: 10.1016/j.intell.2011.11.003 wüstenberg, s., stadler, m., hautamäki, j., & greiff, s. (2014). the role of strategy knowledge for the application of strategies in complex problem solving tasks. technology, knowledge and learning, 19(1), 127–146.
doi: 10.1007/s10758-014-9222-8

original research

valence matters in judgments of stock accumulation in blood glucose control and other global problems

cleotilde gonzalez1, maria-isabel sanchez-segura2, german-lenin dugarte-peña2, fuensanta medina-dominguez2
1carnegie mellon university, usa and 2universidad carlos iii de madrid, spain

stock-flow failure is a reasoning error in dynamic systems that has great societal relevance: people misjudge a level of accumulation (i.e., stock) given the information on flows that increase it (i.e., inflow) or decrease it (i.e., outflow) over time. many interventions to counteract this failure and help people integrate the flow information, including the use of analogies and graphical manipulations, have been tested with little or no success. we suggest that this error relates to the valence of a problem: the framing of the inflow or outflow direction as “good” or “bad” is associated with the direction of its accumulation over time. to explore the effects of valence on accumulation judgments, we employ a scenario of a common health problem: blood glucose control through sugar consumption and insulin flows. we also investigate improvements of performance in a second scenario that result after viewing a video of a dynamic system demonstrating the effects of the correct accumulation trend. we discuss the implications of our findings for the blood glucose example and other global problems.

keywords: stock-flow failure, correlation heuristic, system behaviour, valence effect

dynamic systems are present in many daily life situations. generally, people rarely even notice that such systems have the property of changing continuously over time (that is, their dynamics).
possible examples of dynamic systems are: population growth, learning processes in supply chains, inventory management, and savings and debt accumulation. while dynamic systems are very common in real life, research into complex problem solving and dynamic decision making has shown that people have great difficulty learning and making decisions in dynamic tasks, even after lengthy practice over long or unlimited time periods and with performance incentives (diehl & sterman, 1995; frensch & funke, 1995). to understand these difficulties, researchers have relied on simple abstractions of complex systems in order to study the basic elements of every dynamic system (booth sweeney & sterman, 2000; cronin & gonzalez, 2007): stocks and flows (inflow, outflow). inflow is the kind of flow that adds units to the stock, and outflow is the kind of flow that subtracts units from the stock. a stock increases only if the inflow rate is higher than the outflow rate, and decreases only if the inflow rate is lower than the outflow rate. this relationship between flows and stocks over time defines a behavioural pattern that is often represented in an x-y plot. in fact, graphs are commonly used to illustrate the behavioural patterns of dynamic systems, with time plotted on the x-axis. for example, world population growth is often represented on an x-y plot, with the year on the x-axis and the accumulation of billions of people or the annual growth rate (i.e., inflow minus outflow) on the y-axis. evidence from laboratory experiments using graphical methods and simple representations of dynamic tasks over the last decade suggests that people generally misunderstand the basic behavioural patterns of dynamic systems: the stock-flow failure (sf failure).
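the stock-flow relationship described above reduces to a simple update rule: the stock changes each step by the net flow, inflow minus outflow. a minimal python sketch of this rule (our illustration, not part of the cited studies; all values are invented):

```python
def step(stock, inflow, outflow):
    """one time step of a stock-flow system: the stock changes by the net flow."""
    return stock + inflow - outflow

stock = 100
stock = step(stock, inflow=8, outflow=5)   # inflow > outflow: stock rises to 103
stock = step(stock, inflow=4, outflow=5)   # inflow < outflow: stock falls to 102
print(stock)  # prints 102
```

whatever the cover story (population, savings, co2, blood glucose), every task discussed below is an instance of this one accounting rule.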
the sf failure is evidence that people misinterpret the accumulation of a quantity (“stock”) and often reason that the accumulation should follow the direction of the rates of change that feed it (i.e., that the stock and its flows, inflow and outflow, should be positively correlated) (cronin, gonzalez, & sterman, 2009). this is equivalent, for example, to deciding that the world population should decrease if the annual growth rate of the world population decreases. this is, of course, incorrect: the world population will continue to increase as long as the net rate of growth is greater than zero (i.e., more people are born than die). booth sweeney and sterman (2000) were the first to show that, when asked to plot the trajectory of an accumulating stock, people often draw a curve that matches the pattern of the flows. this phenomenon, later termed the correlation heuristic, has proven very resistant to a wide range of interventions (actions designed to bring about changes in people) altering motivation, context, and mode of information representation (cronin, gonzalez, & sterman, 2009). this fundamental misunderstanding of how system inflows and outflows accumulate over time contributes to a wide range of real-world problems at the personal, organizational and global levels, such as maintaining a healthy body weight (abdel-hamid et al., 2014), reducing atmospheric co2 (dutt & gonzalez, 2012a; dutt & gonzalez, 2012b; newell, kary, moore, & gonzalez, 2016; sterman, 2008; sterman & booth sweeney, 2002), and balancing budgets (booth sweeney & sterman, 2000; newell et al., 2016). one often-cited example of the correlation heuristic is the misunderstanding of how the concentration corresponding author: cleotilde gonzalez, 5000 forbes ave, porter hall 208, pittsburgh, pa, 15213, usa.
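the population example can be made concrete with a short simulation (our illustration, with invented numbers): the inflow declines every period, yet the stock keeps rising because the net flow stays positive, which is exactly the case the correlation heuristic gets wrong.

```python
# counterexample to the correlation heuristic: the inflow (births per period)
# declines steadily, yet the stock (population) keeps rising, because the net
# flow remains positive throughout. all numbers are invented.
population = 1000
births = [80, 60, 40, 20, 10]  # steadily decreasing inflow
deaths = 5                     # constant, lower outflow
trajectory = [population]
for b in births:
    population += b - deaths   # net flow is still positive each period
    trajectory.append(population)

print(trajectory)  # [1000, 1075, 1130, 1165, 1180, 1185]: strictly increasing
```

a reasoner applying the correlation heuristic would sketch a falling population curve to match the falling birth rate; the accounting shows the opposite.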
e-mail: coty@cmu.edu 10.11588/jddm.2018.1.49607 jddm | 2018 | volume 4 | article 3 | 1 of co2 in the atmosphere (the stock) relates to the co2 emitted into the atmosphere (the inflow) and the co2 absorbed via natural processes (the outflow). if the co2 concentration rises gradually and then stabilizes, people reason that the co2 inflow pattern (emissions) will increase similarly, even when absorptions are lower than emissions and stable over time. in reality, co2 emissions need to decrease over time and converge with the level of absorptions in order to achieve stabilization. this simple but very impactful misunderstanding often leads to erroneous decision making and policies when addressing climate change (dutt & gonzalez, 2012a; dutt & gonzalez, 2012b; sterman, 2008). while the potential policy implications of sf failure in applied societal problems like climate change may be clear, an understanding of the causes of sf failure and of possible ways to combat it is often missing. in this research, we experimentally investigate a valence effect, first suggested by newell and colleagues (2016), and we also explore the potential of a video demonstration as an intervention to improve performance in judgments of stocks and flows. as discussed in the literature review below, the valence hypothesis suggests that the way a problem is framed (i.e., the “goodness” or “badness” of a situation) determines the applicability of the correlation heuristic, and thus sf problem behaviour. to test this hypothesis, we use a relevant societal health stock-flow problem as an example: blood glucose control. in this problem, people often fail to understand how sugar accumulates in the bloodstream over time as a function of sugar intake (via food) and sugar absorption (via insulin or exercise).
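the stabilization logic in the co2 example can likewise be sketched numerically (our illustration, invented values): the stock stops changing only once the inflow has fallen to the level of the outflow, not before.

```python
# stabilizing a stock requires the inflow to fall until it equals the outflow.
# here "emissions" (inflow) are ramped down to the constant "absorptions"
# (outflow); the stock keeps rising while emissions exceed absorptions and
# flattens the moment the two are equal. all numbers are invented.
absorptions = 10
emissions = [30, 25, 20, 15, 10, 10]  # inflow converging to the outflow level
stock = 500
history = []
for e in emissions:
    stock += e - absorptions
    history.append(stock)

print(history)  # [520, 535, 545, 550, 550, 550]: flat once inflow == outflow
```

the same arithmetic applies to blood glucose: the stock stabilizes only when sugar intake (inflow) and sugar absorption (outflow) are brought into balance.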
below we also review the literature suggesting that video interventions have the potential to improve performance in sf tasks.

the valence effect

newell et al. (2016) first hinted at a possible “valence” explanation for sf failure that emerges from the framing of a problem, the accumulation trend, and the direction of the flows modifying a stock. in an effort to improve performance in sterman and booth sweeney’s (2007) co2 task, newell and colleagues (2016) used financial debt management and savings management isomorphs of the co2 task. applying the same methods as sterman and booth sweeney (2007), newell et al. (2016) showed participants a graph of a stock (savings, debt, or co2) that increased over time and stabilized over a future time period, as well as a graph of the corresponding flows (expenses and earnings for the debt and savings contexts; emissions and absorptions for the co2 context). however, one of the flow curves in the flows graph was incomplete (whereas the other one was stationary at a lower level), and participants were asked to provide the value that the incomplete flow curve should equal at the end of the time period for the stock to stabilize as shown. in all contexts, the correct answer entailed decreasing the value of the incomplete flow to match the value of the other (stationary and lower) flow. newell et al. (2016) suspected that performance would be better for the debt/savings isomorphs than for the co2 task because people are more familiar with the debt and savings contexts. however, their results suggest that familiarity with the context alone does not improve reasoning. familiarity with the debt and savings isomorphs was not enough for participants to do well in these tasks, supporting the conclusions reached by brunstein, gonzalez, & kanter (2010) and others regarding context familiarity. using sf problems in the medical domain and comparing the performance of medical and general students, brunstein et al.
(2010) concluded that domain experience “is not a strong indicator for overcoming the sf failure” (p. 352). the sf failure persisted regardless of participant “expertise”, bearing in mind that expertise itself is a complex cognitive phenomenon that depends on the domain in which it is considered and on the perspective used for its interpretation (sternberg, 1995). instead, newell et al. (2016) found that participants were more accurate when the problem was framed as increasing “debt” (and participants needed to decrease the “spending” to match the “earnings”) than when the problem was framed as increasing “savings” (and participants needed to decrease “earnings” to match “spending”), even though the two problems were equivalent. newell et al. (2016) suggested problem valence as an explanation, reasoning that, to do well in the sf task, the valence of the user-controlled flow should match the direction of the correct solution of the problem. their conjecture was that people were more accurate with the debt framing, because participants reasoned that debt is “bad” and spending should decrease, whereas they were less accurate with the savings framing because savings are “good” and earnings should increase. in the former case, the valence of the user-controlled flow matched the direction of the correct solution (i.e., spending is “bad”), leading participants to correctly decrease the spending flow, overriding the correlation heuristic. in the latter case, the valence of the user-controlled flow opposed the direction of the correct solution (i.e., savings are “good”), leading participants to incorrectly increase the earnings flow in accordance with the correlation heuristic. as newell et al. (2016) had not originally aimed at testing the valence hypothesis, they used only increasing accumulation scenarios (framed as debt or savings) where the correct response required participants to decrease the user-controlled flow (spending or earnings).
therefore, we do not know the full extent to which the valence effect can counteract the correlation heuristic. for example, the performance improvement in the “debt” scenario in newell et al. (2016) may have been fortuitous rather than the result of an actual understanding of stock-flow relationships, because the valence of the user-controlled “debt” flow matched the direction of the correct solution. in this paper, we test the valence hypothesis by controlling for (1) the behaviour of the stock (increasing or decreasing), and (2) whether the user-controlled inflow or outflow matches or opposes the direction of the correct solution. we use a blood glucose management example because blood glucose increases and decreases can actually be both “good” or “bad”; high blood glucose (hyperglycemia) and low blood glucose (hypoglycemia) are major health concerns; established methods for controlling and stabilizing blood glucose entail both increasing and decreasing sugar consumption (inflow) and insulin (outflow); and many people fail to understand how sugar accumulates in the bloodstream over time as a function of sugar intake (via food) and sugar absorption (via insulin or exercise). thus, blood glucose management is an ideal context to test the valence hypothesis. based on the correlation heuristic, we hypothesize that, regardless of the direction of the stock (increasing or decreasing accumulation), when the valence of the user-controlled flow agrees with the direction of the correct response, performance will be better than when the user-controlled flow opposes the direction of the correct response.
furthermore, when the user-controlled flow opposes the direction of the correct response, we expect the valence of the user-controlled flow (i.e., soda or insulin intake) to influence the effect of the correlation heuristic. for example, when the valence of the user-controlled flow is “good” (i.e., decrease soda consumption), performance would be better than when the valence of the user-controlled flow is “bad” (i.e., increase soda consumption).

potential of video demonstrations to improve sf failure

as mentioned above, people’s ability to reason about stocks and flows has traditionally been tested by showing a graph of the system inflow and outflow and asking them to plot the resulting stock curve (booth sweeney & sterman, 2000; sterman & booth sweeney, 2002). however, sf failure cannot entirely be explained by a failure to interpret graphs (see, for example, cronin, gonzalez, & sterman, 2009; fischer, degen, & funke, 2015). also, sf failure persists even when the problem is posed as the physical, naturalistic task of pouring water through a funnel into a beaker to meet a target goal (strohhecker & größler, 2015). key studies have attempted to improve people’s understanding of accumulation, focusing on the use of analogies (gonzalez & wong, 2012; newell et al., 2016), priming (fischer & gonzalez, 2016), and alternative sf problem presentation styles (fischer, degen, & funke, 2015; fischer & gonzalez, 2016), with limited success. importantly, research suggests that interventions involving simulations and learning tools may be more effective in reducing sf failure (dutt & gonzalez, 2012a; moxnes & saysel, 2009; sterman et al., 2012). a possible explanation for the improvement is that interactive simulations are experiential rather than descriptive. for example, dutt and gonzalez (2012b) exposed participants to descriptive and experiential co2 stabilization task conditions.
the descriptive version emulated the task designed by sterman and booth sweeney (2007), whereas in the simulated version participants used a dynamic climate change simulator to perform the same co2 stabilization task. they found that people’s misconceptions decreased significantly when they practiced the task using the simulation. this finding highlights the potential of experience-based simulation tools for improving the understanding of the dynamics of climate change. in this study, we employ a pretest-posttest design to test the valence hypothesis and explore the effects of a video demonstration in phase 2. our hypothesis is that observing a video that demonstrates the correct solution to the problem participants have just experienced can improve their judgment of stock and flow patterns in subsequent novel problems. specifically, participants are expected to perform better in a novel sf problem after observing a video that demonstrates that it is possible to control the stock even when the valence of the user-controlled flow opposes the direction of the correct response.

method

participants

participants were recruited through amazon mechanical turk (mturk) to complete a “decision making & health” task. they were paid us $1.50 for completing the survey and received a bonus of us $0.50 for each question answered correctly, up to a maximum payment of us $2.50. of the 403 recruited participants, two failed to complete the study and two failed to watch the video in full, leaving a total of 399 for analysis. of these participants (mean age = 34.25, sd = 10.14), 58.40% identified as male, 41.35% identified as female, and 0.25% identified as intersex. furthermore, 94.49% reported never having suffered from diabetes (not including prediabetes) and 15.04% reported having helped care for someone with diabetes.
participants were randomly assigned to one of four groups (described below) as follows: 98 participants to group a, 101 to group b, 101 to group c, and 99 to group d. we did not record the time spent on each part of the experiment, but we collected each participant’s total time as recorded by amazon mturk. the average time was 14 minutes and the median was 12 minutes.

experimental design

this study was composed of two phases, phase 1 and phase 2, divided by the presentation of a video. we used a 2 (stock behaviour: increasing or decreasing) x 2 (flow decision: matching or opposing the correct solution) full factorial design in each phase, that is, before and after showing the video. the first factor was the direction of the accumulation (i.e., stock behaviour), which increased or decreased over time. the increasing stock matched the hypoglycemia (low starting blood glucose level) scenario, and the decreasing stock matched the hyperglycemia (high starting blood glucose level) scenario. the second factor was the valence of the user-controlled flow and whether it matched or opposed the correct response needed to control the blood glucose at a target level at the end of the 100-minute period. the user-controlled flow either matched or opposed the direction of the correct response, whereas the other flow was fixed at a constant level. this 2 x 2 factorial design yielded four scenarios.
scenario 1 involved an increasing stock (i.e., blood glucose) in which the user-controlled flow matches the increasing direction of the correct response and is “good” (increasing insulin to control the blood glucose); scenario 2 involved an increasing stock in which the user-controlled flow opposes the increasing direction of the correct response and is “good” (decreasing the consumption of soda to control the blood glucose); scenario 3 involved a decreasing stock in which the user-controlled flow matches the decreasing direction of the correct response and is “bad” (decreasing insulin to control the blood glucose); and scenario 4 involved a decreasing stock in which the user-controlled flow opposes the decreasing direction of the correct response and is “bad” (increasing the consumption of soda to control the blood glucose).

10.11588/jddm.2018.1.49607 jddm | 2018 | volume 4 | article 3 | 3
gonzalez et al.: valence matters in stock accumulation judgments

table 1. experimental groups exposed to one of the four experimental conditions in phase 1, followed by a video showing a demonstration of a correct solution to the phase 1 scenario, and then by exposure to a different scenario of experimental conditions in phase 2.
group a: phase 1 = scenario 1 (increasing stock; user-controlled outflow matches the correct solution, outflow should increase); video = scenario 1; phase 2 = scenario 2 (increasing stock; user-controlled inflow opposes the correct solution, inflow should decrease).
group b: phase 1 = scenario 2 (increasing stock; user-controlled inflow opposes the correct solution, inflow should decrease); video = scenario 2; phase 2 = scenario 1 (increasing stock; user-controlled outflow matches the correct solution, outflow should increase).
group c: phase 1 = scenario 3 (decreasing stock; user-controlled outflow matches the correct solution, outflow should decrease); video = scenario 3; phase 2 = scenario 4 (decreasing stock; user-controlled inflow opposes the correct solution, inflow should increase).
group d: phase 1 = scenario 4 (decreasing stock; user-controlled inflow opposes the correct solution, inflow should increase); video = scenario 4; phase 2 = scenario 3 (decreasing stock; user-controlled outflow matches the correct solution, outflow should decrease).

appendix 1 shows the four scenarios that were used in the study. participants were shown two graphs: a graph of the stock trend (i.e., blood glucose) over the course of a 100-minute period, and a graph of the corresponding inflow and outflow (insulin and sugar consumption) trends; one of the flows was fixed at the same level over the 100-minute period, and the trend of the other flow was shown only up to minute 50. participants were asked to stabilize the blood glucose level at minute 100, as shown in the stock graph, by using a sliding bar to set the level of the incomplete flow (insulin or sugar consumption).

experimental groups

four experimental groups were designed so that participants would receive one of the four phase 1 scenarios, followed by a video illustrating the dynamics of the solution to the question that they had just been asked to solve in phase 1. they were then asked to deal with a different scenario in phase 2. these groups are illustrated in table 1. participants exposed to scenario 1 in phase 1 were asked to solve scenario 2 in phase 2; those exposed to scenario 2 in phase 1 were exposed to scenario 1 in phase 2; those exposed to scenario 3 in phase 1 were exposed to scenario 4 in phase 2; and those exposed to scenario 4 in phase 1 were exposed to scenario 3 in phase 2.
these shifts from phase 1 to phase 2 were designed so that we could observe the effect of the video demonstration on the valence effect: the ability of participants to handle scenarios in which the user-controlled flow matches or opposes the direction of the correct response. they should also enable us to test the valence effect with respect to both increasing and decreasing stock scenarios.

procedure and video demonstration

after providing informed consent, participants were randomly assigned to one of the four scenarios. participants were asked to imagine a diabetic person who was trying to control blood glucose (see appendix 1 for the exact instructions). participants were presented with the phase 1 scenario and were asked to indicate their flow decision by manipulating a slider that ranged from 0 mg/dl per 10 minutes to 200 mg/dl per 10 minutes. next, they were shown a video that demonstrated how the problem stated in phase 1 was correctly solved dynamically. a five-to-six-minute narrated training video was created using the simulated water tank from gonzalez and dutt’s (2007) dynamic stocks and flows (dsf) task (see figure 1 for a screenshot of the simulation used in the video). the dsf program displayed a two-dimensional water tank with two inflow pipes (representing the user inflow and the environmental inflow) and two outflow pipes (representing the user outflow and the environmental outflow). the amount associated with each flow was shown numerically on the screen, as was the current amount in the tank. a red horizontal line drawn across the tank represented the target level. at the bottom of the screen, there was a blank field into which the narrator entered the user inflow (outflow) for each time period, where each time period represented 10 minutes. the goal of the task was to adjust the user inflow (outflow) over 10 time periods so that the amount in the tank reached the target level by the end of the tenth time period.
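the accumulation logic behind such a water-tank task can be sketched in a few lines of python. this is a minimal sketch: the function names and numeric values are illustrative assumptions, not taken from the actual dsf program (where one environmental flow varies over time and the user revises the decision every period).

```python
# minimal sketch of the stock dynamics in a dsf-style water-tank task.
# all names and values here are illustrative, not from the dsf program.

def update_stock(stock, user_in, env_in, user_out, env_out):
    """one time period: the stock accumulates inflows minus outflows."""
    return stock + (user_in + env_in) - (user_out + env_out)

def user_inflow_to_hit_target(stock, target, env_in, user_out, env_out, periods_left):
    """constant user inflow per period that reaches the target exactly by the
    last period, assuming the other three flows stay fixed."""
    net_fixed = env_in - user_out - env_out  # per-period change without the user
    return (target - stock) / periods_left - net_fixed

# example: tank starts at 50 units, target is 100, ten 10-minute periods
stock, target = 50.0, 100.0
u = user_inflow_to_hit_target(stock, target, env_in=3.0, user_out=2.0, env_out=4.0,
                              periods_left=10)
for _ in range(10):
    stock = update_stock(stock, user_in=u, env_in=3.0, user_out=2.0, env_out=4.0)
# stock now equals the target (100.0)
```

the sketch makes explicit why the stock level at any time depends only on the accumulated difference between inflows and outflows, the relationship that sf failure obscures.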
the training video presented the same scenario as the problem solved in phase 1, but with different values.1 as in the graph problems, either the environmental inflow or the environmental outflow was set at a fixed value, whereas the environmental outflow (inflow) was programmed to vary across time periods. the narrator explained each visual component of the dsf task and then acted out the task for the participant, entering values into the user inflow (outflow) box and describing their effect on the level in the tank at each decision point. as the videos progressed, the narrator developed and explained a strategy for reaching the target level (see appendix 2 for a transcript of the narrator’s script). in all four training videos, the narrator attained the goal stock level at the end of the tenth time period. after watching the training video, the phase 2 scenario was introduced to participants, and they were asked to make their flow decision using the same slider as in phase 1. finally, participants completed a brief demographic questionnaire that assessed their first- and second-hand experience with diabetes (see appendix 3 for the full questionnaire). importantly, participants did not receive any direct feedback about the correctness of their responses after phase 1 or after phase 2. for payment purposes, they were only informed of the total bonus amount they had earned at the end of the experiment.

1 in the dsf program, the unit of time was set by default to “hour.” we blanked out this unit and intended to replace it with “minute,” but we accidentally neglected to do so.

figure 1. screenshot of the training video for group a. the scenario was scenario 1: hypoglycemia (increasing stock), and the environmental inflow was fixed at 95 mg/dl/minute.
participants were not informed of which questions they had responded to correctly in either of the two phases.

results

performance in both phase 1 and phase 2 was measured in terms of the accuracy gap, calculated by subtracting the correct response for each scenario from the participant’s response. positive values indicated that participants “overshot” the correct response, negative values indicated that participants “undershot” the correct response, values closer to zero indicated higher accuracy, and values equal to zero indicated an exactly correct response. using this measure, we can quantify the direction and strength of the valence effect. participants are expected to overshoot the goal when something is “good” and to undershoot the goal when something is “bad”. furthermore, when something is “good” to a greater degree, participants are expected to overshoot the goal more than when something is “good” to a lesser degree; similarly, when something is “bad” to a greater degree, participants are expected to undershoot the goal more than when something is “bad” to a lesser degree.

phase 1 accuracy

figure 2 shows the accuracy distributions for each scenario, and table 2 gives the exact measures of central tendency and the proportion of participants who responded correctly in phase 1. we found that, regardless of the direction of the stock (increasing or decreasing), participants were more accurate when the valence of the user-controlled flow agreed with the direction of the correct response (scenarios 1 and 3) than when it opposed the direction of the correct response (scenarios 2 and 4). these observations support the correlation heuristic, although large standard deviations were observed in all scenarios. in terms of the frequency of correct responses, scenarios 1 and 3 had higher exact accuracy (21.43% and 16.83%, respectively) than scenarios 2 and 4 (8.91% and 9.09%, respectively).
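the accuracy-gap measure and the frequency of correct responses described above can be sketched as follows; the sample responses are illustrative, not actual study data.

```python
# accuracy gap: participant's response minus the correct response for the scenario.
# positive = overshoot, negative = undershoot, zero = exactly correct.
def accuracy_gap(response, correct):
    return response - correct

# frequency of correct responses: share of participants whose gap is exactly zero.
def frequency_correct(gaps):
    return sum(1 for g in gaps if g == 0) / len(gaps)

# illustrative responses for a scenario whose correct answer is 100 mg/dl per 10 min
responses = [120, 100, 80, 100, 60]
gaps = [accuracy_gap(r, 100) for r in responses]
print(gaps)                     # [20, 0, -20, 0, -40]
print(frequency_correct(gaps))  # 0.4
```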
figure 2. deviations from the correct response for each of the four scenarios in phase 1. each dot in the figure represents a response. horizontal lines in the box plots represent the first quartile, median, and third quartile. the horizontal line at y = 0 indicates perfect accuracy.

table 2. measures of central tendency and the proportion of participants who responded correctly in phase 1. values are means, standard deviations, and medians for deviations from the correct response. a value of 0 indicates perfect accuracy. the frequency of correct responses was calculated by dividing the number of participants who gave correct responses in a scenario by the total number of participants in that scenario.
scenario 1 (increasing stock, user-controlled outflow matches the correct solution): m = 13.56, sd = 52.30, median = 0.00, correct = 21.43%
scenario 2 (increasing stock, user-controlled inflow opposes the correct solution): m = -11.59, sd = 62.77, median = -6.00, correct = 8.91%
scenario 3 (decreasing stock, user-controlled outflow matches the correct solution): m = 19.13, sd = 45.00, median = 10.00, correct = 16.83%
scenario 4 (decreasing stock, user-controlled inflow opposes the correct solution): m = -60.61, sd = 44.76, median = -70.00, correct = 9.09%

in accordance with the valence hypothesis, we found that, on average, participants in scenarios 1 and 3 overshot the goal. thinking that insulin is “good” for controlling blood glucose, they used more insulin than necessary to control the increasing blood glucose in scenario 1 and the decreasing blood glucose in scenario 3. similarly, on average, participants undershot the goal in scenarios 2 and 4.
in the belief that soda consumption is “bad” for controlling blood glucose, they used less soda than necessary to control the increasing blood glucose in scenario 2 and the decreasing blood glucose in scenario 4. furthermore, as hypothesized by the valence effect, undershooting was greater in scenario 4 than in scenario 2, the two scenarios in which the user-controlled flow (i.e., soda) opposes the direction of the correct response. people were more reluctant to increase soda consumption (it is “bad”) in order to control the decreasing blood glucose in scenario 4 than to decrease soda consumption (it is “good”) in order to control the increasing blood glucose in scenario 2. to test these observations, we conducted a 2 (stock behaviour: increasing, decreasing) x 2 (participant-controlled flow: matching, opposing) analysis of variance (anova) on the accuracy scores from phase 1. this analysis revealed a significant main effect of flow valence. participants were more accurate when the valence of the user-controlled flow matched the correct response (scenarios 1 and 3; m = 16.39, sd = 48.68) than when it opposed the correct response (scenarios 2 and 4; m = -35.86, sd = 59.75), f(1, 395) = 102.40, p < .001, ηp2 = .21. stock direction was also found to have a main effect. participants were more accurate in the increasing stock scenarios (1 and 2; m = 0.79, sd = 59.07) than in the decreasing stock scenarios (3 and 4; m = -20.34, sd = 60.01), f(1, 395) = 17.57, p < .001, ηp2 = .04. furthermore, a significant interaction effect emerged between the user-controlled flow and the stock direction, f(1, 395) = 27.73, p < .001, ηp2 = .07.
as shown in figure 3, while there was no difference in the degree of overshooting between the scenarios in which the user-controlled flow valence matched the stock direction (scenarios 1 and 3), there was a significant difference in the amount of undershooting between the scenarios in which the valence of the participant-controlled flow opposed the correct solution (scenarios 2 and 4). these observations were confirmed with simple effects tests using the bonferroni correction to adjust for multiple comparisons. participants in scenario 4 undershot significantly more (m = -60.61, sd = 44.76) than participants in scenario 2 (m = -11.59, sd = 62.77), p < .001. no significant difference in accuracy emerged between participants in scenario 1 and scenario 3, p = .45.

phase 2 accuracy

figure 4 illustrates the accuracy distributions for each scenario, and table 3 gives the exact measures of central tendency and the proportion of participants who responded correctly in phase 2. the observed results are very similar to the findings in phase 1. again, we observe an effect of the valence of the user-controlled flow: exact accuracy was 21.78% and 28.28% in scenarios 1 and 3, respectively, compared with 23.47% in scenario 2 and 9.09% in scenario 4. we also observed overshooting of the goal in scenarios 1 and 3 and undershooting in scenarios 2 and 4, and again undershooting was greater in scenario 4 than in scenario 2. we conducted a 2 x 2 anova on the phase 2 accuracy scores. as in phase 1, a significant main effect of flow valence emerged: f(1, 395) = 112.84, p < .001, ηp2 = .22. specifically, participants performed better in scenarios 1 and 3 (m = 10.27, sd = 34.75) than in scenarios 2 and 4 (m = -31.73, sd = 46.37).
stock direction also had a significant main effect: participants who were shown an increasing stock (scenarios 1 and 2; m = -0.41, sd = 42.86) performed better than those shown a decreasing stock (scenarios 3 and 4; m = -20.90, sd = 47.04), f(1, 395) = 25.92, p < .001, ηp2 = .06. there was also a significant interaction between the user-controlled flow and the stock direction: f(1, 395) = 12.17, p = .001, ηp2 = .03. figure 5 illustrates this interaction. again, there was a significant difference in the degree of undershooting between scenarios 2 and 4, in which the participant-controlled flow valence opposed the correct solution, although there was no significant difference in the degree of overshooting between scenarios 1 and 3, in which the participant-controlled flow valence matched the correct solution. to confirm these observations, a test of simple effects was performed using the bonferroni correction to adjust for multiple comparisons. participants in scenario 4 undershot significantly more (m = -48.33, sd = 43.71) than participants in scenario 2 (m = -14.63, sd = 42.86), p < .001. there was no significant difference in the amount of overshooting between participants in scenario 1 and scenario 3, p = .26.

effects of video demonstration and demographics

while the separate analyses of phase 1 and phase 2 reported above suggest little or no difference between the patterns of results in the two phases, this section describes an analysis of the effects for each participant group (see table 1). we tested whether the accuracy of participants in phase 2 would improve after a video demonstration illustrating the correct response to the scenario experienced in phase 1. of particular interest is the potential improvement in groups a and c, where participants perform scenarios 2 and 4 in phase 2, respectively, because these are the scenarios in which the user-controlled flow opposes the correct solution and in which accuracy was found to be lower.
we compared average improvements in accuracy between phase 1 and phase 2 across conditions. improvement was calculated by subtracting the absolute value of the deviations from the correct response in phase 2 from the absolute value of the deviations from the correct response in phase 1 for each condition. more positive values indicate improvement, whereas more negative values indicate degradation. on average, across all conditions, participant accuracy improved by 14.80 units (sd = 52.29) from phase 1 to phase 2 after the presentation of the video. figure 6 presents the average improvement for each of the four groups. we performed a 2 (stock direction in video: increasing, decreasing) x 2 (participant-controlled flow in video: matching, opposing) anova2 on accuracy improvement. a significant main effect of user-controlled flow valence emerged: f(1, 395) = 64.18, p < .001, ηp2 = .14. specifically, there was a significant improvement for participants in groups b and d from phase 1 to phase 2 (m = 34.14, sd = 47.29), while performance for participants in groups a and c dropped from phase 1 to phase 2 (m = -4.63, sd = 49.92). stock direction did not have a significant main effect: p = .66. however, the interaction between flow valence and stock direction was significant: f(1, 395) = 7.88, p = .005, ηp2 = .02. tests of simple effects with bonferroni corrections revealed that accuracy for participants in group a (solving scenario 1 in phase 1 and scenario 2 in phase 2) improved (m = 3.34, sd = 50.44), whereas accuracy dropped in group c (solving scenario 3 in phase 1 and scenario 4 in phase 2; m = -12.37, sd = 48.41): p = .02. there was no significant difference in improvement between group b and group d: p = .10. in the presence of a video demonstration, there were significant improvements for groups b and d (i.e., in scenarios with inflows matching the correct solution after working on scenarios with inflows opposing the correct solution). finally, a demographic analysis of factors of possible interest (educational level, being a diabetes sufferer, and diabetes caring experience) showed that none of these factors influenced the accuracy of participants’ responses in phase 1 or phase 2 or the improvement from phase 1 to phase 2; no variables were significant.

2 recall that the stock behavior and flow decision in the training video matched the stock behavior and flow decision that participants saw in phase 1.

figure 3. significant interaction between flow decision (matching, opposing) and stock behaviour (increasing, decreasing) in phase 1. deviations were measured in mg/dl per 10 minutes. error bars indicate standard errors of the mean.

figure 4. deviations from the correct response for each of the four scenarios in phase 2. each dot in the figure represents a response. horizontal lines in the box plots represent the first quartile, median, and third quartile. the horizontal line at y = 0 indicates perfect accuracy (see figure 2).

discussion

overall, we found that the valence of the user-controlled flow and the stock direction had significant main effects, and that there was an interaction between valence and stock direction influencing accuracy in both phase 1 and phase 2.
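the improvement measure used in the group comparisons above can be sketched as follows; the input values are illustrative, not study data.

```python
# improvement from phase 1 to phase 2: |phase-1 deviation| - |phase-2 deviation|.
# positive values indicate improvement; negative values indicate degradation.
def improvement(gap_phase1, gap_phase2):
    return abs(gap_phase1) - abs(gap_phase2)

print(improvement(-40, -10))  # 30: closer to the correct response after the video
print(improvement(5, -25))    # -20: accuracy degraded from phase 1 to phase 2
```

note that the measure rewards any move toward the correct response, regardless of whether the deviation was an overshoot or an undershoot in either phase.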
regardless of the direction of the stock (increasing, decreasing), participants responded more accurately when the valence of the user-controlled flow matched the direction of the correct solution to the problem than when the valence opposed the direction of the correct response. this finding is consistent with research demonstrating that people learn positive relationships among variables (i.e., variables moving in the same direction) more easily than negative relationships (i.e., variables moving in opposing directions) (brehmer, 1980). it also provides strong support for the correlation heuristic (cronin et al., 2009).

table 3. measures of central tendency and the proportion of participants who responded correctly in phase 2. values are means, standard deviations, and medians for deviations from the correct response. a value of 0 indicates perfect accuracy. the frequency of correct responses was calculated by dividing the number of participants who gave correct responses in a scenario by the total number of participants in the corresponding group (see table 2).
scenario 1 (increasing stock, user-controlled outflow matches the correct solution): m = 13.39, sd = 37.76, median = 0.00, correct = 21.78%
scenario 2 (increasing stock, user-controlled inflow opposes the correct solution): m = -14.63, sd = 42.86, median = -5.00, correct = 23.47%
scenario 3 (decreasing stock, user-controlled outflow matches the correct solution): m = 7.09, sd = 31.25, median = 0.00, correct = 28.28%
scenario 4 (decreasing stock, user-controlled inflow opposes the correct solution): m = -48.33, sd = 43.71, median = -55.00, correct = 9.09%

figure 5. significant interaction between flow decision (matching, opposing) and stock behaviour (increasing, decreasing) in phase 2. deviations were measured in mg/dl per 10 minutes. error bars indicate standard errors of the mean.
these results also corroborate and expand the results reported by newell et al. (2016) regarding the valence effect. first, we replicated newell et al.’s findings with respect to the valence effect on increasing stocks. when the user-controlled flow valence matched the direction of the correct solution (scenario 1, i.e., increasing insulin to control the blood glucose accumulation), performance was better than when the user-controlled flow valence opposed the direction of the correct solution (scenario 2, i.e., decreasing soda consumption to control the blood glucose accumulation). second, we extended this result to decreasing stocks. we found that when the valence matched the direction of the correct solution (scenario 3, i.e., decreasing insulin in order to control the blood glucose accumulation), performance was better than in scenario 4 (i.e., increasing soda consumption in order to control the blood glucose accumulation). as suggested by newell et al. (2016), performance was good not as a result of an actual understanding of stock-flow relationships but because the valence of the user-controlled flow matched the direction of the correct solution and the correlation heuristic reinforced this relationship. consistently, and regardless of the direction of the stock, participants responded worse to situations in which the valence of the user-controlled flow opposed the direction of the correct solution; that is, only a few participants were able to overcome the correlation heuristic.

figure 6. mean improvement in accuracy between phase 1 and phase 2 for the four experimental groups. error bars represent standard errors of the mean.

furthermore, the stock direction also had a main effect.
in both phase 1 and phase 2, participants responded more accurately to problems involving increasing stocks (i.e., scenarios 1 and 2) than to problems involving decreasing stocks (i.e., scenarios 3 and 4), a result that supports past findings that increasing relationships are easier to learn than decreasing ones (gonzalez & dutt, 2007). however, a significant interaction clarifies the relationship between the valence of the problem and the direction of the stock. there is a significant amount of overshooting of the goal in scenarios 1 and 3 and a significant amount of undershooting of the goal in scenarios 2 and 4. according to the valence effect, thinking that the use of insulin is “good” for controlling blood glucose resulted in the use of more insulin than needed, whereas participants who considered soda ingestion “bad” for controlling blood glucose ended up using less soda than needed to achieve the target blood glucose level. interestingly, and as predicted by the valence hypothesis, we found significantly larger undershooting in scenario 4 than in scenario 2. this is explained by the interaction between the direction of the stock and the opposition of the valence of the user-controlled flow. in scenario 2, the blood glucose level increases over time. this is “bad”, and a decrease in soda consumption would appear to be a “good” option for counteracting this trend. thus, there is undershooting in scenario 2, but it is less severe than in scenario 4. in scenario 4, the blood glucose level decreases over time; this is “good”, and an increase in soda consumption would appear to be a “bad” option for counteracting this trend, increasing the undershooting of the goal. in actual fact, the correct solution is to decrease and increase soda consumption in scenarios 2 and 4, respectively.
regarding the improvement in accuracy from phase 1 to phase 2, while the observed improvements cannot be attributed exclusively to the observation of the video demonstration, it is interesting to discuss the different effects and how they relate to the valence effect. we observed a substantial, significant, and similar improvement for participants in groups b and d, a slight improvement for participants in group a, and a deterioration of performance in group c. one explanation for the improvement observed in groups b and d is that these participants switched from scenarios whose solutions opposed the correlation heuristic in phase 1 to scenarios whose solutions matched the correlation heuristic in phase 2. in general, participants performed more accurately in phase 2 than in phase 1, but this improvement could be due to the use of the correlation heuristic rather than to the video demonstration. therefore, the improvement in performance from phase 1 to phase 2 may be fortuitous, as in the case reported by newell et al. (2016). as the correlation heuristic is a robust tendency, it is unlikely, in view of the results for groups a and c, that the observed improvement can be attributed to the video demonstration alone. group a participants performed more accurately in scenario 2 of phase 2 than they did in scenario 1 of phase 1. this is interesting, as it is consistent with the idea that the video demonstration may have influenced participants to accept the decrease in soda consumption (a “good” action) to stabilize an increasing blood glucose level in scenario 2. however, the video demonstration did not clearly influence accuracy in scenario 4 after the demonstration of scenario 3. although the undershooting in scenario 4 of phase 2 decreased compared to phase 1, the drop is insufficient to suggest that participants considered the increase in soda consumption a “good” option for stabilizing a decreasing blood glucose level.
implications, limitations, and future work

a main implication of these results is that one needs to carefully select how to frame the context of a stock and flow problem. to encourage correct responses, one needs to: (1) frame the problem so that the correct response matches the correlation heuristic; (2) ensure that the valence of the scenario matches the valence of the correct response if the correct response cannot match the correlation heuristic; and (3) avoid situations in which the correct response matches neither the correlation heuristic nor the valence of the scenario, because this will result in extremely poor performance. a demonstration of the first strategy is discussed in dutt and gonzalez (2013). this “information presentation” strategy proposed taking advantage of the human reliance on the correlation heuristic to encourage people to pay eco-taxes. participants judged that larger eco-tax increases would cause proportionally greater reductions in co2 emissions, yet preferred smaller tax increases because of their lesser cost. this finding suggests that it would be beneficial for eco-tax policy makers to present information in terms of eco-tax increases such that smaller-than-current eco-tax increases (which are more attractive and are likely to be chosen by people) cause greater co2 emission reductions. in future research, this practical conclusion and the effects of the scenario valence matching or opposing the direction of the correct response need to be tested in naturalistic cases of blood glucose control and other global problems. it is also important for future research to address the limitations of the design of this study in order to test the effectiveness of the video demonstration and of possible interventions to improve performance in sf tasks.
as discussed above, the improvement in performance from phase 1 to phase 2 that we observed in some of the groups (b and d) may have been accidental, as in the case reported by newell et al. (2016), especially considering that there was little improvement in group a and a decline in performance in group c. in addition, the observed improvements may not be exclusively due to the video demonstration. there could be a practice effect due to the exposure to a similar (albeit not identical) problem in phase 1. future research should focus on experimentally manipulating decision aids or interventions in the form of video demonstrations or dynamic simulations, including a control condition, namely, the absence of such an intervention. in this manner, we could test the effect of a video demonstration as an intervention for improving accuracy in sf failure.

acknowledgements: this work has been funded by the national science foundation, decision risk and management science, award number 1530479. we thank research assistant nalyn sriwattanakomen, who helped with the initial preparation of materials and data collection.

note: an older version of this work appeared as an abstract in the proceedings of the 36th international conference of the system dynamics society, reykjavik, iceland, august 6-10, 2018.

declaration of conflicting interests: the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

author contributions: gonzalez contributed to the development of ideas, experimental design, materials and data collection, data analyses, and writing. sanchez-segura, dugarte-peña, and medina-dominguez contributed to data analyses, interpretation of results, and writing.

handling editor: andreas fischer

copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license.
citation: gonzalez, c., sanchez-segura, m.-i., dugarte-peña, g.-l., & medina-dominguez, f. (2018). valence matters in judgments of stock accumulation in blood glucose control and other global problems. journal of dynamic decision making, 4, 3. doi:10.11588/jddm.2018.1.49607

received: 19 jul 2018 | accepted: 18 dec 2018 | published: 26 dec 2018

references

abdel-hamid, t., ankel, f., battle-fisher, m., gibson, b., gonzalez-parra, g., jalali, m., . . . murphy, p. (2014). public and health professionals’ misconceptions about the dynamics of body weight gain/loss. system dynamics review, 30(1–2), 58–74. doi:10.1002/sdr.1517

booth sweeney, l., & sterman, j. d. (2000). bathtub dynamics: initial results of a systems thinking inventory. system dynamics review, 16, 249–286. doi:10.1002/sdr.198

brehmer, b. (1980). in one word: not from experience. acta psychologica, 45(1–3), 223–241. doi:10.1016/0001-6918(80)90034-7

brunstein, a., gonzalez, c., & kanter, s. (2010). effects of domain experience in the stock–flow failure. system dynamics review, 26(4), 347–354. doi:10.1002/sdr.448

cronin, m., gonzalez, c., & sterman, j. d. (2009). why don’t well-educated adults understand accumulation? a challenge to researchers, educators and citizens. organizational behavior and human decision processes, 108, 116–130. doi:10.1016/j.obhdp.2008.03.003

diehl, e., & sterman, j. d. (1995). effects of feedback complexity on dynamic decision making. organizational behavior and human decision processes, 62(2), 198–215. doi:10.1006/obhd.1995.1043

dutt, v., & gonzalez, c. (2012a). human control of climate change. climatic change, 111(3–4), 497–518.
doi:10.1007/s10584-011-0202-x

dutt, v., & gonzalez, c. (2012b). decisions from experience reduces misconceptions about climate change. journal of environmental psychology, 32(1), 19–29. doi:10.1016/j.jenvp.2011.10.003

dutt, v., & gonzalez, c. (2013). enabling eco-friendly choices by relying on the proportional-thinking heuristic. sustainability, 5(1), 357–371. doi:10.3390/su5010357

fischer, h., degen, c., & funke, j. (2015). improving stock-flow reasoning with verbal formats. simulation & gaming, 46(3–4), 1–15. doi:10.1177/1046878114565058

fischer, h., & gonzalez, c. (2016). making sense of dynamic systems: how our understanding of stocks and flows depends on a global perspective. cognitive science, 40, 496–512. doi:10.1111/cogs.12239

frensch, p., & funke, j. (eds.). (1995). complex problem solving: the european perspective. hillsdale, nj: lawrence erlbaum associates. retrieved from http://works.bepress.com/joachim_funke/13/

gonzalez, c., & dutt, v. (2007). learning to control a dynamic task: a system dynamics cognitive model of the slope effect. in proceedings of the iccm eighth international conference on cognitive modeling, ann arbor, mi.

gonzalez, c., & wong, h. (2012). understanding stocks and flows through analogy. system dynamics review, 28(1), 3–27. doi:10.1002/sdr.470

moxnes, e., & saysel, a. k. (2009). misperceptions of global climate change: information policies. climatic change, 93(1–2), 15–37.
doi:10.1007/s10584-008-9465-2

newell, b. r., kary, a., moore, c., & gonzalez, c. (2016). managing the budget: stock-flow reasoning and the co2 accumulation problem. topics in cognitive science, 8(1), 138–159. doi:10.1111/tops.12176

sterman, j. d. (2008). risk communication on climate change: mental models and mass balance. science, 322, 532–533. doi:10.1126/science.1162574

sterman, j. d., & booth sweeney, l. (2002). cloudy skies: assessing public understanding of global warming. system dynamics review, 18(2), 207–240. doi:10.1002/sdr.242

sterman, j. d., & booth sweeney, l. (2007). understanding public complacency about climate change: adults’ mental models of climate change violate conservation of matter. climatic change, 80(3–4), 213–238. doi:10.1007/s10584-006-9107-5

sterman, j. d., fiddaman, t., franck, t., jones, a., mccauley, s., rice, p., . . . siegel, l. (2012). management flight simulators to support climate negotiations. environmental modeling and software, 44, 122–135. doi:10.1016/j.envsoft.2012.06.004

sternberg, r. j. (1995). expertise in complex problem solving: a comparison of alternative conceptions. in p. a. frensch & j. funke (eds.), complex problem solving: the european perspective (pp. 295–321). hillsdale, nj: lawrence erlbaum associates.

strohhecker, j., & größler, a. (2015). performance in tangible and in cognitive stock-flow tasks: closer than expected. simulation & gaming, 46(3–4), 230–254.
doi:10.1177/1046878115577160

appendix 1

this appendix describes the four scenarios used in the experiment. all graphs illustrate information on a stock accumulation or a flow. in all cases, the independent variable (shown on the x-axis) is the time (measured in minutes), and the dependent variable is either the stock accumulation value (measured in mg/dl) or the inflow or outflow value (measured in mg/dl/min). the stock accumulation case (hypo- or hyperglycemia) and the explanation of the related fixed and estimated flows (soda consumption – glucose input – or insulin action – glucose output) are described for each scenario.

graph problems

scenario 1: hypoglycemia (increasing stock) x user-controlled outflow

below are two graphs representing a hypothetical scenario of blood glucose control. the graph on the left shows your blood glucose levels over a 100-minute period. the graph on your right shows the rate of your insulin production, which reduces your blood glucose, and the rate of your soda consumption, which increases your blood glucose. suppose that you have a variety of diabetes that makes your body produce insufficient insulin. while exercising, you begin to feel very faint.
you measure your blood glucose at 0 minutes and find that it is extremely low at 60 mg/dl, so you ingest a bottle of sugary soda that will continuously put 50 mg/dl of glucose per minute into your bloodstream for the next 100 minutes. however, you soon realize that if you do not start burning off glucose, your blood glucose will become dangerously high. to remedy this problem, you begin injecting insulin at 10 minutes that removes glucose from your bloodstream. the rate at which you inject insulin is shown for only the first 50 minutes. if your goal is to stabilize your blood glucose at 235 mg/dl by 100 minutes as shown in the left graph, what does the rate of glucose removal (by insulin injection) need to be at 100 minutes?

scenario 2: hypoglycemia (increasing stock) x user-controlled inflow

below are two graphs representing a hypothetical scenario of blood glucose control. the graph on the left shows your blood glucose levels over a 100-minute period. the graph on your right shows the rate of your insulin production, which reduces your blood glucose, and the rate of your soda consumption, which increases your blood glucose. suppose that you have a variety of diabetes that prevents your body from producing insulin. while exercising, you begin to feel very faint. before you started, you had connected yourself to an insulin pump that will continuously remove 100 mg/dl of glucose per minute from your bloodstream for the next 100 minutes. now, at 0 minutes, you find that your blood glucose is extremely low at 30 mg/dl. to remedy this problem, you begin drinking a sugary soda at 10 minutes that puts glucose into your bloodstream. the rate at which you intake glucose through soda is shown for only the first 50 minutes.
if your goal is to stabilize your blood glucose at 235 mg/dl by 100 minutes as shown in the left graph, what does the rate of glucose intake (by soda) need to be at 100 minutes?

scenario 3: hyperglycemia (decreasing stock) x user-controlled outflow

below are two graphs representing a hypothetical scenario of blood glucose control. the graph on the left shows your blood glucose levels over a 100-minute period. the graph on your right shows the rate of your insulin production, which reduces your blood glucose, and the rate of your soda consumption, which increases your blood glucose. suppose that you have a variety of diabetes that makes your body produce insufficient insulin. you are drinking a bottle of sugary soda that will continuously put 60 mg/dl of glucose per minute into your bloodstream for the next 100 minutes. you measure your blood glucose at 0 minutes and find that it is extremely high at 360 mg/dl. you soon realize that if you do not start burning off glucose, you will need to be hospitalized. to remedy this problem, you begin injecting insulin at 10 minutes that removes glucose from your bloodstream. the rate at which you inject insulin is shown for only the first 50 minutes. if your goal is to stabilize your blood glucose at 105 mg/dl by 100 minutes as shown in the left graph, what does the rate of glucose removal (by insulin injection) need to be at 100 minutes?

scenario 4: hyperglycemia (decreasing stock) x user-controlled inflow

below are two graphs representing a hypothetical scenario of blood glucose control. the graph on the left shows your blood glucose levels over a 100-minute period. the graph on your right shows the rate of your insulin production, which reduces your blood glucose, and the rate of your soda consumption, which increases your blood glucose.
suppose that you have a variety of diabetes that prevents your body from producing insulin. you measure your blood glucose at 0 minutes and find that it is extremely high at 360 mg/dl, so you put yourself on an insulin pump that will continuously remove 120 mg/dl of glucose per minute from your bloodstream for the next 100 minutes. however, you soon realize that if you do not start ingesting sugar, your blood glucose will end up dangerously low. to remedy this problem, you start drinking a sugary soda at 10 minutes that puts glucose into your bloodstream. the amount of glucose that you intake through soda is shown for only the first 50 minutes. if your goal is to stabilize your blood glucose at 100 mg/dl by 100 minutes as shown in the left graph, what does the amount3 of glucose intake (by soda) need to be at 100 minutes?

3 this term is misleading; it should be “rate,” not “amount.” this confusion between “amount” and “rate” might account for some of the observed incorrect responses.

appendix 2

narrator script for video in scenario 1 (increasing stock x user-controlled outflow)

here we have a tank that symbolizes the human body. the blue liquid in this tank stands for the amount of glucose in the bloodstream. we are looking at the bloodstream of a diabetic person who is hypoglycemic. right now, the blood glucose is very low at 55 mg/dl, and we want to increase it. the red line stands for the target level, which is 280 mg/dl. this goal is definitely high, but for someone who is about to exercise and use up blood glucose in the process, 280 mg/dl is a reasonable level.
we will be controlling the outflow of glucose from the blood by entering numbers into this field here. we’re going to be doing this over a span of 100 minutes that will be broken down into 10 periods of 10 minutes each. at each of the 10 time periods, we can choose how much glucose to remove from the body. the glucose will leave through this tube here. however, the human body has its own checks and balances. hormones can release extra glucose into the bloodstream through the tube marked “environment inflow,” and hormones can also absorb glucose through the tube marked “environment outflow.” we don’t know the rate at which the body will receive or absorb glucose, but we’ll try at each step to get the glucose level closer to that red line—the target glucose level.

let’s start out by removing, uh, 105 units of glucose from the body. some glucose is entering the body through this tube, and then we see that the 105 units of glucose we entered is leaving the body here. it looks like 95 units of glucose entered the body and there are now 45 mg/dl. we removed more glucose than the body received, so the blood glucose level decreased over this first time period! that’s not what we want.

so 10 minutes have passed. it’s time for us to make a decision about how much glucose to remove during the next 10 minutes. let’s try removing 95 units. some glucose is entering the body through this tube, and then we see that the 95 units of glucose we entered is leaving the body here. it looks like another 95 units of glucose entered the body and there are now 45 mg/dl. we removed as much glucose as the body received, so the blood glucose level remained the same!

all right, another 10 minutes have passed. let’s make a decision about how much glucose to remove for the next time period. let’s try removing 70 units. glucose is entering the body through here, and then the 70 units are leaving from here. so, it appears that another 95 units of glucose entered the body and there are now 70 mg/dl.
we removed less glucose than the body received, so the blood glucose level increased! we’re getting a little closer to the target level.

time to make another decision about how much glucose to remove during the next time period. let’s try removing even less than before: 50 units. glucose enters the body here, and then the 50 units leave. again, another 95 units of glucose entered, and there are now 115 mg/dl. we removed less glucose than the body received, so the level is continuing to increase. we’re getting even closer!

time to make another decision about how much glucose to remove during the next time period. let’s try removing less than before: 20 units this time. glucose is entering; 20 units leave. again, another 95 units of glucose entered, and there are now 190 mg/dl. because we removed less glucose than the body received, the level is increasing. we’re halfway through the 100-minute time span now!

this time, let’s try removing more than last time: 45 units. glucose is entering; 45 units leave. again, 95 units of glucose entered, and there are now 240 mg/dl! because we still removed less glucose than the body received, the level increased. we’re much closer to the target now.

this time, let’s remove more glucose: 70 units. glucose enters; 70 units leave. yes, 95 units of glucose entered, and there are now 265 mg/dl. the same principle applies: if the glucose removed is less than the amount received, the overall level in the body increases.

all right, let’s try removing even more: 85 units. glucose enters, our 85 units leave, and now there are 275 mg/dl. that’s very, very close to the target amount. we should continue to increase the amount of glucose we’re removing; otherwise, we’ll overshoot the goal.

so let’s remove 90 units. 95 units enter, then our 90 units leave, and we’re at exactly 280 mg/dl. perfect! now all we need to do is maintain it at that level for the final time period.
so we’ve seen that the body receives 95 units during each time period. if we want to keep the blood glucose at the same level, we have to remove exactly as much as the body receives; the two amounts will cancel out. so let’s remove 95 units. yes, 95 enter, 95 leave, and yes! we’re at exactly 280 mg/dl at the end of the 100-minute span. success!

appendix 3

demographic questionnaire

1. what is your sex? [choose one.]
a. male [58.40%]
b. female [41.35%]
c. intersex [0.25%]

2. what is your age? [free response.] (m = 34.25, sd = 10.14)

3. what is the highest level of education that you have completed? [choose one.]
a. some high school [0.75%]
b. high school [13.28%]
c. some college [26.32%]
d. associate’s degree [13.53%]
e. bachelor’s degree [38.60%]
f. master’s degree [5.76%]
g. professional or doctoral degree [1.75%]

4. do you have or have you ever had any form of diabetes (not including prediabetes)? [choose one.]
a. yes [3.01%]
b. no [94.49%]
c. i don’t know [2.51%]

5. what kind of diabetes do you have or have you ever had? (please check all that apply.) [this question was shown only if the participant responded “yes” to question 4.]
a. type 1 [8.31% of those who said “yes”]
b. type 2 [75.08%]
c. gestational [24.92%]

6. when were you diagnosed with diabetes? [choose one. this question was shown only if the participant responded “yes” to question 4.]
a. less than a year ago [16.67% of those who said “yes”]
b. 1 to 3 years ago [16.67%]
c. 4 to 6 years ago [25.00%]
d. 7 or more years ago [41.67%]

7. are you currently receiving treatment or have you ever received treatment for your diabetes? treatments include insulin shots, insulin pumps, and medications. [choose one. this question was shown only if the participant responded “yes” to question 4.] a.
yes [83.33% of those who said “yes” to question 4]
b. no, but i have made lifestyle changes [16.67%]
c. no, and i have not made lifestyle changes [0.00%]

8. if you currently have diabetes, how well-controlled would you say it is? [choose one. this question was shown only if the participant responded “yes” to question 4.]
a. extremely well [0.00% of those who said “yes” to question 4]
b. very well [41.67%]
c. moderately well [0.00%]
d. very poorly [16.67%]
e. extremely poorly [25.00%]
f. i do not currently have diabetes [16.67%]

9. does anyone who is close to you (such as a relative, a spouse, or a close friend) have diabetes? [choose one.]
a. yes [37.59%]
b. no [62.41%]

10. have you ever helped this person(s) manage their diabetes? [choose one. this question was shown only if the participant responded “yes” to question 9.]
a. yes [28.67% of those who said “yes” to question 9]
b. no [71.33%]

11. have you ever had to help care for someone who has diabetes? [choose one.]
a. yes [15.04%]
b. no [84.96%]

original research

with a little help...: on the role of guidance in the acquisition and utilisation of knowledge in the control of complex, dynamic systems

natassia goode1 and jens f. beckmann2
1 centre for human factors and sociotechnical systems, faculty of arts, business and law, university of the sunshine coast, australia; 2 school of education, durham university, durham, uk

many situations require people to acquire knowledge about, and learn how to control, complex dynamic systems of inter-connected variables.
numerous studies have found that most problem solvers are unable to acquire complete knowledge of the underlying structure of a system through an unguided exploration of the system variables; additional instruction or guidance is required. this paper examines whether providing structural information following an unguided exploration also improves control performance, and the extent to which any improvements are moderated by problem solvers’ fluid intelligence as measured via raven’s apm. a sample of 98 participants attempted to discover the underlying structure of a computer-simulated complex dynamic system. after initially controlling the system with their independently acquired knowledge, half of the sample received information and an explanation of the underlying structure of the system. all participants then controlled the system again. in contrast to the results of previous studies, the provided information resulted in immediate improvements in control performance. fluid intelligence as measured via apm moderated the extent to which participants benefited from the intervention. these results indicate that guidance in the form of structural information is critical in facilitating knowledge acquisition and the subsequent use or application of such knowledge when controlling complex and dynamic systems.

keywords: complex problem solving, dynamic systems, knowledge acquisition, fluid intelligence, discovery learning

many situations require us to acquire knowledge about, and learn to control, dynamic systems of causally connected variables. learning how to heat food in a microwave, respond to emails and buy train tickets are just a few of the many examples that might be encountered in everyday life. a significant body of research has examined the conditions that facilitate the acquisition of knowledge about complex and dynamic systems (de jong & van joolingen, 1998; de jong, linn, & zacharia, 2013).
a question that has been addressed less frequently is how knowledge is best acquired to most effectively control such systems. this paper examines whether structural information (i.e., an explanation of how each input affects each output, with a diagram that depicts the system variables and the direction and strengths of their interrelations) confers any advantage over an unguided exploration of the system, its variables and their interconnectedness. we also investigate the role of fluid intelligence, as measured via raven’s advanced progressive matrices (apm; raven, raven, & court, 1998), in utilising this information.

the complex problem solving (cps) approach

to investigate how people learn how to control complex and dynamic systems in the real world, a wide variety of computer-based problem-solving scenarios have been developed (e.g. berry & broadbent, 1984; dörner, 1980; funke, 1992; for a review see osman, 2010). the study presented in this article was underpinned by the dynamis or complex problem solving (cps) approach, introduced by funke (1992; 2001; see blech & funke, 2005 for a review). cps tasks consist of a number of inputs (variables that the problem solver intervenes on) and outputs (outcomes that are generated by the system) that are governed by a set of linear equations (this is referred to as the underlying structure of the system). the systems are dynamic in the sense that the current output value depends on the value of the input selected by the problem solver, and the previous value of the output. some cps tasks also include autonomic changes, so that the values of particular output variables change independently on each trial. a typical experimental procedure using this approach consists of an initial exploration phase in which problem solvers are required to diagnose the underlying structure of the system.
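the kind of linear structure that governs a dynamis/cps task can be sketched as follows (variable names and coefficients are invented for illustration; this is not one of the systems used in the studies discussed here): each output is a linear function of the inputs, of its own previous value, and of an optional autonomic change applied on every trial.

```python
# illustrative sketch of a dynamis-style linear system (invented numbers):
# new_output[j] = sum_i weights[i][j]*inputs[i]
#              + carryover[j]*outputs[j] + autonomic[j]

def cps_step(inputs, outputs, weights, carryover, autonomic):
    """advance the system by one trial under the linear structure above."""
    return [
        sum(weights[i][j] * inputs[i] for i in range(len(inputs)))
        + carryover[j] * outputs[j]
        + autonomic[j]
        for j in range(len(outputs))
    ]

# a toy system with two inputs and two outputs:
weights   = [[2.0, 0.0],    # input a affects output x only
             [0.0, -1.0]]   # input b affects output y only
carryover = [1.0, 1.0]      # outputs persist from trial to trial
autonomic = [0.0, 3.0]      # output y drifts upward by 3 each trial

state = [10.0, 10.0]
state = cps_step([1.0, 2.0], state, weights, carryover, autonomic)
# x = 2*1 + 10 = 12;  y = -1*2 + 10 + 3 = 11
print(state)  # [12.0, 11.0]
```

the nonzero entries of `weights`, `carryover` and `autonomic` correspond to the relations a problem solver has to diagnose during the exploration phase.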
in a subsequent control phase they are instructed to control the system by manipulating the input variables to reach and maintain specific goal values of the output variables. this means that separate measures of structural knowledge and control performance can be derived, and the cognitive processes of knowledge acquisition and knowledge application can be studied independently.

corresponding author: dr. natassia goode, centre for human factors and sociotechnical systems, faculty of arts, business and law, university of the sunshine coast, locked bag 4, maroochydore dc qld 4558, australia. e-mail: ngoode@usc.edu.au

10.11588/jddm.2016.1.33346 jddm | 2016 | volume 2 | article 5 | 1

goode & beckmann: on the role of guidance in complex, dynamic systems

the role of structural knowledge in controlling a complex, dynamic system

the acquisition of knowledge through an unguided exploration of a system and its interrelated variables can be characterised as discovery (de jong & van joolingen, 1997; 1998) or inquiry-based learning (lazonder & harmsen, 2016). in this approach, the learner is seen as an independent and active agent in the process of knowledge acquisition, as they must develop hypotheses, design experiments to test them, and appropriately interpret the data (de jong & van joolingen, 1998). in educational settings, the problems that learners experience with unguided inquiry-based learning are well documented. a recent meta-analysis of 164 studies found that, across domains, unguided inquiry-based learning is less effective than explicit instruction for acquiring knowledge. however, the advantage is reversed when learners receive adequate guidance during inquiry-based learning; they then learn more than those taught using explicit instruction (alfieri, brooks, aldrich, & tenenbaum, 2011).
numerous studies in educational settings have also found that learners need at least some guidance during exploration in order to facilitate the acquisition of complete structural knowledge (de jong & van joolingen, 1998; de jong, 2005, 2006; kirschner, sweller, & clark, 2006; lazonder & harmsen, 2016; mayer, 2004). similarly, in research with cps tasks, it has been found that most problem solvers are unable to acquire a complete or accurate representation of the underlying structure of the system through an unguided exploration of the system variables (beckmann, 1994; beckmann & guthke, 1995; burns & vollmeyer, 2002; funke & müller, 1988; kluge, 2008; kröner, 2001; kröner, plass, & leutner, 2005; müller, 1993; osman, 2008; schoppek, 2002; vollmeyer, burns, & holyoak, 1996). these studies also report a consistent positive relationship between the amount of structural knowledge that is acquired and the quality of problem solvers’ control performance (the knowledge hypothesis). it is worth noting that the majority of these studies are correlational and therefore do not allow for causal interpretations of the reported association between knowledge and control performance.

the study of goode and beckmann (2010) is one of the rather rare examples where an experimental design was adopted to test the causal nature of this association. they found that control performance improved systematically as the amount of structural information available to participants increased, and that at least some structural knowledge was required to perform better than simulated random control interventions. this study illustrates that control performance is causally dependent on the amount of knowledge that is acquired about the underlying structure of the system.
the impact of providing structural information on control performance

in this study we are interested in determining whether supplementing problem solvers’ exploration of a cps task with structural information results in better control performance than an unguided exploration of the system variables. a study conducted by süß (1996, pp. 166-177) suggested that providing structural information benefits structural knowledge, but confers no advantage for control performance. this study used a dynamic decision-making task called "tailorshop", which is intended to simulate a small business that produces and sells shirts. the system consists of 24 variables inter-connected by 38 relations. the values of twenty-one of these variables are represented on the user interface, and three are invisible. twelve of the variables can be manipulated directly by participants, and the goal is to increase the value of the variable "company value" (danner et al., 2011). the underlying structure is intended to reflect problem solvers’ prior knowledge of similar "real world" scenarios (see beckmann & goode, 2014 for a discussion of the problems associated with this assumption).

in süß’s (1996) study, one group of participants explored "tailorshop" while another group studied a causal diagram for the same time period, before both groups performed the control task. a control group performed the control task without any prior exploration or intervention. structural knowledge was assessed prior to, and after, interacting with the task. the group that studied the causal diagram acquired more structural knowledge than the exploration group; there was no difference between the exploration and control groups. surprisingly, there were no differences in control performance across the conditions. thus, the causal diagram appeared to benefit structural knowledge but not control performance.
however, these results should be interpreted with caution, as a strong associative link was found between structural knowledge prior to interacting with the task and control performance across all conditions. on the one hand, this could be interpreted as an indicator of the "ecological validity" of the system. on the other hand, the already substantial correlation between prior knowledge and control performance limits the potential impact of the interventions (i.e. knowledge acquisition through a causal diagram or exploration), and introduces a potential source of individual differences among participants.

a number of studies using abstract systems (putz-osterloh, 1993; preußler, 1996, 1998) also report results similar to süß’s (1996). these studies suggest that problem solvers require a period of active practice applying structural information before they demonstrate an advantage over knowledge acquired through unguided exploration (putz-osterloh, 1993; preußler, 1996; preußler, 1998). these studies used the cps task "linas", which contains four inputs and seven outputs interconnected by fifteen linear relations. the labels given to the system variables did not refer to objects in the real world (e.g. "bulmin", "ordal", "trimol") to control for the influence of prior knowledge.

in putz-osterloh’s (1993) study, an experimental group (given a causal diagram) and a control group were first instructed to diagnose the underlying structure of a system by exploring the system variables. the causal diagram illustrated the input and output variables as rectangles linked by arrows to indicate the relationships between them; the meaning of the diagram was verbally explained by the experimenter.
against expectations, the experimental group performed no better than the control group in a subsequent control task. however, in a follow-up study six months later, the experimental group had better control performance than the control group. given that the advantage to performance was only evident after participants had considerable exposure to the task, putz-osterloh (1993) suggested that problem solvers might need a period of practice applying their knowledge in order to benefit from structural information. however, caution is urged in interpreting these findings due to the relatively small sample size (n = 16-25 per condition).

putz-osterloh’s (1993) interpretation of her findings found further support in a series of studies conducted by preußler (1996, 1998). in the first experiment, participants in an experimental group were instructed using standardised examples as to how each input affected each output; a control group explored the system without assistance. no differences in control performance were found. in line with putz-osterloh’s (1993) study, it was argued that the structural information did not provide an advantage to control performance because participants did not have a chance to practice applying it (preußler, 1996). therefore, in a later experiment, preußler (1998) gave an experimental group a causal diagram, and in addition they completed practice tasks in which goal values had to be attained by manipulating the input variables. each task was repeated until the problem solver reached the target values. the control group had to perform the same practice tasks, although without having the diagram available and without the chance of retries until the correct response was found. this time the experimental group had better control performance (preußler, 1998).
these findings have been interpreted as demonstrating that structural knowledge needs to be either actively acquired or practiced in the context of application in order to benefit control performance (preußler, 1996, 1998; schoppek, 2004), a notion that resonates with the broader literature on "learning by doing" and cognitive skill acquisition (e.g. anderson, 1993). an alternative explanation is that the "guidance" given to participants in these studies was not sufficient to immediately promote structural knowledge (goode & beckmann, 2010). specifically, goode and beckmann (2010) argue that in putz-osterloh's (1993) study participants may not have understood how the diagram related to changing the input and output variables. in order to understand the meaning of the diagram, problem solvers may require a direct demonstration of how the inputs affect the outputs. whilst problem solvers in preußler's (1996) study did receive an explanation as to how the inputs affect the outputs, they did not receive a structural diagram. they may therefore have been unable to recall this information during the control task. goode and beckmann (2010) developed instructional material to overcome these limitations, and compared control performance under conditions of complete, partial or no structural information. they used a cps task with three inputs and three outputs, interconnected by six linear relations. as in putz-osterloh's (1993) and preußler's (1996, 1998) studies, the labels given to the system variables did not refer to objects in the real world (e.g. "a", "b", "c"). the instructional material included an audio-visual demonstration of how the inputs affect the outputs, and the formation of a causal diagram as a result of the interventions shown. this information was then available on screen during the subsequent control phase.
they found that problem solvers who received complete information were significantly better at controlling the system than those who received partial or no information. this study illustrates that structural information can have an immediate positive impact on the quality of control performance, without a period of goal-orientated practice or prior exposure to the system. the findings suggest that the effectiveness of providing structural information is more a matter of accessibility (i.e., instructional design) than of practice. another factor that appears to influence the application of structural information is the complexity of the underlying structure of the cps task. in a follow-up study to goode and beckmann (2010) using the same methodology, goode (2011) provided participants with either complete, partial or no structural information regarding the underlying structure of one of four cps tasks, which varied in system complexity. the complexity of the tasks was manipulated by increasing the number of relations that had to be processed in parallel in order to make a decision about a particular goal state (i.e., the connectivity of the goal state). the study showed that it was more difficult for subjects to understand and utilise information as system complexity increased; floor effects on performance were observed when three relations had to be considered in parallel to make a decision about a goal state. this may also explain why previous studies have found that structural information did not benefit control performance (putz-osterloh, 1993; preußler, 1996), as "linas" is at this level of complexity. süß's (1996) study employed a task with many more variables and relations. this may have made it more difficult for subjects to understand and utilise the information that they were given.
nevertheless, the issue of whether the provision of structural information results in immediate improvements in control performance after knowledge has already been acquired through an unguided exploration remains unresolved. goode and beckmann's (2010) and goode's (2011) studies did not include a comparison with a group who acquired knowledge through an unguided exploration of the system variables. the aim of the current study is to determine whether structural information can directly benefit control performance. to allow direct comparisons with control performance scores from goode and beckmann (2010), this study will use the same cps task, intervention and performance goals. in our proposed design, participants will first explore a cps task and try to independently acquire knowledge about its underlying structure. they will then try to control the system to reach specific goal values of the output variables. participants in an experimental condition will then watch an audio-visual demonstration of how the inputs affect the outputs, and will observe the formation of a causal diagram as a result of the interventions shown (as per the procedure reported in goode and beckmann, 2010, and goode, 2011). both the experimental and the control group will then control the system again. if structural information can be immediately utilised, then problem solvers who receive structural information should show an improvement in their control performance, and should be better at controlling the system than those who have to rely on the knowledge they acquired independently (see information hypothesis).
the role of fluid intelligence in benefiting from structural information a second issue addressed by this study is whether benefiting from structural information is dependent on the cognitive abilities of the problem solver. goldman (2009) has argued that learner characteristics, such as prior knowledge and cognitive ability, determine whether benefits are derived from instructional settings. the cps task employed in the current study uses abstract variable labels and a domain-neutral cover story. this aims at minimising confounding effects of individual differences in domain-specific knowledge on the results (for a detailed discussion of the argument for using abstract systems in complex problem solving research see beckmann & goode, 2014). consequently, the guidance information provided (i.e., the intervention) is expected to be relatively novel for all participants, so that individual differences in utilising it can largely be attributed to the cognitive abilities of the problem solver. previous findings show that when explicit information about system structure is provided, control performance is consistently moderately to strongly correlated with fluid intelligence (bühner, kröner, & ziegler, 2008; goode & beckmann, 2010; kröner et al., 2005; putz-osterloh, 1981; putz-osterloh & lüer, 1981; wüstenberg et al., 2012). therefore, in the current study it is predicted that under conditions where participants receive information, the extent of improvements in control performance will be a function of their fluid intelligence as measured via apm. in comparison, the extent of improvements in control performance when participants do not receive additional information should be due to practice applying their partial representations of the underlying structure, and therefore less strongly related to fluid intelligence as measured via apm (see intelligence hypothesis). 
aims and hypotheses
in summary, the main goal of this paper is to determine whether guidance in the form of structural information results in an immediate improvement in controlling a cps task after knowledge has already been acquired through an unguided exploration of the system variables. a secondary aim is to examine whether any improvements are moderated by fluid intelligence as measured via apm. firstly, it is hypothesised that participants who acquire more knowledge about the underlying structure of the system during the exploration phase should show better control performance prior to any instructional intervention (knowledge hypothesis). secondly, participants who receive structural information should improve their control performance more than those who receive no additional information (information hypothesis). thirdly, under conditions where participants receive information, the magnitude of this improvement will be a function of their fluid intelligence as measured via apm (intelligence hypothesis).

method
participants
ninety-eight first-year psychology students at the university of sydney, australia, participated for course credit. nine participants failed to complete all tasks; their data were therefore excluded from further analysis. the available sample size of about 100 would provide sufficient statistical power (1 − β ≥ .80) for identifying at least medium effects (d = 0.50) at a significance level of α ≤ .05 (one-tailed) in the planned analyses.

design
participants were randomly assigned to one of two conditions (45 participants in the information condition, 44 participants in the no information condition). as problem solvers were required to control the system on two occasions, this resulted in a 2 x 2 design. the within-subjects factor was control cycle (cycle 1 and cycle 2). the between-subjects factor was whether or not participants received structural information (information and no information).
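the power figure above can be reproduced with a short calculation. the sketch below is our own illustration, not part of the original analyses: it uses a normal approximation to the power of a one-tailed two-sample t-test (the exact noncentral-t computation gives a very similar value), with about 50 participants per condition.

```python
import math

def phi(x):
    """standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def z_crit(alpha):
    """upper critical value of the standard normal, found by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if phi(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power_two_sample(d, n_per_group, alpha=0.05):
    """approximate power of a one-tailed two-sample t-test for effect size d
    (normal approximation to the noncentral t distribution)."""
    ncp = d * math.sqrt(n_per_group / 2.0)  # noncentrality parameter
    return phi(ncp - z_crit(alpha))

# about 100 participants total, i.e. ~50 per condition, d = 0.50, one-tailed
print(round(power_two_sample(0.5, 50), 2))  # → 0.8
```

with roughly 50 participants per group the approximation lands at about .80, matching the stated power claim.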
the aim of the intervention in the information condition was to encourage participants to develop a complete and accurate representation of the underlying structure of the system. the no information condition represented a passive control group. participants were assessed on their structural knowledge, control performance for cycle 1, control performance for cycle 2, and performance in a test of fluid intelligence. vary-one-thing-at-a-time (votat) strategy use during the exploration phase was also assessed as part of this study; this measure is not reported in this paper. figure 1 displays the procedure of the experiment for each condition and indicates which performance measures were collected in each phase of the experiment.

figure 1. diagram for the procedure of the experiment, illustrating the phases of the experiment by condition and indicating which performance measures were collected in each phase.

description of cps task
the cps task was programmed using adobe flash 8 and captivate 3, and administered on pcs (see goode, 2011, for an extensive description of all of the cps task elements, including step-by-step screenshots of the instructional intervention and cps task, and a transcript of the explanation). the underlying structure was originally developed by beckmann (1994; see also beckmann & goode, 2014; goode & beckmann, 2010; goode, 2011), and is based on the approach to complex problem solving that was developed by funke (1992) in his dynamis research project.
it consists of three input and three output variables that are connected by a set of linear equations:

x_{t+1} := 1.0 ∗ x_t + 0.8 ∗ a_t + 0.8 ∗ b_t + 0.0 ∗ c_t
y_{t+1} := 0.8 ∗ y_t + 1.6 ∗ a_t + 0.0 ∗ b_t + 0.0 ∗ c_t
z_{t+1} := 1.2 ∗ z_t + 0.0 ∗ a_t + 0.0 ∗ b_t + 4.0 ∗ c_t

x_t, y_t and z_t denote the values of the output variables, and a_t, b_t and c_t the values of the input variables, during the present trial, whilst x_{t+1}, y_{t+1} and z_{t+1} denote the values of the output variables in the subsequent trial. important for the operationalisation of knowledge acquisition, the system can be considered balanced: of the 12 possible relationships between variables, 6 exist; among the three output variables, one is subject to a "positive" eigendynamic (i.e., an autoregressive dependency that results in a monotonic increase), one is subject to a negative eigendynamic (i.e., an autoregressive dependency that results in a monotonic decrease), and one is subject to no eigendynamic; and all three output variables have a double dependency. previous research has found that the presence of a semantically meaningful context has an unpredictable, often negative, effect on the acquisition of structural knowledge (beckmann, 1994; see also beckmann & goode, 2014; burns & vollmeyer, 2002; lazonder, wilhelm, & hagemans, 2008; lazonder, wilhelm, & van lieburg, 2009). therefore, in order to ensure that the system was relatively novel for all participants and thus control the potential influence of prior knowledge, the input and output variables are labelled with letters. as can be seen in figure 2, the output variables are labelled x, y and z, whilst the input variables are labelled a, b, and c. the user-interface is in a non-numerical graphical format, in order to encourage the formation of mental representations more aligned with the development of causal diagrams.
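the dynamics defined by these equations can be illustrated in a few lines of python. this is a minimal sketch of our own: the coefficients follow the equations in the text, except that the c → z weight is read here as 4.0, since that digit is garbled in the source.

```python
# minimal simulation of one trial of the three-input / three-output linear
# system; coefficients follow the text (the c -> z weight of 4.0 is our
# reading of a garbled digit in the source)
COEFFS = {
    "x": {"x": 1.0, "a": 0.8, "b": 0.8, "c": 0.0},  # no eigendynamic
    "y": {"y": 0.8, "a": 1.6, "b": 0.0, "c": 0.0},  # negative eigendynamic
    "z": {"z": 1.2, "a": 0.0, "b": 0.0, "c": 4.0},  # positive eigendynamic
}

def step(outputs, inputs):
    """advance the system by one trial; outputs and inputs are dicts."""
    state = {**outputs, **inputs}
    return {o: sum(w * state[v] for v, w in terms.items())
            for o, terms in COEFFS.items()}

# with all inputs held at zero, only the eigendynamics act:
out = step({"x": 10.0, "y": 10.0, "z": 10.0}, {"a": 0, "b": 0, "c": 0})
print(out)  # x stays at 10, y decays toward zero, z grows
```

running the step repeatedly with zero inputs makes the "balanced" character visible: x is inert, y shrinks monotonically, z grows monotonically.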
in accordance with the principles of cognitive load theory (clt), this should minimise the cognitive activities that are not directly relevant to the task (extraneous cognitive load; e.g. sweller, 1994; for a review see beckmann, 2010; sweller, 2010). figure 2 shows that the values of the input variables are represented as bars of varying heights in the boxes on the input variables, where positive values are shown above the input line and negative values are shown below. each box represents the value of the input variable on a single trial, and in total seven trials can be conducted before the values are reset (representing a single cycle). although the numerical values of the inputs are not available to participants, the inputs are varied in increments of one unit, within the range of -10 to 10. on each trial, participants have to set the value of each input variable. this is done step by step, such that after they set the value for input a, they then have to set the value for input b and finally input c, before the resulting values of the output variables are displayed as line graphs below. previous inputs and their subsequent effects remained on the screen (decision history) for each cycle of seven trials. there was no time limit. during the control phase of the task, the goals are indicated as dotted lines on the graphs for the output variables (as shown in figure 2). the target values used for control cycles 1 and 2 were comparable in difficulty, i.e., the euclidean distance (see operationalisation of control performance) between start values and target values was the same for both cycles.

dependent variables and individual differences measures
structural knowledge
participants' structural knowledge was assessed by asking them to create causal diagrams of the relationships between the input and output variables at the end of each trial during the exploration phase.
figure 2. screenshot of the system interface, as presented in the information condition after the instructional phase. the goals are indicated as dotted lines on the graphs for the output variables. the underlying structure of the system is represented on screen as a causal diagram, where the arrows represent the relationships between the variables, the positive and negative signs denote the direction of the relationship, and the letters the relative strength. in this example, for the fifth trial of seven, all input variables were increased (a at half the strength, and b and c using the maximum); as a result, outputs x and y increased whilst output z decreased slightly.

the diagram that was generated on the final exploration trial (after 2 cycles of 7 trials), i.e., before the control phase, was used to derive a structural knowledge score. using a procedure introduced by beckmann (1994), the operationalisation of the knowledge acquisition performance is based on a threshold model for signal detection (snodgrass & corwin, 1988). the proportion of correctly identified relationships was adjusted for guessing by subtracting the proportion of incorrectly identified relationships. the final score has a theoretical range from -.98 to .98, where a score below zero indicates inaccurate knowledge, whilst a score above zero indicates more accurate knowledge.

control performance
the scoring procedure used was based on beckmann's (1994) scoring system. control performance was calculated by determining the euclidean distance between the vectors of actual and optimal values of the input variables.
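the guessing correction behind the structural knowledge score can be sketched as follows. this is a simplified reading of our own: we interpret "proportion correct minus proportion incorrect" as hit rate minus false-alarm rate over the system's possible relations; the paper's exact threshold-model adjustment (which bounds the score at ±.98 rather than ±1) is not fully specified in the text, and the relation names below are purely illustrative.

```python
# simplified sketch of the guessing-corrected knowledge score: hit rate minus
# false-alarm rate over the possible relations. the exact snodgrass & corwin
# (1988) adjustment used in the paper bounds the score at -.98/.98 instead of
# -1/1; relation names here are illustrative only.
def knowledge_score(drawn, true_relations, all_relations):
    non_relations = all_relations - true_relations
    hit_rate = len(drawn & true_relations) / len(true_relations)
    false_alarm_rate = len(drawn & non_relations) / len(non_relations)
    return hit_rate - false_alarm_rate

true_rel = {"a->x", "b->x", "a->y", "c->z", "y->y", "z->z"}              # 6 existing
all_rel = true_rel | {"c->x", "b->y", "c->y", "a->z", "b->z", "x->x"}    # 12 possible
drawn = {"a->x", "a->y", "b->z"}   # a diagram with two hits, one false alarm
print(knowledge_score(drawn, true_rel, all_rel))  # 2/6 - 1/6 ≈ .17
```

a perfect diagram scores 1.0 under this sketch, an empty diagram 0, and a diagram of purely wrong links goes negative, mirroring the interpretation of the published score.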
the ideal values for each input variable, i.e., the intervention that would result in the system reaching the goal state, were calculated by using the values of the output variables on the previous trial and the goal output values to solve the set of linear equations underlying the system. as the range of possible input values is restricted for the system used in this study (i.e., between -10 and 10), it might not be possible to bring the system into the goal state with a single intervention. in cases such as these, i.e., when the ideal values fall outside this range, the values were adjusted to the nearest possible values, which then constituted the optimal values. in cases where the ideal values are within the range of possible inputs, the ideal values were used as the optimal inputs. for the system at hand, the theoretical range of this score is 0 to 34, where a lower score indicates a smaller deviation from optimal control interventions and therefore better performance.

fluid intelligence
the percentage of correct responses on an abridged version of the raven's apm (raven et al., 1998) was used as an indicator of fluid intelligence. this version of the apm included 20 items from the original 36-item test, created using the odd-numbered items plus 2 additional even-numbered ones from the most complex items (i.e. items 34 and 36).

procedure
the cps task and the apm were presented to participants on pcs, over two separate sessions. the cps task for each condition was installed on alternate computers at the study venue. on arrival at the first session, participants chose a computer, which determined their condition. the cps task began with a set of instructions that explained the user-interface, how to change the values of the input variables, and how to record and alter the causal diagram. at the end of the instructions, participants were informed that the task consists of two phases.
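the control scoring described above (ideal inputs solved from the linear equations, then clipped to the admissible range to give the optimal inputs, then compared with the actual inputs) can be sketched concretely. this is our own illustration under the same caveat as before: the c → z weight of 4.0 is an assumption where the source digit is garbled.

```python
import math

# sketch of the per-trial control performance score. input weights follow the
# system equations in the text (c -> z weight of 4.0 is our reading of a
# garbled digit); admissible input range is -10 to 10.
EIGEN = {"x": 1.0, "y": 0.8, "z": 1.2}  # autoregressive weights

def optimal_inputs(outputs, goals):
    """solve for the intervention that would reach the goals in one trial
    (the 'ideal' values), then clip to [-10, 10] (the 'optimal' values)."""
    rx = goals["x"] - EIGEN["x"] * outputs["x"]   # 0.8*a + 0.8*b = rx
    ry = goals["y"] - EIGEN["y"] * outputs["y"]   # 1.6*a         = ry
    rz = goals["z"] - EIGEN["z"] * outputs["z"]   # 4.0*c         = rz
    a = ry / 1.6
    b = rx / 0.8 - a
    c = rz / 4.0
    clip = lambda v: max(-10.0, min(10.0, v))
    return [clip(a), clip(b), clip(c)]

def control_score(actual_inputs, outputs, goals):
    """euclidean distance between actual and optimal inputs (lower = better)."""
    return math.dist(actual_inputs, optimal_inputs(outputs, goals))

# an intervention that hits the goals exactly scores 0:
print(control_score([5.0, 5.0, 1.0],
                    {"x": 0.0, "y": 0.0, "z": 0.0},
                    {"x": 8.0, "y": 8.0, "z": 4.0}))  # → 0.0
```

when a goal is unreachable in one trial (e.g. an ideal input of 25), the clipping step caps it at 10, so even the best possible intervention leaves a non-zero distance, exactly as the scoring description implies.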
firstly, they had to explore the system to discover the underlying structure of the system and then control the system to reach certain values of the output variables. the goal values were not revealed until the beginning of each control cycle. the exploration phase then began, in which participants were prompted to explore the system for two cycles of 7 trials each by changing any of the input variables and observing the effect on the output variables displayed in the graphs. at the end of each trial, participants had to record what they had learned about the system using the causal diagram construction feature that was displayed on the screen. the causal diagram could be altered using a set of twelve buttons (one for each possible relationship in the system) at the bottom of the screen. each button referred to a particular relationship in the system. using these buttons, participants could record if they thought there was a relationship between two variables or not, or if they thought the output variables changed independently (or not). they could also specify the direction of the effect, and its perceived strength. after the exploration phase, participants then had to control the system by manipulating the inputs to reach set values of the outputs for seven trials, which were indicated as dotted lines on the output graphs (control cycle 1). the causal diagrams they had constructed during the exploration phase remained on screen, providing access to the structural information they had individually extracted. in the information condition, participants then watched an instructional video that explained the actual underlying structure of the system. 
the instructions were designed in accordance with the principles of clt, with the aim of reducing the amount of cognitive activities that problem solvers would have to undertake to translate the information provided into knowledge about the system (minimising extraneous cognitive load). in particular, previous research has shown that learning is facilitated when explanations of graphical information are presented aurally, rather than as text (modality effect; tabbers, martens, & van merriënboer, 2004). therefore, the instructions consisted of a recording of seven intervention trials with an accompanying audio narration, which explained the actual underlying structure of the system. after each trial, the narrator explained how each of the outputs had changed, and how this reflected the underlying structure of the system. the respective causal diagram was constructed on screen in parallel, to record this information. participants in the no information condition did not receive any additional information during this phase, representing a passive control group. all participants then had to control the system again for seven trials, with different goals indicated on the output variables (control cycle 2). in the information condition, the causal diagram displayed onscreen was the correct and complete one. in the no information condition, the causal diagram that participants had constructed in the initial exploration cycles was displayed onscreen. in a subsequent session, approximately one week later, participants completed the apm.

data analysis
to test our main hypotheses we conducted a series of hierarchical linear modelling analyses using the hlm software package (raudenbush, bryk, cheong, & congdon, 2000).
this approach allows us to model individuals' change in performance from control cycle 1 to control cycle 2 as a function of person-level variables (see raudenbush & bryk, 2002). we used a two-level model in which performance in control cycle 1 and control cycle 2 (level 1) was clustered within individuals (level 2). the specific analyses that we performed to test each hypothesis are discussed in the results section.

results
the following sections first present preliminary analyses undertaken to test whether the random assignment to condition was effective, and to justify our treatment of the variables in the following analyses. the findings in relation to the three hypotheses are then presented. intercorrelations (pearson) between the variables used in this study, as well as descriptive statistics and distributions, are presented in table 1. the distributions of the variables indicate that assumptions of normality were met.

equivalence between the conditions prior to the intervention
to examine the effect of the intervention and fluid intelligence on control performance, it was first necessary to check whether the conditions differed prior to the intervention. the amount of structural knowledge acquired by participants during the exploration phase did not differ by condition, t(87) = -.09, p = .93, d = 0.02, nor did their control performance scores in cycle 1, t(87) = -.05, p = .96, d = 0.01, or their scores on the apm, t(87) = 1.59, p = .12, d = 0.34. this suggests that the procedure used to randomly allocate participants to the conditions was effective.

structural knowledge acquired during the exploration phase
for both conditions, the amount of structural knowledge that was acquired during the exploration phase was significantly greater than zero, m = .22, sd = .34, t(88) = 6.00, p < .01. this indicates that, on average, participants had acquired some knowledge of the underlying structure of the system prior to the first control cycle.
however, the range of structural knowledge scores, -.49 to .98, indicates that participants differed widely in the amount of knowledge that they were able to acquire about the underlying structure of the system during the initial exploration phase. that is, while some participants were able to acquire complete knowledge of the underlying structure of the system (one participant in the no information condition, and two participants in the information condition), others acquired a rather incorrect representation of the underlying structure. this also suggests that for the majority of problem solvers the provision of structural information could potentially represent a significant source of new information about the underlying structure of the system.

internal consistencies
internal consistency analyses were conducted to determine the variability in control performance scores across the trials and for different goal states, as an estimate of the reliability of the dependent variables. internal consistency was good across the first control cycle (ω = .85, 95% ci [.79, .89]) and the second control cycle (ω = .93, 95% ci [.89, .95]) (dunn, baguley, & brunsden, 2014). this indicates that problem solvers are rather consistent in their performance, and it justifies averaging the scores across each control cycle. a further analysis indicated that the reliability of the apm scores was acceptable across the 20 items (ω = .76, 95% ci [.59, .83]) (dunn et al., 2014).

knowledge hypothesis
in support of the knowledge hypothesis, across the conditions there was a significant moderate negative relationship between structural knowledge scores and control performance in cycle 1 (r = -.34, p < .01).

table 1.
descriptive statistics, distributions and inter-correlations (pearson) between the variables for each condition.

no information condition (n = 44)
                          m (sd)          min.    max.    kurtosis (se)   skewness (se)   2       3        4
1. structural knowledge   .21 (.33)       -.49    .98     -.40 (.70)      .38 (.36)       -.38*   -.51**   .45**
2. cycle 1                13.88 (4.20)    4.52    20.77   -.54 (.70)      -.58 (.36)      ...     .56**    -.24
3. cycle 2                13.21 (5.28)    2.50    25.01   -.58 (.70)      -.31 (.36)      ...     ...      -.18
4. apm                    63.75 (16.32)   30      95      -.23 (.70)      -.07 (.36)      ...     ...      ...

information condition (n = 45)
1. structural knowledge   .22 (.35)       -.33    .98     -.47 (.69)      .29 (.35)       -.31*   -.36*    .26
2. cycle 1                13.92 (4.14)    2.82    24.09   .75 (.69)       -.49 (.35)      ...     .27      -.11
3. cycle 2                10.24 (5.29)    2.10    22.33   -1.03 (.70)     .35 (.35)       ...     ...      -.52**
4. apm                    58.00 (17.75)   30      95      -.85 (.69)      .06 (.35)       ...     ...      ...

note. * p < .05. ** p < .01.

this indicates that participants who acquired more knowledge about the underlying structure of the task produced smaller deviations from the set of optimal control interventions, and were therefore better at controlling the system (i.e., reaching and maintaining the goal values). this advantage persisted in cycle 2, even when participants received additional instructions with regard to the underlying structure of the system (r information = -.36, p < .01; r no information = -.51, p < .01).

information and intelligence hypotheses
in order to determine whether the provision of structural information facilitates control performance (information hypothesis) and whether the extraction of knowledge from information in this context is determined by fluid intelligence (intelligence hypothesis), we conducted a series of two-level hlm analyses. firstly, a random coefficient regression analysis was conducted to assess whether control performance changed across the two control cycles.
at level 1, each participant's performance was represented by an intercept term, which denoted their mean performance across control cycles 1 and 2, and a slope, which represented their change in performance from control cycle 1 to 2. control cycle (1 or 2, effect-coded as -.5 and .5, respectively) was entered as an independent variable at this level. the mean control performance scores and the change in control performance then became the outcome variables in a level-2 model, in which they were modelled as random effects. the results of this analysis are presented in the top section of table 2. this analysis indicated that the mean control performance score was 12.81 across control cycles 1 and 2, and that, on average, control performance scores improved by 2.19 points from control cycle 1 to 2. the change in control performance was significantly different from zero, t(88) = -3.86, p < .001. there were significant differences between problem solvers in terms of their mean control performance scores and the change in their control performance, χ2 = 2867895259.6, df = 88, p < .001 and χ2 = 1274245149.4, df = 88, p < .001, respectively. variability in problem solvers' change in control performance from control cycle 1 to 2 accounted for 64% of the total variability in control performance scores. these findings are an important prerequisite for the subsequent analyses, as they indicate that individuals show substantial variability in their mean control performance and in the extent to which their control performance changed across the two cycles. we then conducted an intercept- and slope-as-outcomes regression analysis in which mean control performance and the change in control performance from control cycle 1 to 2 were modelled as a function of condition (an effect-coded variable: -.5 = no information, .5 = information) and scores on the apm at level 2. the level-1 model was the same as in the random coefficient regression analysis.
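the role of the -.5/.5 effect coding at level 1 can be made concrete: with only two observations per participant, the level-1 fit is exact, the intercept equals the participant's mean performance across the two cycles, and the slope equals their cycle-1 to cycle-2 change. a minimal sketch of our own (the scores below are illustrative, not data from the study):

```python
def level1_fit(cycle1, cycle2):
    """exact level-1 fit of perf = intercept + slope * code for two
    observations, with control cycle effect-coded as -.5 (cycle 1) and
    .5 (cycle 2): the intercept is the participant's mean performance,
    and the slope is the change from cycle 1 to cycle 2."""
    intercept = (cycle1 + cycle2) / 2.0   # value of the line at code 0
    slope = cycle2 - cycle1               # change per unit of the code
    return intercept, slope

# illustrative participant: 13.9 in cycle 1, 10.2 in cycle 2
intercept, slope = level1_fit(13.9, 10.2)
print(round(intercept, 2), round(slope, 2))  # → 12.05 -3.7
```

because lower scores indicate better control, the negative slope here corresponds to an improvement, which is why the reported average change of 2.19 points appears with a negative t statistic.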
the results of this analysis are presented in the middle panel of table 2. with regard to the information hypothesis, this analysis indicated that information had a significant impact on average control performance scores and on the change in control performance from control cycle 1 to 2, controlling for the effects of fluid intelligence; t(86) = -2.32, p < .05, ∆r2 = 5% and t(86) = -3.19, p < .01, ∆r2 = 9%, respectively. participants in the information condition had an average control performance score 1.91 points better than those in the no information condition. similarly, the change in control performance for participants in the information condition was 3.42 points greater than for those in the no information condition. in support of the information hypothesis, these results indicate that participants who received additional information with regard to the underlying structure of the system performed better on average, and improved at a greater rate from control cycle 1 to control cycle 2, than those who did not receive information. with regard to the intelligence hypothesis, the analysis also indicated that apm scores were significantly linked to control performance scores as well as to their change from control cycle 1 to 2, controlling for the effects of condition; t(86) = -3.21, p < .01, ∆r2 = 7% and t(86) = -2.17, p < .05, ∆r2 = 2%, respectively. on average, a one-point increase in scores on the apm was associated with a 0.08-point better score on average control performance, and a 0.07-point better score on the change in performance from control cycle 1 to 2. these results indicate that, on average, participants with higher apm scores tended to perform better overall, and improved more from control cycle 1 to 2.
in order to determine whether the effect of fluid intelligence on control performance differed by condition, a third analysis was conducted in which an interaction term (apm x condition) was added to the main effects of the variables at level 2. the results are presented in the bottom panel of table 2. there was no evidence that the effect of fluid intelligence (as measured via apm scores) on mean control performance scores varied by condition, as the interaction term was small and non-significant; t(85) = -.68, p = .50, ∆r2 = 0%. however, the effect of fluid intelligence on the change in performance from control cycle 1 to control cycle 2 did vary significantly by condition; t(85) = -2.48, p < .05, ∆r2 = 3%. in further support of the intelligence hypothesis, this suggests that the change in performance scores for participants who received information was more strongly related to fluid intelligence than for participants who did not receive information.

discussion
this study examined whether: (1) providing guidance in the form of structural information results in an immediate improvement in controlling a cps task after knowledge has already been acquired through an unguided exploration of the system variables; and (2) any improvements are moderated by fluid intelligence as measured via apm. in summary, support was found for the knowledge hypothesis, as participants who acquired more structural knowledge during the exploration phase had better control performance in control cycle 1. support was also found for the information hypothesis, as participants who received structural information improved their control performance more than those who received no information.
Finally, support was found for the intelligence hypothesis: when participants received information, the change in their control performance scores from control cycle 1 to 2 was more strongly related to APM scores than it was for participants who did not receive structural information. These results suggest that guidance in the form of structural information does confer an additional advantage in controlling a complex system over independently acquired knowledge, and that problem solvers can translate such information into effective control actions without practice. However, the extent to which problem solvers can benefit from such information appears to be moderated by their fluid intelligence as measured via APM. As in previous studies, it was found that the amount of structural knowledge acquired by participants is strongly related to the quality of their control performance. In addition, in line with other studies, few participants were able to acquire complete knowledge of the underlying structure of the system during the exploration phase (Beckmann, 1994; Beckmann & Guthke, 1995; Burns & Vollmeyer, 2002; Funke & Müller, 1988; Müller, 1993; Kröner, 2001; Kröner et al., 2005; Kluge, 2008; Osman, 2008; Schoppek, 2002; Vollmeyer et al., 1996). These findings provide further evidence that learners require additional support or guidance to acquire complete and accurate knowledge about complex and dynamic systems; they are unlikely to do so through unguided discovery learning. This study also found that guidance in the form of providing structural information resulted in an immediate improvement in control performance. In contrast to previous studies (Preußler, 1998; Putz-Osterloh, 1993; Süß, 1996), these findings suggest that a period of active practice is not required to translate knowledge into effective control actions.
One caveat to this conclusion is, however, that the tasks used in other studies could be considered more complex than the task used in the current study. Further studies are required to determine whether the findings observed in this study generalise to more complex tasks. Nevertheless, the results support and extend the findings of Goode and Beckmann (2010) in important ways. As in Goode and Beckmann's (2010) study, the results of the present study show that if problem solvers receive a direct demonstration of how each input affects each output, and have access to this information in the form of a causal diagram during control performance, then they are able to immediately translate this information into the appropriate actions for controlling the system. This provides further support for the claim that supporting information should be available throughout the task (Berry & Broadbent, 1987; Gardner & Berry, 1995; Leutner, 1993). Indeed, comparing the findings from the current study with Goode and Beckmann's (2010) study, which employed the same CPS task, the same instructional method, and participants drawn from the same university student population, suggests that an unguided, albeit "active", exploration of the system variables provides no advantage for control performance whatsoever. In Goode and Beckmann's (2010) study, participants received structural information and were then required to immediately control the system variables; mean control performance scores were 10.33 (SD = 5.25) in the comparable control cycle. In the current study, mean control performance scores were 10.24 (SD = 5.29).
This suggests that the actively acquired knowledge and practice in controlling the system variables resulted in no net advantage for control performance over simply providing structural information.

Table 2. Results of the random coefficients regression (RCR) analysis and the intercept- and slope-as-outcome regression (ISAOR) analyses

Variable                                   Parameter estimate     SE       t        ∆R²
RCR analysis
  Mean control performance (β00)                 12.81           0.43    30.10**
  Mean change in control performance (β10)       -2.19           0.57    -3.86**
ISAOR analysis 1
  Intercept-as-outcome
    Condition (β01)                              -1.91           0.82    -2.32*    5%
    APM (β02)                                    -0.08           0.02    -3.21**   7%
  Slope-as-outcome
    Condition (β11)                              -3.42           1.07    -3.19**   9%
    APM (β12)                                    -0.07           0.03    -2.17*    2%
ISAOR analysis 2
  Intercept-as-outcome
    Condition (β01)                              -1.89           0.82    -2.33*    5%
    APM (β02)                                    -0.08           0.02    -3.32**   7%
    Condition × APM (β03)                        -0.03           0.04    -0.68     0%
  Slope-as-outcome
    Condition (β11)                              -3.38           1.03    -3.29**   9%
    APM (β12)                                    -0.06           0.03    -2.37*    2%
    Condition × APM (β13)                        -0.13           0.05    -2.48*    3%

Note. * p < .05. ** p < .01. Level 1 model (for all analyses): Y_ti = π0i + π1i(control cycle), where Y_ti is person i's control performance score at time t, π0i is their mean control performance score, and π1i is their change in control performance from control cycle 1 to control cycle 2. Level 2 model for the RCR analysis: π0i = β00 + r0i and π1i = β10 + r1i. Level 2 model for ISAOR analysis 1: π0i = β00 + β01(condition) + β02(APM) + r0i and π1i = β10 + β11(condition) + β12(APM) + r1i. Level 2 model for ISAOR analysis 2: π0i = β00 + β01(condition) + β02(APM) + β03(condition × APM) + r0i and π1i = β10 + β11(condition) + β12(APM) + β13(condition × APM) + r1i. When intercepts are outcomes, ∆R² is expressed as a percentage of the variability in mean control performance scores. When slopes are outcomes, ∆R² is expressed as a percentage of the variability in the change in control performance.

The finding that participants in the no-information condition showed little improvement across the control cycles further reinforces this claim. This suggests that practice at controlling the system does not have a significant impact upon the quality of problem solvers' control performance, especially if the control goals change. Indeed, under both conditions the high level of internal consistency in control performance scores further suggests that problem solvers do not dramatically change their control behaviours through practice. Consequently, improvements in control performance with practice are rather limited. In other words, these results seem to suggest that no spontaneous optimisation of control behaviour (i.e., learning by doing) takes place. The question of whether longer periods of active practice after exposure to guiding information would lead to further improvements could be of interest for future studies. These findings are consistent with recent findings regarding CPS training. Kretzschmar and Süß (2015) trained participants using five different computer-based complex dynamic systems, and their performance was tested in a sixth system. Interacting with each system involved a goal-free exploration phase and a control phase. They found that trained participants were able to acquire more knowledge about the final system than an untrained control group. However, there was no difference in control performance. In line with the findings from our study, this suggests that for each control intervention, the problem solver must apply their knowledge to generate the correct action for that specific situation.
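As an illustration only (not part of the original analyses), the two-level model specified in the note to Table 2 can be sketched as a small simulation. The fixed coefficients below are the ISAOR analysis 1 estimates from Table 2; the sample size, condition coding, APM score distribution, and residual spreads are our assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Level 1:  y_ti = pi0_i + pi1_i * (control cycle)
# Level 2:  pi0_i = b00 + b01*condition + b02*APM + r0_i
#           pi1_i = b10 + b11*condition + b12*APM + r1_i
n = 90
condition = rng.integers(0, 2, n)    # 0 = no information, 1 = information (coding assumed)
apm = rng.normal(22.0, 5.0, n)       # APM scores; distribution assumed

b00, b01, b02 = 12.81, -1.91, -0.08  # intercept-as-outcome estimates (Table 2, ISAOR 1)
b10, b11, b12 = -2.19, -3.42, -0.07  # slope-as-outcome estimates (Table 2, ISAOR 1)

# Person-level intercepts and slopes; residual SDs are invented for the sketch.
pi0 = b00 + b01 * condition + b02 * apm + rng.normal(0.0, 2.0, n)
pi1 = b10 + b11 * condition + b12 * apm + rng.normal(0.0, 1.0, n)

# One simulated control performance score per person per control cycle (coded 0 and 1).
scores = np.stack([pi0 + pi1 * t for t in (0, 1)], axis=1)
print(scores.shape)  # (90, 2)
```

Note that the negative signs on the condition and APM coefficients reflect the scoring scheme of the original analyses, in which the information condition and higher APM scores are associated with "better" performance.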
With regard to the relationship between fluid intelligence and control performance, it should first be acknowledged that the generalisability of the results from this study may be limited by the narrow operationalisation of fluid intelligence via APM. Whilst the APM has traditionally been seen as the empirical reference point of fluid intelligence, more recent discussions (e.g., Gignac, 2015) are critical of studies that rely on this single test score. This ongoing debate should be kept in mind while reading the following interpretation of the findings. The results of this study are in line with previous studies which have shown that when structural information is provided, control performance is moderately to strongly correlated with fluid intelligence (Bühner et al., 2008; Goode & Beckmann, 2010; Kröner et al., 2005; Putz-Osterloh, 1981; Putz-Osterloh & Lüer, 1981; Wüstenberg et al., 2012). This suggests that more intellectually capable problem solvers are able to make use of structural information more effectively than individuals who are less so. This study extends these previous findings, as it was also found that fluid intelligence as measured via APM had an impact on the acquisition of structural knowledge during the exploration phase, and subsequently on controlling the system when only incomplete knowledge was available. These results suggest that intellectually more capable problem solvers are at a double advantage, in comparison to those who score lower on fluid intelligence, with regard to acquiring and utilising structural knowledge: they are able to acquire more knowledge without assistance, and they also benefit more from guidance. This implies a necessity to tailor instructions to problem solvers' intellectual capacity, an aspect often neglected in educational contexts.
In other words, and as frequently advanced by Snow (1986, 1989; Snow & Lohman, 1989; Snow & Yallow, 1982), individual differences among learners still "...present a pervasive and profound problem to educators" (Snow, 1989, p. 1029). The results with regard to the role of fluid intelligence also support the claim that in previous studies (Preußler, 1996; Putz-Osterloh & Lüer, 1981; Putz-Osterloh, 1993), the effect of structural information on control performance may have been masked by individual differences in the ability to understand and utilise the information. In addition, Preußler's (1998) finding that all of her participants were able to effectively utilise information after a period of active practice may now be interpreted in a different light: it may be that practice per se is not the essential component, but rather that some problem solvers require more extensive guidance in order to make sense of the information that is provided. Overall, our results imply that guidance in the form of structural information has the potential to provide benefits over and above the effects of discovery learning. The crucial aspect of guidance, however, is that it is well designed. These findings are in line with those from other domains showing that learners experience many difficulties when they are required to acquire knowledge independently, without guidance (de Jong & van Joolingen, 1998; Mayer, 2004; de Jong, 2005, 2006; Kirschner et al., 2006).

Acknowledgements: This manuscript is based on a chapter from a doctoral thesis (Study 1; Goode, 2011). The research was undertaken at the University of Sydney and funded through an Australian Postgraduate Award. This research was also partly supported under the Australian Research Council's Linkage Project funding scheme (project LP0669552).
Declaration of conflicting interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Handling editor: Andreas Fischer

Author contributions: The authors contributed equally to this work.

Supplementary material: Supplementary material available online.

Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Citation: Goode, N. & Beckmann, J. F. (2016). With a little help...: On the role of guidance in the acquisition and utilisation of knowledge in the control of complex, dynamic systems. Journal of Dynamic Decision Making, 2, 5. doi:10.11588/jddm.2016.1.33346

Received: 19 September 2016. Accepted: 01 January 2017. Published: 26 February 2017.

References

Alfieri, L., Brooks, P. J., Aldrich, N. J., & Tenenbaum, H. R. (2011). Does discovery-based instruction enhance learning? Journal of Educational Psychology, 103(1), 1–18. doi:10.1037/a0021017
Anderson, J. R. (1993). Rules of the mind. Hillsdale, NJ: Erlbaum. doi:10.4324/9781315806938
Beckmann, J. F. (1994). Lernen und komplexes Problemlösen: Ein Beitrag zur Konstruktvalidierung von Lerntests [Learning and complex problem solving: A contribution to the construct validation of tests of learning potential]. Bonn, Germany: Holos.
Beckmann, J. F. & Goode, N. (2014). The benefit of being naïve and knowing it: The unfavourable impact of perceived context familiarity on learning in complex problem solving tasks. Instructional Science, 42(2), 271–290. doi:10.1007/s11251-013-9280-7
Beckmann, J. F. & Guthke, J. (1995). Complex problem solving, intelligence and learning ability. In P. A. Frensch & J. Funke (Eds.),
Complex problem solving: The European perspective (pp. 177–200). Hillsdale, NJ: Erlbaum. doi:10.4324/9781315806723
Beckmann, J. F. (2010). Taming a beast of burden – On some issues with the conceptualisation and operationalisation of cognitive load. Learning and Instruction, 20, 250–264. doi:10.1016/j.learninstruc.2009.02.024
Berry, D. C. & Broadbent, D. E. (1984). On the relationship between task performance and associated verbalizable knowledge. The Quarterly Journal of Experimental Psychology Section A, 36(2), 209–231. doi:10.1080/14640748408402156
Berry, D. C. & Broadbent, D. E. (1987). Explanation and verbalization in a computer-assisted search task. The Quarterly Journal of Experimental Psychology Section A, 39(4), 585–609. doi:10.1080/14640748708401804
Blech, C. & Funke, J. (2005). Dynamis review: An overview about applications of the Dynamis approach in cognitive psychology. Heidelberg University: Unpublished manuscript. Retrieved from https://www.psychologie.uni-heidelberg.de/ae/allg_en/forschun/dynamis/dynamis_review_08-2005.pdf
Bühner, M., Kröner, S., & Ziegler, M. (2008). Working memory, visual-spatial intelligence and their relationship to problem-solving. Intelligence, 36, 672–680. doi:10.1016/j.intell.2008.03.008
Burns, B. & Vollmeyer, R. (2002). Goal specificity effects on hypothesis testing in problem solving. The Quarterly Journal of Experimental Psychology Section A, 55(1), 241–261. doi:10.1080/02724980143000262
de Jong, T. & van Joolingen, W. R. (1997). An extended dual search space model of scientific discovery learning. Instructional Science, 25, 307–346. doi:10.1023/A:1002993406499
de Jong, T. & van Joolingen, W. R. (1998). Scientific discovery learning with computer simulations of conceptual domains. Review of Educational Research, 68, 179–202. doi:10.2307/1170753
de Jong, T., Linn, M. C., & Zacharia, Z. C. (2013). Physical and virtual laboratories in science and engineering education. Science, 340, 305–308. doi:10.1126/science.1230579
de Jong, T.
(2005). The guided discovery principle in multimedia learning. In R. E. Mayer (Ed.), Cambridge handbook of multimedia learning (pp. 215–229). Cambridge, UK: Cambridge University Press. doi:10.1017/CBO9780511816819
de Jong, T. (2006). Scaffolds for computer simulation-based scientific discovery learning. In J. Elen & R. E. Clark (Eds.), Handling complexity in learning environments: Theory and research (Advances in Learning and Instruction) (pp. 107–128). Oxford, UK; Boston: Elsevier.
Danner, D., Hagemann, D., Holt, D. V., Hager, M., Schankin, A., Wüstenberg, S., & Funke, J. (2011). Measuring performance in dynamic decision making: Reliability and validity of the Tailorshop simulation. Journal of Individual Differences, 32, 225–233. doi:10.1027/1614-0001/a000055
Dörner, D. (1980). On the difficulties people have in dealing with complexity. Simulation and Games, 11, 87–106. doi:10.1177/104687818001100108
Dunn, T. J., Baguley, T., & Brunsden, V. (2014). From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation. British Journal of Psychology, 105(3), 399–412. doi:10.1111/bjop.12046
Funke, J. & Müller, H. (1988). Eingreifen und Prognostizieren als Determinanten von Systemidentifikation und Systemsteuerung [Intervention and prediction as determinants of system identification and system control]. Sprache & Kognition, 7, 176–186.
Funke, J. (1992). Dealing with dynamic systems: Research strategy, diagnostic approach and experimental results. German Journal of Psychology, 16(1), 24–43. Retrieved from http://cogprints.org/3004/1/funke_1992_gjp.pdf
Funke, J. (2001). Dynamic systems as tools for analysing human judgment. Thinking and Reasoning, 7, 69–89. doi:10.1080/13546780042000046
Gardner, P. H. & Berry, D. C. (1995). The effect of different forms of advice on the control of a simulated complex system. Applied Cognitive Psychology, 9(7). doi:10.1002/acp.2350090706
Gignac, G. E. (2015).
Raven's is not a pure measure of general intelligence: Implications for g factor theory and the brief measurement of g. Intelligence, 52, 71–79. doi:10.1016/j.intell.2015.07.006
Gobert, J. D. & Clement, J. J. (1999). Effects of student-generated diagrams versus student-generated summaries on conceptual understanding of causal and dynamic knowledge in plate tectonics. Journal of Research in Science Teaching, 36(1), 39–53. doi:10.1002/(SICI)1098-2736(199901)36:1<39::AID-TEA4>3.0.CO;2-I
Goldman, S. R. (2009). Explorations of relationships among learners, tasks, and learning. Learning and Instruction, 19, 451–454. doi:10.1016/j.learninstruc.2009.02.006
Goode, N. & Beckmann, J. F. (2010). You need to know: There is a causal relationship between structural knowledge and control performance in complex problem solving tasks. Intelligence, 38(3), 345–352. doi:10.1016/j.intell.2010.01.001
Goode, N. (2011). Determinants of the control of dynamic systems: The role of structural knowledge (Doctoral thesis).
Sydney, Australia: University of Sydney. Retrieved from http://hdl.handle.net/2123/8967
Hulshof, C. D. & de Jong, T. (2006). Using just-in-time information to support scientific discovery learning in a computer-based simulation. Interactive Learning Environments, 14(1), 79–94. doi:10.1080/10494820600769171
Kirschner, P. A., Sweller, J., & Clark, R. E. (2006).
Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching. Educational Psychologist, 41(2), 75–86. doi:10.1207/s15326985ep4102_1
Kluge, A. (2008). Performance assessments with microworlds and their difficulty. Applied Psychological Measurement, 32(2), 156–180. doi:10.1177/0146621607300015
Kotovsky, K., Hayes, J. R., & Simon, H. A. (1985). Why are some problems hard? Evidence from Tower of Hanoi. Cognitive Psychology, 17, 248–294. doi:10.1016/0010-0285(85)90009-X
Kröner, S. (2001). Intelligenzdiagnostik per Computersimulation [Intelligence assessment via computer simulation]. Münster: Waxmann.
Kröner, S., Plass, J. L., & Leutner, D. (2005). Intelligence assessment with computer simulations. Intelligence, 33, 347–368. doi:10.1016/j.intell.2005.03.002
Kretzschmar, A., & Süß, H.-M. (2015). A study on the training of complex problem solving competence. Journal of Dynamic Decision Making, 1, 4. doi:10.11588/jddm.2015.1.15455
Lazonder, A. W., Wilhelm, P., & Hagemans, M. G. (2008). The influence of domain knowledge on strategy use during simulation-based inquiry learning. Learning and Instruction, 18, 580–592. doi:10.1016/j.learninstruc.2007.12.001
Lazonder, A. W., Wilhelm, P., & van Lieburg, E. (2009). Unravelling the influence of domain knowledge during simulation-based inquiry learning. Instructional Science, 37, 437–451. doi:10.1007/s11251-008-9055-8
Lazonder, A. W. & Harmsen, R. (2016). Meta-analysis of inquiry-based learning: Effects of guidance. Review of Educational Research, 86, 681–718. doi:10.3102/0034654315627366
Leutner, D. (1993). Guided discovery learning with computer-based simulation games: Effects of adaptive and non-adaptive instructional support. Learning and Instruction, 3, 113–132. doi:10.1016/0959-4752(93)90011-N
Mayer, R. E. (2004). Should there be a three-strikes rule against pure discovery learning? American Psychologist, 59, 14–19.
doi:10.1037/0003-066X.59.1.14
Müller, H. (1993). Komplexes Problemlösen: Reliabilität und Wissen [Complex problem solving: Reliability and knowledge]. Bonn: Holos.
Osman, M. (2008). Observation can be as effective as action in problem solving. Cognitive Science: A Multidisciplinary Journal, 32(1), 162–183. doi:10.1080/03640210701703683
Osman, M. (2010). Controlling uncertainty: A review of human behavior in complex dynamic environments. Psychological Bulletin, 136(1), 65–86. doi:10.1037/a0017815
Preußler, W. (1996). Zur Rolle expliziten und impliziten Wissens bei der Steuerung dynamischer Systeme [On the role of explicit and implicit knowledge in controlling dynamic systems]. Zeitschrift für Experimentelle Psychologie, 43, 399–434.
Preußler, W. (1998). Strukturwissen als Voraussetzung für die Steuerung komplexer dynamischer Systeme [Structural knowledge as a precondition of controlling complex dynamic systems]. Zeitschrift für Experimentelle Psychologie, 45, 218–240.
Putz-Osterloh, W. (1981). Über die Beziehung zwischen Testintelligenz und Problemlöseerfolg [The relation between test intelligence and problem solving success]. Zeitschrift für Psychologie mit Zeitschrift für Angewandte Psychologie, 189(1), 79–100.
Putz-Osterloh, W. (1993). Strategies for knowledge acquisition and transfer of knowledge in dynamic tasks. In G. Strube & K. Wender (Eds.), The cognitive psychology of knowledge (pp. 331–350). Amsterdam, Netherlands: North-Holland/Elsevier Science Publishers. doi:10.1016/S0166-4115(08)X6106-0
Putz-Osterloh, W. & Lüer, G. (1981). Über die Vorhersagbarkeit komplexer Problemlöseleistungen durch Ergebnisse in einem Intelligenztest [On the predictability of complex problem solving performance by intelligence test scores]. Zeitschrift für Experimentelle und Angewandte Psychologie, 28, 309–334.
Raudenbush, S. W. & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.
Raudenbush, S. W., Bryk, A.
S., Cheong, Y., & Congdon, R. T. (2000). HLM 6: Hierarchical linear and nonlinear modeling. Chicago, IL: Scientific Software International.
Schoppek, W. (2002). Examples, rules and strategies in the control of dynamic systems. Cognitive Science Quarterly, 2, 63–92. Retrieved from http://www.psychologie.uni-bayreuth.de/de/documents/exrulstratscho.pdf
Schoppek, W. (2004). Teaching structural knowledge in the control of dynamic systems: Direction of causality makes a difference. In K. D. Forbus, D. Gentner, & T. Regier (Eds.), Proceedings of the 26th Annual Conference of the Cognitive Science Society (pp. 1219–1224). Mahwah, NJ: Erlbaum.
Snodgrass, J. G. & Corwin, J. (1988). Pragmatics of measuring recognition memory: Applications to dementia and amnesia. Journal of Experimental Psychology: General, 117(1), 34. doi:10.1037/0096-3445.117.1.34
Sweller, J. (1994). Cognitive load theory, learning difficulty, and instructional design. Learning and Instruction, 4, 295–312. doi:10.1016/0959-4752(94)90003-5
Sweller, J. (2010). Element interactivity and intrinsic, extraneous, and germane cognitive load. Educational Psychology Review, 22, 123–138. doi:10.1007/s10648-010-9128-5
Raven, J., Raven, J. C., & Court, J. H. (1998). Manual for Raven's Progressive Matrices and Vocabulary Scales. Section 4: The Advanced Progressive Matrices. San Antonio, TX: Harcourt Assessment.
Snow, R. E. (1986). Individual differences and the design of educational programs. American Psychologist, 41, 1029–1039. doi:10.1037/0003-066X.41.10.1029
Snow, R. E. (1989). Toward assessment of cognitive and conative structures in learning. Educational Researcher, 18(9), 8–14. doi:10.2307/1176713
Snow, R. E. & Lohman, D. F. (1989). Implications of cognitive psychology for educational measurement. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 263–331). New York: Macmillan.
Snow, R. E. & Yallow, E. (1982). Education and intelligence. In R. J. Sternberg (Ed.), Handbook of human intelligence (pp. 493–586).
London, UK: Cambridge University Press.
Süß, H.-M. (1996). Intelligenz, Wissen und Problemlösen: Kognitive Voraussetzungen für erfolgreiches Handeln bei computersimulierten Problemen [Intelligence, knowledge, and problem solving: Cognitive prerequisites of successful performance in computer-simulated problems]. Lehr- und Forschungstexte Psychologie. Göttingen, Germany: Hogrefe.
Tabbers, H. K., Martens, R. L., & van Merriënboer, J. J. G. (2004). Multimedia instructions and cognitive load theory: Effects of modality and cueing. British Journal of Educational Psychology, 74, 71–81. doi:10.1348/000709904322848824
Vollmeyer, R., Burns, B. D., & Holyoak, K. (1996).
The impact of goal specificity on strategy use and the acquisition of problem structure. Cognitive Science, 20, 75–100. doi:10.1016/S0364-0213(99)80003-2
Wüstenberg, S., Greiff, S., & Funke, J. (2012). Complex problem solving: More than reasoning? Intelligence, 40, 147–168.

Original Research

The impact of moral motives on economic decision-making in relationally different situations

Katharina G. Kugler, Julia A. M. Reif, Gesa-Kristina Petersen, and Felix C. Brodbeck
Department of Psychology, Ludwig-Maximilians-Universität München, Munich, Germany

To explore how "salient others" influence economic decisions, we tested the impact of moral motives on economic decision-making in three relationally different situations: (a) anonymous social one-shot interactions, where individuals should draw on situational cues to infer information about how to interpret their relationship to a salient other, due to the absence of other sources of social information; (b) non-anonymous social situations within an ongoing interaction, in which the moral motive established in the relationship should override situational cues about moral motives; and (c) anonymous non-social one-shot interactions, in which moral motives should not have an effect, given the absence of a salient other. In an experiment (N = 94 participants), we varied these relationally different decision situations and the moral motive framing (unity vs. proportionality). As hypothesized, the two moral motive framings influenced decision behavior, but only in the anonymous social one-shot interaction.
By replicating the finding that moral motives matter in economic decision-making, and by showing that people infer information about morally acceptable behavior from moral cues provided by the situation in anonymous social situations, and from prior interactions in the case of an ongoing relationship, we offer a moral-psychological explanation for why individuals decide differently in economic decision situations depending on the relationality of the situation.

Keywords: economic decisions, moral motives, tacit coordination, decision design

Consider a call to donate money for a crowdfunding initiative. Would you donate money if the founders of the initiative appealed to your solidarity? Would you donate if the founders suggested that you would receive something back in proportion to what you give? It probably depends on whether you believe the initiative deserves your solidarity, or whether you believe that giving and receiving should be proportional. Regardless of how you would decide, solidarity or proportionality considerations triggered by cues in the call for donations would influence your decision. Such considerations are inherently social and are rooted in the moral motives you apply to the relationship between you and the founders of the initiative (Fiske, 1992; Rai & Fiske, 2011). While economic decisions often seem rationally calculable, they are influenced by moral motives as soon as they become social in nature, that is, when other people are involved in, affected by, or influenced by the decision. As situations in general provide opportunities or affordances to express individual preferences (Kelley et al., 2003), social situations provide the context for relationship regulation. In such social contexts, people's decision-making processes, including the way they think, reason, and ultimately decide, vary as a function of how they relate to "the other" person(s) involved in a given situation (Larrick, 2016; Reis, 2008).
This regulation of relationships is inherently tied to corresponding moral motives, which determine the morally required response in a situation (Fiske, 1992; Rai & Fiske, 2011). In our paper, we add to the body of research addressing the question of how salient others influence economic decisions. We address this question from the perspective of moral psychology. More specifically, we draw on relationship regulation theory (Rai & Fiske, 2011) and its predecessor, relational models theory (Fiske, 1992). The theory proposes four fundamental moral motives (unity, hierarchy, equality, and proportionality) that are used to regulate relationships and thereby influence individuals' thinking, feeling, and behavior in social situations (Fiske, 1992; Rai & Fiske, 2011). These four relational models guide all other-related behavior, even when making economic decisions, as shown by Brodbeck et al. (2013) in a series of experiments. With our paper, we replicated Brodbeck et al.'s (2013) work by showing that different moral motives lead to different levels of solidarity shown in economic decision situations. However, we also extended Brodbeck et al.'s (2013) work in the following ways: To show that the effect of moral motives on decision-making behavior is indeed limited to social interactions, we added a new non-social interaction situation in which we expected no effect of moral framing. In contrast to Brodbeck et al. (2013), our non-social situation involved a non-human partner, while Brodbeck et al.'s (2013) non-social condition was an "interaction" with oneself. Moreover, by adding a situation with an ongoing relationship (i.e., prior interaction), we went beyond Brodbeck et al. (2013) to show which kind of information individuals used to infer morally acceptable behavior.

Corresponding author: Julia Reif, Ludwig-Maximilians-Universität München, Department of Psychology, Economic and Organisational Psychology, Leopoldstraße 13, 80802 München, Germany.
E-mail: julia.reif@psy.lmu.de

10.11588/jddm.2021.1.77559 | JDDM | 2021 | Volume 7 | Article 2
Kugler et al.: The impact of moral motives

The influence of social contexts on economic decision-making

Standard economic theory, which was long guided by the assumption of rational, self-interested agents, has now also recognized the influence of social contexts on economic decision-making (Bolton & Ockenfels, 2000; Fehr & Schmidt, 1999, 2006). For example, scholars have noted that people not only focus on their own outcomes but consider others' outcomes as well when evaluating choices (Fiddick, Cummins, Janicki, Lee, & Erlich, 2013) and are motivated in doing so by other-regarding preferences (Halali, Kogut, & Ritov, 2017). In response to findings in experimental economics challenging the "legitimacy of the 'rational agent' model as a descriptive model of human behavior" (Fiddick et al., 2013, p. 319), normative theories incorporating other-regarding preferences and decision heuristics such as altruism, others' well-being, fairness and reciprocity, the equal division rule (Allison & Messick, 1990), or noblesse oblige (Fiddick et al., 2013) have been proposed (Bolton & Ockenfels, 2000; Fehr & Schmidt, 1999; Levine, 1998; Rabin, 1993; see also Fiddick et al., 2013). Rather than extending the economic research on other-regarding preferences and decision heuristics, we applied a psychological theory of moral motives and relationship regulation to understand other-regarding behavior in economic decisions (Fiske, 1992; Rai & Fiske, 2011). In our opinion, the advantage of relationship regulation theory is that it offers a comprehensive, unifying framework for other-regarding behavior and can thus be used to predict and explain moral motives in any type of social interaction.
Moral motives as mechanisms for relationship regulation in social situations

Moral motives represent moral obligations, or motivational forces to pursue acceptable behavior in relationships. As such, moral motives are mechanisms for relationship regulation and are thus inherently social. Relationship regulation theory (Rai & Fiske, 2011, building on its predecessor, relational models theory, Fiske, 1992; see also Rai, 2020) proposes that in social situations – and only in social situations – individuals universally apply four – and only four – distinct moral motives for relationship regulation (see also Brodbeck et al., 2013): unity, hierarchy, equality, and proportionality. Unity serves as motivation to look after in-group members by avoiding threats and providing aid and protection when needed, out of a sense of collective responsibility. Decisions are made by consensus, and goods are divided according to needs. Hierarchy serves as motivation to respect rank, where deference and obedience towards superiors are exchanged for leadership and guidance as well as protection of subordinates. Decisions are made by the authority, and goods are divided depending on status. Equality serves as moral motivation for balanced, in-kind reciprocity and equal treatment in the sense of "scratch my back and I will scratch yours" as well as "eye-for-an-eye forms of revenge" (Rai & Fiske, 2011, p. 63). Decisions are made by majority, with everyone having the same vote, and goods are divided equally. Proportionality serves as motivation for calculations and behavior based on ratios, making judgments according to a utilitarian calculus of costs and benefits. Decisions are made by following market principles, and goods are divided in proportion to contributions.
In asocial situations, or in situations with a null relationship, interactions are not coordinated with reference to a specific relational model or moral motive, which leads to moral indifference (see Fiske, 1992; Rai & Fiske, 2011). In general, humans use the four moral motives to develop, coordinate, evaluate, and sustain social relationships. Specific moral motives dominate specific relationships, and specific individuals prefer to regulate relationships with a specific moral motive (Forsyth, 1995; Haslam, 2004). However, all humans use and "understand" all four moral motives. In short, the four moral motives apply to all humans in all cultures. Note, however, that the way moral motives are expressed varies cross-culturally (Fiske, 1992; Rai & Fiske, 2011). In any specific social situation, an individual's behavior can be attributed to one of the four moral motives. Thus, any economic decision that involves others can be attributed to one of the four moral motives as well (individuals' behaviors might vary across situations).

Cues eliciting moral motives in social situations

Moral motives guide (economic decision-making) behavior in social situations. But from where do individuals infer information about "morally correct" behavior? In an ongoing relationship within a context that provides culturally formed prescriptions about acceptable behavior, norms are established and guide individuals' behaviors. At work, for example, managers and subordinates usually establish a relationship of hierarchy in which the managers' instructions are followed. Or when two friends always take turns paying for a round of drinks, they establish and express the moral motive of equality. Within the boundaries of such ongoing interactions, people recognize rules and prohibitions and develop consensus about acceptable moral motives to be applied in interactions (Rai & Fiske, 2011).
Established relationships, being relatively stable, should thus be the most salient cue guiding behavior, whereas specific situational influences should be less influential. However, economic decisions are often made in social contexts where individuals do not share a common history and cultural norms might not exist. Coordination in such situations – where explicit communication is also often limited or impossible – can be termed tacit coordination (e.g., Abele, Stasser, & Chartier, 2014; De Kwaadsteniet & Van Dijk, 2010; Van Dijk, De Kwaadsteniet, & De Cremer, 2009). In the absence of other information, people base their assumptions about the relationship on the most salient cues (cf. De Kwaadsteniet & Van Dijk, 2010). In anonymous one-shot social interactions, that is, in situations in which people who cannot identify each other interact only once, moral motives cannot stem from the relationship between individuals, and individuals share neither a history nor a potential future. In such situations, individuals should be particularly susceptible to externally provided cues regarding moral considerations. Cues such as situational framings should evoke moral motives, leading to tacit coordination and indicating "what to do".

Moral motives eliciting different economic decisions

The four moral motives unity, hierarchy, equality, and proportionality differently direct people's actions in social situations (Rai & Fiske, 2011), including economic decision-making in social situations. According to Brodbeck et al. (2013), unity moral motives should lead to higher levels of solidarity than proportionality moral motives, as unity moral motives serve as motivation to look after in-group members, while proportionality moral motives serve as motivation to calibrate costs and benefits (Rai & Fiske, 2011).
The authors contrasted these two moral motives (unity versus proportionality) because their effects were expected to be particularly large in the context of their paradigm. In a series of studies, Brodbeck et al. (2013) showed that situational framings, and even subtle subliminal primes, in economic decision games involving an anonymous other did indeed evoke moral motives, such as solidarity or proportionality, that distinctively guided individuals' decision behavior: While a unity frame induced more solidarity behavior, with more money being saved for an anonymous other even though this behavior decreased the decision-maker's expected total utility, a proportionality frame made participants consider costs and benefits, with less money being saved for the anonymous other, increasing the decision-maker's expected total utility.

Relationally different situations

Within our goal of exploring how salient others influence economic decisions, we focused on subtle situational cues providing information about the relationship to the salient other. To explore the effectiveness of these subtle situational cues, we contrasted (a) an anonymous social one-shot interaction with two other conditions: (b) a non-anonymous social situation with an ongoing interaction, and (c) an anonymous non-social one-shot interaction. Decision-making was thus dynamic, as we investigated interlinked social and non-social interactions in differently framed contexts with short-term and ongoing relationships and changing moral motives, implying short-term and long-term considerations on the part of decision-makers. Thereby, we built on research by Brodbeck et al. (2013), using their paradigm, the dyadic solidarity game (DSG), to replicate and extend their findings on the influence of situational cues regarding moral motives on social economic decision-making.

The dyadic solidarity game

In the DSG, two individuals make an economic decision in a one-shot interaction.
Even though the decision is made by each individual independently, the revenue depends on a probabilistic risk as well as on the other person's decision: Two participants are matched to form one dyad. Both participants in the dyad have EUR 10 at their disposal. Participants can freely divide the EUR 10 into two amounts, amount A and amount B (without knowing how the other participant decides). Then a die is rolled: If the die lands on a 1, 2, 3, or 4 (i.e., probability of 2/3), the participants receive their own amount A, and amount B is not disbursed. However, if the die lands on a 5 or 6 (i.e., probability of 1/3), each participant receives the amount B of the other person in the dyad, and amount A is not disbursed. In other words, the participants in a dyad can choose to put money aside for each other, which is disbursed in the case of a loss, that is, when the die lands on a 5 or 6. The DSG is a static game in which players choose their actions simultaneously and interdependently. The actual profit in the game depends on one's own and the other's decisions as well as on the roll of the die. However, participants are not able to influence the other person's decision and can only actively influence their profit through their own decision. Under the assumptions of rationality and von Neumann–Morgenstern utility (von Neumann & Morgenstern, 1953; Schoemaker, 1982), the only variable influencing utility is the payoff. In the DSG, the expected payoff is maximized when each person in the dyad chooses the maximum amount for themselves (i.e., amount A = EUR 10) and contributes nothing to the other (amount B = EUR 0). Thus, this distribution, amount A = EUR 10 and amount B = EUR 0, represents maximum cost-benefit considerations. Cost-benefit considerations decrease as participants allocate less money to amount A and more to amount B.
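The payoff structure just described can be made concrete in a few lines. The following is our illustrative reconstruction, not the authors' experimental code; the function name is ours.

```python
# Illustrative sketch of the DSG payoff structure (our reconstruction).
# Each player splits EUR 10 into amount A (kept if the die lands on 1-4)
# and amount B (paid to the partner if the die lands on 5-6).

P_KEEP = 2 / 3   # die shows 1-4: each player receives their own amount A
P_SWAP = 1 / 3   # die shows 5-6: each player receives the partner's amount B

def expected_payoff(own_a: float, partner_b: float) -> float:
    """Expected payoff for one player in the DSG."""
    return P_KEEP * own_a + P_SWAP * partner_b

# A player's own amount B never enters their own expected payoff, so a
# pure payoff maximizer chooses A = 10, B = 0 regardless of the partner:
print(expected_payoff(own_a=10.0, partner_b=0.0))  # ≈ 6.67
print(expected_payoff(own_a=5.0, partner_b=5.0))   # ≈ 5.00
```

The sketch makes explicit why amount A = EUR 10 and amount B = EUR 0 maximizes one's own expected payoff: one's own amount B only ever affects the partner's payoff.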
Conversely, solidarity is shown when individuals decide to put money aside for the other, at the cost of potentially receiving a lower payoff, in order to prevent the other person from receiving nothing if the die lands on a 5 or 6. The lower amount A and the higher amount B, the higher the solidarity.

Research overview and hypotheses

Building on our theoretical argumentation, we assumed that whenever an economic decision involves others, which makes the situation a social situation, individuals' behavior is shaped by the moral motives that are salient and assumed to be acceptable in the given situation; non-social situations, i.e., situations that do not involve other persons, should not be susceptible to moral motives. Moreover, we assumed that when individuals interact, that is, when a situation is social, they infer an acceptable moral motive from the most salient cues. In anonymous social one-shot interactions, individuals use situational cues, for example, content-related information from situational framing, such as written descriptions, to determine an acceptable moral motive. In non-anonymous social ongoing interactions, cues from the interaction should be more salient than the situational framing, and moral motives should mainly depend on the quality of prior interaction with one's counterpart rather than on the situational framing of moral motives. Building on research by Brodbeck et al. (2013) and applying their experimental approach, we also assumed that in social situations, a unity moral motive leads to more solidarity behavior than a proportionality moral motive, while a proportionality moral motive leads to more cost-benefit analysis than a unity moral motive. We conducted a laboratory experiment in which participants engaged in the dyadic solidarity game (DSG; Brodbeck et al., 2013).
The experiment had a 3 × 2 design with which we intended to replicate and extend the series of studies conducted by Brodbeck et al. (2013). The first independent variable was the decision situation. We created three relationally different situations: (a) anonymous social one-shot interactions, where two anonymous individuals interacted in the DSG; (b) non-anonymous social ongoing interactions, where two individuals had a short personal interaction before engaging in the DSG; and (c) anonymous non-social one-shot interactions, where one individual interacted with a computer "deciding" on the basis of a fixed algorithm. The second independent variable was the framing of the situation with respect to a moral motive. Analogous to Brodbeck et al. (2013), we compared unity moral motives and proportionality moral motives (Rai & Fiske, 2011). Based on our theoretical argumentation and experimental design, we tested the following hypotheses:

Hypothesis 1a: In anonymous social one-shot interactions, the situational moral motive framing (unity vs. proportionality) influences participants' decisions in the DSG: Participants receiving a unity moral frame show more solidarity than participants receiving a proportionality moral frame.

Hypothesis 1b: In non-anonymous social ongoing interactions, the situational moral motive framing (unity vs. proportionality) has no effect on participants' decisions in the DSG.

Hypothesis 1c: In anonymous non-social one-shot interactions (i.e., interacting with a computer, which is not social), the situational moral motive framing (unity vs. proportionality) has no effect on participants' decisions in the DSG.¹ (Given that computers are not social, social cues regarding moral motives should not matter.)
Taking Hypotheses 1a–c together, we expected an interaction effect between the decision situation and the situational moral motive framing:

Hypothesis 1d: There is an interaction effect between the decision situation (anonymous social one-shot vs. non-anonymous social ongoing vs. anonymous non-social one-shot) and the situational moral motive framing (unity vs. proportionality). The moral motive frame influences the economic decision in the DSG in anonymous social one-shot interactions (high solidarity in the unity condition, low solidarity in the proportionality condition) but not in non-anonymous social ongoing interactions or in anonymous non-social one-shot interactions.

Whereas the hypotheses specify a level of solidarity in the anonymous social one-shot interaction, that is, high solidarity in the unity condition and low solidarity in the proportionality condition, they do not specify a certain level of solidarity in the other two conditions (i.e., non-anonymous social ongoing interaction, anonymous non-social one-shot interaction). We will now close this gap. First, in the non-anonymous social ongoing interaction, participants engage in a cooperative task with another person prior to the DSG. Due to the cooperative nature of the task, participants were expected to form an in-group (Billig & Tajfel, 1973) and strengthen their team spirit (Deutsch, 1973). They further had to touch each other by shaking hands and holding a pen together.² According to Fiske (1992), the feeling of belonging to an in-group and touching should elicit a unity moral motive. We assumed that the unity motive established in the cooperative task would be maintained for the subsequent interaction in the DSG. Second, in the anonymous non-social one-shot interaction, no social moral motive should apply. We assumed that in this situation, individuals solely draw on rational cost-benefit analysis to make their decisions and decide as predicted by subjective expected utility theories.
In the DSG, this implies that individuals show no or little solidarity. Combining these considerations and the hypotheses above, we hypothesize:

Hypothesis 2: While participants in non-anonymous social ongoing interactions and participants in anonymous social one-shot interactions with a unity frame show high levels of solidarity, participants in anonymous non-social one-shot interactions and participants in anonymous social one-shot interactions with a proportionality frame show low levels of solidarity.

Experimental conditions and hypotheses are summarized in Figure 1.

Figure 1. Experimental conditions and hypotheses. IV = independent variable; H = hypothesis.

Method

Design

In order to test our hypotheses, we employed a 3 (decision situation: anonymous social one-shot interaction vs. non-anonymous social ongoing interaction vs. anonymous non-social one-shot interaction) × 2 (situational moral motive frame: unity vs. proportionality) between-subjects experimental design. The experiment was conducted in a laboratory of a large German university. Each session, with up to 24 individuals, was randomly assigned to one of the three experimental situations.

¹ Please note that Brodbeck and colleagues (2013) also investigated non-social situations: Participants interacted with themselves in the self-insurance game (SIG). However, interacting with oneself implies non-interdependency and the absence of risk due to either another person (as in the DSG) or a computer (as in the non-social DSG, which we employed).
² Data were collected before the COVID-19 pandemic.
Within each session, we randomly assigned 50% of participants to a proportionality moral motive framing and 50% to a unity moral motive framing.

Participants

An a priori power analysis with G*Power (Faul et al., 2007) for an ANOVA with six cells, a targeted power of .80, an alpha level of .05, and an estimated medium-to-large effect (f = .30, df = 1) of moral motive framing (cf. the effects found by Brodbeck et al., 2013) resulted in a total sample of 90 persons and thus a cell size of at least 15 participants. For the main effect of decision situation (f = .30, df = 2), a total sample of 111 persons was estimated. Thus, we recruited 112 students from a large German university. Of the 112 initial participants, we excluded 18 individuals who indicated in an open-ended question at the end of the study that they had not understood the experimental game (i.e., the DSG), even though we had provided a detailed explanation of the game followed by an opportunity to ask comprehension questions. The exclusion of participants was determined by two blind coders who were familiar with the DSG. The coders rated all qualitative responses. The inter-rater reliability was Cohen's κ = .80; discrepancies were discussed and could be resolved in all cases. Thus, n = 94 participants remained in the sample. These participants varied in sex (59% women) and age (M = 24.16 years, SD = 4.78 years). On average, participants earned EUR 10.74 (SD = 2.58) for their participation. The payoff included a EUR 4 show-up fee plus the individual's profit from the DSG (Brodbeck et al., 2013).

Material

Dependent variable: level of solidarity in the DSG. All participants engaged in the DSG (Brodbeck et al., 2013). We measured the level of solidarity in the DSG as the amount B participants chose in the DSG. Amount B varied on a continuum from "high cost-benefit considerations and low solidarity" to "low cost-benefit considerations and high solidarity".
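The a priori power analysis reported in the Participants section was done with G*Power; it can be roughly cross-checked with a textbook normal-approximation formula for a two-group comparison. This is our sketch using only the Python standard library, not the authors' calculation.

```python
# Rough cross-check of the G*Power sample-size estimate for the framing
# contrast (our sketch, normal approximation for a two-sample t-test).
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate sample size per group: n = 2 * ((z_{1-a/2} + z_power) / d)^2."""
    z = NormalDist().inv_cdf
    return 2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2

# For two groups, Cohen's f = .30 corresponds to d = 2f = .60; the
# approximation lands close to the reported total of 90.
total_n = 2 * n_per_group(d=0.60)
print(round(total_n))  # ≈ 87
```

The small gap to G*Power's N = 90 reflects the normal approximation (and G*Power's rounding up to whole participants per cell).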
For ease of reading, we refer to low and high solidarity, which simultaneously implies high and low cost-benefit considerations, respectively.

Independent variable 1: decision situation. The DSG was played in three different decision situations using the computer program z-Tree 3.3.11 (Fischbacher, 2007):

• In the anonymous social one-shot interaction, participants played the DSG with one person who sat in the same room. Between 12 and 24 participants engaged in the DSG simultaneously and were randomly matched by the experimental computer. However, participants did not know who the other person was.

• In the non-anonymous social ongoing interaction, participants were seated next to their game partners. First, they greeted each other and then engaged jointly and silently in the following cooperative task (Antons, 1992): Participants drew three pictures (a house, a tree, and a dog) together by simultaneously holding the same pen. Having drawn the three pictures, participants signed their drawings. Thus, participants had a common goal and had to touch each other, two mechanisms that should establish a unity moral motive for the relationship. Then, participants played the DSG on the computer in a cubicle, knowing that the other player was the person next to them whom they had just met and with whom they had completed the cooperative task.

• In the non-social one-shot interaction, the other player was the computer. Participants were told this in the introduction to the game. Thus, participants had the option to put money aside for the computer they "played" with. Participants were further told that the computer randomly divided "its" EUR 10 into amount A and amount B.

Independent variable 2: moral motive framing. The entire experiment was framed either as a unity situation or as a proportionality situation.
Depending on the moral motive frame, participants were told either that the experiment was about "common welfare in groups or in society" (i.e., unity moral motive frame) or that the experiment was about "cost-benefit optimization in free markets" (i.e., proportionality moral motive frame). The frames were developed and published by Brodbeck et al. (2013) and are available there in full length.

Control variables. After the DSG, participants answered a short questionnaire. The questionnaire included a short version of the Positive and Negative Affect Schedule (PANAS; Thompson, 2007). In addition, the questionnaire included an open-ended question about the game and the participants' decision. We used the latter question to exclude participants who had not understood the game (see "Participants").

Procedure

Each session was conducted by the same experimenter. At the beginning of each session, participants were greeted by the experimenter, who explained the experimental procedure and the tasks. Then, participants read a general introduction to the experiment, which included either a unity frame or a proportionality frame (i.e., independent variable 2). Following this introduction, participants engaged in the DSG (Brodbeck et al., 2013) in one of the three decision situations: anonymous social one-shot interaction, non-anonymous social ongoing interaction, or anonymous non-social one-shot interaction (i.e., independent variable 1). At the end of the session, participants answered a questionnaire including demographic questions and control variables.

Results

Preliminary analyses

Because previous research has shown that sex influences economic behavior (e.g., Ortmann & Tichy, 1999; Van den Assem, Van Dolder, & Thaler, 2012; Whitaker, Bokemeiner, & Loveridge, 2013), we tested whether sex had an effect in our data as well.
Preliminary analyses showed that participants' sex had no significant effect on the level of solidarity (i.e., amount B), t(92) = 0.70, p = .488, d = 0.15. t-tests further confirmed that the frame (proportionality vs. unity) evoked neither positive affect, t(92) = 1.73, p = .088, d = 0.36, nor negative affect, t(92) = -1.55, p = .125, d = 0.32, which could have influenced participants' decisions. In the non-anonymous social ongoing interaction condition, we ruled out that the sex constellation of the dyad had an effect. Note that in the anonymous social one-shot interaction, the sex of the other person remained unknown. Sex constellation had no significant effect on negative affect, F(1, 22) = 4.11, p = .055, η² = 0.16, positive affect, F(1, 22) = 1.82, p = .191, η² = 0.08, or the level of solidarity, F(1, 22) = 0.21, p = .649, η² = 0.01.

Table 1. Descriptive results by decision situation and moral motive framing. Means (M) represent the level of solidarity (amount B in the DSG, in euros). Cohen's d quantifies the difference between the unity and the proportionality moral motive framing.

                  Anonymous social        Non-anonymous social    Anonymous non-social
                  one-shot interaction    ongoing interaction     one-shot interaction
Framing           n     M      SD         n     M      SD         n     M      SD
Total             37    2.20   1.78       28    3.75   1.35       29    0.48   1.12
Unity             18    3.06   1.59       16    3.75   1.34       10    0.60   1.35
Proportionality   19    1.40   1.60       12    3.75   1.42       19    0.42   1.02
Cohen's d               1.04                    0.00                    0.15

Test of hypotheses

Main effects of moral motives in the three relationally different decision situations. To test Hypotheses 1a, 1b, and 1c, we calculated the main effects of moral motive framing (unity vs. proportionality) in the three decision situations (anonymous social one-shot interaction vs. non-anonymous social ongoing interaction vs.
anonymous non-social one-shot interaction). All results are summarized in Table 1. Participants in an anonymous social one-shot interaction showed significantly different levels of solidarity (i.e., amount B) depending on their moral motive framing, t(35) = -3.16, p = .003, d = 1.04, with participants in the proportionality frame (M = 1.40, SD = 1.60) showing less solidarity than participants in the unity frame (M = 3.06, SD = 1.59). Thus, Hypothesis 1a was supported. In non-anonymous social ongoing interactions, the level of solidarity did not vary depending on the moral motive framing, t(26) = 0.00, p > .999, d = 0.00 (proportionality: M = 3.75, SD = 1.42; unity: M = 3.75, SD = 1.34). Because non-significance does not confirm equivalence, further analyses were conducted using the procedure suggested by Rogers, Howard, and Vessey (1993). Equivalence was tested against an assumed large effect size of d = 0.80 (Cohen, 1988), based on the results reported by Brodbeck and colleagues (2013). A large effect (d = 0.80) translated into a difference of EUR 1.10 in amount B, which did not fall within the 90% CI [-0.86, 0.86]. Hence, for non-anonymous social ongoing interactions, the levels of solidarity under the unity frame and the proportionality frame were equivalent, supporting Hypothesis 1b. In anonymous non-social one-shot interactions, participants in the two framing conditions (unity vs. proportionality) showed no significant difference in their level of solidarity, that is, amount B, t(27) = -0.40, p = .691, d = 0.15 (proportionality: M = 0.42, SD = 1.02; unity: M = 0.60, SD = 1.35). Again, further analyses testing the equivalence of the two framings were conducted. Following the same assumptions and procedure (Rogers et al., 1993) as for the non-anonymous social ongoing interactions, the assumption of equivalence was supported.
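The equivalence procedure just applied (Rogers, Howard, & Vessey, 1993) boils down to checking whether the smallest difference of interest (here, a large effect of d = 0.80 converted into euros) lies outside the 90% CI of the observed group difference. A minimal sketch of that decision rule (our illustration; the CI bounds and margins are the reported values):

```python
# Minimal sketch (our illustration) of the CI-based equivalence check
# used in the analyses reported here.

def equivalent(ci_low: float, ci_high: float, margin: float) -> bool:
    """Equivalence is supported when neither +margin nor -margin falls
    inside the confidence interval of the observed difference."""
    return not (ci_low <= margin <= ci_high) and not (ci_low <= -margin <= ci_high)

# Non-anonymous social ongoing interaction: margin EUR 1.10, 90% CI [-0.86, 0.86]
print(equivalent(-0.86, 0.86, margin=1.10))  # True: equivalence supported
# Anonymous non-social one-shot interaction: margin EUR 0.96, 90% CI [-0.55, 0.91]
print(equivalent(-0.55, 0.91, margin=0.96))  # True: equivalence supported
```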
based on the sample’s sd, a large difference would amount to eur 0.96, which was outside the 90% ci [-0.55, 0.91]. hence, hypothesis 1c regarding the equivalence of the two framing groups (unity vs. proportionality) in anonymous non-social one-shot interactions was supported. interaction effect between decision situation and situational moral motive framing. to test hypothesis 1d, we conducted a 2 x 3 between-subjects anova, which revealed a significant interaction, f(2, 88) = 3.46, p = .036, h2 = 0.07. the pattern of the interaction supported hypothesis 1d: the moral motive frame (unity vs. proportionality) influenced the level of solidarity in the economic decision only in the anonymous social one-shot interaction but not in the non-anonymous social ongoing interaction or the anonymous non-social one-shot interaction. the results are summarized in figure 2. differences in levels of solidarity. hypothesis 2 predicted that participants in non-anonymous social ongoing interactions, regardless of the moral motive framing, and participants in anonymous social oneshot interactions with a unity frame would show high levels of solidarity, while participants in anonymous non-social one-shot interactions, regardless of the moral motive framing, and participants in anonymous social one-shot interactions with a proportionality frame would show low levels of solidarity. the descriptive results visualized in figure 2 support the predicted pattern. to statistically test this result, we combined the anonymous social one-shot interaction with unity framing and non-anonymous social ongoing interaction with both framings conditions into a “high solidarity” condition and the anonymous social one-shot interaction with proportionality framing and anonymous non-social one-shot interaction with both framings into a “low solidarity” condition. 
Supporting Hypothesis 2, the high solidarity condition and the low solidarity condition differed significantly in the predicted direction, t(92) = -8.92, p < .001, d = 1.81 (high solidarity: M = 3.48, SD = 1.47; low solidarity: M = 0.84, SD = 1.39).

Exploratory analyses

Proportionality versus self-interest. The outcome in the DSG paradigm (i.e., amount A and amount B) is suited to differentiating between unity and proportionality moral motives. However, it is not equally suited to ruling out all alternative explanations for the behavior shown. Most apparently, low levels of solidarity, or rather high levels of cost-benefit considerations, could also stem from pure self-interest rather than proportionality moral motives. Presumably, both individuals acting out of self-interest and individuals with proportionality moral motives consider costs and benefits and conclude that their payoff is maximized by showing no solidarity. From a theoretical point of view, the distinction between self-interest and the proportionality moral motive is of interest because self-interest is explicitly not a defining or necessary feature of proportionality. Unlike self-interest, a proportionality moral motive is relational and other-regarding (Fiske, 1992; see also Brodbeck et al., 2013). To provide some evidence that the effects of our proportionality framing differ from self-interest (which should indeed govern behavior when interacting with a computer), we compared amount B in the anonymous social one-shot interaction with proportionality framing and amount B in the anonymous non-social one-shot interaction (across both framings). The two amounts differed significantly: Amount B in the anonymous social one-shot interaction with proportionality framing (M = 1.40, SD = 1.60) was significantly higher than in the anonymous non-social one-shot interaction across both framings (M = 0.48, SD = 1.12), t(46) = 2.35, p = .023, d = 0.69.
post hoc, we assume that besides cost-benefit considerations, participants with a proportionality moral motive might have considered eur 1.40 on average (plus the show-up fee) the minimum payoff a participant should receive in proportion to the effort of participating in the study (i.e., “at least a cappuccino on their way home”).

figure 2. interaction effect of decision situation and moral motive framing on solidarity. brodbeck et al.’s (2013) results (self-insurance game) are included as a reference value.

10.11588/jddm.2021.1.77559 jddm | 2021 | volume 7 | article 2 | 10 https://doi.org/10.11588/jddm.2021.1.77559

the ‘golden rule’. the golden rule tells us to treat others as we would like to be treated. in our experiment, we demonstrated that participants with a unity motive showed high levels of solidarity towards the other person in the dsg. therefore, one might ask whether individuals with a unity moral motive applied the golden rule and showed solidarity to the extent they would want to be treated themselves. the question’s theoretical foundation refers to the fact that a unity motive entails that everyone should be treated the same – oneself and all others (rai & fiske, 2011; fiske, 1992). to answer the question, we combined our findings with those of brodbeck et al. (2013), who included a condition in which individuals played the dsg with themselves (i.e., the self-insurance game, sig). in the sig, moral motives likewise did not matter given the absence of a relationship to someone else. instead, participants’ economic decisions in the sig provide an answer to how participants treated themselves in the specific economic situation, that is, how much “solidarity” they showed towards themselves.
thus, we tested the equivalence of the following two groups: (a) the level of solidarity of participants with a unity motive engaging in the dsg in our experiment, that is, the average level of solidarity among participants in the anonymous social one-shot interaction with a unity frame and participants in the non-anonymous social ongoing interaction and (b) the amount people put aside for themselves in the sig in the experiment by brodbeck et al. (2013). the descriptive results are visualized in figure 2. conducting the test of equivalence described above (rogers et al., 1993), we found that participants with a unity frame in our experiment put aside the same amount of money for their partner in the dsg as participants put aside for themselves in the sig conducted by brodbeck et al. (2013). assuming a large effect (cohen’s d = .80) based on our and brodbeck et al.’s (2013) findings, the respective difference of 1.31 eur was outside the 90% ci [-0.62, 0.51]. discussion our study sheds light on the question of how “salient others” in social situations influence economic decisions. based on the theory of relationship regulation (rai & fiske, 2011; fiske, 1992), we proposed that in social situations – and only in social situations – moral motives influence behaviors, including economic decision-making. in our study, we showed that, on the one hand, moral motives had no effect in non-social situations, that is, when individuals interacted with a computer in the dsg. 
conversely, in anonymous social situations, different moral motives led to different economic decisions: (a) the moral motive of unity, underlying relationships with in-group members in which everything is shared according to needs, led to more solidarity and weaker cost-benefit considerations towards one’s partner in the dsg (and the application of the “golden rule”); (b) the moral motive of proportionality, underlying relationships that function on the basis of market principles, where costs, benefits, gains, and contributions are divided proportionally, led to less solidarity and stronger cost-benefit considerations towards one’s partner in the dsg. differentiating post-hoc between a proportionality moral motive (an other-regarding motive) and self-interest (a self-regarding motive), we could show that participants in anonymous social situations with a proportionality frame donated more to the other than people in the non-social situation (which should be guided by self-interest). showing solidarity in our paradigm (i.e., the dyadic solidarity game, dsg; brodbeck et al., 2013) clearly deviated from what rationality, von neumann–morgenstern utility (von neumann & morgenstern, 1953; schoemaker, 1982) and pure self-interest would predict. however, this post-hoc exploration is preliminary and requires further investigation. in addition to showing that moral motives matter in economic decisions involving others, we also shed light on how specific moral motives are activated. in anonymous social one-shot interactions, individuals were susceptible to cues provided by the framing of the situation. individuals inferred the appropriate moral motive for the relationship in the dsg from the most salient cue in the situation (cf. de kwaadsteniet & van dijk, 2010).
thus, the cues tacitly coordinated the participants’ decision behavior (e.g., abele, stasser, & chartier, 2014). in non-anonymous social ongoing interactions, such situational cues were ineffective. instead, individuals “applied” the moral motive in the dsg that had been established in a previous interaction. in our experiment, we established a unity motive, which subsequently led to high levels of solidarity in the dsg and to an application of the “golden rule” in economic decisions. from an evolutionary perspective, this contingency between situational cues and respective moral motives can be explained by systematic relationships between information and behavior. behavior per se cannot be interpreted without considering the informational context upon which it is contingent. in this vein, relationship regulation based on moral motives could be interpreted as “evolved neural architectures [which] are specifications of richly contingent systems for generating responses to informational inputs” (tooby & cosmides, 2005, p. 13). in sum, the findings show that individuals deviate from rational decisions in social situations as a result of moral motives underlying their relationship to the other person. moreover, individuals’ actual economic decisions can be predicted by the specific moral motive active in the specific interaction. whereas in ongoing relationships, the moral motive stems from the relational history, in anonymous one-shot interactions, individuals’ decision-making can be influenced by situational frames or peripheral cues. this finding is especially interesting given that individuals nowadays often interact anonymously and only once when making economic decisions. nowak (2006) refers to such asymmetric kinds of cooperative behaviors as indirect reciprocity. from an evolutionary perspective, helping strangers or donating money helps to establish a good reputation, which then will be rewarded by others in the long run. according to rai and fiske (2011, p. 
59) “our sense of morality functions to facilitate the generation and maintenance of long-term social cooperative relationships with others”. from this evolutionary perspective, cooperation and the evolution of morality go hand in hand and are mutually conditional. our results replicate and extend previous work by brodbeck et al. (2013), who also showed that moral motives affect economic decisions when interacting with an anonymous person in the dsg. first, we extend their research by including an anonymous non-social interaction. in doing so, we showed that moral motives only matter in social, not in non-social situations. note that brodbeck et al. (2013) compared anonymous social one-shot interactions in the dsg to a structurally equivalent “interaction” with oneself in the self-insurance game. second, we extended their research by including a non-anonymous social ongoing interaction. in doing so, we shed light on what individuals base their choice of the appropriate moral motive on and when individuals are especially susceptible to situational cues. our findings also provide an alternative explanation for the identifiability effect. in social decision situations, this effect refers to the fact that “willingness to share or give resources to another person is often greater when the recipient is identified rather than anonymous” (halali, kogut, & ritov, 2017, p. 474; see also small & loewenstein, 2003). the identifiability effect is often explained with reference to emotions or “ethical motivations” (halali et al., 2017, p. 481) that are evoked by an identifiable counterpart and subsequently influence decision-making (e.g., kogut & ritov, 2005, 2015). according to relationship regulation theory (rai & fiske, 2011), these ‘ethical motivations’ could reflect a unity motive activated by the identifiability of the other, which then regulates social behavior.
this assumption should be tested in future research contrasting ongoing, identifiable relationships in which unity motives are prevalent with ongoing, identifiable relationships in which other motives, such as proportionality motives, are prevalent. economic theories and individual preferences economic theory suggests that when making economic decisions, people are “strongly motivated by other-regarding preferences” (halali, kogut, & ritov, 2017, p. 473). individual other-regarding preferences are susceptible to “slight changes in the social context” (fehr & hoff, 2011, p. 7) within which an interaction takes place and can be influenced by culture, situational framings, anchors, or priming of individuals’ identities. for example, the economic environment determines “the preference type that is decisive for the prevailing behavior in equilibrium”: either the fair type or the selfish type (fehr & schmidt, 1999, p. 2). below, we discuss our findings in the light of the economic framing literature and a theory that explicitly models fairness as a decision rationale, the theory of reciprocity (falk & fischbacher, 2006). the economic framing literature explains how changes in the experimental context affect behavior in the short run; for example, individuals contribute more in a one-shot prisoner’s dilemma if it is called a “community game” than if it is called a “wall street game” (fehr & schmidt, 1999, p. 27; liberman, samuels, & ross, 2004). we could easily explain the framing effect in our anonymous social one-shot interaction based on the framing literature. however, the framing literature cannot explain the differential effects of moral motive framings in the anonymous social versus anonymous non-social interactions.
to explain those differential effects, a theory is needed that highlights the distinct nature of social interactions, which the framing literature does not do, but relationship regulation theory does (rai & fiske, 2011), as does the theory of reciprocity (falk & fischbacher, 2006). the theory of reciprocity (falk & fischbacher, 2006) explains why people behave differently when interacting with real persons compared to “interacting” with random devices, drawing on the intentionality of real people’s behavior and the non-intentionality of random devices’ “actions”. a random mechanism does not “signal any intentions” (falk & fischbacher, 2006, p. 304) which could be intentionally reciprocated. as such, the theory of reciprocity offers an alternative explanation for why participants in our study did not show solidarity towards an algorithm in the non-social situation but did show solidarity towards other humans in the social situations. moreover, the theory of reciprocity proposes that when acting in a competitive market, people will accept unfair distributions because they know that “in a competitive market [they have] no chance to achieve a ‘fair’ outcome” (falk & fischbacher, 2006, p. 307). by contrast, in cooperative games such as public goods games, people contribute more the more they expect the others to contribute. moreover, in bilateral interactions, the theory of reciprocity predicts outcomes tending to be ‘fair’. as such, the theory of reciprocity seems to offer alternative explanations for the effects of our unity and proportionality frames as well as for the higher levels of solidarity in situations with prior bilateral interaction. however, the theory of reciprocity defines reciprocity as a “behavioral response to perceived kindness or unkindness” (falk & fischbacher, 2006, p. 294), and accordingly builds on multi-shot games.
thus, the theory of reciprocity cannot explain the origin of non-selfish behavior in one-shot games, as we have demonstrated and theoretically explained in our study. instead, the theory of reciprocity assumes that “unconditional cooperation is practically inexistent” (falk & fischbacher, 2006, p. 308), an assumption which we can reject based on relationship regulation theory (rai & fiske, 2011). taken together, neither the framing literature nor the theory of reciprocity can fully explain the behavioral patterns we identified in our experiment. thus, economic theory and research on individual preferences (e.g., falk & fischbacher, 2006; fehr & schmidt, 1999) offer alternative explanations for some of our results but not for the overall pattern. we therefore encourage the scientific discourse on economic decision-making to more systematically take theories of moral motives into account and to further explore the nature of moral motives as a component of utility functions. although the discourse on individual preferences and their susceptibility to situational changes has been going on for several decades, more discussion and theoretical integration between economics and psychology would be desirable. limitations and future research the generalizability of our results, which stemmed from a sample of 94 people with two out of six cells falling below a cell size of 15 persons, might be questionable. however, we were able to replicate brodbeck et al.’s (2013) findings, which underlines the robustness of the effect of moral motive frames on economic decision-making behavior. moreover, our propositions and hypotheses were derived from and embedded within a strong theoretical rationale, relationship regulation theory (rai & fiske, 2011), which makes our results comparable to other empirical findings in the field of moral psychology and strengthens our results’ interpretability and theoretical relevance.
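as a rough illustration of the sensitivity issue raised by cells of fewer than 15 persons: the power of a two-sided two-sample t-test can be approximated from the noncentral t distribution. this is a sketch under textbook assumptions (equal group sizes, α = .05), not a re-analysis of the study; the function `t_test_power` is ours.

```python
import numpy as np
from scipy import stats

def t_test_power(d, n1, n2, alpha=0.05):
    """power of a two-sided two-sample t-test for effect size d,
    computed from the noncentral t distribution."""
    df = n1 + n2 - 2
    ncp = d * np.sqrt(n1 * n2 / (n1 + n2))   # noncentrality parameter
    tcrit = stats.t.ppf(1 - alpha / 2, df)
    # probability that |t| exceeds the critical value under the alternative
    return (1 - stats.nct.cdf(tcrit, df, ncp)) + stats.nct.cdf(-tcrit, df, ncp)

# power for a large effect (cohen's d = .80) with 15 persons per group
power = t_test_power(d=0.8, n1=15, n2=15)
```

by cohen’s (1988) benchmarks this yields a power of roughly .56, which illustrates why small cells limit the conclusions that can be drawn from non-significant contrasts.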
regarding our sample, many drop-outs occurred in the anonymous non-social one-shot interaction with unity framing, that is, when solidarity towards a non-social entity was suggested by the situation, and in the non-anonymous social ongoing interaction with proportionality framing, that is, when low solidarity towards a person with whom a prior cooperative relationship had been built was suggested. in these cases, the experimentally induced moral motive might have contradicted human intuition and thus have caused confusion for some participants. thus, drop-outs might have been confounded with our experimental conditions. these systematic drop-out effects might be a subject for future research investigating the power of human moral intuition and the consequences of intuition–situation incontingencies when interacting with technical or artificial devices. from a broader perspective, insights into such experimental “errors” could also point to general misconceptions of human thinking and reasoning, which can also lead to misconceptions when creating artificial general intelligence (cf. deutsch, 2019). in our study design, we included a non-anonymous social ongoing interaction to test the assumption that moral motives formed by a prior interaction were stronger than a situational framing provided in the experimental task. we instructed participants to complete a cooperative task, through which they were expected to establish a unity moral motive for the relationship. future research should also include an experimental condition with a non-anonymous social ongoing interaction based on proportionality motives, in order to probe whether the overriding effect supported by our data also holds for ongoing relationships based on proportionality motives.
in our research rationale, we did not check for the effectiveness of our manipulations, because a manipulation check would have meant checking for the presence of moral motives as mental states and testing the mediating function of these mental states between the induced motive and the subsequent behavior. however, a test of this mediation effect was not the focus of this study, in which we focused on proving that moral motives were (or were not) present in relationally different types of situations. we inferred the existence of moral motives from different behavioral reactions, which were also identified in prior research (see brodbeck et al., 2013). nevertheless, future research could be dedicated to examining the role of mental states as mediators in morally loaded social situations. regarding the experimental setting, the manipulation of decision situations was randomized at the experimental session level. thus, the effects of the decision situation on the level of solidarity might have been confounded with potential session effects. however, we tried to keep the experimental setting constant (the same experimenter conducted all sessions) and controlled for effects of positive affect and negative affect, which might have been triggered within specific sessions. with our paradigm, we demonstrated the moral basis of our experimentally induced motives by showing that the respective motives were active in social situations but not in non-social situations. however, with our paradigm, we cannot entirely rule out that our proportionality framing was confounded with pure self-interest or egoistic behavior. although self-interest is not a distinctive, defining, or necessary feature of proportionality, both can be linked and may co-occur (fiske, 1992).
future research should contrast situations in which the moral motive of proportionality leads to decisions that would also be triggered by egoism with situations in which both lead to distinguishable decision patterns. economists have begun to recognize the influence of other-regarding preferences, norms, or decision heuristics on individuals’ economic decisions, which cause individuals to deviate from self-interest as the primary source of motivated behavior. these other-regarding preferences, norms, or decision heuristics show parallels to the four moral motives suggested by relationship regulation theory, which is not surprising, because the four moral motives claim universal validity. future research should investigate whether other-regarding preferences, norms, or decision heuristics as investigated in economic studies can be traced back to the moral motives suggested by relationship regulation theory. for example, noblesse oblige, “a social norm that obligates those of higher rank to be honorable and generous in their dealings with those of lower rank” (fiddick et al., 2013, p. 320), might be an expression of hierarchy; altruism, a “form of unconditional kindness” (fehr & schmidt, 2006, p. 619), might be an expression of unity; reciprocity, which is characterized by being “willing to incur costs with the expectation of immediate or future benefits” (fiddick et al., 2013, p. 319), and the equal division rule (“whatever is being allocated should be divided equally among the participants”; allison & messick, 1990, p. 195) might be expressions of equality; rational cost-benefit calculations might co-occur with proportionality. this endeavor might help to bridge the gap between economics and psychology and advance interdisciplinary theorizing. we also want to offer a normative, ethical perspective on our experimental design and results. two major views have dominated philosophical approaches to morality: utilitarianism (or consequentialism) and deontology.
a utilitarian or consequentialist ethic assumes that the rightness of an action can be determined by its consequences (holyoak & powell, 2016). to “bring about the greatest good for the greatest number” (bartels et al., 2015, p. 488) exemplifies such a utilitarian logic. by contrast, a deontological approach assumes that “the right does not necessarily maximize the good” (holyoak & powell, 2016, p. 1180) and that acts are wrong if they violate rules or obligations (bartels et al., 2015). from this perspective, our proportionality framing might have provided the ground for a utilitarian interpretation. as the expected utility for both persons in the dsg was maximized when each person chose a maximum for themselves, the “greatest good for the greatest number” was reached by contributing nothing to the other. by contrast, our unity framing might have provided the ground for a deontological interpretation, as deontic rules might be “driven by concern for the well-being of others” (holyoak & powell, 2016, p. 1181), which is also at the center of a unity moral motive. future research should disentangle (or reconcile) normative ethical theories, moral principles of relationship regulation, and economic theories based on “expected utility”. practical implications our results suggest that how people interact in anonymous social settings, such as online settings, is influenced by the moral motive framing provided in the setting itself. a growing body of research in finance examines textual influences on investor behavior in large-sample real-world data sets (for a review, see loughran & mcdonald, 2016). we demonstrated a possible mechanism for why textual characteristics influence investors in an anonymous social situation: textual characteristics might serve as a frame shaping investors’ moral motives and behavior. moreover, we examined economic decision situations in which participants had had a short prior interaction.
such short personal interactions can also be found in interactive online tools. live chats and helplines, for example, support people in search of information while opening an online broker account, deciding on a new energy provider, or buying new electric appliances. these short interactions can be powerful sources of moral motives, overruling moral motives provided by a situation’s framing. our results also shed light on moral behavior in non-social situations: people were not receptive to moral cues in a non-social situation. this result might be interesting for the design of interaction situations with non-human devices, such as autonomous vehicles, robo-counselling, or smart home systems. declaration of conflicting interests: the authors declare no conflicts of interest. author contributions: all authors contributed substantially to the conception and design of the study and to the acquisition, analysis, and interpretation of data for the study; they also contributed to drafting the study or revising it critically for important intellectual content. katharina g. kugler and julia a. m. reif contributed equally to the development of the manuscript. author note: gesa-kristina petersen is now at wayfair. julia reif was funded by a postdoctoral research fellowship of the “bayerische gleichstellungsfoerderung” while writing the manuscript. this research was conducted at the “munich experimental laboratory for economic and social sciences” of the ludwig-maximilians-universität münchen, munich, germany, funded by the german research foundation (dfg) in the context of the excellence initiative “lmuexcellent”. data was collected within the context of a diploma thesis completed by gesa-kristina petersen at the ludwig-maximilians-universität münchen.
we thank keri hartman for proofreading our manuscript and the reviewers and the editor for their helpful comments. handling editor: wolfgang schoppek copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license. citation: kugler, k. g., reif, j. a. m., petersen, g.-k., & brodbeck, f. c. (2021). the impact of moral motives on economic decision-making in relationally different situations. journal of dynamic decision making, 7, 4–16. doi:10.11588/jddm.2021.1.77559 received: 09.12.2020 accepted: 15.07.2021 published: 25.11.2021 references abele, s., stasser, g., & chartier, c. (2014). use of social knowledge in tacit coordination: social focal points. organizational behavior and human decision processes, 123, 23–33. http://dx.doi.org/10.1016/j.obhdp.2013.10.005 allison, s. t., & messick, d. m. (1990). social decision heuristics in the use of shared resources. journal of behavioral decision making, 3, 195–204. http://dx.doi.org/10.1002/bdm.3960030304 antons, k. (1992). praxis der gruppendynamik: übungen und techniken. göttingen: hogrefe. bartels, d. m., bauman, c. w., cushman, f. a., pizarro, d. a., & mcgraw, a. p. (2015). moral judgment and decision making. in g. keren & g. wu (eds.), the wiley blackwell handbook of judgment and decision making (pp. 478–515). john wiley & sons. brodbeck, f. c., kugler, k. g., reif, j. a. m., & maier, m. a. (2013). morals matter in economic games. plos one, 8, 1–19. http://dx.doi.org/10.1371/journal.pone.0081558 billig, m., & tajfel, h. (1973). social categorization and similarity in intergroup behaviour. european journal of social psychology, 3, 27–52. http://dx.doi.org/10.1002/ejsp.2420030103 bolton, g. e., & ockenfels, a. (2000). erc: a theory of equity, reciprocity, and competition. american economic review, 90, 166–193. http://dx.doi.org/10.1257/aer.90.1.166 cohen, j. (1988). statistical power analysis for the behavioral sciences (2nd ed.). hillsdale, nj: erlbaum.
de kwaadsteniet, e. w., & van dijk, e. (2010). social status as a cue for tacit coordination. journal of experimental social psychology, 46, 515–524. http://dx.doi.org/10.1016/j.jesp.2010.01.005 deutsch, d. (2019). beyond reward and punishment. in j. brockman (ed.), possible minds: twenty-five ways of looking at ai (pp. 113–124). penguin press. deutsch, m. (1973). the resolution of conflict. new haven, ct: yale university press. falk, a., & fischbacher, u. (2006). a theory of reciprocity. games and economic behavior, 54, 293–315. http://dx.doi.org/10.1016/j.geb.2005.03.001 faul, f., erdfelder, e., lang, a.-g., & buchner, a. (2007). g*power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. behavior research methods, 39, 175–191. https://doi.org/10.3758/bf03193146 fehr, e., & hoff, k. (2011). introduction: tastes, castes and culture: the influence of society on preferences. the economic journal, 121, 396–412. http://dx.doi.org/10.1111/j.1468-0297.2011.02478.x fehr, e., & schmidt, k. m. (1999). a theory of fairness, competition, and cooperation. the quarterly journal of economics, 114, 817–868. http://dx.doi.org/10.1162/003355399556151 fehr, e., & schmidt, k. m. (2006). the economics of fairness, reciprocity and altruism – experimental evidence and new theories. in s. ch. kolm & j. m. ythier (eds.), handbook of the economics of giving, altruism and reciprocity, volume 1 (pp. 615–691). amsterdam: elsevier. http://dx.doi.org/10.1016/s1574-0714(06)01008-6 fiddick, l., cummins, d. d., janicki, m., lee, s., & erlich, n. (2013). a cross-cultural study of noblesse oblige in economic decision-making. human nature, 24, 318–335. http://dx.doi.org/10.1007/s12110-013-9169-9 fischbacher, u. (2007). z-tree: zurich toolbox for ready-made economic experiments. experimental economics, 10, 171–178. http://dx.doi.org/10.1007/s10683-006-9159-4 fiske, a. p. (1992).
the four elementary forms of sociality: framework for a unified theory of social relations. psychological review, 99, 689–723. http://dx.doi.org/10.1037/0033-295x.99.4.689 forsyth, d. w. (1995). commentary on fiske’s models of social relations. psychoanalysis and contemporary thought, 18, 119–153. halali, e., kogut, t., & ritov, i. (2017). reciprocating (more) specifically to you: the role of benefactor’s identifiability on direct and upstream reciprocity. journal of behavioral decision making, 30, 473–483. http://dx.doi.org/10.1002/bdm.1966 haslam, n. (2004). research on the relational models: an overview. in n. haslam (ed.), relational models theory: a contemporary overview (pp. 27–57). lawrence erlbaum associates publishers. holyoak, k. j., & powell, d. (2016). deontological coherence: a framework for commonsense moral reasoning. psychological bulletin, 142(11), 1179–1203. http://dx.doi.org/10.1037/bul0000075 kelley, h. h., holmes, j. g., kerr, n. l., reis, h. t., rusbult, c. e., & van lange, p. a. m. (2003). an atlas of interpersonal situations. new york: cambridge university press. kogut, t., & ritov, i. (2005). the “identified victim” effect: an identified group, or just a single individual? journal of behavioral decision making, 18, 157–167. http://dx.doi.org/10.1002/bdm.492 larrick, r. p. (2016). the social context of decisions. annual review of organizational psychology and organizational behavior, 3, 441–467. http://dx.doi.org/10.1146/annurev-orgpsych-041015-062445 levine, d. (1998). modeling altruism and spitefulness in experiments. review of economic dynamics, 1, 593–622. http://dx.doi.org/10.1006/redy.1998.0023 liberman, v., samuels, s. m., & ross, l. (2004).
the name of the game: predictive power of reputations versus situational labels in determining prisoner’s dilemma game moves. personality and social psychology bulletin, 30, 1175–1185. http://dx.doi.org/10.1177/0146167204264004 loughran, t., & mcdonald, b. (2016). textual analysis in accounting and finance: a survey. journal of accounting research, 54, 1187–1230. http://dx.doi.org/10.1111/1475-679x.12123 nowak, m. a. (2006). five rules for the evolution of cooperation. science, 314(5805), 1560–1563. http://dx.doi.org/10.1126/science.1133755 ortmann, a., & tichy, l. k. (1999). gender differences in the laboratory: evidence from prisoner’s dilemma games. journal of economic behavior and organization, 39, 327–339. http://dx.doi.org/10.1016/s0167-2681(99)00038-4 rabin, m. (1993). incorporating fairness into game theory and economics. american economic review, 83, 1281–1302. http://dx.doi.org/10.2307/j.ctvcm4j8j.15 rai, t. s. (2020). relationship regulation theory. in k. gray & j. graham (eds.), atlas of moral psychology (pp. 231–241). the guilford press. rai, t. s., & fiske, a. p. (2011). moral psychology is relationship regulation: moral motives for unity, hierarchy, equality, and proportionality. psychological review, 118, 57–75. http://dx.doi.org/10.1037/a0021867 reis, h. t. (2008). reinvigorating the concept of situation in social psychology. personality and social psychology review, 12, 311–329. http://dx.doi.org/10.1177/1088868308321721 rogers, j. l., howard, k. i., & vessey, j. t. (1993). using significance tests to evaluate equivalence between two experimental groups. psychological bulletin, 113, 553–565. http://dx.doi.org/10.1037/0033-2909.113.3.553 schoemaker, p. h. (1982). the expected utility model: its variants, purposes, evidence and limitations. journal of economic literature, 20, 529–563. http://dx.doi.org/10.2307/2724488 small, d. a., & loewenstein, g. (2003). helping a victim or helping the victim: altruism and identifiability.
Original Research

The Red Trousers. About Confirmative Thinking and Perceptual Defense in Complex and Uncertain Domains of Reality

Dietrich Dörner and Ute Meck
Institute of Psychology, Otto-Friedrich-Universität, Bamberg, Germany

This article is not about red trousers. The title points to a political foolishness that killed more than 100,000 soldiers. The discussion of this foolishness is an introduction to a general discussion of the reasons for political foolishness.
In her book 'The March of Folly: From Troy to Vietnam', Barbara Tuchman observed that over the last 3,000 years mankind has made great progress, primarily in science, but also in medicine, architecture, economy, agriculture, and so on. Only in politics, in the art of managing a state, is hardly any progress visible. Others share this opinion. Axel Oxenstierna, Swedish Chancellor at the time of Gustav II Adolph and the Thirty Years' War, said to his son, who had been chosen for an important political position and doubted whether, at eighteen, he would be able to cope with so difficult a task: 'If you only knew, my son, with what little wisdom the world is governed...' In surveys on the reputation of professions, politicians usually receive low ranks. Why is that so? In this article we try to give an answer to that question. The answer is very simple: foolish decisions can be traced back, first, to a low or wavering self-esteem, and second, to a lack of imagination, for politicians have difficulties in finding new solutions to problems. This answer is not at all new; Plato and, at nearly the same time, the ancient Indian Bhagavad Gita gave the same response. In this article we develop a theory of political foolishness.

Keywords: thinking, decision-making, politics, errors, foolishness

'Le pantalon rouge c'est la France!'¹
Eugène Étienne, Minister of War of France in 1913, in a parliamentary debate on the uniforms of the French army

Figure 1 shows a French infantryman in the uniform in which, in autumn 1914, he marched into the war. Even in those days the uniform was outdated. Many armies had already changed to camouflage battle dress: the English soldiers wore khaki, the Germans preferred light grey ('feldgrau'), the Russians green uniforms. Especially the red trousers give rise to concern: they make the soldier a highly visible target.
(At least the red cap got a blue cover when the French army went to war, showing a certain awareness of the inappropriateness of the uniform.) As Tardi (2013) writes, the French soldiers themselves called their dress an operetta or circus dress, which made them look like jumping jacks in uniform. In his book, Tardi uses his grandfather's memories of the Great War; the novel therefore seems authentic, and the stories make a genuine impression.

Figure 1. French infantry soldier in battle dress, 1914.

Figure 2 shows a line of German soldiers at the beginning of the war (the figure is taken from the graphic novel 'Elender Krieg' ('Miserable War') by Jacques Tardi and Jean-Pierre Verney, 2013, p. 10). Figure 3 shows a company of French soldiers preparing a bayonet attack on the German line (see Fig. 2). One can see the start and the end of that 'attaque brusquée'. If you compare the French and the German uniforms, equipment and tactics, you will understand the reasons for the French defeat. Within the first five months of the war, 350,000 French soldiers lost their lives in combat, but 'only' 250,000 German soldiers.

Corresponding author: Dietrich Dörner, Otto-Friedrich-Universität, Institute of Psychology, Economic and Organisational Psychology, Markusplatz 3, 96047 Bamberg, Germany. E-mail: dietrich.doerner@uni-bamberg.de

¹ 'The red trousers, that is France!'

10.11588/jddm.2022.1.84578 | JDDM | 2022 | Volume 8 | Article 1

Figure 2. German infantry soldiers in the uniform of autumn 1914.

The difference in the numbers of killed soldiers is not only due to the outdated uniforms of the French soldiers, but also to differences in tactics and armament. For instance, at the beginning of the war the French army had almost no heavy artillery at its disposal.
Heavy artillery, in the eyes of the French generals, did not suit the French maxim 'l'attaque, l'attaque, toujours l'attaque'; it was considered too slow and too awkward to move, and was therefore abandoned. (However, only heavy artillery, with mortars and howitzers, made it possible in those times to fire 'indirectly' over hills and walls. The abandonment of heavy artillery was therefore foolish² to a high degree.) Moreover, the French soldiers were not equipped with pickaxes and shovels in sufficient numbers, because their fiery assaults were not to be hampered by clinging to the soil. Additionally, the French army had far fewer reconnaissance planes than the Germans, who used them for guiding and controlling the (indirect) fire of the heavy artillery over long distances. The French infantrymen complained that grenades were poured over them and they could not even identify the source of the fire (see Tuchman, 2007, p. 266). The French also had fewer light machine guns than the Germans, as they considered the machine gun a pure 'defense weapon' (see Fig. 2). All this resulted in the French attacks often ending as depicted in Figure 3 (lower panel).

What was the reason for such a silly preparation for a possible war? The 'red trousers' and all the other regulations for 'l'attaque, l'attaque, toujours l'attaque' had not come about accidentally. They were based on a 'philosophy of war' which was the result of reflection on the reasons for the German victory in the war of 1870/71. For centuries, the French armed forces had been considered the foremost in Europe. Although the victorious era of Napoleon Bonaparte ended with Napoleon's defeat, the great reputation of the French army remained untouched. Hence, the defeat of 1870/71 was an unexpected catastrophe for the French generals.

Figure 3. French infantry prepares for an assault on the German lines; below it is shown how it ended.
Soldiers and politicians are usually keen on honour and reputation, and thus the defeat of 1870/71 greatly affected the self-esteem of the French generals, and not only theirs. What could be done? Well, a good method to repair a wounded self-esteem is to deny the insult. Did the defeat in the Franco-Prussian War of 1870/71 have anything to do with the abilities of the French army? Not at all! The commander-in-chief of the French army in 1870 was Napoleon III, a nephew of Napoleon Bonaparte. Napoleon III had been educated as an artillery officer, but contrary to his uncle he did not have much military experience and obviously was not a gifted army leader. In 1870/71, France intended to attack (and conquer) Baden and Württemberg. This plan failed: the Prussian-German armies attacked Alsace, and the French army was forced to defend central France. Therefore, after the defeat of 1870/71, the French generals explained that the French army had been deployed only defensively, and that this was not in accordance with the 'real' talents of the army. In 1806, Napoleon had defeated Prussia within a fortnight. What, then, was the reason for the disastrous defeat in the war of 1870/71? Not the low quality of the French army, of course! The French army had been excellent (also in 1914), but it had been forced by an ungifted general to fight defensively. This, however, was not in accordance with the spirit of the French army, which in the glorious days of Napoleon I had been offensive! The quality of the French soldier, and hence of the French army, was the offensive. Even biological factors were discussed in this context.

² The term 'foolish' is understood as thinking or reasoning distorted by emotions, whereas 'silly' means low-grade thinking due to low cognitive capacities.
Bergson said that humans are characterized by different levels of an 'élan vital', and that the French have a much greater level of 'élan vital' than the Germans, as is shown (or not shown at all?) by Figures 2 and 3 (see Tuchman, 2007, p. 39). Therefore, in a future war, the French army should fight offensively, which had always been its real strength. Thus, equipment and manoeuvres were adapted to the idea of a permanent offensive. The words 'defense' and 'defensive' were banned from the French military vocabulary. Even in the war schools, officers did not learn how to organize a defense. And all that happened although Clausewitz, who had been read in France too³, had stated that defense is 'the stronger mode of fighting' (see Figure 2).

The French generals, preparing for a possible war with Germany, made the mistake of not thinking about soldiers, fighting and equipment; instead they were busy repairing their hurt self-confidence by remembering the great days of Napoleon. And, unfortunately, the manoeuvres and uniforms of those days, too. Remembering past glory replaced thinking. So they lost their methodical flexibility by hallowing the doctrine of the offensive. Questions about the life or death of the French soldiers were replaced by questions about how to repair the self-esteem of the French generals. And so the Germans won the border battles in autumn 1914 through a strategic offensive which, however, was carried out as a tactical defense (see Figure 2).

In 1912, the French Minister of War, Alphonse Messimy, after having visited Bulgaria during the First Balkan War and observed Bulgarian soldiers in camouflage uniforms, found it necessary to introduce camouflage dress for the French army, too. This, however, caused considerable protest in the French military as well as in the French public. It was argued that the French soldiers would feel 'humiliated' to be dressed in 'dirty grey' uniforms.
However, it was quite obvious that the red trousers were rather conspicuous. But this argument was rationalized away. The French army experimented in manoeuvres with attacking infantry. A French infantry attack was expected to be initiated by an artillery assault on the enemy. (The French army in 1914 was equipped with an extremely effective light artillery, the soixante-quinze cannons, calibre 7.5 cm.) After the end of an artillery assault, the respective enemy, according to experiments and calculations of the French army, would need about 20 seconds to be ready for defense. In this time a well-trained French infantryman could run about 50 m, so that before the enemy was able to defend himself, he was already confronted with the French bayonets. Therefore, whether the French infantrymen wore red or grey trousers during the assault was completely insignificant. (Later measurements showed that the available time for an unhindered approach was not 20 s, but only 8 s; see Tuchman, 2007, p. 247.) Such 'unconfirmed' ideas, however, had no chance of being considered by the French military command.

In the first five months of the Great War, the French lost 100,000 soldiers more than the Germans because of silly and foolish decision-making. This had serious consequences for the morale of the French army. During the entire First World War (WW1), 25 deserters and mutineers were sentenced to death and executed in the German army. In the British army, this number was 48; but in the French army it was 645. In the French army in 1917, half of the divisions mutinied (which did not happen in any other army of WW1). This was, according to reports in Tardi's work, the consequence of the French soldiers being dissatisfied not only with the conditions of fighting, but with their command, too. Contrary to the public, in the trenches they did not love General Joseph Joffre. In the famous 'Chanson de Craonne', the chanson of the mutineers, the Germans are not mentioned as enemies.
The enemy is the war, 'l'infâme guerre'! In Tardi's novels this plays a central role in the mutiny of 1917. Stanley Kubrick directed a movie about this mutiny ('Paths of Glory'; in German 'Wege zum Ruhm'), featuring Kirk Douglas. The film was banned in France until 1975, and the 'Chanson de Craonne' was forbidden, too. (This supports Tardi's assumptions about the causes of the mutiny; it testifies that the mutiny of 1917 was indeed considered extremely dangerous for French morale.)

An explanation

What was the reason for these foolish forms of thinking by the French generals before and at the beginning of WW1? Here is an answer: the reason for this kind of thinking is fear, or rather the effort to overcome fear. In war (and mostly in politics), the state of reality is not completely known; or better, it is unknown to a high degree. Even what you believe to know is uncertain; it might be true or false. Additionally, there is a second-order uncertainty: you do not know what you do not know. Reality can be described as a large set of variables forming a network with complex dependencies; it is therefore rather difficult to predict the development of the whole system. Nevertheless, a politician or a general has to decide, although the state of reality is unknown to a high degree. (We focus on decisions rather than on actions, because the politician could decide not to act, which would be an 'action' too, as 'no action' has consequences just as 'action' does.) Often people in a highly uncertain situation feel the urge to act, as the ability to act creates a feeling of competence and certainty. To be able to act means that one is able to change the state of reality, and this gives a feeling of power. That is what people desire when exposed to a situation of high uncertainty.

However, in politics it is often unclear whether an action had any consequences at all. For instance, at this very moment (September 24th, 2021), the German Minister of Health declared that a 'vaccination week' had been successful. The success, however, is not visible in the numbers of newly infected or hospitalized persons. It could be that this 'no change' means that without the vaccination week the numbers would have been greater, and that they have remained constant only because of the vaccination week. Or the numbers of newly infected will decrease in the coming weeks; this would be a long-term effect. Obviously this is an unclear situation, not a success. Very often in war and in politics it is difficult to identify the results of an action. And often, when some positive consequences are visible, side and long-term effects are not (yet) visible. The 'optimism of the direct effect' results in a strong tendency of decision-makers to overrate the direct effect and to underestimate side and long-term effects, which often become visible only after months or even years. What may immediately be identified as a great success may hide the monsters of side and long-term effects in the future.

How to cope with uncertainty? Well, it might be appropriate to launch some activities to explore uncertain parts of the respective domain. How could that be done? For instance, by launching reconnaissance activities, or by thinking about the unknown domain: how might the unknown terrain be shaped? Such activities produce information, even though they usually do not provide a complete picture of the current state of reality. Therefore, thinking is necessary.

³ General Foch, later commander-in-chief of the Allied forces in France, was an admirer of Clausewitz.
In particular, imagination is necessary: the ability to form hypotheses about the structure of the unknown state of reality. And this is the reason why, for instance, Plato demanded that a politician should be thoughtful. That does not only mean that one is able to devise a plan by means of reasoning. Thoughtful means that an actor is able to think about his own thinking. He must be able to criticise himself. He must be able, and willing, to identify his own mistakes and errors. Here, however, a problem arises: should General Joffre have told his soldiers: 'Hello comrades, unfortunately I must tell you that we had rather wrong ideas about infantry attacks, the necessity of heavy artillery, the necessity of reconnaissance airplanes, the necessity of pickaxes and shovels, and about the roles of offense and defense. So sorry!'? It is known what General Joffre really told his army: 'The division commanders, and some army commanders too, for instance General Lanrezac, have not understood Plan XVII. And so we have to replace these commanders!' (See Tuchman, 2007, p. 438.)

There are two simple but deceptive methods of altering the subjective reality, and thereby of identifying actions that appear appropriate to this altered image of the current state of reality. These methods are:

• Perceptual defense (PD): information about the state of reality one does not wish to see is not perceived.
• Confirmative thinking (CT): those aspects of the current state of reality that are necessary or would be helpful, but definitely do not exist, are 'perceived'. The reaction of General Joffre to the failure of the French army in the border battles, as mentioned above, is an example of CT: 'The division commanders have failed.'

PD and CT are presumably phylogenetically old mechanisms, which may give courage in hopeless situations. They result in the tendency of an organism not to give in, which is better than giving in, because the latter means the end of all efforts.
If one does not give in, a certain amount of hope remains. We believe that PD and CT developed from safeguarding behaviour, which can often be observed in animals and humans alike. The blackbird I observe on my balcony periodically switches to safeguarding behaviour and looks around to see whether something useful or something dangerous might be close. PD and CT are internal forms of safeguarding behaviour: one checks what else, apart from the objects in the center of attention, exists or does not exist or may develop in the situation one thinks about. PD and CT mislead decision-makers with high probability. We will now show some CTs and PDs that can be found rather frequently in military and political thinking:

• Denial of side and long-term effects: if decision-makers consider only the positive effects of a measure while forgetting or suppressing the idea that there might be unwanted side and long-term effects, they very likely commit a severe error. This form of PD is very frequent (Dörner, 1996).
• Rumpelstiltskin planning (or neglecting conditions): Rumpelstiltskin, a character from the tales of the Brothers Grimm, planned in the following way: 'Today I shall bake, tomorrow I'll brew, and the other day I'll marry the queen's child!' But for baking it is necessary to have firewood, flour, yeast, and so on. Rumpelstiltskin did not consider that; nor did he check whether the firewood would be sufficient for baking and brewing. And so, in the end, Rumpelstiltskin failed. (Well, he did not fail because of the firewood and so on, but because, when dancing around a fire, he sang his name, and so the queen came to know the name of Rumpelstiltskin; and her not being able to find out his name was the condition for the transfer of the queen's child.) Anyway: Rumpelstiltskin failed because he disregarded the conditions of actions.
To solve a problem it is not sufficient to have a goal; it is necessary to have the required actions at one's disposal, too. And that is only the case if one knows whether the conditions for a certain action are given; otherwise the respective action will not work.

• Thinking in goals: the goals one strives for are great for one's self-esteem, because reaching them will mean happiness and welfare for mankind. 'We should decrease the CO2 level by 75% to be on the safe side with respect to global warming.' A clear goal. However, it is not enough to have goals; it is necessary to know (or find out) how to reach them. Forgetting to think about that is an example of PD; the reasoning is then incomplete. Immanuel Kant (1965, p. 10) remarked with respect to 'thinking in goals': 'To make plans is a luxuriant and boasting mode of thinking: it is claimed what cannot be done, it is demanded what is unclear, it is sought after what is unknown.' This form of 'making plans' can be found quite frequently during election campaigns and in the programs of political parties.
• Methodical rigidity: you have experience in your field and you know a lot of methods by which you will reach the goal you aspire to. But unfortunately, for military and political realities a complete set of methods that allows one to solve every problem does not exist, and it never will. In chess you know all the methods of moving and using the pieces, and your knowledge is sufficient for all the purposes of the game. This, however, is not true for political or military realities: there you can never know all the methods for all purposes, because their number is infinite. This is related to the 'incompleteness theorem' of Kurt Gödel (see, for instance, Hoffmann, 2012): a calculus (i.e., a set of action methods or information-processing methods) that solves every problem of a certain reality does not exist. It is always, and indefinitely, necessary to be prepared for self-criticism and therefore to be prepared to think about new modes of thinking. The belief that one already commands all the necessary methods is called methodical rigidity, and that is a frequent form of CT. Even if you can show yourself and others that you are able to solve a certain problem, this does not mean that you are able to solve every problem in the respective field.
• Groupthink: normally, a military or political leader has a 'staff', a group of people whose task is to give advice to the commander-in-chief. If, with respect to a proposed solution of a problem, all the advisors agree with the commander, this will be considered an indicator that the solution is adequate. However, the agreement might be the result of the tendency of the advisors to share the opinion of the commander in order to appear as loyal followers. Those who do not agree run the risk of being identified as opponents of the commander-in-chief or even of the group, and this could endanger their careers. Therefore, they agree publicly, although they do not personally agree! This is 'groupthink' as Janis (1972) described it. It is important to have independent persons in advisory groups: persons who have their own standpoint and do not give it up just to be in agreement with the chief. An agreement of 'loyal' group members is not at all an indicator of an adequate solution. Groupthink is a probably frequent form of CT: 'We are right!'
• 'Make it simple': not everybody understands complicated explanations. If one does not understand an explanation, one will normally get angry and try to avoid the respective topic. Therefore, if you want to persuade somebody that your ideas are true, if you want him as a follower, then it is wise to cast your ideas into simple forms.
So Marx and Engels wrote the 'Communist Manifesto' to persuade uneducated workers that communism is the right idea. Bertolt Brecht wrote a poem about communism; it begins with 'Communism is simple, everybody can understand it!' Unfortunately, the poem ends with: 'Communism is simple, but difficult to accomplish.' What does this mean? Is communism simple? Why then is it difficult to accomplish? We will come back to this.

The Swedish historian Peter Englund states: 'Reality, well! Certainly reality has something to do with the problem. Without any doubt politicians, generals, and others act in a strictly logical manner. However, they do not cope with reality, as they never care about what we call reality, but act according to an image of reality which they have created; and this image often is not even similar to reality.' (Englund, 1993, p. 42.)

With PD and CT it is extremely easy to get wrong ideas about reality, and hence wrong ideas for the solution of a problem. However, there are conditions favouring the use of CT and PD. If somebody runs out of new ideas, CTs or PDs are possible resorts. It is very easy to ignore the side and long-term effects of an action, especially when these effects are not yet visible. An action that produces positive effects immediately and negative effects only with a delay is a trap; and if you are caught in that trap because you have decided for a certain action, it might be very difficult to get out again and to avoid the negative long-term effects. A new image of the world, one which does not mirror reality, is the result of using CTs and PDs!

Was the decision of the French generals to adhere to the red trousers stupid for the French infantrymen? Well, not at all.
The French military command invested a lot of reasoning, experiments and calculations (see above), to arrive at the answer that the red trousers should be maintained. It was not stupid; it was foolish! There is a difference between foolishness and stupidity. People are stupid if they use false forms of deduction or other incorrect forms of reasoning, or if they use data that are obviously wrong. Foolishness means not thinking at all about topics one should think about. For instance, it is foolish not to consider side and long-term effects when planning actions. Or it is foolish to assume that the conditions for a certain action are given without checking whether this is true. The basis for ignoring topics one should consider in a given situation is the premature assumption that certain events cannot happen or that certain conditions do (or do not) exist.

The tendencies toward confirmatory thinking and perceptual defense are well known in psychology and not at all uncommon in human thinking. Confirmatory thinking and perceptual defense have the function of creating courage. Many philosophers and writers stress the importance of CT and PD. Schopenhauer (2011, p. 246), for instance, writes: 'An adopted hypothesis gives us lynx-eyes for everything that confirms it and makes us blind to everything that contradicts it.' CT and PD are often the reason why people continue trying to solve a problem although all the odds are against them. Often this tendency to continue a difficult task is necessary, and confirmatory thinking and perceptual defense deliver an optimistic view of reality and its development.

Sternberg (2004) believes that foolishness is due to:

• unrealistic optimism,
• egocentrism: only the effect for oneself is considered, not the difficulties other people have to cope with,
• the illusion of omniscience,
• the illusion of omnipotence,
• the illusion of invulnerability.

This is a good list! However, we interpret it in a different way than Sternberg does.
Unrealistic optimism is not the cause of foolishness, but the result of counterfeiting reality by confirmatory thinking and perceptual defense. The illusions of knowing everything and of being omnipotent, again, are not causes of foolishness, but results of foolish thinking (viz., 'non-thinking'). Back to invulnerability and omnipotence in connection with the 'red trousers': the commander-in-chief of the French army in 1914, General Joffre, sent a message to the French Minister of War, Alphonse Messimy, on the evening of the 20th of August 1914, immediately before launching Plan XVII: 'We have every reason to expect the development of the operations with confidence!' This, however, was definitely wrong: the development of the next four days demonstrated that the French attack according to Plan XVII failed completely. Now the Germans believed they had won the war and saw the French divisions in headless flight. But this, again, was confirmatory perception and not at all true. The French fell back, not at all headless but in good order, in the direction of the Marne region east of Paris. Therefore, the German assumptions about the French withdrawal from the border battles were as wrong as the French assumptions before launching Plan XVII.

Examples like this are relevant for a psychological investigation because we use theoretical consistency as a method to corroborate our theory about the foolishness and mistakes of military leaders. This method is recognised in all sciences. It concerns the explanation of single cases or events which are empirically inaccessible. The method simply consists in showing that the event or the behaviour, in our case the mistakes and the foolishness of military leaders, can be deduced from a theory. This method is quite common in science. For instance, Darwin's theory of evolution could for a long time only be supported by this method. Physicists use this method to argue that their theory of the big bang is true.
It has been shown that the series of events of the big bang can be deduced from some basic physical assumptions. In psychology, where single cases often have to be explained (for instance, Hitler's or Stalin's behaviour), this method could be very useful. But it is rarely used in the discipline, a fact that could be explained by the non-existence of a theoretical psychology. Let us now have a look at another example of foolish thinking with devastating consequences. This time we look at the German army.

The Schlieffen Plan

The 'red trousers' are a good example of foolish behaviour. As this was a French foolishness, we will now turn, to restore the balance, to a German one at the beginning of WW1; no, not only, of course, for the sake of balance, but because this example shows us aspects of foolishness we have not encountered in the French examples. We will examine the so-called Schlieffen Plan. The Schlieffen Plan was a reaction of the German military to the political situation that resulted from the non-renewal of the 'Reinsurance Treaty' between Germany and Russia in 1890. In the aftermath, Russia sought a close relationship with France, which in turn was looking for a possibility to revise the result of the war of 1870/71. With French money, a Russian network of railways was constructed, particularly connecting the center of Russia with the German eastern border. For Germany, this development meant the risk of a two-front war against France and Russia, and this possibility raised fear in Germany. The German general staff tried to find a strategy for both fronts, but none of the proposed plans was convincing. In that situation, General Schlieffen's plan of fighting two successive wars emerged.
First, the French army was to be destroyed in a campaign of about 40 days; afterwards, Russia was to be attacked. For quickly defeating the French army, the idea of the German general staff was to attack the left wing of the French army with a very strong right wing moving through neutral Belgium, to pass Paris in the west, and then to destroy the rest of the French army by pressing it against the left wing of the German army (see Figure 4). This idea was a strategic repetition of one of the battles of Frederick the Great. In 1757, in the battle of Leuthen (a village in Silesia, near Breslau/Wrocław), Frederick the Great defeated an Austrian army of about 70,000 men with an army of about half that strength by using the so-called "oblique order of battle", an idea invented in antiquity by Epaminondas of Thebes. (Tuchman writes: "Past battles and dead generals keep the military spirit in a deadly grip, and so the Germans, and the others too, plan the past war!") The idea of the Schlieffen Plan appeared unorthodox to the German general staff and was accepted with relief. This was a possibility to win the war: after eliminating the French danger in the west, nearly the complete German army would be at its disposal in the east to defend Germany against the Russian "steamroller". On the other hand, the Schlieffen Plan was at first considered extremely risky. What would happen if the idea of outflanking the French army in the west did not work? Then one would get a two-front war against France in the west and Russia in the east. (That is exactly what happened in August and September 1914.) But as all the other alternatives seemed still worse, the German military leaders agreed to elaborate the Schlieffen Plan. It may be that the conviction that the offensive is mostly the right method played a certain role in this decision, because for Prussia the 19th century had been a nearly uninterrupted series of successful attacks.
What happened then is, in my eyes, characteristic of the fate of plans once their elaboration begins. The Schlieffen Plan had seen 20 years of elaboration when it was put into action in August 1914. Railway lines were built for General Schlieffen's plan, and the plan was refined more and more with respect to its execution. The more the Schlieffen Plan was refined, the more the members of the German general staff appreciated it. Each improvement of the plan increased the estimation of its success, and hence the German general staff overestimated its probability of success. "Past battles and dead generals ...!" Above all, the general staff forgot Plato's reminder to stay thoughtful, to think about one's own thinking. So, over the course of time, what some authors called the "mythos of the Schlieffen Plan" developed: "Schlieffen saves us all!" When the Germans began to carry out the Schlieffen Plan, it soon became apparent that none of the conditions considered as prerequisites for its success was met. Belgium did not concede the Germans passage but became a French ally and defended the Belgian country with furor. England, initially showing not much enthusiasm to engage in a war against Germany on the side of France and Russia (both traditional enemies of England), decided to do just that. The French army was defeated in the border battles and the French Plan XVII found a quick termination. However, things began to develop in a different direction than the Germans had planned. The French army managed to escape in good order to the west, to the Marne region, and so the so-called "miracle of the Marne" became possible. At the beginning of September, the French armies were able to defeat the completely exhausted German troops. (As far as we know, even the probability of exhaustion was not considered in the Schlieffen Plan. For instance, it was reported that the horses of the German cavalry were not even able to trot.)
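The general staff's growing confidence contrasts with a simple probabilistic consideration (our illustration, not part of the historical record): when a plan's success requires several conditions to hold at once, the joint probability of success shrinks multiplicatively, even if each single condition looks likely. The condition labels and the 80% figure below are hypothetical numbers chosen for illustration.

```python
# Illustrative sketch (not from the source): joint success probability of a
# plan whose success requires several independent conditions to all hold.
def joint_success(probabilities):
    """Multiply the probabilities of all required conditions."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result

# Hypothetical numbers: five prerequisites (Belgian passage, British
# neutrality, rapid French collapse, slow Russian mobilisation, sustained
# marching power), each judged "likely" at 80%.
conditions = [0.8] * 5
print(round(joint_success(conditions), 3))  # 0.8^5 ≈ 0.328
```

Under these assumed numbers, a plan built from five "likely" prerequisites succeeds only about a third of the time – the kind of arithmetic that perceptual defense keeps out of view.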
Additionally, the Russians appeared in East Prussia much earlier than the Germans had expected, and so the Germans had to send troops to East Prussia to stop the Russian invasion. (There are discussions about whether it was really necessary to send these troops from west to east. However, they were sent, and they were lacking in the west.) All that could have been considered by the German military command; it was well known that the Russians, with French help, had built railway lines to the west, which made it possible for them to invade East Prussia, for instance, much earlier than had been expected. Additionally, the Russian army had improved qualitatively a lot since the catastrophic defeat it had suffered in the Russo-Japanese War in 1905 – and the Germans knew that! The behaviour of the French army after its defeat in the border battles was, as already mentioned, well organized; it did not have the character of a headless flight. This, however, was not regarded by the Germans: the leader of the 1st German Army, General Kluck, decided not to pass Paris in the west but to directly attack the allegedly half-ruined French army east of Paris, to destroy it finally (see Figure 4). So the French troops around Paris were able to attack the right flank of the German 1st Army, and the Germans lost the battle of the Marne. To summarize: the Germans overestimated themselves and underestimated the French and the Russians. As with the red trousers, in this case the Germans did not live in the "actual" reality but in a self-made, illusionary reality. Schlieffen's plan was overrated. "Past battles ...!" The clarity and the logical structure of the Schlieffen Plan were among the reasons for its overestimation. Decision-makers often misinterpret the clarity of a plan as an indicator of its adequacy.
Figure 4. The Schlieffen Plan.

However, an idea that is simple and clear and therefore understood by everybody can still be wrong, because it might be too simple. For instance, the simple form in which Friedrich Engels cast the ideas of Karl Marx (see Sperber, 2013, p. 356) promoted the attractiveness of the communist ideas but concealed their shortcomings. In the Communist Manifesto by Marx and Engels, one can find a list of measures to create the communist system. Ten measures are proposed. The list begins with the famous request that the expropriators should be expropriated:

1. Expropriation of the landowners.
2. Hard progressive taxes.
3. Repeal of the law of inheritance.
4. Confiscation of the property of all emigrants and rebels.
5. ...

(See Marx & Engels, 1946, p. 24.) We will not discuss these measures in the particular case, but we want to point out that any discussion of their side effects and long-term effects is missing. This means that it remains quite unclear what in particular these measures, when carried out, will result in. It seems necessary at this point to cite Machiavelli, who said: "..., that always something evil is close to the positive effect, which easily is generated together with it, so that it is impossible to avoid it if the positive effect is striven for." (Machiavelli, 3rd book, 37, p. 369.) "Communism is simple!" we learned from Bertolt Brecht. This list of measures in the Communist Manifesto seems to be simple. However, it is a black box; nobody knows what it contains. The core of the Communist Manifesto is PD, "perceptual defence".

More Foolishness! And What Could Be Learned from It?
General Lanrezac, commander of the French 5th Army on the outermost left wing of the French army (see Figure 4), was immediately confronted with the very strong right wing of the German army, which passed through Belgium. He was surprised and horrified by the great strength of the Germans at that place, which he could reliably identify by his reconnaissance cavalry. He found it necessary to inform General Joffre, commander-in-chief of the French army, about the unexpected strength of the German troops in Belgium. However, General Joffre was occupied with launching Plan XVII, which meant crashing through the German centre and crossing the Rhine near Mainz. The news from Lanrezac therefore did not find much resonance in the French headquarters – it was just a nuisance. Taking it seriously would have meant stopping Plan XVII immediately to concentrate on the defence against the German right wing. In the mind of General Joffre, however, Plan XVII had top priority. Therefore, Joffre (and his headquarters) reacted to Lanrezac's news between the 10th and the 20th of August 1914 in the following ways:

• "Lanrezac is wrong; it is impossible that the Germans have so many troops in this region!"
• "The German troops in Belgium have a special order!"
• "We have the impression that the Germans have no troops in that region!"
• "General Lanrezac exaggerates!"
• "General Lanrezac is a coward!"

(See Tuchman, 2007, p. 233 ff.) These are impressive examples of PD, are they not? Here are some more significant examples of perceptual defence and confirmatory perception from the time of the beginning of WW1:

• "The only task cavalry could serve in a future war is cooking rice for the infantry!" (General Ian Hamilton, British army, observer of the Russo-Japanese War in 1905).
• "Obviously the sun of the Orient has burnt the brain of General Hamilton!" (reaction of the supreme command of the British army).
• "The machine gun as a weapon is completely overrated!" (General Haig, British army, in 1908⁴)

In 1910, General Ruffey of the French army proposed to buy 3,000 reconnaissance airplanes for the French army. The reaction of General Joffre was:

• "Airplanes!? These are pieces of sports equipment!" (cited from Peter Englund, 1993)

General Haig later became the commander-in-chief of the British army in France. Ironically, Peter Englund noted that the only idea of General Haig demonstrating a certain degree of originality had been the decision to introduce lances for the British cavalry; these had proved very useful for hunting wild boars in India. The preparedness to launch a war would have been far lower if generals and politicians had possessed a better image of what a war in 1914 would mean. But politicians and generals refused to develop an idea of what a war with machine guns, airplanes, and quick-firing artillery meant. Or, more likely, the phantasy was lacking to construct an image of a "modern" war. We could easily continue the list of foolish ideas at the dawn of the Great War. An overview of these ideas can be found in the book by Barbara Tuchman. With this book it can be shown that it is not only possible to shake one's head at the foolish ideas politicians and generals had hatched, but that it is possible as well to alter one's behaviour. The book was published in 1962, and John F. Kennedy, president of the US in those days, read it during the second Cuba crisis in 1962. The book altered Kennedy's behaviour when coping with that crisis, which threatened to end in a nuclear war between the Soviet Union and the United States. Kennedy not only stated that Khrushchev had installed atomic missiles on Cuba; he asked himself why the head of the Soviet government had acted in that way, and he found out the reason for Khrushchev's behaviour: a great agro-technical project had proved to be a failure, and this jeopardised Khrushchev's position in the Politburo.
So, in his eyes, "throwing a hedgehog into Uncle Sam's trousers" was the right way to re-establish his standing in the Politburo. When Kennedy was informed about that, he decided to help Khrushchev! Yes, he brought help to the man who had just installed atomic missiles in Cuba, deadly menacing the United States! Kennedy withdrew missiles in Turkey (which were outdated anyway), so Khrushchev, on his side, could decide to dismantle the atomic missiles in Cuba. Additionally, Kennedy commanded the navy units that had to control the sea around Cuba not to sink Russian vessels trying to break the blockade, but to destroy only the helm and the propeller of the respective ship in order to save the lives of the crew. This was very wise once more, as killing men would certainly have intensified the crisis – the idea of revenge would emerge! Ask yourself: if you had been in Kennedy's situation, would that idea have emerged in your mind? As far as we can remember, this idea is without any example in the past. Kennedy's advisory board comprised not only persons who shared his view, but also persons with deviating opinions. Thus he diminished the effects of "groupthink" (Janis, 1972) and reduced the dangers of a distorted view of reality through perceptual defence or confirmatory perception. Kennedy himself had experienced the dangers of groupthink in the Bay of Pigs fiasco, 1½ years before the second Cuba crisis. There, an American attack on Cuba failed because an all too homogeneous planning group, without a "devil's advocate", made too many planning errors.

⁴ In a similar way, Helmuth von Moltke, chief of the general staff of the German army, reacted to a corresponding report by Colonel Max Hoffman, Prussian army, who, like General Hamilton, was an observer of the Russo-Japanese War in 1905.
Kennedy had learned from the Bay of Pigs fiasco. So – fortunately – the Cuba crisis came to a peaceful end. Kennedy followed Plato's advice: "Be thoughtful! Criticize your own thinking!" For his first plan after detecting the Russian atomic missiles in Cuba had been: "Bomb them!" Kennedy shows that it is possible to learn from history, although the situations and problems will never be the same. But the meta-methods for finding the right methods for the problem at hand can be learned. It can be learned that homogeneous boards of advisors are a bad idea. It can be learned that it is a bad idea not to check whether the conditions necessary for a certain action are given. It can be learned that side effects and long-term effects of actions are not exceptions but the rule in complex environments. And it can be learned to be thoughtful; it is a good idea to keep a critical distance from one's own thinking. Plato knew that. When it is discussed whether a certain person is able to serve as a minister (or in similar positions), it is often asked whether the person has "political experience". However, it is important to understand that "political experience" may have disastrous effects. Hitler did not have any political experience, but he managed that the Saarland remained a part of Germany; he managed that the Rhineland was liberated from French occupation (1936); he managed that the Sudeten territories became a part of Germany (1938); he managed that Austria was unified with Germany; he managed to defeat Poland in a short war; and he managed to defeat France (1940) in a short war. Field Marshal Keitel called Hitler the "greatest general of all times" ("größter Feldherr aller Zeiten"); this name was later abbreviated to "Gröfaz", and the German soldiers used this term – especially after the battle of Stalingrad – not at all as an honorary name.
Unfortunately, however, due to his successes, Hitler was convinced of his "political experience", and this played an important role in his decision to begin a war against nearly half of the world. "To the German soldier nothing is impossible!" Political experience may be nothing but an overgeneralization and may thus result in sticklers for methods. Gilbert Parker said: "Memory is man's greatest friend and worst enemy." Think about that if you ask for "political experience". The movie "Der Untergang" ("Downfall" in the English version) raised criticism because it did not show Hitler as a monster but – with respect to his immediate environment – as a nice yet rather helpless man. This is how Traudl Junge, one of Hitler's secretaries, remembered Hitler. Her book was used by the director of the movie, Oliver Hirschbiegel, as a source of information about Hitler.

And today? All that – the Great War, Hitler, and even the Cuba crisis – is long gone. Nobody would recommend red trousers for soldiers today. But look at Figure 5! An engineer of a well-known car producer had an idea for improving the filter to reduce the emissions of the diesel engine of the BMW cars. "Wonderful; we will use this filter for our cars!" Fortunately enough, another engineer looked at the filter more closely. In Figure 5 you can see what he detected. The particle filter reduced the emission of soot; as the emission of soot causes cancer, this means that the filter reduces the danger of cancer. Wonderful! However, the filter increased the output of polycyclic aromatic hydrocarbon particles, which in turn increases the danger of cancer. Additionally, the particle filter increases the emission of fibre particles, which again increases the danger of cancer. And the particle filter increases diesel consumption, so the emission of soot is increased, too. (Bartsch, 2004) The filter had some positive, but many more negative effects.

Figure 5. Diesel and cancer. See text.
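The filter example can be read as a small signed causal graph, which makes the imbalance of positive and negative effects explicit. The following sketch is our illustration (the node names and edges merely restate the relations described above, not any model from the source): each edge carries +1 ("increases") or -1 ("decreases"), and the sign of a causal path is the product of its edge signs.

```python
# Illustrative sketch (our reading of the example, not from the source):
# the diesel-filter case as a signed causal graph. An edge (target, +1)
# means "increases", (target, -1) means "decreases".
edges = {
    "filter": [("soot", -1), ("PAH", +1), ("fibres", +1), ("fuel_use", +1)],
    "fuel_use": [("soot", +1)],
    "soot": [("cancer", +1)],
    "PAH": [("cancer", +1)],
    "fibres": [("cancer", +1)],
}

def paths(node, target, sign=1, trail=()):
    """Enumerate all causal paths from node to target with their overall sign."""
    if node == target:
        yield trail, sign
        return
    for nxt, s in edges.get(node, []):
        yield from paths(nxt, target, sign * s, trail + (nxt,))

for trail, sign in paths("filter", "cancer"):
    effect = "reduces" if sign < 0 else "increases"
    print(" -> ".join(("filter",) + trail), ":", effect, "cancer risk")
```

Tracing the paths shows one risk-reducing path (via soot) against three risk-increasing ones (via PAH, fibres, and fuel consumption), mirroring the conclusion that the filter had some positive but many more negative effects.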
It is typical of planning that people consider only the positive effects and not the negative ones. Here we have another example of PD (perceptual defence): the negative effects of an action have not been taken into account. As mentioned above, war and politics are uncertain and complex realities; one never knows exactly how a development will continue. Most of the time, what really is the case is unknown and difficult to explore. Such realities cause anxiety and promote worry; they are alarming. However, a politician, a general, or an engineer has to do something. And to get rid of the concern that grows when studying reality, perceptual defence and confirmatory cognition "help" a lot – especially in politics, where the consequences of poor acting are normally not visible immediately: negative side effects frequently are long-term effects. Look into your daily newspaper and examine the degree to which negative consequences are not considered in political decision-making, and to what degree it is assumed that necessary conditions are given instead of verifying them. Check how frequently politicians plan "methodically" or "in goals" only, without investing any thought in how the goals could be reached. It was already Plato (Timaios 69d, in Platon VII) who stated that failures in political decision-making may, to a high degree, be attributed to the overestimation of one's own strengths. And this overestimation is the result of confirmatory thinking and perceptual defence. At this very moment (August 2021), in Germany and elsewhere in the world, the debacle of Afghanistan is being discussed. Within a very short period of time, the Taliban managed to re-establish their political power in Afghanistan.
Over approximately 20 years, the Western world had tried to create something like a foundation for a modern state in Afghanistan and had established a 300,000-man army with modern weapons. Not in years but in a few weeks, all that collapsed, and some thousand Taliban soldiers, on mopeds and armed with Kalashnikov rifles, re-established their political power in Afghanistan. In June 2021, the German minister of foreign affairs, Heiko Maas, explained: "... that in a few weeks the Taliban will seize power is not the basis of my assumptions!" And afterwards, in August 2021, the minister explained: "We all have erred!" This was wrong, because some of the multinational forces in Afghanistan had understood well what was going on – for instance, the French. And for years there had been reports about the sluggish or even negative development in Afghanistan, recognizable to every newspaper reader. At least for them it was quite clear that the whole mission was not a progress at all. In the meantime we know that the minister of foreign affairs had been informed about the development in Afghanistan by the German ambassador in the United States, Emily Haber. She had a talk with the chief of the CIA about 1½ weeks before Kabul was "freed" by the Taliban on August 15, 2021. The chief of the CIA informed the German ambassador that the Afghan government could collapse far more quickly than expected. Therefore, the ambassador recommended that the emergency plans for the German embassy in Kabul be activated at any rate. However, nothing seems to have happened; the German minister of foreign affairs seems to have done nothing. In a session of an investigation committee of the German Bundestag at the beginning of September 2021, the minister explained that this had been one of the frequent wire messages the ministry of foreign affairs gets daily. Obviously, this important message had not altered the minister's view of the situation in Afghanistan.
This is an example of PD with serious consequences. Please try to identify the cases of neglecting side effects or long-term effects, the cases of abandoning the analysis of the conditions for actions, the cases of planning through goal-setting only, without considering whether activities for reaching the goals are known. Or look at those cases in which the appropriateness of a plan is accepted only because all the members of a group agree. Or consider those cases in which the members of a committee are selected with respect to their agreement with the opinion of the leader of the group.

Closing Remarks

I (DD) was about to put a book about the Seven Years' War (Archenholtz, 1973) back onto the bookshelf. Not very skilfully, I tried to grab it, so the book opened on page 68. And there I found a remark on Friedrich II (Frederick the Great, King of Prussia) after the battle of Kolin (June 18, 1757). This was the first battle Frederick lost. In a letter to his friend, the Lord Marshal, Friedrich wrote: "Fortune, dear Lord, often produces unjustified self-confidence. Twenty-three battalions were not sufficient to remove 60,000 men from an advantageous position." Here you find Frederick's confession that he had overestimated his capabilities and therefore made mistakes. The uninterrupted series of military successes of the Prussian king in the first two Silesian Wars had produced an (inappropriate) feeling of superiority in him. The result was the mistakes of Kolin. And so Frederick concluded: "Next time we will do better!" This succeeded: he won the next battles near Rossbach (November 5, 1757) and at Leuthen (December 4, 1757), not least because of a number of new and creative ideas about the conduct of war. So the Prussian king identified the dangers of confirmative perception, the generalisation of successes. A series of successes produces sticklers for methods; it produces the conviction of having identified all the secrets of warfare.
(Please remember the Gödel theorem here!) And this was the reason why the king failed at Kolin. However, he was able to criticise himself, and so he learned. But shortly after the battle of Kolin and after giving up the siege of Prague, Frederick tried to make his younger brother, August Wilhelm, responsible for the failure. Many historians believe that this was unjustified. August Wilhelm died one year later; it is said that his brother's accusations played an important role in his untimely death. Frederick's 14 years younger brother Heinrich later constructed an obelisk in the park of Rheinsberg, devoted to the remembrance of his brother August Wilhelm and other officers who, in his opinion, were treated unfairly by Frederick the Great. (You can still pay a visit to that obelisk today.) The stories are similar: Joffre and Lanrezac, Frederick and August Wilhelm. A general overestimates himself, makes mistakes, and declares others responsible for the failure. Different places, different conditions, but a similar course of events. If we try to identify the psychological background of foolish behaviour in politics, two causes of inappropriate decision-making may be identified: unstable self-esteem may be one cause, and the other one is a lack of phantasy, the inability to generate new ideas for the solution of a problem. These two seem to be the central factors responsible for foolish behaviour. Low self-esteem spurs the search for signals of competence. The acting person wants to prove that he or she has the abilities to do something, to solve problems. If this search for signals of competence is successful, and if the new idea works and produces solutions, then what Friedrich (the Great) experienced may happen: it is not improbable that one's own strengths and capabilities will now be overrated.
This, however, produces mistakes, generating a feeling of helplessness again: it is not known what could be done! However – as mentioned above – persons with an unstable self-esteem do not ask for advice in such a situation, to avoid admitting that their own capabilities are low. Sometimes people try to find signals of competence in other fields. In August 1914, General Joffre drove at a speed of 100 km/h (high speed in those times!) from division headquarters to division headquarters in order to check whether each commander did what he was ordered to do. (For this purpose, Joffre's car was driven by a professional racing driver.) Being able to check the division commanders unexpectedly and frequently demonstrated competence! Obviously, in Joffre's eyes a general should be a kind of bloodhound, biting the legs of the sub-commanders to lead them in the right direction. Without phantasy, it is very difficult to find new ideas. For someone without phantasy, it would be difficult to think about how one could help Khrushchev instead of fighting him. Without phantasy, advice will come from perceptual defence or confirmatory thinking; this, however, mostly is poor advice. Joffre, for instance, to whom Barbara Tuchman (2007, p. 400) ascribes no capability of having new ideas, did not ask for advice and was not willing to accept advice from others. So he repelled General Lanrezac's warning to consider the strong right wing of the Germans, passing through Belgium westward. Why? Well, we explained it earlier: if you ask somebody else for advice, this means that you have no ideas, that you cannot produce new ideas. And Joffre tried to avoid this impression.
The "order of the day" before the battle of the Marne – the order General Joffre delivered to the French army on September 6, 1914 – contained the following sentence: "A troop not being able to advance further should, whatever it costs, keep the conquered ground, and should die in its position rather than fall back." This sounds very heroic: "Win or die!" However, this maxim is often inappropriate. There are enough examples in military history where falling back or evading have been appropriate methods to escape defeat, even to win a war. Fabius Maximus Cunctator (Fabius the Waverer) succeeded against Hannibal in the Second Punic War by a flexible, evasive behaviour; a French example is the flexible behaviour of Bertrand du Guesclin in the Hundred Years' War between England and France. Guesclin ruined the English forces by a flexible and evasive conduct of war. "Win or die!" is a very rough maxim, and it seems to mirror Joffre's problems. It means "Here I am again!" – and a sentence right before the "win or die" order testifies to this: "This is not the moment to look back!" Joffre said. Not to look back? Not to look back on what? Not to look back on the lost border battles? Not to look back on Plan XVII and its failure? Not to look back on the missing heavy artillery and the missing airplanes? Not to look back on the decision not to replace the red trousers by a camouflage dress? Not to look back on the failure of not listening to the warnings of General Lanrezac? By "Win or die!" Joffre repaired his self-esteem. However, "Win or die!" would not have been necessary if Joffre had acted more prudently in the early days of the war. But Joffre was Joffre and not Gallieni. In 1915, Joffre continued the basic idea of Plan XVII: offensive! Obviously, for Joffre this was the only appropriate strategy to give the war a different direction. (See Verney in Tardi & Verney, 2013, p. 54.)
In 1915, Joffre launched attacks at Compiègne, then north of Arras at the Vimy heights, and repeated the battle of the Champagne in September 1915. Through the whole year of 1915, the staff divisions intensified the former tactic: "l'attaque ...". The result was minimal territorial gains for the Allied forces, but 100,000 French, 60,000 British, and (only) 65,000 German soldiers were killed. The attacking Allied soldiers died because of the obstinacy of their leaders. The political price was that Sir John French, as supreme commander of the British Expeditionary Force, had to leave the command to Sir Douglas Haig. Joffre, as the "victor of the Marne", was still untouchable (for the public!). However, his reputation was damaged (see Purseigle, 2013, p. 148 and 154-156). Joffre is a great example of methodical rigidity, because of his craving for glory and his lack of phantasy. Verney (Tardi & Verney, p. 54) asks: "Did Joffre see the superiority of the German heavy artillery and the prudent German defence tactics? Did he know about the exhaustion of the soldiers; did he see the real causes of the shortage of ammunition?" These are rhetorical questions. The answer to all of them is: "No!" Joffre was a master of perceptual defence – Joffre saw nothing! Why? Joffre's tendency to continue a behaviour although it had proved unsuccessful – in this case, not to change the strategy of "l'attaque, l'attaque, toujours l'attaque!" – is typical of people with low self-esteem and a low degree of phantasy. Errors are denied – they are not perceived as errors. At most, some details went wrong or some sub-commanders failed. But the general idea remained untouched, and the belief in its validity was even strengthened.
For persons like General Joffre, everything else – for instance, admitting a substantial error – would be considered an indicator of weakness. So a strategic change is obdurately refused. For such people there is no way out, because of their wavering self-esteem and their lacking phantasy. (Today, such behaviour is discussed under the label of the Dunning-Kruger effect; see Kruger & Dunning, 1999.) The necessity for self-criticism and for thinking always anew in war and in politics is illustrated by nobody as well as by Clausewitz: "War, in general terms, does not consist of an infinite number of small events, the differences of which balance each other and which therefore are mastered better or worse by a better or worse method. It consists of single great problems, which should be handled individually. War is not a field of stalks which, without respect to the features of the single stem, is mown better or worse with a better or worse scythe; it consists of big trees, to which the axe should be applied with reflection, according to the quality and direction of each single tree." (Translation from Clausewitz, 1880, p. 130 f.) You may replace "war" in the first line of the Clausewitz quote by "politics"; for Clausewitz, war is politics with different means. Undoubtedly, Plato would have agreed with Clausewitz immediately. For a political leader it is necessary to be thoughtful; he must be prepared for self-criticism. Those, however, with an unstable, wavering self-esteem avoid self-criticism. For them, self-criticism is a danger – a danger of losing self-confidence. They like fixed rules; a fixed system of rules appears to them to guarantee success. They like confirmation, not critique. "This is not the moment to look back!" Such persons should not become politicians and should not be admitted to leading positions in an army. They do not strive for success in the long term, but for immediate success, immediate glory. Unfortunately, just this kind of person seems to strive for political careers.
– And this is the reason why Plato (Politeia, 520c–d) recommended not to admit such people to political careers. Clausewitz recommended the same for people whose main motive is boosting their self-esteem ("Eigensinn", as Clausewitz, 1880, p. 64, calls it). As such persons cannot cope with failures, they will exhibit a strong tendency to avoid self-criticism. Interestingly, you will find nearly the same description of persons unsuitable for politics in the Bhagavad Gita, part of the ancient Indian epic of the Mahabharata. In the 18th chant, verses 23 to 25 summarise the action theory of the Bhagavad Gita. Here they are, translated by the authors from a German edition of the Bhagavad Gita (Mylius, 1997):

23. The necessary action, done without clinging, without passion and hatred, without striving for glory, is a good action.
24. But the action done out of striving for sensuality or for self-satisfaction, which causes great grievance, is called the passionate action.
25. However, an action without regard for the consequences – one's own losses, damage to others, and one's own strength – done out of blindness, is called the mode of the dark.

It is important not to understand these verses as part of a religious hymn or as a moralising sermon. They are practical recommendations for the organisation of action, and you should ask "why?" Why, for instance, is it recommended to engage only in actions that are necessary? An action should be necessary to reach a goal. The goal must be a political one, not a personal goal such as revenge or retaliation. The goal should be necessary to bring a better reality into being or to prevent a bad one. – Why should necessary actions be done without attachment? Well, you should not cling to your methods; you should not be a stickler for methods. You should be able to alter your methods if they have proven unsuccessful. – Why should you act without passion and hatred?
Well, because passion and hatred blind you! Why should a commander not strive for glory? Because striving for glory contradicts the preparedness for self-criticism. – It is very simple to act in the right way, isn't it? Verses 24 and 25 describe how one should not act. Passion and hatred will blind you, and you will not find the right way: you will not act for the right goals and you will not find the right methods of acting, because there is a high risk that passion and hatred will suggest inappropriate views of reality and inappropriate methods. Verse 25 recommends not to leave the command to people whose mind is "dark" and without light – hence to people who are not very intelligent and often cannot take side- and long-term effects into consideration. In the first years of the Roman Republic, the higher officials, the praetors and consuls, were elected in Rome by the people's assembly. However, those who were admitted to the election were selected by a commission that tried to keep off the above-mentioned persons! – A good rule?

Declaration of conflicting interests: The authors declare no conflicts of interest.

Author note: We would like to say "thank you" to the following persons: Wolfgang Schoppek, who turned our English into an understandable form and contributed a lot of ideas to improve the flow of thoughts; Sigrid Dörner, who found a lot of mistakes in spelling and in the course of the argumentation; Peter Fankhänel, for an idea why people with low self-esteem, when confronted with a failure, increase their self-esteem; and Steffen Haumann, for ideas on how to condense the line of thought.

Peer review: In a blind peer review process, Carlos Kölbl (Bayreuth) and Lars Allolio-Näcke (Erlangen) reviewed this article before publication.
Both reviewers approved the disclosure of their names after the end of the review process.

Handling editor: Alexander Nicolai Wendt

Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

Citation: Dörner, D. & Meck, U. (2022). The red trousers. About confirmative thinking and perceptual defense in complex and uncertain domains of reality. Journal of Dynamic Decision Making, 8, 1–14. doi:10.11588/jddm.2022.1.84578

Received: 29.11.2021 – Accepted: 27.03.2022 – Published: 11.07.2022

References

Archenholtz, J. W. von (1973). Geschichte des Siebenjährigen Krieges in Deutschland. Krefeld: J. Olmes.
Bartsch, C. (2004). Die dunkleren Seiten des Partikelfilters. Frankfurter Allgemeine Zeitung, 142.
Cabanes, B. & Duménil, A. (Eds.) (2013). Der Erste Weltkrieg – eine europäische Katastrophe. Darmstadt: Wissenschaftliche Buchgesellschaft.
von Clausewitz, C. (1880). Vom Kriege. Berlin: Dümmler.
Dörner, D. (1996). The logic of failure. New York: Metropolitan Books.
Englund, P. (1993). Die Marx-Brothers in Petrograd. Berlin: Basis-Druck.
Hoffmann, D. W. (2012). Die Gödel'schen Unvollständigkeitssätze: Eine geführte Reise durch Kurt Gödels historischen Beweis. Berlin: Springer.
Janis, I. (1972). The victims of groupthink. Boston: Mifflin.
Kant, I. (1965). Prolegomena zu einer jeden künftigen Metaphysik. Hamburg: Felix Meiner.
Kruger, J. & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121–1134. doi:10.1037/0022-3514.77.6.1121
Machiavelli, N. (2010). Vom Staate, vom Fürsten, kleinere Schriften. Hamburg: Nikol Verlagsgesellschaft.
Marx, K. & Engels, F. (1946). Manifest der Kommunistischen Partei. Berlin: Verlag Neuer Weg.
Mylius, K. (Ed.) (1997). Die Bhagavad Gita: Des Erhabenen Gesang. München: dtv.
Platon (1990). Werke in 8 Bänden (I–VIII).
Darmstadt: Wissenschaftliche Buchgesellschaft.
Purseigle, P. (2013). Vom Artois bis zur Champagne – kein Durchkommen. In Bruno Cabanes & Anne Duménil (Eds.), Der Erste Weltkrieg (pp. 147–156). Darmstadt: Wissenschaftliche Buchgesellschaft.
Schopenhauer, A. (2011) [1844]. The world as will and presentation, Vol. 2. New York: Routledge.
Sperber, J. (2013). Karl Marx: Sein Leben und sein Jahrhundert. München: C. H. Beck.
Sternberg, R. J. (2004). Why smart people can be so foolish. European Psychologist, 9(3), 145–150. doi:10.1027/1016-9040.9.3.145
Tardi, J. & Verney, J.-P. (2013). Elender Krieg. Bruxelles: Editions Moderne, Casterman.
Tuchman, B. (2001). Die Torheit der Regierenden: Von Troja bis Vietnam. Reinbek: Rowohlt.
Tuchman, B. (2007). August 1914. Frankfurt am Main: Fischer.

Original Research

Strategies, tactics, and errors in dynamic decision making in an Asian sample

C. Dominik Güss¹, Ma. Teresa Tuason², and Lloyd V. Orduña³
¹Department of Psychology, University of North Florida, ²Department of Public Health, University of North Florida, and ³Research & Development Center, University of Baguio

The current study had three goals: (1) to investigate strategies, tactics, and errors as predictors of success and failure under uncertainty following the dynamic decision making (DDM) and complex problem solving (CPS) framework; (2) to use observation and to examine its reliability and potential as a data collection method when using microworlds; and (3) to investigate the applicability and validity of a microworld developed in the West to an Asian sample. One hundred three participants in the Philippines took the role of fire chief in the microworld WINFIRE (Gerdes, Dörner, & Pfeiffer, 1993).
Their strategies, tactics, and errors were observed and coded by experimenters as they worked individually on the simulation twice. Results showed that (1) DDM strategies, tactics, and errors predicted success and failure in WINFIRE, and strategies and tactics that led to success increased while errors decreased over time; (2) strategies, tactics, and errors can be validly assessed through observation by experimenters; specifically, two types of decision makers were identified: the active, flexible, big-picture planners and the slow or cautious, single-focused decision makers; (3) these findings, together with participants' survey ratings, speak for the applicability of the microworld in an East Asian sample and for its validity. Findings are potentially relevant for experts and for training programs, highlighting the benefits of virtual environments during DDM.

Keywords: strategy, tactic, dynamic decision making, complex problem solving, naturalistic decision making, errors, success, cognitive biases, microworld, uncertainty, virtual environment, culture

The fire chief has a problem. Fires are spreading in the forest near the city, and with the current wind strength and direction, they will most likely move toward the city. This is not the only problem, however. North of the city, in the forest, several small fires are starting due to the extreme summer heat. What can the fire chief do? What strategy will the fire chief choose? Should the fire chief distribute the firefighting units and give them commands to quickly extinguish the fires, or order them to clear an area close to the city to stop the fires from spreading? Should the fire chief focus on the most urgent fire first, or try to deal with all the fires at the same time? We presented participants with this situation, simulated as a virtual environment, in order to observe their strategies, tactics, and errors.
Developing and selecting successful strategies and tactics and avoiding errors are important in daily life. A strategy provides a general framework for action, for example: focus on the main fire first. Strategies can be defined as broad directions to meet long-term goals (Güss, 2000) and provide answers to the questions: "What do I want to accomplish, and why?" Tactics, which are more specific, address short-term goals by constructing and executing short-term plans to implement a strategy (Bonissone, Dutta, & Wood, 1994), for example: since the fire is strong, send three fire trucks to the fire. Tactics answer the question: "How can I get things done under the given circumstances?" Both levels, strategic and tactical, are crucial for successful action: tactics are used to react quickly to environmental changes, whereas strategic plans provide more continuity. Strategies play an important role in complex real-life decision making, yet it has been problematic to link them to success or failure, or to describe them as errors or biases. One reason for the difficulty in analyzing errors and success of strategies is related to methodological problems. First, many analyses of errors in naturalistic contexts are retrospective (e.g., Reason, 1990). One starts with the negative unintended outcome and tries to identify the errors that led to it. Such post-accident analyses can fall prey to hindsight bias; i.e., in hindsight it seems easy to identify the strategy that led to failure, when at the time the situation actually occurred, the outcome was not easily foreseeable (Fischhoff, 1975). Further, it is possible for bad outcomes to result from sound decision strategies. Also, given that human beings are part of, and act within, complex, uncertain systems, it is difficult to identify one specific cause of an error (Lipshitz, 2005).
Some researchers argue that errors cannot easily be detected in dynamic environments due to unclear links of causation in the system (e.g., Flach, 1999; Rasmussen, 1990), or that it is difficult to define what the right decision strategy is in an ill-defined situation. As Klein (1999, p. 97) put it, "when we move to natural settings where we don't have tight control of the stimulus conditions we may have difficulty identifying biases and errors." This and other methodological problems have been recognized by Montgomery, Lipshitz, and Brehmer (2005, p. 10), who summarized the state of naturalistic decision making (NDM) methodology and argued, "NDM still lacks generally accepted criteria of rigor that are suitable for its particular nature. Such criteria are essential for guiding the design of good research and the evaluation of research outcomes." However, linking strategies to success or failure is an important task. For instance, survey results have shown that over 70% of accidents in medicine, aviation, and various industries – and this is a conservative estimate – can be attributed to human error (Cook & Woods, 1994). Therefore, the ability to identify successful strategies, tactics, and errors in decision making is key. Exactly this can be done with the complex and dynamic computer-simulated problem scenarios used in the fields of complex problem solving (CPS) and dynamic decision making (DDM) (Dörner, 1996; O'Neil & Perez, 2008). The pitfalls of naturalistic error and strategy analysis – retrospection, hindsight bias, lack of control, and questionable links of causation – can be avoided with standardized computer-simulated problem scenarios.

Corresponding author: C. Dominik Güss, Department of Psychology, University of North Florida, Jacksonville, FL 32225. E-mail: dguess@unf.edu
10.11588/jddm.2015.1.13131 | JDDM | 2015 | Volume 1 | Article 3
Thus, the main goal of this article is to assess strategies, tactics, and errors empirically in a laboratory situation. Like a few researchers in the past, our study investigated strategies, tactics, and errors over time and related them to performance; unlike those researchers, we used observation as a method.

Strategies in dynamic decision making (DDM) and complex problem solving (CPS)

There has been recent interest in using virtual environments (also called microworlds or simulations; these terms will be used synonymously) to study strategies in the fields of DDM and CPS (e.g., Baker & Delacruz, 2008; Brehmer, 2004; Funke, 2003; Güss, Tuason, & Gerhard, 2010). Schmid, Ragni, Gonzalez, and Funke (2011) distinguished between the DDM and CPS research traditions. According to the authors, CPS refers to research mainly conducted in Europe (Broadbent, 1977; Dörner, 1980; Frensch & Funke, 1995), where the primary goal is to investigate how naive subjects deal with complex, nontransparent, and dynamic problems simulated as microworlds. The term DDM is more frequently used in the North American tradition (e.g., Gonzalez, 2005), but is similar to CPS in that it also focuses on environments consisting of connected variables that change over time. Additionally, the DDM tradition might use simpler simulated problems. This does not mean, however, that it is easy for participants to solve them; in fact, participants often do very poorly (e.g., Funke, 1991; Strohschneider & Güss, 1999). Using computerized simulations or microworlds in the study of CPS and DDM, it is possible to develop complex theories of human thought and behavior (e.g., Dörner, 1999; Dörner & Güss, 2013) and to identify which strategies in a specific situation will be more likely to lead to success and which to failure.
In microworlds (Brehmer & Dörner, 1993; Frensch & Funke, 1995; Funke, 2003; Gonzalez, 2005; Schmid, Ragni, Gonzalez, & Funke, 2011), a participant is confronted with a complex, uncertain, and dynamic problem situation, and performance is assessed by automatically recording and saving to a computer file each decision the participant makes, along with changes in the system state. Predictors of individual differences in DDM and CPS and their relation to performance have been widely studied, such as intelligence (e.g., Rigas & Brehmer, 1999; Strohschneider, 1991; Wenke, Frensch, & Funke, 2005), personality variables (e.g., Schaub, 2001), or problem-relevant domain-specific knowledge (e.g., Putz-Osterloh & Lemme, 1987). Such studies have certain limitations, as solely focusing on performance outcomes will not reveal details about the DDM process (Spector, 2008). Therefore, some DDM and CPS researchers have suggested using these computer protocols to examine behavior patterns (Fischer, Greiff, & Funke, 2012; Strohschneider & Güss, 1999), strategies (Schoppek & Putz-Osterloh, 2003; Wüstenberg, Greiff, Molnár, & Funke, 2014), or common tactical failures (e.g., Dörner, 1996; Ramnarayan, Strohschneider, & Schaub, 1997). Thus, the goal of the current study is to address the need expressed by researchers to study strategies and errors systematically, specifically their role in DDM (e.g., Qudrat-Ullah, 2008; Schoppek & Putz-Osterloh, 2003). With particular focus on the process of DDM (as opposed to the relationship of individual differences to outcomes), the current study investigated strategies, tactics, and errors leading to success or failure by observing participants as they worked on a microworld. We will discuss our research using the term DDM, although the microworld WINFIRE has been used in both the CPS and DDM traditions. It was postulated that strategies and tactics would be adapted to the situation over time.
Successful strategies and tactics would increase in frequency, and errors would decrease in frequency. Before discussing specific strategies, tactics, and errors, we will describe the WINFIRE simulation in detail, to illustrate its demands and constraints.

Demands of the WINFIRE simulation

In the microworld WINFIRE (Gerdes et al., 1993), a participant takes the role of a fire-fighting commander who tries to protect cities from approaching fires, put out existing fires, and save the forest. Similar fire simulations, e.g., NEWFIRE or the Firechief simulation in Australia (Omodei, McLennan, Elliott, Wearing, & Clancy, 2005) or C3Fire in Sweden (Granlund, 2003; Johansson, Trnka, Granlund, & Götmar, 2010), have been used by other researchers focusing more on the results and the influence of outside variables (such as personality traits or intelligence) or simulation variations than on decision-making process analyses. Simpler versions of the FIRE scenario have been used in previous research to investigate DDM strategies. Schoppek (1991), for example, administered the scenario five times to 22 German participants, analyzed strategies in the log files (e.g., duration of the scenario, number of commands, and other, more sophisticated indices, such as initial distribution of fire-fighting units), and showed that a variety of problem-solving operations correlated with the use of multiple learning strategies to prepare for exams. Dörner and Pfeifer (1991) in Germany compared 20 participants who worked under noise stress with 20 participants who worked without stress on five versions of FIRE. They found that although the two groups did not differ significantly in performance, they showed different strategic approaches: stressed participants, for instance, gave more goal commands to individual units and fewer goal commands to several units. In their study, however, the relationship between different strategic behaviors and performance was not examined.
The design of the current WINFIRE scenario requires participants to distribute their resources, puts the decision maker under time pressure, and demands quick and decisive actions. According to the microworld criteria discussed previously, WINFIRE is moderate in complexity (it consists of many variables), high in dynamics (it develops in a nonlinear way), and moderate in transparency (unknown new fire locations and unforeseen wind changes). Specifically, the complexity results from having to select from four main (and three other) command options for each of 12 units at any given time. WINFIRE is highly dynamic because the situation changes even without any intervention from the participant; in 11 minutes, 15 fires break out at programmed times, and changes in wind direction and strength are programmed as well. WINFIRE is moderate in transparency because fires start at staggered, unknown times and locations. The presumed DDM core strategies in WINFIRE consist of assessing situations quickly and identifying crucial situations, prioritizing, flexible planning that considers resource allocation and situational demands, and quick decision making that takes long-term effects into account to avoid further escalation of the problems.

Participant observation

Extant research describing dynamic decisions in a fire simulation (Dörner & Pfeifer, 1991; Schoppek, 1991) referred to decisions of participants automatically saved in log files. These log files accurately reflect each decision of every participant. One disadvantage of these log files is that they mostly indicate tactics and system data (e.g., number of commands, duration of a simulation), as opposed to strategies.
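This limitation of log files can be made concrete with a small sketch. The column names and file format below are our invention for illustration, not the actual WinFire log format: tactical indices such as command counts per unit fall straight out of such a file, but the on-screen context that gives a command its strategic meaning (fire size, unit proximity, alternative units) is never recorded in it.

```python
import csv
import io

# Hypothetical log-file rows (illustrative format, not WinFire's own):
# each row records one command with a timestamp and a state snapshot.
log_csv = (
    "time_s,unit,command,target_x,target_y,percent_forest_green\n"
    "12,truck_3,goal,14,7,100.0\n"
    "18,truck_3,extinguish,14,7,99.2\n"
    "30,heli_1,patrol,5,5,98.7\n"
)

rows = list(csv.DictReader(io.StringIO(log_csv)))

# Tactical/system indices are trivial to compute from the file...
n_commands = len(rows)
commands_per_unit = {}
for r in rows:
    commands_per_unit[r["unit"]] = commands_per_unit.get(r["unit"], 0) + 1

# ...but nothing here says whether sending truck_3 was a *good* decision
# in context, which is why the study adds human observation.
```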
The current study attempted to use observation in order to assess tactics and strategies (like some in Schoppek's log-file analysis, 1991) using a bigger sample size. Using observation, one can assess a decision in relation to the specific context shown on the computer screen. Using one helicopter can be "good" or "bad"; it depends on the context – for example, on how big a fire is, how distant the helicopter is from the fire, and whether other nearby units would be available. This context is sometimes missing if one solely analyzes the log files. Whereas observation is one of the key methods used in analyzing naturalistic decision making (NDM) in the field (e.g., Lipshitz, 2005), it has rarely been used to assess strategies and tactics in the study of microworlds in the laboratory. Thus, another goal of the current study is to assess the applicability of participant observation as a method of data collection using the WINFIRE scenario.

The selection of DDM strategies

The key demands of WINFIRE are related to situation assessment, planning, and decision making under time pressure and ever-changing conditions. Planning and decision making are probably the key steps in the DDM process (e.g., Dörner, 1996; Funke, 2010). Planning can be defined as the development of a course of action aimed at achieving a goal (Güss, 2000; Hayes-Roth & Hayes-Roth, 1979). Decision making, then, refers to the selection and execution of a course of action. Participants have to take the described constraints in WINFIRE (e.g., slow-moving trucks, wind strength and direction, lack of water) into consideration and plan how to distribute and allocate their resources to mitigate forest fires.
To identify potentially successful and unsuccessful strategies for the current study, we used both top-down (theoretically guided) and bottom-up (empirically guided) approaches: top-down, because theoretically they are based on the planning and decision-making steps of DDM; bottom-up, because strategies, tactics, and errors are triggered by the context of WINFIRE and were operationalized and identified within this context. The strategies, tactics, and errors were observed when participants worked on the simulation in pilot studies. The following set of four strategies and tactics is related to planning, because they involve decisions based on predictions about future developments of the system environment – whether more proactive general strategic planning or more reactive tactical planning in specific circumstances. The planning strategies and tactics were expected to correlate positively with success (i.e., percent of remaining forest area at the end of the simulation). The second set consists of four errors, because they involve immediate decisions that are not adequately adapted to the current state of the environment. The errors were expected to correlate negatively with success. One exception was the strategy "effect control and flexible strategy": we could not make a clear prediction, because a sudden change in behavior pattern can be either positively or negatively associated with performance, depending upon the change.

Planning strategies and tactics

• Proactive strategic planning: active distribution of units, operationalized as distribution of fire-fighting units before the first fire starts in cycle 4, and use of the patrol command, which makes units patrol a specified area independently throughout the simulation. This strategy is expected to correlate positively with performance, as it takes time for units to travel from one location to another.
If units are not well distributed to cover the entire terrain, they may be too far from an emerging fire, allowing the fire to spread before they get there and making it harder to contain. The patrol command can be used to assign areas for units to patrol independently and search for fires, which can minimize potential travel time to emerging fires.
• Flexible strategy: a sudden change in behavior pattern, for example, suddenly selecting several units at once and giving them the same command. (Previous research has shown how stress can reduce flexibility in problem solving, e.g., Renner & Beversdorf, 2010.) A flexible strategy usually indicates the realization that an executed problem-solving approach did not show the expected result, so a new problem-solving strategy is executed to address the problem situation.
• Tactical planning: number of helicopters sent to a fire in the first 10 seconds after it starts. When a fire starts, a participant needs to plan. The participant has to detect the fire, set the goal of extinguishing it, predict how long it takes a unit to reach the goal, and predict the spreading of fires due to wind. The participant must select one or more units (depending on their location and the strength of a specific fire), give them the goal command, and indicate where they need to move. The participant must consider the length of units' travel times, the wind direction, and the spreading of the fires, and decide upon the appropriate number of units to send to a fire to control it – without concentrating too many units in one place, which leaves areas vulnerable to other emerging fires.
• Tactical planning: number of trucks sent to a fire in the first 10 seconds after the fire starts.
Decision-making errors

• Single-focus strategy and lack of multi-tasking: number of decisions related to one fire only while other fires have been burning for more than 5 seconds (counted for each of the 9 observation cycles in which new fires started; minimum 0, maximum 9). (Multitasking has been described as crucial for many dynamic tasks, e.g., Hunt & Joslyn, 2000; Salvucci & Taatgen, 2008.) While focusing on one fire, one cannot neglect the other burning fires.
• Incorrect order or function of commands: number of tactical decision-making errors, such as sending a unit to a fire and forgetting to click the extinguish command, mixing up commands such as clicking the patrol command instead of the extinguish command when attempting to extinguish a fire, or clicking on extinguish without selecting a specific fire-fighting unit to receive the command. (These errors are related to slips and lapses as described by Reason, 1990; slips are actions that are carried out, but not in the intended way, and lapses are omissions in a sequence of actions.) Commands have to be given in a certain form and sequence; tactical mistakes in executing commands will not lead to the intended results.
• Perceptual inaccuracy leading to inappropriate action: number of units sent to a burned field. (Once a forest area has burned, it will remain grey; thus, sending units to burned areas instead of burning areas is a waste of resources and will have no effect.)
• Lack of foresight and impulsivity: number of distant units sent to a fire instead of nearby units. (Other research has shown that impulsivity was related to lower performance and higher response frequencies in a dynamic task; Quiroga, Martínez-Molina, Lozano, & Santacreu, 2011.) As previously mentioned, it takes units a relatively long time to travel to a fire location.
We observed participants who, perhaps due to stress, sent distant units to a fire when they could have sent a nearby unit. By the time the unit reached the fire, the fire had spread.

Cultural applicability

Although several cross-cultural studies in the field of DDM exist (e.g., Güss, Tuason, & Gerhard, 2010; Strohschneider & Güss, 1999), most of the research has been conducted in Western Europe and the United States. For psychology as a science, however, it is imperative to study individuals across the world and to test theories and methods in various cultural and ethnic groups. Using the WINFIRE simulation, research has shown that even the perception of the problem differs among participants from different countries (Güss, Glencross, Tuason, Summerlin, & Richard, 2004). Brazilian participants, for example, perceived FIRE as more complex and more difficult compared to Indian and US participants. The current investigation of DDM strategies, tactics, and errors was conducted with an Asian sample, specifically in the Philippines. The applicability and validity of a microworld in the Filipino sample will be investigated. Besides gathering WINFIRE data and analyzing observation data, survey questions about the simulation will be distributed to assess what participants think of the simulation.

Goals of the current study

In summary, the current study has three goals. First, it addresses the need to systematically investigate strategies, tactics, and errors in DDM over time and as predictors of performance. The four planning strategies and tactics are expected to correlate positively with performance, and the four decision-making errors are expected to correlate negatively with performance. Second, as observation is used as a method to assess strategies, tactics, and errors in their context, the study aims to examine the reliability of observation and its potential as a data collection method.
Third, to increase the generalizability of the findings, the applicability and validity of the microworld WINFIRE, developed in the West, is tested in an Asian sample.

Method

Participants

Participants were 103 undergraduate students with various majors of study from two universities in the northern Philippines. Fifty-nine percent of the participants were female, 41% were male. Their ages ranged from 18 to 35 years (M = 20.0, SD = 2.59). Computer simulation data from seven participants were incomplete (i.e., from 6 participants we have only one of the two FIRE data sets; from one participant, neither of the two FIRE data sets was saved). We have observation data from all but one participant. When performance was initially regressed on strategies and tactics, three outliers in FIRE A and three in FIRE B were deleted due to their Mahalanobis and Cook's distances. Thus, the final data set comprised 96 participants in FIRE A and FIRE B.

Instruments

WINFIRE. WINFIRE (Gerdes et al., 1993) was used as the research instrument and was already briefly described in the introduction. On the computer screen in WINFIRE, participants see the forest, cities, red fire-fighting trucks, yellow helicopters, lakes, a black stony area, and the command options. If fire-fighting units can extinguish a burning field in time, the field will remain green. If they come too late, the field will be completely burned and will turn grey. Wind strength and direction determine how fast fires will spread and in which direction. Participants are given eight command options. By selecting a unit and clicking the goal command, participants can send a unit to a specific location, for example, to a fire, or to a lake to fill up with water. When units receive the search command, they independently search for fires in their vicinity. Units that receive the extinguish command extinguish fires, and units that receive the clear command cut trees to create a barrier that prevents the spreading of fires.
The performance criterion in WinFire is the proportion of saved forest at the end of the game. At the end of each time step, the percentage of unburned forest is determined and saved in a computer file. The 11-minute WinFire simulation consists of 111 time steps (trials), each lasting 6 seconds, after which the proportion of burning forest is updated. For the participants, the situation seems to develop continuously. During the first 2 minutes of the simulation, no fires start, giving participants time to familiarize themselves with the situation and to distribute fire trucks and helicopters strategically. Then a few small fires start, which can be contained easily because there is little wind. After 4 minutes of the simulation, fires start simultaneously and spread because of the wind strength. In total, 15 fires (some starting on neighboring fields) are programmed to start at different times. The same simulation was played twice by each participant because we were interested in seeing whether and how strategies, tactics, and errors would change between WinFire A and B. WinFire A and WinFire B were exactly the same simulation. Due to its difficulty, participants performed very poorly in WinFire A. Without any intervention, 45.18% of the forest would be saved at the end of the simulation. Overall, participants saved 50.06% (SD = 8.89) of the forest in WinFire A. If we excluded the four outliers whose performance scores were more than 2 standard deviations above the mean during WinFire A, the proportion of saved forest at the end of WinFire A would be only 48.44% (SD = 4.00). Thus, the simulation was very difficult for the participants, even after reading the instructions and practicing in a test game. Despite the simulation's difficulty, participants' performance improved in the second simulation.
They saved 63.18% (SD = 17.83) of the forest at the end of the WinFire B game. Thus, using the same simulation twice made it possible to see how well participants performed and whether they tried the same or different strategies or tactics, or made the same or different errors.

Coding system and training. Experimenters were trained to identify the four planning strategies and tactics and the four decision-making errors, and to code them accurately. The two authors trained the three Filipino experimenters on the coding system. The training, which took one week, consisted of doing practice coding together as a team and then coding individually and assessing the reliability of the coding. For each participant, two experimenters sat behind them while they worked on WinFire and used the coding system to code each decision participants made and certain predefined resulting events. The 11-minute WinFire time was divided into 22 cycles, each lasting 30 seconds (a time step in WinFire lasted 6 seconds). At cycles 4, 5, 7, 9, 10, 12, 13, 14, and 15, fires started. The strategies and tactics coded were described in the introduction. Interrater reliability was always calculated for the two experimenters present at each observation session. Percent agreement and kappa were calculated. According to Fleiss (1981), a kappa between .40 and .60 is fair, between .60 and .75 is good, and higher than .75 is excellent. Overall, the kappas ranged from fair to excellent (see Table 1).

Procedure

Initially, each participant received a 3-page instruction sheet explaining the context of WinFire and the command options, including a screenshot. Participants kept the sheet for the duration of the experiment. After reading the instructions, participants played a 10-minute test version of the WinFire simulation to familiarize themselves with the commands and the screen.
Two experimenters were seated about 3 feet slightly behind the participant and were instructed to explain the commands in the test game and to answer any questions participants had regarding the nature of the commands. However, experimenters were not allowed to give any form of advice on how to work on or succeed in the simulation. Then each participant worked on exactly the same WinFire simulation twice. While participants worked on WinFire, experimenters were instructed to observe participants' mouse clicks and actions on the screen and to fill out the coding sheet. Two kappas raised questions: kappa = .39 for sudden change in strategy and kappa = -.02 for sending a unit to a burned field. It is important to consider, however, that these two categories were defined and coded as yes/no, whereas the other categories were numerical counts. For sudden change in strategy, the interrater agreement was actually 97% (see Table 1): in 96 of 100 cases both raters agreed on "no" and in 1 case they agreed on "yes"; only 3 times out of 100 did one rater code "yes" and the other "no." A similar pattern was found for sending a unit to a burned field (97% agreement). Thus, despite the kappas, the interrater agreement for these two categories can be considered very high. For such a complex coding system, the kappas and agreement percentages can be regarded as highly satisfactory, probably due to the long and intensive training given to the experimenters.

Results

Before we describe strategies, tactics, and performance, we briefly provide information about participants' perception of the WinFire simulation. The objective characterization of the WinFire problem situation as moderate in complexity, high in dynamics, and moderate in transparency was validated by participants' subjective perceptions, assessed through rating scales, and by participants' performance.
After completing both versions of WinFire, participants answered a 28-question survey using a 7-point Likert scale (1 = yes to 7 = no) regarding the characteristics of WinFire. Referring only to questions with extreme mean ratings above 6 or below 2, participants perceived that: small actions had big consequences (M = 1.65, SD = 1.09), the simulation was close to real life (M = 1.80, SD = 0.95), a single command could have a huge impact (M = 1.92, SD = 1.30), the simulation described a realistic situation (M = 1.96, SD = 1.14), many factors came together and influenced each other (M = 1.73, SD = 0.97), and it was important to be successful in the game (M = 1.67, SD = 1.04).

Table 1. Planning strategies and tactics and errors, their operationalizations, and interrater reliability.

                                                                      Percent
                                                                     agreement   Kappa
Planning strategies and tactics
  Proactive strategic planning: active distribution of units
    and use of patrol command                                          95.6 %     0.71
  Effect control and flexible strategy: sudden change in
    behavior pattern                                                   97.0 %     0.39
  Tactical planning: number of helicopters sent to a starting fire     90.0 %     0.46
  Tactical planning: number of trucks sent to a starting fire          91.7 %     0.78
Errors
  Single-focus strategy and lack of multi-tasking: decisions
    related to one fire only and not to multiple fires                 95.6 %     0.90
  Incorrect order or function of commands: number of tactical
    decision-making errors                                             86.1 %     0.73
  Perceptual inaccuracy leading to inappropriate action: number
    of units sent to a burned field                                    96.7 %    -0.02
  Lack of foresight and impulsivity: number of more distant units
    sent to a fire instead of nearby units                             93.3 %     0.42
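The combination of very high percent agreement with a low kappa for the rare yes/no codes illustrates a well-known property of Cohen's kappa: when almost all ratings fall into one cell, chance agreement is already very high, so a few disagreements drive kappa down sharply. A minimal sketch (the cell counts are hypothetical, patterned on the reported 96 joint "no", 1 joint "yes", and 3 disagreements):

```python
# Cohen's kappa for two raters on a binary (yes/no) code.
# Counts are hypothetical, patterned on the pattern reported in the text.
def cohens_kappa(yy, yn, ny, nn):
    """yy: both 'yes'; yn: rater 1 'yes', rater 2 'no'; ny: the reverse;
    nn: both 'no'. Returns (observed agreement, kappa)."""
    n = yy + yn + ny + nn
    p_obs = (yy + nn) / n                      # observed agreement
    p1_yes = (yy + yn) / n                     # rater 1 'yes' marginal
    p2_yes = (yy + ny) / n                     # rater 2 'yes' marginal
    p_chance = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)
    return p_obs, (p_obs - p_chance) / (1 - p_chance)

agreement, kappa = cohens_kappa(yy=1, yn=2, ny=1, nn=96)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
# Raw agreement is 0.97, yet kappa is only about .39, because chance
# agreement on the dominant 'no' code is itself about .95.
```

This is why the text argues that, despite the low kappas, the agreement for the two rare categories can still be considered very high.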
Predictive validity: Correlations between strategies, tactics, and errors and overall performance in WinFire A and WinFire B

To measure the predictive validity of the assessed strategies, their frequencies were correlated with success in both versions of the WinFire simulation. Success was defined as the percentage of forest saved at the end of the simulation. Success was better in WinFire B (M = 63.18, SD = 17.83) than in WinFire A (M = 50.06, SD = 8.89), as a paired-samples t-test showed, t(91) = -7.42, p < .001, with a large effect size, Cohen's d = .68. The correlation between performance in WinFire A and WinFire B was r = .34, p < .001. We controlled for potential covariates such as computer skills, years using a PC, and problems with using the mouse. None of these correlated significantly with success; therefore, they were not included as covariates in further analyses. Table 2 shows the correlations between strategies and success in WinFire A and WinFire B. It was expected that the first four planning strategies and tactics would correlate positively with success in both WinFire versions and that the second set of four errors would correlate negatively with success in both WinFire versions. None of the expected correlations was significant for WinFire A (we excluded four extreme outliers for this analysis: participants UB02, UB17, UB23, and SLU23, whose performance was higher than 70% of saved forest area). The non-significant correlations are probably due to a floor effect, with 70% of all participants in WinFire A saving between 41% and 49% of the forest area. For WinFire B, six of the eight correlations were significant; of these six correlations, two were medium and three were large (Cohen, 1988). The direction of all WinFire B correlations was as expected, with the exception of sending a more distant unit to a fire. One would expect this to be a tactical mistake.
It makes more sense and is more economical to send a unit that is close to a fire to the fire location. Further examination revealed that sending a more distant unit to a fire correlated positively with the number of trucks sent to a starting fire (r = .24, p = .02). These data indicate that participants corrected their mistake and sent other trucks along with the distant unit to the fire. Although this is an error, in terms of performance it was regarded as successful because ultimately the participants were still sending units to a fire: they sent other, closer units after sending the distant ones first and realizing that the far units are relatively slow. Two categories did not correlate significantly with success in either WinFire version: sudden change in strategy and sending a unit to a burned field, possibly due to their low frequency, i.e., below 2 in both WinFire simulations (see Figure 1). Additionally, a sudden change can be either positively or negatively related to performance, depending on the change made.

WinFire A: Cluster analysis

In a next step we conducted cluster analyses for WinFire A and WinFire B to identify specific subgroups in our sample according to the strategies and tactics used and to determine how well the selected strategies and tactics predicted success in WinFire A. As for the correlations before, we excluded four extreme outliers for the cluster analysis (participants 2, 16, 22, and 54, whose performance was higher than 70% of the saved area). We used the two-step cluster-analysis approach because we did not know the number of clusters in advance, because it is a relatively robust technique, and because it is a novel method that addresses weaknesses of the k-means and hierarchical clustering methods (Bacher, Wenzig, & Vogler, 2004).
Although two-step cluster analysis can also incorporate categorical independent variables, according to the authors it performs especially well if all variables are continuous. In our study we used eight variables, all of them continuous. The cluster analysis for WinFire A resulted in two clusters. Cluster 1 comprises 37.4% (n = 34) of the participants and cluster 2 comprises 62.6% (n = 57) of the participants. The two most influential predictors were tactical planning: number of trucks sent to a starting fire (importance = 1.00) and proactive strategic planning: active distribution of units and use of patrol command (importance = 0.75). The means of the four strategies and tactics and four errors in the two cluster groups are presented in Table 3.

Table 2. Planning strategies and tactics and errors and their Pearson correlations with performance in WinFire A and WinFire B.

                                                                  Performance in   Performance in
                                                                     WinFire A        WinFire B
Planning strategies and tactics
  Proactive strategic planning: active distribution of units
    and use of patrol command                                           .16             .31**
  Effect control and flexible strategy: sudden change in
    behavior pattern                                                    .05             .12
  Tactical planning: number of helicopters sent to a starting fire      .12             .58***
  Tactical planning: number of trucks sent to a starting fire           .07             .55***
Errors
  Single-focus strategy and lack of multi-tasking: decisions
    related to one fire only and not to multiple fires                  .02            -.47***
  Incorrect order or function of commands: number of tactical
    decision-making errors                                             -.13            -.25*
  Perceptual inaccuracy leading to inappropriate action: number
    of units sent to a burned field                                     .04            -.15
  Lack of foresight and impulsivity: number of more distant units
    sent to a fire instead of nearby units                             -.15             .18†

Note. † p < .09. * p < .05. ** p < .01. *** p < .001.
We can characterize group 1 as active, flexible, big-picture planners and decision makers. Group 2, which consists of almost two thirds of the participants, can be characterized as slow or cautious, single-focused decision makers. Comparing the performance of cluster group 1 (M = 49.11, SD = 5.03) and group 2 (M = 48.07, SD = 3.23) with an independent-samples t-test, group 1 did not perform better than group 2, t(88) = 1.19, p = .24. With the exceptions of sending a distant unit to a fire and sending a unit to a burned field, the means confirm the hypotheses regarding the correlations of strategies, tactics, and errors with performance. Looking at the results and means of the two cluster groups, one could have the impression that action leads to success. Simple activity, however, defined as the total number of commands given to units during the WinFire simulation, did not correlate significantly with success (r = .12, p = .23 for WinFire A and r = .11, p = .28 for WinFire B), showing further that it was the specific strategies, tactics, and errors, and not simply sending many units around without plan or focus, that predicted success.

WinFire B: Cluster analysis

Another two-step cluster analysis was conducted for WinFire B. This cluster analysis also resulted in two clusters. Cluster 1 comprises 36.8% (n = 35) of the participants and cluster 2 comprises 63.2% (n = 60) of the participants. The two most influential predictors were single-focus strategy and lack of multi-tasking: decisions related to one fire only and not to multiple fires (importance = 1.00) and incorrect order or function of commands: number of tactical decision-making errors (importance = 0.37). The means of the four strategies and tactics and four errors in the two cluster groups are also presented in Table 3. Group 1, which consists of about one third of the participants, can be characterized as slow or cautious, single-focused decision makers (similar to group 2 in WinFire A).
We can characterize group 2 (which is now the higher-performing group, as we will see, and similar to group 1 in WinFire A) as active, flexible, big-picture planners and decision makers. Comparing the performance of cluster group 1 (M = 54.88, SD = 11.72) and group 2 (M = 68.35, SD = 19.03) with an independent-samples t-test showed that group 2 performed significantly better than group 1, t(91.84) = -4.25, p < .001. With the exception of sending a distant unit to a fire, the means of the other seven variables reflect the hypotheses about which strategies and tactics and which errors should correlate with performance.

Changes between WinFire A and WinFire B

Comparing WinFire A and WinFire B, an interesting question is to what extent participants stayed in the respective successful and non-successful clusters, and whether participants moved from the non-successful to the successful cluster. Results of a chi-square test show marginally significant changes, χ²(df = 1, N = 89) = 2.65, p = .10. Most of the participants who were successful in WinFire A were also successful in WinFire B (n = 24 of the 33; see Table 4). Nine of the 33 were successful in WinFire A and weak in WinFire B. Fewer than one third of all participants performed poorly in both WinFire A and WinFire B (n = 25). Interesting to note is the group that performed poorly in WinFire A but then performed well in WinFire B (n = 31). All in all, roughly one third of all participants performed poorly in both WinFire simulations, one third performed well in both, and one third improved, first performing poorly in WinFire A and then getting better in WinFire B.

Strategies, tactics, and errors in WinFire A and WinFire B

The means of the strategies, tactics, and errors in the first and second WinFire simulations are presented in Figure 1 and Table 5.
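The chi-square statistic for the 2 × 2 cross-classification in Table 4 can be recomputed directly from the reported cell counts. The sketch below (Pearson chi-square without continuity correction; the erfc identity gives the df = 1 p-value) reproduces the reported χ² of 2.65:

```python
import math

# Pearson chi-square for a 2x2 table, without continuity correction.
# Observed counts are the cell frequencies reported in Table 4:
# rows = WinFire A cluster (strong, weak), cols = WinFire B (weak, strong).
obs = [[9, 24],
       [25, 31]]

n = sum(sum(row) for row in obs)
row_tot = [sum(row) for row in obs]
col_tot = [sum(row[j] for row in obs) for j in range(2)]

# Sum of (observed - expected)^2 / expected over the four cells,
# with expected = row total * column total / N.
chi2 = sum((obs[i][j] - row_tot[i] * col_tot[j] / n) ** 2
           / (row_tot[i] * col_tot[j] / n)
           for i in range(2) for j in range(2))

p = math.erfc(math.sqrt(chi2 / 2))   # exact p-value identity for df = 1
print(f"chi2(1, N={n}) = {chi2:.2f}, p = {p:.3f}")
```

Running this yields χ²(1, N = 89) = 2.65 and p ≈ .10, matching the values reported in the text.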
The four planning strategies and tactics (left in Figure 1) were expected to enhance success (with the exception of "effect control and flexible strategy"), while the four errors (right in Figure 1) were expected to be detrimental to success.

Table 3. Means of the tactics and errors for the two cluster groups in WinFire A and WinFire B (two separate cluster analyses). The two cluster groups in both WinFire A and WinFire B differ on all eight strategies, tactics, and errors, except the ones in the grey underlined fields of the original table.

                                                                   WinFire A           WinFire B
                                                                 Group 1  Group 2   Group 1  Group 2
                                                                  37.4 %   62.6 %    36.8 %   63.2 %
Planning strategies and tactics
  Proactive strategic planning: active distribution of units
    and use of patrol command                                      1.94     0.30      2.26     2.60
  Effect control and flexible strategy: sudden change in
    behavior pattern                                               0.62     0.16      0.49     1.27
  Tactical planning: number of helicopters sent to a
    starting fire                                                  1.85     0.56      1.11     2.92
  Tactical planning: number of trucks sent to a starting fire      5.62     1.70      3.06     6.65
Errors
  Single-focus strategy and lack of multi-tasking: decisions
    related to one fire only and not to multiple fires             3.85     4.28      5.80     1.15
  Incorrect order or function of commands: number of tactical
    decision-making errors                                        14.79    20.98     15.20     6.97
  Perceptual inaccuracy leading to inappropriate action:
    number of units sent to a burned field                         0.76     0.49      0.89     0.27
  Lack of foresight and impulsivity: number of more distant
    units sent to a fire instead of nearby units                   1.06     0.09      0.43     1.07

Table 4. Distribution of poor and strong performance according to the two clusters in WinFire A and WinFire B.

                                          Cluster WinFire B
                                      "Weak"        "Strong"
Cluster WinFire A                   performance    performance    Total
  "Strong" performance                   9             24           33
  "Weak" performance                    25             31           56
  Total                                 34             55           89

Results showed that the four successful planning strategies and tactics increased in WinFire B compared to WinFire A, and the two most frequent errors decreased from WinFire A to WinFire B. The most frequent category was number of tactical errors, for example, sending a unit to a fire and forgetting to click the extinguish command, or clicking extinguish without selecting a specific firefighting unit to receive the command.

Figure 1. Means of the strategies, tactics, and errors in WinFire A and WinFire B.

Strategies, tactics, and errors in different cycles of WinFire A and WinFire B

To further investigate decision making during a dynamic task, researchers have shown that it is beneficial to scrutinize the task and look at specific parts and events (e.g., Sohn, Douglass, Chen, & Anderson, 2005). Figure 2 shows the development of successful planning strategies and tactics over time in both WinFire simulations. Figure 3 shows the development of errors over time in both WinFire simulations. We expected successful planning strategies and tactics to increase, i.e., to be higher in WinFire B than in WinFire A (Figure 2). Indeed, the three expected planning strategies and tactics increased in WinFire B. Comparing the frequency of planning strategies and tactics in both WinFire simulations (calculating repeated-measures ANOVAs and reporting only the main effects for WinFire version), units were more actively distributed at the beginning of the simulation and received more patrolling commands, F(1,188) = 30.23, p < .001, η²p = .14; tactical planning was more frequent, i.e., the use of helicopters increased, F(1,188) = 12.17, p = .001, η²p = .06; and more firefighting units were sent to fires, F(1,188) = 11.34, p = .001, η²p = .06.
We expected errors to decrease, i.e., to be lower in WinFire B than in WinFire A (Figure 3). Tactical errors decreased, and focusing on only one fire also decreased, comparing the first and second WinFire simulations. Comparing the development of errors in both WinFire simulations (calculating repeated-measures ANOVAs and reporting only the main effect for WinFire version), the number of tactical errors declined, F(1,188) = 31.48, p < .001, η²p = .14, and the focus on fighting several fires simultaneously instead of only one fire increased in WinFire B, F(1,188) = 7.61, p = .006, η²p = .04.

Figure 2. Means of the three most frequent successful planning strategies (solid lines) and tactics (dotted lines) in WinFire A (blue) and WinFire B (red). Cycles marked with a red bar indicate the start of new fires.

Figure 3. Means of the two most frequent errors in WinFire A (blue) and WinFire B (red). Cycles marked with a red bar indicate the start of new fires.

Discussion

The current study had three goals. First, it addressed the need for a systematic and precise investigation of strategies, tactics, and errors in DDM over time, relating those to performance. Second, the potential of observation as a method in DDM was examined. Third, the applicability and validity of a microworld developed in the West was investigated in an Asian sample.

Strategies, tactics, errors, and performance

Results showed interesting patterns in strategies, tactics, and errors, specifically with regard to planning and decision making. For example, tactical decision-making errors were quite frequent under these time-pressure conditions, and sudden changes in strategy were rarely observed.
Participants mostly stuck to what they did and did not change their strategic approach, so that one third of all participants were consistent in their poor performance and another third were consistent in their good performance. This finding is consistent with other research highlighting mental sets, i.e., a fixed way of solving problems (e.g., Luchins & Luchins, 1959). Specifically for the third of participants who performed poorly both times, this shows a lack of strategic flexibility in adapting strategies to changes in the environment (e.g., Cañas, Antolí, Fajardo, & Salmerón, 2005). The last third of our participants, the group whose performance improved, switched from a slow, cautious, single-focused strategy to an active and flexible strategy. This confirms what Rasmussen (1990, p. 458) pointed out as a necessity: "dynamic shifting among alternative strategies is very important for skilled people as a means to resolve resource-demand conflicts." The observed strategies, tactics, and errors correlated with performance in the expected direction in WinFire B and could explain variance in performance. Correlations and cluster analyses revealed that successful participants, compared to unsuccessful participants, showed more proactive planning, more active decisions, more multi-tasking, more flexibility, and fewer decision-making errors (i.e., single-focus strategy, impulsivity, incorrect order). It is important to highlight that decision-making activity alone did not correlate with performance. This clearly supports the notion that what is important is not the number of activities or responses to a problem per se, but the intentionality and directedness of the activity toward solving the problem. Results showed the predictive validity of the postulated strategies, tactics, and errors and highlight the importance of strategies and tactics in the problem-solving process (see, e.g., Anderson, 2005; Newell & Simon, 1972; Rieskamp, 2008).
One can assume that the strategies that were successful in WinFire will also lead to success in other complex and dynamic problem situations. For example, multi-tasking strategies and tactical plans that lead to problem-solving success were also identified in several other studies (e.g., Blackie & Maharg, 1998). We come back to this point when we discuss the applicability of our findings.

Observation as a method in DDM

The second goal of this study was methodological, i.e., to investigate whether observation of strategies and tactics by experimenters can be a robust method in the field of DDM. Oftentimes, researchers rely on the saved computer log files of participants to analyze their behavior (e.g., Güss & Dörner, 2011; Schoppek, 1991; Wüstenberg et al., 2014); while doing so yields pertinent data on the system and on tactics, "bigger contextualized behaviors" and strategies can hardly be captured. The results of the participant observations have shown that strategies and tactics can be identified and coded, thus allowing researchers to study DDM processes in their context rather than focusing solely on DDM outcomes or the saved log files. In this study in particular, inter-rater agreement was acceptable to high. Such credibility and reliability of the coding method is probably due to the time-intensive training the experimenters received. Although computer programs can nowadays be written to analyze these huge data sets, and probably also some strategies, tactics, and errors as in our case (see also Wüstenberg et al., 2014), many strategies, tactics, and errors still refer to combinations of several variables over time in a specific context shown on the computer screen, and are thereby difficult to program.

Applicability of WinFire in an Asian sample

Most studies involving WinFire and similar simulations have been conducted in Australia, Germany, Sweden, or the United States.
To increase the generalizability of research findings and to test the applicability of WinFire, we had a relatively big sample of students in the Philippines work on the simulation. One indicator of the applicability of WinFire in this sample is that strategies, tactics, and errors correlated with performance in the expected direction. It was also possible to distinguish successful from unsuccessful participants in WinFire on the basis of their patterns of strategies, tactics, and errors. In WinFire A, the most important predictors of the cluster analysis were tactical planning and proactive strategic planning. In WinFire B, the most important predictors of the cluster analysis were multi-tasking and fewer tactical decision-making errors. In this Asian sample, two kinds of planners and decision makers stand out: those who are active, flexible, and see the bigger picture, and who therefore perform successfully, and another group who are slow, more cautious, and fixated on a single focus, and who then perform poorly. Another indicator of the simulation's applicability is the survey results. All of the objective criteria of WinFire are confirmed in the participants' subjective assessment of the simulation as describing a realistic situation, where small actions or a single action had great impact and consequences, and where many factors came together and influenced each other, leading to success or failure.

Limitations and recommendations for future research

Although the current study showed high inter-rater reliability of the strategy and tactic observations, it could well be that other operationalizations of strategies in other simulations are more prone to inter-rater bias.
Researchers working with microworlds can rely on observation and, as we did here, use it concurrently with saved log files of participants' decisions and saved changes in the system to analyze strategies (see, e.g., Frensch & Funke, 1995; Gonzalez, 2005; Güss, 2011) or processes (e.g., Koop & Johnson, 2011), and performance. Although saved data might not always allow for investigations of more complex behavior patterns or strategies, they are certainly highly accurate and reliable and allow for different kinds of process analyses (see Baker & Delacruz, 2008). We acknowledge that several tactics we operationalized in this study could potentially be derived from log files, provided the output files are programmed accordingly. Correlations between observations and log-file data would add to the validity of future studies. Another limitation related to the method of observation is that strategies, tactics, and errors related to one decision-making step are easier to observe than those related to another. For example, goal definition can be inferred in WinFire by observing participants commanding units to go to certain locations. What cannot be observed, however, is how the participant came up with this particular goal and whether the participant had alternative goals in mind or goal conflicts. It is similarly hard to observe behavior related to information collection. For instance, eye-tracking methodology would be required to collect data on the specific aspects of the computer screen a participant focused upon, i.e., which city, which fire trucks, and which fires. In addition to the saved decision logs, one could also use thinking-aloud protocols to gain information about all decision-making steps (Güss et al., 2010). Thus, our focus on the planning and decision-making steps is partly due to the constraints of the observation methodology we chose.
Relevance and applicability of findings to training and other domains

The primary focus of WinFire is not to simulate all the details of fires and the physical environment accurately, but to observe decision making under information overload and extreme time pressure (Omodei et al., 2005, using the FireChief simulation, argued the same). Thus, one can expect that the strategies, tactics, and errors identified in the WinFire context can also be found and observed in a range of other time-pressured dynamic situations, such as military or plant operations: tactical planning and being prepared for an emergency; having flexible and alternative plans of action in case the current approach is not successful; decisive and quick actions adjusted to the situational demands; multi-tasking and dealing with different problems at the same time; avoiding slips and lapses; and information collection and perception of key problems. These DDM strategies represent domain-general competencies (see also Greiff & Funke, 2009; Güss, 2010). Especially because the WinFire simulation has been used as a powerful training instrument in management (Motowidlo, Dunnette, & Carter, 1990), in medicine and aviation (Thomas, 2009), and in team coordination among fire emergency responders (Toups, Kerne, Hamilton, & Shahzad, 2011), our findings on the kinds of decision makers and planners that lead to success or failure may provide further information and insight for training. Specifically, the findings show that successful performance is related to increasing proactive strategic planning, increasing tactics for carrying out decisions and estimating resources, and decreasing errors such as fixating on a single focus when there are several other demands as well. Microworlds and virtual environments allow decisions that result in catastrophic consequences, yet only virtual catastrophic consequences (e.g., O'Neil & Perez, 2008).
Future training programs, possibly using microworlds, could focus, for example, on strategy selection and application, and show how certain strategies and tactics enhance success while others reduce it (e.g., Gonzalez, 2005; Vollmeyer, Burns, & Holyoak, 1996), taking into consideration the unique demands of the simulated situations. This would build a strategy repertoire, allowing people to better perceive the circumstances to which a strategy best applies (see Schunn, McGregor, & Saner, 2005). Learning in a virtual environment to expand one's planning and decision-making strategies, to be more proactive, to use more foresight, to attend to several things at the same time, and to refrain from impulsive action will hopefully transfer to real-life problem situations.

Conclusion

To conclude, the current study addressed the need to investigate strategies, tactics, and errors in DDM systematically, using a relatively large sample. Observing participants while they worked on the WINFIRE simulation proved a reliable method for assessing strategies, tactics, and errors in their specific context. Successful participants showed more proactive planning, more active decisions, more multitasking, more flexibility in their strategy, and fewer decision-making errors. The decrease in errors and the increase in successful planning strategies and tactics showed that participants' proficiency increased while they solved the problems. This assessment also revealed that intentional and directed strategies and tactics, such as multitasking and tactical planning, contributed more to success than the sheer number of responses or activity alone.
The study also confirms that microworlds can be used to study DDM strategies in a non-Western culture, here a Filipino sample, and that this approach is robust enough for such scientific endeavors.

Acknowledgements: This research was supported by National Science Foundation grant no. 0349997 to the first author and by a Humboldt Fellowship for Experienced Researchers, also to the first author. We are thankful to those who helped us with data gathering and data analysis in the Philippines: Lorraine Matulac, Oliver Pangan, Peter Tuason, Michelle Westerwelle, and our colleagues in the Philippines, for all their support and hard work. We also thank Shannon McLeish for editing a previous version of this manuscript.

Declaration of conflicting interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Author contributions: CDG designed the study. CDG and MTT conducted the study, collected the data, and coded the data together with the colleagues mentioned in the acknowledgements. LVO was primarily responsible for organizing the study and the data collection in the Philippines. CDG conducted the data analysis. CDG and MTT wrote the manuscript.

Supplementary material: Supplementary material available online.

Handling editor: Andreas Fischer

Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Citation: Güss, C. D., Tuason, M. T., & Orduña, L. V. (2015). Strategies, tactics, and errors in dynamic decision making in an Asian sample. Journal of Dynamic Decision Making, 1, 3. doi:10.11588/jddm.2015.1.13131

Received: 01 April 2014. Accepted: 16 October 2015. Published: 10 November 2015.

References

Anderson, J. R. (2005). Cognitive psychology and its implications (6th ed.). New York, NY: Worth Publishers.

Bacher, J., Wenzig, K., & Vogler, M. (2004).
SPSS TwoStep Cluster: A first evaluation. Social Science Open Access Repository. Retrieved from: http://nbn-resolving.de/urn:nbn:de:0168-ssoar-327153

Baker, E. L., & Delacruz, G. C. (2008). A framework for the assessment of learning games. In H. F. O'Neil & R. S. Perez (Eds.), Computer games and team and individual learning (pp. 21-37). Oxford, UK: Elsevier.

Blackie, J., & Maharg, P. (1998). The Delict Game. Retrieved from: http://www.academia.edu/1960624/the_delict_game

Bonissone, P. P., Dutta, S., & Wood, N. C. (1994). Merging strategic and tactical planning in dynamic and uncertain environments. IEEE Transactions on Systems, Man, & Cybernetics, 24, 841-862. doi:10.1109/caia.1992.200016

Brehmer, B. (2004). Some reflections on microworld research. In S. G. Schiflett, L. R. Elliot, E. Salas, & M. D. Coovert (Eds.), Scaled worlds: Development, validation, and applications (pp. 22-36). Hants, England: Ashgate.

Brehmer, B., & Dörner, D. (1993). Experiments with computer-simulated microworlds: Escaping both the narrow straits of the laboratory and the deep blue sea of the field study. Computers in Human Behavior, 9, 171-184. doi:10.1016/0747-5632(93)90005-d

Broadbent, D. E. (1977). Levels, hierarchies, and the locus of control. Quarterly Journal of Experimental Psychology, 29, 181-201. doi:10.1080/14640747708400596

Cañas, J. J., Antolí, A., Fajardo, I., & Salmerón, L. (2005). Cognitive inflexibility and the development and use of strategies for solving complex dynamic problems: Effects of different types of training. Theoretical Issues in Ergonomics Science, 6, 95-108. doi:10.1080/14639220512331311599

Cohen, J. W. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Cook, R. I., & Woods, D. D. (1994). Operating at the sharp end: The complexity of human error. In M. S. Bogner (Ed.), Human error in medicine (pp. 255-310). Hillsdale, NJ: Erlbaum.

Dörner, D. (1980).
On the difficulty people have in dealing with complexity. Simulation & Games, 11, 87-106.

Dörner, D. (1996). The logic of failure. New York, NY: Holt.

Dörner, D. (1999). Bauplan für eine Seele [Blueprint for a soul]. Reinbek, Germany: Rowohlt.

Dörner, D., & Güss, C. D. (2013). PSI: A computational architecture of cognition, motivation, and emotion. Review of General Psychology, 17, 297-317. doi:10.1037/a0032947

Dörner, D., & Pfeifer, E. (1991). Strategisches Denken und Stress [Strategic thinking and stress]. Zeitschrift für Psychologie, Supplement, 11, 71-83.

Fischer, A., Greiff, S., & Funke, J. (2012). The process of solving complex problems. The Journal of Problem Solving, 4, 19-42. doi:10.7771/1932-6246.1118

Fischhoff, B. (1975). Hindsight is not equal to foresight: The effect of outcome knowledge on judgment under uncertainty. Journal of Experimental Psychology: Human Perception and Performance, 1, 288-299. doi:10.1037/0096-1523.1.3.288

Flach, J. M. (1999). Beyond error: The language of coordination and stability. In P. A. Hancock (Ed.), Human performance and ergonomics (pp. 109-128). San Diego, CA: Academic Press.

Fleiss, J. L. (1981). Statistical methods for rates and proportions. New York, NY: Wiley.

Frensch, P., & Funke, J. (Eds.). (1995). Complex problem solving: The European perspective. Hillsdale, NJ: Erlbaum.

Funke, J. (1991). Solving complex problems: Exploration and control of complex systems. In R. J. Sternberg & P. A. Frensch (Eds.), Complex problem solving: Principles and mechanisms (pp. 185-222). Hillsdale, NJ: Erlbaum.
Funke, J. (2003). Problemlösendes Denken [Problem-solving thinking]. Stuttgart: Kohlhammer.

Funke, J. (2010). Complex problem solving: A case for complex cognition? Cognitive Processing, 11, 133-142. doi:10.1007/s10339-009-0345-0

Gerdes, J., Dörner, D., & Pfeiffer, E. (1993). Interaktive Computersimulation "WINFIRE" [The interactive computer simulation "WINFIRE"]. Otto-Friedrich-Universität Bamberg, Germany: Lehrstuhl Psychologie II.

Gonzalez, C. (2005). Decision support for real-time, dynamic decision-making tasks. Organizational Behavior and Human Decision Processes, 96, 142-154. doi:10.1016/j.obhdp.2004.11.002

Granlund, R. (2003). Monitoring experiences from command and control research with the C3Fire microworld. Cognition, Technology, and Work, 5, 183-190. doi:10.1007/s10111-003-0129-8

Greiff, S., & Funke, J. (2009). Measuring complex problem solving: The MicroDYN approach. In F. Scheuermann & J. Björnsson (Eds.), The transition to computer-based assessment: New approaches to skills assessment and implications for large-scale testing (pp. 157-163). Luxemburg: Office for Official Publications of the European Communities.

Güss, C. D. (2000). Planen und Kultur? [Planning and culture?]. Lengerich, Germany: Pabst.

Güss, C. D. (2011). Fire and ice: Testing a model on cultural values and complex problem solving.
Journal of Cross-Cultural Psychology, 42, 1279-1298. doi:10.1177/0022022110383320

Güss, C. D., & Dörner, D. (2011). Cultural differences in dynamic decision-making strategies in a non-linear, time-delayed task. Cognitive Systems Research, 12(3-4), 365-376. doi:10.1016/j.cogsys.2010.12.003

Güss, C. D., Glencross, E., Tuason, M. T., Summerlin, L., & Richard, F. D. (2004). Task complexity and difficulty in two computer-simulated problems: Cross-cultural similarities and differences. In K. Forbus, D. Gentner, & T. Regier (Eds.), Proceedings of the twenty-sixth annual conference of the Cognitive Science Society (pp. 511-516). Mahwah, NJ: Erlbaum.

Güss, C. D., Tuason, M. T., & Gerhard, C. (2010). Cross-national comparisons of complex problem-solving strategies in two microworlds. Cognitive Science, 34, 489-520. doi:10.1111/j.1551-6709.2009.01087.x

Hayes-Roth, B., & Hayes-Roth, F. (1979). A cognitive model of planning. Cognitive Science, 3, 275-310. doi:10.1207/s15516709cog0304_1

Hunt, E., & Joslyn, S. (2000). A functional task analysis of time-pressured decision making. In J. M. Schraagen, S. F. Chipman, & V. Shalin (Eds.), Cognitive task analysis (pp. 119-132). Mahwah, NJ: Erlbaum.

Johansson, B. J. E., Trnka, J., Granlund, R., & Götmar, A. (2010). The effect of a geographical information system on performance and communication of a command and control organization. International Journal of Human-Computer Interaction, 26, 228-246. doi:10.1080/10447310903498981

Klein, G. (1999). Applied decision making. In P. A. Hancock (Ed.), Human performance and ergonomics (pp. 87-107). San Diego, CA: Academic Press.

Koop, G. J., & Johnson, J. G. (2011). Response dynamics: A new window on the decision process. Judgment and Decision Making, 6(8), 750-758. Retrieved from: http://journal.sjdm.org/11/m29/m29.html

Lipshitz, R. (2005). There is more to seeing than what meets the eyeball: The art and science of observation. In H. Montgomery, R. Lipshitz, & B.
Brehmer (Eds.), How professionals make decisions (pp. 365-378). Mahwah, NJ: Erlbaum.

Luchins, A. S., & Luchins, E. H. (1959). Rigidity of behavior: A variational approach to the effect of Einstellung. Eugene, OR: University of Oregon Books.

Montgomery, H., Lipshitz, R., & Brehmer, B. (2005). Introduction: From the first to the fifth volume of naturalistic decision-making research. In H. Montgomery, R. Lipshitz, & B. Brehmer (Eds.), How professionals make decisions (pp. 1-11). Mahwah, NJ: Erlbaum.

Motowidlo, S. J., Dunnette, M. D., & Carter, G. W. (1990). An alternative selection procedure: The low-fidelity simulation. Journal of Applied Psychology, 75, 640-647. doi:10.1037/0021-9010.75.6.640

Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.

Omodei, M. M., McLennan, J., Elliott, G. C., Wearing, A. J., & Clancy, J. M. (2005). "More is better?": A bias toward overuse of resources in naturalistic decision-making settings. In H. Montgomery, R. Lipshitz, & B. Brehmer (Eds.), How professionals make decisions (pp. 29-36). Mahwah, NJ: Erlbaum.

O'Neil, H. F., & Perez, R. S. (Eds.) (2008). Computer games and team and individual learning. Oxford, UK: Elsevier.

Putz-Osterloh, W., & Lemme, M. (1987). Knowledge and its intelligent application to problem solving. German Journal of Psychology, 11, 286-303.

Qudrat-Ullah, H. (2008). Future directions on complex decision making: Using modeling and simulation decision support. In H. Qudrat-Ullah, J. M. Spector, & P. I. Davidsen (Eds.), Complex decision making (pp. 323-337). Cambridge, MA: Springer. doi:10.1007/978-3-540-73665-3_16

Quiroga, M. A., Martínez-Molina, A., Lozano, J. H., & Santacreu, J. (2011). Reflection-impulsivity assessed through performance differences in a computerized spatial task. Journal of Individual Differences, 32, 85-93. doi:10.1027/1614-0001/a000038

Ramnarayan, S., Strohschneider, S., & Schaub, H. (1997). Trappings of expertise and the pursuit of failure.
Simulation & Gaming, 28, 28-43. doi:10.1177/1046878197281004

Rasmussen, J. (1990). Human error and the problem of causality in analysis of accidents. Philosophical Transactions of the Royal Society of London, 12, 449-460. doi:10.1098/rstb.1990.0088

Reason, J. (1990). Human error. New York, NY: Cambridge University Press.

Renner, K. H., & Beversdorf, D. Q. (2010). Effects of naturalistic stressors on cognitive flexibility and working memory task performance. Neurocase, 16, 293-300. doi:10.1080/13554790903463601

Rieskamp, J. (2008). The importance of learning when making inferences. Judgment and Decision Making, 3, 261-277. Retrieved from: http://journal.sjdm.org/bn6/bn6.html

Rigas, G., & Brehmer, B. (1999). Mental processes in intelligence tests and dynamic decision making tasks. In P. Juslin & H. Montgomery (Eds.), Judgment and decision making: Neo-Brunswikian and process-tracing approaches (pp. 45-66). Mahwah, NJ: Erlbaum.

Salvucci, D. D., & Taatgen, N. A. (2008). Threaded cognition: An integrated theory of concurrent multitasking. Psychological Review, 115, 101-130. doi:10.1037/0033-295x.115.1.101

Schaub, H. (2001). Persönlichkeit und Problemlösen: Persönlichkeitsfaktoren als Parameter eines informationsverarbeitenden Systems [Personality and problem solving: Personality characteristics as parameters of an information processing system]. Weinheim, Germany: Beltz.

Schmid, U., Ragni, M., Gonzalez, C., & Funke, J. (2011). The challenge of complexity for cognitive systems. Cognitive Systems Research, 12, 211-218.
doi:10.1016/j.cogsys.2010.12.007

Schoppek, W. (1991). Game and reality: Reliability and validity of behavior patterns in complex situations. Sprache und Kognition, 10, 15-27.

Schoppek, W., & Putz-Osterloh, W. (2003). Individuelle Unterschiede und die Bearbeitung komplexer Probleme [Individual differences and complex problem solving]. Zeitschrift für Differentielle und Diagnostische Psychologie, 24, 163-173. doi:10.1024/0170-1789.24.3.163

Schunn, C. D., McGregor, M. U., & Saner, L. D. (2005). Expertise in ill-defined problem-solving domains as effective strategy use. Memory & Cognition, 33, 1377-1387. doi:10.3758/bf03193370

Sohn, M.-H., Douglass, S. A., Chen, M.-C., & Anderson, J. R. (2005). Characteristics of fluent skills in a complex, dynamic problem-solving task. Human Factors, 47, 742-752. doi:10.1518/001872005775570943

Spector, J. M. (2008). Expertise and dynamic tasks. In H. Qudrat-Ullah, J.
M. Spector, & P. I. Davidsen (Eds.), Complex decision making (pp. 25-40). Cambridge, MA: Springer. doi:10.1007/978-3-540-73665-3_2

Strohschneider, S. (1991). Problemlösen und Intelligenz: Über die Effekte der Konkretisierung komplexer Probleme [Problem solving and intelligence: On the effects of concretizing complex problems]. Diagnostica, 37, 353-371.

Strohschneider, S., & Güss, D. (1999). The fate of the Moros: A cross-cultural exploration of strategies in complex and dynamic decision making. International Journal of Psychology, 34, 235-252. doi:10.1080/002075999399873

Thomas, M. J. W. (2009). Integrating low-fidelity desktop scenarios into the high-fidelity simulation curriculum in medicine and aviation. Retrieved from: http://www.unisanet.unisa.edu.au/staff/matthewthomas/paper/thomas_desktopscenarios.pdf

Toups, Z. O., Kerne, A., Hamilton, W. A., & Shahzad, N. (2011). Zero-fidelity simulation of fire emergency response: Improving team coordination learning. Proceedings of the ACM CHI Conference on Human Factors in Computing Systems (CHI '11) (pp. 1959-1968). New York, NY: ACM. doi:10.1145/1978942.1979226

Vollmeyer, R., Burns, B. D., & Holyoak, K. J. (1996). The impact of goal specificity on strategy use and the acquisition of problem structure. Cognitive Science, 20, 75-100. doi:10.1207/s15516709cog2001_3

Wenke, D., Frensch, P. A., & Funke, J. (2005). Complex problem solving and intelligence: Empirical relation and causal direction. In R. J. Sternberg & J. E. Pretz (Eds.), Cognition and intelligence (pp. 160-187). New York, NY: Cambridge University Press.

Wüstenberg, S., Greiff, S., Molnár, G., & Funke, J. (2014). Cross-national gender differences in complex problem solving and their determinants. Learning and Individual Differences, 29, 18-29.
doi:10.1016/j.lindif.2013.10.006

Original Research

Can motto-goals outperform learning and performance goals? Influence of goal setting on performance and affect in a complex problem solving task

Miriam S. Rohe¹, Joachim Funke¹, Maja Storch² and Julia Weber²

¹Department of Psychology, Heidelberg University, Germany, and ²Institute for Self-Management and Motivation Zurich ISMZ, Switzerland

In this paper, we bring together research on complex problem solving with research from motivational psychology on goal setting. Complex problems require motivational effort because of their inherent difficulties. Goal setting theory has shown with simple tasks that high, specific performance goals lead to better performance outcomes than do-your-best goals. In complex tasks, however, learning goals have proven more effective than performance goals. Based on the Zurich Resource Model (Storch & Krause, 2014), so-called motto-goals (e.g., "I breathe happiness") should activate a person's resources through positive affect. Motto-goals have been found to be effective with unpleasant duties. We therefore tested the hypothesis that motto-goals outperform learning and performance goals in the case of complex problems. A total of N = 123 subjects participated in the experiment.
Depending on their goal condition, subjects developed a personal motto, learning, or performance goal. This goal was adapted to the computer-simulated complex scenario Tailorshop, in which subjects acted as managers of a small fictional company. Contrary to expectations, there was no main effect of goal condition on management performance. As hypothesized, motto-goals led to higher positive and lower negative affect than the other two goal types. Even though positive affect decreased and negative affect increased in all three groups during Tailorshop completion, participants with motto-goals reported the lowest negative affect over time. Exploratory analyses investigated the role of affect in complex problem solving via mediational analyses, as well as the influence of goal type on perceived goal attainment.

Keywords: problem solving, goal setting, affect, performance, Tailorshop

Global problems like climate change, unstable political systems, and the financial crisis pose complex challenges. Hence, solving complex problems is seen as a key competency in today's world (Funke, 2013a; Greiff, Holt, & Funke, 2013). An important research question in motivational psychology is how goals must be designed to successfully guide behavior (Locke & Latham, 1990). Connecting these two lines of research, the present study investigates in what way different goal types influence complex problem solving.

Complex Problem Solving (CPS)

As defined by Dörner, Kreuzig, Reither, and Stäudel (1983), complex problems are characterized by five criteria: First, as the name suggests, the problem is complex in that the number of involved variables is high. Second, these variables are mutually connected. Third, the system is dynamic and changes over time, be it due to its own momentum or due to the problem solver's actions.
Fourth, the relationships between the different variables are intransparent, so the problem solver does not have all the information necessary to reach an optimal decision. Fifth, the problem solver pursues multiple goals which frequently work in opposite directions. CPS performance is often assessed via computer-simulated microworlds, which simulate the structural dependencies and the temporal dynamics of a given problem (Funke, 2010). In the present study, the well-established Tailorshop microworld was applied. In this simulation, participants are asked to behave like the CEO of a small shirt factory and aim to maximize the company value (e.g., Danner, Hagemann, Holt, et al., 2011; Danner, Hagemann, Schankin, Hager, & Funke, 2011; Funke, 2010).

Past research hints at a relationship between the affect a person experiences and his or her CPS performance. Yet, theory and research suggest two contradicting directions of influence: On the one hand, positive affect might be helpful in CPS situations because it is associated with higher self-esteem (e.g., Brown & Mankowski, 1993), creativity (Estrada, Isen, & Young, 1994; Isen, Daubman, & Nowicki, 1987), and a stronger confidence in one's own resources (Schwarz & Skurnik, 2003). Negative affect, in turn, fosters analytic and systematic processing, a competency that is vital in CPS (Barth & Funke, 2010; Spering, Wagener, & Funke, 2005). Thus, positive as well as negative affect seem to mobilize distinct resources for CPS.

Corresponding author: Dr. Joachim Funke, Department of Psychology, Heidelberg University, Hauptstr. 47, 69117 Heidelberg, Germany. Email: joachim.funke@psychologie.uni-heidelberg.de

doi:10.11588/jddm.2016.1.28510 | JDDM | 2016 | Volume 2 | Article 3
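To make the criteria of interconnectedness, intrinsic dynamics, and intransparency concrete, here is a minimal toy system in the spirit of such microworlds. It is emphatically not the actual Tailorshop model; all variables, coefficients, and update rules are invented for illustration only.

```python
# Toy microworld sketch: a hypothetical two-variable system illustrating
# the criteria of Dörner et al. (1983). The problem solver sees only the
# resulting state, not the equations (intransparency).

def step(state, price, advertising):
    """Simulate one month; `price` and `advertising` are the solver's decisions."""
    demand, capital = state["demand"], state["capital"]
    # Interconnectedness: demand reacts to both decisions ...
    new_demand = demand * (1.05 - 0.002 * price) + 0.5 * advertising
    # ... and capital depends on demand, price, and costs.
    sales = min(new_demand, 400)  # production capacity limit
    new_capital = capital + sales * price - advertising - 2000  # fixed costs
    # Eigendynamics: demand drifts (factor 1.05) even without any decision.
    return {"demand": new_demand, "capital": new_capital}

state = {"demand": 300.0, "capital": 10000.0}
for month in range(3):  # three decision rounds with constant decisions
    state = step(state, price=30, advertising=500)
print(round(state["capital"]))  # company value proxy after three months
```

Even this tiny system shows why exploration matters: the consequences of a price change unfold over several rounds and interact with the capacity limit.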
Two experiments that analyzed the influence of affect in a CPS task are worth mentioning. Spering et al. (2005) asked participants to manage a forest enterprise in a computer simulation. Before working on the task, participants received false positive or negative feedback on an intelligence test to trigger either positive or negative affect. The analyses showed that, surprisingly, affect did not influence CPS performance. Nevertheless, participants with negative feedback gathered more information at the beginning of the task. This is in accordance with the above-mentioned phenomenon that negative affect fosters the acquisition of new information.

Barth and Funke (2010) found further evidence for analytic processing in negative affective environments. Their participants worked on the Tailorshop, which was characterized either by a positive environment (positive performance feedback through increasing profit) or by a negative environment (negative performance feedback through decreasing profit). As expected, the positive environment fostered positive affect whereas the negative environment fostered negative affect. The analyses revealed that the negative environment increased CPS performance, a result that had not been found in the study by Spering et al. (2005). Yet, a mediational analysis did not support the hypothesized influence of affect on the environment-performance relationship. In fact, the environment influenced affect, profit, and information retrieval, but affect did not have a mediating function. Hence, negative environments were beneficial for analytic processing and CPS performance, but the exact role of affect remains to be clarified. The results suggest that it might not be affect itself that influenced CPS performance but the degree of information retrieval or features of the environment. Hence, mobilizing useful resources might be more important in CPS than fostering a certain affective pattern.
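The logic of such a simple mediational analysis (path a from predictor to mediator, path b from mediator to outcome, and the indirect effect a*b) can be sketched with ordinary least squares on synthetic data. The variable roles and effect sizes below are invented for illustration and do not reproduce the analyses of Barth and Funke (2010) or of the present study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins: X = affect before CPS, M = CPS performance,
# Y = affect after CPS (all coefficients are made up).
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)          # performance partly driven by affect
y = 0.4 * m + 0.2 * x + rng.normal(size=n)

def ols(dep, *regressors):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(dep)), *regressors])
    return np.linalg.lstsq(X, dep, rcond=None)[0]

a = ols(m, x)[1]          # path a: X -> M
b = ols(y, x, m)[2]       # path b: M -> Y, controlling for X
c = ols(y, x)[1]          # total effect of X on Y
direct = ols(y, x, m)[1]  # direct effect c'
indirect = a * b          # mediated (indirect) effect
print(f"total={c:.2f} indirect={indirect:.2f} direct={direct:.2f}")
```

In OLS the identity total = direct + indirect holds exactly in-sample; in practice, the significance of the indirect effect would typically be assessed with bootstrap confidence intervals rather than point estimates alone.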
Still, it is important to regard CPS not only as a cognitive but also as an emotional and motivational process (cf. Funke, 2003, 2010, 2014). After all, affect and performance probably influence each other constantly. In the present study, a mediational analysis was conducted to explore the interplay between affect and performance. In doing so, we investigated whether the relationship between affect before and affect after CPS can be explained by CPS performance. Hence, the analysis covered both the influence of affect on CPS and the influence of CPS on affect.

Classic Goal Setting Research

One of the most famous and widespread theories in motivational psychology is goal setting theory by Locke and Latham (1990), which rests on two main assumptions: First, goal difficulty positively predicts performance, provided that a person's ability level is not exceeded. Second, goal specificity plays a crucial role: The authors propose that high, specific goals give rise to better performance than so-called do-your-best goals (i.e., the instruction to show the best possible performance). It is important to mention that when the authors speak of high, specific goals, they mostly refer to performance goals, i.e., goals that focus on the performance outcome (Seijts, Latham, & Woodwark, 2013).

The great success of goal setting theory in simple laboratory tasks notwithstanding, several studies have shown that in more complex tasks, high, specific goals led to lower performance than do-your-best goals (e.g., Earley, Connolly, & Ekegren, 1989; Kanfer & Ackerman, 1989; Mone & Shalley, 1995). Resource allocation can explain this finding: Participants who do not have sufficient experience with a task benefit from putting cognitive resources into the discovery of task strategies rather than the achievement of a certain performance outcome (Kanfer & Ackerman, 1989).
Faced with these results, a number of subsequent studies revealed that in complex tasks, high, specific learning goals, which focus on the discovery of required task strategies, can lead to higher performance than both do-your-best and performance goals (e.g., Seijts, Latham, Tasa, & Latham, 2004; Winters & Latham, 1996).

To our knowledge, the affective content of learning vs. performance goals directly after goal induction has not been investigated so far. Yet, research has shown that performance goals that students develop in university classes (i.e., striving for favorable judgments of one's competence) can increase their anxiety, hopelessness, and shame about an upcoming exam in this particular class, at least when the goal focuses on avoiding negative judgments. Learning goals (i.e., striving to increase one's competence), on the other hand, can increase enjoyment, hope, and pride and decrease boredom and anger (Daniels et al., 2009; Pekrun, Elliot, & Maier, 2006; Pekrun, Elliot, & Maier, 2009). Hence, it can be assumed that already the induction of learning goals might trigger a more positive affective pattern than the induction of performance goals.

Regarding the influence of goal setting during task completion, research has revealed that performance goals can be associated with a feeling of helplessness, negative self-cognitions, and maladaptive attributions of failure when obstacles arise. In contrast, learning goals seem to be associated with higher positive affect and with more effective problem-solving strategies (Diener & Dweck, 1978, 1980; Dweck & Leggett, 1988). Moreover, learning goals seem to buffer against negative performance feedback (Cianci, Klein, & Seijts, 2010; Kozlowski & Bell, 2006). Consequently, learning goals should lead to a more positive affective pattern than performance goals, both directly after their induction and during CPS.
However, most of the cited studies applied moderately complex tasks which did not always fulfill the complexity criteria established by Dörner et al. (1983). One objective of the present study was to investigate goal setting in a truly complex problem solving task.

A New Approach: Motto-Goals

While the previous section made clear that learning goals seem more adaptive than performance goals in complex tasks, the following section delineates why so-called motto-goals might be even more successful. In their Zurich Resource Model (ZRM), Storch and Krause (2014) developed motto-goals as a new goal type. Unlike high, specific goals, motto-goals describe an individual approach towards a task and aim to activate a person's (unconscious) resources for this very situation. To develop a motto-goal, participants are instructed to choose, from a variety of pictures, one picture that triggers positive affect and that may serve as a resource regarding a specific situation.[1] Next, the person is given several positive associations with the picture and is instructed to select his or her favorite ideas. Using these ideas, the person then develops a personal goal in a stepwise process. The resulting goal is called a motto-goal and reflects how the particular person aims to approach the specific situation (e.g., "I climb the mountain step by step and at my own pace"). A central feature of motto-goals is the affective response to the goal: During the whole process, emphasis is put on developing a goal that triggers positive affect and is associated with zero negative affect (Storch & Krause, 2014). One of the ZRM's underlying models is Kuhl's PSI theory (2001).
Motto-goals should activate the extension memory, a highly inferential and complex system that is assumed to process information intuitively, holistically, flexibly, and very fast.[2] According to Kuhl (2000), the extension memory "integrates an extended network of representations of own states, including personal preferences, needs, emotional states, options for action in particular situations, and past experiences involving the self" (p. 131). This suggests that motto-goals foster flexible and creative behavior, which might be especially helpful in complex situations.

The ZRM has been applied successfully in a wide variety of settings, be it in coaching and adult education (Storch & Krause, 2014), in the treatment of persons with clinical disorders (Schuler & Sandmeier, 2008), or in organizational settings (e.g., Temme, 2013). It has been shown that motto-goals, when compared to a control group, can reduce participants' cortisol level in a stress test (Storch, Gaab, Küttel, Stüssi, & Fend, 2007), help patients with eating disorders to downregulate negative affect and reduce dietary restraint (Storch, Keller, Weber, Spindler, & Milos, 2011), and increase affect regulation competencies in persons participating in a health prevention program (Storch & Olbrich, 2011). Apart from this overall positive effect of motto-goals, further studies specifically compared motto-goals to high, specific goals. For example, it has been shown that motto-goals increase positive and decrease negative affect more effectively than high, specific goals (e.g., Temme, 2013; Weber, 2013) and that they are associated with higher goal attainment, higher personal identification (Bruggmann, 2003), and higher goal commitment (Huwyler, 2012) than high, specific goals.

One study seems particularly useful for drawing inferences about motto-goals in CPS settings and is therefore explained in more detail: Weber (2013) asked participants to name an unpleasant duty they had to deal with at the moment.
more than half of the subjects chose duties from the categories writing texts (e.g., bachelor or master thesis), preparing and reworking studies (e.g., studying for exams), and handing in work on time (e.g., managing documents or bills). these categories can arguably be considered cps situations. weber’s participants then took part in a goal training where they developed either a motto-goal (e.g., dynamically and full of joy i dash towards my goal) or a high, specific goal (e.g., during the next three weeks, i will write my master thesis from monday to friday from 9 till 12 am. meanwhile, i switch off my mobile phone and my email inbox and i don’t let anything distract me) to approach their unpleasant duty³. weber showed that the motto-goal training significantly increased participants’ positive and decreased their negative affect, while high, specific goals influenced neither positive nor negative affect. furthermore, motto-goals led to a higher subjective change of experience and behavior one week after the training and to a stronger increase of self-reported action orientation after failure than high, specific goals. considering that many participants in weber’s study chose rather complex unpleasant duties, the comparatively successful handling of unpleasant duties through motto-goals might transfer to the cps setting in the present study.

the present study

the objective of the present study was twofold. first, we aimed to replicate the superiority of learning over performance goals in a truly complex task. second, we investigated whether the advantage of motto-goals over high, specific goals can also be found in a cps task. the first part of the study concerned the influence of goal setting on cps performance. as described above, past research revealed that learning goals can lead to higher task performance than performance goals in complex environments.
this leads to the first hypothesis:

hypothesis 1a: participants with high, specific learning goals show a higher cps performance than participants with high, specific performance goals.

motto-goals, however, might be even more adaptive as they can activate the extension memory, which is considered helpful in complex environments. for instance, it can help to flexibly adjust goals in a cps task and to search for new problem solving strategies (biebrich & kuhl, 2003). further, motto-goals aim to make use of unconscious thought processes, which seem important for successful problem solving (e.g., dijksterhuis, bos, nordgren, & van baaren, 2006; dijksterhuis & nordgren, 2006). as explained above, research has already shown that persons pursuing motto-goals can handle unpleasant duties more successfully than persons pursuing high, specific goals (weber, 2013). the present study hypothesizes that this is also the case for complex problems:

hypothesis 1b: participants with motto-goals show a higher cps performance than participants with high, specific performance or learning goals.

the second part of the study concerned the influence of goal setting on positive and negative affect before and after cps.

1 another possibility is the construction of a general motto-goal without having a specific situation in mind for which the goal might be helpful (storch & krause, 2014).
2 for a comprehensive overview of psi theory, see kuhl (2000) and kuhl (2001).
3 a third group dealt with a positive imagination of future goal realization. as this goal type is not considered in the present study, the study description is confined to motto-goals and high, specific goals.

10.11588/jddm.2016.1.28510 jddm | 2016 | volume 2 | article 3
rohe et al.: goal setting and complex problem solving
as described above, motto-goals are by definition associated with high positive and low negative affect, which is corroborated by research investigating affect after goal induction (temme, 2013; weber, 2013). based on these results, we formulate the following hypothesis:

hypothesis 2a: the induction of motto-goals leads to higher positive and lower negative affect than the induction of high, specific learning or performance goals.

findings on learning and performance goals further suggest that learning goals might be associated with higher positive and lower negative affect than performance goals (daniels et al., 2009; pekrun, elliot, & maier, 2006; pekrun, elliot, & maier, 2009), which leads to the following hypothesis:

hypothesis 2b: the induction of high, specific learning goals leads to higher positive and lower negative affect than the induction of high, specific performance goals.

next, affective change due to cps was considered. complex problems, as they are very difficult to complete successfully, are likely to trigger frustration and a feeling of being overwhelmed. however, this might vary depending on goal setting. if motto-goals indeed activate the extension memory, they should allow an integration of (possibly frustrating) tailorshop experiences into the self and avoid a feeling of helplessness and frustration (biebrich & kuhl, 2003; kuhl, 2001):

hypothesis 3a: after having worked on the cps task, positive affect decreases less and negative affect increases less for participants with motto-goals than for participants with learning or performance goals.

research indicates that learning goals seem to buffer against negative feedback more effectively and to be associated with higher positive affect than performance goals (cianci, klein, & seijts, 2010; diener & dweck, 1978, 1980; dweck & leggett, 1988; kozlowski & bell, 2006).
hence, we postulated the following hypothesis:

hypothesis 3b: after having worked on the cps task, positive affect decreases less and negative affect increases less for participants with learning goals than for participants with performance goals.

apart from these specific hypotheses, the study contained an exploratory part. first, the interplay between affect and cps performance was investigated via mediational analyses. as explained above, positive as well as negative affect might be helpful in cps tasks. past research delivered inconsistent findings and suggested that it might not be affect itself that fosters cps performance, but a high degree of information retrieval or certain features of the environment. hence, although we assume that motto-goals are associated with high positive and low negative affect and foster high cps performance, it seems inappropriate to derive specific hypotheses regarding the interplay between affect and performance. second, we analyzed possible differences between the three goal types regarding perceived goal achievement, satisfaction with goal achievement, and difficulty of goal achievement.

method

the hypotheses were tested in a randomized experimental study. depending on the experimental condition, participants were instructed to develop (1) a high, specific performance goal, (2) a high, specific learning goal, or (3) a motto-goal adapted to the tailorshop. with the respective goal in mind, they completed the tailorshop scenario.

participants and design

an a-priori power analysis was conducted to estimate the number of participants required to reveal significant group differences. this was done with g*power (faul, erdfelder, lang, & buchner, 2007). following tabachnick & fidell (2007), a power of 1 − β = 0.80 with α = 0.05 was preset. seijts et al. (2004) reported an effect size of η² = 0.07 for differences in cps performance between participants with learning, performance, or do-your-best goals.
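as a rough cross-check of the g*power computation, the same a-priori calculation can be sketched in python; the use of statsmodels is our assumption (the authors used g*power), and η² is converted to cohen's f first:

```python
# a-priori power analysis for a one-way anova with three groups,
# mirroring the setup described above (alpha = .05, power = .80,
# eta^2 = .07 from seijts et al., 2004).
from math import sqrt
from statsmodels.stats.power import FTestAnovaPower

eta_sq = 0.07
cohens_f = sqrt(eta_sq / (1 - eta_sq))   # eta^2 -> cohen's f, about 0.27

n_total = FTestAnovaPower().solve_power(
    effect_size=cohens_f, k_groups=3, alpha=0.05, power=0.80
)
print(round(n_total))   # roughly 130, close to the 132 reported below
```

note that `solve_power` returns the required total sample size across all groups, not the per-group n.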
although we applied motto-goals instead of do-your-best goals in our study, this effect size might be a good estimate of the expected effect of the goal manipulation on tailorshop performance. hence, we used this value to calculate the required sample size. results of the power analysis showed that a total sample size of 132 participants should suffice to detect significant group differences. we managed to recruit a total of 123 subjects (99 female, 24 male) aged between 17 and 35 years (m = 21.19, sd = 3.51)⁴. the majority of them (n = 105) were psychology students of heidelberg university. they received course credit for participation. a mixed factorial design was applied. goal type served as between-subjects factor with three levels (performance goals: n = 40; learning goals: n = 41; motto-goals: n = 42). affect was measured three times (baseline; after goal induction; after problem solving), so that time of measurement constituted a within-subjects factor. data was collected in a computer laboratory in groups of 3 to 20 participants who worked on the task on their own and were asked not to interact with each other.

experimental task

the cps task was the latest german version of the tailorshop scenario (danner, hagemann, holt, et al., 2011). in this computer simulation, subjects are the managers of a fictional organization that produces and sells shirts. it consists of two phases with different requirements: in the exploration phase, participants are instructed to explore the system freely over a simulated period of 6 months. during the second phase, the control phase, participants manage the tailorshop for 12 months with the assignment to maximize the company value.
the tailorshop version used in the present study consists of 24 variables, of which 22 are visible in the user interface, and 12 can be directly controlled by the participants (e.g., salary). the other 12 variables cannot be manipulated directly, but are influenced by the subjects’ actions (e.g., job satisfaction). when participants click the “next” button, the passing of one month is simulated and the updated values of the system are displayed and visualized by arrows pointing up or down (danner, hagemann, holt, et al., 2011). a demo version of the tailorshop can be found online (https://www.psychologie.uni-heidelberg.de/ae/allg/tools/tailorshop/).

procedure

the experiment lasted approximately one hour, but there was no imposed time limit. the study was entirely computer-based and programmed with the online survey tool questback efs 10.4 (http://www.unipark.com/). when subjects arrived in the computer lab, they signed an informed consent form and were given brief information about the study’s purpose and procedure. thereafter, they completed the questionnaire, which was identical in all three conditions except for the goal induction part. figure 1 illustrates the questionnaire’s composition. after a baseline measure of positive and negative affect (t1), subjects read the standard instruction of the tailorshop and were shown a graph illustrating the performance of previous participants, which was based on data by danner, hagemann, holt, et al. (2011). this way, they familiarized themselves with the task, but did not gain any positive or negative experience with it. next, participants were instructed to develop a personal performance goal, a learning goal, or a motto-goal for the tailorshop, depending on their experimental condition. thereafter, affect was measured a second time (t2) and participants worked on the tailorshop scenario. before the exploration and the control phase started, they were reminded to keep their goal in mind.
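to make the monthly update concrete, here is a deliberately simplified sketch of one simulated month; the two dependencies shown (salary → job satisfaction → company value) are hypothetical stand-ins of our own invention, not the published tailorshop equations:

```python
# toy, two-variable stand-in for one tailorshop month: clicking "next"
# applies the current control settings and updates the system state.
# coefficients and variable names are invented for illustration only.
def step(state, controls):
    new = dict(state)
    # job satisfaction drifts toward a salary-dependent level
    new["job_satisfaction"] = (0.8 * state["job_satisfaction"]
                               + 0.2 * controls["salary"] / 1000)
    # company value gains from satisfied workers, loses the wage bill
    new["company_value"] = (state["company_value"]
                            + 500 * new["job_satisfaction"]
                            - controls["salary"])
    return new

state = {"job_satisfaction": 1.0, "company_value": 100_000}
state = step(state, {"salary": 1200})   # one simulated month
```

the real system couples 24 such variables, which is what makes the task opaque to participants: a single control change propagates through several intermediate variables before it shows up in the company value.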
after tailorshop completion, affect was measured a third time (t3). for exploratory purposes, questions regarding goal attainment and sociodemographic data were assessed at the end.

manipulation of goal type

goal induction in all conditions started with a short text, framed in a goal-type-specific way. participants in the performance goal condition were instructed to maximize their performance, to show their competence, and to avoid errors during tailorshop completion. subjects in the learning goal condition were instructed to maximize their learning success, to comprehend the relations underlying the system, and to regard errors as a learning opportunity. texts for the learning and performance goal conditions were based on instructions from previous studies (cianci, klein, & seijts, 2010; kozlowski & bell, 2006; seijts et al., 2004). participants in the motto-goal condition were asked to mobilize their own resources, to develop a personal approach to the task, and to see the tailorshop as an opportunity to make use of their own resources. next, participants were instructed to develop a personal goal for the tailorshop. in the performance and learning goal conditions, a high, specific goal was predetermined based on the data of previous studies to ensure that subjects indeed pursued such a goal. the predefined performance goal was to maintain a company value of at least 250,000 units, which had been reached by the best 10% in the study by danner, hagemann, holt, et al. (2011) and can therefore be considered a high, specific goal (kanfer & ackerman, 1989; locke & latham, 1990; winters & latham, 1996). similarly, in the learning goal condition, the predetermined goal was to learn at least 15 relationships between the different tailorshop variables. this had been achieved by the best 10% in a previous study applying concept maps (öllinger, hammon, von grundher, & funke, 2015).
to align the time participants spent on goal setting across conditions, and because motto-goal induction requires time due to the manualized process, subjects with learning and performance goals were instructed to answer specific questions step by step. after having developed a first version of the goal, they were asked to formulate it in the first-person perspective. next, they were instructed to add until when they wanted to achieve the goal. then they included the methods they planned to use to reach their goal. last, they were asked why they strived for this goal. in every step, they were shown the latest version of their individual goal and added their new thoughts in a text field. sample goals of the participants are i want to understand at least 15 relationships within the 12 months through high attention and concentration in order to apply that knowledge to increase my company value (learning goal) and within 12 months i will reach a company value of at least 250000 with the help of motivation, intelligence, and organisation, in order to be successful and because i have the responsibility (performance goal). in the motto-goal condition, participants were shown 10 pictures (e.g., a boy who just caught a fish) in randomized order and were asked to choose one that was associated with a good feeling and that could serve as a resource for the tailorshop.

4 a post-hoc analysis applying g*power revealed that the actual power given the 123 participants was 1 − β = 0.77.

figure 1. course of the online questionnaire.
after they had made their decision, subjects chose their favorite associations with this particular picture from a list depicting many different ideas (e.g., "yeah, i made it", "childlike joy", or "to present and enjoy success"). using these favorite associations, they were asked to formulate a personal motto-goal which described how they planned to approach the tailorshop (e.g., i want to be like a child – without fear and without too much rumination – and be happy about my success and shout: yeah, i made it!)⁵.

measures

cps performance. following past research (danner, hagemann, holt, et al., 2011; danner, hagemann, schankin, et al., 2011; meyer & scholl, 2009; öllinger et al., 2015), only performance in the control phase of the tailorshop was analyzed. this was done via two indicators: the company value change (cv change) describes the absolute difference between the company value at the beginning of the tailorshop, which was the same for all participants, and the final company value. the company value trend (cv trend) indicates the number of months in which the company value increased. as proposed by danner, hagemann, holt, et al. (2011), only trends between the second and the last month were included.

positive and negative affect. participants’ momentary positive and negative affect was assessed on a 5-point likert scale via the german version of the positive and negative affect schedule (panas; watson, clark, & tellegen, 1988) from krohne, egloff, kohlmann, and tausch (1996), which consists of 20 adjectives on two largely independent subscales. 10 items measure positive (e.g., interested, enthusiastic) and 10 items negative affect (e.g., upset, ashamed). the internal consistency was high for both positive (measure 1: cronbach’s α = .84, measure 2: α = .91, measure 3: α = .91) and negative affect (measure 1: cronbach’s α = .73, measure 2: α = .81, measure 3: α = .86).

questions on goal.
for exploratory purposes, subjects rated on a 10-point likert scale to what degree they had achieved their goal (not achieved at all – completely achieved), how satisfied they were with this achievement (not satisfied at all – completely satisfied), and how difficult it was to achieve the goal (not difficult at all – extremely difficult).

5 for a comprehensive description of the motto-goal development, see storch and krause (2014); for a motto-goal online tool similar to the one used in the present study, see http://ismz.ch/zrm/onlinetool.html.

results

before conducting the analyses, 5 participants (motto-goals: n = 2, learning goals: n = 2, performance goals: n = 1) were identified as outliers (z > 3.29, p < .001) regarding cv change or negative affect. following tabachnick and fidell (2007), the deviant scores were not excluded but adjusted to the next extreme score of the respective condition, so that their statistical impact did not distort the analyses while the fact that these persons had extreme values was still taken into account⁶.

table 2. planned contrasts to analyze the influence of goal type on performance.

contrast   motto-goals   learning goals   performance goals
1          2             -1               -1
2          0              1               -1

influence of goal type on cps performance

table 1 depicts the means and standard deviations of both performance indicators depending on goal condition. both performance indicators correlated significantly, r = .57, p < .001. further, as apparent in table 1, mean cv change was negative in all three conditions. thus, in line with past research (barth & funke, 2010; danner, hagemann, schankin, et al., 2011; öllinger et al., 2015), the initial company value decreased over time.
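the outlier adjustment and the two performance indicators described above, as well as the reported cronbach's alpha, can be sketched as follows (illustrative data only; the z > 3.29 cutoff follows tabachnick and fidell, 2007, and winsorizing to the next extreme score is our reading of the procedure):

```python
import numpy as np

def adjust_outliers(scores, z_crit=3.29):
    """pull scores with |z| > z_crit in to the next extreme score of the group."""
    x = np.asarray(scores, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)
    out = np.abs(z) > z_crit
    if out.any():
        inliers = x[~out]
        x[out & (z > 0)] = inliers.max()   # next extreme from above
        x[out & (z < 0)] = inliers.min()   # next extreme from below
    return x

def cv_change(values):
    """final minus initial company value."""
    return values[-1] - values[0]

def cv_trend(values):
    """number of month-to-month increases from the second month onward."""
    return sum(values[i] > values[i - 1] for i in range(2, len(values)))

def cronbach_alpha(X):
    """cronbach's alpha for a participants x items score matrix."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    return k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum()
                          / X.sum(axis=1).var(ddof=1))

months = [100_000, 98_000, 99_500, 97_000, 97_800]  # hypothetical run
print(cv_change(months), cv_trend(months))          # -2200 2
```

note that the first comparison (month 1 vs. month 2) is skipped in `cv_trend`, matching the restriction to trends between the second and the last month.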
regarding group differences, we expected a main effect of goal type: participants with learning goals should show higher performance than participants with performance goals (1a), and participants with motto-goals should show higher performance than participants with learning or performance goals (1b). to test these hypotheses, two separate one-way analyses of variance (anova) with goal condition as between-subjects factor and cv change or cv trend, respectively, as the dependent variable were calculated⁷. to specify the assumed main effect of goal type, orthogonal contrasts were constructed (see table 2). contrary to expectations, both anovas revealed no significant overall effect of goal condition, both f(2, 120) < 1, n.s.⁸ thus, hypotheses 1a and 1b were not supported⁹.

influence of goal type on positive and negative affect

next, the influence of goal condition on positive and negative affect was analyzed. figure 2 depicts the affective state of participants in the three conditions at all three times of measurement. two separate one-way anovas with goal condition as independent variable and baseline positive and negative affect, respectively, as dependent variable showed that, as expected, participants in the three conditions did not differ in their initial affective state, both f(2, 120) < 1, n.s.

influence on affect after goal induction. the second set of hypotheses postulated an interaction between goal type and time: it was assumed that motto-goals increased positive and decreased negative affect more strongly over time than learning or performance goals (2a), and that learning goals increased positive and decreased negative affect more strongly over time than performance goals (2b).
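the planned contrasts from table 2 can be sketched as a pooled-variance contrast t-test; the group data below are made up for illustration, not the study's raw scores:

```python
# planned contrast t-test: weights (2, -1, -1) compare motto-goals
# against the two specific-goal conditions pooled; the error term is
# the pooled within-group mse with df = n_total - k.
import numpy as np

def contrast_t(groups, weights):
    w = np.asarray(weights, dtype=float)
    means = np.array([np.mean(g) for g in groups])
    ns = np.array([len(g) for g in groups])
    df = int(ns.sum()) - len(groups)
    ss_within = sum(((np.asarray(g, dtype=float) - np.mean(g)) ** 2).sum()
                    for g in groups)
    mse = ss_within / df
    se = np.sqrt(mse * (w ** 2 / ns).sum())
    return (w * means).sum() / se, df

t, df = contrast_t([[5, 6, 7], [2, 3, 4], [2, 3, 4]], [2, -1, -1])
```

the two weight vectors from table 2 are orthogonal (their dot product, weighted by equal group sizes, is zero), which is what allows them to partition the between-groups effect.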
we conducted two separate two-way mixed anovas with goal condition as between-subjects factor, time of measurement as within-subjects factor, and positive and negative affect, respectively, as dependent variable, including the baseline measurement (t1) and the measurement after goal induction (t2). interaction contrasts were constructed to interpret the interaction between goal condition and time of measurement. these contrasts applied the weights depicted in table 2, but additionally included the factor time. as expected, the interaction between goal type and time was significant for both positive, f(2, 120) = 5.44, p < .01, and negative affect, f(2, 120) = 3.83, p < .05. the first interaction contrast, comparing motto-goals to the other two goal types over time, was also significant for positive, t(120) = 3.07, p < .01, and negative affect, t(120) = -2.26, p < .05. this indicates that motto-goal induction indeed led to a stronger increase of positive affect and a stronger decrease of negative affect than learning or performance goal induction, supporting hypothesis 2a (see figure 2). yet, the second interaction contrast, comparing learning to performance goals over time, was not significant and even pointed slightly in the opposite direction for both positive, t(120) = -1.18, p = .24, and negative affect, t(120) = 1.58, p = .12. hence, hypothesis 2b was not supported.

influence on affect after cps. next, affective change due to tailorshop completion was investigated.

6 without this adjustment, results did not change except for one case, as reported below.
7 due to the relatively high correlation of cv change and cv trend, a manova was not appropriate.
8 the inclusion of gender as a second independent variable revealed a significant main effect of gender, f(2, 116) = 5.15, p < .01.
post-hoc tests with bonferroni correction showed that men’s cv change was slightly less negative, p = .08, and men’s cv trend was significantly higher, p < .01, than women’s. thus, men outperformed women in the tailorshop on both performance indicators, although the difference regarding cv change was only marginal. however, due to the unequal proportion of male and female participants and due to the lack of interaction effects, we collapsed across gender in all analyses.
9 five participants (four of them in the motto-goal condition and one in the performance goal condition) had already performed the tailorshop before. however, as in all cases this experience dated back at least six months before study participation and as performance did not differ between groups, we did not further consider prior tailorshop experience.

table 1. means and standard deviations of the two tailorshop performance indicators depending on goal condition.

                               cv change                    cv trend
goal condition       n         m            sd              m       sd
motto-goals          42        -94,634.18   107,478.40      2.00    2.74
learning goals       41        -73,689.14   66,308.13       2.07    3.04
performance goals    40        -90,105.19   72,625.81       1.65    3.22
total                123       -86,179.66   84,213.21       1.91    2.98

figure 2. change of mean positive affect (left side) and negative affect (right side) over the three measurements depending on goal condition (motto, learning, performance).

again, an interaction between goal type and time was assumed: positive affect should decrease and negative affect increase less over time when participants pursued motto-goals instead of learning or performance goals (3a), and positive affect should decrease and negative affect increase less over time when participants pursued learning instead of performance goals (3b).
to test these assumptions, we performed anovas for positive and negative affect that included all three times of measurement. again, goal condition was included as independent variable. the main effect of time was significant for positive, f(2, 196) = 24.95, p < .001, and negative affect, f(1, 167) = 39.32, p < .001. this result is not surprising when considering the change of affect over time as depicted in figure 2. the main effect of goal type was not significant for positive affect, f(2, 120) = 1.03, p = .36, but it was significant for negative affect, f(2, 120) = 3.45, p < .05¹⁰. contrasts applying the weights depicted in table 2 showed that subjects with motto-goals reported significantly lower negative affect than subjects with learning or performance goals, p < .05, whereas the difference between participants with learning versus performance goals was not significant, p = .24. unexpectedly, the interaction between goal type and time was not significant for positive, f(3, 196) = 1.35, p = .26, or negative affect, f(3, 167) < 1, n.s. thus, the groups did not differ in how their affect changed from the baseline measure to the third measure, so hypotheses 3a and 3b were not supported. yet, participants with motto-goals reported the lowest level of negative affect over time.

exploratory analyses

affect and cps. apart from the hypothesis testing, we explored the role of affect in cps. to illuminate this issue, mediational analyses were conducted for positive and negative affect and the two performance indicators. affect before cps (after goal induction) served as independent variable, affect after cps served as dependent variable, and cps performance served as potential mediator variable. for these analyses, the values of participants in all three conditions were aggregated.
because the two performance indicators seem to be causally related (a high number of gain months implies a high final company value and vice versa), a multiple mediator model with both performance indicators seemed inappropriate (hayes, 2013). the preferred option was to calculate four separate models, one for each of the two performance indicators, for positive and negative affect separately. as the variance of cv change was extremely large compared to the variance of the other variables, all variables were z-standardized beforehand. the significance of indirect effects was tested with the help of bootstrapped 95% confidence intervals (5,000 bootstrap samples). figure 3 displays the resulting models including b-values and significance levels.

10 when running the analysis without adjusting the outliers regarding negative affect, the main effect of goal type on negative affect was only marginally significant, f(2, 120) = 2.91, p = .06.

the first and the second mediational model (upper part of figure 3) tested whether positive affect before and after cps were related and whether this relationship was mediated by cv change or cv trend, respectively. results showed that the direct effect of positive affect before cps on positive affect after cps was significantly positive. that is, subjects who experienced positive affect after goal induction were likely to experience positive affect after cps as well. furthermore, high performance regarding cv change and cv trend significantly predicted positive affect after cps. however, positive affect before cps predicted neither cv change nor cv trend, so the indirect effect of positive affect before cps on positive affect after cps through performance was not significant either.
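the bootstrap test of an indirect effect can be sketched as follows; the data are simulated with a built-in effect, and we use simple percentile intervals, whereas the paper reports bias-corrected-and-accelerated (bca) intervals:

```python
# simple mediation (x -> m -> y) with a bootstrapped indirect effect
# a*b, 5,000 resamples, 95% percentile ci. data are simulated with a
# true indirect effect of 0.6 * 0.6 = 0.36; variable labels follow the
# models above but the numbers are not the study's data.
import numpy as np

rng = np.random.default_rng(0)
n = 123
x = rng.standard_normal(n)                      # affect before cps (z-scored)
m = 0.6 * x + rng.standard_normal(n)            # cps performance
y = 0.6 * m + 0.3 * x + rng.standard_normal(n)  # affect after cps

def indirect(x, m, y):
    a = np.polyfit(x, m, 1)[0]                        # path a: x -> m
    b = np.linalg.lstsq(np.c_[m, x, np.ones(len(x))], # path b: m -> y,
                        y, rcond=None)[0][0]          # controlling for x
    return a * b

boot = []
for _ in range(5000):
    idx = rng.integers(0, n, n)                 # resample cases, not paths
    boot.append(indirect(x[idx], m[idx], y[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(lo, hi)   # if the interval excludes zero, the indirect effect is significant
```

resampling whole cases (the same index vector for x, m, and y) is essential: drawing indices per variable would destroy the covariance that the indirect effect depends on.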
hence, performance did not mediate the relationship between positive affect before and after cps. the third and the fourth model (lower part of figure 3) investigated whether the relation between negative affect before and after cps was mediated by cv change or cv trend, respectively. the direct effect of negative affect before cps on negative affect after cps was again significantly positive. that is, participants who experienced negative affect after goal induction were likely to experience negative affect after having completed the tailorshop as well. in addition, the relationship between cv change or cv trend, respectively, and negative affect after cps was significant and negative. hence, low performance was associated with comparatively high negative affect. finally, the influence of negative affect before cps on tailorshop performance was marginal for cv change and significant for cv trend. thus, approaching the tailorshop with highly negative affect led to lower cv trend and, marginally, lower cv change values, which indicates poor performance. the indirect effect of negative affect before cps on negative affect after cps through performance was small but significant in both models (model 3: b = .04, bca ci [0.01, 0.09], κ² = .04; model 4: b = .05, bca ci [0.02, 0.11], κ² = .06). thus, the data suggest that the relationship between negative affect before and after cps was mediated by tailorshop performance, albeit with a small effect size.

goal attainment. last, it was analyzed in an exploratory fashion whether the perceived degree and difficulty of goal attainment and the satisfaction with goal attainment differed across groups. for this purpose, a one-way independent manova with goal condition as independent variable and goal attainment, satisfaction with goal attainment, and difficulty of goal attainment as dependent variables was conducted. the overall effect of goal condition was significant, f(6, 238) = 3.51, p < .01.
subsequent one-way anovas showed that goal type significantly influenced all three dependent variables, all f(2, 120) > 4.34, all p < .05. bonferroni-corrected post-hoc tests revealed that participants with motto-goals (m = 5.38, sd = 2.52) reported higher goal attainment than participants with learning (m = 3.83, sd = 1.90), p < .05, and performance goals (m = 3.55, sd = 2.88), p < .01. furthermore, subjects with motto-goals (m = 4.95, sd = 2.69) were more satisfied with goal attainment than subjects with learning goals (m = 3.39, sd = 2.02), p < .05, and marginally more satisfied than subjects with performance goals (m = 3.63, sd = 3.02), p = .07. last, participants with motto-goals (m = 6.57, sd = 1.85) judged goal attainment as easier than participants with learning (m = 7.90, sd = 1.36), p < .01, and performance goals (m = 8.03, sd = 1.99), p < .001.

discussion

the major aim of the present study was to examine the influence of three different goal types – motto, learning, and performance goals – on performance and affect in a cps task. contrary to expectations, all three groups performed equally well in the tailorshop, which might be due to several reasons. first, participants in the motto-goal condition might have lacked sufficient familiarity with the tailorshop to develop an appropriate motto-goal that truly helped them to activate the necessary resources. furthermore, goal commitment might not have been high enough to influence performance (e.g., locke & latham, 1990; seijts et al., 2004). leaving aside motto-goals, the finding that performance goals led to the same performance as learning goals was surprising in light of the many past studies that ascribed learning goals an advantage in complex tasks. one plausible explanation for this finding is that most of the previous studies did not use truly complex tasks.
combined with the rather low effect size of the advantage of learning over performance goals in complex tasks (average: d = -.39; seijts et al., 2013), the tailorshop might have been too complex for learning goals to outperform performance goals. instead of goal setting, the ability of participants to deal with such tasks might have been a stronger predictor of performance (locke & latham, 2002). furthermore, cps performance in the present study might be confounded with participants' prior knowledge or intelligence, so that the measurement might have been too unreliable for group differences to reach significance (kretzschmar, neubert, wüstenberg, & greiff, 2016).11 last, the estimated power that we actually achieved (1 - β = .77) was slightly lower than the power we aimed at (1 - β = .80), which might partly explain the missing group differences. however, the group differences were so small that they would probably have remained insignificant even with a slightly larger sample. although goal type did not affect cps performance, it did influence the affective state participants reported. the induction of motto-goals led to higher positive and lower negative affect than the induction of the other two goal types. hence, in accordance with past research, the development of motto-goals seemed to allow for a positive, possibly more optimistic approach towards a complex task.

11 thanks to an anonymous reviewer for this remark.

figure 3. models of affect before cps as predictor of affect after cps, mediated by cps performance. note. the indirect effect of affect before on affect after cps through cps performance is given in parentheses. + p < .10, * p < .05, ** p < .01, *** p < .001.

10.11588/jddm.2016.1.28510 jddm | 2016 | volume 2 | article 3 | 9
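the achieved-power figure discussed above can in principle be reproduced from the noncentral f distribution. the sketch below is illustrative: the effect size (cohen's f = .30), total n, and group count are assumptions for demonstration, not the study's exact planning inputs.

```python
import numpy as np
from scipy import stats

def anova_power(effect_f, n_total, k_groups, alpha=0.05):
    """power of a one-way ANOVA, computed from the noncentral F distribution."""
    df1 = k_groups - 1
    df2 = n_total - k_groups
    nc = effect_f**2 * n_total                 # noncentrality: lambda = f^2 * N
    f_crit = stats.f.ppf(1 - alpha, df1, df2)  # critical F under H0
    # power = P(F' > f_crit) where F' is noncentral F with parameter lambda
    return 1 - stats.ncf.cdf(f_crit, df1, df2, nc)

# e.g. a medium effect (f = .30) with 123 participants in 3 groups
print(f"power = {anova_power(0.30, 123, 3):.2f}")
```

this is the same calculation tools like g*power perform for the "anova: fixed effects, omnibus, one-way" design; power rises with n and with the assumed effect size.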
yet, the affective pattern turned more negative in all three groups in the course of the task, which shows that the advantage of motto-goals did not remain stable over time. instead, experiences with the tailorshop might have been similarly frustrating in all three groups, so that motto-goals were not able to buffer against the increase of negative and the decrease of positive affect. however, the general increase of negative affect notwithstanding, participants with motto-goals reported significantly lower negative affect than participants with learning goals when all three measurements were aggregated. thus, although motto-goals could not completely prevent negative affect in the tailorshop, they at least reduced it. exploratory mediation analyses furthermore showed that, in line with past research, success in cps increased positive affect, while failure increased negative affect (e.g., barth & funke, 2010). this affective response to cps performance is congruent with common sense: if persons perform well, they feel good, and vice versa. the results become more interesting when affect before cps is considered as well, a point on which past research has delivered ambiguous findings. in the current study, negative affect before cps negatively predicted performance when all three conditions were aggregated. thus, subjects completed the tailorshop more successfully if they approached it with low negative affect. at first sight, this result appears to contradict the findings of barth and funke (2010). in their study, performance was higher when the tailorshop's environment was characterized by bad performance feedback. considering that barth and funke used the same affect measure (items of the panas) and the same cps task (tailorshop) as the present study, this discrepancy is particularly striking.
a possible explanation for this contradiction is that the two studies applied different designs: barth and funke regarded affect as a symptom of a nice or nasty environment, whereas the present study regarded affect as a symptom of a particular goal induction. the main difference between these two approaches is that in the first case affect was assessed during tailorshop completion and was likely to be influenced by the ongoing tailorshop experience, while in the latter case affect was measured before the tailorshop was started. furthermore, a closer look at the results of barth and funke shows that it was not negative affect per se that increased performance. rather, nasty environments influenced negative affect as well as cps performance, but negative affect did not mediate this relationship. barth and funke's study and the present findings could be integrated by assuming that persons perform well if they approach the tailorshop with low aversion but experience negative affect to some degree during task completion, as this can foster a focus on the retrieval of important task information (spering et al., 2005). the significant indirect effect in the mediation analyses suggests that the change from baseline negative affect to negative affect after cps can partly be explained by cps performance. in other words, persons who approached the tailorshop with low negative affect performed better, which in turn further decreased their negative affect. yet, the results of the mediation analysis have to be interpreted cautiously, as another reason for the indirect effect might be unreliable measurement of the applied variables (westfall & yarkoni, 2016).12

12 thanks to an anonymous reviewer for this remark.

all in all, the present study suggests that
negative affect might be detrimental to task performance. however, as past research did not find a consistent relationship between affect and cps, further research is required to draw solid inferences. yet, the present results demonstrate the importance of understanding cps not merely as a cognitive, but also as an emotional and motivational process (cf. funke, 2003, 2010, 2014). furthermore, exploratory analyses revealed that participants with motto-goals judged their goal attainment as higher, easier, and more satisfying than participants with learning or performance goals. these results are in accordance with the fact that high, specific goals can be reached by only a small percentage of a population. past research has shown that the attainment of personally important goals predicts life satisfaction (judge, bono, erez, & locke, 2005), subjective and psychological well-being, and even the perceived meaning of life (stauner, 2013). beyond that, a long-term consequence of goal attainment might be an increase in self-efficacy. high goal attainment can therefore be seen as a further positive feature of motto-goals beyond the positive affective pattern. in this way, motto-goals might offer a solution to a goal setting dilemma addressed by locke (1996), namely the conflict that high, specific goals can increase performance but decrease satisfaction: motto-goals seem to be at least as successful for cps performance as high, specific goals, and at the same time they avoid the problem of low attainment and satisfaction.
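the mediation logic used throughout this discussion (an indirect effect estimated as the product of the a-path and b-path, with a bootstrap confidence interval) can be sketched in a few lines. everything below is a stand-in: the data are simulated, the variable roles merely mimic the study's design, and a plain percentile interval replaces the bias-corrected (bca) interval reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# simulated data: x = negative affect before CPS, m = performance (mediator),
# y = negative affect after CPS (illustrative, not the study's data)
n = 123
x = rng.normal(size=n)
m = -0.3 * x + rng.normal(size=n)            # higher pre-task negative affect -> worse performance
y = 0.5 * x - 0.3 * m + rng.normal(size=n)   # worse performance -> more post-task negative affect

def indirect_effect(x, m, y):
    # a-path: regress m on x; b-path: regress y on m while controlling for x
    a = np.polyfit(x, m, 1)[0]
    design = np.column_stack([np.ones_like(x), x, m])
    b = np.linalg.lstsq(design, y, rcond=None)[0][2]
    return a * b

# percentile bootstrap of the indirect effect (BCa correction omitted for brevity)
n_boot = 5000
boots = np.empty(n_boot)
for i in range(n_boot):
    idx = rng.integers(0, n, n)
    boots[i] = indirect_effect(x[idx], m[idx], y[idx])

est = indirect_effect(x, m, y)
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"indirect effect = {est:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

mediation is inferred when the bootstrap interval for the product a*b excludes zero; resampling whole cases (rows), as done here, preserves the dependence between the three variables.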
limitations of the present study

two main limitations of the present study are worth mentioning. first, the majority of participants were first-semester psychology students with an exceptionally good final exam grade who were most likely highly motivated and ambitious. as we were not able to provide compensation (e.g., in terms of course credit) for non-psychology students, they were not eager to participate. the second limitation refers to the goal manipulation: even though the learning and performance goal manipulations were highly similar, the motto-goal condition differed in some aspects. motto-goals were developed completely freely, while the other two goal types were based on a specific outcome goal (a final company value of at least 250,000 or learning at least 15 relationships between variables), which was specified individually. resulting group differences might thus be due to a difference in participation in the goal setting process rather than to goal type per se. to minimize this problem, a reflection process was encouraged in all three groups such that participants with learning and performance goals were instructed to consider by when, how, and why they wanted to achieve the goal. an additional limitation is the possibility that participants developed further, self-set goals (e.g., seijts & latham, 2011). although the self-setting of goals probably cannot be prevented, future research might benefit from asking participants after task completion whether they had developed any additional goals. furthermore, goals might have varied with regard to their proximity. in most cases, learning and performance goals referred to the last of the 12 months, whereas motto-goals described a general approach from the outset of the tailorshop. future research might benefit from controlling for goal proximity, for instance by combining high, specific distal goals with proximal sub-goals (cf. kozlowski & bell, 2006; seijts & latham, 2001).
implications for further research

to qualify and extend the present findings, several ideas for further research seem promising. first and foremost, the present study mainly tested main and interaction effects to analyze the influence of goal condition on different dependent variables. to better understand the mechanisms underlying these relationships, the inclusion of further possible mediator and moderator variables seems important. in this regard, it might be analyzed whether the positive influence of motto-goals on affect is mediated by lower tension when compared to performance or learning goals. furthermore, the influence of variables like goal commitment, self-efficacy, or action orientation might be of interest. an in-depth qualitative analysis of the developed goals could also yield insights into the mechanisms of goal setting. second, motto-goals may be tested against high, specific goals in a cps task participants are well acquainted with. in this way, motto-goal development could be grounded more firmly in personal experiences with the task. third, further research may investigate whether goal setting influences specific discrete emotions (funke, 2010). for instance, a certain degree of nervousness or anxiousness might be beneficial (cf. the yerkes-dodson law; yerkes & dodson, 1908), while shame or hostility seem less adaptive in complex tasks. beyond that, it might be interesting to analyze not only explicit but also implicit affect, which also seems to be influenced by motto-goals (weber, 2013). fourth, further research may apply individual goal orientation as a further control variable or analyze the fit between personal goal orientation and external goal setting. for instance, learning goals might be more adaptive if individuals exhibit a stable learning goal orientation. last, recent research has suggested that the simultaneous use of learning and performance goals can increase performance (masuda, locke, & williams, 2015).
further research may extend these findings by investigating different combinations of goals. for instance, the combination of motto and learning goals might be adaptive in cps situations.

practical implications

bearing in mind that problem solving is one of the key competencies in today's world, the practical relevance of this topic is apparent. complex technologies are all around, organizations apply complex tasks in personnel selection (meyer, grütter, oertig, & schuler, 2009), and even the latest pisa study (programme for international student assessment) acknowledged the importance of cross-curricular problem solving competencies by incorporating cps tasks (funke, 2013b; greiff et al., 2013). the present study revealed a slight advantage of motto-goals over high, specific goals. although performance did not differ across goal types, motto-goals increased positive and decreased negative affect directly after their induction and helped participants to maintain a comparatively low level of negative affect even in the light of frustrating experiences in the tailorshop. what is more, motto-goals led to a higher degree of, and satisfaction with, goal attainment, which might positively influence well-being. the present study thus extends the list of situations in which motto-goals can be beneficial. this is especially noteworthy with regard to the fact that high, specific goals enjoy great success not only in research, but also in practice, be it in psychotherapy, in coaching, or in economics. hidden under the acronym s.m.a.r.t. (specific, measurable, attractive, realistic, terminated), high, specific goals are well known and are often the first choice in situations where goal setting is relevant (storch, 2011).
the present study further corroborates storch's argument that the potency of such s.m.a.r.t. goals is limited in complex situations. the advantage of motto-goals might be even more pronounced when considering real-life complex problems in which persons make use of a broad network of past experiences and options for action (kuhl & strehlau, 2014). apart from goal setting, the findings regarding the interrelation between cps performance and affect also have practical implications. the results suggest that approaching a complex problem with highly negative affect lowers task performance. the rather small effect size notwithstanding, this finding might cautiously be transferred to real-world contexts. for instance, working tasks like managing projects, organizing an upcoming event, talking to psychiatric patients, teaching school children, or planning a construction site certainly require complex problem solving skills. the negative relationship between negative affect and cps performance found in the present study suggests that feelings of frustration or low satisfaction should be avoided when approaching such tasks. the absence of a relationship between positive affect and cps performance shows that it is not necessary to feel enthusiastic about the upcoming task; rather, a neutral affective state can be sufficient to approach complex tasks successfully. hence, employers might want to prevent negative affect in their employees, not only to protect people from negative feelings, but also because such feelings might directly impair task performance.

conclusion

the present study contributes to cps research as well as goal setting research by comparing a newly developed goal type, motto-goals, to the well-established high, specific goals in a cps task. with regard to cps research, we sought to shed light on the complex interplay between affect and performance.
the results revealed that low negative affect was associated with high cps performance, emphasizing the role of affective processes in cps. with regard to goal setting research, we analyzed whether motto-goals can outperform learning and performance goals in cps. against our expectations, cps performance did not differ across the three goal conditions, which suggests that goal setting exerted a weaker influence than other factors, for instance personal problem solving competencies. despite that, motto-goals showed a clear advantage over the other two goal types: first, participants with motto-goals perceived their goal attainment as higher, easier, and more satisfying than participants with learning or performance goals. second, motto-goals encouraged subjects to approach the tailorshop with a more positive affective state and to maintain comparatively low negative affect in the course of the possibly frustrating cps experience. all these results show that motto-goals, even if developed in a short online tool, have the power to encourage persons to approach difficult tasks with a good feeling.

acknowledgements: we express our sincere thanks to daniel danner and stephanie hammon for providing us with data from previous tailorshop studies. we also thank daniel holt for his help with the tailorshop software.

declaration of conflicting interests: the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

author contributions: the authors contributed equally to this work.

supplementary material: supplementary material is available at http://journals.ub.uni-heidelberg.de/index.php/jddm/rt/suppfiles/28510/0.

copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license.

citation: rohe, m. s., funke, j., storch, m., & weber, j. (2016). can motto-goals outperform learning and performance goals?
influence of goal setting on performance and affect in a complex problem solving task. journal of dynamic decision making, 2, 3. doi:10.11588/jddm.2016.1.28510

received: 03 march 2016
accepted: 02 august 2016
published: 16 september 2016

references

barth, c. m., & funke, j. (2010). negative affective environments improve complex solving performance. cognition & emotion, 24(7), 1259-1268. doi:10.1080/02699930903223766
biebrich, r., & kuhl, j. (2003). innere kapitulation beim komplexen problemlösen: dissoziative versus integrative verarbeitungsstrategien [inner capitulation in problem solving: dissociative versus integrative processing strategies]. zeitschrift für differentielle und diagnostische psychologie, 24(3), 175-184. doi:10.1024/0170-1789.24.3.175
brown, j. d., & mankowski, t. a. (1993). self-esteem, mood, and self-evaluation: changes in mood and the way you see you. journal of personality and social psychology, 64(3), 421-430. doi:10.1037/0022-3514.64.3.421
bruggmann, n. (2003). persönliche ziele: ihre funktion im psychischen system und ihre rolle beim einleiten von veränderungsprozessen [personal goals: their function within the psychic system and their role in initiating change processes] (master's thesis). retrieved from http://zrm.ch/images/stories/download/pdf/wissenschftl_arbeiten/seminararbeiten/seminararbeit_bruggmann_20090904.pdf
cianci, a. m., klein, h. j., & seijts, g. h. (2010). the effect of negative feedback on tension and subsequent performance: the main and interactive effects of goal content and conscientiousness. journal of applied psychology, 95(4), 618-630. doi:10.1037/a0019130
daniels, l.
m., stupnisky, r. h., pekrun, r., haynes, t. l., perry, r. p., & newall, n. e. (2009). a longitudinal analysis of achievement goals: from affective antecedents to emotional effects and achievement outcomes. journal of educational psychology, 101(4), 948-963. doi:10.1037/a0016096
danner, d., hagemann, d., holt, d. v., hager, m., schankin, a., wüstenberg, s., & funke, j. (2011). measuring performance in dynamic decision making. journal of individual differences, 32(4), 225-233. doi:10.1027/1614-0001/a000055
danner, d., hagemann, d., schankin, a., hager, m., & funke, j. (2011). beyond iq: a latent state-trait analysis of general intelligence, dynamic decision making, and implicit learning. intelligence, 39(5), 323-334. doi:10.1016/j.intell.2011.06.004
diener, c. i., & dweck, c. s. (1978). an analysis of learned helplessness: continuous changes in performance, strategy, and achievement cognitions following failure. journal of personality and social psychology, 36(5), 451-462. doi:10.1037/0022-3514.36.5.451
diener, c. i., & dweck, c. s. (1980). an analysis of learned helplessness: ii. the processing of success. journal of personality and social psychology, 39(5), 940-952. doi:10.1037/0022-3514.39.5.940
dijksterhuis, a., bos, m. w., nordgren, l. f., & van baaren, r. b. (2006). on making the right choice: the deliberation-without-attention effect. science, 311(5763), 1005-1007. doi:10.1126/science.1121629
dijksterhuis, a., & nordgren, l. f. (2006). a theory of unconscious thought. perspectives on psychological science, 1(2), 95-109. doi:10.1111/j.1745-6916.2006.00007.x
dörner, d., kreuzig, h. w., reither, f., & stäudel, t. (eds.). (1983). lohhausen: vom umgang mit unbestimmtheit und komplexität [lohhausen: dealing with uncertainty and complexity]. bern, switzerland: huber.
dweck, c. s., & leggett, e. l. (1988). a social-cognitive approach to motivation and personality. psychological review, 95(2), 256-273. doi:10.1037/0033-295x.95.2.256
earley, p.
c., connolly, t., & ekegren, g. (1989). goals, strategy development, and task performance: some limits on the efficacy of goal setting. journal of applied psychology, 74(1), 24-33. doi:10.1037/0021-9010.74.1.24
estrada, c. a., isen, a. m., & young, m. j. (1994). positive affect improves creative problem solving and influences reported source of practice satisfaction in physicians. motivation & emotion, 18(4), 285-299. doi:10.1007/bf02856470
faul, f., erdfelder, e., lang, a.-g., & buchner, a. (2007). g*power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. behavior research methods, 39, 175-191. doi:10.3758/bf03193146
funke, j. (2003). problemlösendes denken [problem solving and thought]. stuttgart, germany: kohlhammer.
funke, j. (2010). complex problem solving: a case for complex cognition? cognitive processing, 11(2), 133-142. doi:10.1007/s10339-009-0345-0
funke, j. (2013a). human problem solving in 2012. journal of problem solving, 6(1), 2-19. doi:10.7771/1932-6246.1156
funke, j. (2013b). mit herz und verstand: schlüssel zu einer komplexen welt [with heart and mind: key to a complex world]. forschungsmagazin ruperto carola, 69(5), 37-43. doi:10.11588/ruca.2013.3.11392
funke, j. (2014). problem solving: what are the important questions? in p. bello, m. guarini, m. mcshane & b. scassellati (eds.), proceedings of the 36th annual conference of the cognitive science society (pp. 493-498). austin, tx: cognitive science society.
greiff, s., holt, d. v., & funke, j. (2013). perspectives on problem solving in educational assessment: analytical, interactive, and collaborative problem solving. journal of problem solving, 5(2), 71-91. doi:10.7771/1932-6246.1153
hayes, a. f. (2013). introduction to mediation, moderation, and conditional process analysis: a regression-based approach. new york, ny: guilford.
huwyler, r. (2012).
steigerung von zielbindung bei unangenehmen firmenzielen durch selbstmanagement [increasing goal commitment towards unpleasant company objectives through self-management] (unpublished master's thesis). university of st. gallen, switzerland.
isen, a. m., daubman, k. a., & nowicki, g. p. (1987). positive affect facilitates creative problem solving. journal of personality and social psychology, 52(6), 1122-1131. doi:10.1037/0022-3514.52.6.1122
judge, t. a., bono, j. e., erez, a., & locke, e. a. (2005). core self-evaluations and job and life satisfaction: the role of self-concordance and goal attainment. journal of applied psychology, 90(2), 257-268. doi:10.1037/0021-9010.90.2.257
kanfer, r., & ackerman, p. l. (1989). motivation and cognitive abilities: an integrative/aptitude-treatment interaction approach to skill acquisition. journal of applied psychology, 74(4), 657-690. doi:10.1037/0021-9010.74.4.657
kozlowski, s. w. j., & bell, b. s. (2006). disentangling achievement orientation and goal setting: effects on self-regulatory processes. journal of applied psychology, 91(4), 900-916. doi:10.1037/0021-9010.91.4.900
kretzschmar, a., neubert, j. c., wüstenberg, s., & greiff, s. (2016). construct validity of complex problem solving: a comprehensive view on different facets of intelligence and school grades. intelligence, 54, 55-69. doi:10.1016/j.intell.2015.11.004
krohne, h. w., egloff, b., kohlmann, c.-w., & tausch, a. (1996).
untersuchungen mit einer deutschen version der 'positive and negative affect schedule' (panas) [investigations with a german version of the positive and negative affect schedule (panas)]. diagnostica, 42(2), 139-156.
retrieved from https://www.researchgate.net/profile/boris_egloff/publication/228079340_untersuchungen_mit_einer_deutschen_version_der_positive_and_negative_affect_schedule_(panas)/links/549969580cf21eb3df60cb28.pdf
kuhl, j. (2000). a functional-design approach to motivation and self-regulation: the dynamics of personality systems interactions. in m. boekaerts, p. r. pintrich & m. zeidner (eds.), handbook of self-regulation (pp. 111-169). san diego, ca: academic press.
kuhl, j. (2001). motivation und persönlichkeit: interaktionen psychischer systeme [motivation and personality: interactions of mental systems]. göttingen, germany: hogrefe.
kuhl, j., & strehlau, a. (2014). handlungspsychologische grundlagen des coaching: anwendung der theorie der persönlichkeitssystem-interaktionen (psi) [behavioral psychology in the coaching context: applying personality-system-interactions (psi) theory]. wiesbaden, germany: springer.
locke, e. a. (1996). motivation through conscious goal setting. applied and preventive psychology, 5(2), 117-124. doi:10.1016/s0962-1849(96)80005-9
locke, e. a., & latham, g. p. (1990). a theory of goal setting & task performance. englewood cliffs, nj: prentice-hall.
locke, e. a., & latham, g. p. (2002). building a practically useful theory of goal setting and task motivation: a 35-year odyssey. american psychologist, 57(9), 705-717. doi:10.1037/0003-066x.57.9.705
masuda, a. d., locke, e. a., & williams, k. j. (2015). the effects of simultaneous learning and performance goals on performance: an inductive exploration. journal of cognitive psychology, 27(1), 37-52. doi:10.1080/20445911.2014.982128
meyer, b., grütter, j., oertig, m., & schuler, r. (2009). women's underperformance in complex problem solving: stereotype threat in microworld performance. unpublished manuscript, department of psychology, university of zurich, switzerland.
meyer, b., & scholl, w. (2009).
complex problem solving after unstructured discussion: effects of information distribution and experience. group processes & intergroup relations, 12(4), 495-515. doi:10.1177/1368430209105045
mone, m. a., & shalley, c. e. (1995). effects of task complexity and goal specificity on change in strategy and performance over time. human performance, 8(4), 243. doi:10.1207/s15327043hup0804_1
öllinger, m., hammon, s., von grundherr, m., & funke, j. (2015). does visualization enhance complex problem solving? the effect of causal mapping on performance in the computer-based microworld tailorshop. educational technology research and development, 63(4), 621-637. doi:10.1007/s11423-015-9393-6
pekrun, r., elliot, a. j., & maier, m. a. (2006). achievement goals and discrete achievement emotions: a theoretical model and prospective test. journal of educational psychology, 98(3), 583-597. doi:10.1037/0022-0663.98.3.583
pekrun, r., elliot, a. j., & maier, m. a. (2009). achievement goals and achievement emotions: testing a model of their joint relations with academic performance. journal of educational psychology, 101(1), 115-135. doi:10.1037/a0013383
schuler, p., & sandmeier, a. (2008). zrm auf der psychiatrischen station für jugendliche psj in brugg [zrm on the psychiatric station for adolescents (psj) in brugg]. unpublished manuscript. retrieved from http://zrm.ch/images/stories/download/pdf/studien/studie_schuler_sandmeier_20081101.pdf
schwarz, n., & skurnik, i. (2003). feeling and thinking: implications for problem solving. in j. e. davidson & r. j. sternberg (eds.), the psychology of problem solving (pp. 263-290). cambridge, uk: cambridge university press.
seijts, g. h., & latham, g. p. (2001). the effect of distal learning, outcome, and proximal goals on a moderately complex task. journal of organizational behavior, 22(3), 291-307. doi:10.1002/job.70
seijts, g. h., & latham, g. p. (2011).
the effect of commitment to a learning goal, self-efficacy, and the interaction between learning goal difficulty and commitment on performance in a business simulation. human performance, 24(3), 189-204. doi:10.1080/08959285.2011.580807
seijts, g. h., latham, g. p., tasa, k., & latham, b. w. (2004). goal setting and goal orientation: an integration of two different yet related literatures. academy of management journal, 47(2), 227-239. doi:10.2307/20159574
seijts, g. h., latham, g. p., & woodwark, m. (2013). learning goals: a qualitative and quantitative review. in e. a. locke & g. p. latham (eds.), new developments in goal setting and task performance (pp. 195-212). new york, ny: routledge.
spering, m., wagener, d., & funke, j. (2005). the role of emotions in complex problem-solving. cognition and emotion, 19(8), 1252-1261. doi:10.1080/02699930500304886
stauner, n. g. (2013). personal goal attainment, psychological well-being change, and meaning in life (doctoral dissertation). retrieved from https://escholarship.org/uc/item/3t34c68w
storch, m. (2011). motto-ziele, s.m.a.r.t.-ziele und motivation [motto-goals, s.m.a.r.t. goals, and motivation]. in b. birgmeier (ed.), coachingwissen (2nd rev. ed., pp. 185-207). wiesbaden, germany: springer.
storch, m., gaab, j., küttel, y., stüssi, a.-c., & fend, h. (2007). psychoneuroendocrine effects of resource-activating stress management training. health psychology, 26(4), 456-463. doi:10.1037/0278-6133.26.4.456
storch, m., keller, f., weber, j., spindler, a., & milos, g. (2011). psychoeducation in affect regulation for patients with eating disorders: a randomized controlled feasibility study. american journal of psychotherapy, 65(1), 81-92. retrieved from http://www.ingentaconnect.com/content/afap/ajp/2011/00000065/00000001/art00005
storch, m., & krause, f. (2014). selbstmanagement ressourcenorientiert. grundlagen und trainingsmanual für die arbeit mit dem zürcher ressourcen modell (zrm) [resource-oriented self-management.
principles and training manual of the zurich resource model (zrm)] (5th rev. ed.). bern, ch: huber. storch, m., & olbrich, d. (2011). das gusi-programm als beispiel für die gesundheitspädagogik in präventionsleistungen der deutschen rentenversicherung [the gusi program as example for health education in preventative services of the german pension insurance fund]. in w. knörzer & r. rupp (eds.), gesundheit ist nicht alles was ist sie dann? gesundheitspädagogische antworten (pp. 111-126). baltmannsweiler, germany: schneider verlag hohengehren. tabachnick, b. g., & fidell, l. s. (2007). using multivariate statistics (5th ed.). boston, ma: pearson. temme, l. (2013). zielwirksamkeit als herausforderung für personalentwicklung eine empirische studie zu motto-zielen und hohen spezifischen zielen [goal efficacy as challenge for human resources development an empirical study investigating motto goals and high, specific goals] (unpublished master’s thesis). technical university of kaiserslautern, germany. watson, d., clark, l. a., & tellegen, a. (1988). development and validation of brief measures of positive and negative affect: the panas scales. journal of personality and social psychology, 54(6), 1063-1070. 
doi:10.1037/0022-3514.54.6.1063 10.11588/jddm.2016.1.28510 jddm | 2016 | volume 2 | article 3 | 14 https://www.researchgate.net/profile/boris_egloff/publication/228079340_untersuchungen_mit_einer_deutschen_version_der_positive_and_negative_affect_schedule_(panas)/links/549969580cf21eb3df60cb28.pdf https://www.researchgate.net/profile/boris_egloff/publication/228079340_untersuchungen_mit_einer_deutschen_version_der_positive_and_negative_affect_schedule_(panas)/links/549969580cf21eb3df60cb28.pdf https://www.researchgate.net/profile/boris_egloff/publication/228079340_untersuchungen_mit_einer_deutschen_version_der_positive_and_negative_affect_schedule_(panas)/links/549969580cf21eb3df60cb28.pdf https://www.researchgate.net/profile/boris_egloff/publication/228079340_untersuchungen_mit_einer_deutschen_version_der_positive_and_negative_affect_schedule_(panas)/links/549969580cf21eb3df60cb28.pdf http://dx.doi.org/10.1016/s0962-1849(96)80005-9 http://dx.doi.org/10.1037/0003-066x.57.9.705 http://dx.doi.org/10.1037/0003-066x.57.9.705 http://dx.doi.org/10.1080/20445911.2014.982128 http://dx.doi.org/10.1177/1368430209105045 http://dx.doi.org/10.1207/s15327043hup0804_1 http://dx.doi.org/10.1007/s11423-015-9393-6 http://dx.doi.org/10.1037/0022-0663.98.3.583 http://dx.doi.org/10.1037/a0013383 http://zrm.ch/images/stories/download/pdf/studien/studie_schuler_sandmeier_20081101.pdf http://zrm.ch/images/stories/download/pdf/studien/studie_schuler_sandmeier_20081101.pdf http://dx.doi.org/10.1002/job.70 http://dx.doi.org/10.1080/08959285.2011.580807 http://dx.doi.org/10.2307/20159574 http://dx.doi.org/10.1080/02699930500304886 https://escholarship.org/uc/item/3t34c68w http://dx.doi.org/10.1037/0278-6133.26.4.456 http://www.ingentaconnect.com/content/afap/ajp/2011/00000065/00000001/art00005 http://www.ingentaconnect.com/content/afap/ajp/2011/00000065/00000001/art00005 http://dx.doi.org/10.1037/0022-3514.54.6.1063 
theoretical contribution
the multiple faces of complex problems: a model of problem solving competency and its implications for training and assessment
andreas fischer1 and jonas c. neubert2
1department of psychology, heidelberg university and 2institute for economics, university of cottbus
in this paper, we present a competency model for complex problem solving (cps) by building on the categories of knowledge, skills, abilities, and other components (ksao).
we highlight domain-general and domain-specific components in each of these categories, review established conceptualizations of cps, and present a new model of cps competency that is meant to provide a starting point for systematic research on training and assessment. the model highlights the idea that complex problems differ with regard to the ksao components they demand from a problem solver and that performance in one problem does not necessarily predict performance in a different problem. implications for research on the training and assessment of cps competency are discussed, and a selection of well-established tests for various components of the ksao model is provided.
keywords: complex problem solving, problem solving competency, ksao model
the notion that the world around us is developing toward complexity has become something of a truism in recent times (see as far back as weaver, 1948). modern appliances release us from repetitive daily chores, such as washing the dishes or vacuuming our homes, as well as from routine work tasks, such as checking the usual suspects in a malfunctioning car engine or checking our spelling in a letter. in return, we are facing ever-increasing numbers of situations in which we need to handle these very appliances or generally deal with non-routine situations at work. consequently, an increasing amount of planning and problem solving is required: the dishwasher needs to be programmed even when we have lost the manual, and an automotive technician has to analyze and understand an increasingly complex array of settings and errors in the electronic control unit of a car (e.g., related to the fine-tuning of an engine). as a result, problem solving competency1 is soaring in importance across occupations (e.g., neubert, mainert, kretzschmar, & greiff, 2014; spitz-oener, 2006).
in 2012, one of the most important large-scale assessments in education, the oecd’s programme for international student assessment (pisa), therefore featured creative problem solving alongside its traditional tests of mathematics, science, and literacy. with the help of this additional problem-solving assessment of about 85,000 students from 44 countries around the world, the oecd tried to establish an empirical basis for suitable policies that could be applied to prepare students for the challenges facing them in the working world (oecd, 2014). similarly, information networks, such as the occupational information network (o*net) of the united states department of labor (http://www.onetonline.org/), introduced skills such as critical thinking or complex problem solving into their repertoire to account for the changing requirements in today’s jobs, thereby including new requirements in their standardized overviews of critical knowledge, skills, and abilities (national center for o*net development, 2009). while the world has been taking a closer look at how humans interact with complex problems, many conceptual questions have emerged: what are complex problems and, relatedly, what does it take to solve them? what are the major differences between someone who can solve an arbitrary problem effectively and someone who cannot? how can people who lack problem solving skills be trained? how can such skills be transferred to problems from different domains? questions such as these are related to the idea of complex problem solving competency (cps competency) and have important implications for training and assessment. in the paper at hand, we review different ways of conceptualizing cps competency, and we present a new model that can be applied to clarify the unique contributions of knowledge, skills, abilities, and other components (ksao) to the solution of different kinds of complex problem situations. 
thereby, we elaborate on a suggestion made by funke, fischer & holt (2015) to view problem solving competency “as a bundle of skills, knowledge and abilities, which are required to deal effectively with complex non-routine situations in different domains.” the descriptor ksao is borrowed from the industrial and organizational psychology literature, where it is typically used to describe the requirements of different work situations with the help of competencies, a task that seems similar to our quest for a closer link between cps competency and actual complex problem situations (see, e.g., campion et al., 2011, for more details on ksao models in industrial and organizational psychology, or peterson et al., 2001, for an exemplary and comprehensive ksao model). in order to systematically examine the nature of a cps competency from a coherent and unifying perspective, we have to specify what different complex problems have in common and, thus, what it takes to solve them. on the basis of fischer, greiff and funke’s (2012) theoretical framework for the process of solving complex problems, we derived a model of cps competency that connects the complex problems of real life to the (psychological) constructs contributing to effective problem solving within and across a wide range of domains. in doing so, we offer a foundation for future research and for translating such research into practice.
corresponding author: andreas fischer, heidelberg university, hauptstr. 47-51, 69117 heidelberg (germany). e-mail: andreas.fischer@psychologie.uni-heidelberg.de
10.11588/jddm.2015.1.23945 jddm | 2015 | volume 1 | article 6
what are complex problems and why are they so hard to handle?
according to duncker’s (1945) seminal definition, a problem arises when a person has a goal but does not know how to achieve it. this lack of knowledge might refer to the representation of the problem or it might refer to its goal-oriented application (dörner, 1979). in other words, there might be no known operator2 for reaching one’s goals, or an operator might be known to exist, but the operator might not be applicable in the current situation.3 in the example of a car’s electronic control unit, i might not have a good idea of how to deal with a problem concerning the timing of the ignition (i.e., lacking a suitable operator), or i might know that certain tools should be applied, but i might not have the necessary tools available. in both cases, a problem arises because i do not have a viable path by which to reach the desired goal state of repairing the ignition system. a problem is complex when multiple highly interrelated elements have to be considered in order to derive a solution (dörner, 1996; fischer et al., 2012; weaver, 1948). in a similar vein, the process of solving a problem can be considered complex when it involves multiple highly interrelated elements (e.g., multiple search spaces; klahr & dunbar, 1988; fischer et al., 2012). characteristically, as emphasized by funke (2003), in the process of solving complex problems, there are multiple interrelated goals to consider (a feature called polytely), for example, finding a solution to a car’s ignition problem that balances both the cost of additional parts and the time needed to solve the problem with their help. typically, solving complex problems also involves dealing with overwhelming amounts of information that is more or less relevant for solving the problem (i.e., complexity), for example, readouts from the engine control unit related to ignition timing and fuel injection but also service intervals, driving behavior, and the financial resources of the customer. 
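duncker’s definition can be made concrete in the classical problem-space view: a problem consists of a start state, a goal state, and a set of operators, and solving it means finding a sequence of applicable operators that transforms the start state into the goal state. a minimal sketch with invented toy states and operators (an operator returns None when it is not applicable in the current situation):

```python
from collections import deque

def solve(start, goal, operators):
    """breadth-first search over the problem space: find a sequence of
    operator names that transforms `start` into `goal`, or return None
    if no chain of known, applicable operators reaches the goal."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for name, op in operators.items():
            nxt = op(state)  # None means: operator not applicable here
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None

# invented toy problem: reach the number 10 from 1, using two operators
# that are only applicable below a resource limit of 50
ops = {"add 3": lambda n: n + 3 if n < 50 else None,
       "double": lambda n: n * 2 if n < 50 else None}
```

in this sketch, solve(1, 10, ops) finds a three-step operator chain, while solve(1, 0, ops) returns None: operators are known, but no applicable chain reaches the goal, which is precisely the situation in which, per the definitions above, a problem (rather than a routine task) arises.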
furthermore, multiple effects of actions need to be considered (i.e., interconnectedness) because changes in the engine control unit can result in emission problems, legal risks, or problems with a supervisor, to give just a few examples. finally, complex problems are typically characterized by incomplete knowledge about the status quo or the effect of interventions (i.e., intransparency) and by the need for – as well as the possibility of – dynamically adapting one’s course of action at future points in time (i.e., dynamic decision making; cf. fischer, holt & funke, 2015), for example, when trying to find optimal ignition parameters in a car racing event and one needs to decide whether to repair the current engine or switch to a new one.
the ksao model of cps competency
in the current paper, we argue that the features of complex problems give rise to a set of characteristic demands (funke, 2001). more specifically, we propose the ksao model of problem solving competency, consisting of different categories (knowledge, skills, abilities, and other components) and emphasizing both domain-general and domain-specific components in each category (see funke et al., 2015, for a more general perspective on cps competency). figure 1 illustrates how the components of cps competency are assumed to be related to cps performance. in our view, domain-generality and domain-specificity are endpoints of a continuum, with content-neutral cognitive structures (e.g., working memory) on one end and very specific knowledge stored in long-term memory on the other (funke et al., 2015). please note that the effect of the ksao components on cps performance will depend on various features of both the problems and the problem solvers (e.g., fischer et al., 2012; funke, 1991; hundertmark, holt, fischer, said, & fischer, 2015). for instance, the relation between intelligence and cps is known to depend on the situation’s transparency (cf.
putz-osterloh & lüer, 1981) and on the problem solver’s prior knowledge (leutner, 2002). in the paper at hand, however, we will elaborate primarily on the ellipses in figure 1. that is, we will let future publications provide a detailed answer to the question of moderator variables (the interested reader should refer to funke, 2003, or süß, 1996, for a preliminary summary of empirical findings).
1) knowledge
one of the most obvious and central determinants of cps performance is declarative knowledge (e.g., about means and ends; cf. funke, 2003). there is a long history of experiments and investigations into the role of knowledge and its relations to various aspects of (complex) problem solving (e.g., funke, 1985; kersting & süß, 1995; morris & rouse, 1985; see funke, 1992, and süß, 1996, for overviews). for example, kersting and süß (1995) investigated the construction of a content-valid knowledge test for a specific computer-simulated problem, differentiating knowledge from the simple recognition of relations to the prediction of the strength and direction of numerical relations. relatedly, there is an even longer tradition in the area of industrial and organizational psychology and the analysis of work tasks and requirements, also dedicated to the classification and comparison of knowledge in different (work) situations. for example, the classification of knowledge in the o*net content model is structured along domains of knowledge, such as business and management (e.g., areas of knowledge related to clerical procedures or sales and marketing), manufacturing and production (e.g., knowledge related to food production), or health services (e.g., medical knowledge; see constanza, fleishman, & marshall-mies, 1999).
this differentiation of knowledge requirements from the area of industrial and organizational psychology can be helpful for understanding, differentiating, and comparing complex problem situations in very different work environments.
figure 1. the ksao model of cps competency: solid lines represent direct causal influences, dotted lines represent moderating effects. ellipses represent the different categories of cps competency, and solid rectangles represent manifest phenomena. we expect different components within each category to be relevant for different problem solvers or in different problem situations (because of various kinds of moderator variables). with the exception of knowledge, the moderating influence of ksao categories – and their higher-order interactions – has been omitted for visual clarity.
1 in the remainder of this paper, we will use the term cps competency to emphasize that, in contrast to the simple research paradigms applied in laboratory research on problem solving (see funke, 2003), most problems in real life are complex (to varying degrees).
2 operators are actions that can be applied to transform a situation into a different one. they can be separated from tactics (i.e., chains of operators) and strategies (i.e., more general guidelines for when to apply which tactic or operator; cf. güss, tuason, & orduña, 2015). similarly, heuristics guide the problem solver toward specific operators in a given problem situation (e.g., gigerenzer & gaissmaier, 2011).
3 a lack of knowledge or specificity concerning goals (dörner, 1979) can also be subsumed under this kind of means-end-analysis problem if knowledge acquisition is considered a means toward the end of solving the problem (cf. fischer, 2015a).
for example, a comparison of knowledge requirements in different complex problems related to the engine control unit of a car can build on a problem’s relation to different knowledge areas (e.g., problems requiring knowledge about ignition parameters or emission regulations). knowing about the different kinds of knowledge involved in solving different problems is important for characterizing jobs, for predicting the performance of problem solvers (e.g., job applicants), and for training purposes. a small fraction of knowledge can be useful in a very broad range of complex problem situations: for instance, knowing about the definition of a complex problem and its characteristic features (i.e., cps-related concepts; schoppek & fischer, 2015) or about a range of exploration strategies can foster a systematic approach to solving such a problem. that is, knowing that complex problems typically involve one or more dimensions of evaluating potential goal states may help to explicate differences in goal states and to detect and resolve conflicts between the various dimensions of goals. human wisdom – knowledge and deep understanding of fundamental goals and how to reach them (fischer, 2015b), that is, knowledge about the “fundamental pragmatics of life” (baltes & smith, 1990) – can also be considered helpful for solving many kinds of complex problems because of its conceptual relation to balancing multiple interests, contextualizing action, and managing uncertainty (fischer, 2015a). however, most of the knowledge that is relevant for solving complex problems is highly domain-specific; for example, knowledge about the analysis of engine problems is most certainly irrelevant for many cases of career planning (see also tricot & sweller, 2013).
more specifically, basic knowledge about relevant problem characteristics is required in most (complex) problem settings; for example, knowledge about functional sub-elements, such as those that control fuel injection and spark timing, is required for the application of basic exploration strategies such as votat (see below). in turn, this basic knowledge is strongly linked to the domain at hand (e.g., the different sub-components of a car). furthermore, outside of very restricted and formalized problem-solving settings, prior knowledge of available operators is helpful or might even be a prerequisite for (complex) problem-solving attempts; when exploring problems in cars without electronic engine control units, “connecting the service computer” is not an option for obtaining information about an ignition problem, but it might be a prerequisite in other cases (e.g., when dealing with problems related to a modern car’s emissions). again, knowledge of available operators is strongly linked to the problem’s domain: if a person has no idea about the availability of an operator, he or she will not be able to use it. research comparing experts and novices offers important insights in this regard (e.g., chi et al., 1981, 1982). for example, domain-specific expertise seems to alter the ways in which (complex) problems are conceptualized on a very basic level (e.g., leading to the use of different categories of problems; see also below). what seems clear is the paramount role of knowledge for solving complex problems (see greeno, 1997, for a critical view on the general notion of separating knowledge from the situation it is utilized in). consequently, differentiating the knowledge requirements of different complex problems appears to be a worthwhile endeavor for cps research.
2) skills
besides knowledge about concepts, solving complex problems requires skills that will allow the knowledge to be put into practice (cf. anderson, 1987).
in contrast to “knowing that”, it is also important to “know how” to apply heuristics, strategies, tactics, or operations the right way and at the right moment in time (dörner, 1986; 1996; güss et al., 2015; see süß, 1996, for a discussion). in ksao models from the area of industrial and organizational psychology, skills are closely associated with the level of proficiency needed to perform a given task (e.g., peterson et al., 2001). for example, the o*net content model mentioned above features a skill taxonomy that differentiates between two “higher-order constructs”, namely basic skills (facilitating learning, e.g., reading, writing, and critical thinking) and cross-functional skills (facilitating performance across domains, e.g., social skills and resource management skills; see mumford, peterson, & childs, 1999, and peterson et al., 2001, for more details). with respect to cps competency, these skill taxonomies offer a way to compare different complex problem situations with regard to their requirements in terms of skills. complex problems in the context of repairing car engines differ in terms of their reliance on written materials (i.e., requiring a certain amount of reading comprehension) or in the degree to which they benefit from a high level of social skills (e.g., exploring customer needs). some skills can be applied or adapted to a broad range of domains, such as social skills (weis & conzelmann, 2015), instance-based decision making (gonzalez, lerch, & lebiere, 2003; dutt & gonzalez, 2015), or scientific procedures such as the systematic testing of hypotheses. these kinds of skills can be applied in the case of the automotive mechanic but also when looking for factors that influence career development.
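as a loose illustration (a simplified sketch, not the full act-r based model of gonzalez et al., 2003), instance-based decision making can be cast as storing past (situation, action, outcome) instances and choosing the action whose instances from similar situations yielded the best outcomes; all situations, actions, and numbers below are invented:

```python
def similarity(s1, s2):
    """toy similarity: fraction of matching situation attributes."""
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)

def choose(situation, memory, actions):
    """pick the action whose past instances, weighted by how similar
    their situations are to the current one, had the best outcomes."""
    def value(action):
        weighted = [(similarity(situation, s), outcome)
                    for s, a, outcome in memory if a == action]
        if not weighted:
            return 0.0  # unexplored actions get a neutral default
        total = sum(w for w, _ in weighted)
        return sum(w * o for w, o in weighted) / total if total else 0.0
    return max(actions, key=value)

# invented repair-shop episodes: (symptoms, action taken, outcome)
memory = [(("misfire", "cold start"), "check ignition", 1.0),
          (("misfire", "warm start"), "check ignition", 0.8),
          (("misfire", "cold start"), "replace engine", -1.0)]
```

for a new ("misfire", "cold start") situation, choose(...) picks "check ignition", because similar past instances of that action ended well; since each new episode simply extends memory, the sketch also shows how such a skill improves with experience in a domain.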
other skills are related to a smaller set of domains (for example, calculating the cost of changing a part in the car on the basis of the cost of labor, the cost of the part, and other general costs such as rent) or even to a single domain (e.g., operating a specialized tool to read out the engine parameters from the car). the link or dissociation between declarative knowledge (see above) and procedural skills has been the subject of empirical research (see süß, 1996, for an overview). what seems important here is the consideration of both declarative knowledge and its translation into problem solving via skills when trying to analyze complex problem performance in given situations, as the two are potentially dissociated (see also nickolaus, 2011). that is, assessing declarative knowledge in a given domain of cps can lead to a false impression of cps competency when procedural aspects are not accounted for (as in the experiments conducted by berry & broadbent, 1988; see also süß, 1996). again, the link to research on expertise offers valuable insights here. for example, research on differences in problem solving between experts and novices has shown that an understanding of important characteristics of a problem situation is one of the key factors that differentiates expert problem solvers from beginners (i.e., experts focus on the problems’ “deep structure” as opposed to superficial features; e.g., larkin, mcdermott, simon, & simon, 1980; chi, feltovich & glaser, 1981; chi, glaser & rees, 1982). furthermore, research on the potentially adverse effect of educational interventions in the case of misaligned prior experience in learners (the so-called expertise reversal effect; e.g., kalyuga, ayres, chandler, & sweller, 2003; rey & fischer, 2013) warns against overly simplistic conceptions of accounting for prior knowledge and skills.
naturally, such (procedural) knowledge is strongly bound to a domain of application, even though we expect that expertise in one field of complex problems will be helpful for shaping a specific (and more or less helpful) perspective in a different domain (see beckmann & guthke, 1995, for a more nuanced view). skills that are relevant in current assessments of cps – such as the (more or less systematic) application of the strategy “vary potentially influencing factors in isolation” to find out about relations between problem elements (e.g., the votat strategy; wüstenberg, stadler, hautamäki, & greiff, 2014) – fall somewhere in between the two extremes, being relevant in a range of problem situations, for example, in the context of scientific experiments from a variety of domains (e.g., scherer & tiemann, 2012). at the same time, there are certainly limits to the applicability of this type of skill to other cases of complex problems (see also tricot & sweller, 2013). the automotive technician is well-advised not to apply that kind of strategy to explore the dynamics of frustration in her marriage, because the reversibility of actions is rather low (even though the situation exhibits features of a complex problem, e.g., intransparency, multiple goals, interrelatedness, and dynamics). furthermore, the recognition of suitable problem situations for the application of an operator already depends strongly on domain-specific knowledge structures (i.e., procedural knowledge related to the structure of the problem – the previously mentioned role of expertise in identifying the relevant cues in a situation). taken together, different complex problem-solving situations will require the problem solver to apply very different skills. and similar to declarative knowledge, there is a broad body of research that has explored skills across a wide range of settings, from entrepreneurship (e.g., gustafsson, 2006) to car mechanics and office management (nickolaus, 2011).
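for the special case of a system whose outputs are linear in its inputs, the “vary potentially influencing factors in isolation” strategy can be sketched directly; the toy system and its hidden weights below are invented for illustration:

```python
def simulate(weights, inputs):
    """one observation of a hypothetical linear system: each output
    is a weighted sum of the inputs (the weights are hidden from
    the problem solver)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def votat(system, n_inputs, n_outputs):
    """vary one input at a time while holding the others at zero;
    each probe isolates exactly one column of the hidden structure."""
    discovered = [[0.0] * n_inputs for _ in range(n_outputs)]
    for i in range(n_inputs):
        probe = [0.0] * n_inputs
        probe[i] = 1.0                 # vary only input i
        response = system(probe)       # observe the system's reaction
        for j, out in enumerate(response):
            discovered[j][i] = out
    return discovered

# hidden structure of the toy system
hidden = [[2.0, 0.0, -1.0],
          [0.0, 0.5, 3.0]]
recovered = votat(lambda x: simulate(hidden, x), 3, 2)
```

here recovered equals hidden, but the sketch also exposes the strategy’s limits discussed above: it presupposes that inputs can be set freely and reset to zero (reversibility) and that the problem solver already knows which factors count as inputs – domain-specific knowledge again.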
nonetheless, we see skills as an important element when looking at cps and its assessment. relatedly, the skills that are relevant across complex problems – their influence, transferability, and domain-specific aspects – seem promising roads for future research (e.g., classes of skills that connect complex problems across different domains).
3) abilities
the utilization of knowledge and skills to deal with a complex problem situation generally rests on the fundamental abilities of the problem solver. abilities can be understood as “enduring capacities for performing a wide range of different tasks” (peterson et al., 2001, p. 457). many abilities can be assumed to be relevant for successfully handling complex problem situations, and in the following, we highlight the role of some exemplary ones. comparably, the o*net content model features workers’ abilities as part of its framework, building strongly on the work of fleishman and the related taxonomy and measurement system (the fleishman job analysis survey; fleishman, 1992; fleishman, constanza, & marshall-mies, 1999). in this classification, abilities are classified along four higher-order dimensions: cognitive abilities (e.g., deductive reasoning), psychomotor abilities (e.g., wrist-finger speed), physical abilities (e.g., stamina), and sensory abilities (e.g., peripheral vision; see peterson et al., 2001, and fleishman et al., 1999, for details). with regard to cps, there are certain abilities that, although they have been the focus of a wide range of empirical research, still deserve special attention from researchers and practitioners. on a basal level, general intelligence and the latent factors that are commonly used to define it are important prerequisites for solving problems efficiently (süß, 1999).
the importance of intelligence for (complex) problem solving comes as no surprise, as definitions of intelligence have often included problem solving as a prominent part of the definition (sternberg, 1982). for instance, complex problems by definition require a large amount of information to be considered simultaneously (fischer, 2015a) and thus require the capacity to store and process information (i.e., memory). in the example of the ignition system, being able to keep in mind the characteristics of the engine control unit (e.g., fuel injection settings, ignition timing) is a prerequisite for handling the problem situation systematically. if the problem solver lacks the memory capacity to store these basic solution characteristics, a recourse to more heuristic strategies might be necessary, thereby completely changing the interaction with a complex problem situation. in the same vein, higher levels of reasoning ability can be required to transfer knowledge and strategies to a new problem situation or to analyze the suitability of a specific operator for a specific complex problem (e.g., süß, 1996; wittmann & hattrup, 2004). more specifically, only problem solvers with a certain level of reasoning ability will be able to apply the necessary basic (cognitive) operations to a given problem. süß (1996) provided a comprehensive collection of experimental findings concerning how the role of intelligence depends on various problem-solving characteristics, including, for example, specific predictions that were based on the novelty and the difficulty of the problem (i.e., the increased importance of intelligence in the medium range; raaheim, 1974; hussy, 1985; see süß, 1996). also by definition, problems require deviations from business-as-usual. thus, creativity with regard to option generation or divergent thinking is also conceptually related to cps (kretzschmar, neubert, wüstenberg, & greiff, 2016).
if the route to the desired goal state is already crystal clear, we are not faced with a real problem situation but rather a routine activity (funke, 2003). luckily for cps research, a host of research has investigated the roles of working memory capacity, reasoning ability, and other basic abilities in educational settings (e.g., adey, csapó, demetriou, hautamäki, & shayer, 2007). consequently, their roles in acquiring new knowledge and expertise in specific fields and even some examples of complex problems are well researched (e.g., ackerman, 1992). similarly, the role of intelligence for cps competency has been the target of empirical research throughout the history of cps research (e.g., kretzschmar et al., 2016). besides these general abilities that can be considered necessary or helpful across all complex problem situations, there are also abilities that are relevant for complex problems of a certain kind only: for example, some complex problems are characterized by a large degree of time pressure, and thus, cognitive speed is an issue for such problems, whereas other problems require the ability to work adeptly with one’s hands (i.e., the availability of operators depends on a specific degree of psychomotor ability). as in the case of skills (being more or less relevant across complex problem situations), we also see the potential of looking at ways to organize complex problems according to the abilities required for solving them (e.g., domains of complex problems that share similar requirements in terms of reasoning ability or physical strength). furthermore, the consideration of basic abilities is also relevant when exploring the malleability of complex problem solving performance. 
If solving a specific type of complex problem depends to a large extent on a specific ability (e.g., spatial reasoning), the effectiveness of training interventions directed toward advancing problem solving will be underestimated if interindividual differences in basic abilities are not taken into account (see, e.g., the studies by Ackerman, 1992, on air traffic controllers). That is, if a certain kind of training strengthens the knowledge and skills needed to handle a specific complex problem but participants fail to handle it because their ability levels are insufficient (e.g., a lack of hand-eye coordination), the training might be prematurely dismissed.

Other

As the world of complex problems is as heterogeneous as our world at large, some other factors deserve consideration in relation to CPS. The "other" category of the KSAO model is meant to be a category for all the requirements that can potentially arise in CPS situations but are not contained in the previously mentioned categories (e.g., having a license to practice as a doctor or being able to handle emotional stress). The explicit inclusion of an "other" category thereby serves as a route for integration and as a reminder of factors that are potentially at least as important as the ones presented above. Most important, having a problem implies the frustration of goal achievement, and thus, frustration tolerance is, by definition, an important factor in every instance of CPS (e.g., Funke et al., 2015). In a similar vein, a positive attitude toward problems is an important prerequisite for solving them (D'Zurilla & Nezu, 2007) – for instance, given that a problem can be solved, viewing the problem as inevitable, challenging, and solvable is clearly preferable to viewing it as something undesirable, threatening, or that should not have happened in the first place.
As solving complex problems requires a systematic and unbiased consideration of options as well as a monitoring of consequences, cognitive reflection (Toplak, West, & Stanovich, 2011) – a construct at the border between cognitive ability and cognitive style – might also foster CPS in general (e.g., Donovan, Güss, & Naslund, 2015). It is related to avoiding cognitive biases beyond intelligence and a wide range of other variables (Toplak et al., 2011). Further, as initial assumptions about complex problems are always incomplete and often false, openness to experience and learning motivation – in addition to a sufficient amount of achievement motivation – might also be helpful to varying degrees (cf. Greiff & Neubert, 2014). Besides these highly domain-general aspects, domain-specific motivation, interest, and willingness are important aspects involved in every attempt to solve a problem. Furthermore, there is a host of influencing factors relevant in some complex problem situations that lie outside the scope of cognitive considerations (e.g., formal qualifications). For example, the availability of operators might depend on formal qualifications or financial resources, thereby completely altering the problem situation for a subgroup of problem solvers. In summary, the KSAO model of CPS competency offers a way to systematically look at complex problem situations and highlights relevant categories of prerequisites for problem solving (see Figure 1). At the same time, the model also points toward the need to include a broad array of components within each of these categories to capture an accurate picture of CPS competency.
Even more, due to its alignment with models from the area of industrial and organizational psychology, the KSAO model also points toward possible ways to link CPS research to the insights that have been gained in specific fields of application.

How is the KSAO model related to previous conceptualizations of CPS competency?

In the research literature, one of the most prominent positions is to assume that performance in complex problem situations primarily depends on a small set of domain-general skills that determine problem-solving performance across different problems (e.g., Greiff, Wüstenberg, & Funke, 2012). If this assertion of a stable set of CPS skills is accurate, a reliable and valid assessment of this set of skills could be used to estimate a person's level of competency in dealing with complex problem situations in general (e.g., handling complex problems related to career development, leadership, and the domain of mechatronics engineers; Neubert, Mainert, Kretzschmar, & Greiff, 2015). An alternative perspective questions this conception of CPS competency and points toward the multitude of influences and requirements in educational and everyday problem situations (e.g., Tricot & Sweller, 2013). This perspective implies that the competency to handle complex problems successfully depends to a large degree on domain-specific knowledge and expertise that differs across complex problems from different domains. That is, proponents of this second perspective highlight the heterogeneity of requirements that complex problem solvers face and the specificity of knowledge.
A third and somewhat conciliatory perspective – and the foundation of our KSAO model – regards CPS as a product of a combination of domain-general facets that are relevant across complex problems (e.g., intelligence) and domain-specific facets (e.g., problem-specific knowledge; e.g., Süß, 1996, 1999), with the importance of these elements differing in accordance with the problem situation at hand (Fischer, 2015a; Funke et al., 2015). The third perspective thereby points toward an array of influences and constructs that work together when individuals deal with complex situations, but it simultaneously acknowledges the relevance of constructs involved across complex problems (see also Funke, 2010; Süß, 1999). The KSAO model includes certain highly domain-general constructs (e.g., intelligence), but it also emphasizes that the relevance of these constructs differs between problems. Furthermore, it acknowledges the importance of domain-specific knowledge, skills, and even some abilities for solving specific complex problems. Table 1 presents an exemplary list of problems that might occur while working on the fuel ignition system of a car and highlights the role of KSAO components for analyzing and solving these problems. While the strategy of pressing some random buttons in the case of the dishwasher might actually lead to some improvement in our understanding and, subsequently, the quality of the dishwashing, we surely hope our automotive technicians have some more elaborate problem-solving strategies at their disposal, as a trial-and-error strategy for knowledge acquisition seems rather unsuitable. The complexity of the car engine problem might otherwise lead to very unfortunate situations in which problematic side effects of operators are recognized at high speed on a motorway, long after the car has been returned to the customer. It is easy to see how the picture becomes fairly complicated once situational and personal variables are taken into account.
Implications

No matter which position is taken on how to map CPS competency to performance or to the specific requirements of complex problem situations, CPS competency implies everything a person needs to solve the complex problem(s) at hand (e.g., cognitive abilities, skills, self-regulation, motivation, knowledge of appropriate strategies, and more; Weinert, 1999; Greiff, Wüstenberg, et al., 2013; Schoppek & Fischer, 2015). However, the different perspectives have different implications for CPS competency and for how it is related to CPS performance and to the various factors this performance depends on. For instance, according to the KSAO model, the mere concept of CPS competency itself does not imply that these factors (the KSAO components) are empirically correlated (Funke et al., 2015; Schoppek & Fischer, 2015). More specifically, different KSAO components are assumed to make unique contributions to performance in several complex problem situations. In line with this implication, Wittmann and Süß (1999) reported that problem-specific knowledge and reasoning ability made significant unique contributions to performance in three heterogeneous complex problems. Also, Greiff and Neubert (2014) reported that reasoning, problem-solving skills (measured by the MicroDYN test; see Funke, 2010; Fischer, 2015a), and computer anxiety each uniquely contributed to the prediction of grade point average. Not all of the components of the KSAO model are likely to be correlated (in a way that would allow a single underlying CPS factor to emerge), but of course, different components of the KSAO model can be assessed reliably, as there are well-established, reliable, and valid tests for several of them (see Table 2 for an exemplary list of tests).
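The idea that distinct components make unique contributions to performance can be illustrated with a small simulation. This is a hedged sketch only: the component names, weights, and sample size are illustrative assumptions, not values from the studies cited above. Regressing simulated performance on two uncorrelated components recovers a distinct, nonzero weight for each:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical, standardized component scores (illustrative assumptions):
knowledge = rng.standard_normal(n)   # problem-specific knowledge
reasoning = rng.standard_normal(n)   # reasoning ability

# Performance depends on BOTH components, each making a unique contribution.
performance = 0.5 * knowledge + 0.3 * reasoning + rng.standard_normal(n)

# Ordinary least squares: regress performance on both components at once.
X = np.column_stack([np.ones(n), knowledge, reasoning])
beta, *_ = np.linalg.lstsq(X, performance, rcond=None)
print(beta)  # intercept near 0, weights near 0.5 and 0.3
```

Each estimated weight survives controlling for the other predictor, which is what "unique contribution" means in the incremental-validity sense used here.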
With regard to the requirements of problems, we expect a representative sampling of problem situations to reveal fundamental differences between groups of problems with regard to the KSAO components required – a heterogeneous set of homogeneous clusters (cf. Fischer, 2015a). In line with this implication, empirical studies have repeatedly shown that the correlations between measures of performance in multiple heterogeneous complex problems are rather weak and not substantially different from zero (e.g., Schaub, 1990; Süß, 1999), a finding that might imply that they tap into different subsets of CPS-related constructs (Schoppek & Fischer, 2015). Please note that performance in multiple homogeneous problems tends to converge on a latent construct that is separable from other potential confounding variables (e.g., reasoning ability) and is related to external criteria (e.g., school grades). For example, Süß (1996) reported that performances in multiple trials of managing a simulated tailorshop were substantially correlated. Similar results have been reported for exploring and regulating multiple complex systems based on formal frameworks (e.g., Fischer et al., 2015; Greiff, Fischer, et al., 2013; Kretzschmar et al., 2016). Another implication of the KSAO model is that the effects of various KSAO components on CPS performance are not constant across different complex problems; rather, such effects are variable because of moderating variables (e.g., features of the problem, the problem solver, or the problem environment; cf. Funke, 1991). For instance, the correlation between intelligence and performance in the complex problem of managing a tailorshop is known to depend on the transparency of the system structure (Putz-Osterloh & Lüer, 1981).

Table 1. Exemplary application of the KSAO model of CPS competency to the fuel ignition problem.

Knowledge
  Problem facet: Which settings of the electronic engine control unit are related to the emissions of the engine?
  Required component: Finding the right settings requires declarative (world) knowledge of typical cues, such as emission parameters, e.g., from having prior experience with similar devices.
  Domain-generality and transfer: Low (world knowledge relevant in similar situations).
  Required component: Alternatively, declarative knowledge about a suitable strategy might be a helpful point of departure for exploration.
  Domain-generality and transfer: High (exploration strategy can be applied in a range of problems).

Skills
  Problem facet: Finding out which combination of settings results in an acceptable combination of acceleration, gasoline consumption, and emission parameters.
  Required component: Different combinations of input settings need to be evaluated. To this end, specific skills come in handy, such as checking the results of different parameter combinations (e.g., procedural knowledge related to the evaluation of charts and tables).
  Domain-generality and transfer: Low to medium (specific skills related to the evaluation of results).
  Required component: When trying out different combinations, the task also requires the application of a range of motor skills (e.g., hand-eye coordination, for example, when manually adjusting the settings of the engine in an old car without an electronic engine control unit).
  Domain-generality and transfer: High (motor coordination and hand-eye coordination are relevant in most problems involving manual manipulations).

Abilities
  Problem facet: Which combination exhibited the best overall quality?
  Required component: The evaluation of results requires the recall of information from long-term memory as well as keeping in mind the outcomes of different attempts (i.e., requiring a certain amount of working memory).
  Domain-generality and transfer: High (working memory capacity is relevant in all problems that require the comparison of outcomes).

Other
  Problem facet: How should the problem solver react when the testing equipment shows only erratic behavior (e.g., a malfunctioning connector between computer and car)?
  Required component: If an attempt at problem solving goes wrong and the testing equipment exhibits erratic behavior, a positive attitude toward the problem and the belief that a solution is possible are necessary in order to decide not to give up.
  Domain-generality and transfer: High (a positive attitude toward problem solving should help in addressing difficult problems in general).

Some implications of the KSAO model have been less researched but could be put to the test in future studies. One of these implications is that the correlation between performance in two complex problem situations depends on their similarity regarding the required KSAO components and moderating variables: in general, performance should be correlated if both problems pose similar requirements (and are presented to similar samples of problem solvers in similar environments). More specifically, it should be possible to predict the correlation of performance in two complex problems by the degree of overlap between the KSAO components that are required. If a person with certain abilities, skills, knowledge, and other features solves two reliable problems, systematic differences in performance should reflect differences in the requirements of the problems. Analogously, if two people solve the same reliable problem and differ with regard to a single KSAO component only (e.g., intelligence), systematic differences in performance should reveal that this KSAO component is required to solve the problem.

How can those who do it right be identified?

After introducing the KSAO model, it seems vital to discuss some implications for the assessment of CPS competency in more detail. Naturally, conceptualizing CPS competency as described above also has implications for the identification of interindividual differences in CPS. For many (highly domain-specific or comparatively domain-general) components of the KSAO model, well-established tests have been proposed that can serve as a point of departure for the assessment of the different KSAO components.
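The predicted link between component overlap and performance correlations can be sketched with a toy simulation. All names, weights, and the additive performance model below are illustrative assumptions, not parameters from the literature: two problems that share a required component correlate, while problems with disjoint requirements do not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Four hypothetical, mutually uncorrelated KSAO components (illustrative).
k, s, a, o = rng.standard_normal((4, n))

# Performance in each problem = sum of its required components + noise.
perf_A = a + k + rng.standard_normal(n)  # requires abilities + knowledge
perf_B = a + s + rng.standard_normal(n)  # requires abilities + skills (shares a)
perf_C = o + rng.standard_normal(n)      # requires only "other" factors

r_AB = np.corrcoef(perf_A, perf_B)[0, 1]  # overlap in a -> positive correlation
r_AC = np.corrcoef(perf_A, perf_C)[0, 1]  # no overlap -> correlation near zero
print(round(r_AB, 2), round(r_AC, 2))
```

Under these assumptions the shared component alone drives the A–B correlation (about one third here), so the degree of overlap directly predicts how strongly performances covary, as the text suggests.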
Table 2 lists some of the most important constructs that we have mentioned so far (along with suggested tests for each aspect). It is important to note that the components highlighted in the different categories of CPS competency in Table 2 are by no means exhaustive. Accounting for the heterogeneity of different complex problems and the challenges they pose to problem solvers with different levels of proficiency will require looking at both the problem- and domain-specific aspects of complex problems, as well as efforts to connect different complex problems and problem solvers. With regard to assessments of CPS competency as a whole, some researchers have proposed that multiple heterogeneous problems should be applied (Süß, 1999; Wittmann & Süß, 1999; Neubert et al., 2014), whereas others have proposed that multiple homogeneous problems should be applied (Greiff et al., 2012). It is important to note that in both cases, the idea is to measure the level of competency as the average level of performance across multiple problems – and to view problem-solving competency as a "reflective construct" that is reliably indicated by performance in different problems. This approach assumes that performance in different complex problems is correlated (because of the common influence of CPS competency), an idea that seems to be at odds with some of the empirical evidence reported above (e.g., a lack of correlations between heterogeneous problems). However, a lack of correlation between heterogeneous problems in empirical studies does not tell us anything about the number of competent persons who solved all kinds of problems, whether a subgroup of the problems had similar requirements for some of the problem solvers, or "how" participants succeeded or failed at solving them (i.e., which were the critical components; see Schoppek & Fischer, 2015). A lack of correlations implies only that knowing that a person solved one problem is not informative for predicting whether that person will solve the other problems as well. In our view, a competent person is able to solve a wide range of problems by virtue of the components of the KSAO model (whether or not performance in these problems is correlated in samples of CPS novices). Different problems will pose different demands on problem solvers, but a person with high levels on each of the KSAO components is likely to solve a wide range of problems (e.g., not only those from the domain of fuel ignition but also those related to career development). Each component of CPS competency might be measured reliably, but CPS competency itself – as suggested by Schoppek and Fischer (2015) as well as by Funke et al. (2015) – does not seem to be a reflective construct (i.e., a latent construct that produces high levels on each of the components). Based on the empirical evidence reported above, we argue that CPS competency might be better conceptualized as a set of KSAO components or – if a single score is preferred – as a formative construct (defined by high levels on each of the components; cf. Edwards, 2010; Bollen & Bauldry, 2011). In future studies, depending on the goals of researchers or practitioners, KSAO components can be studied in isolation, or they can be aggregated into a formative construct.

Table 2. Examples of knowledge, skills, abilities, and other constructs relevant for CPS.

KSAO component (Type): Suggested test
- World knowledge about problem solving and the fundamental pragmatics of life (K): Berlin wisdom paradigm
- Domain-specific knowledge, occupational (K): knowledge tests developed in the area of vocational education; task-specific work analyses
- Domain-specific knowledge, educational (K): classical tests from (large-scale) educational testing
- Adapting plans and hypotheses to feedback (S): MicroDYN / MicroFIN
- Domain-specific problem-solving skills, vocational education (S): domain-specific skill assessment (see Nickolaus, 2011)
- Domain-specific problem-solving skills, school assessments (S): domain-specific skill assessment (see OECD, 2013)
- Reasoning (A): BIS
- Creativity (A): BIS, unusual uses test, option generation
- Implicit learning (A): artificial grammar
- Cognitive reflection (O): CRT
- Openness to experience (O): NEO-FFI
- Learning motivation (O): FAM
- Achievement motivation (O): AMI
- Frustration tolerance (O): DTS

There is a range of assessment instruments targeting (declarative) knowledge, both on a general level (i.e., general world knowledge), such as the Bochum knowledge test (BOWIT; Hossiep & Schulte, 2008) or the Berlin wisdom paradigm (Baltes & Staudinger, 2000), and on a domain-specific level, with knowledge tests ranging from classical knowledge assessments in primary education to office administration and engine problems (see, for example, Nickolaus, 2011, for an overview in the area of vocational education). Compared to the assessment of (declarative) knowledge, the case is more difficult for skills. Developments toward the computer-based assessment of competencies in vocational education might offer some relief in terms of providing assessment instruments directed at domain-specific skills for a range of vocations (Nickolaus, 2011; see Neubert et al., 2015). Similarly, the assessment of skills in the classical domains of school (e.g., mathematics and science education) has seen a rise in importance (e.g., in the context of large-scale assessments such as PISA; OECD, 2014). Assessing basic human abilities is one of the bedrocks of modern psychological assessment, so instruments indicating, for example, an individual's general mental ability or working memory capacity are readily available and well established (e.g., the Berlin intelligence structure test; Jäger, 1984). Like the complex problems in our world, the assessment of the "other" category is very heterogeneous. While there are established assessment instruments for some of the potentially relevant factors falling under this category, such as those connected to personality dimensions (e.g., openness to experience is included in the five-factor model of personality), other constructs lack appropriate and well-validated assessment instruments (e.g., ethical reasoning; however, see Lind, 2000).

Note. KSAO component: exemplary component for the different categories of influencing factors within the KSAO model of CPS competency. Type: related category of the respective component in the KSAO model (K = knowledge, S = skills, A = abilities, O = other influencing factors). Suggested tests: MicroDYN / MicroFIN = MicroDYN / MicroFIN test of CPS skills (Neubert et al., 2014); BIS = Berlin intelligence structure test (Jäger, Süß, & Beauducel, 1997); unusual uses test (Guilford, Merrifield, & Wilson, 1958); option generation (Johnson & Raab, 2003); artificial grammar (Mackintosh, 1998); CRT = cognitive reflection test (Toplak et al., 2011); NEO-FFI = assessment of the five-factor model of personality (Costa & McCrae, 1992); FAM (Rheinberg, Vollmeyer, & Burns, 2001); AMI = achievement motivation inventory (Schuler, Thornton, Frintrup, & Müller-Hanson, 2004); DTS = distress tolerance scale (Simons & Gaher, 2005).
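The contrast between a reflective and a formative reading of CPS competency can be made concrete in a few lines. This is a minimal sketch under stated assumptions: the number of components and the equal weighting of the composite are illustrative choices, not part of the model. A formative composite can be computed from component scores even when those scores are uncorrelated, whereas a reflective construct would require substantial inter-component correlations:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Hypothetical standardized scores on four KSAO components for n people,
# deliberately generated as uncorrelated (no single underlying factor).
scores = rng.standard_normal((n, 4))  # columns: K, S, A, O

# A formative composite is DEFINED by its components: here, an unweighted
# mean of standardized scores (equal weights are an illustrative choice).
composite = scores.mean(axis=1)

# A reflective construct would require correlated indicators; with
# uncorrelated components, the off-diagonal correlations stay near zero.
corr = np.corrcoef(scores, rowvar=False)
off_diag = corr[~np.eye(4, dtype=bool)]
print(float(np.abs(off_diag).max()))  # small, so no common factor is implied
```

The point mirrors the text: the composite is perfectly well defined and usable as a single score, yet nothing about it presupposes that the components covary.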
Furthermore, building on a coherent framework of CPS competency facilitates the combination of insights from different studies via the KSAO model, even when the complex problems involved are very different (e.g., requiring different amounts of prior knowledge in a domain while sharing a reliance on social skills and creativity).

Discussion

A host of problems in our daily lives are complex. We identified a wide range of knowledge, skills, abilities, and other factors that are relevant for solving a wide range of complex problems (see Table 2), but we also highlighted domain-specific aspects that are important for certain subsets of complex problems only. Further, we emphasized that the domain-generality of each component might depend on several moderating variables (see Figure 1). Previous discussions of CPS competency have either proposed a unitary conception of CPS, highlighting the role of a single set of important skills for handling all complex problems (perspective 1), or denied the educational relevance of domain-general factors (perspective 2).⁴ In this paper, we argued for a middle ground between these two extremes (perspective 3) and proposed a model of CPS competency that might offer a point of departure for explaining differences and commonalities in performance across different complex problems. Beyond a focus on the reliable assessment of important components, there are also more practical arguments for the use of competency models in CPS research, such as the promotion of communication in applied settings, the facilitation of a developmental perspective, or the potential to strategically align individual resources with organizational needs (see Campion et al., 2011). The heterogeneity of requirements included in the KSAO model of CPS competency (see Table 2) as well as the empirical findings reported above should have made clear that a unitary conception of CPS is probably not a realistic option, even just for capturing the differences between problem solvers.
Nevertheless, (cor-)relations between the components of CPS competency should be explored in future research in order to determine which components are best suited for assessment purposes in different settings (and which components are redundant in the situations of interest). At the same time, the new model of competency offers a way to compare and identify the requirements that overlap between different domains of complex problems, as these can be compared beyond domain-specific conceptions of competency (e.g., with regard to how to handle multiple goals or dynamically developing situations; e.g., Schoppek & Fischer, 2015). Relatedly, for most practical purposes, it will not be feasible or necessary to assess all of the components of the KSAO model. Instead, it might be sufficient to assess the subset of KSAO components that is most relevant to the problems of interest. The KSAO model thereby allows a systematic search for relevant components to be conducted, for example, when handling complex engine problems. In contrast to previous attempts to connect the domain-general and domain-specific elements of CPS (e.g., Süß, 1996), the KSAO model allows for the integration and specification of a broader range of additional factors (e.g., including factors such as self-regulation in the context of specific domains of problems, or manual skills related to a group of work tasks), thereby going beyond the combination of intelligence and knowledge as proposed, for example, by Süß (1996). These additional factors might be important for better predicting (or systematically fostering) human performance across a wide range of complex problems.
Last but not least, while previous research on CPS has primarily tried to identify the overlap or distinctiveness of CPS skills with regard to intelligence or reasoning ability, the KSAO model offers a route by which theoretical and empirical integration can occur: skills and abilities are different categories, and both of them are relevant for solving complex problems – in most (but not all) circumstances. Intelligence and reasoning – among others – are important prerequisites for many instances of CPS and are thus part and parcel of CPS competency. Nonetheless, there will be other components of CPS competency that are much more relevant than intelligence for the handling of subgroups of complex problems (e.g., strategic knowledge; Strohschneider & Güss, 1999).

Limitations and concerns

The proposed KSAO model of CPS competency offers some guidance in terms of the factors to account for when looking at complex problem solving. It can also be applied as a framework by which to integrate different factors that are relevant in sets of complex problem situations. Nonetheless, there are limitations to both the model itself and the explanations and examples we offered here. The roots of the KSAO model in an applied setting of assessment, namely that of industrial and organizational psychology, on the one hand support a closer connection and transfer of insights between CPS research and the respective fields of application.

⁴ The argumentation is a bit more nuanced with respect to the perspective 2 camp, as the relevance of basic abilities (e.g., working memory capacity, intelligence) for complex problem solving is well acknowledged.
For example, working with a similar distinction of important factors will facilitate the transfer of insights (e.g., from the work and task analysis literature) into the field of CPS research. More specifically, we expect benefits from the use of established methods for characterizing complex problem situations and their requirements in specific work environments. At the same time, this close connection to an applied setting also weakens the link between CPS competency and classical cognitive architectures of human functioning. That is, the model offers no explanation for or specification of the processes that integrate the different constituting factors, their relation to each other, or a coherent underlying model of human cognition and action (e.g., specifying the link between basic abilities and knowledge). For instance, we did not specify in detail the set of moderator variables that are likely to determine which complex problems are correlated with each other (as a result of similar requirements regarding the KSAO components) or with the components of the KSAO model (but see Funke, 2003). A topic that is up for debate is whether this integration with cognitive science can be achieved in future research within the framework of the KSAO model or whether different models that are more closely aligned with classical cognitive architectures⁵ or with more detailed theories of human action in the workplace (e.g., the "Handlungsregulationstheorie" [action regulation theory] of Winfried Hacker, 1973) are necessary to account for these drawbacks. Furthermore, the KSAO model of CPS competency builds on the (culturally embedded) concept of "competency". In the context of vocational education, Brockmann, Clarke, and Winch (2008) differentiate between a "skills-based model" of competency in the Anglo-Saxon context and a "knowledge-based model" in Germany, the Netherlands, and France.
While the "skills-based model" focuses on learning outcomes and their certification in view of a specific work task, without a close connection to the knowledge base, the "knowledge-based model" is oriented toward understanding the inputs to learning in order to build a broader conceptualization of (vocational) education (i.e., "Berufsbildung", with an emphasis on a holistic view of education). We can only speculate on whether and how this distinction will be relevant for the discussion of CPS competency, although we suspect that a narrow and unitary view of CPS competency makes sense only in the context of a "skills-based model" of competencies. Highlighting the role of domain-specific components (e.g., knowledge about dishwashers and fuel ignition systems) and broader components (e.g., human wisdom; see above) will offer a way to differentiate between the two perspectives in CPS research and will facilitate an exploration of the consequences for learning and instruction. What seems important to us is the necessity to account for the various assumptions that underlie the construction and application of assessment instruments by researchers from various backgrounds, as well as the mental construction of situations by individual problem solvers (e.g., Roth, 1998). In our view, the KSAO model of CPS competency offers a point of departure for future discussions along these lines. Finally, there is the question of the practical relevance of the proposed KSAO model: the notion of a modular CPS competency that integrates the requirements of specific problem situations naturally leads to the question of the empirical relevance of its subcomponents (and, in this way, of the utility of a modular conception of CPS competency itself).
That is, the question arises as to whether the integration of additional constructs (beyond the elements already included in unitary conceptions of CPS or in intelligence theories) will offer additional value in predicting CPS performance at large – and under what circumstances (i.e., the external validity of tests might well differ between different samples; e.g., Fischer et al., 2015). The literature on the empirical relevance of the subcomponents of intelligence in predicting success across a range of job-related performance indicators warns against expectations that are too enthusiastic, to say the least (see, e.g., Schmidt, Ones, & Hunter, 1992). Time and again, the power of specific subcomponents has been shown to exhibit only marginal increments over indicators of general mental ability when predicting job or training performance (e.g., Brown, Le, & Schmidt, 2006). Similar findings might be expected for the case of CPS competency. For the purposes of training CPS competency (see Kretzschmar & Süß, 2015), the notion of a modular CPS competency provides a way to look at the actual reasons why some individuals fail when attempting to handle specific groups of complex problems, where they are potentially in need of further training or support, and in which areas to look for further expertise when trying to set up such interventions (e.g., teaching explicit and domain-specific knowledge vs. selecting only individuals above a certain threshold of memory capacity vs. training the transfer of skills to a new domain; see McClelland, 1973, for similar earlier arguments in favor of competency models). Naturally, there is the need to actually deliver empirical evidence for this additional utility of a modular CPS competency, and we can only point to future research for that purpose.

Conclusion

According to Dörner (1996), who initiated research on CPS in the late 1970s, there are manifold failures that can be observed in attempts to handle complex problems.
Corresponding to the numerous sources of failure, we proposed a multifaceted model of CPS competency, consisting of knowledge, skills, abilities, and other influencing components. We highlighted domain-general and domain-specific components in each category in order to demonstrate the large number of requirements that problem solvers face in complex problem situations, and we discussed implications for research, assessment, and training. As Dörner (1986) noted, acknowledging complexity is an important prerequisite for the proper assessment and training of CPS competency. This is not going to be a walk in the park, but given the high relevance of CPS for modern life, it may well be worth the effort.

Acknowledgements: This research was self-funded.

Declaration of conflicting interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

5 It is interesting to note that one of the origins of competency models in industrial and organizational psychology is McClelland’s (1973) work, which is directly related to the assessment of basic abilities such as intelligence.

10.11588/jddm.2015.1.23945 | JDDM | 2015 | Volume 1 | Article 6

Fischer & Neubert: The multiple faces of complex problems

Author contributions: The authors contributed equally to this paper.

Handling editor: Jan Rummel

Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Citation: Fischer, A., & Neubert, J. C. (2015). The multiple faces of complex problems: A model of problem solving competency and its implications for training and assessment. Journal of Dynamic Decision Making, 1, 6. doi: 10.11588/jddm.2015.1.23945

Received: 29 September 2015. Accepted: 16 February 2016. Published: 25 February 2016.

References

Ackerman, P. L. (1992).
Predicting individual differences in complex skill acquisition: Dynamics of ability determinants. Journal of Applied Psychology, 77(5), 598–614. doi: 10.1037/0021-9010.77.5.598

Adey, P., Csapó, B., Demetriou, A., Hautamäki, J., & Shayer, M. (2007). Can we be intelligent about intelligence? Educational Research Review, 2(2), 75–97. doi: 10.1016/j.edurev.2007.05.001

Anderson, J. R. (1987). Skill acquisition: Compilation of weak-method problem solutions. Psychological Review, 94(2), 192. doi: 10.1037/0033-295X.94.2.192

Baltes, P. B., & Smith, J. (1990). Weisheit und Weisheitsentwicklung: Prolegomena zu einer psychologischen Weisheitstheorie [Wisdom and how it develops: Framework of a psychological theory of wisdom]. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 22(2), 95–135.

Baltes, P. B., & Staudinger, U. M. (2000). Wisdom: A metaheuristic (pragmatic) to orchestrate mind and virtue toward excellence. American Psychologist, 55(1), 122–136. doi: 10.1037/0003-066X.55.1.122

Beckmann, J. F., & Guthke, J. (1995). Complex problem solving, intelligence, and learning ability. In P. A. Frensch & J. Funke (Eds.), Complex problem solving: The European perspective (pp. 177–200). Hillsdale, NJ: Erlbaum.

Berry, D. C., & Broadbent, D. E. (1988). Interactive tasks and the implicit-explicit distinction. British Journal of Psychology, 79(2), 251–272. doi: 10.1111/j.2044-8295.1988.tb02286.x

Bollen, K. A., & Bauldry, S. (2011). Three Cs in measurement models: Causal indicators, composite indicators, and covariates. Psychological Methods, 16(3), 265–284. doi: 10.1037/a0024448

Brockmann, M., Clarke, L., & Winch, C. (2008). Knowledge, skills, competence: European divergences in vocational education and training (VET): The English, German and Dutch cases. Oxford Review of Education, 34(5), 547–567. doi: 10.1080/03054980701782098

Brown, K. G., Le, H., & Schmidt, F. L. (2006). Specific aptitude theory revisited: Is there incremental validity for training performance?
International Journal of Selection and Assessment, 14(2), 87–100. doi: 10.1111/j.1468-2389.2006.00336.x

Campion, M. A., Fink, A. A., Ruggeberg, B. J., Carr, L., Phillips, G. M., & Odman, R. B. (2011). Doing competencies well: Best practices in competency modeling. Personnel Psychology, 64(1), 225–262. doi: 10.1111/j.1744-6570.2010.01207.x

Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5(2), 121–152. doi: 10.1207/s15516709cog0502_2

Chi, M. T. H., Glaser, R., & Rees, E. (1982). Expertise in problem solving. In R. J. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 1, pp. 7–75). Hillsdale, NJ: Erlbaum.

Constanza, D. G., Fleishman, E. A., & Marshall-Mies, J. C. (1999). Knowledge. In N. G. Peterson, M. D. Mumford, W. C. Borman, P. R. Jeanneret, & E. A. Fleishman (Eds.), An occupational information system for the 21st century: The development of O*NET (pp. 71–90). Washington, DC: APA. doi: 10.1037/10313-005

Costa, P. T., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO PI-R) and NEO Five-Factor Inventory (NEO-FFI): Professional manual. Odessa, FL: Psychological Assessment Resources.

D’Zurilla, T. J., & Nezu, A. M. (2007). Problem-solving therapy: A positive approach to clinical intervention. New York, NY: Springer.

Donovan, S. J., Güss, C. D., & Naslund, D. (2015). Improving dynamic decision making through training and self-reflection. Judgment and Decision Making, 10(4), 284–295.

Dörner, D. (1979). Problemlösen als Informationsverarbeitung [Problem solving as information processing]. Stuttgart: Kohlhammer.

Dörner, D. (1986). Diagnostik der operativen Intelligenz [Assessment of operative intelligence]. Diagnostica, 32(4), 290–308.

Dörner, D. (1996). The logic of failure: Recognizing and avoiding error in complex situations. New York, NY: Perseus.

Duncker, K. (1945). On problem-solving. Psychological Monographs, 58(5), 1–113.
doi: 10.1037/h0093599

Dutt, V., & Gonzalez, C. (2015). Accounting for outcome and process measures in dynamic decision-making tasks through model calibration. Journal of Dynamic Decision Making, 1, 2. doi: 10.11588/jddm.2015.1.17663

Edwards, J. R. (2010). The fallacy of formative measurement. Organizational Research Methods, 14(2), 370–388. doi: 10.1177/1094428110378369

Fischer, A., Greiff, S., & Funke, J. (2012). The process of solving complex problems. Journal of Problem Solving, 4(1), 19–42. doi: 10.7771/1932-6246.1118

Fischer, A., Greiff, S., Wüstenberg, S., Fleischer, J., Buchwald, F., & Funke, J. (2015). Assessing analytic and interactive aspects of problem solving competency. Learning and Individual Differences, 39, 172–179. doi: 10.1016/j.lindif.2015.02.008

Fischer, A., Holt, D. V., & Funke, J. (2015). Promoting the growing field of dynamic decision making. Journal of Dynamic Decision Making, 1, 1. doi: 10.11588/jddm.2015.1.23807

Fischer, A. (2015a). Assessment of problem solving skills by means of multiple complex systems: Validity of finite automata and linear dynamic systems (Doctoral dissertation). urn:nbn:de:bsz:16-heidok-196898

Fischer, A. (2015b). Wisdom: The answer to all the questions really worth asking. International Journal of Humanities and Social Science, 5(9), 73–83. urn:nbn:de:bsz:16-heidok-197863

Fleishman, E. A. (1992). Fleishman-Job Analysis Survey (F-JAS). Potomac, MD: Management Research Institute.

Fleishman, E. A., Constanza, D. G., & Marshall-Mies, J. C. (1999).
Abilities. In N. G. Peterson, M. D. Mumford, W. C. Borman, P. R. Jeanneret, & E. A. Fleishman (Eds.), An occupational information system for the 21st century: The development of O*NET (pp. 175–195). Washington, DC: APA. doi: 10.1037/10313-010

Funke, J. (1985). Problemlösen in komplexen computersimulierten Realitätsbereichen [Problem solving in complex computer-simulated domains of reality]. Sprache & Kognition, 4, 113–129.

Funke, J. (1991). Solving complex problems: Exploration and control of complex systems. In R. J. Sternberg & P. A.
Frensch (Eds.), Complex problem solving: Principles and mechanisms (pp. 185–222). Hillsdale, NJ: Erlbaum.

Funke, J. (1992). Dealing with dynamic systems: Research strategy, diagnostic approach and experimental results. German Journal of Psychology, 16, 24–43.

Funke, J. (2001). Dynamic systems as tools for analysing human judgement. Thinking & Reasoning, 7(1), 69–89. doi: 10.1080/13546780042000046

Funke, J. (2003). Problemlösendes Denken [Problem-solving thinking]. Stuttgart: Kohlhammer.

Funke, J. (2010). Complex problem solving: A case for complex cognition? Cognitive Processing, 11(2), 133–142. doi: 10.1007/s10339-009-0345-0

Funke, J., Fischer, A., & Holt, D. (2015). Competencies for complexity: Problem solving in the 21st century. In E. Care, P. Griffin, & M. Wilson (Eds.), Assessment and teaching of 21st century skills (Vol. 3). Manuscript submitted for publication.

Gigerenzer, G., & Gaissmaier, W. (2011). Heuristic decision making. Annual Review of Psychology, 62(1), 451–482. doi: 10.1146/annurev-psych-120709-145346

Gonzalez, C., Lerch, J. F., & Lebiere, C. (2003). Instance-based learning in dynamic decision making. Cognitive Science, 27(4), 591–635. doi: 10.1016/S0364-0213(03)00031-4

Greiff, S., Fischer, A., Wüstenberg, S., Sonnleitner, P., Brunner, M., & Martin, R. (2013). A multitrait-multimethod study of assessment instruments for complex problem solving. Intelligence, 41(5), 579–596. doi: 10.1016/j.intell.2013.07.012

Greiff, S., Wüstenberg, S., & Funke, J. (2012). Dynamic problem solving: A new assessment perspective. Applied Psychological Measurement, 36(3), 189–213. doi: 10.1177/0146621612439620

Greiff, S., Wüstenberg, S., Molnár, G., Fischer, A., Funke, J., & Csapó, B. (2013). Complex problem solving in educational contexts: Something beyond g: Concept, assessment, measurement invariance, and construct validity. Journal of Educational Psychology, 105(2), 364–379. doi: 10.1037/a0031856

Greiff, S., & Neubert, J. C. (2014).
On the relation of complex problem solving, personality, fluid intelligence, and academic achievement. Learning and Individual Differences, 36, 37–48. doi: 10.1016/j.lindif.2014.08.003

Greeno, J. G. (1997). On claims that answer the wrong questions. Educational Researcher, 26(1), 5–17. doi: 10.3102/0013189X026001005

Guilford, J. P., Merrifield, P. R., & Wilson, R. C. (1958). Unusual Uses Test. Orange, CA: Sheridan Psychological Services.

Güss, C. D., Tuason, M. T., & Orduña, L. V. (2015). Strategies, tactics, and errors in dynamic decision making in an Asian sample. Journal of Dynamic Decision Making, 1, 3. doi: 10.11588/jddm.2015.1.13131

Gustafsson, V. (2006). Entrepreneurial decision-making: Individuals, tasks and cognitions. Cheltenham, UK: Elgar.

Hacker, W. (1973). Allgemeine Arbeits- und Ingenieurpsychologie: Psychische Struktur und Regulation von Arbeitstätigkeiten [General industrial and engineering psychology: Mental structure and regulation of working activities]. Berlin: VEB Deutscher Verlag der Wissenschaften.

Hossiep, R., & Schulte, M. (2008). BOWIT. Bochumer Wissenstest [BOWIT. Bochum Knowledge Test]. Göttingen: Hogrefe.

Hundertmark, J., Holt, D. V., Fischer, A., Said, N., & Fischer, H. (2015). System structure and cognitive ability as predictors of performance in dynamic system control tasks. Journal of Dynamic Decision Making, 1, 5. doi: 10.11588/jddm.2015.1.26416

Hussy, W. (1985). Komplexes Problemlösen: Eine Sackgasse? [Complex problem solving: A dead end?]. Zeitschrift für Experimentelle und Angewandte Psychologie, 32, 55–74.

Jäger, A. O. (1984). Intelligenzstrukturforschung: Konkurrierende Modelle, neue Entwicklungen, Perspektiven [Structural research on intelligence: Competing models, new developments, perspectives]. Psychologische Rundschau, 35(1), 21–35.

Jäger, A. O., Süß, H. M., & Beauducel, A. (1997). Berliner Intelligenzstruktur-Test (BIS-Test): Form 4 [The Berlin Intelligence Structure Test: Form 4]. Göttingen: Hogrefe.

Johnson, J.
G., & Raab, M. (2003). Take the first: Option-generation and resulting choices. Organizational Behavior and Human Decision Processes, 91(2), 215–229. doi: 10.1016/S0749-5978(03)00027-X

Kalyuga, S., Ayres, P., Chandler, P., & Sweller, J. (2003). The expertise reversal effect. Educational Psychologist, 38(1), 23–31. doi: 10.1207/s15326985ep3801_4

Kersting, M., & Süß, H. M. (1995). Kontentvalide Wissensdiagnostik und Problemlösen: Zur Entwicklung, testtheoretischen Begründung und empirischen Bewährung eines problemspezifischen Diagnoseverfahrens [Content-valid diagnostics of knowledge and problem solving: Development, test-theoretical establishment and empirical proof of a problem-specific diagnostic instrument]. Zeitschrift für Pädagogische Psychologie, 9(2), 83–93.

Klahr, D., & Dunbar, K. (1988). Dual space search during scientific reasoning. Cognitive Science, 12(1), 1–48. doi: 10.1207/s15516709cog1201_1

Kretzschmar, A., Neubert, J. C., Wüstenberg, S., & Greiff, S. (2016). Construct validity of complex problem solving: A comprehensive view on different facets of intelligence and school grades. Intelligence, 54, 55–69. doi: 10.1016/j.intell.2015.11.004

Kretzschmar, A., & Süß, H.-M. (2015). A study on the training of complex problem solving competence. Journal of Dynamic Decision Making, 1, 4. doi: 10.11588/jddm.2015.1.15455

Larkin, J., McDermott, J., Simon, D. P., & Simon, H. A. (1980). Expert and novice performance in solving physics problems. Science, 208(4450), 1335–1342. doi: 10.1126/science.208.4450.1335

Leutner, D. (2002). The fuzzy relationship of intelligence and problem solving in computer simulations. Computers in Human Behavior, 18(6), 685–697. doi: 10.1016/S0747-5632(02)00024-9

Lind, G. (2000). Review and appraisal of the Moral Judgment Test (MJT). Psychology of Morality and Democracy and Education, 1–15.

Mackintosh, N. J. (1998). IQ and human intelligence. Oxford, UK: Oxford University Press.

McClelland, D. C. (1973).
Testing for competence rather than for “intelligence”. American Psychologist, 28(1), 1–14. doi: 10.1037/h0034092

Morris, N. M., & Rouse, W. B. (1985). Review and evaluation of empirical research in troubleshooting. Human Factors: Journal of the Human Factors and Ergonomics Society, 27(5), 503–530. doi: 10.1177/001872088502700502

Mumford, M. D., Peterson, N. G., & Childs, R. A. (1999). Basic and cross-functional skills. In N. G. Peterson, M. D. Mumford, W. C. Borman, P. R. Jeanneret, & E. A.
Fleishman (Eds.), An occupational information system for the 21st century: The development of O*NET (pp. 49–69). Washington, DC: APA. doi: 10.1037/10313-004

National Center for O*NET Development. (2009). New and emerging occupations of the 21st century: Updating the O*NET-SOC taxonomy.

Neubert, J. C., Kretzschmar, A., Wüstenberg, S., & Greiff, S. (2014). Extending the assessment of complex problem solving to finite state automata: Embracing heterogeneity. European Journal of Psychological Assessment, 31, 181–194. doi: 10.1027/1015-5759/a000224

Neubert, J. C., Mainert, J., Kretzschmar, A., & Greiff, S. (2015). The assessment of 21st century skills in industrial and organizational psychology: Complex and collaborative problem solving. Industrial and Organizational Psychology, 8(2), 238–268. doi: 10.1017/iop.2015.14

Nickolaus, R. (2011). Die Erfassung fachlicher Kompetenzen und ihrer Entwicklungen in der beruflichen Bildung: Forschungsstand und Perspektiven [Assessment of specialist competencies and their development in professional education: State of the art and perspectives]. In O. Zlatkin-Troitschanskaia (Ed.), Stationen empirischer Bildungsforschung (pp. 331–351). Wiesbaden: VS Verlag für Sozialwissenschaften. doi: 10.1007/978-3-531-94025-0_24

OECD. (2013). PISA 2012 assessment and analytical framework: Mathematics, reading, science, problem solving and financial literacy. Paris: OECD Publishing. doi: 10.1787/9789264190511-en

OECD. (2014). PISA 2012 results: Creative problem solving (Volume V). Paris, France: OECD Publishing. doi: 10.1787/19963777

Peterson, N. G., Mumford, M. D., Borman, W. C., Jeanneret, P. R., Fleishman, E. A., Levin, K. Y., et al. (2001). Understanding work using the Occupational Information Network (O*NET): Implications for practice and research. Personnel Psychology, 54(2), 451–492. doi: 10.1111/j.1744-6570.2001.tb00100.x

Putz-Osterloh, W., & Lüer, G. (1981).
The predictability of complex problem solving by performance on an intelligence test. Zeitschrift für Experimentelle und Angewandte Psychologie, 28(2), 309–334.

Raaheim, K. (1974). Problem solving and intelligence. Oslo, Norway: Universitetsforlaget.

Rey, G. D., & Fischer, A. (2013). The expertise reversal effect concerning instructional explanations. Instructional Science, 41(2), 407–429. doi: 10.1007/s11251-012-9237-2

Rheinberg, F., Vollmeyer, R., & Burns, B. D. (2001). FAM: Ein Fragebogen zur Erfassung aktueller Motivation in Lern- und Leistungssituationen (Langversion) [A questionnaire to assess current motivation in learning and achievement situations]. Diagnostica, 47(2), 57–66. doi: 10.1026//0012-1924.47.2.57

Roth, W.-M. (1998). Situated cognition and assessment of competence in science. Evaluation and Program Planning, 21(2), 155–169. doi: 10.1016/S0149-7189(98)00004-4

Schaub, H. (1990). Die Situationsspezifität des Problemlöseverhaltens [Situational specificity of problem solving behavior]. Zeitschrift für Psychologie mit Zeitschrift für Angewandte Psychologie, 198(1), 83–96.

Scherer, R., & Tiemann, R. (2012). Factors of problem-solving competency in a virtual chemistry environment: The role of metacognitive knowledge about strategies. Computers & Education, 59(4), 1199–1214. doi: 10.1016/j.compedu.2012.05.020

Schmidt, F. L., Ones, D. S., & Hunter, J. E. (1992). Personnel selection. Annual Review of Psychology, 43(1), 627–670. doi: 10.1146/annurev.ps.43.020192.003211

Schoppek, W., & Fischer, A. (2015). Complex problem solving: Single ability or complex phenomenon? Frontiers in Psychology, 6, 1669. doi: 10.3389/fpsyg.2015.01669

Schuler, H., Thornton, G. C., Frintrup, A., & Mueller-Hanson, R. (2004). Achievement Motivation Inventory (AMI). Göttingen, Bern, New York: Huber.

Simons, J. S., & Gaher, R. M. (2005). The Distress Tolerance Scale: Development and validation of a self-report measure. Motivation and Emotion, 29(2), 83–102.

Spitz-Oener, A. (2006).
Technical change, job tasks, and rising educational demands: Looking outside the wage structure. Journal of Labor Economics, 24(2), 235–270. doi: 10.1086/jole.2006.24.issue-2

Sternberg, R. J. (1982). Handbook of human intelligence. New York, NY: Cambridge University Press.

Süß, H.-M., Kersting, M., & Oberauer, K. (1991). Intelligenz und Wissen als Prädiktoren für Leistungen bei computersimulierten komplexen Problemen [Intelligence and knowledge as predictors of performance in computer-simulated complex problems]. Diagnostica, 37(4), 334–352.

Süß, H.-M. (1996). Intelligenz, Wissen und Problemlösen: Kognitive Voraussetzungen für erfolgreiches Handeln bei computersimulierten Problemen [Intelligence, knowledge and problem solving: Cognitive prerequisites for successful behavior in computer-simulated problems]. Göttingen: Hogrefe.

Süß, H.-M. (1999). Intelligenz und komplexes Problemlösen [Intelligence and complex problem solving]. Psychologische Rundschau, 50(4), 220–228. doi: 10.1026//0033-3042.50.4.220

Strohschneider, S., & Güss, D. (1999). The fate of the Moros: A cross-cultural exploration of strategies in complex and dynamic decision making. International Journal of Psychology, 34(4), 235–252.

Toplak, M., West, R. F., & Stanovich, K. E. (2011). The Cognitive Reflection Test as a predictor of performance on heuristics-and-biases tasks. Memory and Cognition, 39, 1275–1289. doi: 10.3758/s13421-011-0104-1

Tricot, A., & Sweller, J. (2013). Domain-specific knowledge and why teaching generic skills does not work. Educational Psychology Review, 26(2), 1–19. doi: 10.1007/s10648-013-9243-1

Weaver, W. (1948). Science and complexity. American Scientist, 36, 536–547.

Weinert, F. E. (1999). Konzepte der Kompetenz [Concepts of competence]. Paris: OECD.

Weis, S., & Conzelmann, K. (2015). Social intelligence and competencies. In J. D. Wright (Ed.), International encyclopedia of the social & behavioral sciences (2nd ed., pp. 371–379). Oxford: Elsevier.
doi: 10.1016/B978-0-08-097086-8.25094-0

Wittmann, W., & Hattrup, K. (2004). The relationship between performance in dynamic systems and intelligence. Systems Research and Behavioral Science, 21(4), 393–409. doi: 10.1002/sres.653

Wittmann, W., & Süß, H.-M. (1999). Investigating the paths between working memory, intelligence, knowledge, and complex problem-solving performances via Brunswik symmetry. In P. L. Ackerman, P. C. Kyllonen, & R. D. Roberts (Eds.), Learning and individual differences: Process, trait, and content determinants (pp. 77–108). Washington, DC: APA. doi: 10.1037/10315-004

Wüstenberg, S., Stadler, M., Hautamäki, J., & Greiff, S. (2014). The role of strategy knowledge for the application of strategies in complex problem solving tasks. Technology, Knowledge and Learning, 19, 127–146. doi: 10.1007/s10758-014-9222-8
Original Research

Evidence for the dynamic human ability to judge another’s sex from ambiguous or unfamiliar signals

Justin Gaetano
School of Psychology, University of New England, Armidale, Australia

Humans make decisions about social information efficiently, despite – or perhaps because of – the sheer scale of data available. Of these various signals, sex cues are vitally important, yet whether participants perceive them as static or dynamic is unknown. The present study addressed the related question of how expertise impinges on sex judgements. Participants (80 Caucasian, 80 Asian) were asked to target female and male exemplars from a set of own- or other-race hand images. The data show: (1) that the own-race sex categorisation advantage observed previously using face stimuli can occur in relation to hands, and (2) that the sensitivity of Asian participants, but not Caucasian participants, is dynamic relative to how many fe/males there are in a set. Implications of these findings are discussed as further evidence that there exists a pan-stimulus sex processor, and as fresh evidence that human sex perception can change probabilistically.

Keywords: cross-cultural judgement, perceptual decision making, dynamic sex discrimination, other-race effect, own-race advantage, prior target probability

Sex is one of an exclusive set of categories by which a person may classify another person automatically. Auditory (Junger et al., 2013; Li, Logan, & Pastore, 1991), olfactory (Hacker, Brooks, & van der Zwan, 2013; Kovács et al., 2004), and of course visual (Kozlowski & Cutting, 1977; Yamaguchi, Hirukawa, & Kanazawa, 1995) information about others can lead to judgements of sex.
Focussing on vision, some behavioural correlates of sex perception have been demonstrated. For instance, male bias – the systematic tendency to judge perceptually noisy or androgynous stimuli as male – can arise from a diverse range of visual sex cues, including whole-body data (motion cues: Troje, Sadr, Geyer, & Nakayama, 2006; amorphous drawings: Brielmann, Gaetano, & Stolarova, 2015; Wenzlaff, Briken, & Dekker, 2018), and silhouette, static representations of the face (Davidenko, 2007) and hand (Gaetano, van der Zwan, Blair, & Brooks, 2014; Gaetano et al., 2016). Perceptual ambiguity is indeed a key predictor of male-biased responding; however, studies of child and adult participants imply that the viewer’s expertise might also interact with male bias (White, Hock, Jubran, Heck, & Bhatt, 2018; Wild et al., 2000; cf. Bayet et al., 2015; Tsang et al., 2018). The present objective is to understand how sex perception works not just under noisy conditions but, more generally, as a dynamic function of the perceiver’s experience.

Perceptual experience has been shown to change social judgements over extended periods of time. A class of phenomena that demonstrates this point is other-race effects, which refer to participants’ differential processing of stimuli of a less familiar race (Meissner & Brigham, 2001; O’Toole et al., 1994). Other-race effects can have a powerful impact on eyewitness testimony (Behrman & Davey, 2001; Pezdek, Blandon-Gitlin, & Moore, 2003), forensic line-up identifications (Smith, Lindsay, Pryke, & Dysart, 2001; Wells & Olson, 2001), and even visual sex discrimination (O’Toole et al., 1996; cf. Zhao & Hayward, 2010). In this work, cases that demonstrate heightened sensitivity for own-race cues are known as own-race advantages (ORAs).
In O’Toole and colleagues’ (1996) sex categorisation study, for example, Caucasian and Asian judges categorised Caucasian and Asian faces individually as ‘female’ or ‘male’. Overall, Caucasians and Asians were equally proficient at the task, with both groups achieving higher-than-chance sensitivity (O’Toole et al., 1996). Particularly significant in the current context, both groups were found to be more sensitive to own-race faces than to other-race faces. The ORA, therefore, is not defined by the judge’s race per se. Plausibly, then, the ORA could be the result of relative expertise for own-race faces that develops over many years of experience. If sensitivity to sex cues depends on long-term development per se, then such findings should not arise exclusively from face stimuli. Evidence of an ORA for non-face stimuli would support this theory.

Another potential source of support is the hypothetical own-sex advantage, by which a person would show heightened sensitivity when judging people of the same sex. So far, this theory has been explored exclusively within the face perception research domain. Current evidence suggests that women have an enhanced capacity to judge faces compared to men, particularly when the faces depict women (Herlitz & Lovén, 2013; Lewin & Herlitz, 2002; Rehnman & Herlitz, 2007). In other words, unlike the ORA, the own-sex advantage is apparently specific to female perceptual development. Whether this extends to non-face stimuli remains to be tested. The more immediate question is whether women develop an enhanced ability to judge social cues per se, irrespective of whether those cues belong to men or women. The current study asks simply whether the general female advantage extends beyond face judgement scenarios, leaving the specific question of an own-sex advantage open to future research.
Corresponding author: Justin Gaetano, School of Psychology, Building S006, University of New England, Armidale, NSW 2351, Australia. Email: jgaetan2@une.edu.au

10.11588/jddm.2019.1.61118 | JDDM | 2019 | Volume 5 | Article 3

Gaetano: Dynamic, cross-race sex judgements from hands

In summary, the current focus is to investigate the ORA rather than the own-sex advantage, because the former is a stronger, more prevalent class of phenomena with important ramifications (e.g., Meissner & Brigham, 2001), and it has at least been demonstrated in a sex judgement study before (i.e., O’Toole et al., 1996). However, it remains to be tested whether the ORA in sex categorisation is a genuine, expertise-driven effect or an artefact of stimulus labelling. In O’Toole et al.’s (1996) study, participants were told the race of the faces they would be shown at the start of each viewing sequence, leaving open the possibility that knowledge of the race categories might systematically affect outcomes. Furthermore, facial features such as eye shape and colour obviously do differ by race, and sex signals can confound judgements of emotion from faces (e.g., Taylor, 2017). On those grounds, testing the ORA using a less accessible set of cues may therefore yield different outcomes. Finally, empirical accounts of the ORA seem to illustrate how categorical judgements of others are based upon population-based norms (Jaquet, Rhodes, & Hayward, 2007; Valentine, 1991), yet it is currently unknown whether such norms extend beyond face-based norms.

The sole study of the ORA in sex categorisation (O’Toole et al., 1996) is, like other cross-race perception studies, focussed on perceptions of own- and other-race faces. While the face might be the primary target in social development, it is certainly not the only sexually dimorphic feature that participants seem attentive to (for a review, see Gaetano et al., 2012).
the visual system may in fact develop expertise with regard to hands, and inherent in those, the dynamic (albeit non-verbal) cues that hands contribute toward communication (e.g. cook & tanenhaus, 2009; goldin-meadow, wein, & chang, 1992). thus, the present research asks whether sex categorisation ora can arise without race priming and can generalise to perceptions of hand stimuli. if so, then the case could be made that sex processing has a common, expertise-dependent basis. of course, perceptions of sex can also be influenced by higher order information (bailey, lafrance, & dovidio, 2018; freeman & ambady, 2011). thus far, top-down sex perception has almost exclusively been tested in relation to stereotypes. in those terms, stereotypically feminine or masculine emotion (hess, adams, grammer, & kleck, 2009), or stereotypes associated with asian or african appearance (johnson, freeman, & pauker, 2012), may facilitate ‘female’ or ‘male’ judgements of otherwise androgynous faces, respectively. in the absence of morphological signals, it is also apparently easier to judge ‘sad’- or ‘angry’-primed body motions as female or male (johnson, mckay, & pollick, 2011). such higher order face and body signal effects demonstrate the need for any comprehensive model of sex processing to take into account that sex perception is part of a dynamic person processing system (freeman & ambady, 2011). in light of the lower-level focus of the present study, the influence of stereotyped expressions and facial features should be minimised. whilst evidence from the face perception domain is divergent about whether expertise really does drive the ora (e.g. zhao, hayward, & bülthoff, 2014), use of stimuli other than faces might, in future studies, at least control for face-based stereotypes. thus, the present study seeks to infer how experience might shape perceptions of sex beyond solely face-based processing accounts.
of course, expertise is a long term form of experience that has been defined and studied by way of participant race (o’toole et al., 1996), sex (lewin & herlitz, 2002), and age (wild et al., 2000); and as based on perceptual (jaquet et al., 2007; jaquet, rhodes, & hayward, 2008) and neuroimaging measures (gauthier, tarr, anderson, skudlarski, & gore, 1999; mcgugin, newton, gore, & gauthier, 2014). less extensively investigated is the role of changing experience in the short term. in many studies of sex discrimination, the prior probability of male and female stimulus presentation is static and equal. in such studies, participants engage in binary tasks, in which they must choose between two responses – ‘target sex’ or ‘not target sex’ (e.g. gaetano et al., 2014), or ‘female’ or ‘male’ (e.g. o’toole et al., 1996) – on each trial. as it happens, large human populations (e.g. all citizens of a city, state, or nation) are composed of roughly 50% female and 50% male individuals (central intelligence agency, 2014), so it seems reasonable to construct perceptual tasks with equal numbers of female and male stimuli. however, systematic demonstrations of male bias (e.g. wild et al., 2000) imply that participants overestimate the frequency of males relative to females, suggesting that the bias is determined by factors other than long term experience. this calls into question the stability of sex perception performance in the face of changing experience in the short term. the question of just how susceptible sex judgements are to short term manipulations of sex ratio – or prior target probability (ptp) manipulations – was first addressed by gaetano and colleagues (2016). that study revealed that ptp has no systematic bearing on sex judgement bias – the tendency to judge a signal (e.g. a person’s face or hand) as female or male. independent from bias outcomes, sensitivity indicates the accuracy of sex judgements, both in terms of true positive decisions (e.g.
viewing a male and deciding ‘yes, it is male’), and true negative decisions (e.g. viewing a female and deciding ‘no, it is not male’). to date, it is not known whether sex judgement sensitivity is dynamic, and thus can change relative to ptp. in summary, it is possible that sensitivity to sex cues might be tuned not only to long term or developmental experience, but also to recent experience, such that sensitivity may fluctuate as a function of the sex ratio to which participants are exposed. the current study aims to test the extent to which sex judgements depend on (i) the participant’s long term familiarity with stimuli, and (ii) the relative frequency of certain sex cues in the short term. in parallel to gaetano and colleagues’ (2016) study, these experiments involve a cross-race sample of adult female and male participants and non-face stimuli, allowing the influence of long term experience on sex discrimination accuracy to be investigated. firstly, assuming that ora is a phenomenon general to sex processing, it should occur for visual non-face stimuli and between participant groups. specifically, it is hypothesised that when colour and texture cues are available, sensitivity will be higher for own-race participants relative to other-race participants. when sex cues are more difficult to discern within-groups (silhouette conditions) or between-groups (shorter presentation durations), the advantage is expected to dissipate. secondly, assuming that the female judgement advantage is also generalisable beyond face-based stimuli, it should manifest in relation to non-face stimuli in the current study. specifically, it is hypothesised that female participants will exhibit higher sensitivity for hand stimuli, and that the absence of colour and texture cues, or short presentation durations, will negate the effect.
third and finally, assuming sex perception sensitivity is dynamic in the short term, performance should be affected by more or less exposure to female (or male) own-race cues. specifically, it is hypothesised that within a subset of participant groups (i.e. own-race participants), ptp will not affect sensitivity rates when participants are asked to target male or female hand stimuli. in this case, testing the null hypothesis is reasonable, in light of the negligible effect of ptp on bias outcomes observed in gaetano and colleagues’ (2016) study.
figure 1. caucasian (top panel) and asian (bottom panel) stimuli used in the current study. each image was reduced to the size of the smallest caucasian exemplar while preserving natural aspect ratios. within each stimulus condition 15 female and 15 male exemplars were represented.
method ethics statement all participants gave written, informed consent prior to participating in the study. all experiments were approved by the human research ethics committee, scu (approval numbers: ecn-11-236; ecn-12-280; ecn-13-032; ecn-14-028). in addition, all experiments conducted in hong kong were approved by the human research ethics committee for non-clinical faculties, university of hong kong. this study complies with the ethical standards specified by the declaration of helsinki. participants and materials throughout this study, race was operationalised from a social constructivist perspective, in line with contemporary cited studies and ethical research protocols (e.g. brielmann, bülthoff, & armann, 2014; cao, contreras-huerta, mcfadyen, & cunnington, 2015; gaetano et al., 2016). here, participants and hand stimulus models who self-identified culturally or ethnically as australian or hong kongese formed the caucasian or asian study groups, respectively.
all caucasian participants reported being australian citizens. of those, a single participant (1%) reported spending one year in hong kong and/or china; all other caucasians indicated living in australia between 18 and 65 years (m = 31.29, sd = 10.28). the majority of asian participants (71%) reported being permanent residents of hong kong or chinese citizens. asian participants reported living 18 to 30 years in hong kong and/or china (m = 21.46, sd = 2.70), all of whom reported spending no time in australia. participants were 80 caucasians (47 female) and 80 asians (39 female), on average aged 32.49 (sd = 11.08) and 21.50 (sd = 2.72), respectively. the age difference was found to be significant (f 1,157 = 72.50, p < .001) and although the role of age in sex judgements is not a current theoretical focus, it was explored in an unplanned manner (see appendix). thirty caucasian (15 female) and 30 asian (15 female), size-standardised individual hands formed the basis of the stimulus set used in the present experiment. exemplars were reduced to the size (as indexed by total pixel count) of the smallest (female) caucasian hand (105,069 px at 70.87 px/cm resolution), as per the method developed by gaetano et al. (2014), such that natural aspect ratios were preserved. they were presented centrally on a crt monitor with 1024 × 768 px display resolution. the width and height of the grey background framing the stimulus subtended 15.74° and 25.70°, respectively, with an average distance of 57 cm between participant and monitor. images were presented with all hue and texture information preserved (‘colour’ condition), and also with those cues removed (‘silhouette’ condition).
thus, for each experimental group, the omnibus stimulus set comprised 120 images (30 caucasian or asian hands [15 female, 15 male] × 2 surfaces [dorsal, palmar] × 2 conditions [colour, silhouette]). stimulus exemplars are depicted in figure 1. procedure and analyses an equal number of caucasian and asian participants were assigned randomly to one of two experiments that differed only by stimulus presentation duration (experiment 1: 1000 ms; experiment 2: 125 ms). within each, participants were further equally and randomly divided into an own-race (i.e. caucasian/asian participants of caucasian/asian hands) or other-race (i.e. caucasian/asian participants of asian/caucasian hands) group. with each participant race (caucasian or asian) and sex (female or male) treated as an independent group, there were 16 quasi-experimental groups in total (i.e. 2 presentation duration experiments × 2 participant races × 2 stimulus races × 2 participant sexes). each experimental trial comprised in chronological order: a blank screen for 1000 ms, a stimulus presentation lasting 125 ms or 1000 ms, and a response screen (centred cross, +, on black background) that extinguished when either the participant made a response or 1000 ms had passed. at the response screen of each trial, the participant’s task was to indicate via key press whether the image represented a target (‘yes’) or not (‘no’). ‘targets’ were defined as either female or male stimuli across separate blocks. trials were blocked by target sex (female, male) and prior target probability (25%, 50%, 75%). thus, each experiment consisted of 720 trials in total: 30 caucasian or asian individual hands (15 of each sex) × 2 hand surfaces (dorsum, palm) × 2 hue/texture conditions (colour, silhouette) × 2 target sex blocks (female, male) × 3 target probability blocks (25%, 50%, 75%). stimuli were presented in random order within blocks, and block order and response key alternatives were counterbalanced across participants. 
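the factorial structure just described multiplies out to 720 trials per experiment; a minimal sketch of that crossing (variable names are illustrative assumptions, and the actual trial lists were blocked and randomised as described rather than generated this way):

```python
from itertools import product

# cross the factors listed in the procedure section
hands = range(30)                      # 15 female + 15 male exemplars
surfaces = ("dorsum", "palm")
hue_texture = ("colour", "silhouette")
target_sex_blocks = ("female", "male")
ptp_blocks = (0.25, 0.50, 0.75)

trials = list(product(hands, surfaces, hue_texture,
                      target_sex_blocks, ptp_blocks))
print(len(trials))  # 30 × 2 × 2 × 2 × 3 = 720
```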
participant sex discrimination ability was measured using the standardised (z-score) sensitivity measure d-prime (d’; gaetano, 2017; stanislaw & todorov, 1999). each participant’s performance was averaged across all (palmar and dorsal) trials within each condition of interest. for the sake of analytic parsimony, between-group prediction tests were applied only to selected conditions. specifically, only data from blocks in which the target-to-lure ratio was equal were subjected to cross-race comparisons. further, sensitivity was averaged across target sex conditions (female, male), because this factor was consistently found not to affect within-group sensitivity in a study that used identical stimuli (gaetano et al., 2014). in the subsequent within-group analyses pertaining to each experiment, female and male participant data were combined to form a caucasian and an asian group. both independent groups included only participants of own-race hands. within each group, performance was contrasted (i) across 25% and 75% ptp conditions and (ii) across those conditions combined and 50% ptp conditions, separately for each level of ambiguity (colour, silhouette) and target sex (female, male). target sex conditions were also included for statistical comparison. predictions were tested via planned contrasts (winer, 1962) using the psy software package (bird, 2004). the assumption of orthogonality was satisfied for all between- and within-group contrasts, hence no correction was made to the pairwise criterion of significance (α = .05). for every contrast, r was calculated as the measure of effect size, expressing the magnitude of the relationship between contrasted variables (gonzalez, 2009). results experiment 1 participants in this experiment were afforded a full second (1000 ms) to view each hand stimulus and subsequently identify it as a target (female or male) or not.
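as an illustration of the sensitivity measure described above, d’ is the difference between the z-transformed hit and false-alarm rates; a minimal sketch (the function name and the 1/(2n) extreme-rate correction are illustrative assumptions in the spirit of stanislaw & todorov, 1999, not necessarily the exact procedure used in this study):

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate); higher = more sensitive."""
    z = NormalDist().inv_cdf
    n_signal = hits + misses
    n_noise = false_alarms + correct_rejections
    # clamp rates away from 0 and 1 so z() stays finite
    hit_rate = min(max(hits / n_signal, 1 / (2 * n_signal)),
                   1 - 1 / (2 * n_signal))
    fa_rate = min(max(false_alarms / n_noise, 1 / (2 * n_noise)),
                  1 - 1 / (2 * n_noise))
    return z(hit_rate) - z(fa_rate)

# e.g. 45/60 hits and 15/60 false alarms -> d' = z(.75) - z(.25) ≈ 1.35
```

because d’ subtracts the false-alarm rate from the hit rate in z units, it indexes discrimination independently of any overall tendency to respond ‘yes’ (bias), which is why bias and sensitivity can be analysed separately.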
in line with the general expectation that ora is not specific to faces, sensitivity rates were first contrasted as a function of participant race (caucasian, asian), hand stimulus race (caucasian, asian) and the interaction between those factors. then, female and male participant sensitivity rates were compared within caucasian and asian, own- and other-race groups. those seven planned, between-group contrasts were applied independently to conditions of hue/texture (colour, silhouette), because sensitivity has consistently been shown to differ between those respectively less and more ambiguous conditions (gaetano et al., 2014). after those between-group tests, sensitivity rates were compared within each group of interest and across target sex and ptp conditions. it was expected that performance would not fluctuate as a function of those conditions. between-group outcomes. the d’ statistics (m ± se) corresponding to each participant race and sex are presented in figure 2 as a function of viewing condition. in the less ambiguous colour condition (left panel), caucasian female participants (own-race hands: 1.47 ± 0.13; other-race hands: 1.06 ± 0.12) discriminated sex with greater average sensitivity than did caucasian males (own-race: 0.99 ± 0.12; other-race: 0.78 ± 0.15). asian females (own-race: 1.10 ± 0.07; other-race: 1.12 ± 0.10) also seemed more sensitive than asian males (own-race: 0.88 ± 0.07; other-race: 0.32 ± 0.12). similarly in the silhouette condition (right panel), caucasian females (own-race: 0.84 ± 0.08; other-race: 0.69 ± 0.07) performed with higher discriminability than did caucasian males (own-race: 0.43 ± 0.12; other-race: 0.19 ± 0.12), and asian females (own-race: 0.58 ± 0.08; other-race: 0.49 ± 0.15) outperformed asian males (own-race: 0.53 ± 0.07; other-race: 0.12 ± 0.13). group performances in response to the colour hand cues are contrasted here first.
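the planned contrasts and their reported effect sizes follow a standard pattern; a minimal sketch, assuming the usual orthogonal polynomial coefficients for three equally spaced levels (function names are illustrative, not the psy package’s actual interface):

```python
import math

# standard orthogonal polynomial weights for the three ptp levels (25%, 50%, 75%)
LINEAR = (-1, 0, 1)      # 75% block vs 25% block
QUADRATIC = (1, -2, 1)   # mean of the extreme blocks vs the 50% block

def contrast_value(means, weights):
    """weighted sum of condition means; contrast weights sum to zero."""
    return sum(w * m for w, m in zip(weights, means))

def effect_size_r(f, df_error):
    """r = sqrt(F / (F + df_error)) for a 1-df planned contrast."""
    return math.sqrt(f / (f + df_error))

# the two trend contrasts are orthogonal: their weights' dot product is 0
assert sum(l * q for l, q in zip(LINEAR, QUADRATIC)) == 0

# e.g. an interaction contrast of F(1,144) = 14.13 corresponds to
# r = sqrt(14.13 / 158.13), roughly .30
```

the quadratic weights capture the u-shape pattern (worse performance at 50% ptp than at 25% or 75%) that recurs in the within-group results below.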
overall, though caucasian participants were more sensitive than asian participants (f 1,144 = 8.22, p = .005, r = .23), no sensitivity difference was observed across stimulus races (f 1,144 = 0.06, p = .807, r = .02). importantly, the interaction between participant and hand race was significant (f 1,144 = 14.13, p < .001, r = .30). thus, performance was characterised by an ora (see figure 2, left panel). that is, participants of own-race hands discriminated sex with higher accuracy than did other-race participants, with one exception – asian females did not show an ora, as revealed via post hoc comparison (f 1,17 = 0.02, p = .879, r < .01). considering now just the own-race participants, caucasian females were more sensitive on average than were caucasian males (f 1,144 = 9.47, p = .002, r = .25); no such female advantage was found for asian participants (f 1,144 = 2.02, p = .158, r = .12).
figure 2. group measures of sex judgement sensitivity, for 1000 ms presentations of hands shown in colour (left panel; less ambiguous condition) and in silhouette (right panel; more ambiguous condition). sensitivity rates (d’) are grouped by participant race, participant sex, and stimulus familiarity (open circles: own-race hands; filled circles: other-race hands). vertical bars represent ±1 se.
of the other-race participants, whilst caucasian females seem to have had higher sex discriminability than caucasian males, the effect did not reach significance (f 1,144 = 3.46, p = .065, r = .15). finally, the female advantage was deemed significant among asian participants of other-race hands (f 1,144 = 27.02, p < .001, r = .40). performance under conditions in which hue/texture cues were removed from hand stimuli was considered next.
overall, tests revealed that sensitivity rates varied neither by participant race (f 1,144 = 1.61, p = .207, r = .11) nor by hand stimulus race (f 1,144 = 0.10, p = .748, r = .03). nonetheless, a significant interaction between those factors was found (f 1,144 = 6.92, p = .009, r = .21): that is, an ora was surprisingly in evidence even in the ambiguous, silhouette condition (see figure 2, right panel). within participants of own-race stimuli, caucasian females were more sensitive sex discriminators than were caucasian males (f 1,144 = 5.79, p = .017, r = .20). by contrast, participant sex did not overall mediate asian own-race participant performance (f 1,144 = 0.11, p = .741, r = .03). with respect to other-race participants, a female sex discrimination advantage was found for both caucasians (f 1,144 = 8.46, p = .004, r = .24) and asians (f 1,144 = 4.94, p = .028, r = .18). within-group outcomes. sensitivity statistics (m ± se) for caucasian participants judging both silhouette and colour hands appear in figure 3 (a) and (b). referring to the colour conditions (a), sensitivity decreased as a function of ptp (25%; 50%; 75%) when female hands were defined as the target (1.31 ± 0.11; 1.20 ± 0.11; 1.10 ± 0.12) but not when male hands were targets (1.13 ± 0.12; 1.21 ± 0.11; 1.15 ± 0.10). in the silhouette conditions (b), the trend between ptp and sensitivity was positive when female hands were targeted (0.53 ± 0.11; 0.56 ± 0.10; 0.83 ± 0.16), and negative when participants targeted male hands (0.82 ± 0.16; 0.67 ± 0.10; 0.51 ± 0.08). average d’ values corresponding to asian participants are depicted in figure 3 (c) and (d). in the colour conditions (c), performance was lower when the target-to-lure ratio was equal (50% female targets: 1.02 ± 0.08; 50% male targets: 0.93 ± 0.08) than when it tipped in favour of lures (25% female targets: 1.30 ± 0.07; 25% male targets: 1.13 ± 0.09), or targets (75% female targets: 1.57 ± 0.18; 75% male targets: 1.14 ± 0.13).
with hue/texture information not present (d), a similar trend arose when participants were asked to target females: they discriminated sex with higher sensitivity when ptp was 25% (0.99 ± 0.14) or 75% (0.51 ± 0.13) than when it was 50% (0.45 ± 0.09). nevertheless, when asked to target silhouette males, group sensitivity seemed relatively stable across ptp conditions (25%: 0.65 ± 0.11; 50%: 0.66 ± 0.06; 75%: 0.57 ± 0.11). planned orthogonal contrasts revealed, first of all, that in the caucasian group, the instruction to target either females or males had no systematic impact on performance across either the colour (f 1,19 = 0.29, p = .598, r = .12) or silhouette conditions (f 1,19 = 0.10, p = .753, r = .07). once target sex was collapsed, and with hue/texture cues preserved, mean sensitivity was found to be uniform across ptp conditions (linear trend: f 1,19 = 1.57, p = .225, r = .28; quadratic: f 1,19 = 0.28, p = .602, r = .12). when hue/texture was removed from the hands, mean sensitivity did not differ as a linear (f 1,19 < 0.01, p = .975, r = .01) nor quadratic (f 1,19 = 0.65, p = .431, r = .18) function of ptp. turning now to the asian participant group, contrasts revealed that when hue/texture cues were visible, performance unexpectedly diverged by target sex: participants targeting female-present trials did so with greater sensitivity than when they were asked to target male hands (f 1,19 = 5.79, p = .027, r = .48). unplanned f tests indicated that this difference emerged when ptp was 75% (f 1,19 = 5.66, p = .028, r = .23) but not 25% (p = .250) or 50% (p = .450), though the 75% effect did not survive the alpha level corrected for multiple comparisons (α = .017). across colour trials, sex discrimination was just as proficient when target trials were sparse (25%) as when they were frequent (75%), meaning that no significant linear trend was found (f 1,19 = 1.59, p = .223, r = .28).
however, group performance did change as a quadratic function of ptp (f 1,19 = 15.09, p = .001, r = .67); performance was worse when targets and lures were equally probable than when their probabilities were unequal (25% or 75%).
figure 3. within-group measures of sex judgement sensitivity for 1000 ms presentations of own-race hands. sensitivity scores (d’) corresponding to caucasian (top panels) and asian (bottom panels) participants are averaged over participant sex, and plotted as a function of target sex (crosses: female; squares: male), prior target probability (ptp: 25%, 50%, 75%), and whether hands were presented with (left panels) or without (right panels) hue and texture information. broken lines represent significant polynomial trends fitted to the ptp marginal means (quadratic: panel c; linear: panel d). vertical bars represent ±1 se.
figure 4. group measures of sex judgement sensitivity for 125 ms colour (left panel) and silhouette (right panel) hand presentations. sensitivity scores (d’) are grouped by participant race, participant sex, and stimulus familiarity (open circles: own-race hands; filled circles: other-race hands). vertical bars represent ±1 se.
finally, when hue/texture cues were not available for observation, sensitivity did not differ by target sex (f 1,19 = 0.06, p = .809, r = .06). across those silhouette trials, participants discriminated sex more sensitively as ptp increased linearly (f 1,19 = 4.48, p = .048, r = .44). however, sensitivity in the silhouette condition did not differ as a quadratic function of ptp (f 1,19 = 2.30, p = .146, r = .33). interim discussion.
the significant effects of ora and ptp are summarised here, saving discussion of mixed or unplanned effects and trends for the general discussion. to summarise, sex judgements from hands presented for 1000 ms each are subject to ora; the more experienced own-race participants were more sensitive to the differences between target and distractor sex of hands. this effect was not specific to one or the other race of participant – asians and caucasians exhibited ora, and did so independent of stimulus ambiguity. considering just the own-race data, one surprising effect was that asian but not caucasian sensitivity tracked the target-to-lure stimulus ratio via quadratic and linear trends under certain conditions, which are depicted in figure 3 (c & d; dotted lines). when the probability of fe/male stimuli deviated from the norm, asian participants used the signal to their advantage (e.g. quadratic trend in figure 3 [c]). in particular, it can be seen that asian participants had heightened sensitivity when asked to discriminate common (ptp: 75%) female targets (from male lures) relative to male targets (from female lures). what these ptp effects seem to indicate is a dynamic learning difference across cultures – a notion entertained further in the discussion. to summarise experiment 1 outcomes, ora appears to be a true perceptual phenomenon, given that the race of hands was manipulated across groups who were not made aware of the variable (cf. o’toole et al., 1996), and considering that the predicted outcome was produced even when sex signals were weak (as in the ‘silhouette’ condition). furthermore, this is the first time that the sex judgement ora has been shown to be pan-stimulus in nature – it arose here without the assistance of familiar facial features or their associated stereotypes, and so appears to genuinely be a result of dynamic sex processing mechanisms. experiment 2 in experiment 2, the parameters of the sensitivity effects noted above are probed further.
specifically, the stimulus inspection time is here limited to an eighth (i.e. 125 ms) of that used in experiment 1, to investigate the extent to which the sex categorisation ora is dependent on processing time. if ora is weaker at 125 ms, it would suggest that the advantage incurs a time cost associated with comparing current sensory data with a stored norm of sex signals (valentine & endo, 1992). between-group outcomes. the sensitivity statistics obtained from participants in the 125 ms experiment are shown in figure 4. when hands were presented with hue/texture intact (left panel), caucasian females (own-race: 0.75 ± 0.08; other-race: 0.64 ± 0.07) discriminated sex with heightened group sensitivity scores compared with caucasian males (own-race: 0.60 ± 0.10; other-race: 0.55 ± 0.15). asian females (own-race: 0.77 ± 0.09; other-race: 0.77 ± 0.17) similarly had higher sensitivity rates than did asian males (own-race: 0.47 ± 0.06; other-race: 0.39 ± 0.13). when hue/texture cues were eliminated from the hands (right panel), the same trend emerged: caucasian female participants (own-race: 0.60 ± 0.17; other-race: 0.48 ± 0.07) had higher group sensitivity rates than did caucasian males (own-race: 0.25 ± 0.12; other-race: 0.37 ± 0.06); likewise, asian females (own-race: 0.50 ± 0.14; other-race: 0.65 ± 0.13) on average performed better than asian males (own-race: 0.23 ± 0.08; other-race: 0.46 ± 0.16). overall, the range of sensitivity means is narrow (d’max − d’min = 0.54) compared to the range across 1000 ms groups (1.35; see experiment 1: between-group outcomes). tests applied to decisions made in the colour conditions revealed, for the most part, null effects.
race of participant (f 1,144 = 0.19, p = .660, r = .04) and hand familiarity (f 1,144 = 0.04, p = .840, r = .02) did not systematically impact overall group performance, and the non-significant interaction between those factors (f 1,144 = 0.54, p = .462, r = .06) provides evidence against the existence of an ora at short durations (i.e. 125 ms). of the own-race groups, for both caucasian (f 1,144 = 0.97, p = .327, r = .08) and asian (f 1,144 = 3.79, p = .054, r = .16) participants, mean sensitivity did not diverge by participant sex. referring to other-race stimulus groups, caucasian participant sensitivity was not on average different between females and males (f 1,144 = 0.27, p = .606, r = .04). however, a difference was found among other-race, asian participants (f 1,144 = 6.24, p = .014, r = .20): within that group, females judged sex with higher sensitivity than did males. contrasts of performance under silhouette conditions also resulted in a lack of systematic differences. sensitivity varied neither by participant race (f 1,144 = 0.15, p = .704, r = .03) nor by hand stimulus race (f 1,144 = 1.16, p = .284, r = .09), and no interaction between those factors was found (f 1,144 = 1.14, p = .288, r = .09). considering performance relating to familiar (own-race) hands, caucasian female participants were slightly advantaged compared to caucasian males, though not significantly so (f 1,144 = 3.85, p = .052, r = .16). similarly, asian own-race performance did not diverge by participant sex (f 1,144 = 2.39, p = .124, r = .13). finally, female advantage was not detected in either caucasian (f 1,144 = 0.29, p = .594, r = .04) or asian (f 1,144 = 1.17, p = .281, r = .09) participants of unfamiliar (other-race) hands. as mentioned, the difference between the largest and smallest 125 ms sensitivity mean spans about half a standard deviation (0.54), thus the range in which a true effect can be detected is small. within-group outcomes. 
the d’ statistics for caucasian participants of stimuli each presented for 125 ms are represented in figure 5 (a) and (b). when hands were judged in colour (a), different trends emerged depending on target sex. when participants were asked to target female hands, sensitivity peaked in the condition of equal target versus lure trials (50%; 0.80 ± 0.08), and dropped when fewer (25%) or more (75%) female targets were present (respectively: 0.69 ± 0.09; 0.58 ± 0.10). the opposite trend was in evidence when male hands were targets: participants performed worse given a balanced ptp (50%; 0.60 ± 0.09) than when given a diminished (25%) or augmented (75%) one (respectively: 0.73 ± 0.12; 0.77 ± 0.10). when hue/texture cues were omitted from the hands (b), there was a slight, positive trend between ptp (25%; 50%; 75%) and sensitivity for deciding hands were female (0.48 ± 0.11; 0.54 ± 0.15; 0.55 ± 0.08). a similarly weak yet opposite trend emerged when male hands were being targeted (0.55 ± 0.13; 0.42 ± 0.12; 0.41 ± 0.13). the mean d’ values for asian participants of briefly presented (125 ms) stimuli are represented in figure 5 (c) and (d). when hue/texture was preserved (c), judgement sensitivity was lowest when targets and lures were presented in equal number (female targets: 0.62 ± 0.06; male targets: 0.63 ± 0.09), and improved as ptp either decreased (female targets: 0.81 ± 0.15; male targets: 0.72 ± 0.13) or increased (female targets: 0.84 ± 0.15; male targets: 0.87 ± 0.15). in the absence of hue/texture (d), sensitivity diminished as ptp grew (25%; 50%; 75%) when target sex was male (0.55 ± 0.10; 0.43 ± 0.14; 0.30 ± 0.14) but not female (0.30 ± 0.14; 0.30 ± 0.08; 0.70 ± 0.19). a set of orthogonal contrasts tested the within-group predictions described above, first for caucasian then asian participants.
first, as expected, caucasian sensitivity rates did not differ as a function of target sex. this was the case both when hue and texture were present (f 1,19 = 0.02, p = .892, r = .03) and when they were absent (f 1,19 = 0.90, p = .356, r = .21). in the colour condition, caucasian participants’ sensitivity did not overall differ as a linear function of ptp (f 1,19 = 0.15, p = .707, r = .09); and group performance did not change in a quadratic direction either (f 1,19 = 0.01, p = .944, r = .02). finally, in the silhouette condition, group sensitivity did not differ across ptp blocks (linear: f 1,19 = 0.15, p = .705, r = .09; quadratic: f 1,19 = 0.03, p = .874, r = .04). orthogonal contrasts within the asian participant data revealed that sex discrimination performance did not diverge by target sex, regardless of whether hue and texture cues were shown (f 1,19 = 0.02, p = .879, r = .04) or not (f 1,19 = 0.01, p = .914, r = .03). when hue/texture cues were visible to asian participants, their average judgement sensitivity did not change linearly by ptp (f 1,19 = 0.43, p = .521, r = .15). however, sensitivity did change in a quadratic fashion, such that judgement performance was worse in the condition with 50% targets and lures (f 1,19 = 6.21, p = .022, r = .50). finally, when hue/texture cues were absent, no linear (f 1,19 = 0.36, p = .554, r = .14) or quadratic (f 1,19 = 0.88, p = .359, r = .21) trend was detected in the sensitivity data across ptp blocks. interim discussion. experiment 2 outcomes showed that limiting hand presentations to just 125 ms rendered the ora non-significant, especially so when hue/texture properties were absent from the stimuli. therefore, in conjunction with the positive result found when hands were presented for 1000 ms (experiment 1), it seems that the advantage afforded by expertise with own-race cues involves a processing time cost. this could be explained in terms of a dynamic sex cue space model.
sensory evidence – in this case, a hand shape – is matched against stored fe/male norms that are tuned by ever-accumulating experience; if the evidence is too fleeting (e.g. 125 ms), it cannot be processed as a familiar exemplar and hence confers no behavioural advantage. similarly, the female participant advantage was in most cases nullified by the reduced presentation duration, though in every group the female participant mean exceeded that of male participants. accounts of female advantage might seem unsuited to the dynamic sex cue processor model, as it is reasonable to assume that adult participants have approximately as much experience with female as with male adult cues. however, loven and colleagues (2012) have suggested that the encoding of stimuli via experience-tuned perceptual norms is further mediated by motivation: female participants essentially enhance their social categorisation acuity by paying more attention to social (female and male) cues. finally, experiment 2 found additional evidence supporting the theory that human sex judgement abilities are dynamic in relation to short-term ptp changes. again, the qualifying factor is, mysteriously, participant race: asians but not caucasians showed higher sensitivity when target hands were uncommon (25% ptp) or common (75% ptp), relative to equiprobable (50% ptp). this u-shaped shift in decision making was weaker in experiment 2, presumably because of the brief stimulus exposure time (125 ms); group sensitivity traced a u-shape only when asian participants were assisted by the presence of texture and colour signals.
discussion
the broad aim of this study was to explore the extent to which the ability to judge sex is shaped by relative, changing experience with certain signals. the specific objective was to determine whether sex judgements from hands – like faces – are influenced by racial familiarity (long-term experience) as well as ptp (short-term experience).
participants were asked to report whether or not each presentation of a hand depicted a target sex, with the primary prediction that an ora would be observed. that prediction was mostly supported: given sufficient viewing time (1000 ms), caucasian and asian participants were more sensitive targeting sex from own-race hands than they were performing the same task with hands of the other race. furthermore, it was predicted that the variable probability of target sex – which here represents change to real-time experience and not a priori knowledge – would not alter sex discriminability rates within groups. here, some unexpected trends were detected. intriguingly, those were race-specific: sensitivity changed for asians but not caucasians as a function of ptp manipulations. before those two key outcomes are discussed, two peripheral findings should at least be mentioned. firstly, an overall female judge advantage was found – the trend was apparent in almost every condition, for caucasians and asians alike, and in many cases it was significant. gaetano et al. (2014) had speculated that sex judgements would differ between female and male participants, but lacked statistical power to definitively test that possibility. previous studies have reported systematic differences between female and male cortical structure (wang, shen, tang, zang, & hu, 2012) and functions (canli, desmond, zhao, & gabrieli, 2002). in terms of perceptual dimorphism, female and male participants have been found to inspect different areas of the face when categorising sex (armann & bülthoff, 2009), and females appear to have superior memory for faces, especially if those faces are female (herlitz & lovén, 2013; lewin & herlitz, 2002; rehnman & herlitz, 2007). the superior perceptual performance of females in the present study is the first demonstrated using hands as stimuli (cf.
schouten, troje, brooks, van der zwan, & verfaillie, 2010), so it would be of theoretical interest to study which regions of the hands females and males focus on. secondly, unlike the participant-mediated effects of ora and female advantage, participant age did not seem to affect sensitivity measures (see appendix). however, this could be an artefact of each median split reducing the power of analyses. based on the lack of support from the sex judgement literature and the non-definitive findings here, a systematic role for age in these effects seems unlikely. that said, a future study could enlist separate participant age groups to systematically explore the relationships.
figure 5. sex judgement sensitivity measures corresponding to 125 ms presentations of own-race hands. caucasian (top panels) and asian (bottom panels) participants' sensitivity scores (d') are averaged over participant sex, and plotted as a function of prior target probability (ptp: 25%, 50%, 75%), target sex (crosses: female; squares: male), and whether hands were presented with (left panels) or without (right panels) hue and texture information. the broken line (panel c) represents a significant quadratic trend fitted to the ptp marginal means. vertical bars represent ±1 se.
the own-race advantage in sex classification
whilst the ora has been demonstrated under a range of different conditions, the present data represent the first demonstration of an ora with respect to judging the sex of human hands. indeed, the effect was detected in response to cues presented for 1000 ms, but no advantage was apparent given a much shorter processing time (125 ms). by contrast, o'toole et al.'s (1996) face-based study evoked the effect with an exposure time of just 75 ms.
there, though, participants were primed with the information that the aim of the task was to measure accuracy in response to own- versus other-race faces. so, whilst participants in that study were aware racial congruency was being manipulated across blocks of trials, participants in the present study viewed either own-race or other-race stimuli, and were not informed that stimulus race was a variable. the difference in participant expectation between these studies may explain why the ora occurred at a brief presentation duration in the previous set of observations (o'toole et al., 1996) but not the present one. a further explanation is that participants have more expertise viewing faces relative to hands per se, and so race-selectivity in sex judgement is nullified given a brief exposure to the latter. certainly, this idea is supported by perceptual data: caucasian and asian participants in o'toole et al.'s (1996) face study achieved sex classification sensitivity rates of d' > 2.00, whereas participants here who viewed hands – afforded almost double the exposure time (125 ms as opposed to 75 ms) – averaged only d' = 0.62. the sex classification ora may have an upper bound as well. in one study, chinese students were afforded unrestricted time to categorise each chinese and caucasian face by sex (zhao & hayward, 2010). on average, overall sex discriminability was markedly high for intact faces (d' ≥ 3.00), yet participants did not exhibit an advantage for the own-race subset (zhao & hayward, 2010). nevertheless, under certain degraded signal conditions, the match between participant and face race did benefit sex categorisation (zhao & hayward, 2010; cf. hayward, rhodes, & schwaninger, 2008).
in sum, despite the methodological differences between the current study, o'toole et al.'s (1996) study, and zhao and hayward's (2010), together they support the notion that deciding someone is female or male is a matter of accumulating experience. surprisingly, the ora does not explain the judgements of one subgroup in the current study: asian females. to the author's knowledge, studies of ora do not typically compare measures across participant sexes or races. one study has investigated caucasian females' proneness to ora when judging faces, but does not comment on whether findings would generalise to asian females (wallis, lipp, & vanman, 2012). thus, explanations of the current finding – that asian males but not asian females possess an ora for sex cues – are speculative without further evidence. if this finding cannot be replicated elsewhere, it could reflect an enculturated strategy specific to hong kong (i.e. where the current asian participants were recruited). for example, there may be more selection pressure for hong kongese males to identify in-group versus out-group membership, as males are the minority in the hong kongese population (cia, 2017). on the basis of these findings, it is plausible that there exist mechanisms which process sensory input from face-selective (e.g. ffa; kanwisher, mcdermott, & chun, 1997) or hand-selective (e.g. left lateral occipitotemporal cortex; bracci, ietswaart, peelen, & cavina-pratesi, 2010) regions via a dynamic, sex signal space. such a space has already been modelled for face perception (campanella, chrysochoos, & bruyer, 2001; johnston, kanazawa, kato, & oda, 1997; valentine & endo, 1992). according to the norm-based model of face recognition (e.g.
valentine & endo, 1992), faces are encoded in a hypothetical space as points located around a population norm – those points are more densely clustered surrounding other-race prototypes than own-race prototypes, facilitating judgements about own-race exemplars on various dimensions (e.g. sex, age) in the space. the sex judgement ora from face and non-face, male and female signals suggests there could be a pan-stimulus sex processor. if so, such a framework could be used to test predictions about how sex processing functions as part of a wider person judgement matrix (e.g. freeman & ambady, 2011). indeed, the ora as described here and previously in o'toole et al.'s (1996) work is a theoretical element of the wider, experience-dependent nature of how humans judge those around them. for instance, emerging research has shown that differential experience with racial groups can affect neural correlates of perceiving pain in other persons (contreras-huerta, baker, reynolds, batalha, & cunnington, 2013; contreras-huerta, hielscher, sherwell, rens, & cunnington, 2014), such that the neural bias associated with other-race faces is reduced as the level of everyday contact is increased (cao et al., 2015). in contrast to this support for the contact hypothesis, other-race effects can dissipate if participants are told that they share intrinsic characteristics with other-race individuals (zhou, pu, young, & tse, 2015). more broadly, participants seem better able to process biological stimuli that are more familiar to them not just by race (meissner & brigham, 2001) but also by age (rhodes & anastasi, 2011) and species (dahl, chen, & rasch, 2014; sigala, logothetis, & rainer, 2011). in summary, these effects demonstrate that dynamic, pan-stimulus models of person perception – and sex perception in particular – are high in explanatory power for incorporating past judgements as a factor.
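the density argument in the norm-based account can be illustrated with a toy one-dimensional simulation: exemplars packed tightly around a norm are, on average, separated by smaller distances and so are harder to tell apart than widely spaced ones. the spread values here are arbitrary assumptions for illustration, not estimates from any face or hand data:

```python
import random

random.seed(1)

def mean_pairwise_distance(points):
    """average absolute distance between all pairs of 1-d exemplar codes."""
    n = len(points)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(abs(points[i] - points[j]) for i, j in pairs) / len(pairs)

# own-race exemplars: widely spaced around the norm (sd = 1.0, assumed)
own = [random.gauss(0.0, 1.0) for _ in range(200)]
# other-race exemplars: densely clustered around it (sd = 0.3, assumed)
other = [random.gauss(0.0, 0.3) for _ in range(200)]

d_own = mean_pairwise_distance(own)
d_other = mean_pairwise_distance(other)
# denser clustering -> smaller separations -> harder discrimination
print(round(d_other, 3), round(d_own, 3))
```

the simulation only restates the model's geometric claim; it makes no commitment about how the clustering arises.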
ptp effects within groups
present evidence suggests that ptp did mediate sensitivity in some unexpected ways. firstly, when ptp differed from the 50% level expected in binary decision tasks, sensitivity also changed for asians – that is, it increased whether target trials were fewer (25%) or more numerous (75%) – but only when the hues and textures of hands were visible. caucasians, on the other hand, showed no such sensitivity shift in response to colour hands. higher ptp equates to more trials in which the participant can make a 'hit', and fewer trials in which a 'false alarm' can be made. yet paradoxically, when cues had hue/texture preserved, asians did better in both the 25% and 75% conditions relative to the 50% condition, revealing a u-shaped sensitivity pattern. that quadratic trend was significant irrespective of viewing time (125 ms or 1000 ms), but was stronger when participants were allowed a complete 1000 ms per stimulus view. in sum, this result provides tentative support for the novel suggestion that asians adopt a different strategy when performing the task: compared to caucasians, they discriminated sex 'online' or adaptively, by matching the dynamic proportion of targets to lures – and despite no explicit instruction that the proportion was shifting across experiment blocks. it is uncertain why asian participants might have behaved in this manner. speculating on the causes, cultural variation in problem solving strategies, or even the significantly unbalanced sex ratio in the population of hong kong, may play a role. on the latter, the hong kongese population consists of only 87 males for every 100 females (cia, 2017). this female population bias has increased over time and is projected to continue increasing. in australia, by contrast, the ratio of 101 males per 100 females is statistically balanced, and matches closely the global statistic (i.e. 102:100; cia, 2017).
in the current study, hong kongese participants may have learned to judge sex with greater care when the ptp was unbalanced (25% or 75%), because a balanced ptp (50%) does not agree with everyday experience of the true sex ratio in hong kong. the observed effects of ptp on asian hand judges may also be explained by differing enculturated attentional strategies between participant races. for instance, convergent evidence from eye-tracking studies indicates that asians tend to scan facial images in a more holistic manner than do caucasians (e.g. blais, jack, scheepers, fiset, & caldara, 2008; brielmann et al., 2014). it is untested though possible that race-based scanning differences exist for hand stimuli as well, and if so, they could explain the asian ptp effect in the present study. it is also possible that, in general, asians are more likely than caucasians to distinguish people by sex holistically. asians may have an advantage exploiting signals from hands and other areas as well as the face, and if so, then they may show heightened sensitivity to a dynamic ptp. although such hypotheses are beyond the scope of the present study, they are at least consistent with the flattening of sensitivity patterns observed when colour and texture cues were removed. a parallel study has been conducted to investigate whether this quadratic trend has any association with sex classification bias (gaetano et al., 2016). this is a question of legitimate theoretical concern, because the chosen index of sensitivity (d') works on the premise that target and lure distributions are normal-shaped and have equal variances; violations of either condition will permit d' to vary with response bias (c; stanislaw & todorov, 1999).
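under the equal-variance gaussian model just described, d' and the criterion c are simple functions of the z-transformed hit and false-alarm rates (stanislaw & todorov, 1999). a minimal sketch – the trial counts are hypothetical, and the 0.5 log-linear correction is one common convention for keeping z-scores finite when a rate is exactly 0 or 1:

```python
from statistics import NormalDist

def sdt_measures(hits, misses, fas, crs):
    """d' and criterion c from raw counts, with a log-linear correction
    so that hit/false-alarm rates of exactly 0 or 1 stay finite."""
    hr = (hits + 0.5) / (hits + misses + 1.0)   # corrected hit rate
    far = (fas + 0.5) / (fas + crs + 1.0)       # corrected false-alarm rate
    z = NormalDist().inv_cdf                     # inverse normal cdf
    d_prime = z(hr) - z(far)
    c = -0.5 * (z(hr) + z(far))
    return d_prime, c

# illustrative counts from a hypothetical 50% ptp block (40 targets, 40 lures)
d, c = sdt_measures(hits=28, misses=12, fas=16, crs=24)
print(round(d, 2), round(c, 2))
```

when target and lure distributions depart from normality or equal variance, this d' is no longer independent of c, which is the concern raised in the text.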
paradoxically, such violations are more likely to occur when sex signals are difficult to discern, which in turn is also when male bias is more likely to arise (e.g. gaetano et al., 2014). nevertheless, present data reveal a completely unexpected effect of ptp. in contrast to bias outcomes, which were found to be static relative to ptp changes (gaetano et al., 2016), sensitivity outcomes in the present study changed non-linearly as a function of participant race. breaking the asian-specific ptp phenomenon down further, performance was compared across the 'uncommon' and 'common' conditions, ignoring those conditions in which the target-to-lure ratio was equal. when presentations contained hue/texture, sensitivity was generally no different between 25% and 75% ptp blocks. when presented with silhouette hands, the 1000 ms asian participant group discriminated sex with greater acuity when ptp was high compared to low. that said, it is difficult to determine whether or not this is a true effect. for instance, the associated significance value of .048 is close to the threshold of .050, and the effect can explain only 19% of that particular group's sensitivity variance, which is small in comparison to the 44% explained by the same group's u-shaped effect mentioned above. certainly, this positive linear effect was not demonstrated within any of the other groups or ambiguity conditions. so in total, asian participants are better at discriminating sex from unambiguous hands when the probability of fe/male targets is deviant (i.e. 25% or 75%). finally, overall group performances did not vary by target sex, with just one exception: asian participants viewing stimuli presented for 1000 ms were on average more sensitive targeting females than targeting males when hue/texture cues were preserved.
the relatively female-saturated population of hong kong that these participants were exposed to could explain this result. nevertheless, the effect seemed to manifest only when there were fewer targets (25%) per block, and only if a liberal significance value was chosen (see figure 3 (c)). in sum then, as expected, sensitivity rates are largely uniform irrespective of whether the participant is looking for females or males. on the contrary, it has been demonstrated consistently that target sex does affect response criteria (gaetano et al., 2014; gaetano et al., 2016). the bulk of the evidence in the present study suggests that any such changes in decision bias occur independently of decision sensitivity.
conclusion
in summary, consistent with a general theory of sex judgement (freeman & ambady, 2011), the present data provide empirical support for the notions that sex categorisations: (i) are partially dependent on the participant's long-term perceptual expertise with certain groups of dimorphic cues, and (ii) may fluctuate as a function of short-term probabilistic changes across cues, at least for certain groups of participants. with respect to (i), it has been shown that the ora is a pan-stimulus phenomenon that affects not just face judgements, but sex judgements more generally. of particular note, this phenomenon does not require participants to be aware of stimulus race manipulations. regarding (ii), the present study has revealed some interesting patterns of ptp-mediated sex judgement, which apparently arise for asians but not caucasians. specifically, when sex cues are relatively intact, asians adaptively change their decision-making acuity in a curvilinear fashion as ptp increases; when cues are degraded, they decrease their sensitivity linearly as ptp increases. caucasians, despite being afforded the same variable likelihoods of making a correct decision, overall did not change their sensitivity.
these findings extend the notion of a sex processing model analogous to the face-space model: human sex judgement depends not only on how different the female and male signals are in this space, but also on the participant's dynamic experience with signals in the long and short term.
acknowledgements: the author dedicates this work to the memory of dr kevin minotti. the author thanks former supervisors anna brooks and rick van der zwan for their foundational supervisory support and guidance. thanks also to william g. hayward and matthew oxner for their assistance with data collection. finally, thank you to ross cunnington, for sharing his insights on ora at the 12th international conference on cognitive neuroscience, brisbane, australia.
declaration of conflicting interests: the author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
handling editor: andreas fischer
copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license.
citation: gaetano, j. (2019). evidence for the dynamic human ability to judge another's sex from ambiguous or unfamiliar signals. journal of dynamic decision making, 5, 3. doi:10.11588/jddm.2019.1.61118
received: 04 apr 2019 accepted: 28 jun 2019 published: 09 jul 2019
references
armann, r., & bülthoff, i. (2009). gaze behavior in face comparison: the roles of sex, task, and symmetry. attention, perception and psychophysics, 71(5), 1107–1126. doi:10.3758/app.71.5.1107
bailey, a. h., lafrance, m., & dovidio, j. f. (2018). is man the measure of all things? a social cognitive account of androcentrism. personality & social psychology review. doi:10.1177/1088868318782848
bayet, l., pascalis, o., quinn, p. c., lee, k., gentaz, e., & tanaka, j. w. (2015). angry facial expressions bias gender categorization in children and adults: behavioral and computational evidence.
frontiers in psychology, 6. doi:10.3389/fpsyg.2015.00346
behrman, b. w., & davey, s. l. (2001). eyewitness identification in actual criminal cases: an archival analysis. law and human behavior, 25(5), 475–491. doi:10.1023/a:1012840831846
bird, k. d. (2004). analysis of variance via confidence intervals. london: sage publishing. doi:10.4135/9781849208598
blais, c., jack, r. e., scheepers, c., fiset, d., & caldara, r. (2008). culture shapes how we look at faces. plos one, 3(8), e3022. doi:10.1371/journal.pone.0003022
bracci, s., ietswaart, m., peelen, m. v., & cavina-pratesi, c. (2010). dissociable neural responses to hands and non-hand body parts in human left extrastriate visual cortex. journal of neurophysiology, 103(6), 3389–3397. doi:10.1152/jn.00215.2010
brielmann, a. a., bülthoff, i., & armann, r. (2014). looking at faces from different angles: europeans fixate different features in asian and caucasian faces. vision research, 100, 105–112. doi:10.1016/j.visres.2014.04.011
brielmann, a. a., gaetano, j. m., & stolarova, m. (2015). man, you might look like a woman if a child is next to you. advances in cognitive psychology, 11(3), 84–96. doi:10.5709/acp-0174-y
campanella, s., chrysochoos, a., & bruyer, r. (2001). categorical perception of facial gender information: behavioural evidence and the face-space metaphor. visual cognition, 8(2), 237–262. doi:10.1080/13506280042000072
canli, t., desmond, j. e., zhao, z., & gabrieli, j. d. e. (2002). sex differences in the neural basis of emotional memories.
proceedings of the national academy of sciences of the united states of america, 99(16), 10789–10794. doi:10.1073/pnas.162356599
cao, y., contreras-huerta, l. s., mcfadyen, j., & cunnington, r. (2015). racial bias in neural response to others' pain is reduced with other-race contact. cortex, 70, 68–78. doi:10.1016/j.cortex.2015.02.010
central intelligence agency [cia]. (2017). the world fact book: sex ratio. retrieved from: https://www.cia.gov/library/publications/the-world-factbook/geos/xx.html
contreras-huerta, l. s., baker, k. s., reynolds, k. j., batalha, l., & cunnington, r. (2013). racial bias in neural empathic responses to pain. plos one, 8(12). doi:10.1371/journal.pone.0084001
contreras-huerta, l. s., hielscher, e., sherwell, c. s., rens, n., & cunnington, r. (2014). intergroup relationships do not reduce racial bias in empathic neural responses to pain. neuropsychologia, 64, 263–270. doi:10.1016/j.neuropsychologia.2014.09.045
cook, s. w., & tanenhaus, m. k. (2009). embodied communication: speakers' gestures affect listeners' actions. cognition, 113, 98–104. doi:10.1016/j.cognition.2009.06.006
dahl, c. d., chen, c.-c., & rasch, m. j. (2014). own-race and own-species advantages in face perception: a computational view. scientific reports, 4. doi:10.1038/srep06654
davidenko, n. (2007). silhouetted face profiles: a new methodology for face perception research. journal of vision, 7(4), 1–17. doi:10.1167/7.4.6
freeman, j. b., & ambady, n. (2011). a dynamic interactive theory of person construal. psychological review, 118(2), 247–279. doi:10.1037/a0022327
gaetano, j. m. (2017). signal detection theory calculator (1.2) [microsoft excel workbook]. retrieved from https://www.researchgate.net/publication/316642315_signal_detection_theory_calculator_12 doi:10.13140/rg.2.2.26215.85926
gaetano, j. m., van der zwan, r., & brooks, a. r. (2012). perceiving other people on the basis of categorical multisensory data: towards a unified theory of person perception. in r.
van der zwan (ed.), current trends in experimental and applied psychology (vol. 1, pp. 105–115). brisbane, qld: primrose hall.
gaetano, j. m., van der zwan, r., oxner, m., hayward, w. g., doring, n., blair, d., & brooks, a. r. (2016). converging evidence of ubiquitous male bias in human sex perception. plos one, 11(2), e0148623. doi:10.1371/journal.pone.0148623
gaetano, j. m., van der zwan, r., blair, d., & brooks, a. r. (2014). hands as sex cues: sensitivity measures, male bias measures, and implications for sex perception mechanisms. plos one, 9(3), e91032. doi:10.1371/journal.pone.0091032
gauthier, i., tarr, m. j., anderson, a. w., skudlarski, p., & gore, j. c. (1999). activation of the middle fusiform 'face area' increases with expertise in recognizing novel objects. nature neuroscience, 2(6), 568–573. doi:10.1038/9224
gittings, n. s., & fozard, j. l. (1986). age related changes in visual acuity. experimental gerontology, 21(4–5), 423–433. doi:10.1016/0531-5565(86)90047-1
goldin-meadow, s., wein, d., & chang, c. (1992). assessing knowledge through gesture: using children's hands to read their minds. cognition and instruction, 9(3), 201–219. doi:10.1207/s1532690xci0903_2
gonzalez, r. (2009). orthogonal, planned and unplanned comparisons. in data analysis for experimental design (pp. 211–241). new york, ny: guilford press.
hacker, g., brooks, a., & van der zwan, r. (2013). sex discriminations made on the basis of ambiguous visual cues can be affected by the presence of an olfactory cue. bmc psychology, 1. doi:10.1186/2050-7283-1-10
haegerstrom-portnoy, g., scheck, m. e., & brabyn, j. a. (1999). seeing into old age: vision function beyond acuity. optometry & vision science, 76(3), 141–158. doi:10.1097/00006324-199903000-00014
hayward, w. g., rhodes, g., & schwaninger, a. (2008). an own-race advantage for components as well as configurations in face recognition. cognition, 106(2), 1017–1027. doi:10.1016/j.cognition.2007.04.002
herlitz, a., & lovén, j. (2013).
sex differences and the own-gender bias in face recognition: a meta-analytic review. visual cognition, 21(9–10), 1306–1336. doi:10.1080/13506285.2013.823140
hess, u., adams, r. b., jr., grammer, k., & kleck, r. e. (2009). face gender and emotion expression: are angry women more like men? journal of vision, 9(12), 1–8. doi:10.1167/9.12.19
jaquet, e., rhodes, g., & hayward, w. g. (2007). opposite aftereffects for chinese and caucasian faces are selective for social category information and not just physical face differences. quarterly journal of experimental psychology, 60(11), 1457–1467. doi:10.1080/17470210701467870
jaquet, e., rhodes, g., & hayward, w. g. (2008). race-contingent aftereffects suggest distinct perceptual norms for different race faces. visual cognition, 16(6), 734–753. doi:10.1080/13506280701350647
johnson, k. l., freeman, j. b., & pauker, k. (2012). race is gendered: how covarying phenotypes and stereotypes bias sex categorization. journal of personality and social psychology, 102, 116–131. doi:10.1037/a0025335
johnson, k. l., mckay, l. s., & pollick, f. e. (2011). he throws like a girl (but only when he's sad): emotion affects sex-decoding of biological motion displays. cognition, 119(2), 265–280. doi:10.1016/j.cognition.2011.01.016
johnston, r. a., kanazawa, m., kato, t., & oda, m. (1997). exploring the structure of multidimensional face-space: the effects of age and gender. visual cognition, 4(1), 39–57. doi:10.1080/713756750
junger, j., pauly, k., bröhr, s., birkholz, p., neuschaefer-rube, c., kohler, c., . . . habel, u. (2013). sex matters: neural correlates of voice gender perception. neuroimage, 79, 275–287.
doi:10.1016/j.neuroimage.2013.04.105
kanwisher, n., mcdermott, j., & chun, m. m. (1997). the fusiform face area: a module in human extrastriate cortex specialized for face perception.
journal of neuroscience, 17(11), 4302–4311. doi:10.1523/jneurosci.17-11-04302.1997
kovács, g., gulyás, b., savic, i., perrett, d. i., cornwell, r. e., little, a. c., . . . vidnyánszky, z. (2004). smelling human sex hormone-like compounds affects face gender judgment of men. neuroreport, 15(8), 1275–1277. doi:10.1097/01.wnr.0000130234.51411.0e
kozlowski, l. t., & cutting, j. e. (1977). recognising the sex of a walker from a dynamic point-light display. perception and psychophysics, 21(6), 575–580. doi:10.3758/bf03198740
lewin, c., & herlitz, a. (2002). sex differences in face recognition—women's faces make the difference. brain and cognition, 50(1), 121–128. doi:10.1016/s0278-2626(02)00016-7
li, x., logan, r. j., & pastore, r. e. (1991). perception of acoustic source characteristics: walking sounds. journal of the acoustical society of america, 90(6), 3036–3049. doi:10.1121/1.401778
loven, j., rehnman, j., wiens, s., lindholm, t., peira, n., & herlitz, a. (2012). who are you looking at? the influence of face gender on visual attention and memory for own- and other-race faces. memory, 20(4), 321–331. doi:10.1080/09658211.2012.658064
mcgugin, r. w., newton, a. t., gore, j. c., & gauthier, i. (2014). robust expertise effects in right ffa. neuropsychologia, 63, 135–144. doi:10.1016/j.neuropsychologia.2014.08.029
meissner, c. a., & brigham, j. c. (2001). thirty years of investigating the own-race bias in memory for faces: a meta-analytic review. psychology, public policy, & law, 7, 3–35. doi:10.1037/1076-8971.7.1.3
o'toole, a. j., peterson, j., & deffenbacher, k. a. (1996). an 'other-race effect' for categorizing faces by sex. perception, 25(6), 669–676. doi:10.1068/p250669
pezdek, k., blandon-gitlin, i., & moore, c. (2003). children's face recognition memory: more evidence for the cross-race effect. journal of applied psychology, 88(4), 760–763. doi:10.1037/0021-9010.88.4.760
rehnman, j., & herlitz, a. (2007). women remember more faces than men do.
Acta Psychologica, 124(3), 344–355. doi:10.1016/j.actpsy.2006.04.004

Rhodes, M. G., & Anastasi, J. S. (2011). The own-age bias in face recognition: A meta-analytic and theoretical review. Psychological Bulletin, 138, 146–174. doi:10.1037/a0025750

Schouten, B., Troje, N. F., Brooks, A. R., van der Zwan, R., & Verfaillie, K. (2010). The facing bias in biological motion perception: Effects of stimulus gender and participant sex. Attention, Perception, and Psychophysics, 72(5), 1256–1260. doi:10.3758/app.72.5.1256

Sigala, R., Logothetis, N. K., & Rainer, G. (2011). Own-species bias in the representations of monkey and human face categories in the primate temporal lobe. Journal of Neurophysiology, 105(6), 2740–2752. doi:10.1152/jn.00882.2010

Smith, S. M., Lindsay, R. C. L., Pryke, S., & Dysart, J. E. (2001). Postdictors of eyewitness errors: Can false identifications be diagnosed in the cross-race situation? Psychology, Public Policy, and Law, 7, 153–169. doi:10.1037/1076-8971.7.1.153

Stanislaw, H., & Todorov, N. (1999). Calculation of signal detection theory measures. Behavior Research Methods, Instruments, & Computers, 31(1), 137–149. doi:10.3758/bf03207704

Taylor, A. J. G. (2017). The role of fixations and face gender in facial expression categorization. Cognition, Brain, Behavior: An Interdisciplinary Journal, 21(2), 101–115. doi:10.24193/cbb.2017.21.07

Troje, N. F., Sadr, J., Geyer, H., & Nakayama, K. (2006). Adaptation aftereffects in the perception of gender from biological motion. Journal of Vision, 6(8), 850–857. doi:10.1167/6.8.7

Tsang, T., Ogren, M., Peng, Y., Nguyen, B., Johnson, K. L., & Johnson, S. P. (2018). Infant perception of sex differences in biological motion displays. Journal of Experimental Child Psychology, 173, 338–350. doi:10.1016/j.jecp.2018.04.006

Valentine, T. (1991). A unified account of the effects of distinctiveness, inversion, and race in face recognition.
Quarterly Journal of Experimental Psychology Section A: Human Experimental Psychology, 43(2), 161–204. doi:10.1080/14640749108400966

Valentine, T., & Endo, M. (1992). Towards an exemplar model of face processing: The effects of race and distinctiveness. Quarterly Journal of Experimental Psychology Section A: Human Experimental Psychology, 44(4), 671–703. doi:10.1080/14640749208401305

Wallis, J., Lipp, O. V., & Vanman, E. J. (2012). Face age and sex modulate the other-race effect in face recognition. Attention, Perception, and Psychophysics, 74(8), 1712–1721. doi:10.3758/s13414-012-0359-z

Wang, L., Shen, H., Tang, F., Zang, Y., & Hu, D. (2012). Combined structural and resting-state functional MRI analysis of sexual dimorphism in the young adult human brain: An MVPA approach. NeuroImage, 61(4), 931–940. doi:10.1016/j.neuroimage.2012.03.080

Wells, G. L., & Olson, E. A. (2001). The other-race effect in eyewitness identification: What do we do about it? Psychology, Public Policy, and Law, 7, 230–246. doi:10.1037/1076-8971.7.1.230

Wenzlaff, F., Briken, P., & Dekker, A. (2018). If there's a penis, it's most likely a man: Investigating the social construction of gender using eye tracking. PLOS ONE, 13(3), e0193616. doi:10.1371/journal.pone.0193616

White, H., Hock, A., Jubran, R., Heck, A., & Bhatt, R. S. (2018). Visual scanning of male and female bodies in infancy. Journal of Experimental Child Psychology, 166, 79–95. doi:10.1016/j.jecp.2017.08.004

Wild, H. A., Barrett, S. E., Spence, M. J., O'Toole, A. J., Cheng, Y. D., & Brooke, J. (2000). Recognition and sex categorization of adults' and children's faces: Examining performance in the absence of sex-stereotyped cues. Journal of Experimental Child Psychology, 77(4), 269–291. doi:10.1006/jecp.1999.2554

Winer, B. J. (1962). Statistical principles in experimental design. New York, NY: McGraw-Hill. doi:10.1037/11774-000

Yamaguchi, M. K., Hirukawa, T., & Kanazawa, S. (1995). Judgment of gender through facial parts.
Perception, 24, 563–575. doi:10.1068/p240563

Zhao, M. T., & Hayward, W. (2010). Holistic processing underlies gender judgments of faces. Attention, Perception, and Psychophysics, 72(3), 591–596.

Zhao, M. T., Hayward, W. G., & Bülthoff, I. (2014). Holistic processing, contact, and the other-race effect in face recognition. Vision Research, 105, 61–69. doi:10.1016/j.visres.2014.09.006

Zhou, G., Pu, X., Young, S. G., & Tse, C.-S. (2015). Effects of divided attention and social categorization on the own-race bias in face recognition. Visual Cognition, 22(9–10), 1296–1310. doi:10.1080/13506285.2014.998324
Appendix

Ancillary analyses of participant age

The aim of this supplementary study was to test whether participant age might confound the sensitivity effects of interest presented in the main text. To that end, parallel analyses were run in which participant age was included as a covariate. Because negative links between age and general measures of visual acuity have been documented (e.g., Gittings & Fozard, 1986; Haegerstrom-Portnoy, Scheck, & Brabyn, 1999), participant age in the present study was analysed in a gross sense. More to the point, there is no specific reason to suspect that age should systematically affect hand-based sex classifications, so the current tests were conducted in a post hoc manner. Specifically, sensitivity data from the colour and silhouette conditions were each subjected to a 2 (own–other race participant group) × 2 (presentation duration group) ANCOVA. To simplify the analyses, data were not partitioned further by participant race or sex; each ANCOVA significance criterion was .050 (rather than .025) in order to maximise the overall power to detect age confounds. The post hoc prediction tested here was that participant age was not driving the ORA described in the main text (for a face-based analogue, cf. O'Toole et al., 1996). It was therefore expected that these a posteriori analyses would yield (a) non-significant outcomes for participant age, and (b) significant main and/or interaction effects of stimulus familiarity (i.e., own- vs. other-race hands) once participant age had been factored out.

Methods

Methods are described in full in the main text. Participants were 80 Caucasians (47 female) and 80 Asians (39 female), on average aged 32.49 years (SD = 11.08) and 21.50 years (SD = 2.72), respectively. This unanticipated age difference was significant (F(1, 157) = 72.50, p < .001, r = .56), so effects of participant age were explored via unplanned analyses.
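The sensitivity measure entering these analyses is presumably the standard signal-detection d′ (the article cites Stanislaw & Todorov, 1999, for the computation). A minimal sketch of that computation follows; the function name and the specific log-linear correction are illustrative choices, not necessarily the ones used in the paper:

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity, d' = z(H) - z(FA).

    A log-linear correction (0.5 added to each cell) is applied here to
    avoid infinite z-scores when a rate is exactly 0 or 1; this is one
    common convention discussed by Stanislaw & Todorov (1999).
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical cell counts for one participant: 45 hits, 5 misses,
# 10 false alarms, 40 correct rejections.
print(round(d_prime(45, 5, 10, 40), 2))
```

Higher values indicate better discrimination of male from female hands; d′ = 0 corresponds to chance performance.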
Results

The outcomes of both sets of analyses supported the notion that the ORA described previously is unrelated to age. With respect first to the colour conditions, participant age did not influence sensitivity rates (F(1, 153) = 2.53, p = .113, r = .13), nor did its removal nullify the influence of own–other race (F(1, 153) = 6.86, p = .010, r = .21) or presentation duration (F(1, 153) = 25.08, p < .001, r = .38) group differences; the interaction between those factors was, however, not significant (F(1, 153) = 2.75, p = .099, r = .13). Turning to the silhouette conditions, participant age did not affect group sensitivity to male or female cues (F(1, 153) = 0.84, p = .361, r = .07). When the covariate was factored out, a significant interaction between own–other race and presentation duration was detected (F(1, 153) = 5.58, p = .019, r = .19), though neither main effect was significant (own–other race: F(1, 153) = 0.82, p = .366, r = .07; presentation duration: F(1, 153) = 0.07, p = .789, r = .02).

Summary

In sum, these ancillary analyses rule out the possibility that the ORA is merely an artefact of systematic age differences across groups. Given that participant age had no systematic impact on the predicted between-group sensitivity outcomes, the reader can be confident that this variable did not skew the outcomes reported in the main text. Moreover, with the age differences statistically accounted for, sensitivity rates differed as a function of familiarity but were, in the silhouette condition, qualified by stimulus exposure time; this agrees with the outcomes of the main study, in which participant age was ignored.
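The 2 × 2 ANCOVA logic above (age as covariate, own–other race and presentation duration as between-group factors) can be sketched as nested linear-model comparisons. This is a generic illustration under assumed variable names and effect coding, not the authors' analysis script:

```python
import numpy as np

def ancova_f(y, covariate, a, b):
    """F-tests for a 2x2 ANCOVA via nested model comparison.

    y: outcome (e.g. sensitivity d'); covariate: participant age;
    a, b: 0/1 group codes (e.g. own/other-race, presentation duration).
    Each effect is tested by dropping its column from the full model
    and comparing residual sums of squares (centred covariate and
    -1/+1 effect codes make the single-df tests order-independent).
    """
    y = np.asarray(y, dtype=float)
    cov = np.asarray(covariate, dtype=float) - np.mean(covariate)
    ea = 2 * np.asarray(a) - 1          # effect codes (-1/+1)
    eb = 2 * np.asarray(b) - 1
    full = np.column_stack([np.ones_like(y), cov, ea, eb, ea * eb])

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    rss_full = rss(full)
    df_err = len(y) - full.shape[1]
    effects = {"age": 1, "race": 2, "duration": 3, "interaction": 4}
    return {name: (rss(np.delete(full, col, axis=1)) - rss_full)
                  / (rss_full / df_err)
            for name, col in effects.items()}
```

With toy data in which only the duration factor carries a real effect, the F statistic for "duration" dominates the others, mirroring the structure of the tests reported above.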
Original Research

The qualitative face of big data: Live streaming and ecologically valid observation of decision-making

Alexander Nicolai Wendt
Department of Psychology, Heidelberg University, and Department of Human Sciences, University of Verona

The technological possibilities for new data sources in media psychology, such as online live recordings, called live streaming, are growing continuously. These sources offer not only plentiful quantitative material but also access to ecologically valid and unobtrusive observation of problem-solving and decision-making processes. However, to exploit these potentials, epistemological and methodological reflection should guide research. The availability of big data and naturally occurring data sets (NODS) makes it possible to revisit the historical controversies on the eligibility of self-description. Drawing on such reflections, media psychology can contribute to renovating well-established research methods, such as think-aloud protocols, in order to improve their empirical potential. Among the attempts to enhance these methods are phenomenology and ethnomethodology, which offer a fruitful account for developing innovative data sources for self-description. Yet these attempts do not support a recurrence of self-description's previous application but propose an epistemological shift towards more subtle observations. To realize the potentials of media psychology, the risk of repeating classical mistakes, such as introspectionism, must be kept in view. Beyond these fallacies, however, modern digital technology holds encouraging potentials which have already been partly sighted by video gaming research. Due to the similarity of digital environments to laboratory setups, there is a continuity from offline to online research, from traditional data to big data.
Nevertheless, a true advance into new possibilities requires understanding the qualitative meaning of such data.

Keywords: live streaming, phenomenology, ethnomethodology, big data, decision-making, problem-solving

Media psychology has continuously gained relevance and reputation during the growth of digital media over the last decades. The scope of its applicability expands in step with the penetration of society by multimedia communication, because technological advances are intertwined with the digitalization of life. Life becomes accountable in the structures of the digital age that form the premises of contemporary and future daily routine. This technization of life provides substantial potential for media psychology. A remarkable realization of this potential can be seen in research on so-called "big data". As the work of Kern and colleagues (2014) on the representation of personality traits in social media demonstrates, the cyclopean amount of available behavioral data can be utilized to enhance the empirical reach of psychology. Moreover, the focal point of this research is not only to exploit the immense amount of data but also to establish "unobtrusive methods" as an "ecologically valid vehicle for obtaining 'big data'" (p. 158). Media psychology accesses behavior beyond the artificial setting of laboratory investigations without forfeiting the standardization that is requisite for reliable empirical research. Despite these noteworthy potentials of media psychology, its epistemological aspirations remain rather modest. Primarily, it strives to catch up with mainstream psychology and to serve its epistemological projects, continuing the traditions of cognitive psychology. Hence, media psychology aligns with a primarily behavioral investigation of human life, which implies relevant shortcomings regarding the experiential complexity of psychological processes.
In other words, as long as media psychology does not reflect on its epistemological presuppositions, it may not be able to tap the full potential that resides in the available data sources. Instead, a more integrative approach (which can be methodologically realized, e.g., as mixed-methods research) should be debated. So far, the research interest of media psychology has mainly been directed at comprehending the new domain of human behavior in the context of media, such as assessing the participatory structures in interactive media (e.g., Hamilton, Garretson, & Kerne, 2014), understanding cultural communication via memes (e.g., Mazambani, Carlson, Reysen, & Hempelmann, 2015), or relating this behavior to other fields of psychological research, like aggression in video games (e.g., Hasan, Bègue, Scharkow, & Bushman, 2013) or research on the digital footprint in personality psychology (e.g., Farnadi, Sitaraman, Sushmita, Celli, Kosinski, Stillwell, Davalos, Moens, & De Cock, 2016). In other words, the potentials of media psychology have so far been exploited to amplify psychological research within the limits of its prevalent concepts (for a critical account of cognitive psychology see, for example, Wertz, 1993). Nevertheless, the discipline bears further possibilities that lie beyond the frame of cognitive psychology as the current (epistemological) paradigm (this paradigm is conceptualized, for example, in Neisser, 2014). These possibilities concern more fundamental matters of psychology's epistemology – the discipline's general faculty to provide understanding – and its methodology – the discipline's specific concept for procuring this understanding.
Media psychology not only expands the empirical basis for inductive hypothesis formation but also allows novel insight into elementary psychological questions, such as naturalistic (in the sense of 'ecologically valid') observation or the viability of self-descriptive methods.¹ Whereas the progress in inductive investigations provided by media psychology, as in the case of "big data", can be depicted as a contribution to quantitative research, its as yet uncharted epistemological potentials additionally require qualitative approaches. The discussion of fundamental questions of psychological methodology needs to be supported by descriptions of the experiential qualities that become available in the analysis of new media. Access to human behavior by means of technology has changed the very conditions of psychological research. Empirical research in the field of media psychology will not only provide evidence within the framework of cognitive psychology, but surpass it. However, this revolutionary potential of media psychology can only be realized through theoretical controversy that faces the epistemological breaking points of (analogous) cognitive psychology. The enterprise of extending media psychology's impact beyond cognitive psychology's limitations requires alternatives that provide descriptions of the relevant experiences. Several fruitful approaches from alternate epistemological traditions can enrich the psychological debate. The perspective taken here employs phenomenology as an epistemological and ethnomethodology as a methodological contribution (for the methodological relation between dynamic decision-making research and phenomenology, see Wendt, 2017a).

10.11588/jddm.2020.1.69769 JDDM | 2020 | Volume 6 | Article 3 | 1
Phenomenology is a philosophical school of thought that emerged in the late 19th century, among others in the work of Edmund Husserl (see Spiegelberg, 1960; 1971). Its central concern is the structural analysis of experience, viz. the constitution of consciousness. Phenomenology has had a distinct influence on psychology throughout the history of the discipline. Ethnomethodology is an empirical approach that derives from phenomenological thinking (Pilnick, 2013). It was established by Harold Garfinkel in North American sociology during the second half of the 20th century. Ethnomethodology's main concern is to understand the structures of the lifeworld by investigating the "methods that persons use to carry out the activities that make up their everyday life" (Churchill, 1971, p. 183). In the behavioral sciences, ethnomethodological thinking has influenced well-established approaches such as conversation analysis (see, e.g., Maynard & Clayman, 2003). However, the phenomenological perspective is not the only way to expand cognitive psychology's reach; several other epistemological traditions point in the same direction. Yet, as shall be discussed, phenomenology provides the most promising conceptual horizon for promoting the progress of psychology. To render the scope of media psychology's epistemological and methodological contribution to controversies within theoretical psychology, this article aims to sketch out the qualitative empirical use of digital subject matter by exploring a new data source: live streaming (for a conceptual outline see Wendt, 2017b). Live streaming is a recent technological advance enabled by the increase of online bandwidth capacities. It is characterized as live broadcastings that "put the traditional consumer into the role of content creator" (Sjöblom & Hamari, 2016, p. 1).
Streamers – the content creators of streaming platforms such as www.twitch.tv – upload spontaneously created video material in real time, publishing it to a live audience. The content of these streams is not restricted to, but prominently features, video games, which are a well-established subject matter in media psychology (see Reeves, Yeykelis, & Cummings, 2015). The psychologically most relevant property of live streaming is the detailed documentation of the streamers' behavior, often by a webcam recording their analogous activity, and always by some output of their digital activity, such as gameplay, communication, or creative production. Live streaming bears a great resemblance to the material used by classical methods of empirical psychology, such as introspection and think-aloud protocols (Wendt, 2017b). However, the data obtained through live streaming differ from their analogous predecessors in crucial respects, especially ecological validity. With more than a hundred million unique users every month on the streaming platform www.twitch.tv (Sjöblom & Hamari, 2016), live streaming can easily be reckoned a source of big data. With reference to relevant work on the subject matter of video gaming, the investigation of live streaming can help to excavate the contribution of media psychology to fundamental epistemological and methodological controversies. The following line of argument consists of four steps. First, live streaming is presented as a novel data source and analyzed with regard to the fundamental psychological problem of the investigation of experience. In the course of this, the consideration of subjectivity emerges as the crucial question for an adequate methodology. Second, the historical and contemporary psychological account of this question of subjective experience is elucidated and criticized from the standpoint of phenomenology.
Phenomenology promotes the investigation of consciousness in a peculiar form, unlike the well-established models from cognitive psychology. Third, ethnomethodology is introduced as a methodological approach that aligns with phenomenology and can therefore be used to develop an alternative approach within psychology as well. Fourth, the outline of a phenomenological psychology of experience is brought together with the data source of live streaming. This methodological concept offers a sufficiently complex framework for an advancement of research on dynamic decision-making. Hence, the general purpose is to contribute to an eventual breakthrough of the "impasse" (Ohlsson, 2012) in problem-solving and decision-making research.

Video games research and live streaming

In the formal terms of media linguistics (Schmitz, 2015), live streaming is a transient, current, oral form of communication based on dynamic images. In the prototypical case of video games as live streaming's subject matter, the focal components are video capture of the streamer by webcam, video capture of the content by camera or computer screen capturing, the streamer's audio track captured by microphone, the audio track of the content captured by microphone or direct computer audio capturing, and (optionally) the (most commonly) written interaction with the audience. However, live streaming is applied to record behavior in various domains of everyday life, such as tailoring objects, creating art (e.g., music, painting), outdoor sports, and political discussions. The only formal restrictions are recordability and the ethical limitations of the provider hosting the stream. Research on live streaming is scarce, and the few available sources mainly focus on the participatory possibilities of interaction with the audience (Hamari & Sjöblom, 2016, 2017; Hamilton et al., 2014; Wendt, 2017b). Moreover, live streaming is prevalently understood as a contribution to video gaming research. Yet, without contradicting this line of research, live streaming can also be framed in a wider sense as a data source for the general observation of behavior. The domain of video gaming thus becomes a mere example of the possibilities of the investigations to which live streaming can provide experiential data. In the words of Reeves, Greiffenhagen, and Laurier (2016): "we instead provide an example-driven explication of the perspective to examine video game play phenomena as sites of social order" (p. 6). Such a shift of perspective from the domain of video games to video games as an example of general behavior is analogous to the difference between continental European and North American research on problem-solving behavior (Frensch & Funke, 1995): while North American research focuses on problem solving in different domains, the European perspective highlights different types of problem-solving behavior.

¹ The notion of 'self-description' is used in a broad sense in this text. The terminology is relevant because of important epistemological concerns. One might think that the notion of 'description' entails a cognitive effort. Thus, the term 'self-description' would be inadequate to subsume the concept of 'think-aloud protocols' because they are designed as effortless verbalization (see Ericsson & Simon, 1980; Fox, Ericsson, & Best, 2011). However, in this text, 'description' should be understood as a most fundamental form of self-reference that is characteristic of self-consciousness. A different notation could be 'self-report'.
Equally, live streaming can either be an instantiation of video gaming as a domain of behavior, or a data source for observing certain types of behavior. These views are complementary, not exclusive, but they switch the priorities of investigation. While for video gaming research the subject matter of observing streamers play video games is indispensable, the different types of situations within the video gaming context are optional. For problem-solving research, on the other hand, it is facultative whether live streaming contains video games, but the structures of experience are at the center of attention. This alternation of scientific attention can be seen as the main reason for the different ranges of research on video gaming. While domain-specific investigations aim to explore the peculiarities of gaming, the more general approaches use gaming as an exemplary occasion to gain insight into universal properties of behavior. Since current media psychology is mainly influenced by research on domains of behavior, it is a comprehensible result that media psychology, when reflecting on live streaming, currently neither reaches out actively for a revision of psychology's epistemological foundations nor searches for an innovation in the observation of experience and behavior. Nevertheless, these potentials are given and should be acknowledged as well as utilized. To redeem them, two introductory steps are required: [1] the methodology of video gaming research will be sketched out as the exemplary frame of reference within media psychology, and [2] the epistemological context has to be specified. For the first step, the frame of video gaming research bears explicative use within the above-mentioned limits of an "example-driven explication", since there is a sufficiently validated and applied structural congruence between video gaming and several paradigms of psychological research (e.g.,
Gordon, 2015; Järvelä, Ekman, Kivikangas, & Ravaja, 2014), such as problem-solving paradigms (e.g., Güss, Tuason, & Orduña, 2015; Rach & Kirsch, 2016). Equivalently, cognitive tasks are used as a comparison to improve the understanding of video games (see Järvelä et al., 2014, p. 93).

The methodology of video gaming research

Methodological reflection on video gaming research approaches its subject matter from two points of view: [a] content analysis and [b.1] experiential or [b.2] behavioral observation. Content analysis aims to cover the uniqueness of video games through their material and structural properties, analyzing the mechanisms and functions that formally constitute the video game. Content analysis thereby renders video games as empirical paradigms which can be used in scientific research insofar as their stimuli can be controlled, which requires a thorough understanding of their architecture. Experiential and behavioral observation, on the other hand, turn towards the subject experiencing the video game, trying to grasp the typical situation of exposure to this kind of experiential object. Schmierbach (2009) offers a review of content analysis of video games, dealing with its accomplishments and challenges. He mentions six steps of content analysis: "unitizing, sampling, recording/coding, reducing data, drawing inferences, and narrating the result" (p. 148). "Unitizing" is the task of selecting distinguishing units of description. Such units can be "physical, syntactical, categorical, propositional, and thematic" (p. 152) and offer the possibility of isolating single events or actions to make them accessible for scientific interpretation. Although the most elementary structure of computer applications has a natural unit, the bit, bits can hardly be used as material in the behavioral sciences.
As a result, content analysis needs to conceptualize valid units of description that serve the respective investigation without deforming the video game's own structure. Järvelä et al. (2014) use the term "event-based analysis" in a similar fashion, pointing out some critical considerations: "Event-based designs, however, introduce some additional considerations for the researcher. The choice of event coding is based not only on the game's available actions, but also on how isolated these actions occur during gameplay. Often there are over-lapping events that are hard to differentiate from each other" (p. 96). While "unitizing", "sampling", "recording/coding", and "reducing data" are rather formal and fundamental procedures of content analysis, "drawing inferences" and "narrating the results" require more interpretation, such as the assessment of "types of players" (p. 158); e.g., in the investigation by Klug and Schell (2006), competitors, explorers, achievers, jokers, and performers are labeled as separate types of players. These categories result from combinations of the above-mentioned "distinguishing units" in the experimental participants' behavior. In the words of Newell and Simon (1972), content analysis provides the elements of the "problem space" of a video game as an empirical paradigm. Consequently, the empirical behavior can be seen as a combination of this problem space's possible states. Similarly, several authors have proposed game taxonomies based on content analysis of gameplay that often relate to the general theory of games: Järvelä et al. (2014, p. 94) mention "competition, chance, simulation, and vertigo"; "narrative, ludology, and simulation"; and "the level of chance vs. skill, fiction vs. non-fiction, and physical vs. virtual". Without a doubt, these steps of content analysis necessarily precede any investigation based on live streaming data (insofar as it, for example, contains video games).
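The first four of Schmierbach's steps (unitizing, sampling, recording/coding, reducing data) can be illustrated with a toy sketch of coding a stream's event log. The event schema, labels, and unit categories below are invented for illustration and do not come from any cited coding scheme:

```python
from collections import Counter
from typing import NamedTuple

class StreamEvent(NamedTuple):
    """One logged occurrence in a recorded stream (hypothetical schema)."""
    t: float     # seconds from stream start
    actor: str   # "streamer" or "audience"
    kind: str    # raw event label from the capture pipeline

# Sampling: a toy slice of a capture log; labels are invented.
log = [
    StreamEvent(1.2, "streamer", "move"),
    StreamEvent(2.8, "streamer", "attack"),
    StreamEvent(3.1, "audience", "chat_message"),
    StreamEvent(4.0, "streamer", "attack"),
    StreamEvent(9.7, "streamer", "menu_open"),
]

# Unitizing: map raw labels onto categorical units of description.
UNITS = {"move": "navigation", "attack": "combat",
         "menu_open": "management", "chat_message": "interaction"}

# Coding + reducing: tally unit frequencies per actor.
coded = Counter((event.actor, UNITS[event.kind]) for event in log)
print(coded[("streamer", "combat")])  # → 2
```

The resulting frequency table is the kind of reduced data from which "drawing inferences" (e.g., assigning player types) would then proceed.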
Yet, as Oswald, Prorock, and Murphy (2014) comment, this "common mainstream psychological research strategy of evaluating game content in order to understand games and to determine psychological effects on players has inherent limitations" (p. 120). As a complementary contribution, experiential or behavioral observation is required. The direction towards this matter is already tacitly acknowledged by Schmierbach (2009, p. 150) when he notes that "[t]wo particular challenges – interactivity and multiplayer options – warrant further discussion" (see also Järvelä et al., 2014). Without this turn towards behavior and experience, video games become just another experimental paradigm added to the list of problem-solving tasks, entirely missing out on the valid potentials of video gaming as well as live streaming research. It is especially interactivity that qualifies playing video games as complex problem-solving and dynamic decision-making (Rach & Kirsch, 2016). Within the second methodological point of view in video gaming research, behavioral observation differs from experiential observation. From the standpoint of cognitive psychology, only the former can be affirmed without fundamental concerns. An elaborated attempt at such behavioral observation can be found in Cowley and colleagues (2014), who propose the PPAX framework, which connects so-called "experience patterns" (measured as psychophysiological data and therefore rather "behavioral" than genuinely "experiential" in the present meaning) with "patterns of events" (in the sense of the "distinguishing units" of content analysis). In the authors' judgement, "the experience of gaming is not only a series of individual emotional reactions, but also of patterns of cognitions and emotions, all of which are reflected in the player's real-time physiological reactions" (p. 42).
Consequently, they aim to integrate psychophysiological observation of the person playing the video game with content analysis of the video game's output, creating a genuinely cognitive-psychological contribution to video gaming research. In behavioral observation of this kind, the independent variables of content analysis are used to predict behavioral dependent variables, such as actions in gameplay or psychophysiological – especially neuropsychological – activity.

The problem of subjectivity

Despite proposing a handy and fruitful approach to the observation and investigation of video gaming behavior, the authors subscribe to a neuro-reductionist account of experience, identifying it with psychophysiological activation. The main concern about this endeavor has to be the subjectivity of experience. Cowley et al. attempt to account for it and try to solve the matter by adopting computationalism: "the subjectivity of the experience of play means that measurements cannot be easily validated or verified. [. . . ] thus, player learning needs to be accounted for by some refinement of machine learning" (p. 55). This solution to the problem of subjectivity has been widely accepted among cognitive psychologists, since the paradigm of cognitive psychology subscribes to a mechanistic view that understands cognition as information processing (Hutto, 2008). PPAX is a data-driven approach with high aspirations. Yet its results do not point towards epistemology and methodology but remain within the predominant domain-specific investigations. Its main contribution is practical, e.g., enhancing game designs. From a less orthodox point of view, such as phenomenology, this solution cannot authentically suffice to account for subjectivity.
Without this critical step, video game research based on mere behavioral observation might still deliver a productive contribution to cognitive psychology, investigating "a multitude of concepts central to psychology, from memory encoding, to social skills and decision making" (Järvelä et al., 2014, p. 85), but it will not redeem its critical epistemological and methodological potential. Beyond cognitive psychology, the observation not only of behavior but also of experience is necessary to comprehend video gaming and its players. Several investigations of video games have tried to approach this rather opaque and vague issue, which cannot be addressed by behavioral observation alone. For example, Ward and Sonneborn (2009) described the great variety of creative expression in virtual worlds where "people represent themselves and interact with others" (p. 213). These aspects transcend the linearity that certain video games appear to contain by design. Creativity in video games is an ideal indicator of what can be observed beyond the structural analogy to paradigms of experimental psychology. More directly concerned with the possibility of experiential observation, Oswald et al. (2014) investigate the "perceived meaning of the video game experience". The authors attempt to inquire into the experiential and social aspects of gameplay by using open-ended questions. Explicitly criticizing the limitations of content analysis, they conclude "that asking players about their perceptions of their game experience, rather than asking them to rate the content of the game played, can be a valid research tool" (p. 121). Another approach to the experiential dimension of video gameplay is Tanenbaum's "hermeneutic inquiry for digital games research" (2015). In order to get hold of subjective experience, he frames "digital games as texts" (p. 59) in the tradition of hermeneutics (especially Gadamer) and close reading.
This allows him to account for subjectivity through the meaning of the text as its manifestation: "in a digital text, the reading must be able to account for the indeterminate nature of the experience" (p. 69). Moreover, his approach can deal with the above-mentioned problem of interactivity by framing it as "changing the ordering of a reading, as is the case in hypertext fiction" (ibid.). Although these contributions approach the pertinent problem of subjectivity as a central epistemological topic, they cannot dismiss the methodological critique carried out by cognitive psychology over the course of the last decades. Oswald et al. (2014) and Tanenbaum (2015) do not sufficiently consider the limitations of self-descriptive methods, such as introspection and think-aloud protocols. Throughout the history of psychology, these methods have occasioned major controversy about their applicability – and thereby about the very possibility of psychological research on crucial topics such as cognition, consciousness, and the self. Video gaming research cannot neglect the difficulty of obtaining valid access to subjectivity. However – and this is the claim of this article – research on new media, such as live streaming, can provide an incremental contribution to this eminent and long-standing controversy. Before outlining the way in which live streaming can augment video gaming research so as to contribute to the epistemology of psychology, a word on this controversy is in order.

The controversy about the validity of self-description

The second introductory step invokes the history of psychological research on self-description. The historical preface of the epistemological problem at hand reaches back to the origin of experimental psychology in the early twentieth century.
A crucial theoretical antinomy between the psychological laboratories of Leipzig and Würzburg lay in Wundt's rejection of the observability of higher cognition, on the one side, and the introspectionism of Külpe and his colleagues, which advocated the observability of higher cognition (see Galliker, 2016; Wendt, 2020), on the other. While Wundt proposed a psychological methodology that traces experience back to elementary behavior which could be demonstrated in laboratory experiments, the psychology of thought elaborated complex instructions for experimental subjects to accurately describe their self-perception, such as the "systematic experimental self-observation" (Ach, 1905).

On introspectionism

These methods of observing higher cognition proposed by the Würzburg school are the prototypical case of introspection. Its basic structure consists of exposing the subjects to an experimental scenario, usually a problem-solving task, followed by a protocol of their experience in dialogue with the scientist. The main challenges in improving introspection faced by the Würzburg psychologists were of a technical kind. A major concern was the relation between thought and language, since the protocols of introspection had to be elaborated in verbal exchange while cognition was assumed to be of a different nature. By experimental manipulation, the experiential circumstances were to be varied in order to control the congruence between linguistic description and experience (see Fahrenberg, 2015). The Würzburg psychologists proposed the education of participants as an important step towards improving the reliability of introspection. Experts in self-observation were thought to be able to pay closer attention to the crucial changes in cognition and to apply a vocabulary that precisely represents the processes involved.
Wundt's comment on the introspective attempts in Würzburg was thoroughly critical. In his opinion, the procedures lacked experimental control and could not be repeated, and were therefore unreliable. He demanded strictness in experimental designs and the grounding of empirical work in elementary processes that could be manipulated and observed unequivocally. The initial direction of experimental psychology towards laboratory settings with meticulous control of stimuli, as elaborated by Wundt himself, was clearly opposed to the introspective approach. The further development of behaviorism and cognitive psychology mainly favored Wundt's approach, but also brought forward valid critiques of introspectionism – most prominently by Nisbett and Wilson (1977) – which appeared decisive, especially from the standpoint of the prevailing paradigms, ultimately leading to "the disappearance of introspection" (Lyons, 1986). The main doubt about methods of investigating higher cognition is the fallibility of self-description: it is not evident insight into one's own mind that guides the observations of subjects, which subsequently corrupts all self-descriptive data. Nevertheless, the validity of introspection has remained a topic even after the method was renounced by the mainstream scientific community. Recent publications reexamine the possibilities of using introspection in contemporary psychology, such as Jäkel and Schreiber (2013). Still, introspection is only discussed as a methodological reference, not as an equal alternative to the established forms of psychological investigation. Think-aloud protocols, however, the second major self-descriptive method, have not been renounced quite as strictly as introspection within cognitive psychology.
Their usage was popularized in the course of the growth of problem-solving research following the methodical innovations by Simon and Newell in the 1960s (for a discussion of the methodological relation between problem-solving research and the psychology of thought, see Wendt, 2019). The first major difference between the two self-descriptive methods is that, in think-aloud protocols, there is no immediate conceptual influence on the data. For the classical experiments of the Würzburg psychology of thought, this means that the thought protocols of the educated participants were imbued with interpretations based on their own theories of mind. Second, think-aloud protocols are usually created during the experiential episodes of the subject and not afterwards. Consequently, think-aloud protocols can be understood as mere protocols of (verbal) behavior over the course of a psychological experiment. All the same, the misleading name of think-aloud protocols signals an inherent ambivalence. The apparent claim of think-aloud protocols is to offer some access to thinking. Nevertheless, contemporary psychology mainly utilizes the method to observe behavior, not to investigate higher cognition. Likewise, the separation between behavioral and experiential observation introduced above delimits the subject matter of think-aloud protocols – strictly speaking, they are used as "speak-aloud protocols", a title that demystifies and clarifies the method's current application but at the same time reveals that the method's current interpretation has only a small conceptual scope. The rich body of recent work on think-aloud protocols (e.g., Barkaoui, 2011; Elling, Lentz, & de Jong, 2012; Koro-Ljungberg, Douglas, Therriault, Malcolm, & McNeill, 2013) does not coincide methodologically with the aspirations of the Würzburg school to observe higher cognition.
In other words, cognitive psychology has excluded the introspectionist approach to observing experience for methodological reasons, and the think-aloud research used in cognitive psychology is not concerned with experiential observation. In cognitive psychology, self-descriptive methods exclusively aim to observe behavior.

The phenomenological approach to self-description

Beyond the mainstream tradition of psychology, however, the endorsement of self-description by phenomenology has remained unharmed by the attempts at critique and dismissal. The primary reason for this dissociation between the different traditions of theory is that phenomenology never depended on the experimental sciences but continuously drew on philosophical discourse. Consequently, experimental methodology remained secondary to phenomenology. The empirical claims of phenomenology rather originate from its fundamental conviction that only experience is an original source of insight. Consequently, phenomenological thought bears the potential to inspire new directions in experimental psychology. This understanding structurally coincides, to a certain degree, with self-descriptive methods as they were proposed by the Würzburg school, offering a separate attempt to advocate self-description as experiential observation. To put it differently, since the critique of self-description does not apply to phenomenology, the discussion about the value of these methods can be renewed. It moves to a different epistemological level. Doubts about the historical dismissal of introspection arise. Notwithstanding, this expansion of the theoretical horizon of psychology in the direction of phenomenology does not imply a return to naïve self-descriptive methodology.
Phenomenology does not uncritically rehabilitate the empirical mistakes caused by irresolute or speculative experimental designs (for an attempt to reconcile phenomenology and the psychology of thought in experimental psychology, see Wendt, 2020). However, it allows for a review of the discourse about self-description that was silenced prematurely, discarding crucial empirical potentials within experimental psychology. The fundamental endeavor of criticizing the rejection of self-description as experiential observation can be found in Petitmengin and Bitbol (2009). The authors show that a rehabilitation of self-descriptive methods requires a third point of view which deviates from both the naïve rationalist attitude that claims the evidence of thought and the naïve sensualist attitude that claims the inaccessibility of thought. What has traditionally been called thinking is a complex phenomenon that needs to be revisited from a more elaborate epistemological perspective. Investigating possibilities of reutilizing self-descriptive methods requires neither rejecting the empirical critique carried out by cognitive psychology nor approving the naïve concept of infallible comprehension of one's own state of mind. However, cognitive psychology's insights are detached from their epistemological premises. To support self-description does not imply an uncritical affirmation but an integrated critical view that considers a usage of self-description that has not been explored yet. This renewal of the controversy about self-description is not exclusive to phenomenology. There are several other epistemological traditions that propose a comparable third way of approaching thought. Cultural-historical psychology, established by Vygotsky (1934/1986), understands thought as the internalization of communication via egocentric speech. From this point of view, thought becomes accessible when approached through intersubjectivity.
Equally, symbolic interactionism assumes the priority of interaction: "if we had not talked with others and they with us, we should never talk to and with ourselves" (Dewey, 1958, p. 170). Another similar and more recent approach is dialogism as proposed by Linell (2009). He understands cognition as embedded, enacted, extended, and ecological, thereby claiming that it is accessible to scientific observation. In active externalism, a cognate contemporary concept by Clark and Chalmers (1998) is labeled the "extended mind" (in the broader context of the philosophy of embodiment, see Fuchs, 2017). These alternative traditions share the epistemological common ground of opposing cognitive psychology as monologism. Following Steffensen (2015, p. 110), "monologism is not a theory, but rather a handy term for a conglomerate of long-held views in the communicative and cognitive sciences", such as the information-processing model of cognition, the transfer model of communication, and the code model of language. For example, monologism conceptualizes language as internal, instrumental, and individual, and therefore inaccessible to observation. The very basic epistemological assumption of monologism is the primacy of the first-person perspective. This view can be supported, for example, by the ontological assumption of a monadic self as the core of personal existence, as it has been assumed in rationalism, or by the concept of private agency of all cognitive processes in empiricism. Consequently, monologism cannot consider experiential observation valid or possible. By opposing monologism, the other epistemological approaches implicitly scrutinize this methodological restriction. As a result of the critique, the rejection of self-descriptive methods is exposed as a consequence of its epistemological premises. Nevertheless, phenomenological thinking offers a decisive further asset for theoretical psychology that is not contained in the other traditions.
The extensive body of reflection on psychological problems in the history of phenomenology anticipates many possible epistemological objections. Phenomenology does not commence its thought with the epistemological foundations of psychology but reaches further, to the philosophical origins. The work of the original phenomenologists, such as Edmund Husserl, Martin Heidegger, Maurice Merleau-Ponty, or Jean-Paul Sartre (see, e.g., Zahavi, 2005), deals with the question at hand as part of more fundamental reflection. Already one of the very first phenomenological contributions, the prolegomena to Husserl's "Logical Investigations" (1901), was dedicated to the critique of psychologism – the reduction of logic to psychology. In other words, phenomenology does not only oppose cognitive psychology's take on self-description but also delivers extensive arguments that address the roots of the opposition to monologism. However, this cannot be the place to replicate the entirety of phenomenological thought, only to offer exemplary insight. Overall, there is no use in carrying these debates deeply into psychology; their place will remain philosophical reflection. But these remarks are still important because they demonstrate that well-founded alternative schools of thought exist. The practically more exigent question, however, is how these alternative views can augment empirical psychology regardless of its contemporary affiliation to cognitive psychology. Throughout the twentieth century, phenomenology tried unsuccessfully to exert a vital influence on empirical psychology. Its traces are the so-called "phenomenological psychology", a side issue to the main developments in the discipline (see the exhaustive historical analysis by Spiegelberg, 1972). But media psychology opens a new window of opportunity for further inspiration by phenomenology.
Psychological perspectives from ethnomethodology

The two introductory steps set a context for possible advances in psychological research. Experiential observation and self-description are two original issues within psychological methodology. From the methodological standpoint, these possible advances are of a qualitative character. They do not immediately favor greater mathematical yield from data but an enrichment of its interpretation. Due to the current focus on domains of behavior, however, psychological research commonly prioritizes the former and refuses the latter. Merely to point out omitted qualitative potentials would not constitute an actual contribution; an epistemological justification of the possible benefit of revisiting these two issues is required. Based on elaborate philosophical reasoning, phenomenology proposes the (re)integration of both approaches into psychology. Yet even if the arguments of phenomenology on these issues were generally accepted by psychologists, this would not evoke any factual change in empirical psychology – a constraint that is widely underestimated in philosophical discourse. Epistemological justification alone is not sufficient to motivate a development in scientific systems such as psychology (as pointed out in the debate about critical rationalism between Lakatos and Popper; see Lakatos, 1978; Macho, 2016). Additionally, it requires a methodological program that employs the epistemologically demonstrated advantages. This program cannot consist solely of the ad hoc propositions frequently and vaguely invoked in purely theoretical debate. Empirical progress must be founded on a methodological system that can unite the epistemological innovation with the state of the art. Returning to standards of research which have already been overcome, e.g., the systematic experimental self-observation, does not serve the purpose of benefiting psychology, either. A reactionary step of that kind would not correspond to the possible improvement sketched out above on the basis of phenomenology. The past applications of self-description can only serve as a critical counter-image to the novel efforts. A similar situation – implementing phenomenological thought in an empirical science – has already occurred in sociology. After Schutz's comprehensive effort to apply phenomenological thought to the discipline's epistemology, it fell to Garfinkel's "ethnomethodology" in North America and Berger and Luckmann's "sociology of knowledge" in Europe to establish empirical programs (see Eberle, 2012). By now, both of these programs have spawned fruitful traditions of empirical research. Still, as Graumann (1991) pointed out, a similar implementation of phenomenology has not yet succeeded in psychology. Yet psychology can learn from the ethnomethodological tradition and adapt its blueprint for the empirical implementation of phenomenological thought. These circumstances are decisive for the application of media psychology in decision-making research. Here, phenomenological thinking can redeem the lost potentials of experiential observation which reside in self-description after it was abandoned in twentieth-century psychology due to the lack of adequate data sources. Live streaming serves as a prime example of this potential, as it is a digital data source to be explored by media psychology. The missing link is an empirical program directed towards this possibility. Ethnomethodological practice supplies phenomenology with the skeletal frame.

The relation between ethnomethodology and the question of experience

There is no great value in a lengthy introduction to ethnomethodological concepts. They are already available in the literature and mostly oriented towards sociological application (see, e.g., Button, 1991; Lynch & Peyrot, 1992).
The use of ethnomethodology for psychology resides in the theory's contribution to the adjustment of psychological methodology and the phenomenological review of psychological epistemology. As a side note, to bypass misunderstandings, it should be added that "ethnomethodology is not a new methodology, but rather a theoretical perspective" (Churchill, 1971, p. 185). Still, in this context the most useful part of this theoretical perspective is its methodological contribution to an application of phenomenological thought in the empirical sciences. The benchmark, therefore, is self-description, and the aim is to renovate it – not in the way in which such methods have been used so far in psychology, but through transformation by phenomenological reflection. The prospect of this approach depends entirely on media psychology: as shown above, the failure of self-descriptive methods is not a result of their epistemological unsuitability but of the deficiency of observational opportunities. These opportunities are now given through technological progress, such as the emergence of live streaming. Thus, on the one hand, the phenomenological understanding of self-description differs from introspectionism (such as that of the Würzburg school); on the other hand, it differs from cognitive psychology and its Wundtian predecessors. The vital difference from both other standpoints is the object of investigation. Phenomenology does not seek to inquire into higher cognition as conceptualized by experimental psychology, but into the essential constitution of its even more fundamental experiential compounds, such as pre-reflective consciousness. This is an implicit critique of introspectionism, which postulates its subject matter as an object of experience although it is an abstract concept that cannot be confirmed by phenomenological analysis.
At the same time, phenomenology offers a critique of cognitive psychology's sensualism, which claims the validity of physical measurement of behavior while neglecting that the original source of all data is experience. On the matter of self-description, the decisive leap into phenomenology is taken by considering Heidegger's concept of "Dasein" (1927/2006) – in Sartre's (1944/1993) words, human reality: "réalité humaine" – or Schutz's concept of the "lifeworld" (Schutz & Luckmann, 1973). In short, the existential situation of the experiencing subject does not occur in lucid confrontation with the material immediacy of what merely exists, but is mediated by an approach to the world that perceives things and objects, not essences. This "natural attitude" (Schutz & Luckmann, 1973) of the lifeworld of everyday life is a way of experiencing that conceals the very existential situation in which every subject is originally involved. However, on certain occasions the immediacy of being appears through the veil of everyday life, for example in experiences of existential fear. If so, the rules that constitute the natural attitude of everyday life appear as the structure of being-in-the-world (Dasein). With reference to the question of self-description, it is a naïve fallacy to assume congruence between the object language used by subjects to describe their experiences and the language of psychological description. The experimental situation cannot be encountered by the subject with the same attitude as it is encountered by the scientist. Self-descriptions may not be interpreted as genuine expressions of what is really happening, but as expressions of a spontaneous creation of an order of things in being-in-the-world (Dasein). Reading think-aloud protocols and introspection as objective communication inevitably leads to a misunderstanding of the actual experience.
These protocols and dialogues with the experiencing subject document the spontaneous emergence of everyday life as an order that transcends the experimental design. Ethnomethodology serves to uncover this exact process of creating the rules that structure everyday life and thus analyzes the order that is fundamental to the experimental situation. Rules and norms are considered "methods" from the subject's experiential point of view: "the most important assumption that drives ethnomethodological approaches is the methodic and orderly character of everyday activities that appear chaotic and messy at first glance" (Reeves, Greiffenhagen, & Laurier, 2016, p. 23). In self-description, psychology does not observe immediate processes of thought and action. Behavior and experience are always located within an order. In the "natural attitude", this order is generated pre-reflectively. This does not exclude linguistic processes. However, the verbal acts cannot be seen as neutral communication because they are a constitutive part of the creation of order: "ethnomethodologists analyze 'indexical communication' (as opposed to 'objective communication') and use the documentary method of interpretation" (Gallant & Kleinman, 1983, p. 5) – "in the documentary method of interpretation the individual uses conversational utterances as documents to create a fictive sense of social order" (ibid.). As a result, the meaning of the situation becomes salient. For psychology, this means that there is no sufficient reason to assume that different subjects who partake in the same experience are actually comparable. The comparison is not enabled by the design alone but must be traced back to the individual case. The observation of experience is not a contingent but a sufficient condition for holistic psychological research.
The most important methodological consequence of ethnomethodological thinking is to focus on how subjects intuit the situations they encounter instead of investigating what they are doing on the material level of the experimental design. This shift equals the above-mentioned shift from domains to types of behavior and experience. Observational setups in different contexts only instantiate occasions for the creation of order. The main effort should not be simply to obtain a greater variation of occasions but a greater variation of types, or "typifications" (Schutz & Luckmann, 1973). Psychologically speaking, focusing on the creation of order in experience allows one to understand the normative and cognitive construction of events. Ethnomethodologists concentrate on social interactions as their subject matter, yet ultimately "[i]n looking for an underlying structure, ethnomethodologists bracket interaction, effectively making actors and their audiences epiphenomenal" (Gallant & Kleinman, 1983, p. 12). This means that social interaction is primarily nothing but the location of the spontaneous emergence of order, and "nothing is 'brought in' from outside the interaction" (Dennis, 2011, p. 351). Every situation is the scenery for the accomplishment of reality. As Garfinkel, the founder of ethnomethodology, concludes, these circumstances imply "that actors make their actions observable, tellable, reportable or—in his famous wording—'accountable'" (Eberle, 2012, p. 288). In self-descriptions, psychology can reveal the traces of the creation of order in being-in-the-world, which follows the rules of the "natural attitude" or of other possible attitudes evoked by the situation.
The methodological utility of ethnomethodology for psychology

For the methodology of psychology concerned with self-description, this means that attention shifts from the – naïvely assumed to be possible – observation of objective experience in an experimentally given order to the observation of the creation of subjective order. This approach does not quest for higher cognition, as the Würzburg school did, since the situation and not the monologue is primary: "thinking is derived from communication" (Eberle, 2012, p. 284). It rather investigates "the grammar or logic which orders or systematizes the articulation of acts, and in particular, speech acts" (Gallant & Kleinman, 1983, p. 8). Think-aloud protocols and introspection can no longer be seen as direct access to immediate behavior. They bear witness to the formation of a social order which constitutes the meaning of the words in which they are articulated. Despite losing this naïve claim to an observation of immediate experience, self-descriptive methods only gain influence, because they are recognized as the site of the formation of order, which even precedes the experience itself. The investigation of idiosyncratic mental situations in subjective life must be complemented by an understanding of the structures that precede these situations. In a think-aloud protocol, every spoken word is evidence of the experiential order which (self-)constitutes it, but the single mind's cognitive process itself remains hidden. It can become the object of speculative models of cognitive psychology, but in the end all relevant processes occur in and as the order that constitutes them. Regardless of whether it is a speculative reality such as neurons, electric circuits, or eternal souls that operates, the content is the experiential order. This shift has a distinct impact on the methods' use.
Ethnomethodological self-description is not used as a data source for behavioral observation: word count, fluency, or vocabulary have no transparent meaning by themselves. Instead, ethnomethodology studies the variety of order in human experience, as experiential observation. Therefore, the use of self-description can no longer target the experimental design as loosely as it is currently applied in psychology. On the contrary, the value of observation becomes dependent on how well the design can serve the self-description. The course of methodology has to be inverted. For example, experiments that preemptively frame the order of the situation by their setup do not favor the use of self-description, no matter the content. The issue of compliance in psychological experiments transforms from a question about the quality of data into a constituent of experience and its observation. At this point, the pieces fall together as video game research and live streaming unfold their remarkable contribution to psychology under the conceptual influence of phenomenology and ethnomethodology. Video games are games. From an ethnomethodological perspective, Kew (1986, p. 305) writes that "games are subsumed under the paramount reality of the social world hence the so-called paradoxical nature of games". Game theory highlights that games offer a space for artificial rules within the higher-level order of everyday life. In ethnomethodological terms, they offer a primary example for observing the encounter with novel situations in which subjects create order. Still, as Dennis emphasizes, this gaming context remains merely contingent, as it is an exemplary domain: "ethnomethodologists are suspicious of the notion of context, as it rarely provides for the accurate description of particular settings of interaction. Instead it renders particular interactions 'instances of' broader sociological or lay categories" (Dennis, 2011, p. 354).
However, the aim of research that employs self-description should be redirected to the observation of the emergence of order in experience. Consequently, situations in which the interaction with novel rules is pertinent are ideal material for such investigations. These methodological circumstances qualify live streams as a promising data source because they are structurally equivalent to self-description and regularly contain video games as their subject matter. Reeves et al. (2016) provide a comprehensive review of ethnomethodological investigations of video gaming. Considering various prior publications, the authors state that the observation of video game play allows prime access to, e.g., sequentiality as a constituent of social order: "simple but exquisitely timed sequences [of] actions constitute individual players' own analyses; analysis that is not a purely cognitive phenomenon (hidden inside a player's head), but that is visible through the unfolding actions on the screen (e.g., whether to run left or right; stop or continue running). Consequently, through their on-screen actions, players can build up a shared understanding of the ongoing game activity. Put another way, the players display in, and through, their torqueing of torsos and changes in forms of talk that they are both oriented toward this point in the sequence as an opportunity for further actions to take place away from the focus of the screen" (Reeves et al., 2016, p. 14). Furthermore, they conclude that "[t]he point of note is, once again, to highlight the interdependence of actions 'in the game' that are available on-screen and player talk as an ongoing conversation that analyses play 'at the screen'" (p. 15). Here, the various previously distinguished aspects of video game research unite.
Content analysis and behavioral observation convey experiential observation as located by phenomenology in each present situation. The remaining methodological gap is to investigate these processes in the natural environment of digital experience, which is the crucial contribution of media psychology.

The natural environment of digital experience

As shown before, live streaming is a data source that resembles the data sources used by the traditional methods of self-description, think-aloud protocols and introspection. Moreover, live streaming is a source of naturally occurring data sets (NODs) and big data. NODs are data that occur as a product of social institutions of any kind and can be investigated by the empirical sciences, such as "patterns of website links, dictionaries, logs of group interactions, collections of images and image tags, text corpora, history of financial transactions, trends in Twitter tag usage and propagation, patents, consumer product sales, performance in high-stakes sporting events, dialect maps, and scientific citations" (Goldstone & Lupyan, 2016, p. 548). The advantages of NODs are manifold: they externally validate experiments, distinguish theoretical accounts of real-world outcomes, reveal patterns of information latent in environments, provide stimuli for experiments, and inform the construction of computational models of cognition (see Goldstone & Lupyan, 2016). This usability of NODs originates in their ecological validity, which is also the key feature enabling the ethnomethodological account of the field.

The subject matter of live streaming as big data

Live streaming is not designed to serve psychological investigations. Its properties coincidentally concur with the data requirements of self-descriptive methods (even surpassing the methodic standard of think-aloud research, which, until now, only optionally employs video recording).
From the standpoint of a theoretical reflection on experimental design, a decisive feature of data must be the standardization of their production. Live stream material meets this standard in a detail equivalent to the experimental practice of psychological laboratories. The main difference between live streaming and traditional data sources for self-descriptive methods is that the former, until now, cannot be subjected to experimental manipulation. Yet this is the case for all types of NODs, and it is not necessarily a disadvantage. It only reduces the variability of hypotheses that may be tested with the available data. The major advantage, however, is that NODs host "the possibilities of discovering principles of behavior without conducting experiments" (Goldstone & Lupyan, 2016, p. 548) – an enterprise which concurs with phenomenology and ethnomethodology. In other words, not satisfying the condition for experimental manipulation does not disqualify the data. It necessitates a profound and detailed understanding of the situation encountered by subjects in a naturalistic environment, an expertise promoted by ethnomethodology: "an ethnomethodologist must not only be acquainted with the field but be a competent practitioner of that type of work setting him- or herself" (Eberle, 2012, p. 294). This demand is akin to a recommendation proposed by media psychology: "having firsthand experience enables scholars to make informed decisions about the suitability of available games as stimulus material for research purposes" (Elson & Quandt, 2016, p. 54). The NODs of live streaming remain opaque without an advanced qualitative methodology that serves to prepare the psychological interpretation. But this challenge can be dealt with by employing non-reductionist concepts. Upon liberating the empirical program from the limited scope of exclusively experimental research, the actual opportunity inherent in investigating live streams can be seized.
Goldstone and Lupyan (2016, p. 552) label research based on NODs "cultural neo-ecological psychology", relating it to the tradition of ecological psychology. The focus is to explore rich naturalistic environments that offer access to the observation of genuine behavior, based on a theoretical take on what a situation is. Such a naturalistic environment is present in live streaming; it is the natural environment of digital experience. In a psychological laboratory filled with computers, on the other hand, an artificial atmosphere prevails, evoked by the institutional circumstances of the laboratory. In such cases it is ensured and required that the subject is aware of this context. A forcible effect on the subject's experiential order is dominant. Streamers, on the other hand, are immersed in the situation without any prior imprinting of social rules by an experimental setup. In no other situation with an equivalent degree of naturalism has it been possible to obtain data of the same consistency as that of self-descriptive methods. Technological advances have created a primordial field of experiential observation in a natural environment of digital experience. These circumstances allow (media) psychology to respond to the criticism that has been articulated by ecological accounts throughout its history, classically by thinkers such as Lewin (for a controversial take on the notion of the situation in the history of psychology, see Schott, 1991). The most important consecutive question is how research of this kind relates to established psychological methodology. A defensive approach to advertising a possible benefit of naturalistic observational sources, especially insofar as qualitative interpretation prevails, has been to use them in an exploratory or validating manner: either they serve to procure supplements for future laboratory research, or they scale up laboratory investigations to a larger scope.
A more confident approach to the usage of NODs is to emphasize the autonomy of their impact. Ultimately, it is pragmatically impossible to reproduce a truly naturalistic setting in an experimental laboratory. On the one hand, laboratories cannot substitute for natural environments; on the other hand, "NODs should supplement, not supplant, experiments" (Goldstone & Lupyan, 2016, p. 551). There is no inherent priority or hierarchy among them; the application of these methodological alternatives depends on the subject matter. Research traditions cannot short-circuit this phenomenological principle of subject adequacy. In other words, methods have to serve the subject matter and should not preempt it.

Difficulties for the new research program

To advocate the implementation of naturally occurring data sets, such as live streams, ultimately leads to an endorsement of ambitious psychological research. The potentials that can be made available by exploiting new media will only be redeemed by qualitatively elaborated research approaches and substantial hypotheses. A blind application of methods to new data sets remains committed to its unconsidered methodological scope. The qualitative face of big data remains the face of analog psychology if media psychology does not emancipate itself from its predecessor, which could not anticipate the recent technological developments. One risk of a merely fragmentary advance in media psychology is to reduce big data to its quantitative side, which can be labeled a "bigger is better" ideology. In the words of Goldstone and Lupyan (2016, p. 563): "beginning with what this topic is not about is arguing for a 'bigger is better' ideology.
In particular, we have eschewed framing this topic in terms of 'implications of big data for cognitive science' despite the current zeitgeist surrounding 'big data.' Bigger is not necessarily better when it comes to data (e.g., Roberts & Winters, 2013, for discussion). Many computer scientists interested in big data are interested in developing technologies that allow users to process tera-, peta-, and exabytes of data. However, some of the data sets that have been most psychologically revealing, like John Anderson's emails and taxi drivers' logs, are mere megabytes or less". It is important to see that the current approaches to utilizing media psychology's potentials do not yet exhaust them. Schmierbach (2009, p. 160) recommends the "careful training of players". Such an approach might be creative and bold, but it needs to be compared to the efforts of the Würzburg school, which relied on experts in introspection. The difference between amateurs and experts of introspection is merely gradual. In order to reintegrate self-descriptive methods as experiential description, however, psychology requires a leap, not a step. Tanenbaum's (2015) hermeneutic attempt also does not sufficiently address the challenges to self-descriptive methodology. Psychology has accumulated valuable critiques of introspection and think-aloud protocols throughout a century of research. This critique cannot be the enemy of self-description; rather, it purifies empirical methodology of deficient application. The author employs a naïve introspectionism which has been overcome methodologically. Using self-descriptive data requires an elaborated phenomenological foundation that directs the method towards the implicit constituents of experience in being-in-the-world and the "structures of the lifeworld" (Schutz & Luckmann, 1973). The theoretical reflection on self-descriptive methods based on new data sources has thus led to the proposal of a neo-ecological account of media psychology inspired by ethnomethodology.
These abstract contemplations must be complemented by a practical perspective that displays the immediate empirical utility of such thinking. The features of such neo-ecological and phenomenological self-description research demonstrate two advantages. First, they grant access to the investigation of phenomena that have been excluded from psychology by the paradigm of cognitive psychology. This is the realm of experiential observation. Second, they cut the Gordian knot of naturalistic research designs by encountering the requisite standardization of laboratory research in the field. However, as a matter of practicability, it is not yet clear how such research can be performed. Some exemplary problem solutions can give an idea.

The challenges of live streaming research

There is little use in an unprepared examination of live streaming material. It can give an introductory impression of its structure and introduce the scientist to the medium. But this cannot and should not be the launch of a critical investigation, because it would be nothing but the notorious exploratory survey; even more fatally, it would mean conceding to the naïve perspective of the everyday lifeworld and being-in-the-world. Ethnomethodologically revised self-description seeks to encounter the breaking points of experiential order. These breaking points are rarely encountered in laboratory investigations because the experimental social order does not depend on the individual's experiences; it is too stable. From an ethnomethodological point of view, "members of the society are continuously engaged, without hope of relief, in creating and maintaining the social and natural world" (Churchill, 1971, p. 185).
In the setting of a laboratory, this is a rather simple task because the situation is open and spontaneous for the subject, who just engages passively in whatever experience he might encounter (for a phenomenological reflection on the laboratory as the place of situated experience, see Wendt, 2018). In the setting of live streaming, on the other hand, the streamer is continuously exposed to the obligation to maintain the situational order by herself; e.g., in the case of a video game, "[o]ne has only to consider, for example, the reactions towards undue violence in games to be reminded that games [...] are fragile and permeable enterprises, defined as they are by constraints on conduct and also by action opportunities that do not apply in everyday life" (Kew, 1986, p. 308). An outburst of fury rarely happens in an experimental laboratory – and if it ever does, the experiment will most likely be interrupted – but it is a common event in live streams. Experimental settings are not existentially fragile but integrated and stable scenarios. In its starting point, research on live streaming ultimately transcends the structures of the lifeworld by empirically determining the breakages of the given order. This approach avoids the psychological tendency to affirm the given social order. Wundtian methodology assumes that the elementary processes underlying macroscopic phenomena are essentially neutral and independent of the situation. Yet, from a phenomenological point of view, this is a fatal assumption, because these elementary processes are just as influenced by the experiential rules of a laboratory situation as the higher-order phenomena they aggregate into. The moment of a collapse, emergency, or confusion of the experiential order, however, is not the target but the point of access of research on self-description. Such moments are themselves phenomena that are objects of experiential observation.
They are the empirical occasion for a phenomenological bracketing of the circumstances, which isolates essential constituents of experience. This experiential observation must be led by a concurrent construction that frames the horizon of such constituents (see Holzhey, 1991, p. 9). In other words, it requires an educated expectation of what to encounter at the fracture of the lifeworld. In the case of a problem-solving situation, for example, the notion of the problem as an ordered situation appears when the subject suffers a loss of experiential stability. Goldstone and Lupyan (2016, p. 549) enumerate several such possible subject matters: "principles of judgment, perception, categorization, decision-making, language use, inference, problem solving, and mental representations". Under these circumstances, the laboratory situation is not the gold standard of observation but a reference. Live streaming offers unique observational opportunities that cannot be replicated in the laboratory. Some streamers, such as the Canadian Octavian Morosan (see https://www.twitch.tv/nl_kripp), have uploaded material of more than 10,000 hours over the course of a decade. Such individual cases are not exceptional within the medium of live streaming. Laboratory investigations can only complement research that is based on big data of such extent. Ultimately, the value of ecological validity must be reframed. Without phenomenological reflection, the notion of ecological validity remains abstract, a mere property of data. Seen from the perspective of ecological psychology, however, naturalistic settings are a tempting liberation, a breakout into the "real-world data sets that affect and reveal human behavior" (Goldstone & Lupyan, 2016, p. 549). Media psychology offers the opportunity to embrace this liberation without forfeiting precision.
Indubitably, the concern about the "(un)trustworthiness of the data" (Greiffenhagen et al., 2015, p. 469) is not eliminated, but since the basic concern of standardization is dealt with in new media such as live streaming, its further resolution is a matter of methodological reflection. Equally, the "data collection process" (Tanenbaum, 2015, p. 75) must be considered an obstacle to establishing self-description. But this problem is not essentially different from data collection in the traditional methods of self-description. These problems can only be temporary, because continuous technological advances favor the program of such self-descriptive research. Elson and Quandt (2016) already propose modding as a possible solution to increase the complexity of available data sets: "the most important properties of this data are that it be accessible, easily perused, and easily cited" (p. 76). All in all, in terms of ecological psychology, real-world research makes it possible to consider the entire variety of possible situations, while the laboratory frames only one specific context of social order that predetermines experience and its observation. This is the qualitative face of big data: not only an amplification of available material but a leap into new subject matters of observation. However, these possibilities cannot be exploited adequately except under the guidance of critical methodological reflection. Phenomenology and ethnomethodology offer a complex and resilient foundation for reforming psychological self-description on the occasion of new media such as live streaming. These circumstances configure the revolutionary potentials of media psychology. To harvest them is a matter of intelligent and creative research design. A more practical question for experimental research is what a design for a concrete investigation could look like. A promising example can be given thanks to the recent rise in popularity of the game of chess in the live-streaming community.
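To make the idea of locating "breaking points of experiential order" in stream material slightly more concrete, the following is a deliberately minimal sketch, not a method proposed in this paper: it approximates candidate disruptions in a (wholly invented) chess-stream transcript by simple lexical markers of hesitation or surprise. The marker list, the sample utterances, and the function name are all hypothetical illustrations; a serious analysis would of course rest on qualitative interpretation rather than keyword matching.

```python
# Hypothetical sketch: flag utterances in a stream transcript that may mark
# a disruption of experiential order. Markers and transcript are invented
# for illustration only.

MARKERS = {"wait", "hm", "what", "oh no", "why"}

def find_breaking_points(utterances):
    """Return indices of utterances containing a candidate disruption marker."""
    hits = []
    for i, text in enumerate(utterances):
        lowered = text.lower()
        if any(marker in lowered for marker in MARKERS):
            hits.append(i)
    return hits

transcript = [
    "okay, knight to f3, standard development",
    "wait, why is the bishop there?",
    "hm, if I take, the queen is hanging",
    "fine, castle short and regroup",
]

print(find_breaking_points(transcript))  # → [1, 2]
```

Such flagged segments would only be the entry point for the qualitative, phenomenologically informed interpretation described above; the automation merely narrows thousands of hours of material down to humanly inspectable episodes.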
Just as problem-solving and decision-making research received a valuable impulse from the analysis of the game of chess in grandmasters and amateurs as early as the seminal work of Adriaan de Groot (1946), big data analysis of problem-solving experience in live streaming might benefit from starting with a rather simple problem that nonetheless allows the research to exploit the valuable properties of live streaming as big data. The most important task, however, is to develop a means of preserving the qualitative depth of the data without losing the width of big data. A promising solution could be theory of action on the side of the theoretical foundation and video processing on the side of data analysis. Ultimately, this consecutive step is the task of projects to come.

Acknowledgements

I would like to thank Andreas Fischer, as the responsible editor, for reliable supervision and helpful consultation, as well as the two anonymous reviewers for their valuable suggestions.

Conflict of interest statement

There are no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements) or non-financial interest (such as personal or professional relationships, affiliations, knowledge, or beliefs) in the subject matter or materials discussed in this manuscript.

Declaration of conflicting interests: The author declares he has no conflict of interest.

Author contributions: The author is solely responsible for the content of this paper.

Handling editor: Andreas Fischer

Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Citation: Wendt, A. N. (2020).
The qualitative face of big data: Live streaming and ecologically valid observation of decision-making. Journal of Dynamic Decision Making, 6, 3. doi: 10.11588/jddm.2020.1.69769

Published: 31 December 2020

References

Ach, N. (1905). Über die Willenstätigkeit und das Denken. Vandenhoeck & Ruprecht.
Barkaoui, K. (2011). Think-aloud protocols in research on essay rating: An empirical study of their veridicality and reactivity. Language Testing, 28(1), 51-75. https://doi.org/10.1177
Button, G. (Ed.). (1991). Ethnomethodology and the human sciences. Cambridge: Cambridge University Press.
Churchill, L. (1971). Ethnomethodology and measurement. Social Forces, 50(2), 182-191. https://doi.org/10.1093/sf/50.2.182
Clark, A., & Chalmers, D. J. (1998). The extended mind. Analysis, 58(1), 7-19.
Cowley, B., Kosunen, I., Lankoski, P., Kivikangas, J. M., Järvelä, S., Ekman, I., ... & Ravaja, N. (2014). Experience assessment and design in the analysis of gameplay. Simulation & Gaming, 45(1), 41-69. https://doi.org/10.1177
Dennis, A. (2011). Symbolic interactionism and ethnomethodology. Symbolic Interaction, 34(3), 349-356. https://doi.org/10.1525/si.2011.34.3.349
Dewey, J. (1958). Experience and nature. New York: Dover.
Eberle, T. S. (2012). Phenomenological life-world analysis and ethnomethodology's program. Human Studies, 35(2), 279-304. https://doi.org/10.1007/s10746-012-9219-z
Elling, S., Lentz, L., & de Jong, M. (2012). Combining concurrent think-aloud protocols and eye-tracking observations: An analysis of verbalizations and silences. Professional Communication, 55(3), 206-220. https://doi.org/10.1109/tpc.2012.2206190
Elson, M., & Quandt, T. (2016). Digital games in laboratory experiments: Controlling a complex stimulus through modding.
Psychology of Popular Media Culture, 5(1), 52-65. https://doi.org/10.1037/ppm0000033
Ericsson, K. A., & Simon, H. A. (1980). Verbal reports as data. Psychological Review, 87(3), 215-251.
Fahrenberg, J. (2015). Theoretische Psychologie – eine Systematik der Kontroversen. Lengerich: Pabst Science Publishers.
Farnadi, G., Sitaraman, G., Sushmita, S., Celli, F., Kosinski, M., Stillwell, D., ... & De Cock, M. (2016). Computational personality recognition in social media. User Modeling and User-Adapted Interaction, 26(2-3), 109-142. https://doi.org/10.1007/s11257-016-9171-0
Fox, M. C., Ericsson, K. A., & Best, R. (2011). Do procedures for verbal reporting of thinking have to be reactive? A meta-analysis and recommendations for best reporting methods. Psychological Bulletin, 137(2), 316-344. https://doi.org/10.1037/a0021663
Frensch, P. A., & Funke, J. (Eds.) (1995). Complex problem solving: The European perspective. Hillsdale: Psychology Press.
Fuchs, T. (2017). Ecology of the brain. Oxford: Oxford University Press.
Gallant, M. J., & Kleinman, S. (1983). Symbolic interactionism vs. ethnomethodology. Symbolic Interaction, 6(1), 1-18. https://doi.org/10.1525/si.1983.6.1.1
Galliker, M. (2016). Ist die Psychologie eine Wissenschaft? Ihre Krisen und Kontroversen von den Anfängen bis zur Gegenwart. Wiesbaden: Springer.
Goldstone, R. L., & Lupyan, G. (2016). Discovering psychological principles by mining naturally occurring data sets. Topics in Cognitive Science, 8(3), 548-568. https://doi.org/10.1111/tops.12212
Gordon, R. (2015). Alternate reality games for behavioral and social science research. Pittsburgh, PA: ETC Press.
De Groot, A. D. (1946). Het denken van den schaker. Amsterdam: Noord-Hollandsche Uitgevers Maatschappij.
Güss, C. D., Tuason, M. T., & Orduña, L. V. (2015). Strategies, tactics, and errors in dynamic decision making in an Asian sample. Journal of Dynamic Decision Making, 1(1). https://doi.org/10.11588/jddm.2015.1.13131
Hamari, J., & Sjöblom, M. (2017).
What is eSports and why do people watch it? Internet Research, 27(2). https://doi.org/10.1108/intr-04-2016-0085
Hamilton, W. A., Garretson, O., & Kerne, A. (2014). Streaming on Twitch: Fostering participatory communities of play within live mixed media. In: Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems (pp. 1315-1324). ACM. https://doi.org/10.1145/2556288.2557048
Hasan, Y., Bègue, L., Scharkow, M., & Bushman, B. J. (2013). The more you play, the more aggressive you become: A long-term experimental study of cumulative violent video game effects on hostile expectations and aggressive behavior. Journal of Experimental Social Psychology, 49(2), 224-227. https://doi.org/10.1016/j.jesp.2012.10.016
Heidegger, M. (2006). Sein und Zeit. Tübingen: Max Niemeyer. Original 1927.
Herzog, M., & Graumann, C. F. (1991). Vorwort der Herausgeber. In: Herzog, M., & Graumann, C. F. (Eds.), Sinn und Erfahrung. Phänomenologische Methoden in den Humanwissenschaften (pp. ix-xvi). Heidelberg: Asanger.
Holzhey, H. (1991). Zu den Sachen selbst! Über das Verhältnis von Phänomenologie und Neukantianismus. In: Herzog, M., & Graumann, C. F. (Eds.), Sinn und Erfahrung. Phänomenologische Methoden in den Humanwissenschaften (pp. 3-21). Heidelberg: Asanger.
Husserl, E. (2009). Logische Untersuchungen. Hamburg: Meiner. Original edition 1901.
Hutto, D. D. (2008). Articulating and understanding the phenomenological manifesto. Abstracta, 4(3), 10-19.
Jäkel, F., & Schreiber, C. (2013). Introspection in problem solving. The Journal of Problem Solving, 6(1), 20-33. https://doi.org/10.7771/1932-6246.1131
Järvelä, S., Ekman, I., Kivikangas, J. M., & Ravaja, N. (2014). A practical guide to using digital games as an experiment stimulus. Transactions of the Digital Games Research Association, 1(2), 85-115. https://doi.org/10.26503/todigra.v1i2.16
Kern, M. L., Eichstaedt, J. C., Schwartz, H. A., Dziurzynski, L., Ungar, L. H., Stillwell, D. J., ... & Seligman, M. E. (2014).
The online social self: An open vocabulary approach to personality. Assessment, 21(2), 158-169. https://doi.org/10.1177
Kew, F. (1986). Playing the game: An ethnomethodological perspective. International Review for the Sociology of Sport, 21(4), 305-322. https://doi.org/10.1177
Koro-Ljungberg, M., Douglas, E. P., Therriault, D., Malcolm, Z., & McNeill, N. (2013). Reconceptualizing and de-centering think-aloud methodology in qualitative research. Qualitative Research, 13(6), 735-753. https://doi.org/10.1177
Lakatos, I. (1978). The methodology of scientific research programmes. Cambridge: Cambridge University Press.
Linell, P. (2009). Rethinking language, mind, and world dialogically. Charlotte, NC: Information Age.
Lynch, M., & Peyrot, M. (1992). Introduction: A reader's guide to ethnomethodology. Qualitative Sociology, 15(2), 113-122. https://doi.org/10.1007/bf00989490
Lyons, W. E. (1986). The disappearance of introspection. Cambridge: MIT Press.
Macho, S. (2016). Wissenschaft und Pseudowissenschaft in der Psychologie. Göttingen: Hogrefe.
Maynard, D. W., & Clayman, S. E. (2003). Ethnomethodology and conversation analysis. In L. Reynolds & N. Herman-Kinney (Eds.), Handbook of symbolic interaction (pp. 173-204). Walnut Creek, CA: Rowman-Altamira.
Mazambani, G., Carlson, M. A., Reysen, S., & Hempelmann, C. F. (2015). Impact of status and meme content on the spread of memes in virtual communities. Human Technology, 11(2), 148-164. https://doi.org/10.17011/ht/urn.201511113638
Neisser, U. (2014). Cognitive psychology: Classic edition. Psychology Press.
Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.
Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84(3), 231-259.
https://doi.org/10.1037/0033-295x.84.3.231
Ohlsson, S. (2012). The problems with problem solving: Reflections on the rise, current status, and possible future of a cognitive research paradigm. Journal of Problem Solving, 5(1), 101-128. https://doi.org/10.7771/1932-6246.1144
Oswald, C. A., Prorock, C., & Murphy, S. M. (2014). The perceived meaning of the video game experience: An exploratory study. Psychology of Popular Media Culture, 3(2), 110-126.
Petitmengin, C., & Bitbol, M. (2009). Listening from within. Journal of Consciousness Studies, 16(10-12), 363-404.
Pilnick, A. (2013). From trust to telephone calls, from discrimination to giving directions: Phenomenology, ethnomethodology, and the analysis of everyday life. Symbolic Interaction, 36(3), 362-364. https://doi.org/10.1002/symb.61
Rach, T., & Kirsch, A. (2016). Modelling human problem solving with data from an online game. Cognitive Processing, 17(4), 415-428. https://doi.org/10.1007/s10339-016-0767-4
Reeves, S., Greiffenhagen, C., & Laurier, E. (2016). Video gaming as practical accomplishment: Ethnomethodology, conversation analysis, and play. Topics in Cognitive Science, 9(2), 1-35. https://doi.org/10.1111/tops.12234
Reeves, B., Yeykelis, L., & Cummings, J. J. (2016). The use of media in media psychology. Media Psychology, 19(1), 49-71.
Sartre, J. P. (1993). Das Sein und das Nichts. Reinbek: Rowohlt. Original 1943.
Schmierbach, M. (2009). Content analysis of video games: Challenges and potential solutions. Communication Methods and Measures, 3(3), 147-172. https://doi.org/10.1080/19312450802458950
Schmitz, U. (2015). Einführung in die Medienlinguistik. Darmstadt: Wissenschaftliche Buchgesellschaft.
Schott, E. (1991). Psychologie der Situation. Humanwissenschaftliche Vergleiche. Heidelberg: Asanger.
Schutz, A., & Luckmann, T. (1973). The structures of the lifeworld. Translated by Richard M. Zaner and H. Tristram Engelhardt, Jr. Evanston, IL: Northwestern University Press.
Sjöblom, M., & Hamari, J. (2016).
why do people watch others play video games? an empirical study on the motivations of twitch users. computers in human behavior, 75, 985-996. https://doi.org/10.1016/j.chb.2016.10.019 spiegelberg, h. (1960). the phenomenological movement. den haag: nijhoff. spiegelberg, h. (1971). the phenomenological movement ii. den haag: nijhoff. spiegelberg, h. (1972). phenomenology in psychology and psychiatry: a historical introduction. northwestern university press. steffensen, s. v. (2015). distributed language and dialogism: notes on non-locality, sense-making and interactivity. language sciences, 50, 105-119. https://doi.org/10.1016/j.langsci.2015.01.004 tanenbaum, j. (2015). hermeneutic inquiry for digital games research. the computer games journal, 4(1-2), 59-80. https://doi.org/10.1007/s40869-020-00108-2 vygotski, l. s. (1986). denken und sprechen. frankfurt am main: fischer. russian original edition 1934. ward, t. b., & sonneborn, m. s. (2009). creative expression in virtual worlds: imitation, imagination, and individualized collaboration. psychology of aesthetics, creativity, and the arts, 3(4), 211-221. https://doi.org/doi/10.1037/2160-4134.1.s.32 wendt, a. n. (2017a). on the benefit of a phenomenological revision of problem solving. journal of phenomenological psychology, 48(2), 240-258. https://doi.org/10.1163/1569162412341330 wendt, a. n. (2017b). the empirical potential of live streaming beyond cognitive psychology. journal of dynamic decision making, 3(1). https://doi.org/10.11588/jddm.2017.1.33724 wendt, a. n. (2018). is there a problem in the laboratory? frontiers in psychology, 9(2443). https://doi.org/10.3389/fpsyg.2018.02443 wendt, a. n. (2019). lösung oder einfall? über die verlorenen spuren der phänomenologie in der denkpsychologie. in t. kessel (hrsg.), philosophische psychologie um 1900 (pp. 189-214). berlin: springer. https://doi.org/10.1007/978-3-47605092-2_11 wendt, a. n. (2020). the problem of the task. 
Original Research

A Dual Processing Approach to Complex Problem Solving

Wolfgang Schoppek
Institute of Psychology, University of Bayreuth, Bayreuth, Germany

This paper reflects on Dietrich Dörner's observation that participants working on complex dynamic control tasks exhibit a "tendency to economize", that is, they tend to minimize cognitive effort. This observation is interpreted in terms of a dual processing approach; it is explored whether the reluctance to adopt type 2 processing could be rooted in biological energy saving. There is evidence that the energy available to the cortex at any point in time is quite limited. Therefore, effortful thinking comes at the cost of neglecting other cortical functions. The proposed dual processing approach to complex problem solving is investigated in an experiment in which cognitive load was varied by means of a secondary task to make type 1 or type 2 processing more likely. Results show that cognitive load had no effect on target achievement and knowledge acquisition. Even in the single-task condition, many participants seem to prefer type 1 processing, supporting Dörner's observation.

Keywords: dynamic decision making; problem solving; dual processing; theory

Progress in an area of research is stimulated by discoveries and new theories. In the area of complex problem solving (CPS), where the handling of uncertain and dynamic situations is investigated,¹ both are scarce. As for discoveries, one can even doubt whether there have been any.
One candidate for both is Dörner's (1996) observation that failure in the process of CPS follows a certain logic, with the features of complex problems and the limitations of human thinking as premises. For example, problem solvers often focus on a central variable, to which they attribute too much explanatory power (e.g., job satisfaction in an economic scenario). The resulting failure, based on the neglect of other important variables, can be deduced from the conjunction of a tendency to economize thinking (Dörner, 1996) on the side of the problem solver, and the features of complexity and connectedness on the side of the problem. A virtue of Dörner's conception is its comprehensiveness: he fruitfully brought together ideas from very different sources. Because (complex) problem solving is a vast research topic, which intersects with many established areas of psychology, such as memory, decision making, motivation, or judgement, I am convinced that only a comprehensive, holistic approach can yield progress. In the present paper, I pick up Dörner's concept of the tendency to economize, connect it with the idea of dual processing, and explore what predictions can be derived from this. For this purpose, the dual processing approach is contrasted with the current "standard model of CPS" (Fischer, Greiff, & Funke, 2012; Schoppek & Fischer, 2017) by means of an exploratory experiment, which is in part replicated in a second experiment. Kahneman (2011) called the human judge (or problem solver) a "cognitive miser": a person who mostly relies on intuitive judgement and uses reasoning sparingly. Kahneman assigns intuitive judgment to "System 1" and thinking to "System 2" (see below). The resemblance of cognitive miserliness to the tendency to economize establishes a connection between Dörner's idea and dual processing theories: the tendency to economize consists of a strong preference for System 1 and reluctant use of System 2.
Conceptual Preliminaries

Some terms in problem solving research are used with varying meanings. Therefore, before presenting the research questions, I shall define the core concepts. I use the term "complex problem solving" in the tradition of Dörner (1996) for human goal-directed activities in situations that are characterized by a relatively large number of relevant variables (complexity), which influence each other in various ways (connectedness), and some of which change their values autonomously (dynamics). The problem solver neither knows exactly which variables are relevant, nor knows all current values (intransparency). In such situations, more than one goal can reasonably be pursued, whereby the goals typically cannot be maximized at the same time (polytely). This definition has been criticized for lacking precision and operationalization (e.g., Quesada et al., 2005). However, in problem solving research it is widely accepted that whether a task can be classified as a problem depends on the knowledge of the problem solver (Öllinger, 2017). Similarly, CPS research situations can be more or less typical for a "complex problem". In my opinion, the focus should be on the processes on the side of the problem solver; and insights about these do not depend on the exact classification of the problem. For the problems that persons are asked to solve, I often use the term "complex dynamic control (CDC) tasks".

Corresponding author: Wolfgang Schoppek, Institute of Psychology, University of Bayreuth, Bayreuth, Germany. E-mail: wolfgang.schoppek@uni-bayreuth.de

¹ A more detailed, current definition is given by Dörner & Funke (2017) and below.

DOI: 10.11588/jddm.2023.1.76662 | JDDM | 2023 | Volume 9 | Article 1
This term originates in the literature on technical process control (e.g., Woods et al., 1990), and many authors use it (e.g., Osman, 2010; Davis et al., 2020). In cognitive psychology, the term "strategy" is often used generally for any course of action or sequence of cognitive processes. In its original (military) context, a strategy is an abstract approach to a problem, which needs to be substantiated in real situations (Clausewitz, 1832/1991). If a course of action can be implemented directly, it is a tactic rather than a strategy. A tactic is a relatively concrete procedure. With this terminology, much that is referred to as strategy could more precisely be called a tactic. I shall discuss a last problematic term, "intuition", after a short introduction to the basic concepts of the dual processing approach.

The Dual Processing Account for Human Thinking, Decision Making, and Problem Solving

The core proposition of the dual processing (DP) account is that there are two modes (or types) of information processing, which differ in their characteristics, work in parallel, and may come to different conclusions about the given information. For example, in view of a piece of cake, type 1 processes may quickly raise the impulse to eat it, whereas type 2 processes may involve the recollection of an intention to lose weight and mobilize resistance against the temptation. Initially, the two modes of processing were described as systems with characteristic features. For example, System 1 typically works fast, in parallel, automatically, and modality-specifically; in contrast, System 2 is described as slow, serial, controlled, and flexible (Evans, 2008). The problem with these characterizations is that they are neither sufficient nor necessary. It is simply not true that all information processing that is slow is also serial, controlled, and flexible. In addition, it is unlikely that there are exactly two systems for processing information.
In particular, the processes that are assigned to System 1 (e.g., pattern recognition, procedural knowledge) are too diverse to be subsumed under a unitary system. Therefore, the characterization as two systems was abandoned, and newer conceptions classify processes as belonging to two types of processing. According to Stanovich and Toplak (2012), the defining feature of type 1 processes is their autonomy: "the execution of type 1 processes is mandatory when their triggering stimuli are encountered, and they are not dependent on input from high-level control systems" (p. 7). Likewise, the central feature of type 2 processes is the function of decoupling representations created by hypothetical reasoning from representations of the real world (ibid.). Evans (2012) assigns working memory a critical role for that function. Taken together, type 2 processes largely overlap with the contemporary conception of executive functions (Diamond, 2013): working memory, inhibitory control, and cognitive flexibility.

Previous Approaches for Explaining CPS Behavior

How can extant approaches for explaining CPS behavior be located in the framework of DP? Some of them describe problem solving behavior in terms of type 1 processing. Broadbent, FitzGerald, and Broadbent (1986) found that participants who successfully controlled simple dynamic systems (e.g., the sugar production task, viz. sugar factory) were not able to answer questions about the causal structure of the systems correctly. They were also not able to predict what effects given input values have on the target variables. From this, Broadbent et al. concluded that participants had learned to control the systems by using a mental "lookup table". Dienes and Fahey (1995) followed up on these considerations and showed that a model based on Logan's (1988) instance theory could replicate most of the empirical findings unless the system's behavior was governed by a highly salient rule.
In that case, a rule-based model made the best predictions. Buchner, Funke, and Berry (1995) offered a different explanation for the negative correlations between verbalizable knowledge and control performance. Participants who encountered a greater variety of system states had a good chance of answering the knowledge questions correctly but were obviously not successful in reaching the targets (because success meant that the system states did not vary much around the target state). In an additional experiment, however, Dienes and Fahey (1998) found stochastic independence between repeating successful inputs in situations previously encountered and recognition of these situations as known. This corroborates Broadbent et al.'s (1986) assumption that the relevant knowledge for controlling these systems is learnt and known implicitly. Implicit learning can clearly be identified as a type 1 process (Evans & Stanovich, 2013; Sun, Slusarz, & Terry, 2005). In contrast, the rule-based model relies primarily on type 2 processing. Taatgen and Wallach (2002), as well as Fum and Stocco (2003), presented ACT-R models that simulated the learning process in the sugar factory. The former model relies on declarative memory of known input-output sequences and assumes a partial matching mechanism; the latter model uses learning of procedural parameters. Although ACT-R differs from Logan's instance theory, and both models differ from each other, they simulate implicit learning rather than explicit rule learning. Osman, Glass, and Hola (2015) presented a model of CPS that is based on reinforcement learning (SLIDER model: single limited input, dynamic exploratory responses). This type of learning is also a process of type 1.
However, the system in that research deviates from those that are commonly used in CPS research (such as MicroDYN, Tailorshop, or Dynamis2): it has only one output variable that depends linearly on two input variables; a third input variable has no effect. Paradoxically, Osman and colleagues report next to nothing about the fitting procedure and the performance of their model. Anyhow, it is obvious that a system with one output variable lends itself more readily to reinforcement-based control than a system with more output variables. This is because in systems with only one output variable no side effects are possible (additional effects of an input variable on output variables other than the targeted one). The presence of side effects often requires sophisticated input tactics, which involve considerations about how fixed vs. free the input variables are. For example, if one target variable can only be controlled by a single input variable, the latter is relatively fixed and cannot easily be used to control another target variable. (This extends the concept of controllability according to Beckmann and Goode, 2017, who focused on the number of dependencies of an output variable, by the number of effects of an input variable.) Proof that more complex systems can be controlled based on pure reinforcement learning is still outstanding. In his own framework (Schoppek, 2002; Schoppek et al., 2017), the author has used the term "I-O knowledge" (input-output knowledge) for declarative knowledge about input values and their specific effects. This conception was inspired by the ACT-R cognitive architecture (Anderson & Lebiere, 1998), which does not go well together with a DP approach. Nevertheless, some aspects of ACT-R can be classified as type 1 processes, for example, the learning rules that govern parameter changes on the subsymbolic level, or procedural learning.
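The "lookup table" idea behind these type 1 accounts can be illustrated abstractly. The following sketch is entirely my own construction, not the Dienes and Fahey model or any ACT-R mechanism: it merely stores situation, input, and outcome episodes and, when a situation recurs, retrieves the input that previously came closest to the target. No structural knowledge of the system is represented anywhere.

```python
from collections import defaultdict

class InstanceController:
    """Control by a 'lookup table' of past episodes (illustrative sketch only)."""

    def __init__(self, fallback):
        # situation -> list of (input, error) episodes experienced so far
        self.episodes = defaultdict(list)
        self.fallback = fallback  # behavior in unknown situations

    def decide(self, situation):
        known = self.episodes[situation]
        if known:
            # Retrieve the input that produced the smallest error last time.
            return min(known, key=lambda episode: episode[1])[0]
        # Unknown situation: fall back, e.g., to unsystematic trial and error.
        return self.fallback(situation)

    def learn(self, situation, chosen_input, error):
        """Store one episode after observing its outcome."""
        self.episodes[situation].append((chosen_input, error))
```

Because only concrete episodes are stored, such a controller can perform well on recurring system states while remaining unable to answer questions about the causal structure, which is exactly the dissociation Broadbent et al. (1986) observed.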
So far, I have presented explanations of complex problem solving behavior that can largely be assigned to type 1 processing. We now turn to explanations that are primarily based on type 2 processing. The most prominent proponent is the model that has been developed in the context of the "multiple complex systems" approach (Greiff, Wüstenberg, & Funke, 2012). The model assumes that problem solvers first try to detect the causal structure of a system. The success of this phase of problem solving depends on the use of appropriate strategies such as VOTAT ("vary one thing at a time"; Tschirgi, 1980; Vollmeyer, Burns, & Holyoak, 1996). After that, problem solvers try to reach goal states using the knowledge they have acquired in the first phase. Many studies involving multiple complex systems such as MicroDYN (Greiff et al., 2012) or MicroFIN (Neubert et al., 2015) adopted that model, referring to the first phase as knowledge acquisition and to the second phase as knowledge application (Fischer, Greiff, & Funke, 2012; Greiff & Funke, 2009; Greiff et al., 2013; Kretzschmar & Süß, 2015; Wüstenberg et al., 2012). In most of those studies, CPS competency is measured as a construct comprising these two correlated, yet discriminable dimensions. Due to the prevalence of that model, we have introduced the name "standard model of CPS" for it (Schoppek & Fischer, 2017, p. 2). Note, however, that in some studies a one-dimensional measurement model fitted at least equally well (Kretzschmar et al., 2017). The knowledge acquisition process is mainly characterized by induction: from observations of the system's responses to certain inputs, the problem solver induces causal relations among variables. Knowledge application involves deductive processes in addition: from the induced rules, the problem solver deduces a sequence of actions to be taken in order to reach the desired goal state. Admittedly, this is a strong simplification of the real processes going on during CPS.
However, it demonstrates the similarity between the processes assumed in the standard model of CPS and the induction-deduction cycle that is characteristic of many problems used in intelligence tests (Hunt, 2010). Therefore, it is coherent that performance in controlling simple systems (as used in MicroDYN) is closely correlated with measures of intelligence (Greiff et al., 2013; Stadler et al., 2015). The assignment of these processes to type 2 is justified by their high demands on working memory. This synopsis shows that dual processing ideas are hidden in theorizing about complex problem solving, but that the pertinent assumptions are not combined within one framework. A subtle hint in that direction can be found in the abstract of the Broadbent et al. (1986) paper: "the results challenge a common view of the discrepancy between performance and verbal accounts, and suggest rather that there are alternative modes of processing in human decision making, each mode having its own advantages" (p. 33). However, this idea has not been picked up in subsequent research. To my knowledge, there is no published attempt to combine both accounts for the topic of CPS.

The Tendency to Economize and Related Concepts

Dietrich Dörner observed in many studies that participants minimized cognitive effort (Dörner, 1980, 1996; Dörner & Schaub, 1994). For example, problem solvers tend to identify a central variable in a complex system and hypothesize that many other quantities depend almost exclusively on it. (In the minds of many people today, such a variable might be "uncontrolled immigration". This way of thinking may also contribute to the development and adoption of conspiracy theories.) Dörner (1996) attributes these and some other shortcomings of human decision making in complex situations to the slowness of human thinking (of type 2) and has coined the term "tendency to economize" (Ökonomietendenz). In everyday language, one would say people are lazy-minded.
Other researchers have also observed that humans deploy type 2 processing sparsely or reluctantly. Herbert Simon broached the issue of the narrowness of human cognition and observed that persons tend to "satisfice" instead of optimizing (Barnard & Simon, 1947). This is due, among other things, to the inherent uncertainty of induction, but also to the limited capacity of the reasoning apparatus (Simon, 1993). The concept of satisficing, i.e., making decisions based on simple criteria, takes these limitations into account and is thus related to the tendency to economize. The heuristics and biases program (Kahneman, Slovic, & Tversky, 1982) was another important field of study with strong relations to the tendency to economize. Several authors demonstrated in many experiments that human judgement is guided by simple heuristics, which often lead to wrong conclusions. This program is so well known that it is unnecessary to go into details here. Kahneman (2011) now interprets these earlier findings in terms of a DP framework. Gigerenzer and Brighton (2011), in their ABC program (adaptive behavior and cognition), gave the topic a different twist: this group investigated how people use heuristics to make good decisions (Gigerenzer, Hertwig, & Pachur, 2016). They postulate that simple rules of thumb rely on the results of basic skills that have developed through evolution ("evolved capacities"). As an example, the authors often describe the gaze heuristic. When the goal is to hit a moving object, such as a ball to be caught, one moves so that the angle of view to the object remains constant. The perception of the angle of view is provided by the perceptual system, and the rule of keeping it constant is simple.
Although Gigerenzer is decidedly opposed to the DP approach (Kruglanski & Gigerenzer, 2011), his conception fits well into this framework: the evolved capacities can be classified as type 1 processing and the rules of thumb as type 2. Through their simplicity, the latter take the limited capacity of System 2 into account. In this context, it is important to clarify the meaning of intuition and its role in CPS. Kahneman (2011) classified type 1 processing as intuitive. Gobet and Chassy (2009) define intuition as "the rapid understanding shown by individuals, typically experts, when they face a problem" (p. 151). Other characteristics are the "essential role of perception, the fluid, automatized, and rapid behavior characteristic [...], and the long time required to become an expert" (p. 172). This characterization is compatible with Kahneman's, even though these authors are not advocating a DP approach. However, Gobet and Chassy's (2009) computational model of expert problem solving in chess, which incorporates intuitive and analytic components and their interplay, is a valuable example of how dual processing ideas can be stated more precisely in cognitive models. The emphasis on experts points to the problem that intuition can refer to different processes, depending on the amount of experience and practice of the respective person. While I mostly agree with the conception of Gobet and Chassy (2009), I do not assign intuition to experts alone. Persons with little experience in a domain can also have intuitions about the nature of a problem or about a certain course of action, because of perceived similarities with familiar situations (Schoppek, 2019). In such cases, the intuitions will more likely be misleading than in the case of experts. Beckmann (2019) warned not to use "intuition" as a pseudo-explanation for behaviors that cannot be classified as specific strategies.
To be precise in that respect, I use the term "intuitive approach" for problem solving behavior that is characterized by rather unsystematic trial and error and the attempt to reach goals by gradually adapting an input tactic (see also Beckmann & Goode, 2017).

Why Do Humans Deploy Type 2 Processing So Sparsely?

A potential explanation for cognitive miserliness is the energy demand of type 2 processes. Researchers in rich western industrial societies tend to forget that the abundant supply of calories they experience today was not given during the time when Homo sapiens appeared in evolution. Therefore, it seems plausible that the large frontal lobes that are characteristic of humans should be energized only when necessary (Baumeister & Tierney, 2011). The problem with this account is that the human brain consumes about 20% of the energy available in the blood almost independently of its specific activity (Fox & Raichle, 2007). The pattern of activity that can be observed during rest or daydreaming forms a "default mode network" (Raichle, 2015). Its activity ceases when the participant engages in specific cognitive tasks. At the same time, activity in other regions, the "task positive network" (TPN), increases (Basten, Stelzel, & Fiebach, 2013). This suggests that energy expenditure shifts rather than rises during thinking. Although the exact energy regime in the brain is still a matter of lively debate (Howarth, Gleeson, & Attwell, 2012; De Boeck & Kovacs, 2020), we can state that the view that Homo sapiens uses thinking sparingly to save energy is too simple. Nevertheless, research on individual differences in cognitive functioning also considers energetic factors.
Debatin (2019) reviewed a number of studies that addressed the relation between glucometabolic function and cognitive performance and concludes that "there is an increasing amount of research supporting the hypothesis that individuals with better glucose regulation perform better in cognitive performance tasks than individuals with worse glucose regulation" (Debatin, 2019, p. 4; see also Lamport et al., 2009). However, most research in this area has focused on the role of glucose as a substrate for oxidative phosphorylation, which is not the only way of providing energy in the body. An additional way, aerobic glycolysis, has received much less attention (Vaishnavi et al., 2010), so the view on these questions may change in the near future. Taking up the idea of "shifting rather than rising energy expenditure" again and combining it with the fact of limited energy supply in the brain, one might speculate that type 2 processing can only occur at the cost of other cortical processing. As these other processes might be essential for survival (e.g., scanning the surroundings visually and/or aurally), selection pressure may have acted against excessive thinking during evolution. This speculation is compatible with calculations of the energy demand of cortical neurons on a molecular level, which gave rise to the assumption that the maximally available energy in the brain severely limits neuronal activity (Lennie, 2003). However, newer calculations showed that action potentials demand much less energy than previously assumed, and that a good part of the energy in the brain is used for functions that are independent of acute signaling, such as maintaining resting potentials or neurotransmitter recycling (Howarth et al., 2012).
Although these modifications attenuate Lennie's (2003) original argument, they do not rule out the above speculation. It is generally problematic to draw inferences between different levels of abstraction (Kästner, 2018; Newell, 1994), even more so when the evidence on the biological level is vast and controversial. However, psychological theories should be consistent with biological evidence, and the latter can help inspire the former by generating new hypotheses. In the case of the tendency to economize, a glimpse into the neurosciences showed that the discussions there about energy expenditure in the brain justify a possible connection with modes of thinking.

Predictions of the DP Account

Dual processing accounts have been criticized for not being able to make predictions (Keren & Schul, 2009). However, with the recent specifications (see above), I venture some predictions in the area of complex problem solving. Obviously, all cognitive processing involves type 1 and type 2 portions to different degrees. Therefore, in the following statements, I use "type X processing" as a shorthand term for "processing that is predominantly characterized as type X", just for the sake of readability. For making predictions, we especially need to identify the broad range of type 1 processes. Candidates are pattern recognition, incidental learning, and implicit learning resulting in implicit knowledge (including specialized procedural knowledge). In complex dynamic control tasks, type 1 processes perform the following functions. The list is not intended to be complete. When performed with little or no practice, some of the functions might also be classified as type 2.

1. Recognition of system states
2. Recognition of system developments or temporal patterns
3. Input responses to recognized system states
4. Unsystematic exploration (trial and error; can be useful under certain circumstances, e.g., with finite state automata)
5. Buildup of I-O knowledge
6. Execution of automatized action sequences

The following functions are governed mainly by type 2 processes:

1. Systematic exploration of a dynamic system to acquire structural knowledge (e.g., using VOTAT)
2. Construction of a strategy for exploration
3. Calculation of an intervention based on structural knowledge
4. Construction of input tactics (what variables to manipulate in what order)
5. Keeping the focus on the problem when difficulties arise
6. Remembering to check background variables

When we combine the classifications above with the propositions of the DP account, we arrive at the following predictions about (complex) problem solving:

1. When confronted with a problem, most persons initially tackle it with a high proportion of type 1 processes such as unsystematic exploration.
2. Learning to control a novel complex dynamic system requires type 2 processing. If central executive capacity is bound by other requirements, problem solving performance declines.
3. Working with ample use of type 2 processing is not very common. It usually needs a considerable incentive such as sustaining a threatened self-esteem, feelings of challenge, or large extrinsic incentives (Liddle et al., 2011).
4. Advanced problem solvers have exploration strategies in their repertoire (e.g., VOTAT) and can execute them largely in type 1 mode (without overloading their working memory).
5. Extensive practice with a specific system leads to automatization, meaning the demand for type 2 processing decreases.
6. After the transition to type 1 processing, it is difficult to detect changes in the system and respond to them appropriately (Luchins, 1942; Betsch et al., 2001).
7. The difficulty of a problem correlates predominantly with its requirement for type 2 processing (Stanovich & West, 1999). However, individual differences in experience, which are reflected in implicit knowledge (type 1), may override the correlation (Ackerman, 1990; Weise et al., 2020).
Predictions 1 and 3, and less obviously, prediction 6, are instances of the tendency to economize. Prediction 2 is based on considerations around the standard model of CPS (see section "Previous Approaches for Explaining CPS Behavior"). Other predictions rest on established theories of cognitive skill acquisition and automatization (Anderson et al., 1997; Norman & Shallice, 1986).

Dual Processing in Dynamis2

These considerations shall now be applied to a new variant of a dynamic problem solving environment, called Dynamis2 (Schoppek & Fischer, 2017). Inspired by Allen Newell (1973), who urged his colleagues in cognitive science at the time to "analyze a complex task" (p. 21) and "know the method your subject is using to perform the experimental task" (p. 12), I present a detailed description of a typical problem within this environment together with possible strategies². In the complex dynamic control task environment Dynamis2 (Schoppek & Fischer, 2017), systems are simulated using sets of linear equations. Output variables (aka endogenous variables) depend on the values of input variables (aka exogenous variables), which are controlled by the problem solver, and on each other, including themselves. This idea is based on Funke's (1993) Dynamis approach, which is also realized in MicroDYN (Greiff et al., 2012). One important feature of Dynamis2 is that it is real-time driven: the simulation is updated every half second, regardless of whether the participant manipulates the input variables or not. This makes the dynamics of the simulated systems more rigorous than in most other CPS environments and results in genuine time pressure for the participants. A typical run of the system consists of 250 simulation updates (called cycles), which represent a round.
An experimental block in Dynamis2 comprises one exploration round, in which participants can freely vary the input variables without a specific goal state, followed by several rounds in which participants are required to reach goal states provided by the experimenter. Figure 1 shows a screenshot of the user interface.

The following equations constitute an exemplary system that simulates the effect of three drugs (MedA, MedB, MedC), administered continuously (as if from a drip), on the blood levels of three substances (Muron, Fontin, Sugon). All variables and their relationships are fictitious in order to minimize the influence of prior knowledge. However, the course of the blood levels is plausible.

Muron_t = 0.1 · Muron_{t-1} + 2 · MedA_{t-1}
Fontin_t = Fontin_{t-1} + 0.5 · Muron_{t-1} − 0.2 · Sugon_{t-1} + MedB_{t-1}
Sugon_t = 0.9 · Sugon_{t-1} + MedC_{t-1}

The effects of the output variables on themselves result in an eigendynamic (or momentum) that is more pronounced the higher the coefficient is. For example, Muron's level, with an eigendynamic coefficient of 0.1, responds quickly to the administration of MedA, whereas Sugon reaches a stable level only slowly, given a constant input of MedC. Fontin, having a coefficient of 1, tends to accumulate, which can only be prevented by a certain level of Sugon.

The characteristics of the system have implications for all possible control strategies, regardless of whether they are based on Type 1 or Type 2 processing: As Muron can only be controlled with MedA, and also has a positive effect on Fontin, the latter must be prevented from increasing steadily. This can only be achieved using MedC, which raises the level of Sugon, which in turn decreases Fontin. However, as the effect of MedC on Sugon unfolds slowly, it is almost impossible to control the level of Fontin by varying MedC.
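For readers who want to trace the dynamics themselves, the three difference equations can be reproduced in a few lines. This is a minimal sketch under my own naming, not the original Dynamis2 code, and it assumes the inputs are held constant within a step.

```python
# Minimal sketch (not the original Dynamis2 code) of the three linear
# difference equations of the Medicine 1 system.

def step(muron, fontin, sugon, med_a, med_b, med_c):
    """Advance the simulated blood levels by one cycle (0.5 s)."""
    return (
        0.1 * muron + 2 * med_a,                     # Muron: weak eigendynamic
        fontin + 0.5 * muron - 0.2 * sugon + med_b,  # Fontin: coefficient 1, accumulates
        0.9 * sugon + med_c,                         # Sugon: strong eigendynamic, slow
    )

def run(med_a, med_b, med_c, cycles=250):
    """Run one round with constant inputs, starting from zero levels."""
    state = (0.0, 0.0, 0.0)
    for _ in range(cycles):
        state = step(*state, med_a, med_b, med_c)
    return state

# With constant MedC = 25 and no other input, Sugon levels off near its
# fixed point 25 / (1 - 0.9) = 250, and the steady Sugon drains Fontin.
muron, fontin, sugon = run(0, 0, 25)

# With MedA = 45, Muron settles at 2 * 45 / (1 - 0.1) = 100, the goal value.
m2, _, _ = run(45, 0, 0)
```

The fixed-point arithmetic in the comments also explains why MedA = 45 is the right setting for the goal Muron = 100 mentioned below.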
A straightforward strategy for reaching and maintaining the goal state is to keep MedC constant at a certain level (e.g., 25), wait until Sugon levels off, and eventually use MedB to raise Fontin to the desired level. Additionally, MedA needs to be set to 45 at some point during this process to reach the goal of Muron = 100. Of course, other strategies are possible, but it is important to reiterate that using MedC for fine-tuning Fontin is ill-advised.

A participant who conforms to the standard model would start exploring the system by varying the three input variables one at a time (VOTAT). To detect the eigendynamics of the output variables, she should apply a pulse tactic (Schoppek & Fischer, 2017), which consists of setting an input variable to a positive value, setting it back to zero, and observing the course of the output variables. From her observations, the participant can induce all causal relations that constitute the system. When it comes to targeted control, the participant can use her structural knowledge to develop a control strategy and deduce specific input values. From this description it is obvious that such an approach involves inferential reasoning, which puts a heavy load on working memory and can be characterized as Type 2 processing.

What, on the other hand, can a participant who takes an intuitive approach learn? He notices early that Muron can only be controlled with MedA. He will also notice that Fontin tends to increase. Because Fontin is to be kept at 1000, he will search for a means to prevent Fontin from growing. Eventually, he will find out that only MedC does that. This participant has gained rudimentary structural knowledge, which he uses to control the system: He will set MedA to the value that brings (and keeps) Muron at 100 (this can be accomplished by visuomotor closed-loop control). Then he tries to control Fontin by adjusting MedC. As Fontin responds to changes in MedC only gradually, this strategy rarely succeeds.
I will refer to this as "strategy gamma". This procedure mainly consists of visuomotor closed-loop control (doing something, watching, adjusting), which is a Type 1 process. Beckmann and Goode (2017) called this "ad-hoc optimization" (see also Beckmann & Guthke, 1995). Occasionally, the participant must draw some inferences from what he observes: noticing and considering that only MedA affects Muron, and that only MedC limits Fontin. These are Type 2 processes. A little more reasoning could lead our participant to the conclusion that MedC should be kept constant to prevent Fontin from fluctuating in a delayed manner. He will notice that Fontin responds much more quickly to MedB and use this medicine to fine-tune Fontin. This feasible strategy gets by with rudimentary structural knowledge and hence deviates considerably from the standard model. I shall label this strategy, which is characterized by low variation of MedA and MedC and higher variation of MedB, "strategy beta". Compared to strategy gamma, the development of strategy beta involves a higher share of Type 2 processing. To make the strategy classification complete, I introduce a third strategy, "alpha", in which all input variables are varied.

Footnote 2: This unusual description of the material in the introduction is due to the theoretical nature of analyzing the strategies, which readers cannot understand unless they know the problem.

Figure 1. Screenshot of the user interface of a Dynamis2 scenario. The lines represent the course of the output variables. Fontin is displayed in a separate panel because its range is larger than the ranges of the other variables. The current values of the input variables are listed in the top left corner (MedA ... MedC), those of the output variables in the bottom left corner.
Strategy alpha is characteristic of early exploration phases.

Experiment 1

The purpose of this experiment was to challenge the standard model of CPS. This means that most hypotheses were formulated under the assumption that the standard model is valid. By varying the presence of a secondary task, intended to increase the burden on working memory, the propensity to adopt a standard or an intuitive approach should be manipulated. The secondary task was sentence verification, which has proved its utility in the context of measuring working memory capacity (Daneman & Carpenter, 1980; Unsworth et al., 2009). Adding working memory load should make the use of working-memory-intensive strategies less likely. In terms of the DP framework, this should disturb Type 2 processing, leading to a greater proportion of Type 1 processing.

It is not realistic to expect that all participants conform to a certain model (standard model, intuitive model). Also, the proportion of either type of processing cannot be measured directly. Therefore, I started with the working hypothesis that most participants conformed to the standard model of CPS. From this, one can derive testable hypotheses: Under the assumption that the standard model is true, I expected that participants in the dual task condition, as compared to the single task condition, would perform worse and gain less structural knowledge. Additional and more specific hypotheses are listed after the description of the details of the experiment.

Materials and Measures

As complex dynamic control tasks, three Dynamis2 systems were used. The rationale of Dynamis2 and the first system were described in the introduction. The equations of the other two problems are listed in the Appendix.
A distinctive feature of the present study was that I used a board with three physical sliders as the input device. This should enable participants to control the system in an intuitive, sensory-motor style. Because the sliders rest in their positions unless the user moves them, the same applies to the input values. At the beginning of the experiment, the sliders were in their minimum position (zero). Negative input values were not possible. One round consisted of 250 cycles of 0.5 s each, resulting in a total time of 2 min and 5 s.

Control performance was measured by the number of rounds each participant took to reach the criterion (variable "trials to criterion"). The criterion was reaching the targets and keeping them for ten cycles in two consecutive rounds. As the number of rounds was limited to 15, some participants did not reach the criterion. Their performance was coded as 16.

Participants in the dual task condition were asked to do a sentence verification task concurrently with system control. A female voice spoke sentences that were either meaningful or not. In the first case, participants were to respond by saying "yes", in the second case by saying "no". For example, a meaningful sentence was "Oranges grow on trees"; an absurd sentence was "Litter goes into the litter nose". There were 25 sentences during one round, one sentence every five seconds on average.

After each exploration round, participants were asked to enter the effects they had inferred as arrows into a diagram that showed the variables of the system. From this, a structural score was calculated as the difference between the numbers of correctly and wrongly marked effects. Additionally, I carried out exploratory analyses of the strategies. The associated definitions are described in a separate section, "Strategies", under Results.

Design and Procedure

Two factors were varied between subjects and one factor was varied within subjects.
In the dual task condition (DT), participants had to do the sentence verification task while controlling the Dynamis2 system Medicine 1. There was also a single task condition (ST) without the verification task. The single vs. dual task factor was varied in the first block only, to avoid overburdening the participants with a continued dual task requirement. All blocks began with a free exploration round without given goal states.

The second factor consisted of a variation of the sequence in which the Dynamis2 systems had to be controlled. Both conditions started with a specific goal state for the system Medicine 1 (Muron = 100, Fontin = 1000). In the blocked condition, participants continued with a task consisting of a changed goal state for the same system (Muron = 80, Fontin = 1500; near transfer), followed by a new system (Medicine 2; Bulmin = 1000, Grilon = 80; far transfer). In the spaced condition, the order of the transfer problems was reversed (far transfer first, then near transfer). Hence, participants in the blocked condition had more experience with the task environment before turning to Medicine 2 than participants in the spaced condition. In both conditions the session ended with a third system (Growing Vegetables), which is not reported here. The different tasks can be viewed as a third factor that was varied within subjects. Figure 2 shows the sequence of tasks in the different conditions.

Participants

Seventy-three persons participated in the experiment: 42 women and 31 men. Participants were studying different majors at a German university (32 economics, business administration, or law; 16 humanities or social sciences; 20 sciences; five did not provide the information). Participants provided informed consent, and all procedures followed the principles of the Declaration of Helsinki.
Hypotheses

Hypothesis 1.1: Participants in the ST condition take fewer rounds to reach the goal criterion in the source problem and in the near transfer problem than those in the DT condition.

Hypothesis 1.2: Participants in the ST condition acquire better structural knowledge about the source problem than those in the DT condition.

Hypothesis 1.3: Structural knowledge and problem solving success are correlated positively, particularly in the ST condition.

Hypothesis 1.4: The use of the pulse tactic in the first two rounds is predictive of (a) structural knowledge and (b) success, particularly in the ST condition.

Hypothesis 1.5: Participants solve the far transfer problem faster in the blocked condition than in the spaced condition.

As described above, the hypotheses are based on the working hypothesis that the standard model of CPS, with its emphasis on the acquisition and application of structural knowledge, is an adequate description of CPS. Hypothesis 1.4 was formulated as a replication of results found in an earlier experiment (Schoppek & Fischer, 2017). The pulse tactic involves systematically setting input values back to zero in order to observe the eigendynamics of the output variables, and has been shown to predict success in several complex dynamic control tasks (Beckmann, 1994; Lotz et al., 2017). Hypothesis 1.5 is based on the fact that participants in the blocked condition have a second opportunity to work with the same system. The original consideration was that this enables participants to further analyze the causal structure of a system they already know. During this opportunity, they can acquire strategic knowledge about the exploration of a system, which they can transfer to the new system (far transfer).
This effect should be most prominent in the DT condition, because the secondary task is omitted in the near transfer problem, which makes the second opportunity more profitable in the DT condition. At first glance, this prediction seems to contradict the standard model with its focus on structural knowledge (which cannot be transferred to the far transfer problem). However, the standard model does not state that structural knowledge is the only relevant type of knowledge and therefore does not preclude the acquisition of strategic knowledge. Additionally, the hypothesis refers to a less specific effect of cognitive load, which is reduced in the second block of Medicine 1 due to increased familiarity (van Merriënboer, 1997). This effect pertains to a reduction of extraneous load (handling the task environment) and intrinsic load (reaching the goals), resulting in more resources for learning any kind of knowledge or skills (germane load).

Figure 2. Diagram of the experimental design. G1, G2: different goal states; ST: single task; DT: dual task; KnowlT1: structural knowledge test for Medicine 1.

Apart from testing the hypotheses, I will report results about differences between participants studying certain subjects, as well as detailed analyses of the use of strategies and their relation to control performance.

In an a priori power analysis, the expectation was that effects of d = 0.65 (medium) should be detected by a one-tailed t-test with a power of 1 − β = .85 and a significance level of α = .05. This resulted in a sample size of n = 35 per condition. As it turned out that some of the variables markedly deviated from the normal distribution, nonparametric tests were applied, the power of which is a little lower than that of the t-test.
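The reported sample size can be checked with a short normal-approximation sketch. The helper below is my own illustration, not part of the study's tooling (G*Power itself works with the noncentral t distribution); for these parameters the approximation yields the same n = 35 per group.

```python
# Hedged sketch: a priori sample size for a two-group, one-tailed t-test
# via the normal approximation n = 2 * ((z_alpha + z_beta) / d)^2.
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha, power):
    """Approximate required n per group for a one-tailed two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # one-tailed critical value
    z_beta = z.inv_cdf(power)       # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

n = n_per_group(d=0.65, alpha=0.05, power=0.85)  # → 35
```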
Post hoc, the power of the one-tailed t-tests with the present sample is 1 − β = .87; for the U-tests it is 1 − β = .85. All power analyses were conducted using the software G*Power (Faul et al., 2009).

Results

The results are presented in two sections. First, I report the analyses for testing the hypotheses. In the second part, I report some exploratory analyses that can support the interpretation of the results or can be used to generate new hypotheses.

Testing the Hypotheses

Table 1 shows descriptive statistics for the main variables. We see that the scenario Medicine 1 was a difficult problem. Many participants did not reach the goal criterion within 15 rounds (coded as 16). The scenario Medicine 2 was much easier. The range from 3 to 13 trials to criterion indicates that all participants reached the goals. It is very unlikely that this marked difference between the scenarios is due to practice alone, because one half of the sample (the spaced condition) worked on Medicine 2 before they repeated Medicine 1 with changed target values. The means of the structure score show that in both conditions, participants identified little more than one causal relation on average. Given the five possible relations, this is a low value. The average number of pulse events in the first two rounds is also rather low (see Footnote 3).

For Block 1, the distributions of trials to criterion clearly deviated from a normal distribution in both conditions (Figure 3). Local modes can be identified at 7 to 8 trials and at 12 to 13 trials. The most frequent value in both conditions was 16, meaning that the criterion was not reached. Due to the peculiar distributions, I calculated nonparametric statistical tests. For all scenarios, the U-tests indicated no significant differences between the dual vs. single task conditions (Medicine 1.1: U = 689.5, p = .636; Medicine 1.2: U = 703, p = .532).
Comparing the medians (see Table 1) shows that the median in the dual task condition was even lower than in the single task condition. So, Hypothesis 1.1 was not supported by the data. For the analyses pertaining to Hypotheses 1.2 to 1.4, six participants with missing structure scores were removed from the sample (three in each condition), resulting in n = 34 and n = 33 in the DT and ST conditions, respectively. With respect to Hypothesis 1.2, a t-test revealed no significant difference in the structure score between the ST and DT conditions (M_ST = 1.15, M_DT = 1.24, t = −0.18, p = .573, Cohen's d = −0.048). Hence, Hypothesis 1.2 was not supported by the data.

Footnote 3: I have also calculated an alternative measure of CPS performance, based on goal deviations, which is not reported here. This measure has a similarly peculiar distribution as the reported measure and does not reflect goal attainment as well. The results were qualitatively the same as for trials to criterion.

Table 1. Descriptive statistics of dependent variables from Experiment 1.

                        Single task                     Dual task
                        M      Mdn    SD     Range      M      Mdn    SD     Range
TTC Med 1.1                    12            4 to 16           11.5          4 to 16
TTC Med 1.2                    6             2 to 16           4             2 to 16
TTC Med 2                      4             3 to 16           4             3 to 13
StrucScore Med 1.1      1.15          2.18   -5 to 5    1.24          1.48   -2 to 4
Pulse Med 1.1           2.89          2.01   0 to 8     2.57          1.94   0 to 7

Note. TTC: trials to criterion; StrucScore: structure score, a measure of structural knowledge; Pulse: number of pulse events in the first two rounds.

To test Hypothesis 1.3, I calculated Spearman's rho between the structure score and trials to criterion in Medicine 1. For the whole sample, this results in rho = −.247 (p = .044). In the ST and DT conditions, I obtained rho = −.513 (p = .002) and rho = .020 (p = .913), respectively (z_diff = 2.08, p = .019). Hence, Hypothesis 1.3 was supported.
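The comparison of the two condition-wise correlations rests on Fisher's z transformation. As an illustration, a sketch of the standard formula follows; whether it reproduces the reported z value exactly depends on the sample sizes entered and on the variance formula used for Spearman's rho, so treat the numbers as approximate.

```python
# Sketch: comparing two independent correlations via Fisher's z
# transformation (standard large-sample formula; an illustration,
# not the exact computation used in the paper).
from math import atanh, sqrt

def z_diff(r1, n1, r2, n2):
    """z statistic for H0: rho1 == rho2, independent samples."""
    return (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))

# ST: rho = -.513 (n = 33); DT: rho = .020 (n = 34)
z = z_diff(-0.513, 33, 0.020, 34)  # magnitude a bit above 2
```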
As expected, the correlation is significantly larger in the ST condition. Spearman's rank correlations between the number of pulse inputs in the first two rounds and the structure score were rho = .371 (p = .033) in the ST condition and rho = −.070 (p = .696) in the DT condition (z_diff = 1.72, p = .043), supporting Hypothesis 1.4a. Hypothesis 1.4b was not supported by the data: The correlations between pulse and trials to criterion were rho = −.328 (p = .062) in the ST condition and rho = −.153 (p = .388) in the DT condition (z_diff = 0.68, p = .247).

Hypothesis 1.5 stated better performance of the blocked condition in the far transfer problem. Although the medians of trials to criterion in Medicine 2 do not differ much between the conditions, the expected difference was significant. Participants in the blocked condition solved that problem earlier than those in the spaced condition (U = 477.5, p = .047), so the hypothesis is supported. However, the supposed reason for that, a better acquisition of strategic knowledge in the blocked condition, was not supported in additional analyses: In Medicine 2, participants in the blocked condition used the pulse tactic only slightly more than those in the spaced condition (M = 3.73 vs. M = 3.25, t = 1.00, one-sided p = .160, Cohen's d = 0.162).

Figure 3. Distributions of the number of trials to achieve the goal criterion in the single task and dual task conditions. Sixteen means that the target was not achieved in 15 rounds.

Exploratory Analyses

To analyze differences between the participants of the study, they were assigned to three categories: "sciences" (chemistry, physics, biology, mathematics, engineering sciences), "economics" (economics, law), and "arts & humanities" (history, cultural studies, languages, social sciences). Kruskal-Wallis tests revealed significant effects of the participants' subject of study on trials to criterion in all three problem solving blocks.
Figure 4 shows boxplots of the results in Medicine 1.1 (Panel A) and Medicine 2 (Panel B). We see that science students solved the problem considerably faster than students of other fields of study (Medicine 1.1: χ² = 9.52, df = 2, p = .009; Medicine 1.2: χ² = 6.65, df = 2, p = .036; Medicine 2: χ² = 8.48, df = 2, p = .014). This confirms similar results from earlier studies (Schoppek, 2004; Schoppek & Fischer, 2017).

Figure 4. Boxplots of the number of trials needed to achieve the target in three areas of study (16 means that the target was not achieved in 15 rounds). Left panel: Medicine 1.1; right panel: Medicine 2.

Strategies

To classify the strategies used in the source problem, I calculated the standard deviations of each input variable across all 250 cycles of each round. This allows judging how much a variable was varied by the problem solver. Based on these indicators, three main strategies and two marginal strategies (see Footnote 4) were identified. Strategy alpha is defined by varying the input variables MedB and MedC (both SDs ≥ 0.7; the SD for MedA may be zero, because many participants keep this input constant at a value of 45). Strategy beta is defined by keeping MedC relatively constant (SD < 0.7) and using MedB to control Fontin (SD ≥ 0.7). Strategy gamma is defined the other way round: keeping MedB constant and using MedC to control Fontin. The marginal strategies were "minimal", defined by varying all input variables only slightly (all SDs < 0.7), and "onlyA", defined by varying almost exclusively MedA (SD for MedA ≥ 0.7, all other SDs < 0.7). The marginal strategies were used in 2.1% of all rounds.

Footnote 4: The name is due to the rare occurrence of these strategies.
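The SD-based classification rule just described can be sketched in a few lines. The function name and data layout are mine; the thresholds follow the definitions above.

```python
# Hedged sketch (not the original analysis code): classify a round's
# strategy from the standard deviations of the three input series,
# using the SD threshold of 0.7 described in the text.
from statistics import pstdev

def classify_round(med_a, med_b, med_c, threshold=0.7):
    """med_a/b/c: sequences of slider values across the 250 cycles."""
    sa, sb, sc = (pstdev(x) for x in (med_a, med_b, med_c))
    if sb >= threshold and sc >= threshold:
        return "alpha"    # MedB and MedC both varied (MedA may be constant)
    if sb >= threshold:
        return "beta"     # MedC held relatively constant, MedB varied
    if sc >= threshold:
        return "gamma"    # MedB held constant, MedC varied
    if sa >= threshold:
        return "onlyA"    # almost exclusively MedA varied
    return "minimal"      # all inputs varied only slightly
```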
I removed the first round of all participants from the dataset, because this round was declared an exploration round and participants were instructed to vary all input variables. Of the remaining 728 rounds, 77.5% were classified as alpha, 7.7% as beta, and 12.8% as gamma. The success rates of the strategies were markedly different (see Table 2). As expected, strategy beta was the most successful (68% success rate).

To analyze the relations between strategy and success (having reached the goals), I calculated a generalized linear mixed model with the three main strategies and the participant as predictors and success in each round as the dependent variable. The analysis, which estimates the parameters of a multilevel logistic regression model, was calculated with the function glmer from the R package lme4 (Bates et al., 2015). Note that in this analysis the entity at level 1 is a round of Dynamis2; participants are located on level 2. Therefore, participants enter the model as a predictor and the analysis is based on n = 728 data points. Strategy alpha was used as the baseline, coded with zero in the dummy variables. The marginal strategies were omitted due to their rare occurrence. The variance between participants was factored in by estimating a random intercept for each participant. The estimated parameters are to be interpreted as odds (intercept) or odds ratios (predictors) on a log scale. The estimated value for the intercept was −2.375 (z = −11.26, p < .001), meaning that under the baseline strategy alpha, failure was significantly more likely than success. The log odds ratios for strategies beta and gamma were 3.452 (z = 8.57, p < .001) and 1.824 (z = 5.49, p < .001), respectively. This means that the odds of being successful using strategy beta are e^3.452 = 31.6 times higher than the odds under strategy alpha. For strategy gamma, this ratio is e^1.824 = 6.2.
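The back-transformation from the log scale can be made explicit. The snippet below simply converts the estimates reported above; it does not refit the model, and the comparison with Table 2 is approximate because the model includes a random intercept per participant.

```python
# Sketch: converting the reported glmer estimates (log odds scale)
# into a probability and odds ratios. Values are taken from the text.
from math import exp

intercept = -2.375   # log odds of success under baseline strategy alpha
beta_lor = 3.452     # log odds ratio, beta vs. alpha
gamma_lor = 1.824    # log odds ratio, gamma vs. alpha

p_alpha = 1 / (1 + exp(-intercept))  # ≈ 0.085, near the raw 0.09 in Table 2
or_beta = exp(beta_lor)              # ≈ 31.6
or_gamma = exp(gamma_lor)            # ≈ 6.2
```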
Discussion

Overall, Experiment 1 did not completely support the predictions of the standard model of CPS. The dynamic system Medicine 1 turned out to be equally difficult in the single task (ST) and the dual task (DT) conditions (Hyp. 1.1). Moreover, the participants in the ST condition did not gain better structural knowledge about the system (Hyp. 1.2). In a comparable experiment, Hundertmark, Holt, Fischer, Said, and Fischer (2015) also found a much smaller effect (η² = .01) of a cognitive load manipulation on system control than they had expected. In retrospect, these hypotheses might not have been well justified, because they implicitly assumed that approaching the control task with a greater amount of Type 2 processing would be superior. This assumption can be doubted on general grounds, because the relation between approach and success is moderated by factors such as expertise or cognitive ability (Evans, 2012; Gigerenzer & Brighton, 2011; Gobet & Chassy, 2009). In particular, Dynamis2 problems, with their time pressure, their lack of a log of input variations, and their analog user interface (sliders and graphs), probably invite a Type 1 approach much more than conventional Dynamis applications, which feature input logs, far fewer time steps, and little time pressure.

Another reason why the expected differences were not found in Experiment 1 could be that the concurrent tasks called for different subsystems of working memory (Baddeley, 2007), the sentence verification task being clearly verbal, the Dynamis2 problem being more visual-motor. In devising the hypotheses, I had assumed an important role of the central executive for

Table 2. Strategy use and associated success in two data sets. Proportions refer to individual rounds (each round of each participant was counted).
Strategy                alpha   beta   gamma   minimal   onlyA   All
Experiment 1
  Proportion            0.78    0.08   0.13    0.01      0.01    1
  Rate of success       0.09    0.68   0.34    1         0.13    0.18
Experiment 2
  Proportion            0.84    0.04   0.11    0.01      0.01    1
  Rate of success       0.10    0.54   0.27    1         0.14    0.14

system control, for moving attention between the two tasks, and for the decisions about the meaning of the sentences. It might be that the sentence verification task called for central executive processes far less than expected. This surmise should be tested with varied secondary tasks.

The predictions of the standard model about the role of structural knowledge (Hyp. 1.3) and the pulse tactic (Hyp. 1.4) in controlling the system were confirmed, albeit the effect is due to the ST condition only. Whereas the level of structural knowledge was quite low in both conditions, the range was much larger in the ST condition. The reason for this pattern of results could be that only part of the participants proceeded in accordance with the standard model, that this group was larger in the ST condition, and that this proceeding does not guarantee success (as indicated by the markedly negative minimum of the structure score in the ST condition). This interpretation raises the question of how the other participants proceeded. I will get back to that question below.

Lastly, the expectation that the longer experience with the source problem in the blocked condition is beneficial for the far transfer problem was confirmed (Hyp. 1.5). As participants cannot transfer structural knowledge to the new system (far transfer), this effect must be due to other types of knowledge. However, the data did not support the supposed mediation of the effect through more use of the pulse tactic.

Experiment 2

In this experiment, Dynamis2 was used to induce ego depletion (Baumeister, Vohs, & Tice, 2007) in the context of a study about training self-control through regular physical activity (Schoppek, in prep.).
For the present research, I report only the results related to Dynamis2.

Participants, Design, and Hypotheses

Seventy-seven subjects from the same population as in Experiment 1 participated in the experiment (students of different majors at the University of Bayreuth; 48 female, 29 male). Participants worked on the same problem as the source problem from Experiment 1 for a maximum of 15 rounds. The sentence verification task was also administered concurrently. With the results of Experiment 2, the exploratory results from Experiment 1 can be cross-validated. Therefore, the hypotheses for Experiment 2 were as follows:

Hypothesis 2.1: Science students solve the problem in fewer rounds than students of other majors (particularly faster than students of arts and humanities).

Hypothesis 2.2: Strategy beta is the most successful strategy, followed by strategy gamma, with strategy alpha as the least successful strategy.

The first hypothesis is based not only on the results of Experiment 1, but also on earlier findings with Dynamis2 (Schoppek & Fischer, 2017) and a predecessor system (Schoppek, 2004).

Results

Participants reached the goal criterion within 4 to 16 rounds (16 meaning they never reached it). The median was 13. These values are close to those from Experiment 1 (see Table 1). Experiment 2 confirmed the differences among the students of the three categories of majors (Kruskal-Wallis test, χ² = 9.34, df = 2, p = .009). However, an examination of the medians shows that the differences are due to the poor performance of the arts & humanities students (Mdn = 16). The economics students (Mdn = 11) performed similarly to the science students (Mdn = 11.5). The U-test comparing the combined science and economics group with the arts & humanities group was significant (U = 729, p = .002).
For cross-validation of the strategy results, the same analysis as in Experiment 1 was applied: a generalized linear mixed model with the three main strategies and the participant as predictors and success in each round as the dependent variable. (Please recall that in this analysis the entity at level 1 is a single round; participants are located on level 2.) The present analysis is based on n = 806 data points. The results were qualitatively the same as in Experiment 1: The log odds for the intercept (corresponding to strategy alpha) were −2.316 (z = −11.64, p < .001). Strategy beta was the most successful of the main strategies (log odds ratio = 2.630, z = 5.85, p < .001), followed by strategy gamma (log odds ratio = 1.568, z = 4.28, p < .001). Descriptive statistics for this analysis are displayed in Table 2.

Discussion

The effect of the subject of study on problem solving performance was replicated with respect to the difference between the science students and the arts & humanities students (see Schoppek, 2004). However, the economics students were as fast in solving the problem as the science students. The effect of subject of study again points to the important role of knowledge other than structural knowledge. Science students (and probably economics students, too) have more experience with diagrams of quantitative gradients and with the notion of dynamic systems than students of arts & humanities. Additionally, I have gained the impression that the relevance of controlling dynamic systems to the participants' selves matters. In case of failure, arts students may well take comfort in thinking "this kind of stuff has never been my cup of tea" and disengage from the task.
this is much harder for science students, who probably feel their subject-based self-esteem challenged in the face of failure. this hypothesis should be investigated in future studies. the assumed relatedness of systems of the dynamis type to certain topics in the sciences is consistent with findings that problem solving in microdyn correlates most closely with school grades in math and science (greiff, fischer, et al., 2013; greiff, wüstenberg, et al., 2013). it is also in line with findings from pisa 2003 that math and science competence contribute significantly to problem solving across 41 countries (scherer & beckmann, 2014). experiment 2 also replicated the role of the different strategies. in both experiments, the less effective strategies alpha and gamma prevailed (whereby strategy alpha, characterized by consistently varying two input variables, might not even be worthy of the name “strategy”). this prevented many participants from reaching the goals. participants used the more sophisticated strategy beta, which had a much higher rate of success, in only 4% (exp. 1: 8%) of all rounds. this preference for self-evident but inadequately simple strategies is a further instance of the tendency to economize (dörner, 1996).

general discussion

there is now ample evidence that structural knowledge is beneficial for controlling complex dynamic systems (funke, 1993; greiff et al., 2013; schoppek & fischer, 2017), but also that by no means all participants conform to the standard model (fischer et al., 2012). this also became apparent in a study using microdyn (stadler, hofer, & greiff, 2020), where individual differences in problem solving behavior were found among participants who obtained the same cps scores. in the present experiments, many participants preferred an intuitive approach, which is on average less successful.
one of the most important research questions for the future is therefore to investigate the conditions under which problem solvers switch from the “default mode”, which is dominated by type 1 processes, to effortful thinking, which involves much type 2 processing. this question is relevant not only to problem solving, but also to judgment and decision making. existing research suggests one answer to that question: rewards. although kahneman, slovic, and tversky (1982) obtained their findings about heuristic judgment even though they rewarded their participants for correct answers (e.g., kahneman & tversky, 1972), there is evidence from diverse areas that attractive rewards motivate individuals to engage in effortful control or thought. they instigate persons to overcome ego depletion (muraven & slessareva, 2003), they can markedly reduce adhd symptoms in an experimental setting (liddle et al., 2011), and they counteract fatigue (inzlicht & berkman, 2015). the interpretation in terms of threatened self-esteem in science students can also be subsumed under this account, albeit with a negative incentive in that case. this is in line with the statement by inzlicht and berkman (2015) that “affirming some core value . . . similarly prevents the reductions in self-control” (p. 516). cdc tasks like dynamis2 are well suited for investigating the potency of such mechanisms in problem solving. their complexity, dynamics, time scale, and interactivity make such tasks more similar to real-life requirements than the more artificial, highly standardized, and short system control items in the multiple complex systems approach (greiff et al., 2012; neubert et al., 2015). future research needs to clarify the relations among effortful thinking, its behavioral indicators, and success. for instance, kahneman (2011) described the pupil reaction as an indicator of type 2 processing.
in the present study, i took strategy beta as an indicator of type 2 processing and strategy gamma as an indicator of type 1 processing. this operationalization, as well as other indicators, should be validated further. similarly, the relation between reasoning and success is not trivial. kahneman, slovic, and tversky (1982) have been criticized for almost equating reasoning with normative solutions (gigerenzer & brighton, 2011). with respect to this problem, evans (2012) stated that “normative correctness cannot be a defining feature of type 2 processing because it is an externally imposed evaluation and not intrinsic to definitions based upon explicit processing through working memory" (p. 123). the relation can therefore be subjected to empirical investigation. as in other areas (stanovich & west, 1999), one would expect an advantage of the “analytic approach” to cps that is moderated by individual differences in intelligence (greiff et al., 2013). however, even an approach that is dominated by type 2 processing can be automated with extensive practice and hence become less dependent on cognitive ability (a phenomenon closely associated with the elshout-raaheim hypothesis, which has recently been confirmed in a cps study; weise et al., 2020). to make progress in understanding and predicting cps, we need a more general theory of problem solving, or, as beckmann (2019) stated, “some ex ante ideas are needed about both the real-life problem and the laboratory task” (p. 3). models that are tailored to a narrow class of tasks, like the standard model of cps, are only helpful when they are embedded in a more general theoretical framework, like the dp approach. to that end, i envision a theory of mental states, which characterizes classes of states and specifies the rules that govern the transitions among those states.
this description applies to a number of important and more or less successful theories: the rubicon model of action phases (gollwitzer, 1990), flow theory (csikszentmihalyi et al., 2014), and the resource model of self-control (baumeister et al., 2007) with its recent modifications by inzlicht and schmeichel (2012). for example, ego depletion is characterized as a state in which persons are not willing or not able to exert effortful control. persons enter this state after having exerted effortful control for a while, and exit it when consuming sugar or experiencing humor, among other things. given these assumptions, trying to reach the goals in a dynamis2 scenario using an analytical approach, which involves much type 2 processing, can lead to ego depletion. on the other hand, it is conceivable that participants become so involved in the control task that they experience a state of flow. csikszentmihalyi, abuhamdeh, and nakamura (2014) characterize flow as “intense experiential involvement in moment-to-moment activity. attention is fully invested in the task at hand, and the person functions at his or her fullest capacity” (p. 214). this apparently involves type 2 processing. to my knowledge, it has not been investigated whether flow is usually followed by ego depletion. as the activities during a state of flow are not accompanied by feelings of labor, i suppose it is not. from a dp perspective, flow can be described as resulting from a seamless interplay between a bird’s eye view of the situation, which is maintained and handled by type 2 processing, and a broad array of potent type 1 processes that are orchestrated through decisions at the top level (type 2). these are just a few examples of existing connection points that might enable a unification of these theories in the future.
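as a toy illustration (my own sketch, not a published model), such a theory of mental states can be thought of as a small transition system: a set of state classes plus rules that map a state and an event to a successor state, here populated with the ego-depletion and flow examples from the text:

```python
# hypothetical transition rules, loosely paraphrasing the examples above;
# the state and event names are invented for illustration
TRANSITIONS = {
    # (current state, event) -> next state
    ("rested", "prolonged_effortful_control"): "depleted",
    ("depleted", "consume_sugar"): "rested",
    ("depleted", "experience_humor"): "rested",
    ("rested", "absorbing_control_task"): "flow",
    ("flow", "interruption"): "rested",
}

def step(state: str, event: str) -> str:
    """apply one transition rule; events without a rule leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

state = "rested"
for event in ("prolonged_effortful_control", "experience_humor",
              "absorbing_control_task"):
    state = step(state, event)
print(state)  # trace: rested -> depleted -> rested -> flow
```

the point of the sketch is only structural: once states and transition rules are written down explicitly, open questions such as “is flow followed by depletion?” become questions about which rules exist, and competing theories can be compared rule by rule.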
i regard such a unified theory of mental states as a convenient framework for specific theories about problem solving in dynamic and uncertain situations – also known as cps. as mentioned earlier, effortful thinking does not always generate better results than an intuitive approach; but in general, overcoming the tendency to economize is desirable, not just in the laboratory, but also in real life.

acknowledgements: i want to thank the three reviewers and the action editor for their valuable comments, which helped improve earlier versions of the manuscript.

declaration of conflicting interests: the author declares no conflicts of interest.

peer review: in a blind peer review process, jens f. beckmann, andré kretzschmar, and matthias stadler have reviewed this article before publication. all reviewers have approved the disclosure of their names after the end of the review process.

handling editor: varun dutt

copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license.

citation: schoppek, w. (2023). a dual processing approach to complex problem solving. journal of dynamic decision making, 9, 1–17. doi:10.11588/jddm.2023.1.76662

received: 09.11.2020 accepted: 15.03.2023 published: 20.06.2023

references

ackerman, p. l. (1990). a correlational analysis of skill specificity: learning, abilities, and individual differences. journal of experimental psychology: learning, memory, and cognition, 16(5), 883–901. https://doi.org/10.1037/0278-7393.16.5.883
anderson, j. r., fincham, j. m., & douglass, s. (1997). the role of examples and rules in the acquisition of a cognitive skill. journal of experimental psychology: learning, memory, and cognition, 23, 932–945.
anderson, j. r., & lebiere, c. (1998). the atomic components of thought. mahwah, nj: erlbaum.
baddeley, a. d. (2007). working memory, thought and action. oxford: oxford university press.
barnard, c., & simon, h. a. (1947). administrative behavior.
a study of decision-making processes in administrative organization. new york: free press.
basten, u., stelzel, c., & fiebach, c. j. (2013). intelligence is differentially related to neural effort in the task-positive and the task-negative brain network. intelligence, 41(5), 517–528. https://doi.org/10.1016/j.intell.2013.07.006
bates, d., maechler, m., bolker, b., & walker, s. (2015). fitting linear mixed-effects models using lme4. journal of statistical software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
baumeister, r. f., & tierney, j. (2011). die macht der disziplin: wie wir unseren willen trainieren können [the power of discipline: how we can train our will]. frankfurt/new york: campus-verlag.
baumeister, r. f., vohs, k. d., & tice, d. m. (2007). the strength model of self-control. current directions in psychological science, 16(6), 351–355. http://dx.doi.org/10.1111/j.1467-8721.2007.00534.x
beckmann, j. (1994). lernen und komplexes problemlösen [learning and complex problem solving]. bonn: holos.
beckmann, j. f. (2019). heigh-ho: cps and the seven questions – some thoughts on contemporary complex problem solving research. journal of dynamic decision making, 5(12). https://doi.org/10.11588/jddm.2019.1.69301
beckmann, j. f., & goode, n. (2017). missing the wood for the wrong trees: on the difficulty of defining the complexity of complex problem solving scenarios. journal of intelligence, 5(15), 1–18. https://doi.org/10.3390/jintelligence5020015
beckmann, j. f., & guthke, j. (1995).
complex problem solving, intelligence, and learning ability. in p. a. frensch & j. funke (eds.), complex problem solving: the european perspective (pp. 177–200). psychology press.
betsch, t., haberstroh, s., glöckner, a., haar, t., & fiedler, k. (2001). the effects of routine strength on adaptation and information search in recurrent decision making. organizational behavior and human decision processes, 84(1), 23–53. https://doi.org/10.1006/obhd.2000.2916
boeck, p. de, & kovacs, k. (2020). the many faces of intelligence: a discussion of geary's mitochondrial functioning theory on general intelligence. journal of intelligence, 8(1). https://doi.org/10.3390/jintelligence8010008
broadbent, d. e., fitzgerald, p., & broadbent, m. h. p. (1986). implicit and explicit knowledge in the control of complex systems. british journal of psychology, 77(1), 33–50. https://doi.org/10.1111/j.2044-8295.1986.tb01979.x
buchner, a., funke, j., & berry, d. c. (1995). negative correlations between control performance and verbalizable knowledge: indicators for implicit learning in process control tasks? the quarterly journal of experimental psychology. a, human experimental psychology, 48(1), 166–187. https://doi.org/10.1080/14640749508401383
clausewitz, c. von (1832/1991). vom kriege [on war] (ed. werner hahlweg). bonn: dümmler.
csikszentmihalyi, m., abuhamdeh, s., & nakamura, j. (2014). flow. in m. csikszentmihalyi (ed.), flow and the foundations of positive psychology: the collected works of mihaly csikszentmihalyi (pp. 227–238). heidelberg, new york: springer.
daneman, m., & carpenter, p. a. (1980). individual differences in working memory and reading. journal of verbal learning & verbal behavior, 19, 450–466. doi:10.1016/s0022-5371(80)90312-6
davis, z. j., bramley, n. r., & rehder, b. (2020). causal structure learning in continuous systems. frontiers in psychology, 11, 244. https://doi.org/10.3389/fpsyg.2020.00244
debatin, t. (2019).
a revised mental energy hypothesis of the g factor in light of recent neuroscience. review of general psychology, 23(2), 201–210.
diamond, a. (2013). executive functions. annual review of psychology, 64(1), 135–168. https://doi.org/10.1146/annurev-psych-113011-143750
dienes, z., & fahey, r. (1995). role of specific instances in controlling a dynamic system. journal of experimental psychology: learning, memory, and cognition, 21(4), 848–862.
dienes, z., & fahey, r. (1998). the role of implicit memory in controlling a dynamic system. the quarterly journal of experimental psychology. a, human experimental psychology, 51(3), 593–614. https://doi.org/10.1080/713755772
dörner, d. (1980). on the difficulties people have in dealing with complexity. simulation & games, 11(1), 87–106.
dörner, d. (1996). the logic of failure: recognizing and avoiding error in complex situations. new york, ny: basic books.
dörner, d., & funke, j. (2017). complex problem solving: what it is and what it is not. frontiers in psychology, 8(1153), 1–11. https://doi.org/10.3389/fpsyg.2017.01153
dörner, d., & schaub, h. (1994). errors in planning and decision-making and the nature of human information processing. applied psychology, 43(4), 433–453.
evans, j. st. b. t. (2008). dual-processing accounts of reasoning, judgment, and social cognition. annual review of psychology, 255–278. http://dx.doi.org/10.1146/annurev.psych.59.103006.093629
evans, j. st. b. t. (2012). spot the difference: distinguishing between two kinds of processing. mind & society, 11(1), 121–131. http://dx.doi.org/10.1007/s11299-012-0104-2
evans, j. st. b. t., & stanovich, k. e. (2013). dual-process theories of higher cognition: advancing the debate. perspectives on psychological science, 8(3), 223–241.
faul, f., erdfelder, e., buchner, a., & lang, a.-g. (2009). statistical power analyses using g*power 3.1: tests for correlation and regression analyses. behavior research methods, 41, 1149–1160.
fischer, a., greiff, s., & funke, j. (2012). the process of solving complex problems. j. probl. solv., 4, 19–42. doi:10.7771/1932-6246.1118
fox, m. d., & raichle, m. e. (2007). spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging. nature reviews neuroscience, 8(9), 700–711. https://doi.org/10.1038/nrn2201
funke, j. (1993). microworlds based on linear equation systems: a new approach to complex problem solving and experimental results. in g. strube & k. f. wender (eds.), knowledge and performance in complex problem solving (pp. 313–330). elsevier.
fum, d., & stocco, a. (2003). instance vs. rule-based learning in controlling a dynamic system. in f. detje, d. dörner, & h. schaub (eds.), proceedings of the international conference on cognitive modelling (pp. 105–110). universitätsverlag bamberg.
gigerenzer, g., & brighton, h. (2011). homo heuristicus: why biased minds make better inferences. in g. gigerenzer, r. hertwig, & t. pachur (eds.), heuristics: the foundations of adaptive behavior (pp. 2–27). oxford, new york: oxford university press.
gigerenzer, g., hertwig, r., & pachur, t. (eds.). (2016). heuristics: the foundations of adaptive behavior (first issued as an oxford university press paperback). oxford university press. https://doi.org/10.1093/acprof:oso/9780199744282.001.0001
gobet, f., & chassy, p. (2009). expertise and intuition: a tale of three theories. minds and machines, 19(2), 151–180. http://dx.doi.org/10.1007/s11023-008-9131-5
greiff, s., fischer, a., wüstenberg, s., sonnleitner, p., brunner, m., & martin, r. (2013). a multitrait-multimethod study of assessment instruments for complex problem solving. intelligence, 41(5), 579–596. http://dx.doi.org/10.1016/j.intell.2013.07.012
greiff, s., & funke, j. (2009).
measuring complex problem solving: the microdyn approach. in f. scheuermann & j. björnsson (eds.), the transition to computer-based assessment: lessons learned from large-scale surveys and implications for testing (pp. 157–163). luxembourg: office for official publications of the european communities.
greiff, s., wüstenberg, s., & funke, j. (2012). dynamic problem solving: a new assessment perspective. applied psychological measurement, 36(3), 189–213. https://doi.org/10.1177/0146621612439620
greiff, s., wüstenberg, s., molnár, g., fischer, a., funke, j., & csapó, b. (2013). complex problem solving in educational contexts—something beyond g: concept, assessment, measurement invariance, and construct validity. journal of educational psychology, 105(2), 364–379. https://doi.org/10.1037/a0031856
gollwitzer, p. (1990). action phases and mind-sets. in e. t. higgins & r. m. sorrentino (eds.), the handbook of motivation and cognition: foundations of social behavior (pp. 53–92). new york, ny: guilford press.
howarth, c., gleeson, p., & attwell, d. (2012). updated energy budgets for neural computation in the neocortex and cerebellum.
journal of cerebral blood flow and metabolism, 32(7), 1222–1232. https://doi.org/10.1038/jcbfm.2012.35
hundertmark, j., holt, d. v., fischer, a., said, n., & fischer, h. (2015). system structure and cognitive ability as predictors of performance in dynamic system control tasks. journal of dynamic decision making, 1, 1–10. doi:10.11588/jddm.2015.1.26416
hunt, e. (2010). human intelligence. cambridge university press.
inzlicht, m., & berkman, e. (2015). six questions for the resource model of control (and some answers). social and personality psychology compass, 9(10), 511–524. https://doi.org/10.1111/spc3.12200
inzlicht, m., & schmeichel, b. j. (2012). what is ego depletion? toward a mechanistic revision of the resource model of self-control.
perspectives on psychological science, 7(5), 450–463. http://dx.doi.org/10.1177/1745691612454134
kästner, l. (2018). integrating mechanistic explanations through epistemic perspectives. studies in history and philosophy of science, 68, 68–79. https://doi.org/10.1016/j.shpsa.2018.01.011
kahneman, d., slovic, p., & tversky, a. (1982). judgment under uncertainty: heuristics and biases. cambridge university press.
kahneman, d., & tversky, a. (1972). subjective probability: a judgment of representativeness. cognitive psychology, 430–454. http://dx.doi.org/10.1016/0010-0285%2872%2990016-3
kahneman, d. (2011). thinking, fast and slow. macmillan.
keren, g., & schul, y. (2009). two is not always better than one: a critical evaluation of two-system theories. perspectives on psychological science, 4(6), 533–550.
kretzschmar, a., hacatrjana, l., & rascevska, m. (2017). re-evaluating the psychometric properties of microfin: a multidimensional measurement of complex problem solving or a unidimensional reasoning test? psychological test and assessment modeling, 59(2), 157–182.
kretzschmar, a., & süß, h. m. (2015). a study on the training of complex problem solving competence. journal of dynamic decision making, 1(1), 1–14. https://doi.org/10.11588/jddm.2015.1.15455
kruglanski, a. w., & gigerenzer, g. (2011). intuitive and deliberative judgements are based on common principles. psychological review, 118, 97–109.
lamport, d. j., lawton, c. l., mansfield, m. w., & dye, l. (2009). impairments in glucose tolerance can have a negative impact on cognitive function: a systematic research review. neuroscience & biobehavioral reviews, 33(3), 394–413.
lennie, p. (2003). the cost of cortical computation. current biology, 13(6), 493–497. https://doi.org/10.1016/s0960-9822(03)00135-0
liddle, e. b., hollis, c., batty, m. j., groom, m. j., totman, j. j., liotti, m., & liddle, p. f. (2011).
task-related default mode network modulation and inhibitory control in adhd: effects of motivation and methylphenidate. journal of child psychology and psychiatry, 52(7), 761–771. https://doi.org/10.1111/j.1469-7610.2010.02333.x
logan, g. d. (1988). toward an instance theory of automatization. psychological review, 95(4), 492–527.
lotz, c., scherer, r., greiff, s., & sparfeldt, j. r. (2017). intelligence in action – effective strategic behaviors while solving complex problems. intelligence, 64, 98–112. doi:10.1016/j.intell.2017.08.002
luchins, a. s. (1942). mechanization in problem solving: the effect of einstellung. psychological monographs, 54, 95–111.
müller, r., & urbas, l. (2020). adapt or exchange: making changes within or between contexts in a modular plant scenario. journal of dynamic decision making, 1, 1–10.
muraven, m., & slessareva, e. (2003). mechanisms of self-control failure: motivation and limited resources. personality & social psychology bulletin, 29(7), 894–906.
neubert, j. c., kretzschmar, a., wüstenberg, s., & greiff, s. (2015). extending the assessment of complex problem solving to finite state automata. european journal of psychological assessment, 31(3), 181–194. https://doi.org/10.1027/1015-5759/a000224
newell, a. (1973). you can't play 20 questions with nature and win: projective comments on the papers of this symposium. in w. g. chase (ed.), visual information processing. new york, ny: academic press.
newell, a. (1994). unified theories of cognition. harvard university press.
norman, d. a., & shallice, t. (1986). attention to action: willed and automatic control of behavior. in r. j. davidson (ed.), consciousness and self-regulation (pp. 1–18). new york, ny: springer.
öllinger, m. (2017). problemlösen [problem solving]. in j. müsseler & m. rieger (eds.), allgemeine psychologie (pp. 587–618). springer berlin heidelberg. https://doi.org/10.1007/978-3-642-53898-8_16
osman, m. (2010).
controlling uncertainty: a review of human behavior in complex dynamic environments. psychological bulletin, 136(1), 65–86.
osman, m., glass, b., & hola, z. (2015). approaches to learning to control dynamic uncertainty. systems, 3, 211–236. https://doi.org/10.3390/systems3040211
quesada, j., kintsch, w., & gomez, e. (2005). complex problem-solving: a field in search of a definition? theoretical issues in ergonomics science, 6(1), 5–33. http://dx.doi.org/10.1080/14639220512331311553
raichle, m. e. (2015). the brain’s default mode network. annual review of neuroscience, 38(1), 433–447. https://doi.org/10.1146/annurev-neuro-071013-014030
scherer, r., & beckmann, j. f. (2014). the acquisition of problem solving competence: evidence from 41 countries that math and science education matters. large scale assessment in education, 2(10). https://doi.org/10.1186/s40536-014-0010-7
schoppek, w. (in prep.). increasing self-regulatory strength through regular physical exercise?
schoppek, w. (2002). examples, rules, and strategies in the control of dynamic systems. cognitive science quarterly, 2(1), 63–92.
schoppek, w. (2004). direction of causality makes a difference. in k. forbus, d. gentner, & t.
regier (eds.), proceedings of the twenty-sixth annual conference of the cognitive science society (pp. 1219–1224). mahwah, nj: erlbaum.
schoppek, w. (2019). a flashlight on attainments and prospects of research into complex problem solving. j. dynam. decis. making, 5, 8.
schoppek, w., & fischer, a. (2017). common process demands of two complex dynamic control tasks: transfer is mediated by comprehensive strategies. frontiers in psychology, 8, 2145.
stadler, m., becker, n., gödker, m., leutner, d., & greiff, s. (2015). complex problem solving and intelligence: a meta-analysis. intelligence, 53, 92–101. https://doi.org/10.1016/j.intell.2015.09.005
stadler, m., hofer, s., & greiff, s. (2020).
first among equals: log data indicates ability differences despite equal scores. computers in human behavior, 111, 106442. https://doi.org/10.1016/j.chb.2020.106442
stanovich, k. e., & toplak, m. e. (2012). defining features versus incidental correlates of type 1 and type 2 processing. mind & society, 11(1), 3–13. https://doi.org/10.1007/s11299-011-0093-6
stanovich, k. e., & west, r. f. (1999). individual differences in reasoning and the heuristics and biases debate. in p. l. ackerman, p. c. kyllonen, & r. d. roberts (eds.), learning and individual differences (pp. 389–415). washington, dc: apa.
sun, r., slusarz, p., & terry, c. (2005). the interaction of the explicit and the implicit in skill learning: a dual-process approach. psychological review, 112(1), 159.
taatgen, n. a., & wallach, d. (2002). whether skill acquisition is rule or instance based is determined by the structure of the task. cognitive science quarterly, 2(2), 163–204.
tschirgi, j. e. (1980). sensible reasoning: a hypothesis about hypotheses. child development, 51(1), 1. https://doi.org/10.2307/1129583
unsworth, n., spillers, g. j., & brewer, g. a. (2009). examining the relations among working memory capacity, attention control, and fluid intelligence from a dual-component framework. psychology science, 51(4), 388–402.
vaishnavi, s. n., vlassenko, a. g., rundle, m. m., snyder, a. z., mintun, m. a., & raichle, m. e. (2010). regional aerobic glycolysis in the human brain. proceedings of the national academy of sciences of the united states of america, 107(41), 17757–17762. https://doi.org/10.1073/pnas.1010459107
van merriënboer, j. j. g. (1997). training complex cognitive skills. englewood cliffs, nj: educational technology publications.
vollmeyer, r., burns, b. d., & holyoak, k. j. (1996). the impact of goal specificity on strategy use and the acquisition of problem structure. cognitive science, 20(1), 75–100. http://dx.doi.org/10.1016/s0364-0213%2899%2980003-2
weise, j.
j., greiff, s., & sparfeldt, j. r. (2020). the moderating effect of prior knowledge on the relationship between intelligence and complex problem solving – testing the elshout-raaheim hypothesis. intelligence, 83, 101502. https://doi.org/10.1016/j.intell.2020.101502
woods, d. d., roth, e. m., stubler, w. f., & mumaw, r. j. (1990). navigating through large display networks in dynamic control applications. in proceedings of the human factors society annual meeting (vol. 4, pp. 396–399). sage.
wüstenberg, s., greiff, s., & funke, j. (2012). complex problem solving—more than reasoning? intelligence, 40, 1–14.

original research

a web-based feedback study on optimization-based training and analysis of human decision making

michael engelhart1, joachim funke2, and sebastian sager1
1 faculty of mathematics, otto-von-guericke-universität magdeburg and 2 ruprecht-karls-universität heidelberg

the question of how humans can learn efficiently to make decisions in a complex, dynamic, and uncertain environment is still very much open. we investigate what effects arise when feedback is given in a computer-simulated microworld that is controlled by participants. this has a direct impact on training simulators that are already in standard use in many professions, e.g., flight simulators for pilots, and a potential impact on a better understanding of human decision making in general.
our study is based on a benchmark microworld with an economic framing, the iwr tailorshop. n = 94 participants played four rounds of the microworld, each covering 10 months, via a web interface. we propose a new approach to quantifying performance and learning, which is based on a mathematical model of the microworld and optimization. six participant groups received different kinds of feedback in a training phase; the results of a subsequent performance phase without feedback were then analyzed. as a main result, feedback of optimal solutions in training rounds improved model knowledge, early learning, and performance, especially when this information was encoded in a graphical representation (arrows).

keywords: complex problem solving, training, dynamic decision making, feedback, mixed-integer nonlinear optimization, tailorshop

modern life imposes daily decision making, often with important consequences. illustrative examples are politicians who decide on actions to overcome a financial crisis, medical doctors who decide on complementary chemotherapy drug delivery strategies, or entrepreneurs who decide on long-term strategies for their company. the process of human decision making is the subject of research in the field of complex problem solving (cps), which deals with complex problems. the complexity may result from one or several characteristics, such as a coupling of subsystems, nonlinearities, dynamic changes, opaqueness, or others (dörner, 1980). such problems are considered to be similar to problems we encounter and solve in everyday life. thus, the investigation of cps is claimed to yield more insight into real-world human decision making than simple problems with a well-defined problem space, like the tower of hanoi. apparently, our introductory examples are complex problems and, as such, they are ill-defined.
More precisely, their problem space is open, and a problem solver has to deal with many variables, dependencies, and dynamics, which makes them complex problems: Which information is relevant? How is the data connected? What is the exact aim? The main intention in CPS research is to understand how certain exogenous variables influence a solution process. In general, personal and situational variables are differentiated. The most typical and frequently analyzed personal variable is intelligence; there is an ongoing debate about how intelligence influences complex problem solving (Wittmann & Hattrup, 2004). Other interesting personal variables are working memory (Robbins et al., 1996), amount of knowledge (Kluwe, 1993), and emotion regulation (Otto & Lantermann, 2004). Situational variables like the impact of goal specificity and observation (Osman, 2008), feedback (Brehmer, 1995), and time constraints (Gonzalez, 2004) have attracted less attention. In a recent work (Selten, Pittnauer, & Hohnisch, 2012), an abstract computer-simulated monopoly market is used to investigate dynamic decision making based on the choice of goal systems. For investigations in the field of CPS, computer-based simulations of small parts of the real world, so-called microworlds, are frequently used. These simulations present users with situations similar to those encountered when attempting to solve real-world complex problems, but offer researchers the possibility to conduct studies under controlled conditions. In CPS, the performance of participants in a clearly defined microworld is investigated, evaluated, and correlated with certain characteristics, such as the participant's capacity to regulate emotions.

Previous research with the microworld Tailorshop

One microworld that combines a variety of properties such as dynamics, complexity and interdependence, discrete choices, lack of transparency, and polytely in an economic framing is the Tailorshop.
Participants have to make economic decisions to maximize the overall balance of a small company specialized in the production and sale of shirts. The Tailorshop is sometimes referred to as the Drosophila of CPS researchers (Funke, 2010) and is thus a prominent example of a computer-based microworld. It has been used in a large number of studies, e.g., Putz-Osterloh, Bott, and Köster (1990); Kluwe, Misiak, and Haider (1991); Kleinmann and Strauß (1998); Meyer and Scholl (2009); Barth (2010); Barth and Funke (2010). Comprehensive reviews of studies with the Tailorshop have also been published, e.g., Frensch and Funke (1995); Funke (2003); Funke and Frensch (2007); Funke (2010). The calculation of indicator functions to measure the performance of CPS participants is by no means trivial. To measure performance within the Tailorshop microworld, different indicator functions have been proposed in the literature; see Danner, Hagemann, Schankin, Hager, and Funke (2011) for a recent review. Hörmann and Thomas (1989) proposed a comparison of the variable which the participants were requested to maximize. Such a performance criterion seems natural. However, it cannot yield insight into the temporal process, and it is not objective in the sense that performance depends on what other participants achieved. Analyzing the temporal evolution of other variables of this microworld has also been proposed (see, e.g., Putz-Osterloh (1981); Süß, Oberauer, and Kersting (1993); Funke (1983); Barth and Funke (2010)).

Corresponding author: Sebastian Sager, Otto-von-Guericke-Universität Magdeburg: sager@ovgu.de

10.11588/jddm.2017.1.34608 | JDDM | 2017 | Volume 3 | Article 2
An obvious drawback of comparing the development of variables which were not the actual objective for the participants is that a monotonic development does not necessarily indicate good or even optimal decision making. The lack of an objective performance indicator is an obstacle for analysis, and it has often been argued that inconsistent findings are due to the fact that an objective indicator function yielding detailed insight into the participants' performance is not available, e.g., in Wenke and Frensch (2003). To overcome this problem, we propose to use indicator functions based on optimal solutions. In Sager, Barth, Diedam, Engelhart, and Funke (2010) as well as in Sager, Barth, Diedam, Engelhart, and Funke (2011), the question of how to obtain a reliable performance indicator for the Tailorshop microworld has been addressed. Because all previously used indicators have unknown reliability and validity, decisions are compared to mathematically optimal solutions. For the first time, a complex microworld such as the Tailorshop has been described in terms of a mathematical model. Thus, the assumption that the fruit fly of complex problem solving is not mathematically accessible has been disproved. This novel methodological approach has also been combined with experimental studies (Barth, 2010; Barth & Funke, 2010; Sager et al., 2011) but, beyond these works, has to our knowledge not yet received much attention.

Training and relation to optimization

With tasks for humans becoming more complex in the real world, there is also an increasing need to train and assist persons performing complex tasks. In Hüfner, Tometzki, Kraja, and Engell (2011), a framework for training engineering students in designing controllers for complex systems like chemical reactors is presented. In this approach, students can learn from the results of simulations depending on their inputs.
In the context of CPS, an interesting approach would be to determine optimal solutions and corresponding controls for a microworld in order to compute feedback for participants to support and train them. However, as Cronin, Gonzalez, and Sterman (2009) show, the presentation of information in a dynamic context is crucial for the success of the participants. To the best of our knowledge, there have been no studies investigating the effects of optimization-based feedback. So far, CPS microworlds have been developed in a purely disciplinary trial-and-error approach. A systematic development of CPS microworlds based on a mathematical model, sensitivity analysis, and eventually optimization methods to choose parameters that lead to a desired behavior of the complex system has not yet been applied. An example of this necessity is the fact that the mathematical modeling of the Tailorshop microworld in Sager et al. (2011) led to the discovery of unwanted and unrealistic winning strategies. Based on this experience with modeling oddities, bugs, and other undesirable properties, a new microworld, designed from scratch as a mathematical model for CPS, has been built by Engelhart, Funke, and Sager (2013): the IWR Tailorshop. The IWR Tailorshop is the first CPS test scenario with functional relations and model parameters that have been formulated based on optimization results, yielding desirable (mathematical) properties. Compared to the Tailorshop, the setting is slightly more general. For example, machines have been replaced by production sites, and vans by distribution sites. The optimization problems that need to be solved in the context of the IWR Tailorshop scenario are mixed-integer nonlinear programs (MINLPs) with non-convex continuous relaxations. Whenever optimization problems involve variables of both continuous and discrete nature, the term mixed-integer is used. In this case they can be interpreted as discretized mixed-integer optimal control problems (DMIOCPs).
We use the mathematical approaches presented in Engelhart et al. (2013) and Engelhart (2015), which are based on a tailored decomposition technique, to determine ε-optimal solutions for the IWR Tailorshop in (almost) real time.

About this study

In the interest of a compact presentation, we focus on the most important results of a study which has been described in full detail in the PhD thesis of Engelhart (2015).

Method

We describe the Tailorshop microworld, the feedback study with the experimental groups, the hypotheses, a prestudy, details of the data collection, and the statistical methods.

IWR Tailorshop: a new complex microworld

We work with a systematically built new microworld with controlled properties, the IWR Tailorshop. It was first described in Engelhart et al. (2013) and Engelhart (2015) and is based on the economic framing of the Tailorshop. Table 1 lists all states and controls (interventions for the participants) that the IWR Tailorshop contains, together with the corresponding units. The final mathematical model of the IWR Tailorshop consists of 14 state variables x (i.e., dependent variables) and 10 control variables u (i.e., independent variables), including 5 integer controls. All equations and constraints, the objective function, and the parameter and initial values are specified in the Appendix.

States                      Variable     Unit*
Employees                   xem          person(s)
Production sites            xps          site(s)
Distribution sites          xds          site(s)
Shirts in stock             xsh          shirt(s)
Resources in stock          xrs          shirt(s)
Production                  xpr          shirt(s)
Sales                       xsa          shirt(s)
Demand                      xde          shirt(s)
Reputation                  xre          —
Shirts quality              xsq          —
Machine quality             xmq          —
Resources quality           xrq          —
Motivation of empl.         xmo          —
Resources price**           xrp          MU/shirt
Capital                     xca          MU

Controls                    Variable     Unit*
Shirt price                 usp          MU/shirt
Advertising                 uad          MU
Wages                       uwa          MU/person
Working conditions**        uwc          MU
Maintenance                 uma          MU
Buy resources**             udrs         shirt(s)
Sell resources**            udrs         shirt(s)
Resources quality           urq          —
Recruit/dismiss empl.       udem/udem    person(s)
Create production site      udps         site(s)
Close production site       udps         site(s)
Create distribution site    udds         site(s)
Close distribution site     udds         site(s)

Table 1. States and controls in the IWR Tailorshop microworld (* MU means monetary units, ** not part of the final model for the web-based study).

The equations describe how the different state and control variables are connected. Some of these equations are trivial, as, for example, the number of production sites (xps) in Equation (A.1b) in the Appendix, where the numbers of newly created (udps) or closed production sites (udps) are added to or subtracted from the current value. Others involve more variables and include nonlinear expressions, e.g., the demand, which depends nonlinearly on shirt price, advertisement, reputation, and others; compare Equation (A.1d). These mathematical relations are not transparent to the study participants, as it is part of the task to explore and understand the microworld. The objective is the maximization of the capital at the end of the discrete time scale used in this work; see Equation (A.4) in the Appendix. The constraints are basically bounds on the controls or non-negativity of variables. The objective is communicated to the participants; the constraints can be determined from the admissible values in the web interface. The IWR Tailorshop, including the different optimization-based feedback methods, has been implemented in a web-based interface; compare Figure 1. For the analysis of data collected with this interface, optimization-based analysis methods have been implemented in the analysis software antils.
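To make the distinction between trivial bookkeeping equations and nonlinear model equations concrete, here is a minimal sketch. This is not the authors' model code: the site update mirrors the structure described for Equation (A.1b), while the demand formula is an invented stand-in (the actual Equation (A.1d) is only given in the appendix).

```python
import math


def update_production_sites(x_ps: int, u_create: int, u_close: int) -> int:
    """Trivial linear update in the spirit of Eq. (A.1b):
    sites_{k+1} = sites_k + newly created - closed."""
    return x_ps + u_create - u_close


def demand_stand_in(price: float, advertising: float, reputation: float) -> float:
    """Hypothetical nonlinear demand: decreases with price, increases with
    saturation in advertising and reputation. NOT the model's actual Eq. (A.1d)."""
    return (1000.0 * reputation * math.exp(-0.05 * price)
            * (1.0 + math.log1p(advertising / 100.0)))
```

Note how even this toy demand couples three variables nonlinearly, which is precisely what makes the relations opaque to participants who only observe the resulting state trajectories.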
Both the web front end and the analysis back end are available as open-source software under the GPL (GNU General Public License) and can thus easily be used for further investigations. Analysis and feedback based on optimal solutions enabled insights into human decision making which would not otherwise have been possible.

A web-based feedback study

From November to December 2013, we conducted a feedback study with the described IWR Tailorshop microworld. We collected data from 148 participants (n = 94 after removal of incomplete datasets and outliers, see below) and applied our optimization-based analysis and feedback approach. The participants were asked to play four rounds, of 10 "months" each, of the economic simulation via its web interface. Different approaches for both feedback computation and feedback presentation were applied in the first two rounds (the so-called training or feedback rounds). In the last two rounds, however, no one received any feedback. These rounds will be referred to as performance rounds.

Task. Participants had to play four rounds of the IWR Tailorshop microworld of 10 months each via its web interface. They were allowed to interrupt the process at any time. For the four rounds, different initial values were used (see Table A.3 in the Appendix), but they were the same for all participants. Rounds 1 and 3 started with the same values, whereas in rounds 2 and 4 pairwise different values were used. Control values for the recruitment and dismissal of employees and the creation and closing of sites were always reset to 0 in order to avoid accidental execution.

Figure 1. The IWR Tailorshop web interface with arrows as feedback for the trend group (compare Figure 2) and a hint for the maintenance control.
[Figure 2 diagram: feedback presentations 1 "highlight variables", 2 "show arrows", 3 "toggle values", 4 "bar chart"; optimization methods (a) start optimization in xk+1, identical to the start values the participant will have for the next decisions uk+1; (b) start optimization in xk, fix decisions uk with constraints — artificial constraints for uk yield sensitivities; (c) start optimization in xk, identical to the start values the participant had for decisions uk; (d) start optimization in xk+1, fix a single decision with a constraint, compute online when one variable is changed, and give feedback on which variables should now be changed.]

Figure 2. Optimization-based feedback at month k+1: on the left-hand side, there are different methods to compute a feedback, and on the right-hand side there are different types of feedback presentation. Optimization method (a) is used with feedback presentations 1, 2, and 3 (corresponding to the indicate group, trend group, and value group), and optimization method (b) is used with feedback presentation 4 (corresponding to the chart group). Note that xk refers to the state variables and uk to the control variables of month k.

As an incentive, there was a competition, with chances weighted according to success, in which participants could win one of six 20-euro Amazon gift cards. For this, only the results of the performance rounds were considered.

Procedure. For the main task, the control of the IWR Tailorshop microworld, the participants received guidance through the following introduction:

Thank you! Now you can start into the IWR Tailorshop microworld. Please note that you need to finish 4 rounds of 10 "months" each to participate in the competition. All in all it will take you about 30–45 minutes. Ideally you play all 4 rounds at a stretch, but you may interrupt after each "month" and continue at a later date.
The first two rounds are training rounds; only your points (not your rank) in the last two rounds are considered for the drawing. Now, please imagine you are the head of a company which produces shirts. Your aim is to maximize the company's capital at the end of each round, i.e., in month 10. For this, several possibilities of intervention are available, which are located in the lower part. In the upper part you will find important figures of your company. However, your intervention possibilities are subject to certain constraints; e.g., you are not allowed to close all company sites. At the end of each round, you will find a highscore table, and after the last round the table which is important for the competition. In the blue hint box you can find assistance and useful hints during your game. Good luck!

The hint box the introduction refers to was displayed on the left side and contained hints corresponding to the situation and the feedback group the participant was in (compare Figure 2), e.g.:

During your first two rounds, you will receive assistance to improve your performance. We will show you arrows next to the interventions to indicate in which direction the mathematically optimal decision for the next month lies, depending on the decision shown at the beginning of the month. The arrows will be thicker if the optimal decision is far away, but will not change when you change the values.

Hints on each state and control, e.g., "the wages for each employee per month in money units" for the control wages, were available as tooltips on mouse rollover. After each round, participants were shown an anonymized highscore list with the top 20 participants in their group.

Additional variables. Additional information on the participants was collected via three questionnaires. The first survey comprised gender, interest in economics, interest in computer games, age, and a self-assessment of systematic problem solving.
This survey had to be answered before participants could start the main task, i.e., the four IWR Tailorshop rounds. The other two surveys were carried out after the main task. The second survey targeted the participants' model knowledge. Participants were shown five claims about the IWR Tailorshop microworld and had to decide whether they were right or wrong; compare Table A.8 in the Appendix. The final survey was the 10-item short version of the Big Five Inventory proposed by Rammstedt and John (2007) to measure the Big Five dimensions of personality (Digman, 1990), i.e., agreeableness, conscientiousness, extraversion, neuroticism, and openness.

The experimental groups

Participants were divided randomly into six groups based on pseudorandom numbers generated by a Mersenne Twister (Matsumoto & Nishimura, 1998). The groups differed in the way they received additional information in the first two (feedback) rounds; compare Figure 2 for an illustration of the optimization-based feedback. The six groups were designed as follows. The control group (CO) did not receive any feedback. The highscore group (HS) received feedback based on the results of previous participants during the training rounds, giving a ratio of participants who performed better and worse, of the kind "Until now x% of participants performed better and y% performed worse than you." The indicate group (IN) received optimization-based feedback via highlighted control values. Variables are highlighted if they differ from the optimal value by more than a given threshold, e.g., 30% of the difference δ between the lower and upper bound of a variable. The trend group (TR) received optimization-based feedback via up and down arrows of different thickness. Arrow thickness is also determined by thresholds depending on δ. Arrows indicate the direction of the optimal control: if the optimal control value is larger, the arrow points up, and vice versa.
The value group (VA) received optimization-based feedback via toggled values showing the optimal solution. Note that participants of this group could theoretically obtain a 100% performance (in the two feedback rounds) by simply copying all values. The chart group (CH) received optimization-based feedback via bar charts. Lagrange multipliers are displayed, scaled according to δ. These dual variables indicate the sensitivity of the objective function with respect to the current value.

Hypotheses

Specific hypotheses were formulated before the beginning of the study. In the interest of a compact presentation, we list a subset of them directly in the corresponding result sections, in Tables 2, 3, 4, 5, 6, and 8. The full set of hypotheses that were formulated and tested can be found in the PhD thesis of Engelhart (2015). They concern correlations with the additional variables mentioned above (computer games, economic interest, gender, age, Big Five) and a detailed analysis of processing times. No statistically significant effects were found (for age and gender possibly due to the low numbers of older/female participants). Therefore, in this paper we focus on the main result, namely the impact of optimization-based feedback on performance and learning.

Prestudy

In October 2013, 18 participants (recruited directly via e-mail) took part in a prestudy. The aim was twofold: on the one hand, this was a test under realistic conditions for the main study and an opportunity to eliminate bugs in the interface. On the other hand, the data were used for the highscore feedback in the main study. This was particularly necessary to avoid feedback like "0% performed better and 0% worse than you" for the first participant in that group. However, these data were considered neither in our statistical nor in our optimization-based analysis.
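The threshold logic described for the indicate and trend groups can be sketched in a few lines. This is an illustrative Python sketch, not the study's interface code (which was implemented in JavaScript and PHP); the threshold fractions 0.3 and 0.6 are assumptions, the paper only names 30% as an example.

```python
def trend_feedback(current: float, optimal: float,
                   lower: float, upper: float,
                   highlight_frac: float = 0.3,
                   thick_frac: float = 0.6) -> dict:
    """Arrow-style feedback for one control variable.

    A control is highlighted when it deviates from the optimal value by more
    than a fraction of the admissible range delta = upper - lower; arrow
    direction follows the sign of the deviation, and the arrow is drawn
    thicker when the optimal decision is far away.
    """
    delta = upper - lower
    deviation = optimal - current
    return {
        "highlight": abs(deviation) > highlight_frac * delta,
        "direction": "up" if deviation > 0 else ("down" if deviation < 0 else "none"),
        "thickness": "thick" if abs(deviation) > thick_frac * delta else "thin",
    }
```

For example, with bounds [0, 100], a current value of 50 and an optimal value of 90, the deviation of 40 exceeds 30% of the range, so an upward thin arrow would be shown.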
Data collection

Starting from November 15, 2013, the study was announced in several first- and third-term lectures for mathematics, physics, computer science, engineering, and psychology students at Heidelberg University and Otto von Guericke University Magdeburg in Germany. These announcements were complemented by public announcements on the social networks Google+ and Facebook as well as by selective announcements via e-mail. Potential participants were informed that they would have to play four rounds of the economic simulation IWR Tailorshop via a device of their choice with a web browser (e.g., PC, tablet, or smartphone), which in total would take approximately 30–45 minutes. As an incentive, a competition was advertised, with chances weighted according to success, in which participants could win one of six 20-euro Amazon gift cards. The deadline for participation was December 15, 2013. Participants had to create an account with an e-mail address, which they needed to confirm, in order to prevent multiple participation. Creating multiple accounts was also prohibited by the terms of participation, with violation leading to exclusion from the competition. By the end of data collection, 157 accounts had been registered for participation. Two accounts were never activated, possibly because of erroneous e-mail addresses or the like. Furthermore, seven participants did not answer the first survey and therefore could not start the main task, i.e., no data was recorded for them at all. Thus, we received data from 149 participants, of whom 101 provided complete datasets, i.e., they played four full rounds and answered all three surveys. One account was identified as a duplicate participation and was excluded from the analysis. The first account of the corresponding participant was part of the analysis, but was not considered in the competition. This resulted in 100 complete datasets and 148 datasets in total for our statistical analysis.
Model knowledge

A true/false questionnaire (Table A.8 in the Appendix) was used at the end of the four rounds to determine the participants' knowledge about the IWR Tailorshop microworld. The overall ratio of correct answers varied considerably across the five claims. This shows that the questions had varying difficulty, which was intended. Correct answers were taken as indicating knowledge about the model. Participants who chose "don't know" were considered to be uncertain about the corresponding claim.

Statistical methods

Statistical analysis of the data was done using the open-source package R, version 3.0.1 (R Development Core Team, 2008).

Statistical significance. We tested the statistical significance of differences between means of scores and other variables. To this end, we applied Student's t-test and Welch's t-test. All tests were usually also confirmed qualitatively by Wilcoxon rank-sum tests. For all tests, p-values of < 0.05 were considered statistically significant (i.e., α = 0.05). All such values are printed in bold face in the tables.

Normality of distributions. Statistical tests like Student's t-test and Welch's t-test require normality of the population, although these two are known to be relatively robust against non-normality (e.g., Sawilowsky & Blair, 1992). We applied the implementation of the Kolmogorov–Smirnov test for normality (Lilliefors, 1967) from the R package nortest to the score variables. For this test, the alternative hypothesis is that the data is not normally distributed. Student's t-test, in contrast to Welch's t-test, also requires homogeneity of variances between the groups. This was tested using Levene's test (Levene, 1960), the Brown–Forsythe test (Brown & Forsythe, 1974; both as implemented in the R package lawstat), and Bartlett's test (Bartlett, 1937). For α = 0.05, the hypothesis of the data being normally distributed cannot be rejected for most groups and rounds by a majority of the applied tests for normality.
However, we cannot assume homogeneous variances between the feedback groups. Thus, for the sake of comparability, Welch's t-test will be used for the comparison of score means for all rounds.

Figure 3. Relation between optimal scores and the "how much is still possible" function, illustrated for a specific participant. Left: development of the score (capital) over time. An optimization starting at month k provides an optimal value that could have been achieved. The specific shape of the optimal solutions (approximately constant, then linear increase of capital) is due to an investment that pays off later. Therefore, taking the score itself as an indicator is not a good performance measure. Right: the optimal objective values at the final month 10 are plotted for different starting months k, resulting in the "how much is still possible" function. Participant decisions are good (even optimal) whenever the values stay constant, and the worse, the more the function decreases.

Dropouts and outliers. 148 datasets were considered, 100 of which were complete. Our statistical analysis showed that incomplete datasets did not exhibit any systematic differences compared to complete datasets. In particular, there were no significant effects on dropout concerning feedback group, gender, or performance until the dropout. Grubbs' test is a statistical test proposed by Frank E. Grubbs (1950, 1969) which detects one outlier at a time in a normally distributed population. We used the implementation of Grubbs' test available in the R package outliers.
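The two statistics central to this screening can be sketched compactly. The study itself used R; the following pure-Python sketch only computes the test statistics and omits the critical values needed for actual significance decisions.

```python
import math
from statistics import mean, stdev


def welch_t(a: list, b: list) -> float:
    """Welch's t statistic: compares two group means without assuming
    equal variances (standard errors are pooled per group)."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / math.sqrt(va + vb)


def grubbs_statistic(sample: list):
    """One step of Grubbs' test: returns the single most extreme value
    and its distance from the mean in units of the sample standard
    deviation. Comparing G against a critical value (omitted here)
    decides whether the value is an outlier."""
    m, s = mean(sample), stdev(sample)
    candidate = max(sample, key=lambda x: abs(x - m))
    return candidate, abs(candidate - m) / s
```

Applied iteratively (remove the flagged value, recompute), Grubbs' test detects one outlier at a time, which matches the procedure described above.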
Another approach is the use of the outer fences of boxplots, as described by John W. Tukey (1977). An analysis of the score variable with Grubbs' test and outer fences detected 6 severe outliers, which were excluded from further analysis. The analysis in the remainder, including the optimization-based analysis, is therefore based on n = 94 datasets.

Optimization-based analysis

As discussed in the introduction, measuring performance in a complex microworld is by no means trivial. In previous work we suggested a completely novel approach: to use mathematical optimization and the so-called "how much is still possible" function and "use of potential" function (Sager et al., 2011; Engelhart et al., 2013). We applied these techniques in the current study as follows.

Optimization. We computed optimal solutions for each participant (1 to 94), round (1 to 4), and month (1 to 10). As illustrated in Figure 2, the starting value is identical to the one of the participant in the specific round and month, and hence pairwise different. Altogether, we solved 94 · 4 · 10 = 3760 mixed-integer nonlinear optimization problems for our analysis, using a specifically developed optimization algorithm (Engelhart et al., 2013; Engelhart, 2015). Note that this approach is very similar to the computation of the optimization-based feedback (compare Figure 2). The main difference is whether this is done a priori (feedback for training) or a posteriori (analysis).

"How much is still possible" function. The optimal solution starts in the identical state as the participant in a specific round and month. Hence we know how much could have been achieved if all of the participant's future decisions had been optimal. The optimal objective function values are interpreted as a monotonically decreasing function (because participants cannot do better than the optimal solution) over rounds and months. An illustrative example is shown in Figure 3.

"Use of potential" function.
The "use of potential" function is derived from the "how much is still possible" function by taking the difference between two succeeding months. Doing this for each month, one obtains a function that indicates how much of the potential of optimal decisions was used by a participant.

Learning

To enable conclusions on learning effects, we analyze the "use of potential" function. As this function indicates how close to optimality the decisions of a participant (group) were for each month, it can be seen as a learning curve. We experimented with different functional parameterizations and eventually decided to use a piecewise linear model for our analysis. We used R's lm to fit the linear model for "use of potential" for each participant and each round,

y = m · x + c,   (1)

based on given values yi of "use of potential" at months xi = i. The regression parameters are m and c; they estimate the gradient and the intercept of "use of potential". The estimate m for the gradient characterizes how much more potential the participant was able to use over time, i.e., how much the participant learned. We use statistical tests on the values of m for the different participant groups for our a priori hypotheses on learning. The first months of the feedback rounds (i.e., months 1 and 11) were not considered for the linear regression: no feedback is given before the first decision, and thus "use of potential" may change drastically from month 0 to month 1.

Figure 4. Regression lines for "use of potential" for the value group over all rounds (one round consists of 10 months). Panel (a) shows a regression with all months of each round; for panel (b), the first month of the feedback rounds has been excluded.

The importance of this is shown in Figure 4, where Figure 4a exhibits linear regressions based on all months, and Figure 4b the corrected approach. In the performance rounds this effect does not occur, so all months are considered.

Technical implementation

For data collection, the IWR Tailorshop web interface was used, which is implemented in XHTML and JavaScript with jQuery 1.10 and usage of Ajax on the client side, complemented by server-side PHP code. For the online optimization, AMPL version 20131012 together with Bonmin 1.5 and Ipopt 3.10 was used via the IWR Tailorshop's AMPL interface. The web server for the study was an Intel Core i7 920 machine with 12 GB RAM running PHP 5.5 and MySQL 5.5 with an Apache 2.4 HTTP server on Ubuntu 13.10 64-bit. The web interface implemented a so-called responsive grid, which allowed participants to use both mobile devices and desktop PCs conveniently. Usage statistics based on user logins show that approximately 20% of participants used mobile devices. The methods for the optimization-based analysis are implemented in the open-source software package antils (analysis tool for IWR Tailorshop results and solutions). All computations were carried out on an Intel Core i7 920 machine with 12 GB RAM running Ubuntu 14.04 64-bit.

Figure 5. Score histogram for all four rounds for all complete datasets without the 6 outliers (n = 94).
for the solution of the arising optimization problems, ampl version 20140331 together with bonmin 1.5 and ipopt 3.10 was used via iwr tailorshop's ampl interface.

results

we test hypotheses related to the different participant groups: first performance, second learning, and third model knowledge. we close with an illustrative investigation of the strategies of exemplary participants. we start with a look at the score and the use of potential–functions of the study participants. figure 5 shows how the performance (score) is distributed over all participants in the four rounds. note that rounds 1 and 3 had the same initial values, whereas rounds 2 and 4 had different initial values, and thus also different optimal solutions and scores. rounds 1 and 2 are training rounds with feedback; rounds 3 and 4 are performance rounds without feedback. obviously, it is only meaningful to investigate the impact of the different types of feedback if the participants' prerequisites are not a decisive factor (e.g., because one group simply consisted of better problem solvers at the beginning of the study). such differences would have biased the groups' performance, and this is relevant given the low number of samples for some of the participant groups.

table 2. hypothesis about participant prerequisites.
hypothesis | confirmed
(a) initial performance is not important for final performance | x

the optimization-based analysis gives us the possibility to check this by comparing the first use of potential value. at this point, all participants had received the same information, as feedback only started after the first decision, so there should be no significant difference in performance.
table a.4c contains mean values, kolmogorov-smirnov test results, and welch's t-test results (in comparison to control group). the kolmogorov-smirnov test shows that the first use of potential values can be considered normally distributed for all groups. welch's t-test shows no significant differences to control group for any group, so we can suppose that there were no systematic differences among the participants of the six groups. the correlation between first use of potential and score in performance rounds is 0.067, confirming hypothesis (a), see table 2.

effects of optimization-based feedback on performance

we investigate whether the optimization-based feedback in the first two training rounds had a significant effect on the performance in rounds 3 and 4, where no feedback was given (table 3). we start by looking at all optimization participants, i.e., the ones in groups indicate (in), trend (tr), value (va), and chart (ch).

table 3. hypotheses related to performance of participants who received optimization-based feedback (groups in, tr, va, ch) and to performance of the control group.
hypothesis | confirmed
(b) participants with optimization-based feedback perform better overall, better in feedback rounds, and better in performance rounds compared to control group | x
(c) control group performs worst overall and performs worse in performance rounds than groups with optimization-based feedback | —

we assess hypothesis (b) visually via figure 6 and statistically via table a.4a. figure 6 shows a boxplot of the different participant groups' performance via the obtained score. the four groups which received optimization-based feedback (in, tr, va, and ch, rightmost in figure 6) show different performance, which will be discussed later. relevant for hypothesis (b) is that their mean scores are above those of the control group (co). this is true for training rounds 1 and 2, for performance rounds 3 and 4, and thus also overall.
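the group comparisons reported here rest on welch's unequal-variance t-test. as a hedged illustration (this is not the authors' code, and the sample data are invented), the statistic and its welch–satterthwaite degrees of freedom can be computed as follows; in practice one would obtain p values from, e.g., `scipy.stats.ttest_ind` with `equal_var=False`:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances (e.g., first
    use-of-potential values of a feedback group vs. control group)."""
    na, nb = len(a), len(b)
    sa, sb = variance(a) / na, variance(b) / nb  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df
```

unlike student's t-test, this does not assume equal group variances, which matters here because the feedback groups differ in size and spread.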
the statistical significance based on a comparison between the optimization-based feedback groups and the other two groups is shown in table a.4a. participants who received optimization-based feedback performed significantly better than those without such feedback, in each round and in total, confirming hypothesis (b). looking closer at table a.4a, one observes that this significance holds for both comparisons: against all participants without optimization-based feedback (highscore group and control group) and against control group alone. the test statistic is larger in the training rounds by roughly one order of magnitude, which is not surprising given the direct benefit of the feedback on performance. the performance of the four optimization-based feedback groups is quite diverse, compare again figure 6: value group was by far the best in all rounds, trend group comes second. the two other feedback groups, indicate group and chart group, do not exhibit such good performance. as a result, the performance of the control group is only significantly worse on average, but not compared to each single feedback group as tested in table a.4b. consequently, hypothesis (c) can be considered disproved, both for the performance rounds and overall. the results of the group-specific welch's t-test in table a.4b are also helpful for an assessment of the hypotheses of table 4.

table 4. hypotheses on specific feedback types (arrow feedback in trend group and toggled values in value group).
hypothesis | confirmed
(d) trend group performs best overall and best in performance rounds | —
(e) value group performs best in training rounds and worst in performance rounds, compared to other feedback groups | (x)
(f) indicate group and chart group do not perform significantly better than control group in performance rounds | x

for α = 0.05, value group is significantly better than control group in all rounds.
trend group misses significance only in round 3, by a narrow margin, but exhibits significant differences in the other rounds. indicate group is significantly better only in round 1. the remaining groups are not significantly different from control group.

[figure: score boxplots per group and round]

figure 6. score boxplot of all feedback groups (co: control, hs: highscore, in: indicate, tr: trend, va: value, ch: chart) for all rounds and all complete datasets without 6 outliers (n = 94). the boxplot indicates that value group and, except for round 3, trend group are better than the others.

the difference between value group and all other groups is also significant in all rounds for α = 0.05 (not in the table). as value group showed the best performance and trend group only the second-best, hypothesis (d) can be considered disproved. it is true that value group performed best in the training rounds, but it did not perform worst in the performance rounds; so the first statement of hypothesis (e) is likely true, the second false. the two other feedback groups, indicate group and chart group, do indeed not perform significantly better than control group, confirming hypothesis (f). figure 7 contains the average use of potential for each feedback group over all rounds. this plot reveals much more detail on the performance of the different groups, as it also contains temporal information. this will be helpful in the next section.
looking at the average values (remember: the closer use of potential is to 0, the better), additional evidence is given for the results for hypotheses (b–f).

effects of optimization-based feedback on learning

as described in section "learning", we use the gradient m obtained from a linear regression as an indicator for learning. as use of potential may hence be considered the learning curve, it is worthwhile to have a look at figure 7 to assess the first hypothesis on learning in table 5.

table 5. hypotheses related to learning.
hypothesis | confirmed
(g) participants learn how to solve the complex problem | x
(h) learning function is approximately logarithmic | —

the visual impression is that on average the use of potential has a tendency to increase, at least for rounds 1, 2, and 4. this is confirmed quantitatively by the average values and the p values in table a.6. on average, participants show significant learning effects in all rounds except round 3. this supports the assumption that participants learn how to control the microworld, i.e., hypothesis (g). additional evidence for hypothesis (g) comes from figures 5 and 6. comparing round 1 (with feedback) and round 3 (without feedback), one can see that the distribution is shifted slightly to the right, i.e., to higher scores, hinting at an overall learning effect. that this learning effect depends on the feedback in the training rounds 1 and 2 can already be guessed by looking at round 4. round 4, which is a performance round with initial values the participants had not seen before in the training rounds, exhibits a non-normal distribution of performance. trying to fit a logarithmic function to the use of potential was not successful.
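equation (1) is an ordinary least-squares line, and the logarithmic alternative of hypothesis (h), y = a·ln(x) + b, is also linear in its parameters, so both fits reduce to a polynomial fit. a minimal sketch of the two fits, assuming numpy; the function names and data are illustrative, not from the study:

```python
import numpy as np

def fit_linear(months, uop):
    """Least-squares fit of use of potential: uop ~ m*month + c."""
    m, c = np.polyfit(months, uop, 1)
    rss = float(np.sum((uop - (m * months + c)) ** 2))
    return m, c, rss

def fit_log(months, uop):
    """Least-squares fit of uop ~ a*ln(month) + b; the model is linear
    in a and b, so a degree-1 fit on ln(month) suffices."""
    a, b = np.polyfit(np.log(months), uop, 1)
    rss = float(np.sum((uop - (a * np.log(months) + b)) ** 2))
    return a, b, rss
```

comparing the residual sums of squares of the two fits on each participant's curve is one way the paper's "logarithmic fit was not successful" verdict could be checked.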
a closer inspection of figure 7 indicates that although for certain participant groups and rounds (e.g., trend group in rounds 1, 2, and 4) there is a stronger increase at the beginning that flattens toward the end of the round, hypothesis (h) cannot be confirmed based on our data. this is also the impression from investigating the use of potential of single participants, compare figure 9.

[figure: use of potential over all rounds, averaged per participant group]

figure 7. use of potential for all complete datasets without 6 outliers (n = 94) over all four rounds (one round consists of 10 months), averaged for the six different participant groups (see section "experimental groups"). value group is always on top and almost constant in feedback rounds, but decreases slightly in performance rounds. all other groups show a more (control and highscore group) or less (trend group) severe decline at the beginning of round 4.

we now have a closer look at the effect of optimization-based feedback on learning.

table 6. hypotheses related to learning, specific for participant groups.
hypothesis | confirmed
(i) optimization-based feedback groups learn faster | (x)
(j) trend group learns fastest | x

to test hypothesis (i), see table 6, we look at the regression parameters m for the four optimization-based feedback groups (of, consisting of in, tr, va, ch) and the two other groups (nof, consisting of co and hs) in table 7. the mean of parameter m for of is higher in round 1 and lower in all other rounds. this suggests that, given the performance of these groups, the optimization-based feedback groups learned faster, namely mainly in the first round. however, welch's t-test only shows significance for rounds 2–4.
we see this as an indication that (i) might be true, but it cannot be fully confirmed with our data. to shed more light on the issue, we investigate the learning curves of the single participant groups.

table 7. columns 2 and 3: mean regression parameters m for non-optimization-based feedback groups (nof) and optimization-based feedback groups (of). columns 4 and 5: corresponding significances from welch's t-test. rnd means round. one observes that of learned more in round 1, however not significantly, and co&hs learned significantly more in rounds 2–4.
rnd | nof | of | nof < of | of < nof
1 | 651.2 | 1063.1 | 0.2384 | 0.7616
2 | 1086.6 | 550.3 | 0.9642 | 0.0358
3 | 670.9 | -263.4 | 0.9997 | 0.0003
4 | 3445.1 | 817.0 | 1.0000 | 0.0000

as above, figure 6 hints at improved scores in round 3 compared to round 1 (with identical initial values) for all participant groups with the exception of value group. value group remained static (-4%) at a higher level than the other groups. a reason for this may be that participants profited so strongly from the value feedback during the feedback rounds that their performance without feedback slightly decreased. however, the group's mean is on a high level, so there was not much room for improvement anyhow. for the other five groups, performance improved drastically (at least 20%). again, more insight comes from our novel analysis approach, the study of use of potential depicted in figure 7. value group is always on top as expected and almost constant in feedback rounds, but decreases slightly in performance rounds. this means that the performance of participants in this group is on a very high level from the beginning and hardly improves, in fact rather deteriorates.
all other groups show a more or less severe decline at the beginning of round 4, with control and highscore group at one end and trend group at the other. however, all groups except value group seem to improve their performance during the first three rounds. to quantify this, table a.5 contains the mean values for the regression parameters m of the different feedback groups. the kolmogorov-smirnov test results show that the mean values can be considered normally distributed in all rounds, except for chart group in round 1. the welch's t-test results show whether the hypothesis that the mean value of m is positive, and hence a positive learning effect occurred, is significant or not. trend group is the only group with a significant learning effect in both rounds 1 and 2. therefore we see hypothesis (j) as confirmed.

[figure: use of potential over all rounds for high, mid, and low model-knowledge groups]

figure 8. use of potential according to high, mid, and low model knowledge for all complete datasets without 6 outliers (n = 94) over all rounds (one round consists of 10 months). participants with low knowledge show a severe decline in their score at the beginning of round 4, whereas they stay on the same level in the rounds before. high and mid group show an increase in feedback rounds, and high group also stays almost on the same level later in round 4. note that the start of round 4 is challenging due to new initial values.
for control group, the learning effects become significant from round 2 on, and for highscore group they are significant in rounds 2 and 4. the mean values in performance rounds for control and highscore group are drastically higher than for the optimization-based feedback groups. value group is the only one with a significantly decreasing performance in round 3 and also the only one with an overall mean below 0. note that chart group performs even worse, at least in the feedback rounds. this changes in the performance rounds, so one can suppose that the feedback confused the participants. a possible reason could lie in a misinterpretation of the sensitivity information participants were given by this feedback; all other optimization-based feedback groups received direct information on the optimal solution.

effects of model knowledge

the focus of this section is the two variables knowledge and uncertainty. we look at the hypotheses in table 8.

table 8. model knowledge related hypotheses.
hypothesis | confirmed
(k) well-performers know more about the model | x
(l) participants with high model knowledge perform well | x
(m) participants with high model knowledge learn more | (x)
(n) trend group has highest model knowledge and lowest uncertainty | x

to investigate hypothesis (k), quartiles have been used to build groups of participants with high (best 25%), mid (those between the first and third quartile), and low (worst 25%) score for each round. means of the corresponding model knowledge and uncertainty scores can be found in tables a.9a and a.9b. high groups have the highest means, which increase over the rounds. except for round 1, mid groups lie between low and high groups. in performance rounds, all differences are significant according to welch's t-test. significance roughly increases over the rounds, which suggests that model knowledge is a crucial factor for successful control of the iwr tailorshop microworld.
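the quartile split used for hypothesis (k) can be sketched as follows. this is a hedged illustration, not the authors' code; in particular, how scores falling exactly on a quartile boundary are assigned is our assumption (here they count as mid):

```python
import numpy as np

def quartile_groups(scores):
    """Assign each participant to low (worst 25%), mid, or high (best 25%)
    by round score; boundary scores count as mid (an assumption)."""
    q1, q3 = np.percentile(scores, [25, 75])
    return np.where(scores < q1, "low", np.where(scores > q3, "high", "mid"))
```

the resulting labels can then be used to compare mean model-knowledge and uncertainty scores per group, as in tables a.9a and a.9b.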
concerning hypotheses (l) and (m), participants have been merged into 3 groups (low (0/1), mid (2/3), and high (4/5)) and 2 groups (low (0/1) and mid (2/3)), respectively, according to their knowledge and uncertainty scores, which both range between 0 and 5. no participant achieved an uncertainty score of 4 or 5, thus there are only two groups for uncertainty. tables a.10a and a.10b contain the mean score values of all four rounds for these groups. for knowledge, the high group has by far the highest score means. except for round 1, mid group lies between low and high group. student's t-test in table a.10c shows that high group was almost always significantly better than the two other groups. significance increases over the rounds, which means that model knowledge becomes a better predictor of participants' success the more rounds the participants played. comparing rounds 1 and 3, participants with low model knowledge could barely improve their performance, whereas the high group approximately doubled their score. indeed, the correlation between score and model knowledge increases from about 0.09 in round 1 to 0.48 in round 4. in summary, we see hypothesis (l) as confirmed. for uncertainty, the low group has higher means in all rounds, but again the differences are much smaller than for knowledge. hence, the differences between the groups are not significant. correlation with score is about -0.2 for all rounds except the first. concerning hypothesis (m), the average use of potential for the three model knowledge groups can be found in figure 8. participants with low knowledge show a severe decline at the beginning of round 4, whereas they stay on the same level in the rounds before. high and mid group show an increase in feedback rounds, and high group also stays almost on the same level in round 4.
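the reported correlations between score and model knowledge are presumably pearson coefficients (the paper does not say which estimator was used, so this is an assumption). a self-contained sketch, equivalent to `np.corrcoef`:

```python
import math

def pearson_r(x, y):
    """Pearson correlation, e.g., between round score and model knowledge."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```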
the values in table a.12 reveal that participants with low model knowledge learned significantly less in round 1 than those with high knowledge. again, the situation reverses in round 4. hypothesis (m) could thus be confirmed with a restriction to round 1. however, it also seems likely that model knowledge changes from round to round and is an indicator of success in learning rather than a predictor. therefore, a high use of potential in the training rounds could also be considered a predictor for model knowledge at the end of the experiment. in summary, hypothesis (m) cannot be decided. concerning hypothesis (n), for an analysis of differences between the groups, ratios of model knowledge and uncertainty levels and mean values are given in table a.11. trend and value group have the highest knowledge, but only highscore and trend group are significantly better than control group. indicate and chart group have a much lower knowledge, which, together with these groups' performance, suggests that participants were rather confused by the optimization-based feedback. trend group has by far the lowest uncertainty among the groups and is the only one with significantly lower uncertainty than control group. all other groups are on a similar level.

exemplary participants

a more detailed look at single participants reveals different decision patterns. figure 9 shows use of potential for participants 134, 164, 165, and 208 from value group and for participant 115 from trend group. participants 134 and 164 seem to more or less copy the optimal solution in the feedback rounds; remember that feedback for these participants consisted of the numeric values of the optimal solution. participant 208, in contrast, seems to pursue a different strategy which is less solution-oriented.
the success in the performance rounds 3 and 4 also varies a lot: participant 164 seems to remember the solution, which is especially useful in round 3 as it started with the same values as round 1, but participant 134 does not and lacks the knowledge to control the model. participant 165, who seems to change strategy during the feedback rounds from exploration to solution-oriented, decreases in round 3, too. participant 208, who possibly found a strategy of their own, stays on the same level throughout all rounds. participant 115 from trend group reaches a comparably high level of use of potential, with monotonically increasing curves during the first two rounds converging to 0, i.e., coming close to optimality at the end of each round. not surprisingly, a solution-oriented pattern like that among the participants from value group in figure 9 (a–d) cannot be observed, due to the different type of feedback. figure 9f shows the shirt price decision of participant 188 from chart group.

[figure: panels (a) participant 134 (value), (b) participant 164 (value), (c) participant 165 (value), (d) participant 208 (value), (e) participant 115 (trend), (f) shirt price decision of participant 188 (chart group) in round 3]

figure 9. use of potential for single participants from value group (a–d) and trend group (e), and exemplary shirt price decisions (f).
although already in a performance round, the participant seems quite unsure about the right strategy and changes the control a lot. such a pattern at that point in time can particularly be found among the datasets from chart group.

conclusion and outlook

in this work, optimization methods were used in the context of complex problem solving (cps) both as an analysis tool and to provide feedback in real time for learning purposes. while first works on optimization-based analysis for cps (sager et al., 2010, 2011) focused on understanding how external factors influence thinking, in the work at hand we also investigated learning effects. the use of optimization as an analysis and feedback tool for psychological studies is, to our knowledge, completely new. we presented a variant of the iwr tailorshop, a new microworld for cps. this turn-based test-scenario yields a mixed-integer nonlinear program with nonconvex relaxation and consists of functional relations based on optimization results. with the proof of feasibility for the iwr tailorshop in this article, we intend to start a new era beyond trial-and-error in the definition of microworlds for analyzing human decision making. in our web-based feedback study with 148 participants, we used the iwr tailorshop microworld to investigate the effects of optimization-based feedback. we could show that such feedback can significantly improve participants' performance in a complex microworld if the presentation is chosen appropriately; in our study, value group performed significantly better than all other groups, and for some kinds of feedback the difference to control group was huge.
however, it also became apparent that the representation of the feedback is important. feedback based on a kind of sensitivity information seemed rather to confuse participants in this study, which was also suggested by our optimization-based analysis. the best-performing group was the value group, which received the most precise information about the optimal solution. knowledge about the model was better amongst another well-performing group, the trend group. since we could show that model knowledge is a predictor of performance, perhaps these participants would have outperformed the others on a longer timescale; more data is needed to verify this hypothesis, though. through an analysis of use of potential, the optimization-based analysis could show that participants learn to control the model over time. different aspects of the analysis indicate that, for high performance, learning during the first round is crucial. it turned out that the best way to foster learning at the beginning was trend feedback. through the optimization-based analysis, we were also able to show that there were no systematic differences between the groups at the beginning and that initial performance was not relevant for performance at the end of the time scale. for some of the hypotheses, however, significance could not or could only partly be shown. in these cases, more data and investigation will be necessary. the main intention of this paper is to present optimization-based feedback and to show its usefulness in a feedback situation. the test of (learning) theories was not the focus. our different hypotheses do not draw on specific literature but are "informed guesses" about what might happen. this is also due to the fact that there exist no reference studies with the tailorshop in a feedback setting that could be used as a baseline for expected effects. however, coupling our approach to theoretically based hypotheses on learning seems a promising line of future research.
another interesting research direction could be whether the widespread assumption that positive feedback increases performance is true. in barth and funke (2010) it was shown that negative feedback impairs performance; however, it is unclear if this is also true in the long run. from former studies we know that positive and negative feedback lead to different processing styles. therefore, one could expect that a certain ratio of positive to negative feedback (carrot and stick) influences performance the most; 40% positive feedback and 60% negative feedback might lead to the best performance, for instance. finally, the parameter set used for the computations of the iwr tailorshop microworld in this work has been set up manually to achieve a reasonable model behavior. here we still see high potential for improvement. one could use derivative-free optimization methods to optimize the parameter values such that two (or even more) previously defined strategies (e.g., a high and a low price strategy) yield a similar objective value. by that, participants could follow different strategies and perform quite well in all of them if decisions are made appropriately.

acknowledgements: this project has received funding from the european research council (erc) under the european union's horizon 2020 research and innovation programme (grant agreement no 647573) and from the german bmbf under grant 05m2013 gossip. the authors gratefully acknowledge this.

declaration of conflicting interests: the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

author contributions: the main contribution is due to the first author (me), who performed the study as part of his phd thesis (engelhart, 2015). ss and jf helped in designing and analysing the study and did part of the writing.

supplementary material: supplementary material is available online.
handling editor: andreas fischer

copyright: this work is licensed under a creative commons attribution-noncommercial-noderivatives 4.0 international license.

citation: engelhart, m., funke, j., & sager, s. (2017). a web-based feedback study on optimization-based training and analysis of human decision making. journal of dynamic decision making, 3, 2. doi:10.11588/jddm.2017.1.34608

received: 04 january 2017 | accepted: 17 april 2017 | published: 26 may 2017

references

barth, c. m. (2010). the impact of emotions on complex problem solving performance and ways of measuring this performance (unpublished doctoral dissertation). ruprecht-karls-universität heidelberg.

barth, c. m., & funke, j. (2010). negative affective environments improve complex solving performance. cognition and emotion, 24(7), 1259–1268. doi: 10.1080/02699930903223766

bartlett, m. s. (1937). properties of sufficiency and statistical tests. proceedings of the royal statistical society series a, 160(901), 268–282. doi: 10.1098/rspa.1937.0109

brehmer, b. (1995). feedback delays in dynamic decision making. in p. a. frensch & j. funke (eds.), complex problem solving: the european perspective (pp. 103–130). hillsdale, nj: erlbaum.

brown, m., & forsythe, a. b. (1974). robust tests for the equality of variances. journal of the american statistical association, 69(346), 364–367. doi: 10.1080/01621459.1974.10482955

cronin, m. a., gonzalez, c., & sterman, j. d. (2009). why don't well-educated adults understand accumulation? a challenge to researchers, educators, and citizens. organizational behavior and human decision processes, 108(1), 116–130. doi: 10.1016/j.obhdp.2008.03.003

danner, d., hagemann, d., schankin, a., hager, m., & funke, j. (2011). beyond iq.
a latent state-trait analysis of general intelligence, dynamic decision making, and implicit learning. intelligence, 39(5), 323–334. doi: 10.1016/j.intell.2011.06.004

digman, j. m. (1990). personality structure: emergence of the five-factor model. annual review of psychology, 41(1), 417–440. doi: 10.1146/annurev.ps.41.020190.002221

dörner, d. (1980). on the difficulties people have in dealing with complexity. simulation and games, 11(1), 87–106. doi: 10.1177/104687818001100108

engelhart, m. (2015). optimization-based analysis and training of human decision making (unpublished doctoral dissertation). ruprecht-karls-universität heidelberg.

engelhart, m., funke, j., & sager, s. (2013). a decomposition approach for a new test-scenario in complex problem solving. journal of computational science, 4(4), 245–254. doi: 10.1016/j.jocs.2012.06.005

frensch, p. a., & funke, j. (1995). complex problem solving: the european perspective. taylor & francis. doi: 10.4324/9781315806723

funke, j. (1983). einige bemerkungen zu problemen der problemlöseforschung oder: ist testintelligenz doch ein prädiktor? [some remarks on problems of problem-solving research, or: is test intelligence a predictor after all?] diagnostica, 29(4), 283–302. doi: 10.11588/heidok.00008131

funke, j. (2003). problemlösendes denken [problem-solving thinking]. stuttgart, germany: kohlhammer.

funke, j. (2010). complex problem solving: a case for complex cognition? cognitive processing, 11(2), 133–142. doi: 10.1007/s10339-009-0345-0

funke, j., & frensch, p. a. (2007). complex problem solving: the european perspective – 10 years after. in d. h. jonassen (ed.), learning to solve complex scientific problems (pp. 25–47). new york: erlbaum.

gonzalez, c. (2004). learning to make decisions in dynamic environments: effects of time constraints and cognitive abilities. human factors, 46(3), 449–460. doi: 10.1518/hfes.46.3.449.50395

grubbs, f. e. (1950). sample criteria for testing outlying observations. annals of mathematical statistics, 21(1), 27–58. doi: 10.1214/aoms/1177729885

grubbs, f. e. (1969).
procedures for detecting outlying observations in samples. technometrics, 11(1), 1–21. doi: 10.1080/00401706.1969.10490657 hörmann, h. j., & thomas, m. (1989). zum zusammenhang zwischen intelligenz und komplexem problemlösen. sprache & kognition, 8(1), 23–31. hüfner, m., tometzki, t., kraja, t., & engell, s. (2011). learn2control: eine webbasierte lernumgebung im biound chemieingenieurwesen. journal hochschuldidaktik, 22(1), 20–23. kleinmann, m., & strauß, b. (1998). validity and applications of computer simulated scenarios in personal assessment. international journal of seclection and assessment, 6(2), 97–106. doi: 10.1111/1468-2389.00078 kluwe, r. h. (1993). knowledge and performance in complex problem solving. advances in psychology, volume 101, 401–423. amsterdam, netherland: elsevier. doi: 10.1016/s0166-4115(08)62668-0 kluwe, r. h., misiak, c., & haider, h. (1991). the control of complex systems and performance in intelligence tests. in h. rowe (ed.), intelligence: reconceptualization and measurement (pp. 227–244). hillsdale, nj: erlbaum. levene, h. (1960). robust tests for equality of variances. in i. olkin, s. g. ghurye, w. hoeffding, w. g. madow, & h. b. mann (eds.), contributions to probability and statistics: essays in honor of harold hotelling (pp. 278– 292). stanford, ca: stanford university press. lilliefors, h. w. (1967). on the kolmogorov-smirnov test for normality with mean and variance unknown. journal of the american statistical association, 62(318), 399–402. doi: 10.1080/01621459.1967.10482916 matsumoto, m., & nishimura, t. (1998). mersenne twister: a 623-dimensionally equidistributed uniform pseudorandom number generator. acm transactions on modeling and computer simulation, 8(1), 3–30. doi: 10.1145/ 272991.272995 meyer, b., & scholl, w. (2009). complex problem solving after unstructured discussion. effects of information distribution and experiece. group process and intergroup relations, 12(4), 495–515. doi: 10.1177/1368430209105045 osman, m. 
(2008). observation can be as effective as action in problem solving. cognitive science, 32(1), 162–183. doi: 10.1080/03640210701703683 otto, j. h., & lantermann, e.-d. (2004). wahrgenommene beeinflussbarkeit von negativen emotionen, stimmung und komplexes problemlösen. zeitschrift für differentielle und diagnostische psychologie, 25(1), 31–46. doi: 10.1024/0170-1789.25.1.31 putz-osterloh, w. (1981). über die beziehung zwischen testintelligenz und problemlöseerfolg. zeitschrift für psychologie, 189(1), 79–100. putz-osterloh, w., bott, b., & köster, k. (1990). models of learning in problem solving – are they transferable to tutorial systems? computers in human behavior, 6(1), 83–96. doi: 10.1016/0747-5632(90)90032-c r development core team. (2008). r: a language and environment for statistical computing [computer software manual]. vienna, austria. retrieved from http://www.r -project.org rammstedt, b., & john, o. p. (2007). measuring personality in one minute or less: a 10-item short version of the big five inventory in english and german. journal of research in personality, 41(1), 203–212. doi: 10.1016/ j.jrp.2006.02.001 10.11588/jddm.2017.1.34608 jddm | 2017 | volume 3 | article 2 | 15 http://www.r-project.org http://www.r-project.org http://dx.doi.org/10.11588/jddm.2017.1.34608 engelhart et al.: optimization-based training robbins, t. w., anderson, e. j., barker, d. r., bradley, a. c., fearnyhough, c., henson, r., hudson, s. r., & baddeley, a. d. (1996). working memory in chess. memory & cognition, 24(1), 83–93. doi: 10.3758/bf03197274 sager, s., barth, c. m., diedam, h., engelhart, m., & funke, j. (2010). optimization to measure performance in the tailorshop test scenario — structured minlps and beyond. in proceedings ewminlp10 (pp. 261–269). cirm, marseille, france. sager, s., barth, c. m., diedam, h., engelhart, m., & funke, j. (2011). optimization as an analysis tool for human complex problem solving. siam journal on optimization, 21(3), 936–959. 
doi: 10.1137/11082018x sawilowsky, s. s., & blair, c. r. (1992). a more realistic look at the robustness and type ii error properties of the t test to departures from population normality. psychological bulletin, 111(2), 352–360. doi: 10.1037/0033-2909.111.2 .352 selten, r., pittnauer, s., & hohnisch, m. (2012). dealing with dynamic decision problems when knowledge of the environment is limited: an approach based on goal systems. journal of behavioral decision making, 25, 443– 457. doi: 10.1002/bdm.738 süß, h.-m., oberauer, k., & kersting, m. (1993). intellektuelle fähigkeiten und die steuerung komplexer systeme. sprache & kognition, 12, 83–97. tukey, j. w. (1977). exploratory data analysis. boston, ma: addison-wesley. wenke, d., & frensch, p. a. (2003). is success or failure at solving complex problems related to intellectual ability? in j. davidson & r. sternberg (eds.), the psychology of problem solving (pp. 87–126). cambridge, england: cambridge university press. doi: 10.1017/ cbo9780511615771.004 wittmann, w. w., & hattrup, k. (2004). the relationship between performance in dynamic systems and intelligence. systems research and behavioral science, 21(4), 393– 409. doi: 10.1002/sres.653 10.11588/jddm.2017.1.34608 jddm | 2017 | volume 3 | article 2 | 16 http://dx.doi.org/10.11588/jddm.2017.1.34608 engelhart et al.: optimization-based training appendix the mathematical model for the iwr tailorshop consists of the following set of equations, for k = t0, . . . , tf , shown in equations (a.1a) to (a.1l). 
\begin{align}
x^{em}_{k+1} &= x^{em}_k + u^{em}_k \tag{a.1a}\\
x^{ps}_{k+1} &= x^{ps}_k - \underline{u}^{dps}_k + \overline{u}^{dps}_k \tag{a.1b}\\
x^{ds}_{k+1} &= x^{ds}_k - \underline{u}^{dds}_k + \overline{u}^{dds}_k \tag{a.1c}\\
x^{de}_{k+1} &= p^{de,0} \exp\!\big(-p^{de,1}\, u^{sp}_k\big) \log\!\big(p^{de,2}\, u^{ad}_k + 1\big)\big(x^{re}_k + p^{de,3}\big) \tag{a.1d}\\
x^{re}_{k+1} &= p^{re,0}\, x^{re}_k + p^{re,1} \log\!\Big(\big(p^{re,2}\, u^{ad}_k + p^{re,3}\, u^{sp}_k (x^{sq}_k)^2 + p^{re,4}\, u^{wa}_k\big)\, p^{re,5}\Big) \tag{a.1e}\\
x^{pr}_{k+1} &= p^{pr,0}\, x^{ps}_{k+1} \log\!\Big(\frac{p^{pr,1}\, x^{em}_{k+1}}{x^{ps}_{k+1} + x^{ds}_{k+1} + p^{pr,2}} + 1\Big) \tag{a.1f}\\
x^{sa}_{k+1} &= \min\Big\{ p^{sa,0}\, x^{ds}_{k+1} \log\!\Big(\frac{p^{sa,1}\, x^{em}_{k+1}}{x^{ps}_{k+1} + x^{ds}_{k+1} + p^{sa,2}} + 1\Big);\; x^{sh}_k + x^{pr}_{k+1};\; p^{sa,3}\, x^{de}_{k+1} \Big\} \tag{a.1g}\\
x^{sh}_{k+1} &= x^{sh}_k - x^{sa}_{k+1} + x^{pr}_{k+1} \tag{a.1h}\\
x^{sq}_{k+1} &= p^{sq,0}\, x^{mo}_k + p^{sq,1}\, x^{mq}_k + p^{sq,2}\, u^{rq}_k \tag{a.1i}\\
x^{mq}_{k+1} &= x^{mq}_k\, p^{mq,0} \exp\!\Big(-p^{mq,1} \frac{x^{pr}_k}{x^{ps}_k + p^{mq,2}}\Big) + p^{mq,3} \log\!\big(u^{ma}_k\, p^{mq,4} + 1\big) \tag{a.1j}\\
x^{mo}_{k+1} &= \big(1 - p^{mo,0}\big) x^{mo}_k + p^{mo,0} \log\!\big(p^{mo,1}(u^{em}_k + p^{dem}) + p^{mo,2}\, \overline{u}^{dps}_k + p^{mo,3}\, \overline{u}^{dds}_k + p^{mo,4}\, u^{wa}_k + p^{mo,5}\, x^{re}_k + p^{mo,6}\big) \nonumber\\
&\quad \cdot \exp\!\big(-(p^{mo,7}\, \underline{u}^{dps}_k + p^{mo,8}\, \underline{u}^{dds}_k) + p^{mo,9}\big)\, p^{mo,10} \tag{a.1k}\\
x^{ca}_{k+1} &= p^{ca,0}\big(x^{ca}_k + x^{sa}_{k+1}\, u^{sp}_k + \underline{u}^{dps}_k\, p^{ca,1} + \underline{u}^{dds}_k\, p^{ca,2} - x^{em}_{k+1}\, u^{wa}_k - x^{pr}_{k+1}\, u^{rq}_k\, p^{ca,3} - x^{ps}_k\, p^{ca,4} - x^{ds}_k\, p^{ca,5} \nonumber\\
&\quad - u^{ma}_k - u^{ad}_k - x^{sh}_{k+1}\, p^{ca,6} - \overline{u}^{dps}_k\, p^{ca,7} - \overline{u}^{dds}_k\, p^{ca,8}\big) \tag{a.1l}
\end{align}

additional constraints are given by the inequalities shown in equations (a.2a) to (a.2e),

\begin{align}
\overline{u}^{dps}_k + \overline{u}^{dps}_{k-1} &\le \overline{p}^{dps}, \tag{a.2a}\\
p^{dem,0}\, x^{ps}_k + p^{dem,1}\, x^{ds}_k &\ge u^{em}_k, \tag{a.2b}\\
x^{em}_k, x^{ps}_k, x^{ds}_k &\ge 1, \tag{a.2c}\\
x^{sh}_k, x^{pr}_k, x^{sa}_k, x^{de}_k &\ge 0, \tag{a.2d}\\
x^{re}_k, x^{sq}_k, x^{mq}_k, x^{mo}_k &\ge 0, \tag{a.2e}
\end{align}

and the simple bounds on the controls (a.3a) to (a.3j),

\begin{align}
u^{sp}_k &\in [35\ \text{m.u.},\, 55\ \text{m.u.}], \tag{a.3a}\\
u^{ad}_k &\in [1000\ \text{m.u.},\, 2000\ \text{m.u.}], \tag{a.3b}\\
u^{wa}_k &\in [1000\ \text{m.u.},\, 2000\ \text{m.u.}], \tag{a.3c}\\
u^{ma}_k &\in [10\ \text{m.u.},\, 5000\ \text{m.u.}], \tag{a.3d}\\
u^{rq}_k &\in \{p^{rq,1}, p^{rq,2}\}, \tag{a.3e}\\
u^{em}_k &\in [-p^{dem}, \infty] \cap \mathbb{Z}, \tag{a.3f}\\
\overline{u}^{dps}_k &\in [0, \overline{p}^{dps}] \cap \mathbb{Z}^+, \tag{a.3g}\\
\underline{u}^{dps}_k &\in [0, \infty] \cap \mathbb{Z}^+, \tag{a.3h}\\
\overline{u}^{dds}_k &\in [0, \overline{p}^{dds}] \cap \mathbb{Z}^+, \tag{a.3i}\\
\underline{u}^{dds}_k &\in [0, \underline{p}^{dds}] \cap \mathbb{Z}^+. \tag{a.3j}
\end{align}

we use the objective function

\begin{equation}
\max_{x,u,p}\; x^{ca}_{t_f}, \tag{a.4}
\end{equation}

i.e., maximizing the capital at the end.
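the state-transition structure above can be illustrated in code. the sketch below implements a reduced one-step update covering employees (a.1a), production (a.1f), sales (a.1g), and stock (a.1h), with the corresponding parameter values from table a.1. it is an illustration of the model's structure under these assumptions, not the optimization-based implementation used in the study.

```python
import math

# subset of the state parameters from table a.1
P = {"pr0": 99.9, "pr1": 2.0, "pr2": 1e-6,   # production (a.1f)
     "sa0": 99.9, "sa1": 2.0, "sa2": 1e-6,   # sales (a.1g)
     "sa3": 1.0}

def step(x_em, x_ps, x_ds, x_sh, x_de, u_em):
    """one month of a reduced tailorshop: employees (a.1a),
    production (a.1f), sales (a.1g), and shirts in stock (a.1h);
    sites and demand are held fixed in this sketch."""
    x_em1 = x_em + u_em                                    # (a.1a)
    sites = x_ps + x_ds
    x_pr1 = P["pr0"] * x_ps * math.log(
        P["pr1"] * x_em1 / (sites + P["pr2"]) + 1)         # (a.1f)
    x_sa1 = min(P["sa0"] * x_ds * math.log(
                    P["sa1"] * x_em1 / (sites + P["sa2"]) + 1),
                x_sh + x_pr1,                  # cannot sell more than stock
                P["sa3"] * x_de)               # ... or more than demand (a.1g)
    x_sh1 = x_sh - x_sa1 + x_pr1                           # (a.1h)
    return x_em1, x_pr1, x_sa1, x_sh1
```

with the round-1 initial values of table a.3 (14 employees, 1 production site, 1 distribution site), this sketch reproduces the tabulated production of about 270 shirts.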
of course, the set of parameters has a significant influence on the model behavior. one could, e.g., think of applying derivative-free optimization methods with a subset of the parameters to determine an appropriate parameter set for a microworld like the iwr tailorshop. for this work, however, we set up a parameter set manually such that the model fulfills a certain desired behavior. the chosen parameters also yield a model behavior that makes sense for the optimization, i.e., there are feasible solutions and the optimization problem is not unbounded. the parameter values used throughout this work, unless otherwise stated, are listed in tables a.1 and a.2.

parameter   value
p^{de,0}    2200.0 shirts
p^{de,1}    2·10^-2 shirts/mu
p^{de,2}    2·10^-2 1/mu
p^{de,3}    0.5
p^{re,0}    0.5
p^{re,1}    0.672
p^{re,2}    2.5·10^-5 1/mu
p^{re,3}    10^-4 shirts/mu
p^{re,4}    6·10^-5 persons/mu
p^{re,5}    12.0
p^{pr,0}    99.9 shirts/sites
p^{pr,1}    2.0 sites/persons
p^{pr,2}    10^-6 sites
p^{sa,0}    99.9 shirts/sites
p^{sa,1}    2.0 sites/persons
p^{sa,2}    10^-6 sites
p^{sa,3}    1.0
p^{sq,0}    0.2
p^{sq,1}    0.3
p^{sq,2}    0.5
p^{mq,0}    0.8
p^{mq,1}    6·10^-3 sites/shirts
p^{mq,2}    10^-6 sites
p^{mq,3}    0.13
p^{mq,4}    0.2 mu^-1
p^{mo,0}    0.5
p^{mo,1}    2·10^-2 persons^-1
p^{mo,2}    0.5 sites^-1
p^{mo,3}    0.25 sites^-1
p^{mo,4}    2.0·10^-4 persons/mu
p^{mo,5}    0.3
p^{mo,6}    1.0
p^{mo,7}    2.5 sites^-1
p^{mo,8}    2.0 sites^-1
p^{mo,9}    1.0
p^{mo,10}   0.5
p^{ca,0}    1.03
p^{ca,1}    5000 mu/site
p^{ca,2}    3500 mu/site
p^{ca,3}    5.0 mu/shirt
p^{ca,4}    1000 mu/site
p^{ca,5}    700 mu/site
p^{ca,6}    1.5 mu/shirt
p^{ca,7}    10000 mu/site
p^{ca,8}    7000 mu/site

table a.1. parameter set for states used with iwr tailorshop. mu means monetary units.

parameter             value
n^{rq}                2
p^{rq,1}              0.5
p^{rq,2}              1.0
p^{dem,0}             5 persons/site
p^{dem,1}             10 persons/site
p^{dem}               10 persons
\overline{p}^{dps}    1 site
\underline{p}^{dps}   1 site
\overline{p}^{dds}    2 sites
\underline{p}^{dds}   1 site

table a.2. parameter set for controls used with iwr tailorshop.
variable                    symbol                   round 1   round 2   round 3   round 4
employees                   x^{em}_0                 14        3         14        42
production sites            x^{ps}_0                 1         1         1         2
distribution sites          x^{ds}_0                 1         5         1         7
shirts in stock             x^{sh}_0                 319       0         319       0
production                  x^{pr}_0                 270       69        270       467
sales                       x^{sa}_0                 270       69        270       467
demand                      x^{de}_0                 3877      2399      3877      3065
reputation                  x^{re}_0                 0.7934    0.1805    0.7934    0.4711
shirts quality              x^{sq}_0                 0.7500    0.6558    0.7500    0.8136
machine quality             x^{mq}_0                 0.8125    0.9998    0.8125    0.7712
motivation of employees     x^{mo}_0                 0.7403    0.4032    0.7403    0.5108
capital                     x^{ca}_0                 175226    28075     175226    323907
shirt price                 u^{sp}_0                 50        39        50        42
advertising                 u^{ad}_0                 2000      1599      2000      1337
wages                       u^{wa}_0                 1500      1750      1500      1451
maintenance                 u^{ma}_0                 500       3000      500       267
resources quality           u^{rq}_0                 2         1         2         2
recruit employees           \overline{u}^{em}_0      0         0         0         0
dismiss employees           \underline{u}^{em}_0     0         0         0         0
create production site      \overline{u}^{dps}_0     0         0         0         0
close production site       \underline{u}^{dps}_0    0         0         0         0
create distribution site    \overline{u}^{dds}_0     0         0         0         0
close distribution site     \underline{u}^{dds}_0    0         0         0         0

table a.3. initial values for each round used in iwr tailorshop feedback study. note that values for controls (lower part) were only preset values and could still be changed by the participant. the last six controls, starting from recruit employees, were always set to the value in the table after each month to avoid accidental recruitment and dismissal as well as site creation and closing. rounds 1 and 3 had the same initial values.
round   co&hs       control     of          co&hs < of   control < of
1       25869.2     24274.1     112042.8    0.0009       0.0020
2       -58869.2    -57289.0    -1174.4     0.0000       0.0003
3       124185.3    128502.7    172860.8    0.0091       0.0182
4       170923.2    166039.1    293403.4    0.0029       0.0059
sum     262108.6    261526.8    577132.7    0.0000       0.0002

(a) welch’s t-test p-values of comparison of score means for each round between control and highscore groups (co, hs) on the one side and groups with optimization-based feedback (of) on the other side, with all complete datasets without 6 outliers (n = 94). with α = 0.05, optimization-based feedback groups were significantly better than those without (co&hs as well as co alone).

round   highscore   indicate   trend    value    chart
1       0.4429      0.0001     0.0005   0.0000   0.8531
2       0.5891      0.3804     0.0002   0.0000   0.5414
3       0.6216      0.2168     0.0507   0.0000   0.3622
4       0.4200      0.4037     0.0133   0.0000   0.0577
sum     0.4947      0.1539     0.0007   0.0000   0.3935

(b) welch’s t-test p-values of comparison of score means for each round to the control group, with all complete datasets without 6 outliers (n = 94). the alternative hypothesis was that the mean of the control group is lower. with α = 0.05, only the value group is significantly better than the control group in all rounds. however, the trend group misses significance only in round 3 by a narrow margin.

          control     highscore   indicate   trend      value      chart
mean      -31807.3    -32308.6    -27065.5   -31202.2   -32194.4   -29073.8
ks test   0.2192      0.6468      0.5051     1.0000     0.6880     0.9652
t-test    —           0.8988      0.1455     0.8231     0.9335     0.4110

(c) comparison of use of potential by feedback groups in the first month for all complete datasets without 6 outliers (n = 94): no significant differences between groups. values can be considered to be normally distributed.

table a.4. different statistical tests. bold values of the test statistics indicate significance (α = 0.05).
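most comparisons in these tables are welch's t-tests, which, unlike student's t-test, do not assume equal group variances. as a minimal plain-python sketch (any statistics package offers the same test), the statistic and the welch-satterthwaite degrees of freedom can be computed as follows:

```python
import math

def welch_t(a, b):
    """welch's t statistic and approximate degrees of freedom
    (welch-satterthwaite) for two independent samples."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)   # unbiased variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sa, sb = va / na, vb / nb                       # squared standard errors
    t = (ma - mb) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df
```

the p-values reported in the tables would then come from the t distribution with df degrees of freedom (one-sided, matching the stated alternative hypotheses).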
round                    control   highscore   indicate   trend    value    chart
1                        599.1     767.5       359.9      1286.5   -91.3    2366.5
2                        1140.9    965.4       714.2      725.0    22.2     610.7
3                        814.4     350.8       104.6      -616.5   -448.5   294.7
4                        3717.4    2837.5      1304.5     847.9    78.0     1097.9
feedback rounds sum      1740.0    1732.9      1074.1     2011.5   -69.1    2977.2
performance rounds sum   4531.8    3188.3      1409.2     231.4    -370.5   1392.6
total sum                6271.8    4921.2      2483.2     2242.9   -439.6   4369.9

(a) means

round   control   highscore   indicate   trend    value    chart
1       0.1551    0.2901      0.7662     0.4528   0.0748   0.0493
2       0.5016    0.9603      0.9348     0.4203   0.6070   0.6826
3       0.8186    0.9434      0.7300     0.7786   0.4601   0.9627
4       0.9961    0.8713      0.8615     0.9498   0.9832   0.6299

(b) kolmogorov-smirnov test

round   control   highscore   indicate   trend    value    chart
1       0.1051    0.1820      0.2194     0.0036   0.6708   0.0960
2       0.0002    0.0248      0.1263     0.0045   0.4718   0.0787
3       0.0002    0.1528      0.3853     0.9399   0.9646   0.2284
4       0.0000    0.0016      0.0435     0.1053   0.4542   0.0858

(c) welch’s t-test for µ > 0

round   control   highscore   indicate   trend    value    chart
1       0.8949    0.8180      0.7806     0.9999   0.3292   0.9040
2       0.9998    0.9752      0.8737     1.0000   0.5282   0.9213
3       0.9998    0.8472      0.6147     0.0601   0.0354   0.7716
4       1.0000    0.9984      0.9565     0.8947   0.5458   0.9142

(d) welch’s t-test for µ < 0

table a.5. parameter m by feedback groups for all complete datasets without 6 outliers (n = 94): means, welch’s t-test, and kolmogorov-smirnov test. the values of all groups can be considered to be normally distributed in all rounds except for the chart group in round 1. the trend group is the only group with a significant learning effect in the first two rounds, the value group the only one with a significantly decreasing performance in round 3. bold values of the test statistics indicate significance (α = 0.05).
round   mean     t-test µ > 0
1       879.1    0.0016
2       789.9    0.0000
3       154.0    0.1365
4       1991.2   0.0000

table a.6. regression m for all complete datasets without 6 outliers (n = 94): means and welch’s t-test results (α = 0.05). participants show significant learning effects in all rounds except for round 3, in which especially the value group is significantly < 0.

round         low      mid      high
1             305.2    924.5    1365.9
2             1439.3   637.2    433.2
3             549.4    148.2    -230.2
4             4409.5   1888.7   -230.4
feedback      1744.4   1561.7   1799.2
performance   4958.9   2036.9   -460.7
sum           6703.3   3598.6   1338.5

table a.7. means for regression m according to performance in performance rounds (low: below lower quartile, mid: between lower and higher quartile, high: above higher quartile) for all complete datasets without 6 outliers (n = 94): high performers have the highest mean for m in the first round and the lowest in all other rounds.

claim                                                                 answer   correct   wrong   don’t know
motivation of employees plays an important role.                      false    56%       28%     16%
maintenance is an important intervention possibility.                 false    55%       26%     19%
the higher the shirt price is, the lower is the demand.               true     41%       45%     14%
opening and closing sites are important intervention possibilities.   true     90%       3%      6%
it is wise to dismiss employees at the end.                           true     31%       33%     36%

table a.8. survey on model properties at the end of task. the participants were told that “we would like to ask you a few questions once again. your answers will help us very much and it only takes two minutes. [. . . ] please decide if the following propositions are correct or wrong according to your experience from all four rounds.” participants could always choose between true, false, and don’t know. the content of the five items can be found in the claim column, the correct answer is shown in the corresponding column.
the remaining columns show the ratio of correct, wrong, and don’t know answers among all participants. differences to 100% are due to rounding.

round   high score   mid score   low score   high > low   high > mid   mid > low
1       3.17         2.50        2.79        0.1477       0.0205       0.8417
2       3.42         2.65        2.25        0.0004       0.0063       0.0770
3       3.46         2.74        2.04        0.0000       0.0061       0.0068
4       3.50         2.70        2.08        0.0000       0.0023       0.0142
sum     3.33         2.80        2.04        0.0001       0.0384       0.0035

(a) means of model knowledge for participants with high (i.e., best 25%), mid (between 1st and 3rd quartile), and low (i.e., worst 25%) score in the corresponding round, with all complete datasets without 6 outliers (n = 94). pairwise comparison of means by welch’s t-test with α = 0.05 shows that high scorers know significantly more about the model than mid or low scorers.

round   high score   mid score   low score   high > low   high > mid   mid > low
1       0.75         1.07        0.79        0.4376       0.0909       0.8716
2       0.58         1.07        0.96        0.0711       0.0181       0.6733
3       0.67         0.87        1.25        0.0097       0.1667       0.0633
4       0.71         0.93        1.08        0.0820       0.1612       0.2740
sum     0.67         0.98        1.04        0.0696       0.0930       0.3924

(b) means of model uncertainty for participants with high (i.e., best 25%), mid (between 1st and 3rd quartile), and low (i.e., worst 25%) score in the corresponding round, with all complete datasets without 6 outliers (n = 94). uncertainty means are lower for high scorers. pairwise comparison of means by welch’s t-test with α = 0.05 barely shows significance, however.

table a.9. different tests for model uncertainty and model knowledge. bold values of the test statistics indicate significance (α = 0.05).
round   low (0/1)   mid (2/3)   high (4/5)
1       101085.2    42407.9     110944.2
2       -51748.1    -40915.9    10319.9
3       108448.1    135943.2    200281.3
4       80163.6     214925.1    366269.3

(a) mean score values for different levels of model knowledge

round   low (0/1)   mid (2/3)
1       69516.5     86706.5
2       -22096.4    -42847.0
3       159269.1    124416.9
4       259346.6    171036.4

(b) mean score values for different levels of model uncertainty

r   low < high   low < mid   mid < high
1   0.3740       0.9743      0.0188
2   0.0005       0.2657      0.0010
3   0.0004       0.1447      0.0007
4   0.0001       0.0223      0.0001

(c) student’s t-test p-values for model knowledge

round   mid < low
1       0.7335
2       0.1221
3       0.1020
4       0.0626

(d) student’s t-test p-values for model uncertainty

table a.10. scores for different model knowledge and uncertainty levels (r: round) with all complete datasets without 6 outliers (n = 94). with α = 0.05, participants with high model knowledge have achieved a significantly better score in almost all rounds. for model uncertainty, no significant score differences have been observed.

property              co    hs       in       tr       va       ch       all
knowledge   low       24%   8%       44%      5%       18%      9%       17%
            mid       59%   54%      33%      48%      36%      73%      52%
            high      17%   38%      22%      48%      45%      18%      31%
            mean      2.38  3.00     2.22     3.19     3.09     2.64     2.74
            t-test    —     0.0451   0.6241   0.0113   0.0824   0.2377   —
uncertainty low       72%   69%      67%      95%      82%      64%      77%
            high      28%   31%      33%      5%       18%      36%      23%
            mean      1.03  1.15     1.22     0.38     0.91     1.09     0.91
            t-test    —     0.6525   0.6630   0.0017   0.3545   0.5545   —

table a.11. ratio of model knowledge and uncertainty levels for all feedback groups (co: control, hs: highscore, in: indicate, tr: trend, va: value, ch: chart) with all complete datasets without 6 outliers (n = 94). mean refers to mean uncertainty and knowledge per group. the alternative hypothesis for welch’s t-test was that the mean of the control group is lower (knowledge) or higher (uncertainty), respectively.
for α = 0.05, only the trend group is significantly better in both knowledge and uncertainty. differences to 100% are due to rounding.

round   low      mid      high
1       93.3     1013.0   1086.4
2       666.2    888.5    691.6
3       111.0    301.0    -70.5
4       3257.4   1979.4   1312.6

(a) means for regression m

round   low < high   mid < high   low < mid
1       0.0207       0.4518       0.0686
2       0.4788       0.7564       0.3262
3       0.6712       0.8680       0.3020
4       0.9828       0.8480       0.9291

(b) welch’s t-test

table a.12. regression m according to model knowledge (low: below lower quartile, mid: between lower and higher quartile, high: above higher quartile) for all complete datasets without 6 outliers (n = 94): those with low model knowledge learned less in the first round, and more in the last round. in comparison with the high group, this is significant.

original research

modeling decisions from experience: how models with a set of parameters for aggregate choices explain individual choices

neha sharma and varun dutt
applied cognitive science laboratory, indian institute of technology mandi, kamand, india 175005

one of the paradigms (called the “sampling paradigm”) in judgment and decision-making involves decision-makers sampling information before making a final consequential choice. in the sampling paradigm, certain computational models have been proposed in which a set of single or distribution parameters is calibrated to the choice proportions of a group of participants (aggregate and hierarchical models). however, currently little is known about how aggregate and hierarchical models would account for choices made by individual participants in the sampling paradigm. in this paper, we test the ability of aggregate and hierarchical models to explain choices made by individual participants.
several models (ensemble, cumulative prospect theory (cpt), best estimation and simulation techniques (beast), natural-mean heuristic (nmh), and instance-based learning (ibl)) had their parameters calibrated to individual choices in a large dataset involving the sampling paradigm. later, these models were generalized to two large datasets in the sampling paradigm. results revealed that the aggregate models (like cpt and ibl) accounted for individual choices better than hierarchical models (like ensemble and beast) upon generalization to problems that were like those encountered during calibration. furthermore, the cpt model, which relies on differential valuing of gains and losses, respectively, performed better than other models during calibration and generalization on datasets with a similar set of problems. the ibl model, relying on recency and frequency of sampled information, and the nmh model, relying on frequency of sampled information, performed better than other models during generalization to a challenging dataset. sequential analyses of results from different models showed how these models accounted for transitions from the last sample to final choice in human data. we highlight the implications of using aggregate and hierarchical models in explaining individual choices from experience.

keywords: aggregate choice, individual choice, sampling paradigm, decisions from experience, computational models, likelihood

with the advent of the internet, online shopping for products has gained popularity (stevens, 2016). for making satisfying online purchases, a consumer could first sample information about different products and then make a choice for the preferred item (horrace et al., 2009). however, the act of making choices based upon sampled information is not limited to choosing between different products; rather, it is a very common exercise involving different facets of our daily lives (e.g., choosing food items, life partners, and careers).
in fact, information search before a choice constitutes an integral part of decisions from experience (dfe) research, where the focus is on explaining human decisions based upon one’s experience with sampled information (hertwig & erev, 2009). to study people’s information search and consequential choice behaviors in the laboratory, researchers have proposed the “sampling paradigm” (hertwig & erev, 2009). in the sampling paradigm, people are presented with two or more options to choose between. these options are represented as blank buttons on a computer screen. people are first asked to sample as many outcomes as they wish from different button options (information search). once people are satisfied with their sampling of options, they decide from which option to make a single consequential choice for actual awards. several computational cognitive models have been proposed in the sampling paradigm, where these models help explain how people search for information and make consequential choices (erev et al., 2010; gonzalez & dutt, 2011). some of these models have a set of parameter values calibrated to each individual participant (called “individual models”; busemeyer & diederich, 2010; kudryavtsev & pavlodsky, 2012; frey, mata, & hertwig, 2015). the parameter calibration exercise in these models results in a set of parameter values per individual participant, where the number of parameter sets from a model equal the number of participants in data. for example, kudryavtsev and pavlodsky (2012) tested three variations of two models, prospect theory (pt) (kahneman & tversky, 1979) and expectancy-valence (evl) (busemeyer & stout, 2002) by calibrating model parameters to each participant’s choice. 
as another example, shteingart, neiman and loewenstein (2013) modeled many repeated choices of individual participants in the technion prediction tournament (tpt) dataset, considering a specific reinforcement-learning algorithm. these authors showed that there was a substantial effect of the first experience on choice behavior, and that this behavior could be accounted for by the reinforcement-learning model if the outcome of the first experience reset the values of the experienced actions. similarly, frey, mata, and hertwig (2015) presented a modeling analysis at the individual level showing that a simple delta-learning rule model with parameters calibrated to younger and older adults separately best described the learning processes for both these age groups. furthermore, certain computational models have been proposed where model parameters are calibrated to the choice proportions of a group of participants (called “aggregate models”; busemeyer & diederich, 2010; estes & maddox, 2005). here, a single set of values of model parameters is calibrated to the average decision computed across several participants (busemeyer & diederich, 2010; erev, ert, roth, et al., 2010; gonzalez & dutt, 2011; 2012; lejarraga, dutt, & gonzalez, 2012). the calibration exercise results in only one set of values of parameters from a model and these parameters explain the averaged decision computed across all participants.

corresponding author: varun dutt, applied cognitive science laboratory, indian institute of technology mandi, kamand, district mandi 175 005, h.p., india. e-mail: varun@iitmandi.ac.in

10.11588/jddm.2017.1.37687 jddm | 2017 | volume 3 | article 3 | 1
for example, gonzalez and dutt (2011) calibrated one set of values for three parameters in an instance-based learning (ibl) model to the risky proportions averaged over all participants and problems in different dfe datasets. similarly, erev et al. (2010) compared several models, each with a single set of values for parameters, in their ability to capture average risk-taking in the tpt datasets. there is still a third approach to model calibration where model parameters follow certain distributions (possessing density functions) that are defined across the choice proportions of a group of participants (called “hierarchical models”; lee, 2008; rouder & lu, 2005). for example, in the choice prediction competition (erev, ert, plonsky, et al., 2015), the best estimation and simulation techniques (beast) model was hierarchical and it contained a set of distribution parameters that were calibrated to the choice proportions across many participants. although the literature has focused on calibrating parameters of individual, hierarchical, and aggregate models (estes & maddox, 2005; gonzalez & dutt, 2011; rouder & lu, 2005), little is currently known about how aggregate or hierarchical models and their set of single or distribution parameter values, respectively, account for decisions of individual participants. in this paper, we address this question by considering both aggregate and hierarchical models with a set of single or distribution parameter values and evaluate how these models explain individual choices. we perform our evaluation by calibrating and generalizing a set of parameter values in aggregate or hierarchical models to choices made by individual participants in large publicly available datasets in the sampling paradigm. for example, the aggregate ibl model consists of a set of two parameters, d and σ, where these two parameters possess single values and explain the average risk-taking in dfe datasets (dutt & gonzalez, 2012; gonzalez & dutt, 2011; 2012).
in this paper, however, we recalibrate the d and σ parameters in the ibl model by assigning them a value each to predict individual choices in dfe datasets. the aggregate models that possess a single set of parameter values and predict aggregated choices, i.e., choices that are averaged over several participants, may or may not explain individual choices well. one reason for this expectation is that if several individuals learn linearly at different points in time, then the average learning curve is likely to be curvilinear (gallistel et al., 2004). thus, even if models with a single set of parameter values explain a group’s aggregate curvilinear learning, it is possible that such models may not explain individual linear behavior. another reason why these models may not explain individual behavior is due to the degree of heterogeneity present in individual choices (busemeyer & diederich, 2010): a single set of parameter values may not be sufficient to explain many individual choices. however, hierarchical models possess a set of distribution parameters. if these models account for aggregate choices, then they are also likely to account for individual choices. that is because the parameter values are resampled in a hierarchical model from their density functions for each individual participant and this resampling may allow these models to account for individual choices. in addition, there seems to be a tradeoff between aggregate models (like ibl; dutt & gonzalez, 2012) that possess cognitive mechanisms (like recency, frequency, and blending of outcomes) and a single set of parameter values that are fixed across individuals; and, hierarchical models (like beast; erev, ert, plonsky, et al., 2015) that possess mathematical functions to account for individual biases with a set of parameters that vary across individuals according to distributions. 
on one hand, one expects that aggregate models with cognitive mechanisms and a set of single parameters would account for individual choices; however, one may also expect that hierarchical models with mathematical functions and a set of distribution parameters would also account for individual choices. in this paper, we test these expectations by taking both aggregate and hierarchical models where these models’ parameters are calibrated to individual choices. furthermore, using the sampling paradigm, we also evaluate the sequential decisions of participants from their last sample to final choice as accounted for by different aggregate and hierarchical models. this sequential analysis helps us showcase the ability of aggregate models in accounting for individual differences in decisions with a set of single or distribution parameters. to calibrate aggregate and hierarchical model parameters to individual choices, we use the estimation dataset from tpt (erev, ert, roth, et al., 2010), the largest publicly available dfe dataset. we compare calibrated aggregate and hierarchical models by generalizing them to two different dfe datasets in the sampling paradigm. furthermore, we investigate an aggregate or hierarchical model’s ability in capturing individual differences in data with a set of single or distribution parameter values. in what follows, we first motivate our model choices, different datasets used, and the working of different models. furthermore, we discuss the method used for calibrating a set of single or distribution parameters in models to choices made by individual participants.
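the per-participant calibration idea can be sketched as a grid search that maximizes the log-likelihood of one participant's observed choices. the one-parameter bernoulli toy model in the usage note below is a made-up stand-in for the actual models; the study itself calibrates multi-parameter models with dedicated optimization routines.

```python
import math

def calibrate(param_grid, choices, predict_prob):
    """return the parameter value from param_grid that maximizes the
    log-likelihood of one participant's observed choices.
    predict_prob(p, c) gives the model's probability of choice c
    under parameter value p; a small floor avoids log(0)."""
    best, best_ll = None, float("-inf")
    for p in param_grid:
        ll = sum(math.log(max(predict_prob(p, c), 1e-12)) for c in choices)
        if ll > best_ll:
            best, best_ll = p, ll
    return best
```

for a toy bernoulli "model" with predict_prob(p, c) = p if c == 1 else 1 - p, a participant who chose the risky option on three of four trials is best fit by a p near the empirical rate of 0.75.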
Finally, we present the results of model evaluations both during calibration and generalization, and close the paper by discussing the implications of our results for predicting individual choices from experience.

Models in the sampling paradigm

Two classes of models have been proposed in the sampling paradigm (Hertwig, 2012): associative-learning models (e.g., instance-based learning) and cognitive heuristics (e.g., the natural-mean heuristic). In the associative-learning class, human choice is conceptualized as a learning process (for example, see Busemeyer & Myung, 1992; Bush & Mosteller, 1955). Learning is captured by changing the propensity to select a gamble based on the experienced outcomes: good experiences boost the propensity of choosing the gamble associated with them, and bad experiences diminish it (e.g., Barron & Erev, 2003; Denrell, 2007; Erev & Barron, 2005; March, 1996). Models in the associative-learning class include the instance-based learning (IBL) model (Dutt & Gonzalez, 2012; Gonzalez & Dutt, 2011; 2012; Lejarraga, Dutt, & Gonzalez, 2012), the value-updating model (Hertwig et al., 2004), and the fractional-adjustment model (March, 1996). The IBL model (Dutt & Gonzalez, 2012; Gonzalez & Dutt, 2011; 2012; Lejarraga, Dutt, & Gonzalez, 2012) consists of experiences (called instances) stored in memory. Each instance's activation is a function of the frequency and recency of the corresponding outcomes observed during sampling in different options, where the activation function is borrowed from the Adaptive Control of Thought-Rational (ACT-R) cognitive framework (Anderson & Lebiere, 1998). Activations are used to calculate the blended value for each option, and the model makes a final choice for the option with the highest blended value. Gonzalez and Dutt (2011; 2012) showed that an aggregate IBL model with three parameters performed efficiently in accounting for choices aggregated over many participants across two DFE paradigms.
In fact, this IBL model was overall the best model in explaining aggregate choices with the fewest parameters. The second class of models is referred to as cognitive heuristics; this class aims to describe both the process and the outcome of choice as heuristic rules (Brandstätter et al., 2006; Hertwig, 2012). A popular cognitive heuristic that focuses on the expected value of outcomes obtained during sampling is the natural-mean heuristic (NMH) (Hertwig & Pleskac, 2010; Hertwig, 2012). As per Hertwig (2012), the NMH model has the following interesting properties: (a) it is well tailored to sequentially encountered outcomes; and (b) it arrives at its choice prediction through the expected value of options based upon sampled outcomes. Two other heuristics proposed in the cognitive-heuristic class are the maximax heuristic (Hau et al., 2008) and the lexicographic heuristic (Luce & Raiffa, 1957). The maximax heuristic chooses the option with the best possible outcome, no matter how likely that outcome is. A lexicographic heuristic generally consists of three building blocks (Gigerenzer & Goldstein, 1996): a search rule (look up attributes in order of validity), a stopping rule (stop search after the first attribute that discriminates between alternatives), and a decision rule (choose the alternative that this attribute favors). Hau et al. (2008) and Brandstätter et al. (2006) have shown that both these heuristics seem to underperform compared to the NMH model. Furthermore, a commonly used baseline heuristic is the primed-sampler (PS) model (Erev, Glozman, & Hertwig, 2008). The PS model depends upon the recency of sampled information: it looks a few samples back on each option during sampling before making a final choice (Gonzalez & Dutt, 2011). A variant of the PS model is the PS model with variability (Erev, Ert, Roth, et al., 2010), in which the lookback sample size k varies between participants and problems.
The PS model with variability is a special case of the NMH model (as the NMH model looks back over the entire sample while deriving a choice). Furthermore, Hau et al. (2008) have shown that a cumulative prospect theory (CPT) model (Tversky & Kahneman, 1992), a popular mathematical model (sometimes referred to as a "measurement model" or an "as-if" model), seems to perform about the same as the NMH model in accounting for aggregated choices. In the CPT model, a weighting function and a value function are associated with each probability and outcome, respectively. The model chooses the option with the highest prospect value, where the prospect value is determined by multiplying each value with its corresponding weight. Furthermore, a linear-combination heuristic model (ensemble) was submitted to the TPT (Erev, Ert, Roth, et al., 2010). The ensemble model contains four heuristic rules (PS, CPT, the priority heuristic (PH), and NMH), and it was shown to be the best model in the sampling paradigm. Most recently, Erev, Ert, Plonsky, et al. (2015) proposed the BEAST model, which consists of several heuristic rules, like expected value and mental simulations, with a set of distribution parameters. The BEAST model performed well in capturing 14 different aggregate phenomena in the 2015 choice prediction competition. These 14 phenomena refer to anomalies such as the Ellsberg paradox, the Allais paradox, the reflection effect, and others described by Erev, Ert, Plonsky, et al. (2015).
Across the associative-learning models, mathematical models, and cognitive heuristics, there are aggregate models that possess a single set of parameter values and predict aggregated choices, i.e., choices that are averaged over several participants (Busemeyer & Wang, 2000; Dutt & Gonzalez, 2012; 2015; Gonzalez & Dutt, 2011; 2012; Lejarraga, Dutt, & Gonzalez, 2012). There also exist hierarchical models that possess a set of distribution parameters to predict such aggregated choices (Erev, Ert, Plonsky, et al., 2015; Lee, 2008; Rouder & Lu, 2005). The IBL, NMH, and CPT models are aggregate models (possessing a set of single parameter values), whereas the BEAST and ensemble models are hierarchical models possessing a set of distribution parameter values. Within the aggregate and hierarchical models, some models (like IBL) possess cognitive processes like recency, frequency, or blending, whereas other models (like CPT and ensemble) possess mathematical functions that account for biases in people's decisions. If possessing a set of distribution parameter values helps models account for individual choices, then we expect hierarchical models like BEAST and ensemble to perform well in explaining individual choices. In contrast, if possessing cognitive mechanisms helps models account for individual choices, then we expect models like IBL to perform well in explaining individual decisions. Similarly, if mathematical functions can accurately account for biases in individual decisions, then we expect models like CPT and ensemble to perform well in explaining individual choices.
We test these expectations in this paper by calibrating different models to human data in large datasets involving the sampling paradigm.

Model selection

Among all associative-learning models, the IBL model (Dutt & Gonzalez, 2012; Lejarraga, Dutt, & Gonzalez, 2012) has been shown to be the best-performing aggregate model in the sampling paradigm (Gonzalez & Dutt, 2011; 2012). Gonzalez and Dutt (2011) showed that the IBL model accounts for aggregate final choices with a small error. Thus, we choose the IBL model as one of the models for our evaluation. For this purpose, we first test the original IBL model (called the IBL (LDG) model; Lejarraga, Dutt, & Gonzalez, 2012) in explaining individual choices with a single set of parameter values. Next, we recalibrated a set of parameter values of this model to individual choices (called the IBL (TPT) model) in the TPT dataset. The popular maximax and lexicographic heuristics (Hau et al., 2008; Luce & Raiffa, 1957) have underperformed compared to the NMH model (Brandstätter et al., 2006; Hau et al., 2008). The NMH model has been reported in the literature as explaining aggregate final choices in the sampling paradigm (Hau et al., 2008; Hertwig, 2012). Thus, we chose the NMH model as another aggregate model for evaluating individual choices. Furthermore, Hau et al. (2008) have also shown that different variants of the CPT model (Tversky & Kahneman, 1992) perform about the same as the NMH model in accounting for aggregate choices. For these reasons, we consider three variants of the CPT model for our evaluation. The first, the CPT (TK) model, is based upon parameters defined by Tversky and Kahneman (1992). The second, the CPT (Hau) model, is based upon parameters recalibrated by Hau et al. (2008) to derive aggregated final choices. The third, the CPT (TPT) model, has its parameters recalibrated to individual choices in the TPT dataset. Erev et al.
(2010) have shown that the hierarchical ensemble model, consisting of the PS, CPT, PH, and NMH models, performed best in the TPT's E-sampling condition.1 Given that the ensemble model contains a collection of several popular heuristic models, we consider two variants of this model for our evaluation: the ensemble (TPT) model, which used the parameters proposed by Erev et al. (2010); and the ensemble (individual) model, in which we recalibrated a set of parameter values to individual choices in the TPT dataset. In addition to the above models, we also considered the hierarchical BEAST model, which has recently been shown to account for 14 different phenomena in aggregate choices (Erev, Ert, Plonsky, et al., 2015). We considered two variants of the BEAST model: the BEAST (CPC) model, which was based on the same set of distribution parameters as reported by Erev, Ert, Plonsky, et al. (2015); and the BEAST (TPT) model, which consisted of a set of distribution parameters calibrated to individual choices in the TPT dataset.

The Technion Prediction Tournament datasets

The Technion Prediction Tournament (TPT) (Erev et al., 2010) was a competition in which several participants were subjected to an experimental setup, the E-sampling condition. In this condition, participants sampled the two blank button options in a problem before making a final consequential choice for one of the options. During sampling, participants were free to click both button options one by one and observe the resulting outcome. Participants were asked to press the "choice-stage" key when they felt that they had sampled enough (but not before sampling at least once from each option). The outcome of each sample was determined by the structure of the relevant problem. One option corresponded to a choice where each sample provided a medium (M) outcome. The other option corresponded to a choice where each sample provided a high (H) payoff with some probability (Ph) or a low (L) payoff with the complementary probability (1 − Ph).
At the choice stage, participants were asked to select once between the two options. Their choice yielded a random draw of one outcome from the selected option, and this outcome was considered at the end of the experiment to determine the final payoff. Competing models submitted to the TPT were evaluated following the generalization criterion method (Busemeyer & Wang, 2000). As per this method, models were calibrated to aggregate human choices in 60 problems (the estimation set) and later tested on a new set of 60 problems (the competition set) with the set of parameters obtained in the estimation set. The M, H, Ph, and L values in a problem were generated randomly, and a selection algorithm was used so that the 60 problems in each set differed in their M, H, Ph, and L values from the other problems. For more details about the TPT, please refer to Erev, Ert, Roth, et al. (2010). In all the models described here, we have considered an individual human or model participant playing a problem in a dataset as an individual observation. Also, all model parameters have been calibrated using the estimation dataset from the TPT, which consisted of 60 problems and 1,170 observations.2 In the experiment involving the TPT's estimation dataset, forty participants were randomly assigned to two different sub-groups, where each sub-group contained 20 participants who were presented with a representative sample of 30 problems. Next, calibrated models were generalized on 60 problems from the TPT's competition set (composed of 1,200 observations) and on the six-problems (SP) dataset (Hertwig et al., 2004; composed of 150 observations).

1 The CPT model within this ensemble model estimates the weighting function using approximations.
In the experiment involving the TPT's competition dataset, forty new participants were randomly assigned to two different sub-groups, where each sub-group contained 20 participants who were presented with a representative sample of 30 problems. In the experiment involving the six-problems (SP) dataset, fifty participants were equally divided into two groups, where one group played the first three problems and the other group played the remaining three problems.

Working of models

In this section, we detail the working of aggregate or hierarchical models with a set of single or distribution parameter values calibrated to individual choices. In every model, the final choice for each individual observation is estimated by using the following softmax function (Bishop, 2006; Daw, 2011; Sutton & Barto, 1998):

Prob(option x) = e^(SMean_x) / (e^(SMean_x) + e^(SMean_y))    (1)

where SMean_x and SMean_y are the sample means or expectations of the two options x and y for a model participant in a problem, and Prob(option x) is the probability of choosing option x by a model participant. If option x was chosen by a human participant in a problem, then Prob(option x) is used to calculate the log-likelihood from a model given its parameters. The log-likelihood function L is defined as:

L = Σ_{i=1}^{N} ln(Prob(option x_i))    (2)

where i refers to the ith observation (a combination of a participant playing a problem) and N is the total number of observations in the human data.3 Here, ln refers to the natural log, and the log-likelihood is negative because Prob(option x) is a proportion. The log-likelihoods measure the goodness of fit for individual choices from a model, and greater log-likelihood values imply better fits (Busemeyer & Diederich, 2010). As suggested by Busemeyer and Diederich (2010), to calibrate aggregate or hierarchical model parameters in this paper, we minimize the negative of L (i.e., we maximize L). That is because our goal is to derive the likelihood of a model making the same choice as made by a human participant.
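As a minimal sketch, the softmax rule of equation 1 and the log-likelihood of equation 2 can be written as follows (the function names and the illustrative sample means are ours, not from the paper; the sample means could equally be the natural means used by the NMH model):

```python
import math

def choice_probability(smean_x, smean_y):
    """Softmax (equation 1): probability of choosing option x given the
    sample means of options x and y."""
    ex, ey = math.exp(smean_x), math.exp(smean_y)
    return ex / (ex + ey)

def log_likelihood(observations):
    """Equation 2: sum of log-probabilities of the options that were
    actually chosen. Each observation is (smean_chosen, smean_other)."""
    return sum(math.log(choice_probability(c, o)) for c, o in observations)

# A hypothetical participant whose chosen option had the higher sample
# mean in two of three problems (illustrative numbers):
obs = [(3.0, 1.0), (2.5, 2.0), (0.5, 1.5)]
ll = log_likelihood(obs)  # negative, since every probability is below 1
```

Less negative values of `ll` indicate a better fit, which is why the calibration below minimizes the negative of this quantity.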
We detail more about this calibration process in a later section. Next, we detail the working of the models that we considered for evaluating individual choices.

Ensemble model

The ensemble model (Erev et al., 2010) assumes that each choice is made based on one of four equally likely rules, and the predicted choice rate is a simple average across the predictions of the four rules. The first rule is similar to the primed-sampler model with variability (Erev, Glozman, & Hertwig, 2008): decision-makers are assumed to sample each option m times and to select the option with the highest sample mean, where the value of m is uniformly drawn from the set {1, 2, 3, . . . , 9}. The second rule is identical to the first, but m is drawn from the distribution of sample sizes observed in the estimation set, with samples larger than 20 treated as 20. The third rule in the ensemble model is a stochastic variant of CPT (Tversky & Kahneman, 1992), where the weighting function is approximated based upon certain parameters (the model does not use the sampling data to determine the weighting function). The final rule is a stochastic version of the lexicographic priority heuristic (Brandstätter et al., 2006; Rieskamp, 2008). The two search orders for this final rule were implemented with probabilities porder1 and porder2, determined from the estimation set. The first order begins by comparing minimum outcomes (i.e., minimum gain or minimum loss, depending on the domain of the gambles), then their associated probabilities, and finally the maximum outcomes. The second order begins with the probabilities of the minimum outcomes, then proceeds to check the minimum outcomes, and ends with the maximum outcomes.

2 The data of one observation was missing in the original estimation dataset downloaded from the website.
3 N = 1,170 observations in the TPT's estimation set.

The ensemble model
computes expectations for choosing options from its constituent models. These expectations are averaged to give a net expectation. Given a human participant's choice, the net expectation (averaged across all rules) is used to calculate the log-likelihood (using equation 2). In one version of the ensemble model (called ensemble (Herzog)), we used the original parameters proposed in Erev et al. (2010) to evaluate the model against individual choices. However, in a second version of the same model (called ensemble (TPT)), we calibrated a set of the ensemble model's distribution parameters to individual choices using the log-likelihood function. The ensemble (TPT) model had a set of 11 parameters, which were assigned single values when they were recalibrated to individual choices. Among the 11 parameters, α, β, γ, δ, λ, and µ belonged to the stochastic variant of CPT, while the parameters to, tp, σ, porder1, and porder2 were part of the priority heuristic. The σ was a free distribution parameter that defined the variance of a normal distribution. If the subjective difference involving the first comparison in each search order exceeds a threshold t, then the more attractive option is selected based on this comparison; otherwise, the next comparison is executed. The threshold values are further free distribution parameters: the estimated values are to for the minimum- and maximum-based comparisons and tp for the probability-based comparison (to and tp define the means of the corresponding normal distributions). The α, β, γ, δ, λ, and µ parameters were varied between 0 and 1.5; σ and to were varied between 0 and 1; and the probabilistic parameters porder1, porder2, and tp were varied between 0 and 1.0.
These ranges ensured that the optimization could capture the optimal parameter values with high confidence. During model calibration, the initial parameter population was set to the parameters from Erev et al. (2010).

Natural mean heuristic (NMH) model

The NMH model (Hertwig & Pleskac, 2010) involves the following steps. Step 1: calculate the natural mean of the observed outcomes for each option by summing, separately for each option, all n experienced outcomes and then dividing by n. Step 2: apply equation 1, where the sample mean for an option is replaced by its natural mean. The NMH model has no free parameters. Like the ensemble model, we evaluate the log-likelihood value from the NMH model (using equation 2).

Instance-based learning (IBL) model

The IBL model (Dutt & Gonzalez, 2012; Gonzalez & Dutt, 2011; 2012; Lejarraga, Dutt, & Gonzalez, 2012) is based upon the ACT-R cognitive framework (Anderson & Lebiere, 1998). In this model, every occurrence of an outcome of an option is stored in the form of an instance in memory. An instance is made up of the structure SDU, where S is the current situation (the blank option buttons on a computer screen), D is the decision made in the current situation (the choice of one of the option buttons), and U is the goodness (utility) of the decision made (the outcome obtained upon making a choice for an option). When a decision choice needs to be made, instances belonging to each option are retrieved from memory and blended together. The blended value of an option j (e.g., a gamble that pays $5 with probability 0.9 or $0 with probability 0.1) in any trial t is defined as:

V_{j,t} = Σ_{i=1}^{n} p_{i,j,t} x_{i,j,t}    (3)

where x_{i,j,t} is the value of the U (outcome) part of an instance (e.g., either $5 or $0 in the previous example) i on option j in trial t, and p_{i,j,t} is the probability of retrieval of instance i on option j from memory in trial t.
Because x_{i,j,t} is the value of the U part of an instance i on option j in trial t, the number of terms in the summation changes when new outcomes are observed within an option j (and new instances corresponding to the observed outcomes are created in memory). Thus, n = 1 if j is an option with one possible outcome. If j is an option with two possible outcomes, then n = 1 when only one of the outcomes has been observed on the option (i.e., one instance is created in memory) and n = 2 when both outcomes have been observed (i.e., two instances are created in memory). In any trial t, the probability of retrieval of an instance i on option j is a function of the activation of that instance relative to the activation of all instances (1, 2, . . . , n) created within option j, given by:

p_{i,j,t} = e^(A_{i,j,t}/τ) / Σ_{i=1}^{n} e^(A_{i,j,t}/τ)    (4)

where τ is random noise defined as σ × √2 and σ is a free noise parameter. The noise in equation 4 captures the imprecision of recalling past experiences from memory. The activation of an instance is a function of the frequency and recency of the observed outcomes that occur on choosing options during sampling. The activation of an instance i corresponding to an observed outcome on an option j in a given trial t is a function of the frequency of the outcome's past occurrences and the recency of the outcome's past occurrences (as done in ACT-R). In each trial t, the activation A_{i,j,t} of an instance i on option j is given by:

A_{i,j,t} = ln( Σ_{tp} (t − tp)^(−d) ) + σ × ln( (1 − γ_{i,j,t}) / γ_{i,j,t} )    (5)

where d is a free decay parameter; γ_{i,j,t} is a random draw from a uniform distribution bounded between 0 and 1 for instance i on option j in trial t; and tp is each of the previous trials in which the outcome corresponding to instance i was observed in the binary-choice task. The IBL model has two free parameters that need to be calibrated: d and σ.
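A minimal sketch of the activation, retrieval, and blending computations described above, following the Lejarraga, Dutt, and Gonzalez (2012) form of the activation rule (the function names are ours, and clamping the uniform draw away from 0 and 1 is a numerical safeguard, not part of the model):

```python
import math
import random

def activation(t, past_trials, d, sigma, rng):
    """Activation from the frequency and recency of an outcome's past
    occurrences, plus logistic noise scaled by sigma (equation 5)."""
    base = math.log(sum((t - tp) ** (-d) for tp in past_trials))
    gamma = min(max(rng.random(), 1e-12), 1 - 1e-12)  # uniform draw in (0, 1)
    return base + sigma * math.log((1 - gamma) / gamma)

def retrieval_probabilities(activations, sigma):
    """Boltzmann retrieval probabilities with tau = sigma * sqrt(2)
    (equation 4)."""
    tau = sigma * math.sqrt(2)
    exps = [math.exp(a / tau) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def blended_value(outcomes, activations, sigma):
    """Retrieval-probability-weighted sum of observed outcomes (equation 3)."""
    return sum(p * x for p, x in
               zip(retrieval_probabilities(activations, sigma), outcomes))

rng = random.Random(1)
# An option whose outcomes $5 and $0 were observed on earlier trials:
acts = [activation(6, [1, 3, 5], d=5.0, sigma=1.5, rng=rng),
        activation(6, [2, 4], d=5.0, sigma=1.5, rng=rng)]
v = blended_value([5.0, 0.0], acts, sigma=1.5)  # lies between 0 and 5
```

With a large d, the more recently observed outcome dominates the activations and therefore pulls the blended value toward itself.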
The d parameter controls the reliance on recent versus distant sampled information: when d is large (> 1.0), the model gives more weight to recently observed outcomes in computing instance activations than when d is small (< 1.0). The σ parameter helps account for the sample-to-sample variability in an instance's activation. Thus, the blended value of each option is a function of the activation of the instances corresponding to the outcomes observed on the option. In this model, we feed in the sampling of individual human participants to generate instance activations and blended values. Every time a choice is made and an outcome is observed, the instance associated with it is activated, and thereafter blended values are computed for the options faced by the individual participant. At the final choice, the likelihood is computed from the blended values, which replace the option means in equation 1. In one version of the model, IBL (LDG), we used the single set of parameter values suggested by Lejarraga, Dutt, and Gonzalez (2012) to test the model against individual choices. However, in a second version of the model, IBL (TPT), we calibrated a set of d and σ parameters in the IBL model to individual choices. For this calibration, we determine the model's log-likelihood value for making the same choice as made by each human participant. During optimization, both the d and σ parameters were varied between 0 and 20. These ranges ensured that the optimization could capture the optimal parameter values with high confidence. During parameter calibration, the initial parameter population was set to the parameters from Lejarraga, Dutt, and Gonzalez (2012).

Cumulative prospect theory (CPT) model

The CPT model (Hau et al., 2008; Tversky & Kahneman, 1992) assumes that people first form subjective beliefs about the probability of events, and then enter these beliefs into cumulative prospect theory's weighting function (Fox & Tversky, 1998; Tversky & Fox, 1995).
Similarly, people associate a value (utility) with the outcomes observed in options. The CPT model consists of the following four steps. Step 1: assess the sample probability, p_j, of the nonzero outcome in a given option j. Step 2: calculate the expected gain (loss) of option j, E_j:

E_j = w(p_j) v(x_j)    (6)

where w represents a weighting function for the probability experienced in option j, and v represents a value function for the experienced outcome x_j in option j. According to Tversky and Kahneman (1992), the weighting function w is defined as:

w(p_j) = p_j^γ / (p_j^γ + (1 − p_j)^γ)^(1/γ)    if x ≥ 0
w(p_j) = p_j^δ / (p_j^δ + (1 − p_j)^δ)^(1/δ)    if x < 0    (7)

where γ and δ are adjustable parameters that fit the shape of the function for gains and losses, respectively, and x represents the outcome associated with the probability p_j. The weighting function w has an s-shape that underweights small probabilities and overweights larger ones (Hertwig, 2012). The value function v is defined as:

v(x_j) = x_j^α    if x_j ≥ 0
v(x_j) = −λ |x_j|^β    if x_j < 0    (8)

Here, α and β are adjustable parameters that fit the curvature for the gain and loss domains, respectively, and the λ parameter (λ > 1) scales loss aversion. The x_j represents the outcome associated with option j. Step 3: assess the prospect value of the option by multiplying the weight with the value obtained. Step 4: given a human participant's choice, calculate the log-likelihood value of the model making this choice using equation 1 and equation 2; the prospect value replaces the sample mean in equation 1. As seen above, the CPT model has five parameters, α, β, γ, δ, and λ, and we investigated three versions of the CPT model. In the first, CPT (TK), we tested the set of parameter values estimated by Tversky and Kahneman (1992) against individual choices. In the second, CPT (Hau), we tested the set of parameter values estimated by Hau et al. (2008) against individual choices.
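The weighting and value functions of equations 6–8 can be sketched as follows; the default parameter values are the Tversky and Kahneman (1992) estimates used by the CPT (TK) variant, and the function names are ours:

```python
def weight(p, gamma=0.61, delta=0.69, gain=True):
    """Equation 7: probability weighting for gains (gamma) or losses (delta)."""
    c = gamma if gain else delta
    return p ** c / (p ** c + (1 - p) ** c) ** (1 / c)

def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Equation 8: concave value for gains; steeper, convex value for losses."""
    return x ** alpha if x >= 0 else -lam * abs(x) ** beta

def prospect(x, p):
    """Equation 6: weighted value of the nonzero outcome of an option."""
    return weight(p, gain=(x >= 0)) * value(x)

# A gamble paying $5 with probability 0.9 (and $0 otherwise):
ev = prospect(5.0, 0.9)
```

In equation 1, this prospect value takes the place of the sample mean when computing the choice probability.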
in the third model, cpt (tpt), we recalibrated a set of parameter values in the cpt model to individual choices. all five parameters were varied between 0 to 5. these ranges ensured that the optimization could capture the optimal parameter values with high confidence. during calibration, the initial parameter population was set to parameters from hau et al. (2008). best estimate and simulation techniques (beast) model the beast model captures the joint effect of and the interaction between 14-choice phenomena at aggregate level discussed in the 2015 choice prediction competition (erev, ert, plonsky, et. al., 2015). the first assumption in this model is to compute the expected 10.11588/jddm.2017.1.37687 jddm | 2017 | volume 3 | article 3 | 7 http://dx.doi.org/10.11588/jddm.2017.1.37687 sharma & dutt: how aggregate and hierarchical models explain individual choices values of options (since people try to maximize payoffs). the second assumption uses mental simulations that were found to lead to good outcomes in similar situations in the past (marchiori, di guida, & erev, 2015; plonsky, teodorescu, & erev, 2015). each simulation uses four different techniques, unbiased, uniform, contingent pessimism, and sign. the unbiased technique implies random and unbiased draws, either from an option’s described distributions or from an option’s observed history of outcomes. the other three techniques are “biased” and imply overgeneralizations. they can be described as mental draws from distributions that differ from the objective problem distributions. the three biased techniques are each used with equal probability. the simulation technique uniform yields each of the possible outcomes with equal probability. 
This technique enables the model to capture the underweighting of rare events and the splitting effect.4 The simulation technique contingent pessimism is like the priority heuristic (Brandstätter et al., 2006); it depends on the sign of the best possible payoff and the ratio of the minimum payoffs. This technique helps the model capture loss aversion and the certainty effect. The simulation technique sign implies high sensitivity to the payoff sign. It is identical to the technique unbiased, with one important exception: positive drawn values are replaced by R, and negative outcomes are replaced by −R, where R is the payoff range (the difference between the best and worst possible payoffs in the current problem). This model has six distribution parameters, σ, κ, β, γ, φ, and θ, where each parameter defines the upper bound of a uniform distribution with a 0.0 lower bound (κ defines the upper bound of a discrete uniform distribution with a 0.0 lower bound). Four of these parameters (σ, κ, β, and γ) are needed to capture decisions under risk without feedback. The parameter φ captures the attitude toward ambiguity, and θ abstracts the reaction to feedback. In this model, the expectation for one of the options, option A, equals BEV_A(r) + ST_A(r) + e(r), and that for the other option, option B, equals BEV_B(r) + ST_B(r). Here, BEV_A(r) and BEV_B(r) are the best estimates of the expected values of options A and B after r samples; ST_A(r) and ST_B(r) are the expectations based on the mental simulation techniques after r samples; and e(r) is an error term after r samples, drawn from a normal distribution with mean 0 and standard deviation σ. Given a human participant's choice, the expectations for the different options are used to determine the log-likelihood in the model (using equation 1 and equation 2). In one of the BEAST versions, BEAST (CPC), we used the set of parameter values reported by Erev, Ert, Plonsky, et al. (2015) against individual choices.
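The option expectations just described can be sketched as follows (a toy sketch under our own naming; the BEV and ST values would come from the model's best-estimate and mental-simulation steps, which are not reproduced here):

```python
import random

def beast_expectations(bev_a, st_a, bev_b, st_b, sigma, rng):
    """Expectation for option A is BEV_A(r) + ST_A(r) + e(r); for option B
    it is BEV_B(r) + ST_B(r), with e(r) ~ Normal(0, sigma)."""
    return bev_a + st_a + rng.gauss(0.0, sigma), bev_b + st_b

# With sigma = 0 the comparison is deterministic (illustrative values):
exp_a, exp_b = beast_expectations(2.0, 0.5, 2.2, 0.1, 0.0, random.Random(0))
```

These two expectations then replace the sample means in equation 1 when computing the choice probability.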
However, in another version, BEAST (TPT), we recalibrated the set of distribution parameter values to individual choices. All six parameters were varied between 0 and 20. These ranges ensured that the optimization could capture the optimal parameter values with high confidence. During recalibration, the initial population of parameters was taken from Erev, Ert, Plonsky, et al. (2015).

Method

Dependent variables

In this paper, we account for the final choices made by individual participants in different problems. For this purpose, given a choice made by a human participant in a problem, we calculate the log-likelihood of a model participant making the same choice in the same problem. In all models, if the probability of making a human participant's choice is greater than 0.5, then it is assumed that the model choice coincides with the human choice. Using this 0.5 rule, we compare whether both model and human participants select the maximizing option in a problem. The maximizing option is the one with the highest expected value of the two options (expected value is calculated using the objective probability distribution of outcomes in the options). If both the human participant and the model participant select the maximizing option, or both select the non-maximizing option, in a problem, then the model can explain the human participant's choice. Using this method, in the TPT's estimation set, the final choices made by model observations are compared to 1,170 human observations, i.e., the total number of human observations available. The comparison between human choices and model choices is used to compute the incorrect proportion for each model, which is the main criterion for capturing individual behavior by a model. The incorrect proportion is simply the proportion of human choices that differed from model predictions.
it is defined as:

incorrect proportion = (mhnm + nhmm) / (mhnm + nhmm + nhnm + mhmm)    (9)

where mhnm is the number of observations where the human participant makes a maximizing choice but the model predicts a non-maximizing choice, and nhmm is the number of observations where the human participant makes a non-maximizing choice but the model predicts a maximizing choice. similarly, mhmm and nhnm are the numbers of observations where the human participant makes the same choice (maximizing or non-maximizing) as predicted by the model. the smaller the value of the incorrect proportion, the more accurate the model is in accounting for individual human choices. once model parameters were calibrated to individual choices using the log-likelihood function, the incorrect proportions were computed from the different models and compared.

footnote 4: according to birnbaum (2008) and tversky and kahneman (1986), splitting an attractive outcome into two distinct outcomes can increase the attractiveness of a prospect even when it reduces its expected value. this phenomenon is referred to as the splitting effect.

10.11588/jddm.2017.1.37687 | jddm | 2017 | volume 3 | article 3
sharma & dutt: how aggregate and hierarchical models explain individual choices

parameter calibration

given the choice made by a human participant, we use equation 1 and equation 2 to compute the log-likelihood from a model of making the same choice as made by the human participant. classically, equation 1 has used an inverse temperature parameter β, which scales the sample means (busemeyer & diederich, 2010). in this paper, we assume β = 1 across all models, as we did not want to introduce an additional free parameter beyond those already present in the models; recalibrating the β parameter to individual choices could benefit some models differently. as β = 1 across all models, the β parameter does not favor some models over others.
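equation 9 is a simple ratio over the four human/model combination counts. a minimal sketch, with hypothetical counts (the function name and the example numbers are not from the paper):

```python
def incorrect_proportion(mhnm, nhmm, nhnm, mhmm):
    # equation 9: share of observations on which human and model disagree
    # about choosing the maximizing option
    return (mhnm + nhmm) / (mhnm + nhmm + nhnm + mhmm)

# hypothetical confusion counts over 1,170 observations
print(incorrect_proportion(mhnm=70, nhmm=117, nhnm=456, mhmm=527))
```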
the nmh model did not require parameter calibration, as this model does not possess any parameters. the sets of parameters of the ensemble, beast, cpt, and ibl models were recalibrated using a genetic algorithm (ga) program. the ga is a probabilistic (stochastic) trial-and-error method of optimization that differs from deterministic methods like steepest gradient descent. due to the ga's trial-and-error nature and its dependence on processes like reproduction, crossover, and mutation, the algorithm provides good chances of avoiding local optima in the parameter search space (jakobsen, 2010; gonzalez & dutt, 2011; houck, joines & kay, 1995). in addition, prior research involving models has used the ga procedure for model calibration (gonzalez & dutt, 2011; 2012; lejarraga, dutt, & gonzalez, 2012). in our model calibrations, the ga repeatedly modified a population of parameter tuples to find the tuple that minimized the negative of the model's log-likelihood function (equation 2) across all human participants. in each generation, the ga selected parameter tuples randomly from the population to become parents and used these parents to produce children for the next generation. for each parameter tuple in a generation, each model was run five times across 1,170 participants to minimize the negative of the model's average log-likelihood function over the five runs (see footnote 5). over successive generations, the population evolved toward an optimal solution. the population size was set to 20 randomly selected parameter tuples per generation (each tuple contained a certain value for each of the model's parameters). the mutation and crossover fractions were both set at 0.5 after a grid search for the best combination. the best combination of mutation and crossover fractions was found by calibrating the ibl (ldg) model to aggregate choices using its known parameters (d = 5.0; σ = 1.5).
we systematically varied the mutation and crossover fractions in steps of 0.1 in the interval [0, 1] to find their best combination. the optimal values of the mutation and crossover fractions (= 0.5) were those for which the optimization converged the ibl (ldg) parameters to their optimal values in the fewest generations. these optimal mutation and crossover fractions were then used for calibrating model parameters to individual choices. the ga procedure was implemented in a matlab® toolbox (houck, joines & kay, 1995; mathworks, 2012), where the stopping criteria in the optimization of model parameters involved the following constraints: stall generations = 200, function tolerance = 1×10^-8, and stopping when the average relative change in the fitness function value over 200 stall generations was less than the function tolerance (1×10^-8).

results

calibration in tpt's estimation set

table 1 shows parameter calibration results from different models in tpt's estimation dataset. the table lists the different models, calibrated parameter values, combinations obtained from the comparison of human and model final choices, log-likelihoods, and incorrect proportions.

calibrated parameters

the best model in terms of log-likelihood values was cpt (tpt). five parameters were calibrated in the cpt (tpt) model, and the calibrated model possessed a log-likelihood of -634.7, which was significantly larger than that for the cpt (tk) model (-662.8) and the cpt (hau) model (-643.9). the calibrated parameter values were: α = 1.008; β = 0.96; γ = 2.00; δ = 0.92; λ = 1.03. the free parameters for the value function indicated a slightly smaller magnitude of disutility for losses compared to the utility for gains. the value function for the cpt (tpt) model was aligned with risk-neutral behavior for both gains and losses, which differed from the behavior in the cpt (hau) and cpt (tk) models.
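the ga procedure above can be illustrated with a compact sketch. this is a hypothetical simplification, not the matlab toolbox the authors used: the selection scheme, the toy fitness function, and all names are assumptions; only the population size, mutation/crossover fractions, and the goal of minimizing a fitness function (in the paper, the negative log-likelihood) follow the text.

```python
import random

def ga_minimize(fitness, n_params, bounds, pop_size=20,
                mutation_frac=0.5, crossover_frac=0.5, generations=200):
    # evolve a population of parameter tuples toward the tuple minimizing
    # `fitness` (a stand-in for the negative log-likelihood of equation 2)
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(n_params)]
           for _ in range(pop_size)]
    best = min(pop, key=fitness)                  # elitist best-so-far
    for _ in range(generations):
        scored = sorted(pop, key=fitness)
        best = min(best, scored[0], key=fitness)
        parents = scored[:pop_size // 2]          # selection: keep top half
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            child = [ai if random.random() < crossover_frac else bi
                     for ai, bi in zip(a, b)]     # uniform crossover
            if random.random() < mutation_frac:   # mutate one coordinate
                i = random.randrange(n_params)
                child[i] = random.uniform(lo, hi)
            children.append(child)
        pop = children
    return best

# toy fitness: squared distance from a known optimum in [0, 20]^2
target = [5.39, 0.04]
best = ga_minimize(lambda p: sum((x - t) ** 2 for x, t in zip(p, target)),
                   n_params=2, bounds=(0.0, 20.0))
```

the actual calibration additionally averaged the log-likelihood over five model runs per tuple and used the stall-generation/function-tolerance stopping criteria described above rather than a fixed generation count.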
furthermore, the weighting function of the cpt (tpt) model showed underweighting of small probabilities for positive outcomes and about equal weighting of small probabilities for negative outcomes. in contrast, the weighting functions of the cpt (hau) and cpt (tk) models overweight small probabilities for both positive and negative outcomes. please see appendix d for the shapes of the value and weighting functions for the different cpt models. the ensemble (tpt) model was the second-best model, with a log-likelihood of -691.0. the model's calibrated parameters were α = 0.75, β = 1.46, γ = 1.42, δ = 1.03, λ = 1.13, µ = 0.37, t0 = 0.001, porder1 = 0.38, σ = 0.020, tp = 0.18, and porder2 = 0.62. the first six parameters from the model depicted underweighting of rare events and loss aversion, with losses perceived as more damaging than gains. the latter five parameters, from the priority heuristic, showed a smaller variance in the distribution of the σ parameter compared to the last round. also, results indicated underweighting of small probabilities, overweighting of large probabilities, and diminishing sensitivity to gains and losses.

footnote 5: the number of runs was set to five after analyzing the run-to-run variability in models with stochasticity (e.g., ibl and beast). five runs were chosen as there was little change in the standard deviation when increasing the number of runs beyond five.

[figure 1. the correct proportions against the number of parameters from different models calibrated in the tpt's estimation set.]

the next best model was the ibl (tpt) model, which exhibited a considerably larger log-likelihood value of -929.0 compared to the ibl (ldg) model. the ibl (tpt) model's calibrated parameters were: d = 5.39 and σ = 0.04.
these parameters indicated reliance on the recency of sampled information, which provides a plausible account of recency's role in human participants' sampling and subsequent choice. the reliance on recency for individual choices is also in agreement with the documented reliance on recency in aggregate choices (dutt & gonzalez, 2012; gonzalez & dutt, 2011; 2012; hertwig et al., 2004; lejarraga, dutt, & gonzalez, 2011). in fact, the d parameter value was higher for the model calibrated to individual choices than for the model calibrated to aggregate choices. furthermore, the participant-to-participant variability (captured by σ) was smaller in the ibl (tpt) than in the ibl (ldg) model. this observation showed less variability among individual participants in their choices. for the beast (tpt) and nmh models, the log-likelihood values (-1129.0 and -1386.5) were much smaller than those for the individual versions of the cpt, ensemble, and ibl models. please see table 1 for the log-likelihood values of the different models.

incorrect proportion

in the calibration dataset, the cpt (tpt) model possessed the best incorrect proportion of 0.15. in the cpt (tpt) model, the desirable nhnm and mhmm combinations were 39% and 45%, respectively. in contrast, the erroneous nhmm and mhnm combinations were 10% and 6%, respectively. the cpt (hau) model showed an incorrect proportion of 0.16. the model showed 39% nhnm combinations and 45% mhmm combinations. the erroneous combinations included 9% for nhmm and 7% for mhnm. the incorrect proportion for the cpt (tk) model was 0.18. the proportions of desirable nhnm and mhmm combinations were 41% and 42%, respectively. in addition, the erroneous nhmm and mhnm combinations were 8% and 10%, respectively. the next best model was the ibl (tpt) model, which exhibited an incorrect proportion of 0.21. the ibl (tpt) model showed 39% and 40% of the desirable nhnm and mhmm combinations.
the erroneous nhmm and mhnm combinations were 9% and 12%, respectively. beyond the ibl (tpt) model, the beast (tpt) model did well, with an incorrect proportion of 0.24. the four combination proportions for the beast (tpt) model were: 36% (nhnm), 41% (mhmm), 13% (nhmm), and 10% (mhnm). next, to gauge the benefit of explaining individual choices with different numbers of model parameters, we plotted the correct proportions from calibrated models against their numbers of free parameters (see figure 1). models closer to the origin are the ones that explain individual choices with the fewest free parameters. the distance of the ibl (tpt) and cpt (tpt) models from the origin (= 2 and 5 units, respectively) was much smaller than that of the beast (tpt) and ensemble (tpt) models (= 6 and 11 units, respectively). thus, based upon the distance metric, the ibl and cpt models explained individual choices with fewer free parameters. thus, it seems that cognitive mechanisms like recency, frequency, and blending, as well as mathematical functions that underweight rare outcomes and value gains and losses differently, are appropriate for accounting for individual choices.

table 1. calibration results from models in tpt's estimation dataset. percentage of 1,170 observations; combinations from human and model data (h/m). column order: ensemble (herzog), ensemble (tpt), nmh, ibl (ldg), ibl (tpt), cpt (tk), cpt (hau), cpt (tpt), beast (cpc), beast (tpt).
calibrated parameters: ensemble (herzog): α = 1.19, β = 1.35, γ = 1.42, δ = 1.54, λ = 1.19, µ = 0.41, t0 = 0.0001, porder1 = 0.38, σ = 0.037, tp = 0.11, porder2 = 0.62. ensemble (tpt): α = 0.75, β = 1.46, γ = 1.42, δ = 1.03, λ = 1.13, µ = 0.37, t0 = 0.001, porder1 = 0.38, σ = 0.02, tp = 0.18, porder2 = 0.62. ibl (ldg): d = 5.00, σ = 1.50. ibl (tpt): d = 5.39, σ = 0.04. cpt (tk): α = 0.88, β = 0.88, γ = 0.61, δ = 0.69, λ = 1.00. cpt (hau): α = 0.94, β = 0.86, γ = 0.99, δ = 0.93, λ = 1.00. cpt (tpt): α = 1.008, β = 0.96, γ = 2.00, δ = 0.92, λ = 1.03. beast (cpc): σ = 7.00, κ = 3.00, β = 2.6, γ = 0.50, ϕ = 0.07, θ = 1.00. beast (tpt): σ = 0.24, κ = 1.99, β = 0.06, γ = 1.16, ϕ = 0.03, θ = 1.17.
nhnm 31 32 29 26 39 41 39 39 33 36
mhmm 40 40 33 32 40 42 45 45 37 41
nhmm 17 18 19 23 09 08 09 10 15 13
mhnm 12 11 19 20 12 10 07 06 15 10
incorrect proportion 0.29 0.28 0.37 0.43 0.21 0.18 0.16 0.15 0.31 0.24
log-likelihood -696.2 -691.0 -1386.5 -3158.0 -929.0 -662.8 -643.9 -643.7 -1971.0 -1129.0
note. nh and mh refer to non-maximizing and maximizing human choices, respectively. nm and mm refer to non-maximizing and maximizing model choices, respectively.

generalization to different datasets

up to now, the different models predicted choices of individual participants in tpt's estimation set using a single set of parameter values. these models, however, possess different numbers of free parameters. due to these differences in model parameters, it becomes difficult to compare model performance during parameter calibration. one method that allows us to compare models while accounting for parameter differences is generalization (busemeyer & diederich, 2010; busemeyer & wang, 2000; dutt & gonzalez, 2012). in generalization, models with calibrated parameters are run in new problems (busemeyer & wang, 2000).
ideally, new problems encountered during generalization should be different from those encountered during calibration; otherwise, generalization may favor models that showed superior performance during calibration. in what follows, we first generalize the calibrated models to problems in tpt's competition set. generalization of this kind was also used for models submitted to tpt (erev et al., 2010). however, problems in tpt's competition set were derived using the same algorithm as in tpt's estimation set (erev et al., 2010). thus, it is likely that the nature of the problems across the competition and estimation sets was similar and that tpt's competition set provided a weaker generalization dataset with respect to tpt's estimation set. to overcome this limitation, we also generalized the calibrated models to the six-choice (sc) problems dataset (hertwig et al., 2004), where the sc problems differed in structure and nature from the tpt problems. we first report the generalization results for tpt's competition set, then those for the sc dataset.

generalization to competition set. tpt's competition set was like the estimation set with two exceptions: problems in the competition set were different from those in the estimation set, and different subjects participated in the competition set compared to the estimation set (erev et al., 2010). the 60 problems in the competition set were selected using the same algorithm as used for the estimation set. to explain individual choices, all models were run in the competition set using the parameters obtained in the estimation set. table 2 shows generalization results from different models in the competition set. in all models, parameters were set to the values reported in table 1. overall, the incorrect proportions obtained from the models in the competition set were like those obtained in the estimation set.
the calibrated models performed better than their uncalibrated counterparts that borrowed parameter values for aggregate choices from the literature. the incorrect proportion was lowest for the cpt (tpt) model, with the ibl (tpt) and ensemble (tpt) models taking second and third places, respectively. also, all three models performed significantly better than the beast and nmh models. these results highlight the role of certain mechanisms in explaining individual choices: recency, frequency, and blending of encountered information during sampling, the underweighting of rare events, and the differential valuation of gains and losses.

sequential analysis. to gauge how models account for individual differences, we evaluated the proportion of sequential decisions in models from the last sample to the final choice. here, human and model choices were analyzed sequentially. thus, we evaluated the decisions made by human participants during their last sample and consequential choice, and then compared these sequential decisions to those from the models. table 3 presents the proportion of model participants showing a transition that was similar to or different from human participants in tpt's competition dataset. based upon the last sample and consequential choice among human participants, the following four transition possibilities existed: n → n, n → m, m → n, and m → m, where the first letter (before the arrow) corresponds to the choice made by a human participant during her last sample and the second letter (after the arrow) corresponds to the final choice made by the same participant after sampling. for each last-sample-to-final-choice transition by a human participant, there are two transition possibilities for the model: first, like the human participant; and, second, different from the human participant.
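the transition tabulation described above can be sketched as follows. this is an illustrative fragment with made-up data (the function name and the sample counts are hypothetical); it only mirrors the idea of counting last-sample → final-choice transitions over participants.

```python
from collections import Counter

def transition_proportions(pairs):
    # pairs: (last_sample, final_choice) tuples, each "n" (non-maximizing)
    # or "m" (maximizing); returns the proportion of participants showing
    # each of the four possible transitions
    counts = Counter(pairs)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

# hypothetical participants: last sample -> final choice
data = ([("m", "m")] * 6 + [("m", "n")] * 1 +
        [("n", "n")] * 2 + [("n", "m")] * 1)
props = transition_proportions(data)
```

comparing such human proportions against the corresponding model proportions, transition by transition, yields tables like table 3.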
if the model is suggestive of individual choice, then the model should show a transition between the last sample and the final choice like the human participants for more than 50 percent (i.e., a majority) of its participants. we evaluated sequential decisions in the top four models: cpt (tpt), ibl (tpt), ensemble (tpt), and the nmh model. as shown in table 3, across all transitions, n → n, n → m, m → n, and m → m, the cpt (tpt) model performed better than all other models. thus, the cpt (tpt) model made stronger correct predictions for human transitions from the last sample to the final choice compared to the ensemble (tpt), nmh, and ibl models. the ibl (tpt) model performed superior to the ensemble (tpt) model on two kinds of transitions: n → n and m → n. overall, these results show that underweighting of experienced probabilities, loss aversion due to negative outcomes, and recency and frequency processes seem to account for sequential individual choices in the data.

six choice (sc) dataset. in the section above, we generalized models to tpt's competition set. however, the problems in the competition set were similar to those in the estimation set, as the problem-generation algorithm remained the same between the two sets. due to this observation, the competition set provides a weaker generalization dataset. in order to overcome this limitation, we also generalized the calibrated models to the six-choice (sc) dataset (hertwig et al., 2004; appendix c), where the structure of options across problems in the sc dataset was different from that in tpt's estimation and competition sets. in the sc dataset, all six problems presented options that differed with respect to expected value.

table 2. generalization results from models in tpt's competition dataset.
percentage of 1,200 observations; combinations from human and model data (h/m). column order: ensemble (herzog), ensemble (tpt), nmh, ibl (ldg), ibl (tpt), cpt (tk), cpt (hau), cpt (tpt), beast (cpc), beast (tpt).
nhnm 20 20 25 22 29 32 33 33 21 24
mhmm 46 46 39 40 43 53 49 50 36 39
nhmm 21 20 15 19 12 09 08 09 19 17
mhnm 14 13 21 20 17 09 10 07 24 20
incorrect proportion 0.34 0.33 0.36 0.39 0.28 0.17 0.18 0.16 0.42 0.37
note. nh and mh refer to non-maximizing and maximizing human choices, respectively. nm and mm refer to non-maximizing and maximizing model choices, respectively.

table 3. proportion of model participants following a transition that is similar to or different from human participants in the competition dataset. column order: cpt (tpt) (%), ibl (tpt) (%), ensemble (tpt) (%), nmh (%).
human n→n: model n→n 79 73 54 62; model n→m 21 27 46 38
human n→m: model n→m 80 70 77 64; model n→n 20 30 23 36
human m→n: model m→n 77 67 51 62; model m→m 23 34 49 38
human m→m: model m→m 87 74 78 66; model m→n 13 26 22 34
note. n and m refer to non-maximizing and maximizing choices, respectively.

table 4. generalization results from models in the sc problems dataset. percentage of 150 observations; combinations from human and model data (h/m). column order: ensemble (herzog), ensemble (tpt), nmh, ibl (ldg), ibl (tpt), cpt (tk), cpt (hau), cpt (tpt), beast (cpc), beast (tpt).
nhnm 45 46 55 41 51 37 37 39 33 34
mhmm 20 19 26 23 32 25 31 27 31 31
nhmm 14 13 03 18 07 22 21 20 25 25
mhnm 21 22 15 19 09 17 11 10 10 11
incorrect proportion 0.35 0.35 0.19 0.37 0.16 0.39 0.34 0.33 0.36 0.35
note. nh and mh refer to non-maximizing and maximizing human choices, respectively. nm and mm refer to non-maximizing and maximizing model choices, respectively.

table 5.
proportion of model participants following a transition that is similar to or different from human participants in the sc problems dataset. column order: nmh (%), ibl (tpt) (%), ensemble (tpt) (%), cpt (tpt) (%).
human n→n: model n→n 96 81 79 66; model n→m 4 19 21 34
human n→m: model n→m 54 75 36 64; model n→n 46 25 64 36
human m→n: model m→n 91 89 71 66; model m→m 9 11 29 34
human m→m: model m→m 71 74 59 68; model m→n 29 26 41 32
note. n and m refer to non-maximizing and maximizing choices, respectively.

four of the six problems offered positive prospects and two offered negative prospects. all problems in the sc dataset were run in the sampling-paradigm format: free sampling of options followed by a final choice of one option for real. during sampling, participants could sample the options in whatever order they desired and however often they wished. they were encouraged to sample until they felt confident enough to decide from which option to draw a real payoff. like the tpt dataset, each problem consisted of choosing between two options. however, unlike the tpt dataset, problems in the sc dataset could have both options risky: both options could independently contain high and low outcomes with predefined probability distributions. problems in the sc dataset belonged to both positive and negative domains. in the positive domain, the associated non-zero outcomes were positive, whereas in the negative domain, the associated non-zero outcomes were negative. overall, the tpt and sc datasets differed in the number of possible outcomes on options and in the presence of the mixed domain in tpt and its absence in the sc problems. table 4 shows the generalization results from running the different models in the sc dataset (model parameters were calibrated in the estimation set). as shown in table 4, the ibl (tpt) model was the best-performing model, with an incorrect proportion of 0.16. the nmh model was the second-best model, with an incorrect proportion of 0.19.
the cpt (tpt) model was the third-best model, with an incorrect proportion of 0.33. other hierarchical models like ensemble and beast did not perform as well in the sc dataset and possessed higher incorrect proportions. furthermore, models with recalibrated parameters performed better than models with parameters for aggregate choices borrowed from the literature. these results show that when a more challenging generalization is performed, models like ibl and cpt, which are based upon activations and recency and frequency mechanisms as well as assumptions of underweighting of rare outcomes and differential valuation of gains and losses, perform better than other models that rely on heuristic rules and biased sampling techniques.

sequential analyses. to evaluate the models at explaining individual differences, we analyzed the top four models in the sc dataset. table 5 shows the transitions from the last sample to the final choice for human and model participants in the sc problems dataset. as seen in table 5, both the ibl and nmh models were suggestive of human-like transitions for all four combinations based upon the 50% rule. the ibl (tpt) model performed better than the nmh model on the n → m and m → m transitions and poorer than the nmh model on the n → n and m → n transitions. overall, these results show the role of recency and frequency processes during sampling in individual choices.

discussion

until recently, researchers had evaluated how aggregate or hierarchical models with a set of parameter values explained aggregate choices made from experience (dutt & gonzalez, 2012; gonzalez & dutt, 2011; 2012; lee, 2008; lejarraga, dutt, & gonzalez, 2012; rouder & lu, 2005). researchers had also evaluated how models with a single set of parameter values calibrated to each participant explained individual choices (individual models; kudryavtsev & pavlodsky, 2012; frey, mata, & hertwig, 2015).
however, little was known about how aggregate or hierarchical models with a set of single or distribution parameter values would perform when made to account for individual choices. in this paper, we contributed to this investigation by calibrating aggregate and hierarchical models with a set of single or distribution parameter values to individual choices across three different datasets. aggregate and hierarchical models were calibrated in the technion prediction tournament (tpt)'s estimation set using the log-likelihood function and later generalized to tpt's competition dataset (erev, ert, roth, et al., 2010) and the six-choice (sc) problems dataset (hertwig et al., 2004). we followed the traditional approach of model comparison via generalization as proposed by busemeyer and wang (2000). overall, our results revealed that both aggregate and hierarchical models performed above chance (= 50%) when their parameters were calibrated to individual choices. even parameter values calibrated to aggregate choices (borrowed from the literature) performed above chance in these models. the cpt model performed well overall in the calibration and generalization datasets from tpt. models such as ensemble and cpt possess rules like weighting and value functions that abstract the sampling process experienced by human participants. from our results, these constructs help such models in cases where the generalization environment is similar to the calibration environment (as in tpt), but not when the generalization environment is different from the calibration environment (as in the sc dataset).
indeed, upon generalization to the sc problems dataset, the ibl model, relying on recency, frequency, and blending mechanisms, showed superior performance compared to models employing mathematical functions (ensemble and cpt) or biased sampling techniques (beast). the nmh model, which incorporates the frequency and magnitude of experienced outcomes, also performed well in accounting for individual decisions. one likely reason for this observation is the presence of cognitive constructs like expectations, instances, activations, and blended values in the ibl model and the averaging mechanism in the nmh model. these mechanisms help these models account for the individual experiences gained during the sampling of options. for example, the ibl model is motivated by the act-r theory of cognition (anderson & lebiere, 1998). the ibl model's reliance on recency and frequency of experiences during sampling (exhibited through activations and blended values) helps this model make human-like choices. similarly, the natural means in the nmh model are computed from the outcomes experienced during the sampling process. these natural means represent the expectations of choosing different options and enable this model to account for individual choices. next, we found that the ibl model performed consistently well in both calibration and generalization datasets, standing among the top two models even though it possessed only two parameters. one likely reason for this observation could be that the ibl model uses the blending mechanism, where, for every option, the values of all the observed outcomes are weighted by their activation strengths. blending of experiences considers both the activation of outcomes in memory and their magnitude. perhaps the ibl model's blending mechanism makes the model blend outcomes correctly for both maximizing and non-maximizing choices. other factors affecting the performance of the ibl model are its two parameters, d and σ.
the calibrated value of the d parameter was higher for individual choices than its calibrated value for aggregate choices (the latter calibration having been done by lejarraga, dutt, and gonzalez, 2012). the increased d value shows that individual choices rely heavily on the recency of outcomes. furthermore, the σ parameter helped the ibl model account for sample-to-sample variability in instance activations. here, when the model parameters were calibrated to individual choices, the σ parameter's value was much smaller and closer to its act-r default compared to when the same model was calibrated by lejarraga, dutt, and gonzalez (2012) to aggregate choices. the smaller value of the σ parameter, closer to its act-r default, indicates less variation in outcome activations among individual choices. this research builds upon the judgment and decision making literature in several ways. first, the beast and ensemble models were hierarchical: these models possessed distribution parameters to account for individual choices. the parameters in these models assumed different values from a distribution for different participants in the dataset. thus, these distribution parameters should have helped these models account for individual choices due to parameter heterogeneity. however, in our results, the beast and ensemble models did not account for individual choices as well as the models (like ibl and cpt) that possessed single parameters. this finding likely shows that it is more important for a model to possess the right cognitive or mathematical mechanisms than to possess heterogeneity among its parameters across participants. second, we performed generalizations to large datasets that were similar or dissimilar to the calibration dataset. an insight from this generalization exercise is that the true picture emerges when the generalization dataset differs in structure from the calibration dataset.
the sc dataset possessed problems whose structure was different from that of the tpt datasets (both options could be risky in the sc dataset). thus, it is recommended that generalizations be performed to datasets that possess structural differences from the calibration datasets. third, we used individual-level techniques like likelihoods and incorrect proportions, which enabled us to evaluate aggregate and hierarchical models at the individual-participant level. in summary, the likelihood approach is powerful, and it enables us to calibrate models at the individual level. however, beyond calibration, one needs to test models based upon dependent measures that account for model error at the individual level. this need is especially true for generalizations, where calibration measures like likelihood cannot be used, as parameters have already been fixed to their calibrated values. in this paper, our focus was on investigating how aggregate and hierarchical models with a set of single or distribution parameters performed when their parameters were calibrated to individual choices rather than aggregate choices. as part of our future research, we plan to also perform individual modeling: calibrating a set of model parameters to each individual's decisions such that we get a set of parameters for each participant in the dataset. this evaluation will enable us to test the tradeoffs between aggregate modeling, hierarchical modeling, and individual modeling when these models are evaluated for explaining individual decisions (as in this paper).
Individual modeling may help us account for individual differences well; however, such models also run the risk of overfitting individual decisions due to their many parameter values (one set per participant). To provide a robust comparison of this tradeoff, as part of our future research we plan to generalize individual models across both similar and dissimilar datasets, within the same paradigm or across datasets in different paradigms. Furthermore, we plan to extend our investigation to decision tasks where decision-makers choose among multiple options rather than make a binary choice. An example is the Iowa gambling task (Bechara, Damasio, Damasio, & Anderson, 1994), where the problem consists of making a choice among four options. In this paper, we took problem environments that were static in terms of outcomes and probabilities; thus, the outcomes and probabilities in a problem did not change during sampling. In the future, it would be worthwhile to extend the evaluation of models to explaining individual choices in dynamic environments, where outcomes and probabilities change during information search. Some of these ideas form the immediate next steps that we would like to undertake as part of our research.

Conclusion

This paper helped to bridge the gap in the literature on how aggregate and hierarchical models with a set of parameter values (either single or distribution) would perform when made to account for individual choices. We contributed to this investigation by calibrating different models with a set of parameter values to individual choices across three different datasets. Models with constructs that abstract the sampling process performed well when generalized to problems that were similar to the calibration problems.
However, generalization to other problems that were structurally different from the calibration problems revealed that model mechanisms like differential valuing of gains and losses, recency, frequency, blending, and underweighting of rare outcomes were important for accounting for individual choices. Also, models using distribution parameters with heuristic rules and biased techniques did not perform well in accounting for individual choices when generalized to different problems.

Acknowledgements: This research was supported by the Indian Institute of Technology Mandi and the Tata Consultancy Services Research Scholar Program.

Declaration of conflicting interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Handling editor: Andreas Fischer

Author contributions: The authors contributed equally to this work.

Supplementary material: Supplementary material available online.

Copyright: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Citation: Sharma, N., & Dutt, V. (2017). Modeling decisions from experience: How models with a set of parameters for aggregate choices explain individual choices. Journal of Dynamic Decision Making, 3, 3. doi:10.11588/jddm.2017.1.37687

Received: 27 April 2017
Accepted: 10 September 2017
Published: 06 October 2017

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723. doi:10.1109/TAC.1974.1100705

Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought. Hillsdale, NJ: Erlbaum.

Barron, G., & Erev, I. (2003). Small feedback-based decisions and their limited correspondence to description-based decisions. Journal of Behavioral Decision Making, 16(3), 215–233. doi:10.1002/bdm.443

Bechara, A., Damasio, A. R., Damasio, H., & Anderson, S. W. (1994).
Insensitivity to future consequences following damage to human prefrontal cortex. Cognition, 50(1–3), 7–15. doi:10.1016/0010-0277(94)90018-3

Birnbaum, M. H. (2008). New paradoxes of risky decision making. Psychological Review, 115(2), 463. doi:10.1037/0033-295X.115.2.463

Bishop, C. M. (2006). Pattern recognition and machine learning. New York, NY: Springer.

Brandstätter, E., Gigerenzer, G., & Hertwig, R. (2006). The priority heuristic: Making choices without trade-offs. Psychological Review, 113(2), 409–432. doi:10.1037/0033-295X.113.2.409

Busemeyer, J. R., & Diederich, A. (2010). Cognitive modeling. Thousand Oaks, CA: Sage.

Busemeyer, J. R., & Myung, I. J. (1992). An adaptive approach to human decision making: Learning theory, decision theory, and human performance. Journal of Experimental Psychology: General, 121(2), 177–194. doi:10.1037/0096-3445.121.2.177

Busemeyer, J. R., & Stout, J. C. (2002). A contribution of cognitive decision models to clinical assessment: Decomposing performance on the Bechara gambling task. Psychological Assessment, 14(3), 253–262. doi:10.1037/1040-3590.14.3.253

Busemeyer, J. R., & Wang, Y. (2000). Model comparisons and model selections based on the generalization criterion methodology. Journal of Mathematical Psychology, 44(1), 171–189. doi:10.1006/jmps.1999.1282

Bush, R. R., & Mosteller, F. (1955). Stochastic models for learning. Oxford, England: Wiley & Sons.

Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., & Dolan, R. J. (2011). Model-based influences on humans' choices and striatal prediction errors. Neuron, 69(6), 1204–1215. doi:10.1016/j.neuron.2011.02.027

Denrell, J. (2007). Adaptive learning and risk taking. Psychological Review, 114(1), 177–187. doi:10.1037/0033-295X.114.1.177

Dutt, V., & Gonzalez, C. (2012). The role of inertia in modeling decisions from experience with instance-based learning. Frontiers in Psychology, 3(177). doi:10.3389/fpsyg.2012.00177

Dutt, V., & Gonzalez, C. (2015). Accounting for outcome and process measures and the effects of model calibration. Journal of Dynamic Decision Making, 1(2), 1–10. doi:10.11588/jddm.2015.1.17663

Erev, I., & Barron, G. (2005). On adaptation, maximization, and reinforcement learning among cognitive strategies. Psychological Review, 112(4), 912–931. doi:10.1037/0033-295X.112.4.912

Erev, I., Ert, E., Plonsky, O., Cohen, D., & Cohen, O. (2015). From anomalies to forecasts: A choice prediction competition for decisions under risk and ambiguity. Mimeo, 1–56.

Erev, I., Ert, E., Roth, A. E., Haruvy, E., Herzog, S. M., & Hau, R. (2010). A choice prediction competition: Choices from experience and from description. Journal of Behavioral Decision Making, 23(1), 15–47. doi:10.1002/bdm.683

Erev, I., Glozman, I., & Hertwig, R. (2008). What impacts the impact of rare events. Journal of Risk and Uncertainty, 36(2), 153–177. doi:10.1007/s11166-008-9035-z

Estes, W. K., & Todd Maddox, W. (2005). Risks of drawing inferences about cognitive processes from model fits to individual versus average performance. Psychonomic Bulletin & Review, 12(3), 403–408. doi:10.3758/BF03193784

Fox, C. R., & Tversky, A. (1998). A belief-based account of decision under uncertainty. Management Science, 44(7), 879–895. doi:10.1287/mnsc.44.7.879

Frey, R., Mata, R., & Hertwig, R. (2015). The role of cognitive abilities in decisions from experience: Age differences emerge as a function of choice set size. Cognition, 142, 60–80. doi:10.1016/j.cognition.2015.05.004

Gallistel, C. R., Fairhurst, S., & Balsam, P. (2004). The learning curve: Implications of a quantitative analysis. Proceedings of the National Academy of Sciences of the United States of America, 101(36), 13124–13131. doi:10.1073/pnas.0404965101

Gigerenzer, G., & Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103(4), 650–669. doi:10.1037/0033-295X.103.4.650

Gilboa, I., & Schmeidler, D. (1989). Maxmin expected utility with non-unique prior. Journal of Mathematical Economics, 18(2), 141–153. doi:10.1016/0304-4068(89)90018-9

Gonzalez, C., & Dutt, V. (2011). Instance-based learning: Integrating sampling and repeated decisions from experience. Psychological Review, 118(4), 523–551. doi:10.1037/a0024558

Gonzalez, C., & Dutt, V. (2012). Refuting data aggregation arguments and how the instance-based learning model stands criticism: A reply to Hills and Hertwig. Psychological Review, 119(4), 893–898. doi:10.1037/a0029445

Hau, R., Pleskac, T. J., Kiefer, J., & Hertwig, R. (2008). The description–experience gap in risky choice: The role of sample size and experienced probabilities. Journal of Behavioral Decision Making, 21(5), 493–518. doi:10.1002/bdm.598

Hertwig, R. (2012). The psychology and rationality of decisions from experience. Synthese, 187(1), 269–292. doi:10.1007/s11229-011-0024-4

Hertwig, R., Barron, G., Weber, E. U., & Erev, I. (2004). Decisions from experience and the effect of rare events in risky choice. Psychological Science, 15(8), 534–539. doi:10.1111/j.0956-7976.2004.00715.x

Hertwig, R., & Erev, I. (2009). The description–experience gap in risky choice. Trends in Cognitive Sciences, 13(12), 517–523. doi:10.1016/j.tics.2009.09.004

Hertwig, R., & Pleskac, T. J. (2010). Decisions from experience: Why small samples? Cognition, 115(2), 225–237. doi:10.1016/j.cognition.2009.12.009

Horrace, R. H., William, C., & Jeffrey, M. P. (2009). Variety: Consumer choice and optimal diversity. Food Marketing Policy Center Research Report, 115.

Houck, C. R., Joines, J., & Kay, M. G. (1995). A genetic algorithm for function optimization: A MATLAB implementation. North Carolina State University, Technical Report NCSU-IE TR 95-09.

Jakobsen, T. (2010). Genetic algorithms. Retrieved from http://subsimple.com/genealgo.asp

Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–291. doi:10.2307/1914185

Kudryavtsev, A., & Pavlodsky, J. (2012). Description-based and experience-based decisions: Individual analysis. Judgment and Decision Making, 7(3), 316–331.

Lebiere, C. (1999). Blending: An ACT-R mechanism for aggregate retrievals. Paper presented at the 6th Annual ACT-R Workshop at George Mason University, Fairfax County, VA.

Lee, M. D. (2008). Three case studies in the Bayesian analysis of cognitive models. Psychonomic Bulletin & Review, 15(1), 1–15. doi:10.3758/PBR.15.1.1

Lejarraga, T., Dutt, V., & Gonzalez, C. (2012). Instance-based learning: A general model of repeated binary choice. Journal of Behavioral Decision Making, 25(2), 143–153. doi:10.1002/bdm.722

Luce, R. D., & Raiffa, H. (1957). Games and decisions: Introduction and critical survey. New York: Wiley.

March, J. G. (1996). Learning to be risk averse. Psychological Review, 103(2), 309–319. doi:10.1037/0033-295X.103.2.309

Marchiori, D., Di Guida, S., & Erev, I. (2015). Noisy retrieval models of over- and under-sensitivity to rare events. Decision, 2(2), 82–106. doi:10.1037/dec0000023

MathWorks (2012). MATLAB and Statistics Toolbox Release 2012b [Computer software]. Natick, MA: The MathWorks, Inc.

Plonsky, O., Teodorescu, K., & Erev, I. (2015). Reliance on small samples, the wavy recency effect, and similarity-based learning. Psychological Review, 122(4), 621–647. doi:10.1037/a0039413

Rieskamp, J. (2008). The probabilistic nature of preferential choice. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(6), 1446–1465. doi:10.1037/a0013646

Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12(4), 573–604. doi:10.3758/BF03196750

Shteingart, H., Neiman, T., & Loewenstein, Y. (2013). The role of first impression in operant learning. Journal of Experimental Psychology: General, 142(2), 476–488. doi:10.1037/a0029550
Stevens, L. (2016, June 8). Survey shows rapid growth in online shopping. The Wall Street Journal. Retrieved from https://www.wsj.com/articles/survey-shows-rapid-growth-in-online-shopping-1465358582

Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.

Tversky, A., & Fox, C. R. (1995). Weighing risk and uncertainty. Psychological Review, 102(2), 269–283. doi:10.1037/0033-295X.102.2.269

Tversky, A., & Kahneman, D. (1986). Rational choice and the framing of decisions. Journal of Business, 59(S4), S251–S278. doi:10.1086/296365

Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4), 297–323. doi:10.1007/BF00122574

Appendix

Appendix A: Estimation set (TPT)

problem  set  high   p(high)  low    medium
1    est  -0.3   0.96  -2.1   -0.3
2    est  -0.9   0.95  -4.2   -1.0
3    est  -6.3   0.3   -15.2  -12.2
4    est  -10    0.2   -29.2  -25.6
5    est  -1.7   0.9   -3.9   -1.9
6    est  -6.3   0.99  -15.7  -6.4
7    est  -5.6   0.7   -20.2  -11.7
8    est  -0.7   0.1   -6.5   -6.0
9    est  -5.7   0.95  -16.3  -6.1
10   est  -1.5   0.92  -6.4   -1.8
11   est  -1.2   0.02  -12.3  -12.1
12   est  -5.4   0.94  -16.8  -6.4
13   est  -2.0   0.05  -10.4  -9.4
14   est  -8.8   0.6   -19.5  -15.5
15   est  -8.9   0.08  -26.3  -25.4
16   est  -7.1   0.07  -19.6  -18.7
17   est  -9.7   0.1   -24.7  -23.8
18   est  -4.0   0.2   -9.3   -8.1
19   est  -6.5   0.9   -17.5  -8.4
20   est  -4.3   0.6   -16.1  -4.5
21   est  2.0    0.1   -5.7   -4.6
22   est  9.6    0.91  -6.4   8.7
23   est  7.3    0.8   -3.6   5.6
24   est  9.2    0.05  -9.5   -7.5
25   est  7.4    0.02  -6.6   -6.4
26   est  6.4    0.05  -5.3   -4.9
27   est  1.6    0.93  -8.3   1.2
28   est  5.9    0.8   -0.8   4.6
29   est  7.9    0.92  -2.3   7.0
30   est  3.0    0.91  -7.7   1.4
31   est  6.7    0.95  -1.8   6.4
32   est  6.7    0.93  -5.0   5.6
33   est  7.3    0.96  -8.5   6.8
34   est  1.3    0.05  -4.3   -4.1
35   est  3.0    0.93  -7.2   2.2
36   est  5.0    0.08  -9.1   -7.9
37   est  2.1    0.8   -8.4   1.3
38   est  6.7    0.07  -6.2   -5.1
39   est  7.4    0.3   -8.2   -6.9
40   est  6.0    0.98  -1.3   5.9
41   est  18.8   0.8   7.6    15.5
42   est  17.9   0.92  7.2    17.1
43   est  22.9   0.06  9.6    9.2
44   est  10.0   0.96  1.7    9.9
45   est  2.8    0.8   1.0    2.2
46   est  17.1   0.1   6.9    8.0
47   est  24.3   0.04  9.7    10.6
48   est  18.2   0.98  6.9    18.1
49   est  13.4   0.5   3.8    9.9
50   est  5.8    0.04  2.7    2.8
51   est  13.1   0.94  3.8    12.8
52   est  3.5    0.09  0.1    0.5
53   est  25.7   0.1   8.1    11.5
54   est  16.5   0.01  6.9    7.0
55   est  11.4   0.97  1.9    11.0
56   est  26.5   0.94  8.3    25.2
57   est  11.5   0.6   3.7    7.9
58   est  20.8   0.99  8.9    20.7
59   est  10.1   0.3   4.2    6.0
60   est  8.0    0.92  0.8    7.7

Appendix B: Competition set (TPT)

problem  set   high   p(high)  low    medium
1    comp  -8.7   0.06  -22.8  -21.4
2    comp  -2.2   0.09  -9.6   -8.7
3    comp  -2.0   0.1   -11.2  -9.5
4    comp  -1.4   0.02  -9.1   -9.0
5    comp  -0.9   0.07  -4.8   -4.7
6    comp  -4.7   0.91  -18.1  -6.8
7    comp  -9.7   0.06  -24.8  -24.2
8    comp  -5.7   0.96  -20.6  -6.4
9    comp  -5.6   0.1   -19.4  -18.1
10   comp  -2.5   0.6   -5.5   -3.6
11   comp  -5.8   0.97  -16.4  -6.6
12   comp  -7.2   0.05  -16.1  -15.6
13   comp  -1.8   0.93  -6.7   -2.0
14   comp  -6.4   0.2   -22.4  -18.0
15   comp  -3.3   0.97  -10.5  -3.2
16   comp  -9.5   0.1   -24.5  -23.5
17   comp  -2.2   0.92  -11.5  -3.4
18   comp  -1.4   0.93  -4.7   -1.7
19   comp  -8.6   0.1   -26.5  -26.3
20   comp  -6.9   0.06  -20.5  -20.3
21   comp  1.8    0.6   -4.1   1.7
22   comp  9.0    0.97  -6.7   9.1
23   comp  5.5    0.06  -3.4   -2.6
24   comp  1.0    0.93  -7.1   0.6
25   comp  3.0    0.2   -1.3   -0.1
26   comp  8.9    0.1   -1.4   -0.9
27   comp  9.4    0.95  -6.3   8.5
28   comp  3.3    0.91  -3.5   2.7
29   comp  5.0    0.4   -6.9   -3.8
30   comp  2.1    0.06  -9.4   -8.4
31   comp  0.9    0.2   -5.0   -5.3
32   comp  9.9    0.05  -8.7   -7.6
33   comp  7.7    0.02  -3.1   -3.0
34   comp  2.5    0.96  -2.0   2.3
35   comp  9.2    0.91  -0.7   8.2
36   comp  2.9    0.98  -9.4   2.9
37   comp  2.9    0.05  -6.5   -5.7
38   comp  7.8    0.99  -9.3   7.6
39   comp  6.5    0.8   -4.8   6.2
40   comp  5.0    0.9   -3.8   4.1
41   comp  20.1   0.95  6.5    19.6
42   comp  5.2    0.5   1.4    5.1
43   comp  12.0   0.5   2.4    9.0
44   comp  20.7   0.9   9.1    19.8
45   comp  8.4    0.07  1.2    1.6
46   comp  22.6   0.4   7.2    12.4
47   comp  23.4   0.93  7.6    22.1
48   comp  17.2   0.09  5.0    5.9
49   comp  18.9   0.9   6.7    17.7
50   comp  12.8   0.04  4.7    4.9
51   comp  19.1   0.03  4.8    5.2
52   comp  12.3   0.91  1.3    12.1
53   comp  6.8    0.9   3.0    6.7
54   comp  22.6   0.3   9.2    11.0
55   comp  6.4    0.09  0.5    1.5
56   comp  15.3   0.06  5.9    7.1
57   comp  5.3    0.9   1.5    4.7
58   comp  21.9   0.5   8.1    12.6
59   comp  27.5   0.7   9.2    21.9
60   comp  4.4    0.2   0.7    1.1

Appendix C: SC problems set

problem  set  high  p(high)  low  medium
1    sc   4    0.8    0   3
2    sc   4    0.2    0   3
3    sc   -3   1      0   -32
4    sc   -3   1      0   -4
5    sc   32   0.1    0   3
6    sc   32   0.025  0   3

Appendix D: CPT models' value and weighting functions
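The figure originally shown under the Appendix D title did not survive conversion; only its title remains. For orientation, the standard CPT value and weighting functions of Tversky and Kahneman (1992), to which the title refers, take the forms below. The exact parameterization evaluated in the paper may differ.

```latex
v(x) =
  \begin{cases}
    x^{\alpha},             & x \ge 0,\\
    -\lambda\,(-x)^{\beta}, & x < 0,
  \end{cases}
\qquad
w(p) = \frac{p^{\gamma}}{\bigl(p^{\gamma} + (1-p)^{\gamma}\bigr)^{1/\gamma}}
```

Here α and β capture diminishing sensitivity to gains and losses, λ > 1 captures loss aversion, and γ governs the curvature of the probability-weighting function (the inverse-S shape that overweights small described probabilities).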